DataGaffer – Model & Methods

1) High-Level Pipeline

Fetch fixtures & odds → fetch_fixtures.py builds fixtures.json and h2h_and_odds.json.
Assemble team stats → season/home/away splits in team_stats/*.json.
Compute features → league strength, optional team boosters, H2H venue means.
Simulate → match_simulator.py runs 20,000 Poisson draws per fixture to estimate xG, corners, shots, and outcome probabilities.
Publish → UI pages read JSON and render tables/cards with fixed columns and gradients.

2) Data Inputs

Fixtures & Odds

Daily fixtures (the day is defined in EST) and Bet365 1X2 odds.
Stored in fixtures.json with team IDs, names, and logos (prefer assets/logos/{id}.png|.svg).

Team Stats

Per-league files (EPL, La Liga, Bundesliga, Serie A, Ligue 1, Eredivisie, UCL, UEL).
Home/away splits for goals for/against, corners, and shots. Teams with only overall (unsplit) stats fall back to season means.
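
The fallback can be sketched as below. This is a minimal illustration, not the project's code: the record layout ("season", "home", "away" keys with a "gf" metric) and the resolve_split name are assumptions made for the example.

```python
def resolve_split(team_stats, venue, metric):
    """Return the venue-specific value, or the season mean when no split exists.

    Assumed layout (hypothetical): {"season": {...}, "home": {...}, "away": {...}}.
    """
    split = team_stats.get(venue) or {}
    value = split.get(metric)
    if value is not None:
        return value
    # Overall-only team: no usable home/away split, so use the season mean.
    return team_stats["season"][metric]


# Hypothetical records: one team with full splits, one with overall-only data.
full = {"season": {"gf": 1.6}, "home": {"gf": 1.9}, "away": {"gf": 1.3}}
overall_only = {"season": {"gf": 1.2}}

home_gf_full = resolve_split(full, "home", "gf")
home_gf_fallback = resolve_split(overall_only, "home", "gf")
```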

Head-to-Head (Venue-Specific)

Last N meetings (currently 10), filtered to those where the current home team was the home side.
Saved in h2h_and_odds.json under home_{homeId}_{awayId} keys.
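
A rough sketch of the venue filter and key format follows. The meeting fields (date, home_id, away_id) and both function names are assumptions for illustration. The sketch takes the last N meetings first and then applies the venue filter; the text does not say which order the real code uses.

```python
def venue_h2h(meetings, home_id, away_id, last_n=10):
    """Take the last_n meetings between two teams, then keep those hosted by home_id.

    Each meeting is assumed to be a dict with ISO "date", "home_id", "away_id".
    """
    pair = {home_id, away_id}
    between = [m for m in meetings if {m["home_id"], m["away_id"]} == pair]
    between.sort(key=lambda m: m["date"], reverse=True)  # newest first
    recent = between[:last_n]
    return [m for m in recent if m["home_id"] == home_id]


def h2h_key(home_id, away_id):
    """Build the storage key in the home_{homeId}_{awayId} format."""
    return f"home_{home_id}_{away_id}"


# Hypothetical sample: team 33 hosts team 40 in the upcoming fixture.
sample = [
    {"date": "2023-03-05", "home_id": 33, "away_id": 40},
    {"date": "2023-10-21", "home_id": 40, "away_id": 33},
    {"date": "2024-04-07", "home_id": 33, "away_id": 40},
    {"date": "2024-01-01", "home_id": 33, "away_id": 50},
]
kept = venue_h2h(sample, 33, 40)
key = h2h_key(33, 40)
```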

3) Features & Adjustments

League Strength

Each team inherits its league's coefficient from league_coefficients.json. The ratio home_coef / away_coef is applied to nudge expected values.
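
One possible way to apply the ratio is sketched below. How the ratio actually enters the expected values is not specified here, so the damping exponent, the choice to scale the away side by the inverse, and the function name are all assumptions.

```python
def apply_league_ratio(exp_home, exp_away, home_coef, away_coef, damping=0.5):
    """Nudge expected values by the league-strength ratio.

    damping (hypothetical) softens the effect: 0 disables it, 1 applies the full ratio.
    """
    ratio = home_coef / away_coef
    nudge = ratio ** damping
    # Assumption: the stronger league's side is scaled up, the other side scaled down.
    return exp_home * nudge, exp_away / nudge


# Same league on both sides: ratio is 1, so nothing changes.
unchanged = apply_league_ratio(1.5, 1.0, 1.0, 1.0)

# Cross-league tie (e.g., UCL) with a stronger home league.
nudged = apply_league_ratio(1.5, 1.0, 4.0, 1.0)
```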

Optional Team Boosters

Per-team multipliers (e.g., form/injuries) via team_boosters.json.

Venue-Specific H2H Blend

The model's expected values are blended with the venue-specific H2H means using a capped weight. Example: w = min(0.40, 0.10 * num_matches). Then: exp_home = exp_home*(1-w) + h_avg*w, exp_away = exp_away*(1-w) + a_avg*w.
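
The blend formula above translates directly into a small function. This is a sketch using the example values (cap 0.40, slope 0.10); the function name and the slope parameter name are assumptions.

```python
def blend_h2h(exp_home, exp_away, h_avg, a_avg, num_matches,
              max_weight=0.40, slope=0.10):
    """Blend model expectations with venue-specific H2H means.

    w = min(max_weight, slope * num_matches), so the weight grows with
    sample size and stops growing at the cap.
    """
    w = min(max_weight, slope * num_matches)
    blended_home = exp_home * (1 - w) + h_avg * w
    blended_away = exp_away * (1 - w) + a_avg * w
    return blended_home, blended_away, w


# Two venue meetings give w = 0.20; ten meetings hit the 0.40 cap.
home_2, away_2, w_2 = blend_h2h(1.5, 1.0, h_avg=2.0, a_avg=0.5, num_matches=2)
home_10, away_10, w_10 = blend_h2h(1.5, 1.0, h_avg=2.0, a_avg=0.5, num_matches=10)
```

With no venue meetings the weight is 0 and the model's expected values pass through unchanged.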

4) Simulation Engine (Poisson)

Expected values seeded per category:
- Goals: averages from home/away GF/GA plus venue/H2H and league adjustments.
- Corners / Shots: analogous construction with small home uplift.
20,000 draws per fixture ⇒ means & probabilities:
- Home/away mean goals (xG), corners, shots
- Home/Draw/Away win %, Over 2.5 %, BTTS %, per-team Over 1.5 %

Reproducibility: Poisson RNG is seeded by fixture_id.
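
A stdlib-only sketch of the goals simulation is shown below. It is not the code in match_simulator.py: the function names and output keys are assumptions, and the Poisson sampler is Knuth's multiplication method, swapped in because Python's random module has no Poisson draw (NumPy's Generator.poisson would be the vectorized alternative). Corners and shots would follow the same pattern with their own expected values.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_goals(fixture_id, exp_home, exp_away, n_draws=20_000):
    """Simulate one fixture; seeding by fixture_id makes reruns identical."""
    rng = random.Random(fixture_id)
    sum_h = sum_a = 0
    home = draw = away = over25 = btts = home_o15 = away_o15 = 0
    for _ in range(n_draws):
        h = poisson_draw(rng, exp_home)
        a = poisson_draw(rng, exp_away)
        sum_h += h
        sum_a += a
        home += h > a
        draw += h == a
        away += h < a
        over25 += (h + a) >= 3   # Over 2.5 total goals
        btts += h >= 1 and a >= 1
        home_o15 += h >= 2       # per-team Over 1.5
        away_o15 += a >= 2

    def pct(count):
        return 100.0 * count / n_draws

    return {
        "xg": {"home": sum_h / n_draws, "away": sum_a / n_draws},
        "home_pct": pct(home), "draw_pct": pct(draw), "away_pct": pct(away),
        "over25_pct": pct(over25), "btts_pct": pct(btts),
        "home_o15_pct": pct(home_o15), "away_o15_pct": pct(away_o15),
    }


result = simulate_goals(fixture_id=12345, exp_home=1.6, exp_away=1.1)
```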

5) Site Outputs

Match Cards

Shows simulated xG/corners/shots, outcome %, top value markets, SOA% leaders, and H2H (venue-specific).

Outlooks

Fixture tables (BTTS, Over 2.5, etc.) with fixed column widths and white column separators.

Rankings

Top N lists (BTTS, Over 2.5, Team Totals, Win%) with two-column layout and logo fallbacks.

Goal Zone

Three synchronized tables (Match Total xG, Away xG, Home xG) with independent green→white→red gradients.

6) Head-to-Head Logic

Only matches where the current home team played at home are used.
We store wins/draws/losses and mean goals for the venue sample.
Blend weight grows with sample size and caps at the configured maximum.
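
The stored summary might look like the sketch below. The field names (home_goals, away_goals) and the output keys are assumptions; h_avg and a_avg mirror the names used in the blend formula in section 3, and wins/losses are counted from the current home team's side.

```python
def summarize_venue_sample(meetings):
    """Summarize venue-specific meetings from the current home team's view.

    Each meeting is assumed to be a dict with "home_goals" and "away_goals".
    Returns None when there is no venue sample to summarize.
    """
    n = len(meetings)
    if n == 0:
        return None
    wins = sum(1 for m in meetings if m["home_goals"] > m["away_goals"])
    draws = sum(1 for m in meetings if m["home_goals"] == m["away_goals"])
    return {
        "matches": n,
        "wins": wins,
        "draws": draws,
        "losses": n - wins - draws,
        "h_avg": sum(m["home_goals"] for m in meetings) / n,
        "a_avg": sum(m["away_goals"] for m in meetings) / n,
    }


# Hypothetical venue sample of three meetings.
summary = summarize_venue_sample([
    {"home_goals": 2, "away_goals": 1},
    {"home_goals": 1, "away_goals": 1},
    {"home_goals": 0, "away_goals": 3},
])
```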

7) Rankings Builder

build.py aggregates chosen leagues and derives per-team rates:

BTTS rate (proxy from GF/GA), Over 2.5 proxy, Team total proxy, Win share.
Sort high→low, assign rank, keep top N (20 or 30), format values as percents.
Logos are resolved via assets/logos/{id}.png, falling back to .svg through the image's onerror handler.
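
The sort, rank, and format step can be sketched as follows. This is not build.py itself: the function name, the input shape (team name mapped to a rate between 0 and 1), and the one-decimal percent format are assumptions, and the proxy calculations that produce the rates are left out.

```python
def build_ranking(rates, top_n=20):
    """Sort rates high to low, assign ranks, keep top_n, format as percents.

    rates is assumed to map team name -> rate as a fraction in [0, 1].
    """
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank, "team": team, "value": f"{rate * 100:.1f}%"}
        for rank, (team, rate) in enumerate(ordered[:top_n], start=1)
    ]


# Hypothetical BTTS rates for three teams, keeping the top two.
ranking = build_ranking({"Team A": 0.55, "Team B": 0.71, "Team C": 0.48}, top_n=2)
```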

8) Goal Zone (per-fixture xG)

Reads sim_stats.xg for each fixture in fixtures.json.
Three tables:
- Match: fixture label + total xG (sorted high→low).
- Away and Home: logo, team name, team xG.
Each table uses its own min/max to color cells green→white→red.
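
The per-table coloring can be sketched as a linear interpolation against that table's own min and max. The RGB anchor values, the function names, and the direction (highest value green, lowest red) are assumptions; the site's actual rendering code may differ.

```python
# Hypothetical anchor colors for the gradient ends and midpoint.
GREEN = (0, 160, 0)
WHITE = (255, 255, 255)
RED = (220, 0, 0)


def _lerp(start, end, t):
    """Linearly interpolate between two RGB tuples, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def gradient_color(value, lo, hi):
    """Map value to a hex color: hi is green, the midpoint white, lo red."""
    if hi == lo:
        return "#ffffff"  # a flat table has no spread to color
    t = (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    if t >= 0.5:
        rgb = _lerp(WHITE, GREEN, (t - 0.5) * 2)
    else:
        rgb = _lerp(WHITE, RED, (0.5 - t) * 2)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def color_table(values):
    """Color one table's cells using that table's own min and max."""
    lo, hi = min(values), max(values)
    return [gradient_color(v, lo, hi) for v in values]


# Each table is colored independently, so the same xG can get different colors.
home_colors = color_table([1.0, 2.0, 3.0])
away_colors = color_table([0.5, 1.0])
```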

9) Value / EV Plays

Convert each simulated percentage to fair decimal odds: odds = 100 / pct.
Compare to bookmaker odds; show top positive gaps and highlight best/worst with badges.
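
A minimal sketch of the comparison follows. The conversion 100 / pct is from the text; the gap definition (bookmaker odds minus fair odds), the function names, and the input tuple shape are assumptions.

```python
def fair_odds(pct):
    """Convert a simulated percentage to fair decimal odds (100 / pct)."""
    return 100.0 / pct


def value_gaps(markets):
    """Return markets with a positive gap, largest gap first.

    markets is assumed to be a list of (label, sim_pct, book_odds) tuples.
    Assumption: gap = book_odds - fair_odds, so positive means the
    bookmaker pays more than the simulation implies.
    """
    rows = []
    for label, pct, book in markets:
        if pct <= 0:
            continue  # no fair price exists for a 0% outcome
        fair = fair_odds(pct)
        rows.append({"market": label, "fair_odds": fair,
                     "book_odds": book, "gap": book - fair})
    positive = [row for row in rows if row["gap"] > 0]
    return sorted(positive, key=lambda row: row["gap"], reverse=True)


# Hypothetical 1X2 market: sim says 50/25/25, bookmaker offers 2.2/3.5/4.5.
plays = value_gaps([("Home", 50, 2.2), ("Draw", 25, 3.5), ("Away", 25, 4.5)])
```

Here the Draw is dropped because its bookmaker price (3.5) is below the fair price (4.0).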

10) Assumptions & Limitations

Poisson assumes independent goal processes; real matches have game-state and tempo effects.
Injuries, rotations, travel, weather, and tactical shifts are only reflected if encoded via boosters or data.
H2H is venue-specific and capped to avoid overfitting small samples.
Odds snapshots are from the fetch time; lines move.

11) FAQ

How often is data refreshed?

Fixtures and odds are refreshed daily (EST). Simulations are re-run at each fetch. Rankings are rebuilt when team stats update.

Why doesn’t a team logo show?

We load assets/logos/{id}.png and switch to .svg via onerror if it fails. Add the logo in either format to fix it.

Can I change the H2H weight?

Yes. Edit match_simulator.py and adjust max_weight and the per-match slope.

DataGaffer.com