DataGaffer.com

1) High-Level Pipeline
  1. Fetch fixtures & odds → fetch_fixtures.py builds fixtures.json and h2h_and_odds.json.
  2. Assemble team stats → season/home/away splits in team_stats/*.json.
  3. Compute features → league strength, optional team boosters, H2H venue means.
  4. Simulate → match_simulator.py runs 20,000 Poisson draws per fixture to estimate xG, corners, shots, and outcome probabilities.
  5. Publish → UI pages read JSON and render tables/cards with fixed columns and gradients.
2) Data Inputs

Fixtures & Odds

  • Daily fixtures (grouped by date in EST) and Bet365 1X2 odds.
  • Stored in fixtures.json with team IDs, names, and logos (prefer assets/logos/{id}.png|.svg).

Team Stats

  • Per-league files (EPL, La Liga, Bundesliga, Serie A, Ligue 1, Eredivisie, UCL, UEL).
  • Home/away splits for goals for/against, corners, and shots. Teams with only overall figures fall back to season means.

Head-to-Head (Venue-Specific)

  • Last N meetings (currently 10), filtered so the current home was actually home.
  • Saved in h2h_and_odds.json under home_{homeId}_{awayId} keys.
3) Features & Adjustments

League Strength

Each team inherits a coefficient from league_coefficients.json. We apply a ratio home_coef / away_coef to nudge expected values.
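
As a minimal sketch of the ratio adjustment: how the ratio is applied to the expected values is not stated here, so this assumes the home expectation is multiplied by home_coef / away_coef and the away expectation is divided by it. The function name apply_league_strength and the sample coefficients are hypothetical.

```python
def apply_league_strength(exp_home, exp_away, home_coef, away_coef):
    """Nudge expected values by the league-strength ratio (assumed form)."""
    ratio = home_coef / away_coef
    # Assumption: the home expectation is scaled up by the ratio and the away
    # expectation scaled down by it; equal coefficients change nothing.
    return exp_home * ratio, exp_away / ratio


# Hypothetical coefficients for a tie between teams from different leagues.
home, away = apply_league_strength(1.50, 1.20, home_coef=1.10, away_coef=1.00)
print(round(home, 3), round(away, 3))
```

With equal coefficients (a domestic league match) the ratio is 1 and both expectations pass through unchanged.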

Optional Team Boosters

Per-team multipliers (e.g., form/injuries) via team_boosters.json.

Venue-Specific H2H Blend

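The model expectation is blended with the venue H2H mean using a weight that grows with sample size and is capped. Example: w = min(0.40, 0.10 * num_matches). Then: exp_home = exp_home*(1-w) + h_avg*w, exp_away = exp_away*(1-w) + a_avg*w, where h_avg and a_avg are the home and away mean goals from the venue sample.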
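
The blend formula translates directly into a small function. This is a sketch rather than the code in match_simulator.py; the function name h2h_blend and the keyword names max_weight and slope are illustrative, with defaults taken from the example values 0.40 and 0.10.

```python
def h2h_blend(exp_home, exp_away, h_avg, a_avg, num_matches,
              max_weight=0.40, slope=0.10):
    """Blend model expectations with venue-specific H2H means."""
    # The weight grows by `slope` per H2H match and is capped at `max_weight`.
    w = min(max_weight, slope * num_matches)
    blended_home = exp_home * (1 - w) + h_avg * w
    blended_away = exp_away * (1 - w) + a_avg * w
    return blended_home, blended_away, w


# Two venue meetings give w = 0.20, so H2H contributes 20% of each value.
home, away, w = h2h_blend(1.5, 1.0, h_avg=2.0, a_avg=0.5, num_matches=2)
print(round(home, 2), round(away, 2), round(w, 2))
```

With four or more venue meetings the weight stays at 0.40, so the model expectation always keeps at least 60% of the blend.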

4) Simulation Engine (Poisson)
  • Expected values seeded per category:
    • Goals: averages from home/away GF/GA plus venue/H2H and league adjustments.
    • Corners / Shots: analogous construction with small home uplift.
  • 20,000 draws per fixture ⇒ means & probabilities:
    • Home/away mean goals (xG), corners, shots
    • Home/Draw/Away win %, Over 2.5 %, BTTS %, per-team Over 1.5 %

Reproducibility: Poisson RNG is seeded by fixture_id.
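
A minimal, standard-library sketch of the goals simulation follows. It assumes the two goal counts are drawn independently and that fixture_id is used directly as the RNG seed; the names poisson_draw and simulate_fixture, the output keys, and the use of Knuth's sampling method are illustrative choices, not a description of match_simulator.py.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_fixture(fixture_id, exp_home, exp_away, n_draws=20_000):
    """Simulate goal counts for one fixture; return means and percentages."""
    rng = random.Random(fixture_id)  # same fixture_id -> same sequence of draws
    tally = {"home_win": 0, "draw": 0, "away_win": 0, "over_2_5": 0, "btts": 0}
    home_total = away_total = 0
    for _ in range(n_draws):
        h = poisson_draw(rng, exp_home)
        a = poisson_draw(rng, exp_away)
        home_total += h
        away_total += a
        if h > a:
            tally["home_win"] += 1
        elif h == a:
            tally["draw"] += 1
        else:
            tally["away_win"] += 1
        if h + a > 2:
            tally["over_2_5"] += 1
        if h > 0 and a > 0:
            tally["btts"] += 1
    result = {name: 100.0 * count / n_draws for name, count in tally.items()}
    result["home_xg"] = home_total / n_draws
    result["away_xg"] = away_total / n_draws
    return result


stats = simulate_fixture(fixture_id=12345, exp_home=1.6, exp_away=1.1)
print({name: round(value, 2) for name, value in stats.items()})
```

Because the generator is created from fixture_id inside the function, re-running the same fixture with the same expected values reproduces the same percentages.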

5) Site Outputs

Match Cards

  • Shows simulated xG/corners/shots, outcome %, top value markets, SOA% leaders, and H2H (venue-specific).

Outlooks

  • Fixture tables (BTTS, Over 2.5, etc.) — fixed column widths, white column separators.

Rankings

  • Top N lists (BTTS, Over 2.5, Team Totals, Win%) with two-column layout and logo fallbacks.

Goal Zone

  • Three synchronized tables (Match Total xG, Away xG, Home xG) with independent green→white→red gradients.
6) Head-to-Head Logic
  • Only matches where the current home team played at home are used.
  • We store wins/draws/losses and mean goals for the venue sample.
  • Blend weight grows with sample size and caps at the configured maximum.
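
The venue filter and summary can be sketched as below. The record fields (home_id, away_id, home_goals, away_goals), the function name venue_h2h, and the most-recent-first ordering of the input are assumptions made for illustration; only the key format home_{homeId}_{awayId} and the default of 10 meetings come from this page.

```python
def venue_h2h(meetings, home_id, away_id, last_n=10):
    """Summarize the venue-specific H2H sample for one fixture.

    `meetings` is assumed to be ordered most recent first.
    """
    recent = meetings[:last_n]
    # Keep only meetings where the current home team was actually at home.
    sample = [m for m in recent
              if m["home_id"] == home_id and m["away_id"] == away_id]
    n = len(sample)
    summary = {"key": f"home_{home_id}_{away_id}", "num_matches": n,
               "wins": 0, "draws": 0, "losses": 0,
               "h_avg": None, "a_avg": None}
    if n == 0:
        return summary  # no venue sample, so there is nothing to blend
    for m in sample:
        if m["home_goals"] > m["away_goals"]:
            summary["wins"] += 1
        elif m["home_goals"] == m["away_goals"]:
            summary["draws"] += 1
        else:
            summary["losses"] += 1
    summary["h_avg"] = sum(m["home_goals"] for m in sample) / n
    summary["a_avg"] = sum(m["away_goals"] for m in sample) / n
    return summary


# Hypothetical meetings between teams 33 and 40; the second was played at 40.
meetings = [
    {"home_id": 33, "away_id": 40, "home_goals": 2, "away_goals": 1},
    {"home_id": 40, "away_id": 33, "home_goals": 0, "away_goals": 0},
    {"home_id": 33, "away_id": 40, "home_goals": 1, "away_goals": 1},
]
print(venue_h2h(meetings, home_id=33, away_id=40))
```

Wins, draws, and losses are counted from the current home team's perspective, and num_matches is the value the blend weight would grow with.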
7) Rankings Builder

build.py aggregates chosen leagues and derives per-team rates:

  • BTTS rate (proxy from GF/GA), Over 2.5 proxy, Team total proxy, Win share.
  • Sort high→low, assign rank, keep top N (20 or 30), format values as percents.
  • Logos resolved via assets/logos/{id}.png with SVG fallback on onerror.
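
The sort, rank, and format steps can be sketched as follows. How build.py derives each proxy rate is not described here, so the sketch starts from precomputed rates; the function name build_ranking, the input shape (team name mapped to a rate between 0 and 1), and the one-decimal percent format are assumptions.

```python
def build_ranking(rates, top_n=20):
    """Rank teams by rate, highest first, and keep the top N as percents."""
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank, "team": team, "value": f"{rate * 100:.1f}%"}
        for rank, (team, rate) in enumerate(ordered[:top_n], start=1)
    ]


# Hypothetical BTTS proxy rates for three teams.
btts_rates = {"Team A": 0.50, "Team B": 0.75, "Team C": 0.25}
for row in build_ranking(btts_rates, top_n=2):
    print(row)
```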
8) Goal Zone (per-fixture xG)
  • Reads fixtures.json → sim_stats.xg.
  • Three tables:
    • Match: fixture label + total xG (sorted high→low).
    • Away and Home: logo, team name, team xG.
  • Each table uses its own min/max to color cells green→white→red.
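
One way to produce a per-table green→white→red scale is linear interpolation around the midpoint of that table's range. The sketch below assumes the highest xG maps to green and the lowest to red, and the RGB anchor values are placeholders; neither detail is specified on this page, and gradient_color is a hypothetical name.

```python
GREEN = (0, 170, 0)      # placeholder anchor for the table maximum
WHITE = (255, 255, 255)  # midpoint of the table's range
RED = (220, 0, 0)        # placeholder anchor for the table minimum


def _lerp(start, end, t):
    """Interpolate between two RGB tuples; t=0 gives start, t=1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def gradient_color(value, lo, hi):
    """Map value in [lo, hi] to a hex color on a red-white-green scale."""
    if hi == lo:
        return "#ffffff"  # a flat table has no range to color
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside the table's range
    if t >= 0.5:
        rgb = _lerp(WHITE, GREEN, (t - 0.5) * 2)
    else:
        rgb = _lerp(RED, WHITE, t * 2)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Each table passes its own min and max, so the scales stay independent.
home_xg = [2.4, 1.7, 1.0]
lo, hi = min(home_xg), max(home_xg)
print([gradient_color(x, lo, hi) for x in home_xg])
```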
9) Value / EV Plays
  • Convert sim % → fair decimal odds via 100 / pct (e.g., 40% → 2.50).
  • Compare to bookmaker odds; show top positive gaps and highlight best/worst with badges.
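
The conversion and comparison can be sketched as below. The gap is assumed here to be bookmaker decimal odds minus the simulated fair odds, which is one reasonable reading of "positive gaps"; the function name value_plays, the tuple layout of the input, and the sample prices are hypothetical.

```python
def value_plays(markets, top_n=3):
    """Return markets where the bookmaker price beats the simulated fair price.

    `markets` is a list of (label, sim_pct, book_odds) tuples.
    """
    plays = []
    for label, sim_pct, book_odds in markets:
        if sim_pct <= 0:
            continue  # no fair price can be derived from a 0% estimate
        fair_odds = 100 / sim_pct
        gap = book_odds - fair_odds  # assumed definition of the gap
        if gap > 0:
            plays.append({"market": label, "fair_odds": fair_odds,
                          "book_odds": book_odds, "gap": gap})
    plays.sort(key=lambda play: play["gap"], reverse=True)
    return plays[:top_n]


# Hypothetical simulated percentages and bookmaker prices.
markets = [("Home win", 50.0, 2.20), ("Over 2.5", 40.0, 2.40), ("BTTS", 62.5, 1.75)]
for play in value_plays(markets):
    print(play["market"], round(play["fair_odds"], 2), round(play["gap"], 2))
```

In this example Over 2.5 is dropped because its fair price (2.50) is higher than the bookmaker price (2.40).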
10) Assumptions & Limitations
  • Poisson assumes independent goal processes; real matches have state/tempo effects.
  • Injuries, rotations, travel, weather, and tactical shifts are only reflected if encoded via boosters or data.
  • H2H is venue-specific and capped to avoid overfitting small samples.
  • Odds snapshots are from the fetch time; lines move.
11) FAQ

How often is data refreshed?

Fixtures and odds are refreshed daily (EST). Simulations re-run at each fetch. Rankings are rebuilt when team stats update.

Why doesn’t a team logo show?

We use assets/logos/{id}.png with an onerror switch to .svg. Add either format to fix.

Can I change the H2H weight?

Yes. Edit match_simulator.py and adjust max_weight and the per-match slope.

© DataGaffer. For feedback or methods questions, ping us via the dashboard.