ML models predict World Cup outcomes, but miss draws, capture team strength

The 2026 World Cup kicks off on Thursday, June 11, at Mexico City's Estadio Azteca, and a data-driven fan decided to test how far machine learning can go. He gathered a massive archive of about 49,000 games, spanning from 1872 to the upcoming tournament, and stacked Elo ratings, results and venue details into a single table. From there, he ran a probabilistic forecast, pitting three approaches against each other: a plain multinomial regression, a multinomial ridge/elastic-net variant and a LightGBM gradient-boosted model.
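To make the idea of a probabilistic forecast concrete, here is a minimal sketch of how a multinomial (softmax) regression turns an Elo gap and a venue flag into home/draw/away probabilities. The coefficients are invented for illustration and are not fitted to any data; the function and variable names are hypothetical, not taken from the original analysis.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted).
# Per outcome: (intercept, weight on Elo difference in hundreds of points,
# weight on a neutral-venue flag). "draw" acts as the reference class.
COEFS = {
    "home_win": (0.4, 0.6, -0.3),
    "draw": (0.0, 0.0, 0.0),
    "away_win": (-0.1, -0.6, 0.3),
}


def predict_proba(elo_home, elo_away, neutral):
    """Return softmax probabilities for the three match outcomes."""
    x = (1.0, (elo_home - elo_away) / 100.0, 1.0 if neutral else 0.0)
    scores = {k: sum(w * v for w, v in zip(ws, x)) for k, ws in COEFS.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


if __name__ == "__main__":
    for diff in (-300, 0, 300):
        probs = predict_proba(1800 + diff, 1800, neutral=False)
        print(diff, {k: round(p, 3) for k, p in probs.items()})
```

With these made-up weights the draw always receives some probability but is never the single most likely outcome, even for evenly matched teams. That is one way a classifier that ranks teams well can still end up almost never predicting a draw.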

The aim isn't flashiness; it's calibration. Balancing raw performance, how well the predicted odds line up with reality, and computational cost, he settled on a model that correctly predicts home-team victories 86% of the time. That figure should be read against soccer's notoriously low-scoring nature: most matches finish with fewer than five goals, so each goal counts. While the sport's modest tallies can feel "sleep-inducing" to some, they also mean a well-tuned algorithm might capture the subtle strength differences teams carry into every kick-off.
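One common way to measure how well forecast probabilities line up with results is a proper scoring rule such as log loss or the multiclass Brier score. The sketch below is only illustrative: the source does not say which calibration metric was used, and the two example forecasts are made up.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the outcome that occurred."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total += -math.log(max(p[label], eps))
    return total / len(y_true)


def brier_score(y_true, probs):
    """Mean squared error between forecast and one-hot outcome (multiclass)."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total += sum((p[k] - (1.0 if k == label else 0.0)) ** 2 for k in p)
    return total / len(y_true)


# Two hypothetical forecasts for the same match, which ended in a draw.
confident = {"home_win": 0.8, "draw": 0.1, "away_win": 0.1}
hedged = {"home_win": 0.5, "draw": 0.3, "away_win": 0.2}

if __name__ == "__main__":
    for name, forecast in (("confident", confident), ("hedged", hedged)):
        print(name, log_loss(["draw"], [forecast]), brier_score(["draw"], [forecast]))
```

Both forecasts pick the home team, so plain accuracy cannot tell them apart. The scoring rules penalize the confident forecast more heavily when the match ends level, which is why calibration is worth tracking alongside accuracy.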

Many matches that actually ended in draws were assigned a confident home-win prediction, which suggests the models capture the direction of team strength better than match-level uncertainty or draw likelihood. To address this blindness to draws, we can engineer features such as abs_rating_diff, home_draw_rate_last_5 and form_draw_rate_mean_last_5, plus binary context flags such as neutral, flag_is_world_cup and flag_is_friendly, which indicate whether the match is played on neutral ground, at the World Cup or as a friendly. With these features, the model discriminates better between home/away wins and draws, as evidenced by a 3.3% increase in true-positive draw predictions.
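A rough sketch of how such draw-oriented features might be derived from a chronologically ordered match list follows. The exact definitions are assumptions: the source names the features but does not define them, so here home_draw_rate_last_5 is taken to be the share of draws in the home team's previous five matches, and form_draw_rate_mean_last_5 the mean of that rate for both teams. The input keys (home_elo, away_elo, home_score, away_score) and the function name are hypothetical.

```python
from collections import defaultdict, deque


def add_draw_features(matches, window=5):
    """Add draw-related features using only matches played before each row.

    `matches` must be a list of dicts in chronological order.
    """
    # team -> draw flags (1 or 0) for its most recent `window` matches
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for m in matches:
        home_hist = history[m["home_team"]]
        away_hist = history[m["away_team"]]
        home_rate = sum(home_hist) / len(home_hist) if home_hist else 0.0
        away_rate = sum(away_hist) / len(away_hist) if away_hist else 0.0

        row = dict(m)
        row["abs_rating_diff"] = abs(m["home_elo"] - m["away_elo"])
        row["home_draw_rate_last_5"] = home_rate
        row["form_draw_rate_mean_last_5"] = (home_rate + away_rate) / 2
        out.append(row)

        # Update history only after the features are computed, so the
        # current result never leaks into its own features.
        is_draw = 1 if m["home_score"] == m["away_score"] else 0
        home_hist.append(is_draw)
        away_hist.append(is_draw)
    return out


# Tiny made-up sample, for illustration only.
sample = [
    {"home_team": "A", "away_team": "B", "home_elo": 1800, "away_elo": 1750,
     "home_score": 1, "away_score": 1},
    {"home_team": "A", "away_team": "C", "home_elo": 1800, "away_elo": 1600,
     "home_score": 2, "away_score": 0},
    {"home_team": "B", "away_team": "A", "home_elo": 1750, "away_elo": 1810,
     "home_score": 0, "away_score": 0},
]
features = add_draw_features(sample)
```

Teams with no prior matches get a draw rate of 0.0 here; a real pipeline might prefer a league-wide prior or a missing-value marker instead.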

Why this matters

We've seen that the models, trained on nearly 50,000 matches spanning more than a century, reliably rank teams by strength, yet they consistently overlook the draw outcome. That blind spot tells us our probabilistic pipelines still prioritize dominant signals, such as Elo differentials and home advantage, over the subtler variance that produces stalemates. For developers, the takeaway is clear: feature engineering matters, and adding something like an absolute rating-difference feature may nudge the classifier toward a more balanced probability distribution.

Founders should note that a model that "gets the winner right" can still mislead stakeholders if it inflates confidence in a single result, especially in betting or fan-engagement products. Researchers are left with an open question: will richer contextual inputs or alternative loss functions capture draw likelihood without sacrificing overall accuracy? The evidence suggests we can improve, but it remains uncertain whether a modest tweak will close the gap or whether a deeper redesign of the predictive framework is required.

As we iterate, we must keep testing against real‑world match outcomes, not just aggregate win rates.
