Sports events are random: even the best players or teams can experience bad luck and bad times. This randomness turns betting into a matter of estimating probabilities. The predicted outcome follows from this estimation, since you predict the most probable outcome.
Everyone betting against bookmakers thinks that better predictions will make them millions in the long run. This is actually wrong, and this article explains why and works through a concrete example.
To follow the rest of the article, you need a basic knowledge of what odds and fair odds are, why they move, and how they can be turned into probabilities and predictions. If that is not the case, check our article on the topic. If you want to learn more about model predictions, check our website.
Making predictions is finding probabilities
It seems obvious, but building prediction models, calculating implied probabilities from odds, and collecting the percentage of votes for each outcome all have the same goal: estimating the chance of each possible outcome of a random event. This is what predicting is about: assessing probabilities and deducing which outcome you want to bet on.
We just mentioned three ways of estimating probabilities so how can we compare them? Which one is the best? Let’s see why we would prefer one or the other.
The prediction model
This approach uses statistics and machine learning to learn from data. The main advantage is that you can predict anything if you have enough data. The main drawback is that you need strong computer science skills: you have to deal with heterogeneous data and know how to code.
The bookmakers' implied probabilities
This approach simply turns odds into probabilities. The main advantage is accuracy when the market is liquid: when enough money is staked, you obtain a very good estimate of the probabilities. There are drawbacks too. First, if there is no market, no prediction is possible. Second, accuracy depends on the bookmaker and on how you remove the margin. Third, you can’t use these probabilities to beat the bookies.
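As an illustration of turning odds into probabilities, here is a minimal sketch. It assumes decimal odds and removes the margin by proportional normalization, which is only one of several possible conventions. The function name and the odds are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities that sum to 1.

    Assumption: the margin is removed by proportional normalization.
    Other conventions exist and give slightly different probabilities.
    """
    raw = [1.0 / o for o in decimal_odds]  # inverse odds, sum above 1 when there is a margin
    overround = sum(raw)                   # 1 + bookmaker margin
    probs = [r / overround for r in raw]   # rescale so the probabilities sum to 1
    return probs, overround - 1.0


# Hypothetical odds for home win, draw, away win
probs, margin = implied_probabilities([1.60, 3.20, 9.00])
print([f"{p:.1%}" for p in probs], f"margin={margin:.2%}")
```

The choice of normalization matters most for long shots, which is one reason why accuracy depends on how the margin is removed.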
The percentage of votes
Collected from websites, forums, or social media, this is the simplest way to get probabilities. It is also very accurate if there are enough votes, but the drawbacks are the lack of markets and the difficulty of collecting data from many sources.
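In order to compare the different techniques, we need a measure of the quality of the probabilities we are estimating. A good candidate is the log-loss, which measures the quality of the estimated probabilities for a single event. If you average the log-losses over several events, you get an estimate of the quality of the technique. Mathematically, the average log-loss over n events is
$$\frac{1}{n}\sum_{i=1}^{n}\log p_i$$
where p_i is the probability that was estimated for the outcome that actually occurred in event i, and log is the natural logarithm.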
Take a single event as an example: a soccer match winner with the following probabilities: home wins (60%), draw (30%), away wins (10%). If home wins, the log-loss is -0.51, but if away wins, it is -2.30. As you can see, if away wins, our prediction is badly wrong, so the log-loss is very negative.
This measure is never positive. The closer to zero the average log-loss is, the better the predictions are. This measure (or a variant of it) is also used to train machine learning models. To compare models, you just need to rank the average log-losses and pick the best one. To get a reliable answer, we need a lot of events (large n).
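The match-winner example can be checked with a few lines of Python. This is a minimal sketch that follows the sign convention of this article (the natural logarithm of the probability given to the outcome that occurred). Many machine learning libraries report the same quantity with the opposite sign. The function names are hypothetical.

```python
import math


def log_loss(probabilities, outcome):
    """Log of the probability that was assigned to the realized outcome."""
    return math.log(probabilities[outcome])


def average_log_loss(predictions, outcomes):
    """Average the per-event log-losses over all events."""
    losses = [log_loss(p, y) for p, y in zip(predictions, outcomes)]
    return sum(losses) / len(losses)


match = {"home": 0.60, "draw": 0.30, "away": 0.10}
print(f"{log_loss(match, 'home'):.2f}")  # -0.51
print(f"{log_loss(match, 'away'):.2f}")  # -2.30

# Averaging over two events that used the same estimated probabilities
print(f"{average_log_loss([match, match], ['home', 'away']):.2f}")  # -1.41
```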
Now, why can’t we use those predictions to make money? The answer is simple: because that’s not the goal. Assuming all methods give accurate probabilities, then by the crowd effect the odds will reflect this accuracy and converge to their fair value. Since bookmakers take a margin, if you keep betting according to these probabilities you will, on average, lose roughly the margin. Even if the probabilities are good, you won’t make money.
Having a good prediction model does not mean that you can beat the bookmaker; it just means you can predict accurate probabilities.
This is especially true because the probabilities implied by bookmakers' odds are very accurate and, most of the time, in line with the two other methods, and we know we cannot use them to beat the bookmakers.
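The margin argument can be made concrete with a short sketch. It assumes that the bookmaker shortens the fair odds 1/p by a proportional margin, which is a simplification. The 5% margin and the function names are hypothetical.

```python
def expected_return(true_prob, decimal_odds):
    """Expected profit per unit staked on one outcome."""
    return true_prob * decimal_odds - 1.0


def bookmaker_odds(true_prob, margin):
    """Odds offered when the fair odds 1/p are shortened by a proportional margin (assumption)."""
    return 1.0 / (true_prob * (1.0 + margin))


margin = 0.05  # hypothetical 5% margin
for p in (0.60, 0.30, 0.10):
    odds = bookmaker_odds(p, margin)
    print(f"p={p:.2f} odds={odds:.2f} expected return={expected_return(p, odds):+.2%}")
```

Under this assumption the expected return is the same negative number for every outcome, about -4.76% per unit staked with a 5% margin, no matter how accurate the probabilities are.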
Beating the bookmakers is finding values
It is no secret that value betting is the only way to win long-term against bookmakers. A value bet is simply a bet whose odds are overpriced by the bookmaker. Put differently, it is a bet whose implied probability is lower than the true probability that the event happens. The true probability is unknown and has to be estimated, for instance with the methods seen in the previous part.
The problem is that these techniques are not designed to find value. Machine learning models are trained to produce the most accurate probabilities, implied probabilities are useless for value betting, and votes are just the result of a survey.
In order to find value, we would either train a model with a dedicated detection algorithm or have users vote on the odds they think are value. Doing that, you will probably end up selecting large odds and have a low strike rate.
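The definition of a value bet given earlier in this section can be written as a simple test: a bet has value when the estimated probability multiplied by the decimal odds is greater than 1. The sketch below is only an illustration of that definition, not a detection algorithm. The candidate bets, the `min_edge` parameter, and the function names are hypothetical.

```python
def edge(estimated_prob, decimal_odds):
    """Expected profit per unit staked if the estimated probability is right."""
    return estimated_prob * decimal_odds - 1.0


def find_value_bets(candidates, min_edge=0.0):
    """Keep the bets whose edge exceeds a chosen threshold."""
    return [c for c in candidates if edge(c["prob"], c["odds"]) > min_edge]


# Hypothetical estimated probabilities and bookmaker odds for one match
candidates = [
    {"name": "home", "prob": 0.60, "odds": 1.60},
    {"name": "draw", "prob": 0.30, "odds": 3.20},
    {"name": "away", "prob": 0.12, "odds": 9.00},
]

for bet in find_value_bets(candidates):
    print(bet["name"], f"{edge(bet['prob'], bet['odds']):+.2%}")
```

In this made-up example the only bet with value is the long shot, which is consistent with value selections tending toward large odds and a low strike rate.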
A real-life example
A very good example of that is the startup Mercurius¹. They use algorithms to find value bets on soccer. According to their website (as of June 21), they bet at average odds of 3.97 and their strike rate is 34.35%, while they achieve a yield of 2.46%. These figures show that:
It is hard but possible to beat bookmakers
A low accuracy does not mean that you lose money
You need precise and valid historical odds
Value bet detection is related to the uncertainty of the event
It is also interesting to note that the average implied probability is 33.07% (assuming zero margin), so it is effectively a value betting strategy focused on rare events. With a strike rate of 34.35%, a rough estimate of the yield is 34.35%*(1/33.07%)-1 = 3.87%, the same order of magnitude as the reported yield. The next figure shows the distribution of implied probabilities on values detected by their algorithm. As you can see, most of the detections occur at probabilities lower than 50%.
Basically, you need a strike rate (sr) larger than your average implied probability (aip) to have a positive yield.
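The relation between strike rate and average implied probability can be sketched as follows. The estimate assumes that every bet is placed at odds of 1/aip with a flat stake, so it is only a rough approximation of the real yield. The function name is hypothetical.

```python
def estimated_yield(strike_rate, avg_implied_prob):
    """Rough yield per unit staked, assuming all bets are placed at odds 1/aip."""
    return strike_rate / avg_implied_prob - 1.0


sr, aip = 0.3435, 0.3307  # figures quoted in this article
print(f"{estimated_yield(sr, aip):+.2%}")

# The sign of the yield flips exactly when sr crosses aip
print(estimated_yield(0.35, aip) > 0)  # True
print(estimated_yield(0.30, aip) > 0)  # False
```

The exact yield depends on the odds of each winning bet, so an estimate based on averages can differ from the reported figure.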
To beat the bookmakers you need to focus on detecting value on events where the uncertainty is high.
These low-accuracy events have a high level of uncertainty, which can be related to the event itself or to the number of bets taken, making the odds higher than they should be and creating an exploitable pricing mismatch.
How to use probabilities then?
We can use probabilities, but we need to build the value detector. For instance, you can use these probabilities as an input to another model that is specifically trained to generate long-term profit.
Using historical odds, you can simulate past strategies. For instance, we can test whether there is more value in the opening odds, or whether our probabilities are better than those of the bookmakers on large odds. The most important part is to make the link between probabilities that are designed to predict correctly and the search for events where the bookmakers can be beaten in the long run.
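Simulating a past strategy can be as simple as replaying the selected bets with a flat stake. The sketch below assumes each historical bet is stored as a pair of decimal odds and result. The data is invented for illustration and the function name is hypothetical.

```python
def backtest(bets, stake=1.0):
    """Flat-stake simulation over (decimal_odds, won) pairs."""
    profit = 0.0
    wins = 0
    for odds, won in bets:
        if won:
            profit += stake * (odds - 1.0)  # net winnings on a winning bet
            wins += 1
        else:
            profit -= stake                 # the stake is lost
    n = len(bets)
    return {
        "bets": n,
        "strike_rate": wins / n,
        "yield": profit / (stake * n),      # profit per unit staked
    }


# Invented history: (odds taken, did the bet win?)
history = [(4.10, False), (3.80, True), (4.50, False),
           (3.60, False), (4.00, True), (3.90, False)]
result = backtest(history)
print(f"strike rate={result['strike_rate']:.1%} yield={result['yield']:+.1%}")
```

Six bets are far too few to conclude anything. A real simulation needs a large number of bets and precise historical odds, for example to compare opening odds with closing odds.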
In this article, we showed that prediction and beating the bookmakers are two different ways of using probabilities. The goal of prediction is to generate accurate probabilities, while the goal of beating the bookmaker is to generate long-term profit.
While the two are connected, the probabilities need to be processed in a supplementary step with odds data in order to detect when bookmakers' prices are slightly wrong.
¹ We are not affiliated with, associated with, or in any way connected with Mercurius.