Probabilistic Thinking
hbr.org/2020/02/develop-a-probabilistic-approach-to-managing-uncertainty
Mike Walsh
Our new world of sensors, smartphones, and connected devices means more data than
ever — but does it also mean that it’s getting easier to make well-informed decisions?
Quite the contrary, in fact. What’s more important than how much data you have is how it
frames the way you think. Too often, leaders under pressure to appear decisive attempt to
deal with complex issues with simple rules or analogies, selectively using data to justify
poor judgment calls. But what if rather than trying to be right, you could be less wrong
over time?
When faced with uncertainty, how should leaders react? Should they make a big bet,
hedge their position, or just wait and see? Investors and traders might be adept at
managing risk and unforeseen events, but in other industries, leaders can be blindsided
by the unknown. We naturally tend to see situations in one of two ways: either events are
certain and can therefore be managed by planning, processes, and reliable budgets; or
they are uncertain, and we cannot manage them well at all. Fortunately, there is another
approach.
Consider Thomas Bayes, an English statistician and clergyman, who proposed a theorem
in 1763 that would forever change the way we think about making decisions in ambiguous
conditions. Bayes was interested in how our beliefs about the world should evolve as we
accumulate new but unproven evidence. Specifically, he wondered how he could predict
the probability of a future event if he only knew how many times it had occurred, or not,
in the past. To answer that, he constructed a thought experiment.
Imagine a billiard table. You put on a blindfold and your assistant randomly rolls a ball
across the table. They take note of where it stops rolling. Your job is to figure out where
the ball is. All you can really do at this point is make a random guess. Now imagine that
you ask your assistant to drop some more balls on the table and tell you whether they stop
to the left or right of the first ball. If all the balls stop to the right, what can you say about
the position of the first ball? If more balls are thrown, how does this improve your
knowledge of the position of the first ball? In fact, throw after throw, you should be able to
narrow down the area in which the first ball probably lies. Bayes figured out that even
when it comes to uncertain outcomes, we can update our knowledge by incorporating
new, relevant information as it becomes available.
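As a rough illustration of the thought experiment, here is a minimal Python sketch; the number of throws and the random seed are arbitrary choices, not part of Bayes's original argument. Each left/right report rules out part of the table, so the interval in which the first ball can lie keeps shrinking.

```python
# Sketch of Bayes's billiard-table thought experiment (illustrative only).
import random

random.seed(1)
first_ball = random.random()   # unknown position of the first ball on a 0..1 table

# Start with total uncertainty: the first ball could be anywhere between 0 and 1.
low, high = 0.0, 1.0

for throw in range(1, 11):
    new_ball = random.random()
    if new_ball > first_ball:
        # Assistant reports "stopped to the right": the first ball must lie below new_ball.
        high = min(high, new_ball)
    else:
        # "Stopped to the left": the first ball must lie above new_ball.
        low = max(low, new_ball)
    print(f"after throw {throw:2d}: the first ball lies in [{low:.3f}, {high:.3f}]")

print(f"true position: {first_ball:.3f}")
```

Throw after throw, the printed interval narrows around the true position, which is exactly the updating Bayes described.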
You can find evidence of Bayesian thinking throughout modern history, from nineteenth-
century French and Russian artillery officers adjusting their cannons to account for
uncertainties about the enemies’ location, air density, wind direction, and more, to Alan
Turing cracking the German Enigma codes during World War II. Bayes has even
influenced the design of AI and machine learning techniques, notably with naive Bayes
classifiers, which are a family of algorithms used to predict the category a data object
belongs in. They’re used in a wide range of applications from social media sentiment
analysis to spam filtering or movie recommendation systems.
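To make the naive Bayes idea concrete, here is a small, hand-rolled spam-filter sketch. The toy messages and word counts are invented for illustration; production systems train on large corpora and use established libraries rather than this simplified version.

```python
# Minimal naive Bayes classifier sketch: pick the class with the highest posterior,
# treating words as conditionally independent given the class (the "naive" assumption).
from collections import Counter
import math

# Tiny labelled corpus: (message words, label). Entirely made up for illustration.
training = [
    (["win", "cash", "now"], "spam"),
    (["limited", "offer", "cash"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "tomorrow", "at", "noon"], "ham"),
]

word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
vocab = set()
for words, label in training:
    class_counts[label] += 1
    word_counts[label].update(words)
    vocab.update(words)

def classify(words):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for label in class_counts:
        # Log prior: P(class)
        log_prob = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in words:
            # Laplace-smoothed likelihood P(word | class)
            log_prob += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = log_prob
    return max(scores, key=scores.get)

print(classify(["cash", "offer", "now"]))   # expected: spam
print(classify(["lunch", "meeting"]))       # expected: ham
```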
For modern leaders, Bayesian thinking has also become increasingly influential. For
example, at Amazon, one of the 14 Leadership Principles is "Have Backbone; Disagree and
Commit" — which, as explained by Jeff Bezos, is a strategy to encourage leaders to avoid
wasting time trying to secure universal agreement. Better to commit to a controversial
decision, and then gather data and adjust if necessary. At X, Alphabet’s moonshot factory,
they consciously celebrate failed projects as a data point that helps them narrow the range
of options, and in doing so, accelerate innovation. Similarly, at Spotify, they have
developed a framework for exploring the relationship between data and uncertainty that
they call DIBB (Data, Insights, Beliefs and Bets). They use it to explicitly identify success
metrics for new ideas and opportunities, and create a common language around judging
performance.
Data can be imperfect, incomplete, or uncertain. There is often more than one
explanation for why things happened the way they did; and by examining those
alternative explanations using probability, you can gain a better understanding of
causality and what is really going on.
However, thinking probabilistically takes some getting used to, as the human mind is
naturally deterministic. We generally believe that something is true or false. Either you
like someone or you don’t. There is rarely, for example, a situation when you can say that
there is a 46% probability that someone is your friend (unless you are a teenager with lots
of frenemies). Our instinct for determinism may well have been an evolutionary
innovation. To survive, we had to make snap judgments about the world and our response
to it. When a tiger is approaching you, there is really not a lot of time to consider whether
he’s approaching as a friend or a foe.
However, the deterministic approach that kept our ancestors alive while hunting in the
savannah won’t help you make good decisions in complex, unpredictable environments
when your natural mental shortcuts and heuristics start to fail you. One of the best ways
to embrace uncertainty and be more probabilistic in your approach is to learn to think like
a professional gambler. Take, for example, Rasmus Ankersen.
Ankersen, a Dane living in London, originally came to the UK to look for an English
publisher for his book on human performance, the writing of which had taken him from
Kenya to Korea in search of why great athletes, whether they are runners or golfers, tend
to come from the same small regions. One of the reasons he decided to stay in London
was a chance meeting with a professional gambler named Matthew Benham who founded
two gaming companies, Matchbook, a sports betting exchange community, and
Smartodds, which provides statistical research and sports modeling services.
When Ankersen and Benham met, they started talking about how soccer was a sport that
was yet to be disrupted by data and probabilistic thinking. Benham was impressed enough
to invite Ankersen to help run Brentford Football Club, which he had recently acquired.
Soon after, Benham also bought FC Midtjylland, the soccer club in Ankersen’s hometown.
Ankersen’s insight was this: Soccer is one of the world’s most unfair sports. Although
there is a saying that “the league table never lies,” in Ankersen’s opinion that is exactly
what it does. Because soccer is a low-scoring sport, the win/loss outcome of a game is not
an accurate representation of the actual performance of a team, and therefore the intrinsic
value of its players. From a professional gambler’s perspective, the key to placing a good
bet is to continually update your position with relevant insights that impact the
probability of an event occurring. Rather than trying to be right, gamblers strive to be less
wrong with time.
Benham and Ankersen started to use the scientific application of statistics — the
“moneyball” technique pioneered in baseball — when assessing the performance of a
team. Their key performance metric became “expected goals” for and against a team,
based on the quality and quantity of chances created during a match. The point of this
exercise was to develop an alternative league table, which might serve as a more reliable
predictor of results and a better basis on which to value and acquire players.
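A minimal sketch of the expected-goals idea: treat each chance a team creates as having some probability of being scored, and sum those probabilities over a match. The per-shot probabilities below are hypothetical; clubs fit them from large historical shot datasets rather than assigning them by hand.

```python
# Expected goals (xG) sketch: the sum of estimated scoring probabilities per chance.
def expected_goals(shot_probabilities):
    """Expected goals = sum of the probability that each individual chance is scored."""
    return sum(shot_probabilities)

# Team A creates two big chances, Team B many half-chances (numbers are illustrative).
team_a = [0.45, 0.30]                      # e.g. a penalty-box shot and a one-on-one
team_b = [0.05, 0.08, 0.06, 0.04, 0.07]    # speculative long-range efforts

print(f"Team A xG: {expected_goals(team_a):.2f}")   # 0.75
print(f"Team B xG: {expected_goals(team_b):.2f}")   # 0.30
```

Over many matches, a team's xG tends to be a steadier signal of underlying quality than the low-scoring, luck-prone results themselves.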
Benham and Ankersen’s approach has lessons for all kinds of leaders as they seek to
incorporate more data into their decision-making. A probabilistic HR manager, for
example, might examine the data about where a company’s best people come from and
how they perform throughout their career to identify new sources of talent that may have
been overlooked. A probabilistic sales professional might be conscious that it’s not enough
to simply close lots of deals; it’s critical to also think about where leads come from. Rather
than relying on inflexible credit policies, a probabilistic risk manager might start to look
deeper into their data to see if there are low-risk segments in their customer base that
they may have missed.
Developing a probabilistic mindset allows you to be better prepared for the uncertainties
and complexities of the Algorithmic Age. Even when events are determined by an
infinitely complex set of factors, probabilistic thinking can help us identify the most likely
outcomes and the best decisions to make.
Mike Walsh is the author of The Algorithmic Leader: How to Be Smart When
Machines Are Smarter Than You. Walsh is the CEO of Tomorrow, a global
consultancy on designing companies for the 21st century.
The Value of Probabilistic Thinking: Spies, Crime, and
Lightning Strikes
fs.blog/2018/05/probabilistic-thinking
Probabilistic thinking is essentially trying to estimate, using some tools of math and logic,
the likelihood of any specific outcome coming to pass. It is one of the best tools we have to
improve the accuracy of our decisions. In a world where each moment is determined by
an infinitely complex set of factors, probabilistic thinking helps us identify the most likely
outcomes. When we know these, our decisions can be more precise and effective.
Our lack of perfect information about the world gives rise to all of probability theory, and
its usefulness. We know now that the future is inherently unpredictable because not all
variables can be known and even the smallest error imaginable in our data very quickly
throws off our predictions. The best we can do is estimate the future by generating
realistic, useful probabilities. So how do we do that?
Probability is everywhere, down to the very bones of the world. The probabilistic
machinery in our minds—the cut-to-the-quick heuristics made so famous by the
psychologists Daniel Kahneman and Amos Tversky—was evolved by the human species in
a time before computers, factories, traffic, middle managers, and the stock market. It
served us in a time when human life was about survival, and still serves us well in that
capacity.
But what about today—a time when, for most of us, survival is not so much the issue? We
want to thrive. We want to compete, and win. Mostly, we want to make good decisions in
complex social systems that were not part of the world in which our brains evolved their
(quite rational) heuristics.
For this, we need to consciously add a layer of probability awareness. What is it
and how can I use it to my advantage?
There are three important aspects of probability that we need to explain so you can
integrate them into your thinking to get into the ballpark and improve your chances of
catching the ball:
1. Bayesian thinking
2. Fat-tailed curves
3. Asymmetries
Thomas Bayes and Bayesian thinking: Bayes was an English minister in the first
half of the 18th century, whose most famous work, “An Essay Toward Solving a Problem
in the Doctrine of Chances” was brought to the attention of the Royal Society by his friend
Richard Price in 1763—two years after his death. The essay, the key to what we now know
as Bayes’s Theorem, concerned how we should adjust probabilities when we encounter
new data.
The core of Bayesian thinking (or Bayesian updating, as it can be called) is this: given that
we have limited but useful information about the world, and are constantly encountering
new information, we should probably take into account what we already know when we
learn something new. As much of it as possible. Bayesian thinking allows us to use all
relevant prior information in making decisions. Statisticians might call it a base rate,
taking in outside information about past situations like the one you’re in.
Consider the headline “Violent Stabbings on the Rise.” Without Bayesian thinking, you
might become genuinely afraid because your chances of being a victim of assault or
murder are higher than they were a few months ago. But a Bayesian approach will have you
putting this information into the context of what you already know about violent crime.
You know that violent crime has been declining to its lowest rates in decades. Your city is
safer now than it has been since this measurement started. Let’s say your chance of being
a victim of a stabbing last year was one in 10,000, or 0.01%. The article states, with
accuracy, that violent crime has doubled. It is now two in 10,000, or 0.02%. Is that worth
being terribly worried about? The prior information here is key. When we factor it in, we
realize that our safety has not really been compromised.
Conversely, if we look at the diabetes statistics in the United States, our application of
prior knowledge would lead us to a different conclusion. Here, a Bayesian analysis
indicates you should be concerned. In 1958, 0.93% of the population was diagnosed with
diabetes. In 2015 it was 7.4%. When you look at the intervening years, the climb in
diabetes diagnosis is steady, not a spike. So the prior relevant data, or priors, indicate a
trend that is worrisome.
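To make both comparisons concrete, here is a tiny sketch of the arithmetic. The percentages are the ones cited above; the per-100,000 framing and the code itself are only illustrative.

```python
# The headline-grabbing change is small in absolute terms; the quiet trend is large.
stabbing_last_year = 1 / 10_000    # 0.01% chance of being a stabbing victim
stabbing_this_year = 2 / 10_000    # "violent crime has doubled": 0.02%
print(f"stabbing risk: +{(stabbing_this_year - stabbing_last_year):.2%} in absolute terms")

diabetes_1958 = 0.0093             # 0.93% of the population diagnosed
diabetes_2015 = 0.074              # 7.4%
print(f"diabetes prevalence: +{(diabetes_2015 - diabetes_1958):.2%} in absolute terms")
# The alarming headline adds 0.01 percentage points of risk; the quiet trend adds roughly 6.5.
```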
It is important to remember that priors themselves are probability estimates. For each bit
of prior knowledge, you are not putting it in a binary structure, saying it is true or not.
You’re assigning it a probability of being true. Therefore, you can’t let your priors get in
the way of processing new knowledge. In Bayesian terms, this is called the likelihood ratio
or the Bayes factor. Any new information you encounter that challenges a prior simply
means that the probability of that prior being true may be reduced. Eventually, some
priors are replaced completely. This is an ongoing cycle of challenging and validating what
you believe you know. When making uncertain decisions, it’s nearly always a mistake not
to ask: What are the relevant priors? What might I already know that I can use to better
understand the reality of the situation?
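One common way to formalize this updating is to multiply prior odds by the likelihood ratio (the Bayes factor) to get posterior odds. The sketch below shows the mechanics; the specific numbers are invented for illustration.

```python
# Bayesian updating via the likelihood ratio: posterior odds = prior odds x Bayes factor.
def update_odds(prior_prob, likelihood_ratio):
    """Apply a likelihood ratio to a prior probability and return the posterior probability."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Prior belief that a claim is true: 30%. New evidence is 4x more likely
# if the claim is true than if it is false (likelihood ratio = 4).
posterior = update_odds(0.30, 4.0)
print(f"posterior probability: {posterior:.2f}")   # ~0.63: the prior is revised, not discarded
```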
Now we need to look at fat-tailed curves: Many of us are familiar with the bell
curve, that nice, symmetrical wave that captures the relative frequency of so many things
from height to exam scores. The bell curve is great because it’s easy to understand and
easy to use. Its technical name is “normal distribution.” If we know we are in a bell curve
situation, we can quickly identify our parameters and plan for the most likely outcomes.
At first glance, a fat-tailed curve seems similar enough to a bell curve. Common outcomes cluster together, creating a
wave. The difference is in the tails. In a bell curve the extremes are predictable. There can
only be so much deviation from the mean. In a fat-tailed curve there is no real cap on
extreme events.
The more extreme events that are possible, the longer the tails of the curve get. Any one
extreme event is still unlikely, but the sheer number of options means that we can’t rely
on the most common outcomes as representing the average. The more extreme events
that are possible, the higher the probability that one of them will occur. Crazy things are
definitely going to happen, and we have no way of identifying when.
Think of it this way. In a bell curve type of situation, like displaying the distribution of
height or weight in a human population, there are outliers on the spectrum of possibility,
but the outliers have a fairly well defined scope. You’ll never meet a man who is ten times
the size of an average man. But in a curve with fat tails, like wealth, the central tendency
does not work the same way. You may regularly meet people who are ten, 100, or 10,000
times wealthier than the average person. That is a very different type of world.
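A small simulation makes the contrast visible: draw many samples from a thin-tailed (normal, height-like) distribution and from a fat-tailed (Pareto, wealth-like) one, then compare the extremes. The parameters are arbitrary; only the tail behaviour matters.

```python
# Thin tails vs. fat tails: compare the most extreme draws from each distribution.
import random

random.seed(0)
n = 100_000

# "Height-like": normal around 175 cm with a small spread.
heights = [random.gauss(175, 7) for _ in range(n)]

# "Wealth-like": Pareto with a heavy tail (alpha = 1.5).
wealth = [random.paretovariate(1.5) for _ in range(n)]

print(f"height -> mean {sum(heights)/n:8.1f}, max {max(heights):10.1f}")
print(f"wealth -> mean {sum(wealth)/n:8.1f}, max {max(wealth):10.1f}")
# The tallest sample is only slightly above the mean; the largest "wealth" sample
# can be hundreds or thousands of times the mean.
```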
Let’s re-approach the example of the risks of violence we discussed in relation to Bayesian
thinking. Suppose you hear that you had a greater risk of slipping on the stairs and
cracking your head open than being killed by a terrorist. The statistics, the priors, seem to
back it up: 1,000 people slipped on the stairs and died last year in your country and only
500 died of terrorism. Should you be more worried about stairs or terror events?
Some use examples like these to prove that terror risk is low—since the recent past shows
very few deaths, why worry?[1] The problem is in the fat tails: The risk of terror violence is
more like wealth, while stair-slipping deaths are more like height and weight. In the next
ten years, how many events are possible? How fat is the tail?
The important thing is not to sit down and imagine every possible scenario in the tail (by
definition, it is impossible) but to deal with fat-tailed domains in the correct way: by
positioning ourselves to survive or even benefit from the wildly unpredictable future, by
being the only ones thinking correctly and planning for a world we don’t fully understand.
Finally, we need to look at asymmetries, a massively misunderstood concept. If you look at nicely
polished stock pitches made by professional investors, nearly every time an idea is
presented, the investor looks their audience in the eye and states they think they’re going
to achieve a rate of return of 20% to 40% per annum, if not higher. Yet exceedingly few of
them ever attain that mark, and it’s not because they don’t have any winners. It’s because
they get so many so wrong. They are consistently overconfident in their
probabilistic estimates. (For reference, the general stock market has returned no more
than 7% to 8% per annum in the United States over a long period, before fees.)
Another common asymmetry is people’s ability to estimate the effect of traffic on travel
time. How often do you leave “on time” and arrive 20% early? Almost never? How often
do you leave “on time” and arrive 20% late? All the time? Exactly. Your estimation errors
are asymmetric, skewing in a single direction. This is often the case with probabilistic
decision-making.[2]
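A quick Monte Carlo sketch of that travel-time asymmetry, with all numbers invented: savings are capped at a few minutes, while delays can pile up almost without limit, so the errors skew late.

```python
# Asymmetric estimation errors: small possible gains, open-ended possible delays.
import random

random.seed(42)
planned = 30.0                                     # minutes you budget for the trip
trips = [planned - random.uniform(0, 3)            # at best you save a couple of minutes
         + random.expovariate(1 / 6.0)             # occasional long delays (mean 6 min)
         for _ in range(10_000)]

early = sum(t < planned * 0.8 for t in trips) / len(trips)
late = sum(t > planned * 1.2 for t in trips) / len(trips)
print(f"arrive >20% early: {early:.1%}, arrive >20% late: {late:.1%}")
```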
Far more probability estimates are wrong on the “over-optimistic” side than the “under-
optimistic” side. You’ll rarely read about an investor who aimed for 25% annual return
rates who subsequently earned 40% over a long period of time. You can throw a dart at
the Wall Street Journal and hit the names of lots of investors who aim for 25% per annum
with each investment and end up closer to 10%.
When Vera Atkins was second in command of the French unit of the Special Operations
Executive (SOE), a British intelligence organization reporting directly to Winston
Churchill during World War II[3], she had to make hundreds of decisions by figuring out
the probable accuracy of inherently unreliable information.
Atkins was responsible for the recruitment and deployment of British agents into
occupied France. She had to decide who could do the job, and where the best sources of
intelligence were. These were literal life-and-death decisions, and all were based in
probabilistic thinking.
First, how do you choose a spy? Not everyone can go undercover in high-stress situations
and make the contacts necessary to gather intelligence. The result of failure in France in
WWII was not getting fired; it was death. What factors of personality and experience
show that a person is right for the job? Even today, with advancements in psychology,
interrogation, and polygraphs, it’s still a judgment call.
For Vera Atkins in the 1940s, it was very much a process of assigning weight to the
various factors and coming up with a probabilistic assessment of who had a decent chance
of success. Who spoke French? Who had the confidence? Who was too tied to family?
Who had the problem-solving capabilities? From recruitment to deployment, her
development of each spy was a series of continually updated, educated estimates.
Getting an intelligence officer ready to go is only half the battle. Where do you send them?
If your information was so great that you knew exactly where to go, you probably wouldn’t
need an intelligence mission. Choosing a target is another exercise in probabilistic
thinking. You need to evaluate the reliability of the information you have and the
networks you have set up. Intelligence is not evidence. There is no chain of command or
guarantee of authenticity.
The stuff coming out of German-occupied France was at the level of grainy photographs,
handwritten notes that passed through many hands on the way back to HQ, and
unverifiable wireless messages sent quickly, sometimes sporadically, and with the
operator under incredible stress. When deciding what to use, Atkins had to consider the
relevancy, quality, and timeliness of the information she had.
She also had to make decisions based not only on what had happened, but what possibly
could. Trying to prepare for every eventuality means that spies would never leave home,
but they must somehow prepare for a good deal of the unexpected. After all, their jobs are
often executed in highly volatile, dynamic environments. The women and men Atkins sent
over to France worked in three primary occupations: organizers were responsible for
recruiting locals, developing the network, and identifying sabotage targets; couriers
moved information all around the country, connecting people and networks to coordinate
activities; and wireless operators had to set up heavy communications equipment,
disguise it, get information out of the country, and be ready to move at a moment’s notice.
All of these jobs were dangerous. The full scope of the threats was never completely
identifiable. There were so many things that could go wrong, so many possibilities for
discovery or betrayal, that it was impossible to plan for them all. The average life
expectancy in France for one of Atkins’ wireless operators was six weeks.
Finally, the numbers suggest an asymmetry in the estimation of the probability of success
of each individual agent. Of the 400 agents that Atkins sent over to France, 100 were
captured and killed. This is not meant to pass judgment on her skills or smarts.
Probabilistic thinking can only get you in the ballpark. It doesn’t guarantee 100% success.
There is no doubt that Atkins relied heavily on probabilistic thinking to guide her
decisions in the challenging quest to disrupt German operations in France during World
War II. It is hard to evaluate the success of an espionage career, because it is a job that
comes with a lot of loss. Atkins was extremely successful in that her network conducted
valuable sabotage to support the allied cause during the war, but the loss of life was
significant.
Conclusion
Successfully thinking in shades of probability means roughly identifying what matters,
coming up with a sense of the odds, doing a check on our assumptions, and then making a
decision. We can act with a higher level of certainty in complex, unpredictable situations.
We can never know the future with exact precision. Probabilistic thinking is an extremely
useful tool to evaluate how the world will most likely look so that we can effectively
strategize.
References:
[1] Taleb, Nassim Nicholas. Antifragile. New York: Random House, 2012.
[2] Bernstein, Peter L. Against the Gods: The Remarkable Story of Risk. New York: John
Wiley and Sons, 1996. (This book includes an excellent discussion in Chapter 13 on the
idea of the scope of events in the past as relevant to figuring out the probability of events
in the future, drawing on the work of Frank Knight and John Maynard Keynes.)
[3] Helm, Sarah. A Life in Secrets: The Story of Vera Atkins and the Lost Agents of SOE.
London: Abacus, 2005.