Bullshit or science?

What the theorem really means is that we should never be too sure about anything. If you want an example of this, you only need to read two books about whether God and Jesus Christ existed or not.

I first heard of Bayes’ theorem when I followed a crash course in statistics at the University of Sussex in 1971. I learnt the theorem and its uses in quantitative finance, but never got round to finding its origins. Over the last decade, however, I started paying attention to it, after I kept coming across articles touting it as an almost magical guide for navigating through life.

Then, the statistician Vincent Marmarà started his election polls using the multiple imputation technique, and he got my attention. He has since become the most successful pollster in Malta ever. Multiple imputation, of course, is motivated by the Bayesian framework and, as such, the general methodology used by Marmarà is to impute using the posterior predictive distribution of the missing data in voter surveys, given the observed data and some estimate of the parameters.

In the Bayesian framework, first, any quantity that is not known as an absolute fact (for example, the number of people who decide whom to vote for based on what is happening to the Gross Domestic Product, or GDP) is treated probabilistically, meaning that it is assigned a numerical probability or a probability distribution. For instance, the probability that people vote for a party they hold responsible for a rising GDP is higher than the probability that they vote for a party they do not hold responsible for it.

Second, research questions and designs are based on prior knowledge and expressed as prior distributions (the more surveys are held, the more knowledge there is about voter likes and dislikes, and how such knowledge is distributed among the population).

Finally, these prior distributions are updated by conditioning on new data as they are observed, creating a posterior distribution that is a compromise between prior knowledge and the data. This is a rigorous mathematical process.

In a Bayesian opinion poll, pollsters use prior knowledge of a population to construct a prior distribution of how the population will vote. Then, after they interview a random sample of the target population, they update the distribution (to get a so-called posterior distribution) and use this to make the forecasts.
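
The prior-to-posterior step in a poll can be sketched in a few lines of Python. This is only an illustration under my own assumptions: a single party, a Beta prior combined with yes/no answers (the textbook conjugate update), and invented numbers. It is not a description of how Marmarà or any other pollster actually works, and the function names are hypothetical.

```python
from fractions import Fraction


def update_beta(prior_a, prior_b, yes, no):
    """Conjugate Beta-Binomial update: add the observed counts to the prior pseudo-counts."""
    return prior_a + yes, prior_b + no


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, i.e. the expected vote share."""
    return Fraction(a, a + b)


# Hypothetical prior: earlier surveys suggest about 52% support,
# and we trust them as much as roughly 100 interviews.
prior_a, prior_b = 52, 48

# Hypothetical new random sample: 500 respondents, 240 of whom back the party.
post_a, post_b = update_beta(prior_a, prior_b, yes=240, no=260)

prior_share = beta_mean(prior_a, prior_b)
posterior_share = beta_mean(post_a, post_b)
sample_share = Fraction(240, 500)

print(float(prior_share), float(sample_share), float(posterior_share))
```

The posterior share lands between the prior (52%) and the raw sample (48%), and closer to the sample because the sample carries more interviews than the prior is worth. That is the "compromise" between prior knowledge and data.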

📺 If you want a simple explanation of Bayes’ theorem, you might wish to watch a short video about it here.

It’s everywhere

First articulated in the 18th century by a hobbyist-mathematician seeking to reason backward from effects to cause, Bayes’ theorem spent the better part of two centuries struggling for recognition and respect. Yet today, argues Tom Chivers in a new book titled ‘Everything Is Predictable’, it can be seen as “perhaps the most important single equation in history”.

The core of the theorem is a quantitative method for getting wiser step-wise by constantly updating what you think you know. It is a method for calculating the validity of beliefs (hypotheses, claims, propositions) based on the best available evidence (observations, data, information).

Here’s the simplest description: Initial belief plus new evidence = new and improved belief. The process is repeated continuously.
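
In symbols, the usual textbook statement of the theorem is the following, where H stands for a hypothesis and E for a piece of evidence (the letters are my own labels):

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H) is the initial belief (the prior), P(E | H) measures how well the hypothesis explains the evidence, P(E) is the overall probability of seeing that evidence under all explanations, and P(H | E) is the new and improved belief (the posterior), which becomes the prior the next time evidence arrives.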

The concept was developed by Thomas Bayes, a Presbyterian minister, but published by his friend Richard Price, who rescued it from oblivion in 1763. The work “sunk almost without trace”, Mr Chivers says, until it was independently discovered and refined in 1774 by the French polymath Pierre-Simon Laplace.

Image: The Royal Society

Today, Bayes’ theorem drives the logic of spam filters and artificial intelligence. Once you look for it, Mr Chivers says, you start to see Bayes’ theorem everywhere. Bayesian statistics “are rippling through everything from physics to cancer research, ecology to psychology,” The New York Times reports.

Physicists have proposed Bayesian interpretations of quantum mechanics. Philosophers assert that science as a whole can be viewed as a Bayesian process, and that Bayes can distinguish science from pseudoscience more precisely than falsification, the method popularised by the philosopher Karl Popper.

Artificial-intelligence researchers, including the designers of self-driving cars, employ Bayesian software to help machines recognise patterns and make decisions. Bayesian programmes, according to Sharon Bertsch McGrayne, author of a popular history of Bayes’ theorem, “sort spam from e-mail, assess medical and homeland security risks and decode DNA, among other things.” On the website edge.org, physicist John Mather frets that Bayesian machines might be so intelligent that they make humans “obsolete”.

Many cognitive scientists speculate that the human brain incorporates Bayesian algorithms as it perceives, deliberates and decides. Some insist that if more of us adopted conscious Bayesian reasoning, the world would be a better place. In a world where personal beliefs have become equivalent to facts, irrespective of (or even contrary to) the evidence, Bayesian reasoning would surely help.

Belief as valid as its evidence

Under Bayesian reasoning, the plausibility of your belief depends on the degree to which your belief (and only your belief, not that of others) explains the evidence for it. The more alternative explanations there are for the evidence, the less plausible your belief is. That, to me, is the essence of Bayes’ theorem.

“Alternative explanations” can encompass many things. Your evidence might be erroneous, skewed by a malfunctioning instrument, faulty analysis, confirmation bias, even fraud. Your evidence might be sound but explicable by many beliefs, or hypotheses, other than yours.
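
A small numerical sketch shows how alternative explanations eat into a belief. The numbers are invented for illustration, and the function is a hypothetical helper that lumps every rival explanation into a single "not H" bucket.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for one hypothesis H against everything else (not-H)."""
    numerator = p_e_given_h * prior_h
    # Total probability of the evidence: under H plus under all rival explanations.
    evidence = numerator + p_e_given_not_h * (1 - prior_h)
    return numerator / evidence


# Evidence that hardly anything except H would produce.
specific = posterior(prior_h=0.5, p_e_given_h=0.9, p_e_given_not_h=0.1)

# The same evidence, but rival explanations account for it almost as well.
ambiguous = posterior(prior_h=0.5, p_e_given_h=0.9, p_e_given_not_h=0.8)

print(round(specific, 2), round(ambiguous, 2))  # 0.9 0.53
```

With the same starting belief and the same fit between belief and evidence, the posterior drops from 90% to about 53% simply because other explanations cover the evidence nearly as well.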

Bayes’ theorem has become so pervasive that it is now even used in criminal trials. Throughout the course of a criminal trial, there are many different pieces of evidence and alibis that jurors have to consider when determining the verdict. As would be expected, the defence and the prosecution usually provide conflicting pieces of evidence, and from this evidence the jury needs to determine the likelihood of either side’s story being true.

The most notable examples of the theorem being used in the courtroom were the jury trial of O.J. Simpson for the murders of his ex-wife Nicole Brown Simpson and her friend Ronald Goldman, and the 2007 trial of Levi Bellfield for the murders of Marsha McDonnell and Amelie Delagrange in the UK.

One of the most notable examples of Bayes’ theorem being used in the courtroom was the jury trial of O.J. Simpson. Photo: VINCE BUCCI/AFP/Getty Images

There’s nothing magical about Bayes’ theorem. It boils down to the truism that your belief is only as valid as its evidence. If you have good evidence, Bayes’ theorem can yield good results. If your evidence is flimsy, Bayes’ theorem won’t be of much use. Garbage in, garbage out.

I think that a good dose of Bayesian reasoning might also do wonders in the field of politics. At the moment, thousands of people in Malta have concluded, on scant evidence, that certain people are guilty or innocent of all sorts of crimes. They might well keep in mind a sentence I came across in Edgar Allan Poe’s ‘The Narrative of Arthur Gordon Pym of Nantucket’: “In no affairs of mere prejudice, pro or con, do we deduce inferences with entire certainty, even from the most simple data.”

Never too sure about anything

Ingrained in Bayes’ theorem is a moral message: many a time, the evidence will just confirm what you already believe. This helps explain why so many claims turn out to be erroneous. Bayesians claim that their methods can help common people and scientists overcome confirmation bias and produce more reliable results. But a word of caution is also required. As science writer Faye Flam put it recently in The New York Times, Bayesian statistics “can’t save us from bad science”.

So, while I think that Bayes’ theorem can be a useful tool in certain circumstances, relying on it entirely is as likely to send you off at a tangent as not using it at all. In other words, the theorem is cool but what it really means is that we should never be too sure about anything, including Bayes’ theorem. The claim that Bayes’ theorem can protect us from bullshit is, well, bullshit.

If you want an example of this, you only need to read two books about whether God and Jesus Christ existed or not. The first, by Stephen Unwin, is called ‘The Probability of God: A Simple Calculation That Proves the Ultimate Truth’, in which he uses Bayes’s theorem to demonstrate, with probability one minus epsilon, that the Christian God exists. That was countered by ‘Proving History: Bayes’s Theorem and the Quest for the Historical Jesus’ by Richard Carrier, who uses Bayes’s theorem to prove, with probability one minus epsilon, that the Christian God does not exist because Jesus himself never did.

The real question is this: how can probability prove a thing and its opposite simultaneously? The answer is simple: the same way logic can prove a thing and its opposite. All arguments of certainty and uncertainty are conditional. For example, is the proposition “Jesus was divine” true? Well, that depends on the evidence. If you say, “Given Jesus lived, died, and was resurrected as related in the Gospels” then the proposition has probability one (i.e., it is true). But if you say, “Given Jesus was a myth, created as a conspiracy to flummox the Romans and garner tithes”, then the proposition has probability zero (i.e., it is false).

Given still other evidence, the probability the proposition is true may lie between these two extremes. In no case, however, is probability or logic broken. It does explain why focusing on probability is wrong, though. Both Unwin and Carrier would have helped themselves better, and contributed to a more fruitful discussion about Jesus, had they made their evidence explicit and eschewed unnecessary quantification.

Unwin gives a Bayes factor of 2, concluding that, from his perspective, the probability of God’s existence is 67%. However, Unwin admits that this test is extremely sensitive to the choice of prior beliefs. Under his assessment of the evidence, his prior belief in God’s existence (50%) yields a probability of God’s existence of 67%; prior beliefs of 10% or 75%, with the same evidence, swing the result to 18% or 86% respectively.
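
These figures can be checked with the odds form of Bayes’ theorem (posterior odds = prior odds × Bayes factor). The short sketch below is my own arithmetic check, not Unwin’s calculation, and the function name is hypothetical.

```python
def posterior_from_bayes_factor(prior, bayes_factor):
    """Convert a prior probability to a posterior via the odds form of Bayes' theorem."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Same evidence (Bayes factor of 2), three different starting beliefs.
for prior in (0.10, 0.50, 0.75):
    result = posterior_from_bayes_factor(prior, bayes_factor=2)
    print(f"prior {prior:.0%} -> posterior {result:.0%}")

# prior 10% -> posterior 18%
# prior 50% -> posterior 67%
# prior 75% -> posterior 86%
```

The evidence never changes here; only the starting belief does, and that alone moves the answer from 18% to 86%.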

An early review of Unwin’s work, which is mercifully brief, asks just the right question: “Can you imagine anyone arguing that the existence of evil in the world, given that God exists, is 23% as opposed to 24%, for instance?” Indeed. The crucial factor is that probability questions have an order. That is, the probability that evil exists given that God does is different from the probability that God exists given that evil does.
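
The point about order can be made with any two events. The counts below are invented, and A and B are generic placeholders rather than anything taken from Unwin’s book.

```python
# Hypothetical counts for two generic events A and B among 1,000 cases.
both = 30       # A and B together
a_only = 10     # A without B
b_only = 270    # B without A
neither = 690   # neither A nor B

# Probability of A among the cases where B holds.
p_a_given_b = both / (both + b_only)   # 30 / 300

# Probability of B among the cases where A holds.
p_b_given_a = both / (both + a_only)   # 30 / 40

print(p_a_given_b, p_b_given_a)
```

The same 30 shared cases give 10% in one direction and 75% in the other, so swapping the order of the conditioning changes the question being asked.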

So here we have probability proving two diametrically opposite conclusions. In a very much neglected ‘Treatise on Probability’, the economist John Maynard Keynes put his finger on the difficulty people have with probability, particularly Bayes’s Theorem: “No other formula in the alchemy of logic has exerted more astonishing powers. For it has established the existence of God from the premiss of total ignorance; and it has measured with numerical precision the probability the sun will rise tomorrow.”

Art or science?

Where does this leave us? There is no doubt that Bayes’s theorem has found a home in political science. This is because it formalises a basic cognitive process: updating expectations as new information is obtained. Also, the versatility of the formula makes it an exceptionally useful tool for scientists studying social phenomena using a variety of methods.

However, like any other tool, Bayes’s theorem and its various permutations can be misused and, if the social scientist is not careful, one day he can get a nasty surprise. This is because constructing prior distributions can sometimes be an art rather than a science, and errors could have a cascading effect.
