Handing off decision-making to predictive AI would be catastrophic
You may have caught the following article trending on Medium recently:
And gives us the tools for our next evolutionary step
The fact that this article is trending is a bad thing. It is a sign that we are inching closer to another AI winter.
The author David Weinberger’s central thesis is:
- Humans endeavor to gain an understanding of complex systems.
- However, the predictions of AI, which doesn't truly understand anything, are more accurate than predictions based on "human understanding."
- Therefore, we should abandon our pursuit of understanding, and focus on building the AI that can make the decisions for us.
- Handing over the reins to predictive AI will usher in the next stage of human evolution.
The fourth point seems like more of a conceit for a transhumanist sci-fi novel than an argument and Weinberger doesn’t provide support for it, so I focus here on deconstructing the first three points.
Prediction without understanding cannot advance science
To effectively contain a civilization’s development and disarm it across such a long span of time, there is only one way: kill its science. — Cixin Liu, The Three Body Problem
I take it for granted that we want science to advance for many reasons, such as improving medicine or our ability to make useful things. Good engineering practice takes you from the bow and arrow to the crossbow. Science takes you from the bow and arrow to the cruise missile.
From Weinberger’s article:
Deep learning’s algorithms work because they capture, better than any human can, the complexity, fluidity, and even the beauty of a universe in which everything affects everything else, all at once. We’re beginning to accept that the true complexity of the world far outstrips the laws and models we devise to explain it. We need to give up our insistence on always understanding our world and how things happen in it.
But insisting on understanding our world is how science advances.
“Understanding” in this context means a model — defined as a set of assumptions about how things within a specific problem domain work, expressed in logical form. These assumptions include the core components of the system, how they affect one another, and what the effect of changing one variable is on another variable.
You’ve probably heard the statistician George Box’s expression, “all models are wrong, but some are useful.” We know, for example, that our current models in systems biology are incomplete and in many ways wrong, but they have provided useful life-saving drugs and gene therapies. The advancement of science is a process of new models displacing older ones because the new ones can explain empirical data in ways the old ones cannot (see Thomas Kuhn’s The Structure of Scientific Revolutions for a deep dive into this argument).
Weinberger is essentially arguing in favor of the model-blind side of the model-blind/model-based dichotomy. While model-based methods predict empirical data under the constraints of a model, model-blind methods shrug off those constraints and focus on building a predictive algorithm with optimal accuracy. These methods commonly exceed model-based methods in predictive accuracy.
But more accurate prediction is not how science advances. Copernicus’s heliocentric model of the solar system did not predict the movements of objects in the sky as well as the geocentric Ptolemaic model that preceded it. But it was a “less wrong” model and paved the way for Kepler’s even better model, and from there, Google Maps.
Cutting-edge predictive machine learning tools that, as Weinberger says, predict well but lack understanding, such as deep neural nets, work by being extraordinarily good at finding complex correlations in high-dimensional space. Everyone learns in stats 101 that correlation alone (no matter how nuanced) does not imply causation, where causation is an assumption about how components in a system affect one another. You need understanding for that.
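This gap between correlation and causation is easy to demonstrate. Here is a minimal sketch in plain Python, with entirely hypothetical toy data: a hidden common cause drives two variables, which makes them highly correlated, yet intervening on one leaves the other untouched.

```python
import random

random.seed(0)

# Hypothetical toy data: a hidden common cause Z drives both X and Y, so X
# and Y are strongly correlated even though neither causes the other.
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

print(corr(x, y))  # strong correlation, roughly 0.9

# Simulate an intervention: set X ourselves, independent of Z.
# Y doesn't budge, because X never caused Y in the first place.
x_do = [random.gauss(0, 1) for _ in range(n)]
print(corr(x_do, y))  # near zero
```

A purely predictive model trained on (X, Y) pairs would happily predict Y from X; only a causal assumption about Z tells you the prediction breaks down the moment you act on X.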
Predictive accuracy is not the only performance measure we care about
From Weinberger’s article:
Deep Patient was not only able to diagnose the likelihood of individual patients developing particular diseases, it was in some instances more accurate than human physicians, including about some diseases that until now have utterly defied predictability.
But prediction isn’t everything. I recently tweeted:
90% sounds pretty good! Perhaps I could go a step further and train a deep net on the sounds emitted by the barrel as it spins, right before I slam it into the receiver, and maybe get 95% accuracy.
My point is obvious: predictive accuracy is not the only thing that matters in decision making. Very accurate prediction engines will still be wrong sometimes, and the consequences of being wrong can be catastrophic. Statistician and ML expert Michael Jordan gave a personal example of an incorrect prediction, based on an ultrasound, that his unborn child would have an intellectual disability. That prediction led the doctor to recommend a risky medical procedure, and predictions like it lead many would-be parents to decide to terminate a pregnancy.
Medical diagnosis is a domain where you care more about the risk of a false positive than about raw predictive accuracy. In other fields, you might care more about the risk of a false negative, such as if your job were to stop terrorists from getting on planes, stop hackers from accessing valuable secrets, or catch the one rogue trade that will blow up your investment bank.
Decisions based on an algorithm’s highly accurate predictions in a complex system can lead to catastrophe in the rare instances it gets the prediction wrong.
In practice, we can often address such cases by adjusting the decision threshold, trading sensitivity against specificity. But this does not solve the problem of black swans: events whose consequences are severe but which are too rare to have appeared in the data used to train the prediction algorithm.
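The threshold trade-off is mechanical. A toy sketch, with hypothetical model scores, shows how the same scores yield different error profiles depending on where you cut: a high threshold protects against false positives (the ultrasound case), a low one against false negatives (the airport screening case).

```python
# Hypothetical model scores for five truly-positive and five truly-negative cases.
positives = [0.9, 0.8, 0.75, 0.6, 0.4]
negatives = [0.7, 0.5, 0.3, 0.2, 0.1]

def rates(threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    tpr = sum(s >= threshold for s in positives) / len(positives)  # sensitivity
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    return tpr, fpr

# Low threshold: catch every real case, tolerate false alarms
# (the airport-screening posture, where a false negative is the disaster).
print(rates(0.35))  # (1.0, 0.4)

# High threshold: almost no false alarms, but real cases slip through
# (the posture you want when a false positive triggers a risky procedure).
print(rates(0.72))  # (0.6, 0.0)
```

No threshold eliminates both error types at once, and no threshold helps with an event the training data never contained.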
Algorithmic bias. Another prediction risk without a straightforward fix is that of algorithmic bias. A 2016 piece by ProPublica first highlighted the problem of racial bias in machine learning algorithms used in the context of criminal justice, specifically in predicting whether or not an individual will commit future offenses and basing sentences or parole decisions on that prediction. More recently Amazon came under scrutiny for selling facial recognition services to police departments; the tech was subsequently shown to have a racial bias. Of course, the problem of algorithmic bias extends far beyond race.
Even when these criminal justice algorithms don’t explicitly use race as a feature, they can engineer such features. When a job application doesn’t directly ask for race, a race proxy such as a name (e.g., Daniel vs. Darnell) could still indicate race to a hiring manager. These prediction algorithms can also use such proxies, except these proxies can be encoded as complex relationships between nodes in a deep neural network, such that they are too complex for humans to detect or understand.
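A small simulation makes the proxy problem concrete. All numbers here are hypothetical: a "blind" model never sees the protected attribute, only a proxy that happens to align with it 90% of the time, and its scores split cleanly along group lines anyway.

```python
import random

random.seed(1)

# Hypothetical setup: the protected attribute is dropped from the features,
# but a proxy (a name, a zip code) still carries most of its signal.
n = 5_000
group = [random.random() < 0.5 for _ in range(n)]               # protected attribute
proxy = [g if random.random() < 0.9 else not g for g in group]  # 90%-aligned proxy

# An "attribute-blind" model that scores risk from the proxy alone
# still produces sharply group-dependent scores.
score = [0.8 if p else 0.2 for p in proxy]

def avg(xs):
    return sum(xs) / len(xs)

group_1 = avg([s for s, g in zip(score, group) if g])
group_0 = avg([s for s, g in zip(score, group) if not g])
print(group_1 - group_0)  # large gap, despite never seeing the attribute
```

In a deep net, the "proxy" need not be a single visible column; it can be an opaque combination of many features, which is exactly why the bias is hard to audit.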
Some would argue that predicting higher crime risk among blacks is the logical outcome of higher crime rates among blacks. Rather than argue about race politics and interpretation of crime statistics, I’d point out that accurate prediction is entirely beside the point. Justice is our performance metric! Justice is a core principle of our criminal justice system, and justice means judging people according to factors they can control and not by factors they cannot, such as their race, gender, zip code, whether or not they have incarcerated family members, etc.
If the only metric that mattered when employing machine learning in crime and punishment were predictive accuracy, we’d be living in a Minority Report style dystopia where babies born to low-income single moms get probation trackers attached at birth.
When our models predict but don’t understand, we make bad decisions
From Weinberger’s article:
We humans have long been under the impression that if we can just understand the immutable laws of how things happen, we’ll be able to perfectly predict, plan for, and manage the future. If we know how weather happens, weather reports can tell us whether to take an umbrella to work.
Let’s follow Weinberger’s advice and abandon our human attempts to understand the weather and instead train a deep neural network on historical weather data, and use it to predict each morning whether or not it is going to rain that day. At the end of each day, we feed the weather data for that day into the algorithm, so it updates its weights such that they stay attuned to the present climate.
Suppose we use this algorithm’s prediction to decide whether or not to carry an umbrella. In the previous section, I argued that this understanding-free, highly accurate prediction engine might kill us when a category five storm hits. But if we ignore that, then Weinberger has presented an excellent ML case study for his argument.
Now let’s consider a similar case study in business. Instead of weather, we will forecast revenue on your e-commerce website in a given month, given internal financial information and market conditions from the previous month. If the deep net predicts revenue will drop off next month (a rainy day prediction), then we will run an advertising campaign to boost demand (carry an umbrella).
However, in the first case, the weather is not affected by whether or not we carry an umbrella. In the second case, future revenues are affected by whether or not we run an ad campaign. Those effects are fed back into the algorithm through that future revenue data, which affects the prediction, which affects future decisions, and so on, creating a feedback loop that will cause suboptimal decision making.
The solution to this problem is to build a predictive algorithm with a model of cause and effect so that it can adjust for cause and effect when it makes predictions.
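Here is a toy simulation of that feedback loop, with all numbers hypothetical: revenue sits at a baseline, an ad campaign adds a temporary boost, and we launch a campaign whenever the forecast dips below baseline. A naive forecaster that just extrapolates last month's revenue keeps mistaking its own intervention for the weather; a forecaster that subtracts the known effect of its own past campaign does not.

```python
import random

baseline, boost = 100.0, 20.0  # hypothetical monthly revenue and campaign lift

def simulate(adjust_for_campaign):
    """Mean absolute forecast error over 24 simulated months."""
    random.seed(2)
    history, ran_last, errors = [baseline], False, []
    for _ in range(24):
        # A cause-aware forecaster removes the effect of its own past
        # intervention before extrapolating; the naive one does not.
        last = history[-1] - (boost if adjust_for_campaign and ran_last else 0)
        run = last < baseline  # "rainy day" forecast -> run a campaign
        forecast = last + (boost if adjust_for_campaign and run else 0)
        revenue = baseline + (boost if run else 0) + random.gauss(0, 2)
        errors.append(abs(forecast - revenue))
        history.append(revenue)
        ran_last = run
    return sum(errors) / len(errors)

print(simulate(False))  # naive model keeps chasing its own interventions
print(simulate(True))   # modeling the campaign's effect shrinks the error
```

The only difference between the two runs is one causal assumption: "our campaign adds roughly this much revenue." That single piece of understanding breaks the loop.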
The stakes can be much higher than the loss of revenue. Several extensive observational studies looked at postmenopausal women’s medical records and discovered that hormone use (estrogen and progestin) predicted a decrease in coronary heart disease (CHD). Based on this, doctors started prescribing hormone supplementation to postmenopausal women as a means of preventing CHD.
However, when the Women’s Health Initiative performed a randomized trial, they discovered that women who supplemented hormones had an increased incidence of CHD.
Why did this happen? Here’s a guess. Maybe some of the postmenopausal women in these initial studies were well off. They had the money to sign up for expensive exercise classes, like CrossFit and indoor rock climbing. They heard advice from their affluent gym buddies to mitigate the effects of menopause with hormone supplementation. Plus, all that exercise was strengthening their hearts, offsetting whatever damage the hormones were doing.
Or maybe, instead of affluence and overly intensive exercise, it was some other unobserved set of causes, what statisticians call confounders. The randomization in the trial eliminated the influence of these confounders, demonstrating the actual effect of hormone supplementation. The nature of the confounders is beside the point. The point is that in the presence of confounders, going from prediction straight to policy is a bad idea. The cost, in this case, was in human lives.
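The hormone story can be reproduced in a few lines of simulation. All numbers are hypothetical: wealth (the confounder) makes hormone use more likely and independently lowers CHD risk, while the hormone itself raises risk. Observational comparison and randomized assignment then give opposite answers.

```python
import random

random.seed(3)
n = 100_000

# Hypothetical risk model: hormone use adds risk; wealth subtracts more.
def chd_risk(hormone, wealthy):
    return 0.10 + (0.03 if hormone else 0.0) - (0.06 if wealthy else 0.0)

def observational_study():
    users, nonusers = [], []
    for _ in range(n):
        wealthy = random.random() < 0.5
        hormone = random.random() < (0.8 if wealthy else 0.1)  # confounded choice
        chd = random.random() < chd_risk(hormone, wealthy)
        (users if hormone else nonusers).append(chd)
    return sum(users) / len(users), sum(nonusers) / len(nonusers)

def randomized_trial():
    treated, control = [], []
    for _ in range(n):
        wealthy = random.random() < 0.5
        hormone = random.random() < 0.5  # randomization severs the wealth link
        chd = random.random() < chd_risk(hormone, wealthy)
        (treated if hormone else control).append(chd)
    return sum(treated) / len(treated), sum(control) / len(control)

print(observational_study())  # hormone users *look* safer
print(randomized_trial())     # but the hormone actually raises CHD risk
```

A predictor trained on the observational data would be perfectly accurate and perfectly misleading as a guide to the prescribing decision.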
But in the era of big data couldn’t we measure everything that matters so there won’t be any confounders? No. The processes that generate big data don’t care about the decisions you plan to make unless you were the one who designed the processes.
The things that matter that you failed to measure in the data don’t poke you on the shoulder and tell you they are there. They just go on confounding in silence.
The solution is not to give up on understanding, but to build AI that understands
Weinberger conflates AI with highly accurate black-box prediction without understanding. The fact is that there are machine learning algorithms that make their own assumptions about the cause-effect relationships within a domain. In other words, they try to understand. These algorithms can learn these relationships from passive observation of data (e.g., the PC algorithm), and they can also try intervening in the data-generating process to get a more direct picture of what affects what. In this manner, they behave like a human scientist.
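A sketch of the kind of evidence constraint-based causal discovery (the PC algorithm family) works from: patterns of conditional independence. Using a hypothetical chain X → Y → Z, X and Z are clearly dependent, yet become independent once Y is accounted for, which is exactly the signature that lets such algorithms rule out a direct X–Z edge.

```python
import random

random.seed(4)

# Hypothetical causal chain: X -> Y -> Z.
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]
z = [yi + random.gauss(0, 1) for yi in y]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

def partial_corr(a, b, given):
    # correlation of a and b with the linear influence of `given` removed
    r_ab, r_ag, r_bg = corr(a, b), corr(a, given), corr(b, given)
    return (r_ab - r_ag * r_bg) / ((1 - r_ag**2) * (1 - r_bg**2)) ** 0.5

print(corr(x, z))             # clearly nonzero: X and Z are dependent
print(partial_corr(x, z, y))  # near zero: X is independent of Z given Y
```

A full PC implementation runs many such tests over all variable subsets and orients the surviving edges; the point here is only that causal structure leaves a detectable footprint in observational data.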
I am not disparaging deep learning. The most cutting-edge of this class of algorithms indeed employ deep neural network architectures, as was apparent at the 2018 causal learning workshop at NeurIPS, where deep learning experts such as Yoshua Bengio were in attendance.
Yes, complex systems are hard to model. But giving up on understanding and letting dumb AI make the tough decisions would be a disaster. Instead, we should focus on building AI that understands.