We Should Test AI the Way the FDA Tests Medicines
Predictive algorithms risk creating self-fulfilling prophecies and reinforcing preexisting biases, largely because they do not distinguish between correlation and causation. To prevent this, we should submit new algorithms to randomized controlled trials, similar to those the FDA supervises when approving new drugs. This would enable us to infer whether an AI is making predictions on the basis of causation.
We would never allow a drug to be sold in the market without having gone through rigorous testing — not even in the context of a health crisis like the coronavirus pandemic. Why, then, do we allow algorithms that can be just as damaging as a potent drug to be let loose into the world without similarly rigorous testing? At the moment, anyone can design an algorithm and use it to make important decisions about people — whether they get a loan, or a job, or an apartment, or a prison sentence — without any oversight or any kind of evidence-based requirement. The general population is being used as guinea pigs.
Artificial intelligence is a predictive technology. AI systems assess, for example, whether a car is likely to hit an object, whether a supermarket is likely to need more apples this week, and whether a person is likely to pay back a loan, be a good employee, or commit a further offense. Important decisions, including life-and-death ones, are made on the basis of algorithmic predictions.
Predictions try to fill in missing information about the future in order to reduce uncertainty. But predictions are rarely neutral: they change the very state of affairs they forecast, sometimes to the point of becoming self-fulfilling prophecies. For example, when important institutions such as credit rating agencies publish negative forecasts about a country, that can prompt investors to flee the country, which in turn can cause an economic crisis.
Self-fulfilling prophecies are a problem when it comes to auditing the accuracy of algorithms. Suppose that a widely used algorithm determines that you are unlikely to be a good employee. Your not getting any jobs should not count as evidence that the algorithm is accurate, because the cause of your not getting jobs may be the algorithm itself.
We want predictive algorithms to be accurate, but not at any cost, and certainly not through creating the very reality they are supposed to predict. Too many times we learn that algorithms are defective only after they have destroyed lives, as when an algorithm implemented by the Michigan Unemployment Insurance Agency falsely accused 34,000 unemployed people of fraud.
How can we limit the power of predictions to change the future?
One solution is to subject predictive algorithms to randomized controlled trials. The only way to know whether, say, an algorithm that assesses job candidates is truly accurate is to divide prospective employees into an experimental group (which is subjected to the algorithm) and a control group (which is assessed by human beings). The algorithm would assess people in both groups, but its decisions would be applied only to the experimental group. If people in the control group whom the algorithm ranked negatively went on to have successful careers, that would be good evidence that the algorithm is faulty.
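The logic of such a trial can be sketched in a short simulation. Everything here is hypothetical: the scoring function, the bias it encodes, the cutoff score, and the synthetic outcome data are all assumptions for illustration, not a description of any real hiring algorithm.

```python
import random

random.seed(0)

# Hypothetical flawed algorithm: it scores candidates on ability but
# also penalizes an irrelevant feature it has mistaken for a signal.
def algorithm_score(ability, irrelevant_feature):
    return ability - 0.5 * irrelevant_feature + random.gauss(0, 0.1)

# Synthetic candidate pool: true ability (which drives real outcomes)
# plus an irrelevant binary feature.
candidates = [(random.random(), random.randint(0, 1)) for _ in range(1000)]

# Randomize candidates into an experimental group (algorithm's verdict
# is applied) and a control group (humans decide; verdict is recorded
# but NOT applied, so real-world outcomes remain observable).
experimental, control = [], []
for c in candidates:
    (experimental if random.random() < 0.5 else control).append(c)

THRESHOLD = 0.3  # assumed cutoff: scores below this count as a negative ranking

# Among controls the algorithm ranked negatively, check how many
# nevertheless turn out to be good employees (ability above 0.5 here).
negatively_ranked = [
    (ability, feat) for ability, feat in control
    if algorithm_score(ability, feat) < THRESHOLD
]
successes = [a for a, _ in negatively_ranked if a > 0.5]

rate = len(successes) / len(negatively_ranked)
print(f"Success rate among negatively ranked controls: {rate:.0%}")
# A high rate is evidence the algorithm's negative rankings are faulty:
# the control group lets us see outcomes the algorithm would have suppressed.
```

The key design point is the one the article makes: only in the control group, where the algorithm's verdict is not enforced, can a negative prediction be falsified by reality.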
Randomized controlled trials would also have great potential for identifying biases and other unforeseen negative consequences. Algorithms are infamously opaque. It’s difficult to understand how they work, and when they have only been tested in a lab, they often act in surprising ways once they are exposed to real-world data. Rigorous trials could ensure that we don’t use racist or sexist algorithms. An agency similar to the Food and Drug Administration could be created to make sure algorithms have been tested enough to be used on the public.
One of the reasons randomized controlled trials are considered the gold standard in medicine (as well as in economics) is that they are the best evidence we can have of causation. In turn, one of AI’s most glaring shortcomings is that it can identify correlations but does not understand causation, which often leads it astray. For example, when an algorithm decides that male job candidates are likelier to be good employees than female ones, it does so because it cannot distinguish between causal features (e.g., most past successful employees attended university because university is a good way to develop one’s skills) and merely correlative ones (e.g., most past successful employees have been men because society suffers from sexist biases).
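How a spurious correlation like this arises can be shown with synthetic data. This is a toy sketch under stated assumptions: success is caused by skill alone, but a biased gatekeeping step admits men at a lower skill threshold, so gender ends up correlated with having a successful record even though it has no causal effect. All thresholds and probabilities are invented for illustration.

```python
import random

random.seed(1)

# Synthetic historical applicant data. Success is caused ONLY by skill,
# but a biased hiring process required women to clear a higher bar,
# so being male correlates with having a "successful employee" record.
data = []
for _ in range(20000):
    is_male = random.random() < 0.5
    skill = random.random()
    hired = skill > (0.3 if is_male else 0.6)  # biased gatekeeping step
    successful = hired and skill > 0.5         # success caused by skill alone
    data.append((is_male, successful))

def success_rate(group_filter):
    group = [d for d in data if group_filter(d)]
    return sum(1 for d in group if d[1]) / len(group)

male_rate = success_rate(lambda d: d[0])
female_rate = success_rate(lambda d: not d[0])
print(f"P(successful record | male)   = {male_rate:.2f}")
print(f"P(successful record | female) = {female_rate:.2f}")
# A correlation-only learner sees the gap and favors male candidates,
# even though gender has no causal effect on performance here.
```

A correlation-chasing model trained on this data would reproduce the bias of the gatekeeping step; a trial that applies the model only to a randomized experimental group, as described above, is one way to catch that.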
Randomized controlled trials have not only been the foundation of the advancement of medicine; they have also prevented countless potential disasters: the release of drugs that could have killed us. Such trials could do the same for AI. And if we were to combine AI’s knack for recognizing correlations with the ability of randomized controlled trials to help us infer causation, we would stand a much better chance of developing an AI that is both more powerful and more ethical.