Artificial Intelligence (AI) algorithms are a fact of modern digital life. They inhabit the “black boxes” through which Netflix decides which programs to recommend or, infamously, who should be targeted with baby-product advertising. They are also increasingly used to shortlist applicants in recruitment decisions, read medical images and make diagnoses, determine applicant eligibility for loans, and identify children “at risk” of hospital admission so that social support can be allocated pre-emptively to prevent future harm and reduce hospitalization costs.
A knitting pattern is an algorithm
While the workings within the black box may appear mysterious, it must be remembered that algorithms are just specified processes, designated sets of rules for turning inputs into a defined output. There is nothing particularly special about computers and algorithms: a knitting pattern is an algorithm, just like the checklist pilots go through before a plane takes off. Algorithms serve to reduce variability in how inputs are interpreted and used to produce outputs. When precisely followed (i.e., with none of the variation that occurs when humans interpret inputs differently), the outputs will be predictable.
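To make that concrete, here is a minimal sketch in Python of an algorithm as a fixed rule. The loan-style rule and its threshold are hypothetical, chosen only to show that a precisely specified procedure gives identical outputs for identical inputs.

```python
# A toy algorithm: a fixed set of rules mapping inputs to a defined output.
# The rule and the threshold below are hypothetical, for illustration only.
def loan_decision(income: float, debt: float) -> str:
    """Deterministic rule: identical inputs always yield identical outputs."""
    if debt == 0 or income / debt >= 3.0:  # hypothetical eligibility threshold
        return "approve"
    return "decline"

# Followed precisely, there is no human variability in interpretation:
assert loan_decision(60_000, 15_000) == loan_decision(60_000, 15_000)
print(loan_decision(60_000, 15_000))  # -> approve (60000 / 15000 = 4.0 >= 3.0)
```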
On one hand, algorithms are simply models built using observed data. AI algorithms advance on standard algorithms because specified processes can be used to build and refine the algorithm itself: input data can be used to “train” the model without human involvement. Hence, they tend to outperform human decision makers on consistency in decisions such as the sentences judges impose for crimes (where it has been shown that a judge’s biases, disposition, or even whether they have had lunch can lead to inconsistencies between sentences for otherwise-identical crimes).
On the other hand, AI algorithms, as deterministic programs, will reflect any errors or inconsistencies inherent in the data or the human processes used to specify and train them in the first place. Just as with any other computer program, “garbage in” leads to “garbage out”: the model will be only as good as the data training it. Hence, an algorithm trained to identify polar bears may fail to identify a polar bear in a zoo if all the training examples show polar bears in their natural, snowy environment. The algorithm does not recognize a polar bear the way a human would; a characteristic repeated throughout the input data, in this case snow, can become the feature the algorithm relies on, causing it to miss the bear in a snowless enclosure.
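The polar bear failure can be reproduced in miniature. The sketch below assumes a toy two-feature encoding of images, [snow in background, bear-like shape], and uses scikit-learn’s LogisticRegression; because snow perfectly separates the training labels while bear shape does not, the model latches onto snow.

```python
# A minimal sketch of the polar bear problem, assuming each image is reduced
# to two toy features: [snow_in_background, bear_like_shape].
from sklearn.linear_model import LogisticRegression

# Training data: every polar bear photo was taken in snow, so snow alone
# perfectly separates the labels; bear shape does not (brown bears aren't polar).
X_train = [
    [1, 1], [1, 1], [1, 1], [1, 1],  # polar bears, always in snow
    [0, 0], [0, 0],                  # snowless scenes without bears
    [0, 1], [0, 1],                  # bear-shaped animals without snow
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

model = LogisticRegression().fit(X_train, y_train)

# A polar bear in a snowless zoo enclosure: bear shape present, snow absent.
print(model.predict([[0, 1]]))  # -> [0]: the model learned "snow", not "bear"
```

The fix is not a cleverer model but broader training data: examples of polar bears without snow would force the model off the shortcut.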
Biases, unintended or otherwise
Similarly, biases, unintended or otherwise, captured in the training data will be reliably replicated in the algorithm-determined models. If, for example, the data for deciding who is eligible for a loan includes biases against, say, women (even if gender is not explicitly included as a variable in the training data), then the ensuing model will deliver decisions biased against women. Conversely, if the decision maker explicitly wants to bias decisions in favor of a specific group, then a model derived from unbiased data will not serve that purpose. As AI algorithms extend the art of decision-making modeling, it makes sense to subject their uses to a set of well-tested principles governing the design and use of models.
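A hedged sketch of how such bias survives the removal of the protected attribute: the synthetic data below invents a proxy variable (a career-break flag) that correlates with gender, and historical approval labels that penalize it. The feature names and the bias mechanism are made up for illustration.

```python
# A toy sketch of proxy bias: gender is never given to the model, yet the
# trained model reproduces the gender gap baked into the historical labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
woman = rng.integers(0, 2, n)            # protected attribute, NOT a model input
income = rng.normal(50, 10, n)           # same income distribution for everyone
# Hypothetical proxy: career breaks, far more common for women in this toy world.
career_break = (rng.random(n) < np.where(woman == 1, 0.7, 0.1)).astype(int)

# Historical decisions penalized career breaks, so the labels encode the bias.
approved = (income - 15 * career_break + rng.normal(0, 5, n) > 45).astype(int)

# Train on income and the proxy only; the gender column is withheld.
X = np.column_stack([income, career_break])
model = LogisticRegression(max_iter=1_000).fit(X, approved)
pred = model.predict(X)

print("approval rate, men:  ", pred[woman == 0].mean())
print("approval rate, women:", pred[woman == 1].mean())  # markedly lower
```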
Ruling the modellers
First, the modeler must understand the context in which the algorithm will be used; that is, have a clear understanding of the problem to which an answer is sought. Models should assist in making decisions, not become rule makers that indiscriminately determine future outcomes. In some cases, though, this may be the decision maker’s objective: consider social media algorithms created to remove terrorist material.
Second, the modeler needs to know the data used to train the algorithm: where and how the data was collected, what biases may have been present in its collection, and what data is actually included in the algorithm and what is not. This assists in identifying the model’s limits, what it will be good at doing and where it may fall short, and helps answer future questions when the algorithm appears to produce anomalous outputs. Understanding the data also minimizes the risk of algorithms and data being used in situations where they are more likely to lead to errors or anomalies.
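In code, “knowing the data” often starts with simple checks. The sketch below assumes a hypothetical pandas DataFrame of loan applications with invented file and column names; the point is the questions being asked, not the specific calls.

```python
# A toy pre-training data audit; the file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("applications.csv")

# Where was data not collected? Missingness can reveal collection gaps.
print(df.isna().mean().sort_values(ascending=False))

# Who is represented? Coverage tells you where the model may fall short.
print(df["region"].value_counts(normalize=True))

# What do historical labels look like by group? Skews here will be learned.
print(df.groupby("region")["approved"].mean())
```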
These precautions help resist the urge to “rush to computerize.” Just because the data exists and computer algorithms make a model possible does not mean that the model will be good, or fit for the purpose to which it is applied. Thoughtful modeling practice will necessarily lead to better AI models.
Originally published in AEIdeas, 28 October 2022.