Types of Machine Learning — A Taxonomy for Legal Minds
Supervised, unsupervised, and reinforcement learning, and their legal implications
In the legal and policy world, we are trained to see the world through a framework of rules, precedents, and definitions. As Artificial Intelligence (AI) becomes embedded in everything from our financial markets to our justice systems, it is no longer enough to treat it as a black box. This inaugural issue begins by exploring the fundamental technical methods by which AI systems are built.
AI is a broad umbrella, but in practice, many of the most significant and pervasive applications we encounter daily are powered by Machine Learning (ML). ML is the computational method that allows systems to learn from data and identify patterns without being explicitly programmed for every contingency. Think of the systems that recommend a movie on Netflix or a product on Amazon, or even the spam filter in your email.
ML models are not a monolith; they learn in distinct ways, each presenting a unique profile of capabilities and risks. Understanding this taxonomy is the first step toward crafting nuanced and effective governance.
Supervised Learning (aka Learning from the Past): This approach uses labeled examples, inputs paired with known outputs, to train a model to make predictions about new, unseen inputs. Remember studying for an exam with a complete answer key? An example is training a model to predict how much someone will like a new recipe, using past recipes as the inputs and that person's ratings as the labels.
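To make the mechanics concrete, here is a minimal sketch of the recipe example. The numeric features (sweetness, spiciness, preparation time), the ratings, and the use of scikit-learn's linear regression are illustrative assumptions, one of many possible ways to set this up.

```python
# A minimal supervised-learning sketch: hypothetical recipe features and
# past star ratings (the labels) are used to predict a rating for a new recipe.
from sklearn.linear_model import LinearRegression

# Each row describes a past recipe: [sweetness, spiciness, prep_minutes]
X_train = [[0.8, 0.1, 30],
           [0.2, 0.9, 45],
           [0.5, 0.5, 20]]
y_train = [4.5, 3.0, 4.0]          # the ratings the person actually gave

model = LinearRegression()
model.fit(X_train, y_train)        # learn the input-to-label mapping

new_recipe = [[0.7, 0.2, 25]]      # an unseen recipe
print(model.predict(new_recipe))   # the model's predicted rating
```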
Legal implications: This approach is often at the center of bias and discrimination lawsuits, because the quality and representativeness of the labeled data are critical to a fair outcome. In 2018, Amazon scrapped an experimental AI recruiting tool that had been trained on a decade of resumes reflecting predominantly male candidates, which led the model to penalize resumes containing terms like "women's". An employment decision produced by such a model, even if the bias is unintentional, could form the basis of a disparate impact claim under anti-discrimination laws.
Unsupervised Learning: A machine is given a large amount of data and is tasked with making sense of it by finding patterns or regularities. This is useful for tasks like clustering unknown symbols from an archaeological site, or for systems like word2vec. Because these models find patterns on their own, without explicit labels, their decision-making process can be difficult to trace and explain, even for their creators, leading to what is known as the "black box" problem. The model might cluster a group of loan applicants and flag them as high-risk, yet be unable to articulate a simple, human-understandable reason why. The decision is the result of complex, high-dimensional mathematical correlations, not a clear, linear rule.
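The loan-applicant scenario can be sketched in a few lines. The features below (income and debt-to-income ratio) and the choice of k-means clustering are illustrative assumptions, not a description of any real underwriting system; the point is that the groups emerge from the data with no labels and no built-in explanation.

```python
# A minimal unsupervised-learning sketch: clustering loan applicants on
# hypothetical features [annual_income, debt_to_income_ratio] with no labels.
from sklearn.cluster import KMeans

applicants = [[52_000, 0.31],
              [48_000, 0.35],
              [120_000, 0.10],
              [115_000, 0.12],
              [30_000, 0.70]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(applicants)   # the model invents the groupings itself
print(groups)   # e.g. [0 0 1 1 0]; nothing here explains why an applicant landed in a group
```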
Legal implications: The Equal Credit Opportunity Act in the US requires lenders to provide a specific reason when denying credit. How can a financial institution comply with this law if its unsupervised learning model flags an applicant based on a complex pattern of correlations? This challenge is at the heart of the debate surrounding the European Union's General Data Protection Regulation (GDPR), which includes provisions that have been interpreted as a "right to explanation" for automated decisions.
Reinforcement Learning (RL) (aka Learning by Doing): RL involves training artificial agents to make decisions based on rewards and punishments from their environment. It is used when we can tell that one outcome is better than another, but we cannot specify in advance the single "best" way to do something. The agent's key objective is to learn a strategy, or "policy", that maximizes its cumulative reward over time. Examples include training an elevator controller to choose stopping positions that minimize wait times, or training financial trading algorithms to maximize returns.
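Below is a minimal sketch of the core idea, using tabular Q-learning on a toy five-state corridor. The environment, rewards, and hyperparameters are illustrative assumptions (a real elevator or trading agent would be far more complex); what matters is that the agent discovers its own policy purely from reward signals.

```python
# A minimal reinforcement-learning sketch: tabular Q-learning on a toy corridor.
# The agent starts at state 0 and earns a reward only when it reaches state 4.
import random

N_STATES, ACTIONS = 5, [0, 1]          # actions: 0 = move left, 1 = move right
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # the agent's value estimates

def step(state, action):
    """Move one cell; reaching the rightmost state pays a reward and ends the episode."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best-known action, occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt

# The learned "policy": the preferred action in each state (1 = move right).
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)])
```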
Legal implications: The question of agency comes into play. An RL agent is not just classifying data or finding patterns; it is an active, goal-seeking entity that can devise novel and unexpected strategies to achieve its objectives. First, if an RL agent is making its own decisions to maximize rewards, what is its legal status, and who is responsible for its actions?
Second, this behavior directly challenges the legal concept of foreseeability, a cornerstone of tort law, particularly in negligence and product liability cases. A manufacturer is typically held liable for harms that are a reasonably foreseeable consequence of a product's defect or use. But what happens when an autonomous vehicle, trained via RL, develops a harmful driving strategy that its developers could not have reasonably predicted?
If the specific failure mode was unforeseeable, establishing fault under a traditional negligence framework becomes exceedingly difficult. This legal quandary is pushing lawmakers and legal scholars to consider alternative liability regimes, such as strict liability, where the act of deploying a powerful and inherently unpredictable autonomous agent carries liability for any resulting harm, regardless of the developer's specific fault or ability to foresee the failure.
Weak supervision: While the three paradigms provide a clean conceptual framework, the reality of applied machine learning is often messier. A significant amount of work falls into hybrid categories known as semi-supervised and weakly supervised learning. In many real-world scenarios, obtaining a perfectly labeled dataset for supervised learning is prohibitively expensive or impossible. Semi-supervised learning combines a small amount of high-quality labeled data with a much larger trove of unlabeled data, while weak supervision relies on cheap but noisy sources of labels. For instance, a social media platform might use user-generated hashtags as loose labels for training an image classification model. The hashtag #beach may appear on photos of sandy shores, but also on photos of coastal roads or even restaurants with a beach theme. The model must learn to perform its task despite the inherent imprecision of its training data.
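A minimal sketch of the hashtag example follows. The feature values, the mislabeled "beach-themed restaurant" row, and the choice of logistic regression are all illustrative assumptions; the point is that the model is trained directly on noisy, hashtag-derived labels rather than on a carefully curated ground truth.

```python
# A minimal weak-supervision sketch: hypothetical image features with noisy
# labels derived from the #beach hashtag rather than careful human annotation.
from sklearn.linear_model import LogisticRegression

# Pretend features: [sand_colored_pixel_fraction, blue_pixel_fraction]
X = [[0.80, 0.70], [0.70, 0.80],   # genuine beach photos tagged #beach
     [0.10, 0.20], [0.20, 0.10],   # indoor photos, not tagged
     [0.10, 0.30],                 # a beach-themed restaurant tagged #beach (noisy label)
     [0.90, 0.60]]                 # another genuine beach photo
y = [1, 1, 0, 0, 1, 1]             # 1 = carries the #beach hashtag, 0 = does not

clf = LogisticRegression().fit(X, y)   # trained despite the label noise
print(clf.predict([[0.85, 0.75]]))     # likely classified as "beach"
```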
Legal implications: If a content moderation model, trained on noisy user hashtags, incorrectly removes a piece of protected political speech, who bears the liability? Is it the user who applied the ambiguous hashtag? Is it the platform that designed a system knowing its inputs were imperfect? At the very least, this highlights the importance of keeping a "human in the loop" for consequential decisions and of establishing clear, accessible appeals processes for those affected by an automated decision. Accountability cannot rest solely on the algorithm's output; it must be built into the entire human-machine system.
As this primer demonstrates, not all AI is created equal. The specific way a machine learns—whether from a perfect dataset, a trove of unlabeled data, or by trial and error—fundamentally shapes its capabilities and risks. By moving beyond the headlines and engaging with the underlying technology, we can begin to craft intelligent, effective, and forward-looking governance that matches the complexity of the systems we seek to regulate.

