Legal informatics can play a unique role in theoretically framing, and technically implementing, the alignment of AI behavior with human goals.
AI is being widely deployed. Meanwhile, specifying the desirability (i.e., value) of an AI system taking any particular action in any particular state of the world is unwieldy beyond a very limited set of value-action-state tuples. In fact, the purpose of most AI systems is to train on a subset of such tuples and have the resulting model generalize to previously unencountered states, i.e., maintain the same level of performance in novel circumstances sufficiently different from the training data. The reward function ascribing values to an agent’s actions during training is inevitably a proxy for the breadth of human preferences. Even in training, an AI system often exhibits unanticipated “shortcut” behaviors that exploit this inherently limited function, aggressively optimizing for reward at the expense of other (usually less quantifiable) variables of interest. Surprising, and negative, AI behaviors may result. Learning reward functions from human feedback or human demonstration can help. But, regardless of approach, it is not possible to manually specify or automatically enumerate how desirable humans find every action a system might take. Therefore, after training, the system is deployed with an incomplete map of human-preferred territory. The resulting mismatch between what a human wants and what an AI does is the alignment problem.
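As a rough illustration of the proxy problem described above, the following toy sketch compares the action an agent selects under a proxy reward with the action a human would prefer. Every name and number here (`ACTIONS`, `proxy_reward`, `side_effect_cost`, the two actions) is a hypothetical assumption chosen for illustration, not part of any real system or of this project's method.

```python
# Toy illustration only: all action names and numbers are hypothetical.
# Each action has a quantified proxy reward and an unquantified side-effect cost.
ACTIONS = {
    "careful": {"proxy_reward": 0.8, "side_effect_cost": 0.0},
    "shortcut": {"proxy_reward": 1.0, "side_effect_cost": 0.9},
}


def proxy_value(action: str) -> float:
    """Value visible to the training signal: the quantified part only."""
    return ACTIONS[action]["proxy_reward"]


def human_value(action: str) -> float:
    """Assumed human preference: proxy reward minus the unmeasured cost."""
    entry = ACTIONS[action]
    return entry["proxy_reward"] - entry["side_effect_cost"]


# The agent optimizes the proxy; the human judges by the fuller value.
chosen_by_agent = max(ACTIONS, key=proxy_value)
preferred_by_human = max(ACTIONS, key=human_value)

print("agent picks:", chosen_by_agent)
print("human prefers:", preferred_by_human)
```

In this sketch the two choices differ because the side-effect cost never enters the proxy. In realistic settings that cost is not available as a number at all, which is why it cannot simply be added to the reward.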
Recognizing that alignment is a spectrum, and that it is a moving target as AI capabilities advance, we have three primary desiderata for a framework to increase alignment. First, the framework should be theoretically well-developed, with modular constructs for specifying human goals that handle ambiguity and novelty. Second, it should scale with AI: as AI systems become more capable, the framework should provide solutions calibrated to that higher level of capability. Third, it should be rigorously battle-tested – even better if the documentation of the battle-testing produces reams of data that can be leveraged for learning societal values.
Law, as the applied philosophy of multi-agent alignment, uniquely fulfills these demands. AI alignment is a problem because we cannot directly specify AI behavior ex ante. Similarly, parties to a legal contract cannot predict every contingency of their relationship, and legislators cannot predict the specific circumstances under which their laws will be applied. Methodologies for making (public and private) law, and for interpreting law, which apply broad pre-determined goals to novel specific circumstances, have been theoretically refined by scholars, practitioners, and courts for centuries (requirement one). As the state of the art in AI advances, we can set higher bars of demonstrated legal understanding capabilities. If a developer claims their system has advanced capabilities on tasks, then the developer could demonstrate correspondingly advanced legal abilities of the AI (requirement two). The practices of making, interpreting, and enforcing law have been battle-tested through millions of legal and regulatory actions that have been memorialized in digital format (requirement three). This record provides large data sets of detailed historical examples and generalizable precedents with accompanying explanations, and there are millions of well-trained active lawyers from whom to elicit targeted feedback for fine-tuning AI models to embed an ever-evolving comprehension of human goals.
We are leveraging law purely as an information source for AI training and validation, and as a potentially significant tool in the alignment toolkit.
The descriptions of current and past projects of CodeX non-residential fellows are provided to illustrate the kind of work our non-residential fellows are carrying out. These projects are listed here for informational purposes only and are not endorsed by CodeX, Stanford Law School, or Stanford University.