A Legal Informatics Approach to Aligning Artificial Intelligence with Humans

Artificial Intelligence (AI) capabilities are rapidly advancing, and highly capable AI could cause radically different futures depending on how it is developed and deployed. We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Specifying the desirability (value) of an AI system taking a particular action in a particular state of the world is unwieldy beyond a very limited set of value-action-states. The purpose of machine learning is to train on a subset of states and have the resulting agent generalize an ability to choose high-value actions in unencountered circumstances. But the function ascribing values to an agent’s actions during training is inevitably an incredibly incomplete encapsulation of the breadth of human values, and the training process is unavoidably a sparse exploration of states pertinent to all possible futures. Therefore, after training, AI is deployed with a coarse map of human-preferred territory and will often choose actions unaligned with our preferred paths.

Law is a computational engine that converts opaque human values into legible and enforceable directives. Law Informs Code is the research agenda attempting to capture that complex computational process of human law and embed it in AI. Similar to how parties to a legal contract cannot foresee every potential “if-then” contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify “if-then” rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems, a language of alignment. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations, i.e., to generalize expectations regarding actions taken to unspecified states of the world. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), when law is leveraged as an expression of how humans communicate their goals and what society values, Law Informs Code.

We are building systems that leverage data generated by legal processes and the theoretical constructs and practices of law (methods of law-making, statutory interpretation, contract drafting, applications of standards, legal reasoning, etc.) to facilitate the robust specification of inherently vague human goals for AI. This helps with human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we are developing a framework for understanding law as the applied philosophy of multi-agent alignment, harnessing public law as an up-to-date knowledge base of democratically endorsed values ascribed to state-action pairs. Although law is partly a reflection of historically contingent political power – and thus not a perfect aggregation of citizen preferences – if properly parsed, its distillation offers a legitimate computational comprehension of human goals and societal values.

Project Lead: John Nay

The descriptions of current and past projects of CodeX non-residential fellows are provided to illustrate the kind of work our non-residential fellows are carrying out. These projects are listed here for informational purposes only and are not endorsed by CodeX, Stanford Law School, or Stanford University.