A Passion for Data, a Vision for Law

With the launch of a groundbreaking contract dataset and a new lab for legal AI, Julian Nyarko is driving the future of law

To support himself while earning a string of law degrees on two continents, Julian Nyarko worked as a pizza delivery driver, cashier, call center worker, and boxing coach for marginalized youth in his hometown, Hamburg, Germany. The son of a social worker mother and a father who emigrated from Ghana and drove a taxi, Nyarko was the first in his family to attend college. His early experiences, he says, helped shape his scholarly focus on fairness and transparency.

That—and a gift for math and data analysis—defined the kinds of questions he wanted to ask.

Since joining the Stanford Law faculty in 2019, Nyarko has built a prolific research portfolio that blends large datasets, machine learning, and a focus on algorithmic fairness. His 2024 paper, “What’s in a Name?,” revealed troubling patterns in how large language models (LLMs) respond to prompts containing names associated with different races and genders. In a more recent computational study, “Breaking Down Bias,” Nyarko and his co-authors found that racial and other biases exhibited by LLMs can be “pruned” away, but because the biases are highly context-specific, there are limits to holding AI model developers liable for harmful outputs.

Other recent projects also showcase his focus on modernizing legal inquiry through empirical methods, including the unveiling of a first-of-its-kind public database of corporate contracts and the launch of the Legal Innovation through Frontier Technology Lab (liftlab), focused on AI for the private legal services sector.

Law and computation aren’t separate domains, Nyarko says. They’re both systems that structure behavior. “What interests me is using data and design to explore how legal systems evolve, and how we might improve them.”

Columbia Law School Professor Eric Talley, JD ’99 (PhD ’00), Nyarko’s postdoctoral advisor, says Nyarko is not just engaged in notable research: “he is the leader of a group of young legal scholars applying artificial intelligence and machine learning to the analysis of complex legal texts, particularly contracts and other instruments of private ordering. I’ve co-authored several articles with Julian, and his commitment to rigorous empirical methodology is both admirable and inspiring.”

A Student Favorite

Nyarko earned his law degree in Germany, then both an LLM and a PhD through UC Berkeley’s Jurisprudence and Social Policy Program, followed by a postdoctoral fellowship at Columbia. Along the way, he taught himself to code, which he began integrating into his research well before generative AI was part of the common lexicon.

He has also distinguished himself in the classroom. In 2023, based on student feedback, Nyarko received Stanford Law’s Barbara Allen Babcock Award for Excellence in Teaching.

“Teaching is where I get to see students wrestle with ideas, challenge assumptions, and start to think of themselves as participants in the system,” he says. “That they took something from that and chose to recognize me is incredibly gratifying.”

Groundbreaking Dataset of Corporate Contracts

Nyarko recently unveiled the Material Contracts Corpus (MCC), a publicly accessible dataset of more than a million contracts filed by public companies with the U.S. Securities and Exchange Commission between 2000 and 2023. Built with Stanford Law student Peter Adelson, JD/MBA ’25 (BS/MS ’17), the MCC transforms decades of filings into a richly annotated, machine-readable research tool. It makes large-scale empirical analysis of contract language not just possible, but practical—for the first time.

“Contracts are the invisible infrastructure of the economy,” Nyarko says. “But until now, they’ve been remarkably difficult to study in a systematic way. We wanted to open that world up.”

Although technically available through the SEC’s EDGAR system, these agreements are buried in exhibits, not labeled, and formatted in ways that frustrate analysis. The MCC addresses those problems, offering a searchable interface that standardizes agreement types, normalizes party names, and tags metadata for precision. The MCC was designed for legal and technical audiences. And unlike proprietary tools, it’s fully open and free to use.

George Triantis, JSD ’89, Richard E. Lang Professor of Law and Dean of Stanford Law School, sees the MCC as both an academic breakthrough and a model of public-minded scholarship.

“Julian’s work is multifaceted and is shaping efforts to benchmark and build standards for evaluating effectiveness and bias in AI tools,” says Triantis, a leading contracts and commercial law scholar himself. “The MCC contracts database reflects his spirit of collaboration and public interest, by providing a database for rigorous empirical analyses of negotiation and contracting patterns.”

Lifting Up AI

Through the new liftlab, which Nyarko is launching this summer, he is developing a research program focused on evaluating how artificial intelligence is shaping legal work—and how well it’s doing the job.

With Megan Ma as the executive director, the liftlab will provide an independent academic space to develop and assess legal AI tools, construct rigorous benchmarks, and explore how machine learning can enhance, not just automate, private legal practice. “Everyone has a good demo,” Nyarko says. “What we need now is a way to measure what works, what doesn’t, and what it is that we’re optimizing for. In contract analysis, for instance, we currently do not have a shared understanding for what makes a good contract. Without that, it seems difficult to develop effective AI tools for automated contract generation.”

The lab is rooted in a central tension that’s emerged as AI enters the legal mainstream: While many tools promise faster results, few offer transparency about accuracy, reliability, or fairness. And in the race to streamline workflows, little attention has been paid to whether these tools improve the substance of legal work, or just make it faster, Nyarko says. The liftlab will address this gap, both by designing empirical evaluations for existing AI systems and by developing new solutions to enhance the provision of legal services. For example, in one project, he uses AI to identify what type of contract language is subject to frequent litigation. Armed with that information, he hopes, attorneys can sidestep common pitfalls during drafting by steering clear of particularly contentious words or phrases.

Nyarko also sees the lab as a training ground. One goal is to create new simulation-based tools that will allow students and early-career lawyers to test legal strategies, explore alternative arguments, and receive feedback in low-stakes environments.

“This technology has the potential to open up new forms of experiential learning,” he says. “If we develop it carefully, it can help students gain valuable skills with more context than traditional classroom methods sometimes allow. I’m excited to see where things go.”