There’s No Free Benchmark: An Institutional View of Legal AI Benchmarking

Abstract

There is significant excitement around the use of AI in law, but little information exists on the performance and associated risks of the domain’s widely marketed tools. Recent work, for instance, has demonstrated the significant potential for “hallucinations,” in which models fabricate facts, law, and precedent, a risk Chief Justice Roberts spotlighted in his annual report on the judiciary. We argue that there is a need for public benchmarking in law. First, relative to other AI application domains, the legal AI ecosystem lacks legibility: there is little public information about the design and performance of many commercial legal AI systems. Legal AI has not benefited from the kinds of benchmarking that have catalyzed, measured, and informed AI innovation and responsible use in other domains. Second, we articulate the challenges of the institutional design of benchmarking, illustrating how benchmarks can be captured, watered down, and abused. Careful institutional design around the why, who, what, and how of benchmarking will be critical for navigating difficult tradeoffs among transparency, objectivity, expertise, and resources. Third, addressing legal AI’s illegibility requires matching institutional models to available resources and constraints. Rather than advocating for a single “best” approach to benchmarking, we show how effective benchmarking strategies depend on the resources and constraints at hand.

Details

Author(s):
Neel Guha, Andy Zhang, Christine Tsang, Christopher Manning, Julian Nyarko & Daniel E. Ho
Publish Date:
January 1, 2026
Publication Title:
Proceedings of the National Academy of Sciences
Format:
Journal Article
Citation(s):
  • Neel Guha, Andy Zhang, Christine Tsang, Christopher Manning, Julian Nyarko & Daniel E. Ho, There’s No Free Benchmark: An Institutional View of Legal AI Benchmarking, Proceedings of the National Academy of Sciences (forthcoming 2026).
