LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

Abstract

The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning, which distinguish between its many forms, correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.

Details

Author(s):

Neel Guha
Julian Nyarko
Daniel E. Ho
Christopher Ré

Publish Date:

August 23, 2023

Publication Title:

NeurIPS

Format:

White Paper

Citation(s):

Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, et al., LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models, NeurIPS (2023).

Related Organization(s):

Stanford Law AI Initiative

