Borrowing from the Law to Filter Training Data for Foundation Models

(Originally published by the Stanford Institute for Human-Centered Artificial Intelligence on August 10, 2022)

Using “Pile of Law,” a dataset of legal materials, Stanford researchers explore filtering private or toxic content from training data for foundation models.


Foundation models are often trained on what is essentially the entire internet. By learning from such a vast dataset, they can impressively memorize and reproduce information that we want them to learn. For example, they might learn to accurately answer factual questions such as “Who is the president of the United States?” At the same time, however, foundation models can memorize and reproduce information that could be harmful. For example, they might disclose people’s Social Security numbers, credit card information, or criminal records, or answer questions about Muslims by suggesting they are terrorists.

These are problems that the creators of foundation models need to fix, says Peter Henderson, a JD/PhD student at Stanford. “We don’t want models to associate people with either their private content or with harmful characteristics.”

To avoid such consequences, the creators of foundation models sometimes try to filter out private or toxic content before using a dataset to train a model. But trying to remove all – or even most – of the private or toxic content from the entirety of the internet is extremely challenging. One reason: Context matters. Privacy expectations differ across cultures and even across time. And deciding if a phrase is toxic might depend on who is speaking, why they are using a particular phrase, and the expectations of the readers. In sum: It’s a balancing act, and different researchers apply different standards.

“We wondered if there was a more principled way to filter pretraining data,” Henderson says. He and his colleagues, including Mark Krass, also a JD/PhD student, had an idea: Look to the law. There’s a long history of courts setting standards for information disclosure, so why not import those standards into the machine learning environment?

To test their idea, Henderson and his colleagues assembled Pile of Law, a vast dataset of court and administrative opinions, legal code, case books, and other legal documents. They then explored whether Pile of Law could help identify a principled way to filter pretraining data with a particular focus on privacy and toxicity.
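Pile of Law is publicly released on the Hugging Face Hub, so readers who want to explore it can stream a slice of the corpus without downloading all of it. The snippet below is a minimal sketch; the subset name is an assumption for illustration, and the dataset card lists the actual configurations.

from datasets import load_dataset

# Stream one subset of Pile of Law instead of downloading the full corpus.
# "courtlistener_opinions" is an assumed configuration name used here only
# for illustration; check the dataset card at
# https://huggingface.co/datasets/pile-of-law/pile-of-law for exact names.
# Depending on your version of the datasets library, you may also need to
# pass trust_remote_code=True.
docs = load_dataset(
    "pile-of-law/pile-of-law",
    "courtlistener_opinions",
    split="train",
    streaming=True,
)

for i, doc in enumerate(docs):
    print(doc["text"][:200])  # peek at the opening of a few documents
    if i >= 2:
        break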

Based on the team’s initial experiments, Pile of Law offers some valuable opportunities: First, it can help researchers ensure their training data meets minimum legal standards. And second, it can reveal problems with commonplace filtering standards, such as in the toxicity realm.

Filtering for Privacy

When Henderson and Krass first looked at the datasets currently used to train foundation models, they found none that were explicitly filtered for personally sensitive information. So they decided to identify the standards that courts and governments use to balance privacy and transparency and then test whether the implicit use of those standards in Pile of Law could point them toward a nuanced approach to data filtering.

First, the team cataloged the various ways courts have addressed privacy concerns. They found some bright-line rules that model designers might adapt when filtering training data: no U.S. jurisdiction, for example, discloses minors’ names, Social Security numbers, financial account numbers, or dates of birth. But they also found approaches that are more contextual. U.S. courts typically disclose people’s criminal records or litigants’ names in civil cases, but there are exceptions: in sexual assault cases, victims’ names are often pseudonymized. Similarly, administrative law judges use their discretion to protect the names of people who come before them in contexts such as applying for disability benefits or political asylum.
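As a rough illustration of how a bright-line rule might translate into a data filter, consider a simple pattern-based redaction pass. The patterns and placeholder format below are assumptions made for the sake of the example, not the researchers’ actual pipeline, and a production filter would need far broader coverage.

import re

# Hypothetical bright-line patterns; a real filter would need broader coverage
# and careful validation against false positives.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "date_of_birth": re.compile(
        r"\b(?:born|DOB[:.]?)\s+\w+\s+\d{1,2},\s+\d{4}", re.IGNORECASE
    ),
}

def redact_bright_line(text: str) -> str:
    """Replace matches of bright-line PII patterns with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

print(redact_bright_line("Claimant (SSN 123-45-6789), born March 4, 1990, applied for benefits."))

Contextual rules, by contrast, resist this kind of pattern matching, which is where learned models come in.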

The existence of these contextual standards means that certain subsets of Pile of Law are already implicitly filtered to protect certain people’s privacy. In the immigration context, for example, people seeking asylum who allege that they were tortured in their own countries are likely to have been given pseudonyms in the public record. Henderson and his team decided to test whether a model could learn these contextualized standards by using Pile of Law as the training data. The result: a model that predicts with 80% accuracy whether a paragraph in an immigration case should use a pseudonym or not. And they showed that these predictions were aligned with the law: Sentences referencing asylum and torture were more likely to trigger pseudonymity than sentences referring to criminal offenses.
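Henderson’s team trained their model directly on Pile of Law; as a rough sketch of what the underlying task looks like (not a reproduction of their method), a simple bag-of-words baseline for the same binary prediction could be set up as follows. The paragraphs and labels here are invented toy examples so the snippet runs end to end.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy paragraphs standing in for immigration-case text; real labels
# would record whether the public record referred to the person by a pseudonym.
paragraphs = [
    "The applicant seeks asylum and describes torture by local authorities.",
    "Respondent fears persecution and requests withholding of removal.",
    "The respondent was convicted of a criminal offense in state court.",
    "Petitioner challenges the sentence imposed for the felony conviction.",
] * 10
uses_pseudonym = [1, 1, 0, 0] * 10  # 1 = pseudonymized, 0 = real name

X_train, X_test, y_train, y_test = train_test_split(
    paragraphs, uses_pseudonym, test_size=0.25, random_state=0
)

# A bag-of-words baseline; the 80% accuracy reported above comes from the
# team's own model trained on real Pile of Law immigration paragraphs.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))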

These and several other experiments suggest that Pile of Law can help researchers develop context-appropriate privacy filters, Henderson says. Next, the team would like to expand these efforts beyond the legal domain: Might a model learn to pseudonymize the names of asylum seekers in a dataset that includes the entire internet?

Filtering for Toxicity

In the toxicity arena, Henderson and Krass found a different landscape. Existing filters are widely used and go well beyond what would be suggested by court standards. Indeed, applying current toxicity filters to Pile of Law could filter out important portions of some key legal precedents from the civil rights era, including Brown v. Board of Education, an important case that led to the desegregation of schools in the United States. In addition, the team found that existing filters may remove toxic content from shorter spans of text while leaving it in place if it appears in longer written work – an unexplained outcome that is potentially problematic.
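To see the concern concretely, one can run an off-the-shelf toxicity classifier over legal prose and look at what a naive threshold would discard. The sketch below uses the open-source Detoxify model purely as a stand-in for “a filter off the shelf”; the passages and the 0.5 cutoff are illustrative assumptions, not the team’s evaluation setup.

from detoxify import Detoxify  # pip install detoxify

# An off-the-shelf toxicity classifier, standing in for the filters commonly
# applied to pretraining corpora.
model = Detoxify("original")

# Illustrative passages: civil-rights-era legal prose often describes or quotes
# discriminatory language, which naive score-based filters may flag.
passages = {
    "legal_history": (
        "The doctrine of 'separate but equal' sanctioned racial segregation "
        "in public schools and treated Black children as inferior."
    ),
    "procedural": "The court reviews the procedural history of the case below.",
}

THRESHOLD = 0.5  # arbitrary cutoff, chosen only for illustration

for name, text in passages.items():
    score = model.predict(text)["toxicity"]
    decision = "drop" if score > THRESHOLD else "keep"
    print(f"{name}: toxicity={score:.3f} -> {decision}")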

“The lesson is to think more carefully before you take a filter off the shelf to filter data before training,” Henderson says. “We’re therefore calling for more research to properly address toxicity in the training data.”

Next: Legal Reasoning

While Henderson and Krass hope Pile of Law will help make data filtering less ad hoc than it is today, they also have a second goal: using Pile of Law to build foundation models that are capable of legal reasoning. The team has already shown that foundation models do a lousy job of understanding how to apply the law to a set of facts. But Henderson hopes that AI systems will one day improve attorneys’ efficiency and thoroughness by, for example, checking their citations and identifying all of the relevant arguments in a case. The goal, he says: to improve access to justice for people who can’t afford to pay for a lawyer.

“It’s a tough challenge, but why not aim for a hard problem to solve?” he says. “And one that can actually help people.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.