The Foundry Problem: World Models and the Missing Liability Framework for Self-Supervised Learning

Abstract

AI data-based liability doctrine has converged on two planes of the machine learning pipeline: training data and model output. The phase between them, self-supervised learning (SSL), has yet to receive sustained legal attention. It should. This is where foundation models are made. It is where bias settles into representational geometry, where private data gets compressed into recoverable form, and where structural defects are cast long before any fine-tuning or deployment decision can correct them, if at all. This post argues that SSL creates a distinct category of risk, Representational Risk, that cannot be remediated by downstream actors and therefore requires a liability framework of its own. The normative foundation for that framework exists in the AI Data Stewardship Framework (AI-DSF), whose controls map directly onto the SSL risk landscape across three domains: negligent entrustment of unlabeled data, statutory privacy violations arising from memorization, and strict products liability for algorithmic poisoning. The post extends these risks to causal world models, where SSL errors in physical representation become design defects with potential for physical injury. It then proposes a tiered SSL Safe Harbor grounded in the AI-DSF’s control structure. Base Model Providers who satisfy documented stewardship obligations benefit from a rebuttable presumption of non-negligence as to Structural Defects. Those who do not have no defensible position against FTC algorithmic disgorgement or common law negligence claims.

The Causal Middle

AI data-based litigation has generally converged on two planes. Plaintiffs challenge the input: training data scraped without authorization, as in NYT v. OpenAI. Or they challenge the output: hallucinated facts, defamatory text, and infringing images generated by deployed systems.

Between raw data and model output lies a process that has yet to receive close attention: self-supervised learning (SSL). It is the computational mechanism by which modern foundation models transform internet-scale text and images into weights. It is where bias sediments, where private data is compressed into recoverable form, and where structural defects are cast into a model long before any fine-tuning or deployment decision can correct them. It is like a foundry: what comes out of it is determined by what happens inside it. And right now, no liability framework reaches inside.

I argue that SSL creates a category of risk I call Representational Risk. These are defects that cannot be remediated by downstream actors because they are encoded at the representational level, in the learned geometry of the model itself. A distinct liability framework is required, targeting what I call the Base Model Provider: the developer of the Base Model, the artifact produced by the initial pretraining run before any subsequent refinement. The normative foundation to support it exists in the AI Data Stewardship Framework.

Why SSL Changes the Legal Calculus

Legal scholarship has begun to examine the latent space as a site of doctrinal interest. BJ Ard’s Copyright’s Latent Space: Generative AI and the Limits of Fair Use (110 Cornell L. Rev. 2025) argues that fair use doctrine should account for how generative AI models extract what Ard calls “non-authorial value,” the facts, tropes, and structural patterns that exist independently of any artist’s creative choices. Ard’s analysis is grounded in intellectual property, namely who owns what the latent space contains, and whether training on it constitutes infringement. My post occupies a different position. Where Ard examines ownership of value encoded in the latent space, I examine responsibility for harms cast there.

Traditional supervised machine learning requires labeled data. A human annotator marks images as “cat” or “not cat,” and the model learns to replicate that judgment. The legal implications are relatively tractable. But SSL discards labels. It trains models on self-generated prediction tasks. Masked language modeling teaches a model to predict a missing word from surrounding context. Contrastive learning teaches it to recognize that two augmented views of the same image are more similar than two random images. These objectives require no human curation of meaning. They require only data, enormous quantities of it, scraped from the open web.
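To make the point about self-generated prediction tasks concrete, here is a minimal sketch of how a masked language modeling objective manufactures its own training examples from raw text. It is an illustration under simplifying assumptions, not any lab's pipeline: the function name make_masked_examples, the "[MASK]" placeholder, whitespace tokenization, and the 15 percent default masking rate are all choices I made for the example.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is illustrative


def make_masked_examples(sentence, mask_prob=0.15, seed=0):
    """Build (masked_tokens, position, target) examples from raw text.

    The training target is the hidden word itself, so the text supplies
    its own labels and no annotator is involved.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # assumption: naive whitespace tokenization
    examples = []
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            examples.append((masked, i, token))
    return examples


# With mask_prob=1.0 every word becomes a prediction target exactly once.
examples = make_masked_examples(
    "the patient was prescribed a low dose", mask_prob=1.0
)
for masked, position, target in examples[:2]:
    print(" ".join(masked), "->", target)
```

Nothing in this loop inspects what the sentence says. Whatever the corpus contains, accurate or not, becomes a prediction target, which is the property the legal analysis turns on.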

This has three legal consequences that supervised learning (without the self) does not generate.

First, SSL models learn implicit structure from the training corpus. The model learns the statistical relationships embedded in how humans actually write, what images they produce alongside what text, what appears near what. The resulting representations encode cultural assumptions, demographic patterns, and factual associations that no annotator ever reviewed or approved.

Second, SSL models memorize. Research by Carlini and colleagues has demonstrated that large language models trained with SSL will, under appropriate prompting, reproduce verbatim text from their training data, including private phone numbers, email addresses, and personal health information. This memorization is to be expected; it is a feature of how SSL achieves generalization.

Third, and most consequentially for liability theory, SSL produces what ML researchers call a “world model.” While a supervised model learns to replicate human judgments within a defined label set, an SSL model’s world model learns a functional representation of how the world works. It absorbs semantic relationships, causal associations, factual co-occurrences, and cultural patterns, all derived from data without any human having approved the resulting structure. The world model is essentially an internal map of reality, built from whatever the training corpus contained, that the model then uses to reason about novel situations it has never encountered.

What happens, or should happen, when the world model is wrong, when it causes harm?

The SSL Risk Taxonomy

A. The Stewardship of Unlabeled Data

The typical SSL pipeline begins with Common Crawl, a freely available scrape of approximately four billion web pages, collected without quality or content filtering beyond technical deduplication. GPT-3 was trained substantially on Common Crawl.

Common Crawl contains everything the web contains. Medical misinformation, demographic stereotypes, extremist content, and factual errors accumulated over years of archiving are all there. So, when an SSL model trains on this corpus, these patterns are structurally encoded into the model’s representation of language. A Base Model trained on unvetted Common Crawl data builds a world model from them. The spatial relationships that govern what the model treats as similar, relevant, or probable are derived directly from the statistical structure of whatever that corpus contained. A world model built from Common Crawl is a map of reality as the unfiltered internet represents it, and it is not something a fine-tuner can correct just by adjusting a few layers of weights.

The applicable legal theory is negligent entrustment, and it works as follows. A developer who uses an unvetted, unfiltered corpus for SSL training has entrusted a computational process with data that a reasonable actor would recognize as generating predictable harm. Establishing a duty of care requires foreseeability of downstream use. Even a developer training a Base Model on Common Crawl in 2024 knows the model will be used in medical, legal, and financial contexts, and the harm from biased representations in those contexts should be expected.

A critic will argue that Common Crawl is the only corpus large enough to achieve state-of-the-art SSL performance, making its use an industry standard rather than a negligent choice. The argument is flawed. The negligent act is not using Common Crawl but ingesting its content without applying the safeguards set out in the AI-DSF’s controls. Data Provenance Protections, Data Threat Defenses, and Continuous Data Vulnerability Management exist because large unfiltered corpora are the operational reality of SSL development. A developer who uses Common Crawl with documented provenance controls, anomaly detection, and pre-ingestion quality review should not be found negligent. A developer who uses it without those controls, knowing what the corpus contains, is negligent.

B. Memorization and the Right to Be Forgotten

The memorization problem sits at the intersection of SSL mechanics and privacy law.

GDPR Article 17 grants data subjects the right to erasure and the CCPA provides a parallel right for California residents. But these laws were drafted with databases in mind, records that can be located, identified, and deleted, not weights.

Whether a latent representation of personal data constitutes “personal data” within the meaning of these statutes is unresolved. The Article 29 Working Party’s guidance suggests that data is “personal” if an individual can be identified from it, directly or indirectly. If Carlini-style extraction attacks can recover verbatim PII from SSL model weights, the argument that the weights contain personal data in the statutory sense is serious. The weights are not merely derived information in the way that an anonymized aggregate is derived information. They are, under specific conditions, a reproducible copy.

Regulators should, therefore, treat recoverable memorization as per se statutory retention. If PII survives in extractable form within model weights, the right to erasure applies. The base model provider has an obligation either to demonstrate that memorized content is not extractable or to retrain without the relevant data.
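What might a demonstration of non-extractability look like in practice? The sketch below shows the basic shape of a prefix-prompting test: give the model the beginning of a record it may have seen and check whether it completes the rest verbatim. This is a simplified illustration, not the protocol used in the published extraction research. The names extraction_rate and toy_generate, the 20-character prefix, and the fictional records are all assumptions of mine, and toy_generate merely stands in for a real model’s text generation call.

```python
def extraction_rate(generate, records, prefix_len=20):
    """Fraction of records whose held-out suffix is reproduced verbatim
    when the model is prompted with the record's first prefix_len chars."""
    tested = 0
    leaked = []
    for record in records:
        if len(record) <= prefix_len:
            continue  # too short to split into prompt and held-out suffix
        prefix, suffix = record[:prefix_len], record[prefix_len:]
        tested += 1
        if suffix in generate(prefix):
            leaked.append(record)
    rate = len(leaked) / tested if tested else 0.0
    return rate, leaked


# Toy stand-in for a model that has memorized exactly one (fictional) record.
MEMORIZED = "Contact Jane Roe at 555-0100 for records"


def toy_generate(prompt):
    if MEMORIZED.startswith(prompt):
        return MEMORIZED[len(prompt):]
    return "no relevant continuation"


records = [MEMORIZED, "Contact John Doe at 555-0199 for billing"]
rate, leaked = extraction_rate(toy_generate, records)
print(rate, leaked)
```

A test of this kind can only show that the probes tried did not succeed. A low measured rate is evidence about extractability under those probes, not proof that no prompt could recover the data, and a provider relying on it should document which probes were run.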

C. Algorithmic Poisoning and Product Liability

Backdoor attacks on SSL training sets are documented and reproducible. An adversary who can inject a small number of poisoned examples into a training corpus, achievable through contributions to Common Crawl or to widely-used open-source datasets, can install hidden triggers in the resulting model’s latent space. When the trigger pattern appears at inference time, the model behaves in a manner the deployer did not intend and may not be able to readily detect.

The product liability framing here is relatively straightforward. Under the Restatement (Third) of Torts, a product contains a manufacturing defect when it deviates from its intended design in a way that renders it unreasonably dangerous. A backdoored SSL model fits that description. The defect is latent. It is not detectable through standard evaluation. And it can produce serious harm in deployed systems.

But the harder question is who bears liability when the poisoning occurs at the data level, before the developer has assumed possession of the affected training examples. Supply chain product liability provides precedent. A manufacturer who incorporates a defective component bears liability even if the defect originated upstream. The SSL base model provider, having chosen to train on an unvetted corpus without adversarial robustness evaluation, has made a decision that determines whether the defect reaches the downstream product.

D. Causal Hallucinations and the World Model as Design Defect

The three risks above involve language models. The world model problem is more complicated and potentially more severe.

SSL is not limited to text. Systems such as JEPA learn world models from video. They infer not only semantic relationships but physical ones: how objects move, how forces propagate, how materials deform under stress. These are causal world models, and they represent, in latent space, the developer’s implicit claim about how physical reality behaves.

When a causal world model is used to train a robotic agent or an autonomous system, the legal stakes shift from bias and privacy to physical injury. A robot trained on an SSL world model that misrepresents the brittleness of glass, the stopping distance of a vehicle, or the load tolerance of a structural component is not operating on a statistical error. It is operating on false physics. That is a design defect under the Restatement (Third) of Torts, section 2(b), which holds a product defective in design when the foreseeable risks of harm could have been reduced by a reasonable alternative design.

The reasonable alternative is a verified world model. A developer who conducts latent space audits on physical representations before licensing a world model for use in robotic or autonomous systems can test whether the model’s causal structure deviates from reality in ways that produce predictable harm. A developer who does not conduct those audits and licenses the model anyway has made a design choice, and design choices that create foreseeable, avoidable risk are exactly what product liability exists to address.
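As a rough illustration of what verifying a physical representation could involve, the sketch below compares a model’s predicted vehicle stopping distance against a textbook reference and flags any case that deviates beyond a tolerance. Several things here are my assumptions: the reference is the idealized braking distance d = v^2 / (2 * mu * g) for a level surface with constant friction and no reaction time, the 10 percent tolerance is arbitrary, and predict is a hypothetical wrapper around whatever query interface a real world model exposes.

```python
G = 9.81  # gravitational acceleration in m/s^2


def reference_stopping_distance(speed_mps, friction_coeff):
    """Idealized braking distance on a level road: v^2 / (2 * mu * g)."""
    return speed_mps ** 2 / (2 * friction_coeff * G)


def audit_stopping_distance(predict, cases, tolerance=0.10):
    """Return the cases where the model's prediction deviates from the
    reference by more than the given relative tolerance."""
    failures = []
    for speed, mu in cases:
        expected = reference_stopping_distance(speed, mu)
        predicted = predict(speed, mu)
        relative_error = abs(predicted - expected) / expected
        if relative_error > tolerance:
            failures.append((speed, mu, predicted, expected))
    return failures


# Hypothetical flawed model: treats stopping distance as linear in speed.
def flawed_predict(speed, mu):
    return 2.0 * speed


cases = [(10.0, 0.7), (20.0, 0.7)]  # (speed in m/s, friction coefficient)
failures = audit_stopping_distance(flawed_predict, cases)
print(len(failures), "of", len(cases), "cases outside tolerance")
```

The design choice worth noting is that the audit produces a record of specific failing cases. That record is what would let a court ask whether a reasonable alternative design was available and ignored.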

This extends Representational Risk beyond the informational domain. A world model is a representation of causality. When causality is wrong and the error is actionable because it causes harm, the foundry metaphor takes on a different weight. What is cast in the foundry is a defective model of physical reality that will govern how machines act in the world.

Leveraging the AI Data Stewardship Framework

The AI Data Stewardship Framework (AI-DSF) provides the architecture for translating these liability theories into actionable obligations. The AI-DSF organizes its controls into three tiers. Basic Controls represent the baseline every organization should have. Foundational Controls apply to organizations with higher risk profiles. Organizational Controls focus on people and processes. Several of these controls map directly onto the SSL risk landscape.

Continuous Data Vulnerability Management is a Basic Control. It requires that pre-training and post-training procedures be “executed, documented, and measured” and that “proven anomaly detection tools are continuously used.” Applied to SSL, this control operationalizes the latent space audit. Probing classifiers can test whether a model’s representations encode demographic associations that no annotator reviewed or approved. Extraction-based evaluation can estimate memorization risk before a checkpoint is released.
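For readers unfamiliar with probing, the idea is simple: take the model’s internal representations, train a small classifier to predict an attribute from them, and see how well it does on held-out examples. The sketch below uses a nearest-centroid classifier as a deliberately simple stand-in for the trained linear probes typically used in practice. The two-dimensional vectors and the labels "a" and "b" are invented for illustration, and probe_accuracy is a name I chose.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(train, test):
    """train and test are lists of (embedding, attribute_label) pairs.

    Accuracy well above chance on held-out data suggests the attribute
    is recoverable from the representations.
    """
    by_label = {}
    for vector, label in train:
        by_label.setdefault(label, []).append(vector)
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    correct = 0
    for vector, label in test:
        predicted = min(
            centroids, key=lambda l: squared_distance(vector, centroids[l])
        )
        correct += predicted == label
    return correct / len(test)


# Invented toy embeddings in which the attribute is cleanly separable.
train = [([1.0, 0.0], "a"), ([0.9, 0.1], "a"),
         ([0.0, 1.0], "b"), ([0.1, 0.9], "b")]
test = [([0.8, 0.2], "a"), ([0.2, 0.8], "b")]
accuracy = probe_accuracy(train, test)
print(accuracy)
```

A high probe accuracy shows that the information is present in the representation. It does not by itself show that the model uses that information in a harmful way, so a probe result is a starting point for the audit rather than its conclusion.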

To make the latent space audit legally operational, I propose that the AI-DSF require what I call a Representation-to-Risk certification. This would be a documented record, produced before any downstream license is granted, that specifies which categories of representational bias were probed, which PII memorization tests were conducted, and what remediation was applied to identified risks. The certification functions like a material safety data sheet for the latent space. It gives downstream licensees a verified record of what they are inheriting, and it gives regulators and courts a documented standard against which to measure whether the Base Model provider exercised reasonable care. A provider who cannot produce one has not satisfied the Continuous Data Vulnerability Management obligation.
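To show that such a certification could be a structured, checkable record rather than a narrative memo, here is one possible shape for it. This is a sketch of my own devising, not a schema the AI-DSF defines. The class names, field names, category strings, and the rule that a complete certification must cover every required category and leave no identified risk unremediated are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class RiskFinding:
    category: str             # e.g. "representational_bias", "pii_memorization"
    test_performed: str       # what probe or extraction test was run
    risk_identified: bool
    remediation: str = ""     # empty means no remediation was recorded


@dataclass
class RepresentationToRiskCertification:
    model_checkpoint: str
    findings: list = field(default_factory=list)

    def unremediated(self):
        """Findings where a risk was identified but no remediation recorded."""
        return [f for f in self.findings
                if f.risk_identified and not f.remediation]

    def is_complete(self, required_categories):
        """True if every required category was tested and nothing
        identified was left unremediated."""
        covered = {f.category for f in self.findings}
        return set(required_categories) <= covered and not self.unremediated()


cert = RepresentationToRiskCertification(
    model_checkpoint="base-model-checkpoint-001",  # hypothetical identifier
    findings=[
        RiskFinding("representational_bias", "demographic probing", True,
                    "corpus rebalanced and checkpoint re-probed"),
        RiskFinding("pii_memorization", "prefix extraction test", True),
    ],
)
required = ["representational_bias", "pii_memorization"]
print(cert.is_complete(required), len(cert.unremediated()))
```

The value of a structured record is that a licensee or a court can ask a narrow question, such as whether a PII memorization test was run and what was done about the result, and get a definite answer.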

The AI-DSF further requires that “data deletion and data unlearning methodologies are readily available and implementable.” This is a direct reference to Machine Unlearning, a technical approach to removing specific training examples from a model’s learned behavior without full retraining. For the GDPR and CCPA memorization problem, Machine Unlearning is the AI-DSF’s prescribed remediation mechanism. A Base Model Provider that has not implemented Machine Unlearning capabilities before encountering a right-to-erasure request has failed a Basic Control obligation.

Data Provenance Protections is a Foundational Control that “implements safeguards against data poisoning.” This is the control that maps directly onto the backdoor attack risk. The AI-DSF requires that developers implement guardrails against “unlicensed, unverified, and unintended data sets” and maintain documented data provenance throughout the supply chain. A developer who trains on an unvetted Common Crawl corpus without provenance controls has failed a Foundational Control that the AI-DSF identifies as necessary for organizations with elevated risk profiles. SSL developers, who are the Base Model Providers for the entire industry, are precisely that.
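One basic building block of documented provenance is a hash manifest: a record of cryptographic digests for each approved data shard, checked again before ingestion so that altered or unlisted data is rejected. The sketch below illustrates only that building block. It does not detect poison that was already present when the manifest was created, and the shard names, contents, and the verify_shards function are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_shards(shards, manifest):
    """Compare each shard against the digest recorded at review time.

    shards: dict of shard name -> raw bytes about to be ingested
    manifest: dict of shard name -> expected SHA-256 hex digest
    Returns (accepted, rejected) lists of shard names.
    """
    accepted, rejected = [], []
    for name, data in shards.items():
        expected = manifest.get(name)
        if expected is not None and sha256_hex(data) == expected:
            accepted.append(name)
        else:
            rejected.append(name)  # altered since review, or never reviewed
    return accepted, rejected


# Manifest recorded when the shards were reviewed (toy contents).
reviewed = {"shard-0": b"reviewed text A", "shard-1": b"reviewed text B"}
manifest = {name: sha256_hex(data) for name, data in reviewed.items()}

# At ingestion time, one shard has been altered and one was never reviewed.
incoming = {
    "shard-0": b"reviewed text A",
    "shard-1": b"reviewed text B plus injected trigger",
    "shard-2": b"unlisted text",
}
accepted, rejected = verify_shards(incoming, manifest)
print(accepted, rejected)
```

A manifest check is cheap relative to a pretraining run, which bears on whether omitting it was a reasonable decision.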

Data Threat Defenses, also a Foundational Control, explicitly references NIST AI 100-2e2025, the Adversarial Machine Learning taxonomy. This control requires that developers identify and mitigate threats from internal and external sources and maintain alignment with adversarial robustness standards.

Data Inventory, a Basic Control, governs how dataset diversity and sufficiency are established and monitored, and requires that data licensing requirements are reviewed and complied with prior to dataset ingestion. This is the control that addresses the Common Crawl problem. A developer who ingests Common Crawl without reviewing its licensing status and content quality against a documented standard has bypassed a Basic Control before the SSL process has even begun.

The downstream relationship is addressed through the AI-DSF’s supply chain standard. All supply chain members are subject to a meet-or-exceed requirement that syncs to the organization’s own policies. A Base Model provider that documents its AI-DSF compliance and makes that documentation available to fine-tuners and deployers enables those downstream actors to verify what they are inheriting. Without that documentation, fine-tuners and deployers are operating blind.

Finally, the AI-DSF explicitly identifies the FTC’s power of algorithmic disgorgement as a consequence of stewardship failure. Algorithmic disgorgement means the destruction of the model. Not a fine. Not an injunction against future conduct. The deletion of the asset, along with every downstream product built on it. A developer who has trained an SSL base model on unlicensed or tainted data and has not followed the AI-DSF’s controls has built its entire model investment on a foundation the FTC can dissolve. The SSL Safe Harbor proposed below is the only mechanism available to a Base Model provider for protecting a billion-dollar-plus research and development investment from a regulatory delete order. A company that has not implemented the AI-DSF’s controls before the FTC begins its inquiry has no safe position from which to argue.

Proposed Liability Model: The SSL Safe Harbor

I propose a tiered liability framework organized around the distinction between Structural Defects and Instructional Defects.

Structural Defects originate in the SSL phase itself. They include biases encoded in the learned representations, memorized private data recoverable from the weights, and backdoor triggers installed through corpus poisoning. They are structural because they are encoded in the world model, the foundry’s core output. A fine-tuner inherits that world model. It can adjust behavior at the margins. It cannot rebuild the map. These defects therefore persist through fine-tuning and cannot be remediated by downstream actors without access to and control over the base model. Liability for Structural Defects falls appropriately on the Base Model provider.

Instructional Defects are introduced through fine-tuning, prompt design, or deployment decisions. A model fine-tuned to generate harmful content, deployed in a context for which its representational properties are unsuitable, or prompted in ways that elicit harmful outputs falls into this category. Liability for Instructional Defects falls appropriately on the Fine-Tuner or Deployer.

The Stewardship Defense is the incentive mechanism that makes this framework work. A Base Model provider who has implemented the AI-DSF’s controls during the SSL phase receives a rebuttable presumption of non-negligence as to Structural Defects. The operative obligations are specific. On the Basic Controls side, the provider must maintain documented Data Inventory practices with pre-ingestion licensing review, Continuous Data Vulnerability Management with anomaly detection and Machine Unlearning capabilities, and a Data Incident Response Plan covering poisoning and memorization events. On the Foundational Controls side, it must implement Data Provenance Protections with documented safeguards against poisoning, Data Threat Defenses aligned with the NIST adversarial machine learning taxonomy, and Audit and Control findings reported to senior management. Finally, at the Organizational level, it must implement a formal Data Stewardship Program with board-level oversight, and conduct Fuzzing Tests and Red Team exercises against its pre-training pipeline.

A provider who has satisfied these controls has done what the AI-DSF requires. The presumption of non-negligence appropriately follows. A plaintiff who can demonstrate that the provider knew of a specific risk, that the relevant control was designed to address that risk, and that the provider failed to implement or maintain that control can overcome the presumption. But the burden shifts. And that shift is precisely the incentive the SSL ecosystem currently lacks.
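The allocation rule and the burden shift described above can be written out as a short decision procedure. The sketch below is only a restatement of the proposal in executable form, with no claim that a court would apply it mechanically. The control names are taken from the list above, while the function name, the dictionary keys, and the three rebuttal flags are labels I chose for illustration.

```python
REQUIRED_CONTROLS = frozenset({
    "Data Inventory",
    "Continuous Data Vulnerability Management",
    "Data Incident Response Plan",
    "Data Provenance Protections",
    "Data Threat Defenses",
    "Audit and Control",
    "Data Stewardship Program",
    "Fuzzing Tests and Red Team exercises",
})

# The three showings a plaintiff must make to overcome the presumption.
REBUTTAL_ELEMENTS = (
    "knew_of_risk",
    "control_addresses_risk",
    "control_not_implemented_or_maintained",
)


def allocate_liability(defect_type, documented_controls=(), rebuttal=None):
    """Return who bears liability and whether the presumption of
    non-negligence applies (None where the presumption is not relevant)."""
    if defect_type == "instructional":
        return {"responsible": "Fine-Tuner or Deployer", "presumption": None}
    if defect_type != "structural":
        raise ValueError("defect_type must be 'structural' or 'instructional'")
    presumption = REQUIRED_CONTROLS <= set(documented_controls)
    if presumption and rebuttal is not None:
        # The presumption falls only if all three elements are shown.
        if all(rebuttal.get(key, False) for key in REBUTTAL_ELEMENTS):
            presumption = False
    return {"responsible": "Base Model Provider", "presumption": presumption}


compliant = allocate_liability("structural", REQUIRED_CONTROLS)
rebutted = allocate_liability(
    "structural", REQUIRED_CONTROLS,
    rebuttal={key: True for key in REBUTTAL_ELEMENTS},
)
print(compliant["presumption"], rebutted["presumption"])
```

Writing the rule this way makes one feature of the proposal visible: a provider missing even one documented control never receives the presumption, so the plaintiff has nothing to rebut.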

This structure mirrors the EU AI Act’s conformity assessment mechanism and the NIST AI RMF’s risk tiering, without requiring comprehensive regulatory adoption. Courts can develop the framework through common law without waiting for legislation. The AI-DSF already exists as a documented standard. Its controls are specific and auditable. The doctrinal infrastructure for a negligence per se argument, or at minimum a strong res ipsa inference, is available to courts willing to engage with it.

Closing the Pipeline Gap

Output-only AI regulation will fail. A framework that holds deployers liable for what models say, without addressing what models are, treats the symptom while leaving the pathology unexamined. Every harmful output emerges from a representational substrate that was formed in the SSL phase, before any deployer or fine-tuner made a single decision.

The SSL phase is where the model is made. It is where foundational decisions about data, representation, and structure determine everything that follows. A liability framework that reaches this phase is more complete and more accurate about where the decisions actually occur and who actually makes them.

The global AI oversight conversation has focused on output monitoring, transparency requirements for deployers, and consumer-facing disclosure. These are insufficient. The AI Data Stewardship Framework provides the tools to extend that conversation upstream, to the moment when raw data becomes latent representation and the structural properties of AI systems are cast.