The Foundry Problem: World Models and the Missing Liability Framework for Self-Supervised Learning
Abstract
AI liability doctrine has converged on two phases of the machine learning pipeline: training data and model output. The phase between them, self-supervised learning (SSL), has received no sustained legal attention. This is where foundation models are made. It is where bias sediments into representational geometry, where private data is compressed into recoverable form, and where structural defects are cast long before any fine-tuning or deployment decision can correct them. This post argues that SSL creates a distinct category of risk, Representational Risk, that cannot be remediated by downstream actors and therefore requires a liability framework of its own. The normative foundation for that framework exists in the AI Data Stewardship Framework (AI-DSF), whose controls map directly onto the SSL risk landscape across three domains: negligent entrustment of unlabeled data, statutory privacy violations arising from memorization, and strict products liability for algorithmic poisoning. The post extends these risks to causal world models, where SSL errors in physical representation become design defects with potential for physical injury. The post proposes a tiered SSL Safe Harbor grounded in the AI-DSF’s control structure. Base Model Providers who satisfy documented stewardship obligations receive a rebuttable presumption of non-negligence as to Structural Defects. Those who do not have no defensible position against FTC algorithmic disgorgement or common law negligence claims. Output-only regulation cannot reach these harms. Upstream liability can.
The Causal Middle
AI litigation has converged on two targets. Plaintiffs challenge inputs. Training data scraped without authorization, as in the NYT v. OpenAI litigation. Or they challenge outputs. Hallucinated facts, defamatory text, infringing images generated by deployed systems. These are not wrong targets. But they miss the most consequential phase of the AI lifecycle.
Between raw data and model output lies a process most legal scholars have not examined. Self-supervised learning (SSL) is the computational mechanism by which modern foundation models transform internet-scale text and images into mathematical weights. It is where bias sediments, where private data is compressed into recoverable form, and where structural defects are cast into a model long before any fine-tuning or deployment decision can correct them. It is the foundry. What comes out of it is determined by what happens inside it. And right now, no liability framework reaches inside.
I argue that SSL creates a category of risk I call Representational Risk. These are defects that cannot be remediated by downstream actors because they are encoded at the representational level, in the learned geometry of the model itself. A distinct liability framework, targeting what I call the Base Model Provider, is required. The normative foundation for that framework already exists. The AI Data Stewardship principles developed at CodeX provide the appropriate baseline.
Why SSL Changes the Legal Calculus
Legal scholarship has begun to examine the latent space as a site of doctrinal interest. BJ Ard’s Copyright’s Latent Space: Generative AI and the Limits of Fair Use (110 Cornell L. Rev. 2025) argues that fair use doctrine should account for how generative AI models extract what Ard calls “non-authorial value,” the facts, tropes, and structural patterns that exist independently of any artist’s creative choices. The analysis is grounded in intellectual property, namely who owns what the latent space contains, and whether training on it constitutes infringement. This post occupies a different position. Where Ard examines ownership of value encoded in the latent space, I examine responsibility for harms cast there. The latent space, on this account, is not primarily a repository of extractable information. It is a foundry. The question is not who owns what it holds. It is who bears liability for the defects it produces.
Traditional supervised machine learning requires labeled data. A human annotator marks images as “cat” or “not cat,” and the model learns to replicate that judgment. The legal implications are relatively tractable. Annotation decisions are human choices that can be audited, and model behavior is constrained by the label set.
SSL discards labels. It trains models on self-generated prediction tasks. Masked language modeling teaches a model to predict a missing word from surrounding context. Contrastive learning teaches it to recognize that two augmented views of the same image are more similar than two random images. These objectives require no human curation of meaning. They require only data, enormous quantities of it, scraped from the open web.
This shift has two legal consequences that supervised learning does not generate.
First, SSL learns implicit structure from the training corpus. The model does not learn what humans have decided to call things. It learns the statistical relationships embedded in how humans actually write, what images they produce alongside what text, what appears near what. The resulting representations encode cultural assumptions, demographic patterns, and factual associations that no annotator ever reviewed or approved.
Second, SSL models memorize. Research by Carlini and colleagues has demonstrated that large language models trained with SSL will, under appropriate prompting, reproduce verbatim text from their training data, including private phone numbers, email addresses, and personal health information. The memorization is not incidental to an otherwise clean process. It is a feature of how SSL achieves generalization. The model must retain sufficient specificity about training examples to successfully predict their masked elements.
Third, and most consequentially for liability theory, SSL produces what ML researchers call a world model. A supervised model learns to replicate human judgments within a defined label set. An SSL model learns a functional representation of how the world works. It absorbs semantic relationships, causal associations, factual co-occurrences, and cultural patterns, all derived from data without any human having approved the resulting structure. The world model is not a lookup table. It is an internal map of reality, built from whatever the training corpus contained, that the model uses to reason across novel situations it has never encountered.
This distinction matters legally because it determines what fine-tuning can and cannot fix. A fine-tuner who adjusts a corrupted world model is not correcting the map. It is changing where the navigation starts. The underlying representation of the territory remains wrong. Biases, factual errors, and poisoned associations encoded at the world model level persist through fine-tuning in ways that annotation-level errors in supervised systems do not.
The legal system has no doctrine that maps cleanly onto any of these three consequences. That gap is the problem this post addresses.
The SSL Risk Taxonomy
A. The Stewardship of Unlabeled Data
The typical SSL pipeline begins with Common Crawl, a freely available scrape of approximately four billion web pages, collected without quality or content filtering beyond technical deduplication. GPT-3 was trained substantially on Common Crawl. So were BERT and most of their successors.
Common Crawl contains everything the web contains. Medical misinformation, demographic stereotypes, extremist content, and factual errors accumulated over years of archiving are all present. When an SSL model trains on this corpus, these patterns are not incidentally absorbed. They are structurally encoded into the model’s representation of language. A base model trained on unvetted Common Crawl data does not merely reflect the web’s biases. It builds a world model from them. The spatial relationships that govern what the model treats as similar, relevant, or probable are derived directly from the statistical structure of whatever the corpus contained. A world model built from Common Crawl is a map of reality as the unfiltered internet represents it. That is not a starting point a fine-tuner can correct by adjusting a few layers of weights.
The applicable legal theory is negligent entrustment. A developer who uses an unvetted, unfiltered corpus for SSL training has entrusted a computational process with data that a reasonable actor would recognize as generating predictable harm. Establishing a duty of care requires foreseeability of downstream use. A developer training a base model on Common Crawl in 2024 knows the model will be used in medical, legal, and financial contexts. The harm from biased representations in those contexts is not speculative.
A skeptic will argue that Common Crawl is the only corpus large enough to achieve state-of-the-art SSL performance, making its use an industry standard rather than a negligent choice. The argument has surface appeal. But it misidentifies where the negligence lies. The negligence is not in using Common Crawl. It is in ingesting it without applying the AI-DSF’s Foundational Controls. Data Provenance Protections, Data Threat Defenses, and Continuous Data Vulnerability Management exist precisely because large unfiltered corpora are the operational reality of SSL development. A developer who uses Common Crawl with documented provenance controls, anomaly detection, and pre-ingestion quality review is not negligent. A developer who uses it without those controls, knowing what the corpus contains, is. The distinction is between the data source and the discipline applied to it.
B. Memorization and the Right to Be Forgotten
The memorization problem sits at the intersection of SSL mechanics and privacy law.
GDPR Article 17 grants data subjects the right to erasure. CCPA provides a parallel right for California residents. Neither statute was drafted with neural network weights in mind. Both were drafted with databases in mind, records that can be located, identified, and deleted.
Whether a latent representation of personal data constitutes “personal data” within the meaning of these statutes is unresolved. The Article 29 Working Party’s guidance suggests that data is “personal” if an individual can be identified from it, directly or indirectly. If Carlini-style extraction attacks can recover verbatim PII from SSL model weights, the argument that the weights contain personal data in the statutory sense is serious. The weights are not merely derived information in the way that an anonymized aggregate is derived information. They are, under specific conditions, a reproducible copy.
Regulators should treat recoverable memorization as per se statutory retention. If PII survives in extractable form within model weights, the right to erasure applies. The base model provider has an obligation either to demonstrate that memorized content is not extractable or to retrain without the relevant data. Neither obligation is cost-free. That is the point.
C. Algorithmic Poisoning and Product Liability
Backdoor attacks on SSL training sets are documented and reproducible. An adversary who can inject a small number of poisoned examples into a training corpus, achievable through contributions to Common Crawl or to widely-used open-source datasets, can install hidden triggers in the resulting model’s latent space. When the trigger pattern appears at inference time, the model behaves in a manner the deployer did not intend and cannot readily detect.
The product liability framing is relatively straightforward. Under the Restatement (Third) of Torts, a product contains a manufacturing defect when it deviates from its intended design in a way that renders it unreasonably dangerous. A backdoored SSL model deviates from its intended design. The defect is latent. It is not detectable through standard evaluation. And it can produce serious harm in deployed systems.
The harder question is who bears liability when the poisoning occurs at the data level, before the developer has assumed possession of the affected training examples. Supply chain product liability provides precedent. A manufacturer who incorporates a defective component bears liability even if the defect originated upstream. The SSL base model provider, having chosen to train on an unvetted corpus without adversarial robustness evaluation, has made a decision that determines whether the defect reaches the downstream product.
D. Causal Hallucinations and the World Model as Design Defect
The three risks above involve language models. The world model problem extends further, and its physical consequences are more severe.
SSL is no longer limited to text. Systems such as Sora and JEPA learn world models from video. They infer not just semantic relationships but physical ones: how objects move, how forces propagate, how materials deform under stress. These are causal world models. They represent, in latent space, the developer’s implicit claim about how physical reality behaves.
When a causal world model is used to train a robotic agent or an autonomous system, the legal stakes shift from bias and privacy to physical injury. A robot trained on an SSL world model that misrepresents the brittleness of glass, the stopping distance of a vehicle, or the load tolerance of a structural component is not operating on a statistical error. It is operating on a false physics. That is a design defect under the Restatement (Third) of Torts, section 2(b), which holds a product defective in design when the foreseeable risks of harm could have been reduced by a reasonable alternative design.
The reasonable alternative is a verified world model. A developer who conducts Latent Space Audits on physical representations before licensing a world model for use in robotic or autonomous systems can test whether the model’s causal structure deviates from reality in ways that produce predictable harm. A developer who does not conduct those audits and licenses the model anyway has made a design choice. Product liability reaches that choice.
This extends Representational Risk beyond the informational domain. A world model is not merely a representation of language. It is a representation of causality. When causality is wrong and the error is actionable, the foundry metaphor takes on a different weight. What is cast in the foundry is not just a biased language map. It is, in some systems, a defective model of physical reality that will govern how machines act in the world.
Leveraging the AI Data Stewardship Framework
The AI Data Stewardship Framework (AI-DSF) provides the normative architecture for translating these liability theories into actionable obligations. The AI-DSF organizes its controls into three tiers. Basic Controls represent the baseline every organization should have. Foundational Controls apply to organizations with higher risk profiles. Organizational Controls focus on people and processes. Several of these controls map directly onto the SSL risk landscape.
Continuous Data Vulnerability Management is a Basic Control requiring that pre-training and post-training procedures be “executed, documented, and measured” and that “proven anomaly detection tools are continuously used.” Applied to SSL, this control operationalizes the Latent Space Audit. Probing classifiers can test whether a model’s representations encode demographic associations that no annotator reviewed or approved. Extraction-based evaluation can estimate memorization risk before a checkpoint is released. These are not speculative obligations. They describe procedures that the AI-DSF already requires and that SSL developers can implement today.
To make the Latent Space Audit legally operational, I propose that the AI-DSF require what I call a Representation-to-Risk certification: a documented record, produced before any downstream license is granted, that specifies which categories of representational bias were probed, which PII memorization tests were conducted, and what remediation was applied to identified risks. The certification functions as a Safety Data Sheet for the latent space. It gives downstream licensees a verified record of what they are inheriting, and it gives regulators and courts a documented standard against which to measure whether the Base Model Provider exercised reasonable care. A provider who cannot produce one has not satisfied the Continuous Data Vulnerability Management obligation.
The AI-DSF further requires that “data deletion and data unlearning methodologies are readily available and implementable.” This is a direct reference to Machine Unlearning, a technical approach to removing specific training examples from a model’s learned behavior without full retraining. For the GDPR and CCPA memorization problem, Machine Unlearning is the AI-DSF’s prescribed remediation mechanism. A Base Model Provider that has not implemented Machine Unlearning capabilities before encountering a right-to-erasure request has failed a Basic Control obligation.
Data Provenance Protections is a Foundational Control that “implements safeguards against data poisoning.” This is the control that maps directly onto the backdoor attack risk. The AI-DSF requires that developers implement guardrails against “unlicensed, unverified, and unintended data sets” and maintain documented data provenance throughout the supply chain. A developer who trains on an unvetted Common Crawl corpus without provenance controls has not merely made a poor engineering choice. It has failed a Foundational Control that the AI-DSF identifies as necessary for organizations with elevated risk profiles. SSL developers, who are the Base Model Providers for the entire industry, are precisely that.
Data Threat Defenses, also a Foundational Control, explicitly references NIST AI 100-2e2025, the Adversarial Machine Learning taxonomy. This control requires that developers identify and mitigate threats from internal and external sources and maintain alignment with adversarial robustness standards. The SSL poisoning scenario is not an edge case the AI-DSF failed to anticipate. It is a named threat category within the framework’s own standard reference.
Data Inventory, a Basic Control, governs how dataset diversity and sufficiency are established and monitored, and requires that data licensing requirements are reviewed and complied with prior to dataset ingestion. This is the control that addresses the Common Crawl problem at its source. A developer who ingests Common Crawl without reviewing its licensing status and content quality against a documented standard has bypassed a Basic Control before the SSL process has even begun.
The downstream relationship is addressed through the AI-DSF’s supply chain standard. All supply chain members are subject to a meet-or-exceed requirement that corresponds to the organization’s own policies. A Base Model Provider that documents its AI-DSF compliance and makes that documentation available to fine-tuners and deployers enables those downstream actors to verify what they are inheriting. Without that documentation, fine-tuners and deployers are operating blind. The AI-DSF treats this information asymmetry as a control failure, not merely a commercial inconvenience.
Finally, the AI-DSF explicitly identifies the FTC’s power of algorithmic disgorgement as a consequence of stewardship failure. Algorithmic disgorgement means the destruction of the model. Not a fine. Not an injunction against future conduct. The deletion of the asset itself, along with every downstream product built on it. A developer who has trained an SSL base model on unlicensed or tainted data and has not followed the AI-DSF’s controls has built its entire model investment on a foundation the FTC can legally dissolve. The SSL Safe Harbor proposed below is not merely a litigation defense. It is the only mechanism available to a Base Model Provider for protecting a billion-dollar research and development investment from a regulatory delete order. A company that has not implemented the AI-DSF’s controls before the FTC begins its inquiry has no safe position from which to argue.
Proposed Liability Model: The SSL Safe Harbor
I propose a tiered liability framework organized around the distinction between Structural Defects and Instructional Defects.
Structural Defects originate in the SSL phase itself. They include biases encoded in the learned representations, memorized private data recoverable from the weights, and backdoor triggers installed through corpus poisoning. They are structural because they are encoded in the world model, the foundry’s core output. A fine-tuner inherits that world model. It can adjust behavior at the margins. It cannot rebuild the map. These defects therefore persist through fine-tuning and cannot be remediated by downstream actors without access to and control over the base model. Liability for Structural Defects falls appropriately on the Base Model Provider.
Instructional Defects are introduced through fine-tuning, prompt design, or deployment decisions. A model fine-tuned to generate harmful content, deployed in a context for which its representational properties are unsuitable, or prompted in ways that elicit harmful outputs falls into this category. Liability for Instructional Defects falls appropriately on the Fine-Tuner or Deployer.
The Stewardship Defense is the incentive mechanism that makes this framework function. A Base Model Provider who has implemented the AI-DSF’s controls during the SSL phase receives a rebuttable presumption of non-negligence as to Structural Defects. The operative obligations are specific. On the Basic Controls side, the provider must maintain documented Data Inventory practices with pre-ingestion licensing review, Continuous Data Vulnerability Management with anomaly detection and Machine Unlearning capabilities, and a Data Incident Response Plan covering poisoning and memorization events. On the Foundational Controls side, it must implement Data Provenance Protections with documented safeguards against poisoning, Data Threat Defenses aligned with the NIST adversarial machine learning taxonomy, and Audit and Control findings reported to senior management. Finally, at the Organizational level, it must implement a formal Data Stewardship Program with board-level oversight, and conduct Fuzzing Tests and Red Team exercises against its pre-training pipeline.
A provider who has satisfied these controls has done what the AI-DSF requires. The presumption of non-negligence follows. A plaintiff who can demonstrate that the provider knew of a specific risk, that the relevant control was designed to address that risk, and that the provider failed to implement or maintain that control can overcome the presumption. But the burden shifts. And that shift is precisely the incentive the SSL ecosystem currently lacks.
This structure mirrors the EU AI Act’s conformity assessment mechanism and the NIST AI RMF’s risk tiering, without requiring comprehensive regulatory adoption. Courts can develop the framework through common law without waiting for legislation. The AI-DSF already exists as a documented standard. Its controls are specific and auditable. The doctrinal infrastructure for a negligence per se argument, or at minimum a strong res ipsa inference, is available to courts willing to engage with it.
Closing the Pipeline Gap
Output-only AI regulation will fail, and it will fail predictably. A framework that holds deployers liable for what models say, without addressing what models are, treats the symptom while leaving the pathology unexamined. Every harmful output emerges from a representational substrate that was formed in the SSL phase, before any deployer or fine-tuner made a single decision. Holding only the deployer liable is like holding the driver of a car with defective brakes liable while exempting the manufacturer who built the braking system.
The SSL phase is where the model is made. It is where foundational decisions about data, representation, and structure determine everything that follows. A liability framework that reaches this phase is not merely more complete. It is more accurate about where the decisions actually occur and who actually makes them.
The global AI oversight conversation has focused on output monitoring, transparency requirements for deployers, and consumer-facing disclosure. These are not wrong. They are insufficient. The AI Data Stewardship Framework provides the tools to extend that conversation upstream, to the moment when raw data becomes latent representation and the structural properties of AI systems are cast.
The foundry cannot be exempt from inspection simply because the casting happens before anyone is watching.