AI, Data Value Augmentation, Latent Data Rights, and Post Hoc Regulation

The power of an AI application correlates with the value of the data to which it is applied: the more capable the application, the more the latent value of that data tends to increase, and with it the probability that stringent laws will emerge and be applied to the application's use. This is an augmentation phenomenon, and the challenges it creates are already apparent in the battle between the ACLU and Clearview AI over the latter's facial recognition capabilities and the ends to which they are put.

Data that is publicly available, cast into the internet ecosystem with little if any careful thought by its owners, suddenly becomes endowed with significantly elevated status and import once sufficiently powerful AI is applied to it. This capability signals, as the Clearview AI case highlights, that we are paving the way toward a legal framework that recognizes latent rights in data, even data freely or carelessly given. Those rights will be subject to post hoc regulation because powerful AI applications, serving as a catalyzing agent, generate value where, absent AI, there would be little or none.

Consumer consent (one of the key issues in the ACLU v. Clearview AI litigation) has accumulated a bad vibe, if you will; it has become something of a joke. Courts and regulators mistakenly accord it outsized importance when evaluating whether a company's actions can be considered "fair" (think of that in the context of an FTC perspective). Consumers, after all, have a poor track record (to say the least) of reading and understanding the terms and conditions that companies present to them.¹ This has led courts and regulators to take a generally paternalistic view, finding that no valid consent was provided and invalidating the use sought by the company that collected the data.

Absent appropriate caution, courts and regulators asked to deal with powerful AI applications' use of publicly available data may be tempted to automatically ascribe a higher value to that data and impose post hoc restrictions. The problem is that meaningful consumer consent is unlikely to get easier to obtain, while AI applications will only grow more capable as time goes by. A default finding that latent data rights exist in data that was essentially dumped carelessly out of the car window is an ambitious and misguided post hoc regulatory approach.


1. There are many reasons why this is so, and I have proposed methods and structures that use AI-powered computational law applications to help empower consumers to make better choices. See the Maximizing Representative Efficacy posts, starting with this one.


September 27, 2023: The number of copyright infringement lawsuits being launched against OpenAI and other large language model (LLM) developers is related to AI's augmentation phenomenon. LLMs have a voracious appetite for data, but for the most part the data that feeds chatbots like ChatGPT (based on the GPT-3 and GPT-4 LLMs) is not the product of a license from the copyright owner. That's a problem. Whether this copying activity constitutes infringement is currently unsettled; if any of the lawsuits make it all the way through the litigation process to a decision, there may yet be some guidance on that point. But set that aside for a moment. The fact that these lawsuits are being filed against LLM developers illustrates the data augmentation phenomenon at play: the value of the data is impacted (i.e., augmented) by the power of the technology applied to it. Data that its owner previously deemed unimportant suddenly becomes important, and worth litigating over, when it is exposed to AI.

May 23, 2022: Deep learning models can lead to unintended consequences. If this behavior cannot be controlled, appropriate remedies need to be in place to mitigate the harm, for example through explainable AI (XAI). A good example of how this comes about appears in a research paper recently published in The Lancet, which reports that AI models are finding undesirable information in x-ray image training data sets, specifically the patient's race. What makes this particularly perplexing is that it is unclear what information in the image the AI is honing in on to arrive at this conclusion, in some cases with 90% accuracy. This ties in with the latent data value discussed above, where AI generates information from previously unidentified sources.