Exploring AI in Healthcare: Legal, Regulatory, and Safety Challenges

Artificial intelligence holds the potential to transform much of our lives, and healthcare professions are embracing it for everything from cost savings to diagnostics. But who is to blame when AI-assisted healthcare goes wrong? How is the law developing to balance the benefits and risks? In this episode, Pam and Rich are joined by health policy expert Michelle Mello and Neel Guha, a Stanford JD/PhD candidate in computer science, for a discussion on the transformative role of AI in healthcare. They examine AI’s potential to enhance diagnostics and streamline workflows while addressing the ethical, legal, and safety challenges this new technology can bring. The conversation highlights the urgency of adapting regulatory frameworks, the complexities of liability among hospitals, developers, and practitioners, and the need for rigorous testing to ensure patient safety as AI integration in healthcare advances.

This episode originally aired on November 21, 2024.

Transcript

Michelle Mello: All of us have probably experienced the maddening process of trying to get prior authorization for some medical service that your doctor wants to give you, but your insurance company has to preapprove. Physicians HATE this process. It takes a huge amount of time. There’s a very high rejection rate, but there’s like a 90 percent overturn rate on the rejection, so it is a huge inefficiency in the healthcare system.

And to make a long story short, where we’ve gotten to now is there is a “battle of the bots,” where hospitals now are using AI to generate and submit those requests. The insurance company has its own algorithm to review—and deny— most of those requests, and then the humans get involved and duke it out over the ones where they disagree.

Pam Karlan: This is Stanford Legal, where we look at the cases, questions, conflicts, and legal stories that affect us all every day. I’m Pam Karlan along with Rich Ford. Please subscribe or follow this feed on your favorite podcast app. That way you’ll have access to all our new episodes as soon as they’re available. Today, we are lucky to be joined by two of our colleagues here at Stanford: Michelle Mello, who is a professor both here in the law school and a professor of health policy over at the medical school, and Neel Guha, who’s doing both a JD degree and a PhD in computer science.

For those of you who weren’t at Stanford in recent years, one of the things the law school did was switch its calendar, so we’re now on the same calendar as the rest of the university. And part of the reason we did that was to enable students to study both at the law school and in other departments more seamlessly, and in no area of law, I think, is that more important today than in the relationship between the law school and the computer science department.

We’re going to be talking today about AI and medicine, and of course, as soon as you have AI and medicine, you’re going to have law as well. So thanks so much both to Michelle and to Neel for joining us.

Rich Ford: So when people hear AI, they think of things like The Matrix or, some … but maybe you could tell us a little bit more about exactly the kind of area we’re talking about, large language models, what they do and how they work.

Neel Guha: Yeah, sure. Today, a lot of the time when we talk about AI colloquially, we’re talking about a class of systems called generative AI systems. And these are systems which produce content. That content could take many forms. It could be audio content like the sound of someone’s voice. It could be text like you might expect a chatbot to produce. It could even be images or video or complex chemical structures, as AI might be used in various biological design applications. These sorts of generative AI systems are trained on massive amounts of data. And that’s how they learn the statistical patterns necessary to produce content.

We’ve seen an explosion in these systems, both in terms of practical applications and in terms of their development and the amount of investment they’re drawing, really over the last three to four years. Many of you probably know these systems by name; ChatGPT is one example. There are others.

And so while this class of systems is, I think, really exciting and really powerful (we’ve observed their capacity to do what we might describe as complex, sophisticated reasoning in a variety of applications), they’re also built on a similar set of underlying technical principles, or technical components, as the more classical systems which you might think of as doing prediction tasks: classifying whether a patient is healthy or sick, classifying the sentiment of a review, the types of AI systems that might control self-driving cars, and others.

Rich Ford: And so how is AI being used in medicine? I read in one of your papers that AI was embedded in medical devices, which made me feel a little nervous. But tell me why I shouldn’t be nervous, but also in general, how is AI used in medicine?

Michelle Mello: As in many other industries, it has become pervasive. I’ll talk about just a few types of uses, but there are many others. So, in devices, the most common use is to embed them in radiological imaging equipment. If you get a mammogram in the United States today, there is about a 90 percent chance that mammogram read will be AI-assisted, and that is a good thing. AI plus radiologists outperforms radiologists alone in every study.

And so, it’s embedded in the device. There are surgical devices as well that help surgeons visualize anatomy, for example; they can predict parts of the anatomy that can’t be visualized with the eye to help guide surgeons. And there are even AI-assisted surgical robots that are starting to come into use with human supervision.

I would say more common uses include the kind of generative AI that Neel was talking about to try to reduce the drudgery of being a physician. Physicians in the United States spend about two hours of what we call “pajama time” for every one hour of patient care. Pajama time is the time at the end of the day when you are writing up notes for patient visits, you’re responding to patient emails, you’re entering your billing codes in the medical record, and it is contributing to a burnout rate in the medical profession that now tops 40%.

So there’s huge potential for AI to shortcut those tasks by generating drafts that the physician can review and then accept. Another form of AI in use in medicine is not generative AI; it’s predictive AI. It’s leveling up the predictors that physicians have long used to classify patients, whether it’s patients who are likely to do well after surgery versus not likely to do well, or patients who probably have a disease versus probably don’t. They’ve used simpler algorithms for a long time that are based on rules and a limited number of variables. But now they’re using massive data sets of millions of data points to divine associations between things that were never in the predictors before and the outcomes of interest. And then finally, an interesting thing that I think most people haven’t heard about is the use of AI in health insurance.

I think all of us have probably experienced the maddening process of trying to get prior authorization for some medical service that your doctor wants to give you, but your insurance company has to preapprove. Physicians HATE this process. It takes a huge amount of time. There’s a very high rejection rate, but there’s like a 90 percent overturn rate on the rejection, so it is a huge inefficiency in the healthcare system. And to make a long story short, where we’ve gotten to now is there is a “battle of the bots” where hospitals now are using AI to generate and submit those requests. The insurance company has its own algorithm to review—and deny—most of those requests, and then the humans get involved and duke it out over the ones where they disagree.

So, these are just a few of the types of uses we’re seeing.

Rich Ford: So it’s being used …all that sounds pretty good. It’s going to save physicians time doing boring paperwork. It’s going to make it easier for them to see things that they couldn’t see without AI. It’s going to help them predict patient outcomes. Is there a downside? And if so, what is it?

Michelle Mello: The downside is that the reason we’re doing a lot of this in medicine is that physicians and nurses are overworked and too busy. Yet the whole schema of these models relies on the assumption that humans are going to remain in the loop, so to speak, in a robust way, meaning that when the AI drafts an email to you, the patient, in response to your medical question, the physician’s going to review it carefully and edit it before sending it out. But what we are finding empirically is that you cannot have both. You can’t both have a time-saving piece of generative AI and a piece of AI where the human actually spends time reviewing it. So the danger is what we call automation bias: that over time, people come either to trust the thing, or just to not care enough about whether it’s right, because it’s right most of the time. And without that human vigilance, or another system in place to catch the errors of the first system, which I’m sure Neel can talk about—that’s a new kind of area of AI development—without that, the fear is that there are going to be errors that cause harm.

Rich Ford: If something goes wrong … so, you mentioned errors, and we like to think that computers are infallible, but in fact, you’re suggesting that they sometimes make mistakes. Then there’s the potential for legal liability. Maybe you could tell us a little bit about the areas in which we’ve already seen this as a problem and what that means for the future of AI and all the good things that you told us about.

Neel Guha: Yeah, so one of the projects we did was trying to explore and understand, in the deployments of AI we’ve seen already where AI contributes to patient harm and results in some sort of litigation, whether we can get a better understanding of how courts are thinking about the liability questions that emerge. Who is liable, and on what basis or standards are they finding liability? And so we looked through a bunch of cases to try and find instances where there was some sort of AI system or complex software system at play that produced harm in either a healthcare or non-healthcare setting.

And we noticed a few trends in terms of what the patterns of plaintiffs and defendants are and what types of claims are emerging. The first is that we often see legal claims when hospitals are using AI systems to manage patient care or manage operations, and there are mistakes or errors that result in harm to patients. And in these cases, patients typically bring negligence claims against the hospital for these harms. The second setting is when physicians are relying on erroneous AI predictions, maybe when AI is being used to recommend a particular treatment strategy or provide a diagnosis. And in these cases, plaintiffs will bring medical malpractice claims against the physician, alleging that they should have caught the error in the AI system or independently reached the correct conclusion. They’ll also bring product liability claims against the developer of this AI software system. And the final category of claims is brought when AI is embedded in some sort of device that’s either implanted in a patient or used in the process of a surgery or some other operation, and this produces an error.

And in this case, plaintiffs will bring negligence claims or medical malpractice claims against physicians and hospitals, alleging they were negligent in maintaining the device, installing the device, or neglecting to update the device. And they’ll also bring claims against the developers of these devices themselves, alleging they should have designed it better or should have been clearer about the potential risks.

Pam Karlan: So this may be too much of a … “I teach civil procedure, but I don’t understand medicine” question, but is this what happens in some of these cases? The patient sues the hospital, or the doctor impleads the software people to say, it’s not my fault, it’s their fault, and then the patient amends the complaint to add them in. How do the patients know that what happened to them was the product not just of their doctor allegedly screwing up? How does the lawyer figure this out? Is it in the discovery process that you find that out? Just how does this work?

Michelle Mello: The standard problem in medical malpractice is the patient doesn’t know. It’s unusual for a hospital to be candid about why something bad happened. And so … we’re working on that. That’s an area where I’ve spent a lot of time trying to exhort them to change their ways, but most of the time you have to file a lawsuit to initiate discovery, and that’s when things become clearer. Until that time, you throw in everybody who might possibly have been implicated, and then you can amend your complaint later. When you learn that AI was used in your care and you didn’t know about it, maybe you’d throw in an informed consent claim for good measure, since they didn’t tell you about that.

Rich Ford: So, it raises a lot of fascinating questions. One, we have multiple potential defendants. Another question is, so we mentioned that AI produces errors, but it also produces all of these good outcomes and additional information. And I’m just wondering — when the plaintiff claims that the hospital was negligent — whether the negligence standard is measured against what it would have been like in the absence of AI, or whether it’s measured against something else? I’m thinking about … I live in San Francisco and there are a lot of self-driving cars, and the self-driving cars used to be really slow and now they drive like everybody else. And they’re probably still better than the average human motorist. So, if you’re better than the human would have been, is … are you off the hook?

Michelle Mello: You would think that would be the standard, but I don’t think we know. First of all, it’s hard to get that level of visibility into what’s going on in these cases. But that’s not generally how product liability works. If you use a product and it causes harm, you can’t argue, “it was better that I cut it with this saw than if I tried to gnaw it with my teeth,” right? That’s not the way it works. They’re looking for whether you can point to something about the product that could have been better, another design that would have been better. And Neel can talk about how hard that is to do. But that’s what it is.

And then when it’s an individual who’s being sued, like the physician, the question courts are generally focused on is, should the physician have known better than to rely on this erroneous output? And so again, the counterfactual is not what would it look like if the physician had been acting alone, but conditional on the interaction with the machine, did they behave reasonably?

Neel Guha: I think an issue we have, zooming out of healthcare and considering other deployments of AI, is that we so often talk about the setting where we have a human babysitting an AI system, and the hope is that the human is able to provide some sort of cover for the AI system by catching its errors. And a challenge is that the ways in which humans make mistakes and the ways in which AI systems make mistakes are radically different. Humans can get tired, they can get distracted, they can eat a bad breakfast one morning. None of those are issues with AI systems. But when AI systems make mistakes, they tend to make mistakes in a very systematic fashion.

So, in the best case, a human and an AI system are able to counteract each other’s errors. But in the worst case, they might actually feed off each other’s worst impulses. A human who’s quite good at catching AI mistakes, combined with an AI system, might perform substantially better than a human who is actually correcting the AI system when it’s correct and deferring to the AI system when it’s wrong.

Pam Karlan: And … one of the things, I don’t know if there’s a version of this that’s true in medicine, but I know there’s a version of this that’s true in antidiscrimination law: because of the way machine learning works, at some point you lose the ability to get behind it and figure out what was going on. And so, for example, in cases where you have to show that there was a discriminatory purpose, at some point in the repeated iterations that go into the machine making its decision about who to target with these ads, or what to use as the measure of whether this person is likely to commit a crime if they’re let out on bail, you can no longer figure out what the machine was metaphorically looking at. Is there a problem that’s similar to that in the medical field?

Neel Guha: Yeah, absolutely. I think one of the interesting things is that if you look at the AI systems from 30 to 40 years ago, they look a lot more like what we might typically think of as a programmer writing code: a set of logical steps of, if you see this, then make this decision, and if you see that, then make that decision. And modern AI systems are based off of machine learning, which is built in a radically different way. Rather than telling programs how to do a task, we just give them a bunch of examples of “here is the input and here is the output,” and we let them figure it out through a complex process of trial and error that is so incredibly mathematically sophisticated and complex that we really struggle to explain it. And so the best we can do is think about it in terms of inputs and outputs. If I give these types of inputs, what types of outputs do I observe, and am I happy or unhappy with that?
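To make the contrast Neel draws concrete, here is a minimal, hypothetical sketch in Python. The first function is the older style, where a programmer hand-writes the decision steps; the second part stands in for machine learning, where only labeled input/output examples are supplied and a generic trial-and-error update finds the decision rule. All feature names, thresholds, and data here are invented for illustration.

    # Style 1: classical rule-based logic -- a programmer writes the decision steps.
    def rule_based_risk(age: float, blood_pressure: float) -> str:
        if age > 65 and blood_pressure > 140:
            return "high risk"
        return "low risk"

    # Style 2: machine learning -- we only supply input/output examples and let a
    # generic training procedure find the decision boundary by trial and error.
    # Here a tiny perceptron-style update stands in for that process.
    examples = [  # (age, blood_pressure) -> 1 = high risk, 0 = low risk (toy labels)
        ((70, 150), 1), ((72, 160), 1), ((40, 120), 0), ((35, 110), 0),
    ]

    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(100):                      # repeated trial and error over the examples
        for (age, bp), label in examples:
            x = (age / 100.0, bp / 200.0)     # crude feature scaling
            score = weights[0] * x[0] + weights[1] * x[1] + bias
            pred = 1 if score > 0 else 0
            error = label - pred              # adjust weights only when we got it wrong
            weights[0] += 0.1 * error * x[0]
            weights[1] += 0.1 * error * x[1]
            bias += 0.1 * error

    def learned_risk(age: float, blood_pressure: float) -> str:
        x = (age / 100.0, blood_pressure / 200.0)
        score = weights[0] * x[0] + weights[1] * x[1] + bias
        return "high risk" if score > 0 else "low risk"

    print(rule_based_risk(70, 150), learned_risk(70, 150))

The point of the sketch is the one Neel makes: in the second style, no one wrote the rule, so the only way to characterize the system is by probing inputs and observing outputs.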

Rich Ford: So, in your study, what’s the … can you give us a sense of the distribution of types of claims that are being raised? Some might involve the physician not checking the AI, and some might involve defects in the AI itself. What do these cases look like?

Neel Guha: So it’s … I think it’s a couple of different cases, and it blends some of the ideas, in that there are so many different points of failure that you can always bring claims against this entire constellation of actors. Hospitals should have been better about selecting the AI system and keeping it up to date; we found cases where hospitals were on the hook for neglecting to hit the software update button. Physicians can be on the hook or liable because they don’t recognize that maybe a patient isn’t the right fit for a particular software system, or doesn’t present the characteristics we think would make the software system good for them. They may also neglect to realize that the AI system is recommending something that contravenes basic medical knowledge. And then the developers themselves might be on the hook for failing to design the system in a way that ensures patient safety or for having some critical defect.

Pam Karlan: Are the systems medical devices? Is there any regulation of the AI systems themselves as medical devices, or is it just the device that the system is embedded in?

Michelle Mello: Yeah, there is. So there is software that is in a medical device that’s regulated by the FDA as a medical device. And then there is “software as a medical device”: there’s a category of clinical decision support tools where the software itself has been deemed a regulable medical device. That is just the very tippy top of the iceberg of AI tools that are in use in health care today, and the FDA only reviews it in a way that is not going to catch all of the problems with it. So, there is a profound need to enable this agency or other agencies to meet the challenges of this technology.

Our regulatory scheme for medical devices dates to 1976, okay? In 1976, the bestselling car was the Chevy Impala. So, you’ve got a regulatory agency that’s out there in a Chevy Impala trying to compete with the Waymo cars and keep up. And it just can’t. There’s not the statutory authority to allow it to regulate more robustly than it is, not to mention that now it’s under threat from the decisions last term undercutting agencies’ discretion to be creative in their interpretation of statutory authority. So, there’s a real need for additional intervention here.

Rich Ford: And so there’s a need for an additional intervention on the regulatory side. Is there … and I’d like to talk more about what you proposed there, but I’m also curious as to whether there’s a need for change on the tort liability side. Every time a new technology comes out, there are some people who claim that we need to completely overturn the legal system. There was the law of cyberspace back in the 2000s, and then sometimes it seems like the existing legal regime will work perfectly well when you’re just applying it to a new technology. Is this one of those cases, or do we really need a law of AI?

Michelle Mello: Yeah, I think there are really only a couple of changes that I would feel comfortable seeing happen now. One is that we have seen, when we look around, increasing use by savvy AI developers of disclaimers and limitations of liability in their licensing agreements with hospitals. Hospitals do not seem to have woken up to this, and do not seem to have been exercising what is frankly fairly substantial bargaining power right now. Every AI developer in the country would like to have a Stanford or a Mayo Clinic take up their tool. But we don’t seem to be pushing for favorable terms in these contracts. So essentially, in the absence of a robust regulatory scheme or lots of tort litigation, the developers are kind of reshaping the liability landscape just through contract. And I don’t love that. We don’t allow other kinds of product makers to disclaim warranties, so why do we allow it here?

And the other bit is how we allocate liability between hospitals and physicians, because I think it’s our strong sense that physicians are going to be at the pointy end of the stick here, and yet many, many hospitals do not have processes in place to vet these tools and monitor them, such that it’s reasonable to expect the physician to absorb whatever residual prospect of screw-up remains after that vetting. So some kind of shift in enterprise liability that would allow hospitals to absorb more legal responsibility, I think, might create the right incentives for them to be doing what they should be doing, which is a lot of private governance.

Rich Ford: Maybe you could tell us a little more about the regulatory side and what you think appropriate changes to the regulatory regime might look like.

Neel Guha: Yeah. I think one of the … challenges with regulating AI, and it touches back to your previous question of what courts should be changing doctrinally now, is: what are the technical components of AI that matter for regulators? In other words, if we’re going to use an abstraction of this technology, at what point do differences in AI systems, or differences between AI systems and existing software systems, which are ubiquitous, really dictate what’s novel, how regulation should be set up, how we should write legal rules?

I think one interesting way of approaching this, which we’ve been thinking about a lot, is that for most complex technologies, be it cars or pharmaceuticals, we have a very sophisticated body of science dedicated to doing quality assurance. There is a science to crash tests. There’s a science to clinical trials. And in the same way, there is a science to AI quality assurance. AI researchers refer to this as “the process of evaluation,” and it focuses on how you measure and how you monitor these AI systems. And I think one of the interesting questions over the next five to 10 years is how we align the regulatory approach for AI with the approach that computer scientists currently use to perform evaluation. What sorts of information do we produce from evaluation that’s useful for governance? What types of governance does this enable? For instance, are you determining whether to let an AI system on the market? Are you determining whether to pull it out of circulation, or are you producing information that can guide litigants who are bringing claims against AI developers?

And I think it’s also important to be aware that evaluation, much like crash tests or pharmaceutical trials, is really expensive. It’s really specialized and it’s subject to all sorts of constraints. There are certain types of AI systems which are really difficult to evaluate unless you put them out in the real world and you observe them interacting with human beings on the streets of San Francisco. And so that necessitates a slightly different regulatory approach where you’re balancing the value of information against the potential harm you’re causing to your surroundings.
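As a rough illustration of the evaluation science Neel describes, here is a minimal, hypothetical Python sketch: score a stand-in diagnostic model on a held-out test set and turn the result into the kind of information a governance process could act on. The model, the test cases, and the 0.90 sensitivity threshold are all invented for illustration.

    def toy_model(features: dict) -> int:
        """Stand-in for a trained diagnostic model: 1 = flag disease, 0 = clear."""
        return 1 if features["lab_value"] > 3.0 else 0

    held_out_cases = [  # (features, true label) -- cases never seen during training
        ({"lab_value": 4.2}, 1), ({"lab_value": 3.5}, 1), ({"lab_value": 2.9}, 1),
        ({"lab_value": 1.1}, 0), ({"lab_value": 2.2}, 0), ({"lab_value": 3.4}, 0),
    ]

    tp = fp = fn = tn = 0
    for features, truth in held_out_cases:
        pred = toy_model(features)
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1

    sensitivity = tp / (tp + fn)   # share of truly sick patients the model catches
    specificity = tn / (tn + fp)   # share of healthy patients correctly cleared
    print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")

    # A governance rule might gate market entry or local deployment on such numbers.
    REQUIRED_SENSITIVITY = 0.90    # hypothetical regulatory or hospital threshold
    print("deploy" if sensitivity >= REQUIRED_SENSITIVITY else "do not deploy")

The numbers such a test produces are exactly the kind of evaluation output that could feed the governance decisions Neel mentions: market entry, withdrawal, or evidence for litigants.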

Pam Karlan: And so is … are you saying that there should be clinical testing of the AI models?

Michelle Mello: This is the big debate in the medical research profession now: whether AI should be held up to the same standards of testing as other medical therapies, for which clinical trials are the gold standard. The fact of the matter is that most AI is being implemented in healthcare with no research at all. It’s being done under the rubric of quality improvement, which is: you sell leadership on the idea of something that’s plausibly good for patients, then you put it in and you look at what happened with an observational study. So we are way, way behind where we are scientifically for other things, pre-market.

So I think the question that Neel has been asking in his work, about what that means for what we do post-market, is really important. If we don’t do that before we let them on the market, how do we evaluate their performance in the real world? Which, frankly, we don’t do all that well in the pharmaceutical realm either. Once the drug is out there, we do a fairly crappy job of understanding how it works in the real world.

Pam Karlan: Not to mention, we allow all sorts of off-label uses. And so I’m thinking, what would an off-label use of AI look like here?

Michelle Mello: It’s happening right now. There are all kinds of models that are developed on adult data, for example, that are being used on kids. Kids are not little adults. We don’t allow drugs to be approved for kids based on adult studies, because their anatomy is different, their physiology is different. But we don’t have big data sets of pediatric data. And so what happens now is the developer develops a product, sells it to a health system for use in adults, and then the pediatricians want to get in on the action and they do. And there’s nothing that says they can’t.

Rich Ford: This raises … sometimes we’re talking about the use of AI in medicine like it’s another type of … we’re talking about it under the rubric of medicine, so, for instance, clinical trials, pharmaceuticals, and what have you. But other times it sounds like we’re just talking about it as software, and it could be medicine, it could be self-driving cars, it could be anything. Are there important differences between those two, either in terms of the practical significance or in terms of what the regulatory regime ought to look like?

Neel Guha: I think this is one of the most challenging questions in AI regulation, which is that we don’t have a very good, definite way of drawing the boundary between what we might look at and say, “that’s an AI system,” and what’s a traditional software system. And so if you look at the statutory definitions in a lot of proposals, you’ll find a definition of AI and then a long list of caveats, which might say, “this doesn’t count Excel spreadsheets, or this doesn’t count a Google form.”

I think it’s also going to be a challenge from a technical perspective because AI as a technology is still in its infancy, and what we know about previous technologies is even after we develop it in a lab and we see some sort of technical promise, it takes a few years, if not longer, to figure out how this is going to manifest as a product that’s actually useful, that’s actually ingrained and distributed in society.

And the current trend in AI and software in tech is actually to try to blend the two. So maybe this is all a long way of saying this is a very challenging problem where we might have to wait and see how it pans out.

Michelle Mello: I do think there are some aspects of healthcare that make it hard, because healthcare is so nonstandard. There are some uses of AI in civil life that we can imagine being more standardized: how you screen resumes in a hiring process, for example. But … if you’ve seen one hospital, you’ve seen one hospital. And so these differences in how AI gets embedded in processes of care really matter. And I’ll just use a very simple example that we encountered here recently. There was a tool that pulled from a medical record system that half of U.S. hospitals use, to summarize what happened during a patient’s care every 12 hours. A new nurse comes on shift, she can check over this note and know what has happened. It turns out that our nurses don’t put everything that’s important in those fields. They put it in a different field. They like to type out words instead of clicking structured boxes. So the tool was missing a lot of important information, but you’d never know that unless you were looking at our hospital and how work gets done.

And that makes, I think, healthcare a particularly devilish setting for trying to understand human/machine interaction.

Rich Ford: So given all of this, do we have reason for optimism? In one sense, it seems like we’re behind the eight ball. The AI is moving a lot more quickly than the people thinking about regulation. It’s different in every hospital. How optimistic are you that we can develop an effective regulatory and/or legal liability regime?

Michelle Mello: I am optimistic, and I think you have to be if you study patient safety as I do. I’ve spent 25 years looking at why things go wrong in healthcare, and it just … I can’t, I cannot tell you how much effort and time and money has been poured into trying to make hospitals safer.

And in some areas we’ve made a lot of progress, but there are some really stubborn areas. Regardless of whether there are safety problems with AI, we have to push on because there’s just huge potential, for example, to chip away at the tenacious problem of missed and delayed diagnoses. Probably all of us know somebody in our lives who’s had a missed or delayed diagnosis.

And whether your diagnosis is caught on time largely depends on whether you happen to get lucky with your doctor, or whether they’ve seen something like that before, or whatever. We can do so much better. So I do have a lot of optimism, and I think the field is coming along to the understanding that this is important. It’s going to happen. And the question is, how do we put the guardrails around it to make it safe?

Pam Karlan: We’re so lucky to have you here with us. I should say that it’s just such a pleasure to see the work that you’re doing here. It’s such a thrill to have colleagues who really manage to span the gamut.

I think one of the most exciting things about being at Stanford is that it’s a very thin university in some ways, and it’s a very permeable university, in that people work across schools, they work across fields, and there’s probably no place where that’s more important than in the relationship among, and notice I used among and not between, computer science, law, and medicine.

Questions

Q1: Josh Cooperman, class of ’74, here for the reunion. Two questions; one of them has to do with your comment about the FDA. Would this be appropriate, since you’re already starting to study this so carefully, if Stanford, because there was a presentation, I think, yesterday by the professor who has the governmental lab…

Michelle Mello: Oh, the RegLab.

Q1: Would this be something that Stanford could get very involved in and start helping the FDA, so you would be at the forefront?

Michelle Mello: Absolutely. There are, I think, literally nine different AI initiatives across campus working on problems of governance and policy for AI, and many of them focus specifically on healthcare. So, we’ve had several convenings just in the last couple of years where we’ve tried to surface some of these policy gaps, and we’re working both openly, writing white papers and articles to try to make the case for additional help for the agencies, and also behind the scenes, because, as you all know, sometimes regulators can’t say things that need to be said.

Q1: And the second question I have is, if you’re the medical malpractice lawyer who’s defending the doctor or the hospital, wouldn’t you want to start going into the mathematical issues to say where the…this software could be in error? And how do you get there? Because what I’m hearing from your presentation is that the math is so complicated, almost no one can understand it, and I can almost see a judge kind of looking at the ceiling.

Neel Guha: Yeah, I think it’s definitely a challenge, and I think it’s one of the tension points: seeing how tort law deals with questions about negligence and liability involving a technical object that’s so opaque. I think one suggestion we had from our work is that this is a great opportunity to start building a base of AI expertise, in terms of involving experts in litigation. I think one way this could also end up going is turning these into questions of evaluation. So, when developers build an AI system for a particular patient population, are they actually evaluating whether it does well on that patient population? Are you training on adults but then using this AI system on children, for instance?

Q2: Phil Reistrom, class of ’79. You talked about trying to balance the value from the info that you get versus the harm that’s done. But one of the things that I keep hearing over and over again is that the HIPAA laws prevent downloading all these millions of cases all across the country in order for the AI to actually do the learning that it needs to do. So, I wanted to ask you what kind of law reform or tort reform or whatever would be necessary in order to free the researchers doing AI, so that they could take all these cases and benefit the rest of humanity.

Michelle Mello: Thanks for the question. It’s such an important one, because what we have right now is a health data ecosystem that is incredibly concentrated. There are sort of data haves and data have-nots. And a lot of it stems from the fact that even if HIPAA actually doesn’t prevent these kinds of data flows, which in a lot of cases it doesn’t, right? If you strip out these 19 fields, you are free to share as much patient data as you like. Even if it doesn’t formally do that, it is enough to make a lot of health systems very hesitant about passing their data along to a third-party company, and frankly, what’s in it for them? So, there are a few companies that have amassed just massive data sets and are cranking out an enormous number of AI tools, and then there are lots of startups that would like to get in on the innovation marketplace but are blocked by this concentration of data. I think it’s a much broader question than just HIPAA.

I think it’s a question about whether we want, for example, the NIH to get involved in creating something like a data commons. There are European nations that have just said, you know what, there are things that have to be balanced against privacy rights; the government’s going to scoop up tons of data from–they have national health systems, it’s much easier–scoop up tons of data from the health system, and we’re going to make it widely available to researchers, and they’ve made incredible discoveries. I don’t think our polity is there, in terms of wanting to make that trade-off. And I think it’s very hard to make the argument for it when the space is dominated not by academic researchers but by companies looking to commercialize products, because we do know that even though patients are fairly benevolent about having their data used to further science, they feel pretty uncomfortable about monetizing their data.

Q3: Hi, I’m Rick Baer, a graduate of the engineering school, and I have a question. As we go forward with AI, it’s going to gain new capability, not just through training but through learning, and learning would require the ability to basically acknowledge and recognize errors and unfortunate outcomes. And I’m wondering, is the legal environment of medicine going to make it difficult for AI to progress and reach its potential?

Neel Guha: Yeah, I think it’s a tricky question. I think one of the rules of thumb, a coarse generalization about where we’re able to build higher-performing or higher-capability AI systems, is where we can access high-quality data. And so pretty much wherever you have difficulty collecting, aggregating, and organizing high-quality data, AI systems for the applications corresponding to that data are harder to build. And so I think, for the reasons that Michelle talked about, those are some of the constraints or challenges we might run into in the healthcare system.

I’d also add that generally this also breaks down by population. So, one of the concerns we have about AI generally, beyond healthcare but also in the legal setting and others, is that we have varying qualities of data across different types of populations of individuals and across different particular applications. And so there’s a larger question: if we want to ensure that AI’s benefits are equitably distributed, we need to make sure that we have data corresponding to the applications that we want AI’s benefits to flow towards.

Michelle Mello: I mean, I think there has been some adaptation already to this problem in the field. In tort law, we’ve long had the doctrine that if a manufacturer makes improvements in the product to make it safer, that can’t be used against them to prove that the last version was defective. So that’s helpful. But even more interestingly … the FDA recognized very early on that this is a problem. When they approve a drug, the molecule is what it is; it’s used for whatever it’s approved for and it doesn’t change. But many of these AI models change daily, if not hourly. And so how do you think about approval? What is the thing that got approved, and what is its relationship to the thing that you’re running the next day?

But it actually has innovated with this new guidance framework for something called predetermined change control plans, which is a framework for anticipating … submitting a kind of plan for anticipated changes or tuning and refining of your model. So that’s a very interesting example of dynamic regulation evolving. Now, it’s all guidance, because, again, there’s no actual authority here to do any of this, but it is an example, I think, of where regulators are being really quite agile with the authority that they have to meet some of these challenges.

Q4: So, the software that you’ve been discussing is primarily institution- or provider-facing, so used by hospitals, doctors, nurses, et cetera. If instead you’re talking about software that is used directly by consumers, for example, to manage their own care, then what are the differences or similarities from the legal perspective that you see in that context?

Michelle Mello: So, the tableau at trial is much simpler, right? It’s just you and the product maker in that case. And … although I will say, I have heard about physicians actually referring patients out to online tools that they can use in lieu of, or in addition to, healthcare. So, there’s still the potential for the physician to be in the loop on some of this stuff. But yeah, ordinarily it’s going to be the patient trying to prove that there was a defect in this software, and all the familiar problems that Neel and I have encountered around showing, well, what is that defect in the context of an algorithm with millions of parameters that produces a different result probabilistically every time–very, very difficult to do. I think right now we’re in this awkward adolescence where every time there is one of these train wrecks around consumers’ use of an AI tool, it gets a lot of publicity. And so, of course, what happens in this interim period before these liability rules get worked out is that these things just get settled as quickly and quietly as possible.

And so I think even if we don’t really know how these cases are going to turn out, or if we’re skeptical, as Neel and I are, that plaintiffs are going to be very successful with a lot of these claims in court, I think that settlement dynamics will create some incentive for AI developers to want to design safely enough to avoid those claims.

Q5: Hi, Imran Wadedeen, class of ’04. So, as a patient, is it probable that a hospital or a physician would have to disclose to me that AI is being used as part of planning my treatment? That’s part A. Part B is, what would that basic disclosure translate to when it comes to adoption of AI tooling more broadly?

Michelle Mello: So that’s the problem I’m working on now. The answer is generally they don’t have an obligation. The exception is in some states, if they want to record you talking, their wiretapping laws come into play, so they have to get your consent for the recording, and most of them will give you a general disclosure.

I actually … my physician gave me one this week, that she’s going to use an AI scribe to take notes on the conversation. That was the entirety of the state-mandated disclosure. So I have a project now where we’re interviewing a lot of patients about how they think about consent and disclosure, and what we hear is actually pretty surprising: most of the time patients don’t particularly feel the need to be bothered with this information, and the reason they feel that way is usually one of two things. They’re in the hospital, and they’re completely overwhelmed, and this is the last thing they want to have to care about. Or, there’s nothing they can do about it. There’s no opt-out here for most of this technology. It runs in the background as far as the patient is concerned. And so the question that’s of interest to me is, how do you think about a patient’s dignitary interest in having their autonomy respected enough to know what’s going on in their care, versus the fact that it is, as are many disclosures online and elsewhere, performative: it’s hollow, and the patient can exercise no agency in response to it. It’s a hard problem. My sense is that where things will land is that certain kinds of tools that involve a lot of risk, and that the patient can choose not to have, will be disclosed. For example, if you want your surgery done by the robot or not by the robot, that’s a case for disclosure. But a lot of other things … like, are you a patient that’s going to get blood stocked in the OR for a transfusion because you’re a high transfusion risk or low transfusion risk? I don’t see that happening.

Pam Karlan: Time for one more question.

Q6: Hi, I’m from the class of 1969; my name’s Joel Clevens. Given how complicated all of this is and all of the questions you’ve raised, if you could advise Congress or the FDA or something, is there a measure, or are there two measures, that you would recommend we need to have in place in order to regulate the relationship between AI and medicine?

Neel Guha: I think, echoing some of the ideas that have been raised earlier, we need to think about what post-deployment monitoring looks like in a more systematic fashion and recognize the types of investment in infrastructure that post-deployment monitoring entails. Like Michelle said, molecules don’t change once you’ve productionized them as a drug. But we have a litany of examples of how you can deploy an AI system into the world with one expectation of what the world looks like, and then the world changes, and the performance of your model changes. So, immigration patterns to a hospital might change what the modal patient looks like. A pandemic or a stay-at-home order might change what types of patients are in the hospital and where in their health progression they’re being assessed by a model.

And so I think some sort of measures that recognize that, and that create the monitoring infrastructure and monitoring obligations on developers, could be quite beneficial.
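A minimal, hypothetical Python sketch of the kind of post-deployment monitoring Neel is describing: track a model’s rolling error rate in production and flag when performance drifts, as it might after the patient mix changes. The window size, alert threshold, and simulated shift are all invented for illustration.

    import random
    from collections import deque

    WINDOW = 50              # how many recent, resolved cases to keep
    ALERT_THRESHOLD = 0.20   # tolerable error rate before someone must investigate

    recent_outcomes = deque(maxlen=WINDOW)  # 1 = model was wrong, 0 = model was right

    def record_case(model_was_wrong: bool) -> None:
        """Log one resolved case (e.g., a prediction later confirmed by clinicians)."""
        recent_outcomes.append(1 if model_was_wrong else 0)
        if len(recent_outcomes) == WINDOW:
            error_rate = sum(recent_outcomes) / WINDOW
            if error_rate > ALERT_THRESHOLD:
                print(f"ALERT: rolling error rate {error_rate:.2f} exceeds "
                      f"{ALERT_THRESHOLD:.2f}; patient mix may have shifted")

    # Simulate deployment: the model starts out fairly accurate (about 10% errors),
    # then the patient population changes and errors climb to about 40%.
    random.seed(0)
    for _ in range(200):
        record_case(random.random() < 0.10)   # pre-shift period
    for _ in range(200):
        record_case(random.random() < 0.40)   # post-shift period: alerts should fire

The design point is Neel’s: the model itself never changed, but the world did, and only ongoing monitoring infrastructure surfaces that.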

Michelle Mello: I think what I would like to see, beyond expanded FDA authority, is for the agency that accredits hospitals for participation in Medicare and Medicaid to start requiring them to do governance as a condition of that participation. That is the main regulatory lever that we have over the quality of care in hospitals. They must have Medicare, or they will go out of business. And in order to have Medicare, you have to meet all kinds of requirements, like the number of nurses on staff per number of patients, all kinds of things about your structure and processes. And I think this needs to be one of them. And that would really go a long way toward doing what I think we can do right now, which is: each hospital knows how its workflow functions, and it can be taught how to evaluate AI tools in that setting. I think that would go a long way towards making things safe and responsible.

Q7: Raul Escandon, class of ’72, engineering. I heard this morning from one Stanford official that … Can you hear me? … that Stanford was having difficulty with the Biden administration concerning AI: they’re not getting enough cooperation getting things like what we’ve been talking about through, or basically processed. And so my question is tied to the last two questions. Is it the case that you are not getting enough cooperation from the Biden administration? Who have you been talking to? What kind of efforts have you made? Because my feeling is that the White House administration’s position is that the ethical framework for AI is not there for them to take, basically, an aggressive, what is it, cooperative attitude towards it. Because you can see, just like with liabilities and all kinds of other things, the political implications. They would be stuck in a hole if some of these problems were to come about. What kind of efforts have you made? Have you worked on the ethical framework necessary to justify whatever is being done? And that then leads to the legal framework. Can you talk a little bit about that, about whether there’s any truth to the statement that I heard this morning?

Michelle Mello: Yeah, I’m not really sure exactly what that statement referred to. I’m not aware that there’s major disgruntlement. There are certainly quibbles with new regulations that have come out, which hospitals think don’t have a realistic picture of how they do their work, but I think the Biden administration is fairly proactive and flexible in its approach, which has basically been, “look, we need to get federal agencies thinking about the uses that they have, and we need to start creating incentives for private governance in hospitals.” I’m not exactly sure what that’s about.

There have been issues with the Attorney General, my friend Rob Bonta, who’s been very out front on this as well, particularly in investigating algorithmic discrimination. So, a lot of what policymakers would like to do is collect lists of tools that are in use in the hospital, and what they do, and what are all the things that go into them. And this is much easier said than done, and the value of that kind of disclosure, in the absence of a listener who knows what to do with it, is not entirely clear to some people.

Q7: How about the ethical framework?

Michelle Mello: So, on the ethical framework: there are countless ethical frameworks that have been proffered. What we’re working on at Stanford is figuring out how to take all these principles, which I think there’s now pretty wide agreement are appropriate to apply to AI, and operationalize them. So if we care about equity, what exactly does that mean? And we’ve developed a governance process for our hospitals here in which we do an ethics review. We interview clinicians, we interview AI developers, we interview patients. We have an expert panel that we take each tool to before it gets deployed on patients. And we try to identify a series of potential ethical issues and mitigation steps. And then the hospital executive committee reviews those recommendations and makes the decision about go or no-go.

And then if it’s a go, what is it that we’re going to do to keep this safe and responsible? Stanford’s a leader in that area, and we’re working very hard on developing something that we can help lower-resource institutions implement.

Pam Karlan: Thanks to Neel and thanks to Michelle for being here. This is Stanford Legal. If you’re enjoying the show, tell a friend and please leave us a rating or a review on your favorite podcast app. Your feedback improves the show and helps new listeners to discover us. I’m Pam Karlan, along with Rich Ford. See you next time.