Healthcare professionals are embracing AI for everything from cost savings to diagnostics. But who is to blame when AI-assisted healthcare goes wrong? How is the law developing to balance the benefits and risks? Here, SLS Professor Michelle Mello, a health policy expert, and Neel Guha, a JD/PhD candidate in computer science, examine AI's potential to enhance diagnostics and streamline workflows while addressing the ethical, legal, and safety challenges this new technology can bring. They also highlight the urgency of adapting regulatory frameworks, the complexities of liability among hospitals, developers, and practitioners, and the need for rigorous testing to ensure patient safety as AI integration in healthcare advances.
Mello and Guha were interviewed by Stanford Legal co-hosts Richard Thompson Ford and Pamela Karlan during the recent alumni weekend at the law school. The following is an edited and shortened version of the full podcast transcript, which can be found here.
Rich Ford: How is AI being used in medicine?
Michelle Mello: As in many other industries, it has become pervasive. In medical devices, the most common use is to embed AI in radiological imaging equipment. If you get a mammogram in the United States today, there's a 90 percent chance that the mammogram read will be AI-assisted, and that is a good thing. AI plus radiologists outperforms radiologists alone in every study.
There are surgical devices as well that help surgeons visualize anatomy. These devices can predict parts of the anatomy that can't be visualized with the eye to help guide surgeons. And there are even AI-assisted surgical robots that are starting to come into use with human supervision. A different set of common uses involves applications of generative AI to help reduce some of the drudgery of being a physician. Physicians in the United States spend about two hours of what we call "pajama time" for every one hour of patient care. Pajama time is the time at the end of the day when you are writing up notes for patient visits, responding to patient emails, and entering your billing codes in the medical record, and it's contributing to a burnout rate in the medical profession that now tops 40 percent. So there's huge potential for AI to shortcut those tasks by generating drafts that the physician can review and then accept.
Neel Guha: Another form of AI used in medicine is not generative AI, but predictive AI. It's leveling up the predictors that physicians have long used to classify patients, whether it's patients who are likely to do well after surgery versus not, or patients who probably have a disease versus probably don't. Physicians have long used simpler algorithms that are based on rules and a limited number of variables. But now they're using massive data sets with millions of data points to divine associations between things that were never in the predictors before.
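As a concrete illustration of the simpler, rule-based predictors Guha mentions, here is a minimal sketch. The variables, thresholds, and point values are invented for illustration and do not correspond to any real clinical score.

```python
# Hypothetical rule-based risk predictor of the older style described above:
# a handful of explicit variables, hand-chosen thresholds, and a point total.
# The cutoffs are illustrative only, not taken from any real clinical score.

def rule_based_risk(age: int, heart_rate: int, systolic_bp: int) -> str:
    points = 0
    if age >= 65:
        points += 2      # older age adds risk points
    if heart_rate > 100:
        points += 1      # elevated heart rate
    if systolic_bp < 90:
        points += 2      # low blood pressure
    return "high risk" if points >= 3 else "low risk"

print(rule_based_risk(age=72, heart_rate=110, systolic_bp=85))  # -> "high risk"
```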
Michelle Mello: I think most people haven't heard about the use of AI in health insurance. All of us have probably experienced the maddening process of trying to get prior authorization for some medical service that your doctor wants to give you, but your insurance company has to pre-approve. Physicians hate this process. It takes a huge amount of time. There's a very high rejection rate, but also roughly a 90 percent overturn rate on those rejections, so it is a huge inefficiency in the healthcare system. And to make a long story short, where we are now is akin to a "battle of the bots," where hospitals are now using AI to generate and submit those requests. The insurance company has its own algorithm to review those requests, and then the humans get involved and duke it out over the ones where they disagree.
Rich Ford: And the downsides?
Michelle Mello: The downside is that the reason we're doing a lot of this in medicine is that physicians and nurses are overworked and too busy. Yet the whole schema of these models relies on the assumption that humans are going to remain in the loop, so to speak, in a robust way, meaning that when the AI drafts an email to you, the patient, in response to your medical question, the physician is going to review it carefully and edit it before sending it out. But what we are finding empirically is that you cannot have both. You can't both have a time-saving piece of generative AI and a piece of AI where the human actually spends time reviewing it. So the danger is what we call automation bias: that over time, people come either to trust the thing, or just to not care enough about whether it's right. Because it's right most of the time. And without that human vigilance, or another system in place to catch the errors of the first system, the fear is that there are going to be errors that cause harm.
Rich Ford: Then there’s the potential for legal liability. Maybe you could tell us a little bit about the areas in which we’ve already seen this as a problem and what that means for the future of AI?
Neel Guha: We have been looking at situations where AI contributes to patient harm and that results in some sort of litigation. Can we get a better understanding of how courts are thinking about the liability questions that emerge? Who is liable, and on what basis or standards are courts finding liability? So, we looked through a bunch of cases to find instances where there was some sort of AI system or complex software system at play that produced harm in either a healthcare or non-healthcare setting.
We noticed a few trends in terms of who the plaintiffs and defendants are and the types of claims that are emerging. First, we often see claims when hospitals are using AI systems to manage patient care or manage operations, and there are mistakes or errors that result in harm to patients. In these cases, patients typically bring negligence claims against the hospital for these harms.
The second setting is when physicians are relying on erroneous AI predictions, for example when AI is being used to recommend a particular treatment strategy or provide a diagnosis. In these cases, plaintiffs will bring medical malpractice claims against the physician, alleging that they should have caught the error in the AI system or independently reached the correct conclusion. They'll also bring product liability claims against the developer of the AI system.
The final category of claims is brought when AI is embedded in some sort of device that's either implanted in a patient or used in the process of a surgery or some other operation, and this produces an error. In this case, plaintiffs will bring negligence claims or medical malpractice claims against physicians and hospitals, alleging they were negligent in installing, maintaining, or updating the device. And they'll also bring claims against the developers of these devices themselves, alleging they should have designed the device better or been clearer about the potential risks.
A challenge is that the ways in which humans make mistakes and the ways in which AI systems make mistakes are radically different. Humans can get tired, they can get distracted, they can eat a bad breakfast one morning. None of those are issues with AI systems. But when AI systems make mistakes, they tend to make mistakes in a very systematic fashion.
So, in the best case, a human and an AI system are able to counteract each other's errors. But in the worst case, they might actually feed off each other's worst impulses. A human who's quite good at catching AI mistakes, combined with an AI system, might perform substantially better than a human who ignores the AI system when it's correct and defers to it when it's wrong.
Pam Karlan: One thing we see in the realm of AI and discrimination is how, at some point, you lose the ability to get "behind" the AI and figure out what was going on. For example, in cases where you have to show that there was a discriminatory purpose, the repeated iterations that go into the machine's decision about whom to target with certain ads, or what to use as the measure of whether a person is likely to commit a crime if they're let out on bail, mean you can no longer figure out what the machine was metaphorically looking at. Is there a problem that's similar to that in the medical field?
Neel Guha: Absolutely. If you look at the AI systems from 30 to 40 years ago, they look a lot more like what we might typically think of as a programmer writing code: a set of logical steps of, if you see this, then make this decision, and if you see that, then make that decision. Modern AI systems, by contrast, are based on machine learning, which is built in a radically different way. Rather than telling programs how to do a task, we just give them a bunch of examples of "here is the input and here is the output," and we let them figure it out through a process of trial and error that is so mathematically sophisticated and complex that we really struggle to explain it. And so, the best we can do is think about it in terms of inputs and outputs: if I give these types of inputs, what types of outputs do I observe, and am I happy or unhappy with that?
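To contrast with the rule-based sketch above, here is a minimal, hypothetical example of the machine-learning approach Guha describes: rather than writing the rules, we hand a model labeled input-output examples and let it fit the relationship itself. A small scikit-learn logistic regression stands in for the far larger models used in practice, and the data are made up.

```python
from sklearn.linear_model import LogisticRegression

# Made-up training examples: each row is [age, heart_rate, systolic_bp],
# and each label says whether that hypothetical patient turned out high risk.
X = [
    [72, 110, 85],
    [45, 80, 120],
    [68, 95, 100],
    [30, 70, 130],
    [80, 120, 88],
    [50, 85, 118],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = high risk, 0 = low risk

# No if/then rules are written here: the model infers the mapping from examples.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Prediction for a new, unseen patient (prints array([0]) or array([1])).
print(model.predict([[60, 105, 92]]))
```

The learned coefficients are not something a programmer wrote down, which is the root of the input-output, trial-and-error opacity Guha describes.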
Pam Karlan: Is there any regulation of the AI systems themselves as medical devices? Or is it just the device that the system is embedded in?
Michelle Mello: Yes, there is. There is software embedded in a medical device, which is regulated by the FDA as part of that device. And then there is software as a medical device: a category of clinical decision support tools where the software itself has been deemed a regulable medical device. But those tools are just the tip of the iceberg of the AI tools in use in healthcare today. So, there is a profound need to enable the FDA or other agencies to meet the challenges of this technology.
Our regulatory scheme for medical devices dates to 1976. The FDA does not have the statutory authority to regulate more robustly than it does, not to mention that even that authority is now under threat from the Supreme Court decisions last term undercutting agencies' discretion to be creative in interpreting their statutory authority. So, there's a real need for additional intervention here.
Rich Ford: I'm also curious as to whether there's a need for change on the tort liability side. Every time a new technology comes out, there are some people who claim that we need to completely overturn the legal system. Is this one of those cases, or do we really need a law of AI?
Michelle Mello: There are only a couple of changes that I would feel comfortable seeing happen now. One issue is the increasing use by savvy AI developers of disclaimers and limitations of liability in their licensing agreements with hospitals. Hospitals do not seem to have woken up to this, and do not seem to be exercising the fairly substantial bargaining power they have right now. Every AI developer in the country would like to have a Stanford or a Mayo Clinic take up their tool. But we don't seem to be pushing for favorable terms in the contracts. So in the absence of a robust regulatory scheme or lots of tort litigation, the developers are reshaping the liability landscape just through contract. And I don't love that.
There is also the issue of how we allocate liability between hospitals and physicians. It's our strong sense that physicians are going to be at the pointy end of the stick here, and yet many hospitals do not have processes in place to vet and monitor these tools that are robust enough to make it reasonable to expect the physician to absorb whatever residual prospect of a screw-up remains after that vetting. So some kind of shift toward enterprise liability that would allow hospitals to absorb more legal responsibility might create the right incentives for them to be doing what they should be doing, which is a lot of private governance.
Pam Karlan: Should there be clinical testing of the AI models?
Michelle Mello: A big debate in the medical research profession now is whether AI should be held to the same standards of testing as other medical therapies, where clinical trials are the gold standard. The fact of the matter is that most AI is being implemented in healthcare with no research at all. It's being done under the rubric of quality improvement, which means you sell leadership on the idea of something that's plausibly good for patients, then you put it in and look at what happened with an observational study. So, pre-market, we are way behind where we are scientifically for other therapies.
Pam Karlan: What would be an example of an off-label use of an AI?
Michelle Mello: There are all kinds of models that are developed on adult data, for example, that are being used on kids. Kids are not little adults. We don’t allow them to be prescribed drugs based on adult studies because their anatomy is different. Their physiology is different. But we don’t have big data sets of pediatric data. And so what happens now is the developer develops a product, sells it to a health system for use in adults, and then the pediatricians want to get in on the action and they do. And there’s nothing that says they can’t.
Rich Ford: So given all of this, do we have reason for optimism? In one sense, it seems like we're behind the eight ball. AI is moving a lot more quickly than the people thinking about regulation. And it's different in every hospital. How optimistic are you that we can develop an effective regulatory and/or legal liability regime?
Michelle Mello: I am optimistic, and I think you have to be if you study patient safety as I do. I’ve spent 25 years looking at why things go wrong in healthcare, and I cannot tell you how much effort and time and money has been poured into trying to make hospitals safer. In some areas we’ve made a lot of progress, but there are some really stubborn areas. Regardless of whether there are safety problems with AI, we have to push on because there’s just huge potential, for example, to chip away at the tenacious problem of missed and delayed diagnoses. Probably all of us know somebody in our lives who’s had a missed or delayed diagnosis.
Michelle Mello (BA ’93) is a leading empirical health law scholar whose research is focused on understanding the effects of law and regulation on health care delivery and population health outcomes. She holds a joint appointment at the Stanford University School of Medicine in the Department of Health Policy. Mello recently won the 2024 Barbara Allen Babcock Award for Excellence in Teaching. Mello is the author of more than 250 articles on medical liability, public health law, the public health response to COVID-19, pharmaceuticals and vaccines, artificial intelligence, data ethics and privacy, biomedical research ethics and governance, and other topics.
Neel Guha is a fifth-year JD/PhD candidate in Computer Science at Stanford. His research explores AI governance, applications of machine learning to advance empirical legal research, and methods for improving the ability of machine learning systems to engage in sophisticated reasoning. He led the creation of LegalBench, a large-scale open effort to develop benchmark tasks for evaluating legal reasoning in LLMs.