
How can we better regulate AI in healthcare?

This May, the Stanford Institute for Human-Centered AI brought together more than 50 policymakers, researchers, healthcare providers, AI developers, and patient advocates in a closed workshop to address the regulatory challenges introduced by the rapid integration of AI into the healthcare industry. Conversations throughout the day included analyzing gaps in the regulation of AI devices and exploring new applications of AI, both in patient care and in administration and operations.

Read related story: Paths to Governing AI Technologies in Healthcare

Curt Langlotz, deputy director of Stanford HAI and professor of radiology, medicine, and biomedical data science, led the day’s discussion. Here, he offers key takeaways from those conversations and explains how the regulatory landscape needs to change to benefit patients, physicians, and developers.

What do regulators need to know about AI?

The Food and Drug Administration (FDA) is our primary federal regulator of clinical AI. They already know a lot about AI and are doing a great job of balancing safety and innovation in a 50-year-old regulatory system designed in the days of paper records and fax machines.

I want to highlight the challenges that potential buyers of AI algorithms face today. In my specialty, radiology, there are over 600 FDA-approved algorithms and over 100 companies selling AI products to radiologists. We know that these algorithms do not generalize well to new populations, so many potential customers have difficulty determining whether a given AI product will work in their practice. We need more transparency about the data on which these products were trained. The FDA is making great progress in this area, working in part with an international consortium. At the meeting, we discussed the advantages of model cards and data sheets and drew an analogy to “Table 1” of prospective clinical trial publications, which details the characteristics of the patients in whom a new drug or device is being tested.
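To make the “Table 1” analogy concrete, here is a minimal, hypothetical sketch of what a machine-readable data sheet for an imaging algorithm might contain. The field names and example values are illustrative assumptions for this article, not an existing standard or an FDA requirement.

```python
from dataclasses import dataclass, field

# Hypothetical "model card" sketch: the fields mirror the kind of population
# characteristics a clinical trial reports in its "Table 1", so a prospective
# buyer can judge whether the training data resembles their own patients.
# All names and values below are illustrative, not a published standard.

@dataclass
class TrainingDataSheet:
    num_patients: int
    num_sites: int
    scanner_vendors: list[str]           # imaging hardware represented
    age_distribution: dict[str, float]   # age bracket -> fraction of cohort
    sex_distribution: dict[str, float]   # sex -> fraction of cohort
    disease_prevalence: float            # fraction of positive cases in the training data

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    training_data: TrainingDataSheet
    reported_performance: dict[str, float]  # metric name -> value on held-out data
    known_limitations: list[str] = field(default_factory=list)

# Example (made-up values): a buyer compares prevalence and demographics
# against their local population before deciding whether to pilot the tool.
card = ModelCard(
    model_name="example-lung-nodule-detector",
    intended_use="Flag possible lung nodules on chest CT for radiologist review",
    training_data=TrainingDataSheet(
        num_patients=12000,
        num_sites=3,
        scanner_vendors=["VendorA", "VendorB"],
        age_distribution={"18-40": 0.2, "41-65": 0.5, "66+": 0.3},
        sex_distribution={"female": 0.48, "male": 0.52},
        disease_prevalence=0.07,
    ),
    reported_performance={"sensitivity": 0.91, "specificity": 0.88},
    known_limitations=["Not validated on pediatric patients"],
)
print(card.training_data.disease_prevalence)
```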

What do AI developers need to better understand about regulation?

Developers tend to think of regulation as a problem to overcome. But in many ways we’re lucky that AI in healthcare is already a regulated industry with a neutral party ensuring that we build safe and effective systems. We’ve seen in other industries recently how a lack of standards can undermine public trust in AI.

We should also do more to eliminate the waste of effort that occurs when developers are unaware of the rigorous evaluations that regulators expect. If we applied the required rigor from the start, we would avoid having to rerun experiments later.

Where are the greatest opportunities now?

Many of the AI algorithms available today are designed to detect things, whether it’s sepsis, a brain hemorrhage, a lung nodule, or something else. These systems, while marginally helpful, also create additional work for the user, not only to chase false positives but also to track true positives. That’s why healthcare workers who use these algorithms don’t always find them beneficial.

Ultimately, we see a shift toward algorithms that improve efficiency or have a clear return on investment. For example, an algorithm that can generate a clinical note or a radiology report can save the user significant time. And an algorithm that extracts new information from images, such as finding patients with unsuspected coronary artery disease or osteoporosis during a routine CT scan, can not only improve outcomes but also provide financial benefits to the healthcare organization.

Another possibility is using large language models to engage patients in their care. My lab designed a system that helps patients understand their imaging results. The patient receives a radiology report with hyperlinked complex medical terms. If they don’t understand something, they can click on the link and get a simple, clear explanation from a chatbot.
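As a rough illustration of that interaction pattern (not the lab’s actual system), here is a minimal sketch in Python. A small glossary stands in for a curated medical vocabulary, and the `explain_term` function is a hypothetical stub where a chatbot or language model call would go.

```python
import re
from urllib.parse import quote

# Hypothetical glossary of complex terms to hyperlink in a radiology report.
# In a real system the term list and explanations would come from a curated
# vocabulary plus a language model; explain_term is only a stand-in stub.
GLOSSARY = {
    "pleural effusion": "Extra fluid in the space around the lungs.",
    "cardiomegaly": "A heart that appears larger than normal.",
}

def explain_term(term: str) -> str:
    """Stand-in for a chatbot call that returns a plain-language explanation."""
    return GLOSSARY.get(term.lower(), "No explanation available yet.")

def link_terms(report_text: str) -> str:
    """Wrap known complex terms in hyperlinks the patient can click."""
    linked = report_text
    for term in GLOSSARY:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        linked = pattern.sub(
            lambda m: f'<a href="/explain?term={quote(m.group(0).lower())}">{m.group(0)}</a>',
            linked,
        )
    return linked

report = "Findings: mild cardiomegaly. Small left pleural effusion."
print(link_terms(report))            # the report with clickable terms
print(explain_term("cardiomegaly"))  # what the patient sees after clicking
```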

Where are the biggest pitfalls?

I think the role of large language models in medicine is currently being overhyped. There have been a number of recent papers showing that these models can pass certification exams and solve abstract clinical problems. I don’t see a clear path to regulatory approval for using these large models in this way. The hallucination issue will be hard to overcome. And since the training data is essentially the entire internet, the system has a skewed view of disease probability, shaped by what gets attention online.

Instead, we can build models trained only on large amounts of high-quality medical data and use them to pre-train specialized downstream models. I don’t need a general model that can recommend restaurants and do a bunch of other things that are useful elsewhere but wasted effort in medicine. I want a model that can provide high-quality medical decision support.

Another pitfall is how much we still have to learn about how best to interact with these systems. We know that a combination of humans and machines is likely to be better than either alone. But poor human-machine interaction, often the result of poor system design, can lead to bad outcomes. There has been a lot of great work on how systems can better explain their outputs, which is important for trustworthy systems. But there is still a lot of work to be done to design optimal human-AI interactions.

What surprised you in today’s conversation?

I was pleasantly surprised by the commitment of our colleagues in government. Often regulators are reluctant to share information about their processes, but we had many honest discussions about the concerns they face. I felt that our regulators and policymakers were open to change. In particular, I sensed openness to placing greater emphasis on post-market surveillance rather than pre-market review. There has been good progress in this area as well. I think it’s a healthy dynamic.

What are the next steps for you? For this group of participants?

We plan to write up our findings and recommendations in a series of policy briefs. The meeting was organized around three use cases: clinical decision support, such as an AI algorithm that detects abnormalities in images; enterprise AI, such as apps that can listen to doctor-patient interactions and take notes; and consumer-facing AI, such as mental health chatbots. We will have some recommendations for policymakers in each of these areas.
