How can we be sure artificial intelligence is safe for medical use?

Nurse practitioner Debra Brown guides patient Merdis Wells through a diabetic retinopathy exam at University Medical Center in New Orleans.

When Merdis Wells visited the diabetes clinic at the University Medical Center in New Orleans about a year ago, a nurse practitioner checked her eyes to look for signs of diabetic retinopathy, the most common cause of blindness.

At her next visit, in February of this year, artificial intelligence software made the call.

The clinic had just installed a system that's designed to identify patients who need follow-up attention.

The Food and Drug Administration cleared the system — called IDx-DR — for use in 2018. The agency said it was the first time it had authorized the marketing of a device that makes a screening decision without a clinician having to get involved in the interpretation.

It's a harbinger of things to come. Companies are rapidly developing software to supplement or even replace doctors for certain tasks. And the FDA, accustomed to approving drugs and clearing medical devices, is now figuring out how to make sure computer algorithms are safe and effective.

Wells was one of the first patients at the clinic in early February to be tested with the new device, which can be run by someone without medical training. The system produces a simple report that identifies whether there are signs that a patient's vision is starting to erode.

Wells had no problem with the computer making the call. "I think that's lovely!" she says.

"Do I still get to see the pictures?" Wells asks nurse practitioner Debra Brown. Yes, Brown replies.

"I like seeing me because I want to take care of me, so I want to know as much as possible about me," Wells says.

The 60-year-old resident of nearby Algiers, La., leans into the camera, which has an eyepiece for each eye.

"It's just going to be like a regular picture," Brown explains. "But when we flash, the light will be a little bright."

Once Wells is in position, Brown adjusts the camera.

"Don't blink!" she says. "3-2-1-0!" The camera flashes and captures the image. Three more flashes and the exam is done.

Brown says she's still planning to examine the images and backstop the computer's conclusion. That reassures Wells.

The test is quick and easy, which is by design. People with diabetes are supposed to get this screening test every year, but many don't. Brown says the new system could allow the clinic to screen a lot more patients for diabetic retinopathy.

That's the hope of the system's inventor, Michael Abramoff, an ophthalmologist at the University of Iowa and company founder.

"The problem is many people with diabetes only go to an eye-care provider like me when they have symptoms," he says. "And we need to find [retinopathy] before then. So that's why early detection is really important."

Abramoff spent years developing a computer algorithm that could scan retina images and automatically pick up early signs of diabetic retinopathy. And he wanted it to work in clinics, like the one in New Orleans, rather than in ophthalmologists' offices.

Developing the computer algorithm wasn't the hard part.

"It turns out the biggest hurdle, if you care about patient safety, is the FDA," he says.

That hurdle is essential for public safety, but not an easy one for a brand-new technology — especially one that makes a medical call without an expert on hand.

Often medical software gets an easy road to market, compared with drugs. Software is handled through the generally less rigorous pathway for medical devices. For most devices, the evaluation involves a comparison with something already on the market.

A retinal image shows severe nonproliferative diabetic retinopathy, a vision-threatening form of the disease, characterized by hemorrhages (the darker red spots in the image) across the retina.

But this technology for detecting diabetic retinopathy was unique, and a patient's vision is potentially on the line.

When Abramoff approached the FDA, "of course they were uncomfortable at first," he says, "and so we started working together on how can we prove that this can be safe."

Abramoff needed to show that the technology was not just safe and effective but that it would work on a very diverse population, since all sorts of people get diabetes. That ultimately meant testing the machine on 900 people at 10 different sites.

"We went into inner cities, we went into southern New Mexico to make sure we captured all those people that needed to be represented," he says.

All the sites were primary care clinics, because the company wanted to demonstrate that the technology would work well without having an ophthalmologist on hand.

That extensive trial satisfied the FDA that the test would be broadly usable and reasonably accurate. IDx-DR surpassed the FDA's requirement. Test results that indicated eye disease needed to be correct at least 85 percent of the time, while those finding no significant eye damage needed to be correct at least 82.5 percent of the time.
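As a rough illustration of what clearing a two-part accuracy bar like this involves, here is a minimal Python sketch. It assumes the two thresholds can be read as sensitivity (the share of patients with disease that the test flags) and specificity (the share of patients without disease that the test clears); that reading is an assumption, not something stated above. The function names and the patient counts are hypothetical and are not the trial's data.

```python
# Hypothetical sketch: compare screening results against two accuracy thresholds.
# The counts used below are invented for illustration, not taken from the trial.

def accuracy_rates(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) as fractions between 0 and 1."""
    sensitivity = true_pos / (true_pos + false_neg)   # diseased eyes correctly flagged
    specificity = true_neg / (true_neg + false_pos)   # healthy eyes correctly cleared
    return sensitivity, specificity

def meets_thresholds(sensitivity, specificity, min_sens=0.85, min_spec=0.825):
    """True only if both rates reach their minimums (85% and 82.5% by default)."""
    return sensitivity >= min_sens and specificity >= min_spec

# Invented example: 900 patients, 200 with disease and 700 without.
sens, spec = accuracy_rates(true_pos=174, false_neg=26, true_neg=630, false_pos=70)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
print("meets both thresholds:", meets_thresholds(sens, spec))
```

Both rates have to clear their own bar: a test that catches nearly every case of disease but also flags many healthy patients would still fall short.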

"It's better than me, and I'm a very experienced retinal specialist," Abramoff says.

The FDA helped guide the company's software through its regulatory process, which is evolving to accommodate inventions flowing out of artificial intelligence labs.

Bakul Patel, associate director for digital health at the FDA, says that in general, the FDA expects more evidence and assurances for technologies that have a greater potential to cause harm if they fail.

Some software is completely exempt from the FDA process. A simple tweak in a routine piece of software may not require any FDA review at all. The rules get tighter for a change that could substantially alter the performance of an artificial intelligence algorithm.

The agency has years of experience approving software that is part of medical devices, but new algorithms are creating new challenges.

For one thing, the agency needs to be wary of approving an algorithm that's based on a particular set of patients, if it's not clear that it will be effective in different groups. An algorithm to identify skin cancer may be developed primarily on white patients and may not work on patients with darker skin.

And many algorithms, once on the market, will continue to gather data that can be used to improve their performance. Some programs outside of health science continually update themselves to accomplish that. That raises questions about how and when updated software needs another round of review.

"We realize that we have to re-imagine how we look at these things, and allow for the changes that go on, especially in this space," Patel says.

To do that, the FDA is testing out a whole new approach to clearing algorithms. The agency is experimenting with a system called precertification that puts more emphasis on examining the process that companies use to develop their products, and less emphasis on examining each new tweak. Continued monitoring is another element of this strategy.

"We're going to take this concept and take it on a test run," Patel says.

A retinal scan is displayed at University Medical Center in New Orleans using software that detects diabetic retinopathy.
Richard Harris

Because many algorithms will likely be in a state of continual evolution, "it's really important when a system is deployed in the real world that we monitor those systems to make sure that they're performing the way we expect," says Christina Silcox, a researcher at the Duke-Margolis Center for Health Policy.

She's enthusiastic about the prospects of AI in medicine, while alert to some of the challenges the FDA will face.

"Right now we might see an update to a medical device every 18 months," she says. "In software you might expect to see one every two weeks or every month." Seemingly minor software glitches can occasionally have serious unintended consequences. One of the worst cases involved a radiation therapy machine that, in the 1980s, gave huge overdoses of radiation to some patients because of a software bug. Researchers looking at more recent incidents identified 627 software recalls by the FDA from 2011 through 2015. Those included 12 "high risk" devices such as ventilators and a defibrillator.

Patel certainly doesn't want to see a high-profile failure, because that could set back a promising and rapidly growing industry.

One challenge that's beyond the FDA's scope is figuring out how to resolve conflicting conclusions from rival devices. Genetic tests that are used to guide cancer treatment, for example, already provide conflicting treatment recommendations, says Isaac Kohane, a pediatrician who heads the biomedical informatics department at Harvard Medical School. "Guess what," he says, "the same thing is going to happen with these AI programs."

"We're going to have built-in disagreements and no doctor and no patient will know what is right," he says.

Indeed, IDx isn't the only company that is interested in using an algorithm to identify early signs of diabetic retinopathy. Among its competitors is Verily, one of Google's sister companies, which is currently deploying its technology in India. (Google is among NPR's financial supporters.)

"Actually I'm quite bullish in the long term," Kohane says, as he looks out on the burgeoning field of AI. "In the short term, it's [a] wild land grab."

He says we need the equivalent of Consumer Reports in this area to help resolve these disagreements and identify superior technologies. He would also like reviews to examine not simply whether a technology performs as expected, but whether it's an improvement for patients. "What you really want is to get healthy," he says.

The cost of the camera and set-up for the IDx-DR systems is around $20,000, a company spokesperson said in an email. There are options to rent or lease-to-own the camera that can reduce the upfront costs.

The list price for each exam is $34, the spokesperson said. But it varies depending on factors including patient volume.

A technically accurate piece of software doesn't automatically lead to better health.

At the diabetes clinic in New Orleans, for example, the system replaced a service that also checked for another cause of blindness, glaucoma.

Nurse practitioner Brown visually scans Wells' images for signs of glaucoma, but that wouldn't happen when the work is handed off to someone who lacks her expertise. Instead, the diabetes clinic staff will refer patients to get another appointment for that test.

Wells also got something that future patients might not – a review of her retina images, so she could see for herself any suspected issues. That interaction with a health care professional was also an important moment to talk about her diet and what she can do to stay healthy.

Chevelle Parker, another nurse practitioner, points to some silvery lines inside the eye's blood vessels.

"That happens when your sugar levels are high," Parker explains. "It can also be an indication of diabetic retinopathy. So we're going to do a referral and send you on for complete testing."

The software did its intended job. While Wells seemed a bit upset by the news, at least she has found out about this concern early, while there's still time to protect her vision.
