These days, it might seem like algorithms are out-diagnosing doctors at every turn, identifying dangerous lesions and dodgy moles with the unerring consistency only a machine can muster. Just this month, Google generated a wave of headlines with a study showing that its AI systems can spot breast cancer in mammograms more accurately than doctors.
But for many in health care, what studies like these demonstrate is not just the promise of AI, but also its potential threat. They say that for all of the obvious abilities of algorithms to crunch data, the subtle, judgment-based skills of nurses and doctors are not so easily digitized. And in some areas where tech companies are pushing medical AI, this technology could exacerbate existing problems.
For Google’s mammogram paper, the main criticism is that the company is attempting to automate a process that’s already somewhat controversial. As Christie Aschwanden pointed out in Wired earlier this month, doctors have argued for years that early scans for breast cancer might harm as much as they help, and the introduction of AI could tip the balance.
“There’s this idea in society that finding more cancers is always better, but it’s not always true,” Adewole Adamson, a dermatologist and assistant professor at Dell Medical School, tells The Verge. “The goal is finding more cancers that are actually going to kill people.” But the problem is “there’s no gold standard for what constitutes cancer.”
As studies have found, you can show the same early-stage lesion to a group of doctors and get completely different answers about whether it's cancer. And even if they do agree on a diagnosis — and they're right — there's no way of knowing whether that cancer is a threat to someone's life. This leads to overdiagnosis, says Adamson: "Calling things cancer that, if you didn't go looking for them, wouldn't harm people over their lifetime."
As soon as you do call something cancer, it triggers a chain of medical intervention that can be painful, costly, and life-changing. In the case of breast cancer, that might mean radiation treatments, chemotherapy, the removal of tissue from the breast (a lumpectomy), or the removal of one or both breasts entirely (a mastectomy). These aren’t decisions to be rushed.
But the complexities of such a diagnosis are not given proper attention in Google’s study, says Adamson. First, the company’s researchers trained their algorithm on images that had already been identified as cancerous or not. But as there’s no gold standard for cancer diagnosis, particularly early cancer, it’s arguable whether such training data provides a good baseline. Second, Google’s algorithm only produces binary outcomes: yes, it’s cancer, or no, it’s not. As Adamson argues in a recent paper, there needs to be space for uncertainty, a third option that represents the gray area of diagnosis and that prolongs debate rather than closing it off.
When asked about these issues, the team from Google told The Verge that their algorithm's reduction in false positive rates (instances in which something is incorrectly identified as cancer) would lessen the threat of overdiagnosis. They also stressed that the paper was "early stage research" and that they would be investigating in the future the sort of nonbinary analysis that Adamson advocates.
“This is exactly the kind of research we will be engaging in with our partners as a next step,” said a Google Health spokesperson. “We hope to be exploring workflow considerations, user-interface considerations, among many other areas.”
For Adamson, though, these challenges are bigger than a single paper. Overdiagnosis, he says, “is a problem for a lot of different cancers; for prostate, melanoma, breast cancer, thyroid. And if AI systems become better and better at finding smaller and smaller lesions you will manufacture a lot of pseudo-patients who have a ‘disease’ that won’t actually kill them.”
Overdiagnosis is one challenge when integrating AI into medicine, but for some doctors, the roots of the issue run deeper. They’re found not in specific papers or algorithms, but in the AI world’s confidence that it can supplant a whole category of medical work: radiology.
In 2016, the AI pioneer Geoffrey Hinton (one of the three "godfathers of AI" who won the 2018 Turing Award) said: "People should stop training radiologists now. It's just completely obvious that within five years deep learning will do better than radiologists." In 2017, Google Brain co-founder Andrew Ng made a similar point while commenting on an algorithm that detects pneumonia from X-rays: "Should radiologists be worried about their jobs?"
The rhetoric has calmed down in recent years, but for actual radiologists, these soundbites have always sounded misguided and a bit insulting. (As Hinton noted in 2016 when he recalled prophesying about radiologists’ doom in a hospital: “It didn’t go down too well.”) Although algorithms are certainly able to spot specific features in medical imagery as well as doctors, that’s a far cry from being able to don a gown and start walking the wards.
The core of the issue is that radiologists don't just look at images, says Hugh Harvey, a radiologist and health tech consultant. "It's a complete misunderstanding of what radiologists do." The job, he says, "is more like reading a novel and trying to write a summary of what it's about."
As Harvey noted in a blog post back in 2018, it involves scheduling and prepping patients, collecting the data in various ways (from fluoroscopies, ultrasounds, biopsies, etc.), correlating this with other parts of the diagnosis, and engaging in all sorts of ancillary tasks, like teaching, training, and auditing others’ work. “AI really can’t replace what radiologists do in any meaningful sense,” says Harvey. “It can find things that are hard to find and show them to radiologists to get an opinion,” but not much more.
The origins of the AI world’s overconfidence here lie not in any particular vendetta against radiologists, but in the structural affinities of artificial intelligence itself. Machine vision has proved to be, by far, the greatest strength of deep learning, the dominant flavor of AI. It was an image recognition test that kickstarted the current AI boom in 2012, and it’s deep learning vision algorithms that underpin its most powerful applications, from self-driving cars to facial recognition.
Because of this, AI researchers have got plenty of mileage from applying relatively standard vision algorithms to medical datasets. This generates a lot of “firsts,” as AI learns to spot feature X in data Y and creates the impression of a fast-moving swell of technological progress. Doctors say that the most tool-like of these applications — those that simply flag features in data for doctors to verify — are the most useful. But the more complex ones that try to make their own diagnoses don’t necessarily deal with the underlying medical challenges. That’s especially true when many of the algorithms creating headlines have yet to be integrated into a clinical environment.
As Harvey puts it: “Deep learning is being used as a hammer, and tech companies are looking for nails, but some of the nails — they’re not quite right.”
If there is one consistent theme to be found in the borderlands of AI and medicine, it’s that problems are just not as simple as they first seem.
Health care reporter Mary Chris Jaklevic pointed out in a recent article that a lot of the misinformation here stems from the “machine versus doctor” narrative found in so many AI studies and the subsequent reporting. Such a narrative is both clicky and sticky, attracting readers’ interest in the moment and shaping their understanding of future debate. But it’s also one-dimensional and reduces the complexities of medical diagnosis to a few numbers. Doing so fails to account for the parts of health care work that are less easy to quantify.
Despite this, most experts involved in this work — be they programmers or doctors — are still cautiously optimistic about AI's potential in health care. As Adamson notes, it's the ability of AI to scale that makes it so powerful, and that scale is both the source of its promise and the reason for caution.
Once an algorithm has been exhaustively vetted, he notes, and the intricacies of how it will fit into the diagnostic process are worked out, it can be deployed quickly and easily almost anywhere in the world. But if those tests are rushed, then bad side effects like overdiagnosis will multiply just as fast.
“I don’t think AI should be thrown in the dustbin, quite the contrary,” says Adamson. “It has the potential to do good things, if designed appropriately. My concern isn’t with AI as a technology, but how we will apply it.”