How AI Chatbots Pose Risks When Used for Medical Advice

Recent studies sharpen doubts about using AI in clinical settings without human oversight, even as such tools spread rapidly among patients and clinicians.

    Artificial intelligence chatbots remain unreliable when used for medical advice, with new research showing they are prone to accepting and repeating false information, especially when errors are presented in authoritative clinical language.

    A study published in The Lancet Digital Health found that large language models are vulnerable to medical misinformation embedded in prompts that resemble real hospital records or clinical guidance. 

    Researchers from the Icahn School of Medicine at Mount Sinai tested 20 open-source and proprietary AI models using millions of prompts drawn from hospital discharge notes, simulated clinical scenarios and social media posts, all containing fabricated medical recommendations.

    The models were exposed to three types of content: hospital discharge summaries with a single fabricated recommendation, common health myths taken from Reddit, and hundreds of short clinical scenarios written by physicians.

    “Current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” said Dr Eyal Klang, a co-lead author of the study. “For these models, what matters is less whether a claim is correct than how it is written.”

    Overall, the AI systems accepted and propagated false information about 32% of the time. When the misinformation appeared in realistic-looking hospital notes, that figure rose to nearly 47%, said Dr Girish Nadkarni, chief AI officer of the Mount Sinai Health System. By contrast, when the misinformation came from Reddit posts, the rate dropped to 9%.

    Among the LLMs tested, OpenAI’s GPT models were the least susceptible to false claims, while others accepted incorrect information in more than 60% of cases.

    Meanwhile, in a separate study, researchers from the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences said using AI chatbots to seek medical advice can be “dangerous.”

    The research involved nearly 1,300 participants who were asked to assess personal medical scenarios and decide on appropriate actions, such as seeing a general practitioner (GP) or going to hospital. One group used AI chatbots for guidance, while others relied on traditional sources, including their own judgment.

    The study found no improvement in decision-making among those who used AI. Instead, chatbot responses often combined accurate advice with misleading or incorrect information, leaving users uncertain about the safest course of action.

    “Despite all the hype, AI just isn’t ready to take on the role of the physician,” said Dr Rebecca Payne, a GP and co-author of the Oxford study. She warned that people using chatbots to interpret symptoms could receive wrong diagnoses or fail to recognise when urgent medical care is needed.

    Lead author Andrew Bean said the results highlighted a gap between how AI systems perform in tests and how they behave in real-world use. “Interacting with humans poses a challenge even for top-performing models,” he said.

    Together, the studies suggest that while AI systems can perform well on standardized medical exams, they remain unreliable when used by patients or clinicians in real-life situations, particularly when misinformation is presented in authoritative language.
