You may want to think twice about using powerful artificial intelligence (AI) programs such as ChatGPT to self-diagnose health problems.

A team led by researchers at the University of Waterloo found in a simulated study that ChatGPT-4o, the well-known large language model (LLM) created by OpenAI, answered open-ended diagnostic questions incorrectly nearly two-thirds of the time.

"People should be very cautious," said Troy Zada, a doctoral student at Waterloo. "LLMs continue to improve, but right now there is still a high risk of misinformation."

The study used almost 100 questions from a multiple-choice medical licensing examination. The questions were modified to be open-ended and similar to the symptoms and concerns real users might ask ChatGPT about.

University of Waterloo researchers Troy Zada (left) and Dr. Sirisha Rambhatla. (University of Waterloo)
Medical students who assessed the responses found just 37 per cent of them were correct. About two-thirds of the answers, whether factually right or wrong, were also deemed unclear by expert and non-expert assessors.

One question involved a man with a rash on his wrists and hands. The man was said to work on a farm every weekend, study mortuary science, raise homing pigeons, and use a new laundry detergent to save money.

ChatGPT incorrectly said the most likely cause of the rash was a type of skin inflammation caused by the new detergent. The correct diagnosis? His rash was caused by the latex gloves he wore as a mortuary science student.

"It's very important for people to be aware of the potential for LLMs to misinform," said Zada, who was supervised for this paper by Dr. Sirisha Rambhatla, an assistant professor of management science and engineering at Waterloo.

"The danger is that people trying to self-diagnose will get reassuring news and dismiss a serious problem, or be told something is very bad when it's really nothing to worry about."

Although the model didn't get any questions spectacularly or ridiculously wrong, and performed significantly better than a previous version of ChatGPT the researchers also tested, the study concluded that LLMs just aren't accurate enough to rely on for any medical advice yet.

"Subtle inaccuracies are especially concerning," added Rambhatla, director of the Critical ML Lab at Waterloo. "Obvious mistakes are easy to identify, but nuances are key for accurate diagnosis."

It is unclear how many Canadians turn to LLMs to help with a medical diagnosis, but a survey found that one in 10 Australians have used ChatGPT to help diagnose their medical conditions.

"If you use LLMs for self-diagnosis, as we suspect people increasingly do, don't blindly accept the results," Zada said. "Going to a human health-care practitioner is still ideal."

The study team also included researchers in law and psychiatry at the University of Toronto and St. Michael鈥檚 Hospital in Toronto.

The study appeared in JMIR Formative Research.

Feature image: Tero Vesalainen / Getty Images
