Interview

Can language models such as ChatGPT diagnose diseases?

Picture: Talia Grace Haller

A recent study shows in which areas language models are already effective in everyday medical practice and in which areas they are not. We spoke to Altuna Akalin, the study's lead researcher, about the opportunities and limitations of human-like AI in everyday clinical practice.

We investigated how AI can support patients and doctors on three levels: First, the AI should suggest the appropriate specialist; second, it should make a diagnosis; and third, it should assess the urgency of the case. For each level, we ran through two scenarios to see how AI could help. The first scenario was that a patient was at home, so only symptoms and general data such as age and gender were available. In the second scenario, we assumed that the patient was at a doctor's office, so additional information such as pulse and blood pressure was available.

We developed four different clinical decision support workflows that use Claude, a powerful language model developed by the US company Anthropic, as their central language component. However, it wasn't just a matter of feeding the model inputs; the workflows also required comprehensive conceptual and technical design.
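To make the idea of such a workflow concrete, here is a minimal sketch of how a referral step like the one described might be structured. The study's actual prompts, schemas, and Claude integration are not shown in the interview, so all function names, prompt wording, and the parsing logic below are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of one clinical decision support workflow step:
# assemble a structured case description, which would then be sent to a
# language model such as Claude, and parse a ranked specialty list back.
# All names and prompt wording here are assumptions for illustration.

def build_referral_prompt(symptoms, age, sex, vitals=None):
    """Assemble a prompt asking the model to rank up to three specialties."""
    lines = [
        "You are assisting with specialist referral.",
        f"Patient: age {age}, sex {sex}.",
        "Symptoms: " + "; ".join(symptoms) + ".",
    ]
    if vitals:  # second scenario: additional data from a doctor's office visit
        lines.append(
            "Vital signs: " + "; ".join(f"{k}: {v}" for k, v in vitals.items()) + "."
        )
    lines.append("List the three most likely specialties, most likely first, one per line.")
    return "\n".join(lines)

def parse_specialties(model_reply):
    """Extract a ranked specialty list from a plain-text model reply."""
    return [line.strip("- ").strip() for line in model_reply.splitlines() if line.strip()]

# Example: the doctor's office scenario, where vital signs are available.
prompt = build_referral_prompt(
    ["chest pain on exertion", "shortness of breath"],
    age=58, sex="male",
    vitals={"pulse": "96 bpm", "blood pressure": "150/95 mmHg"},
)
# In the home scenario, the same call would simply omit the vitals argument.
```

The point of the sketch is the conceptual design the interviewee mentions: the workflow controls what the model sees in each scenario and constrains the output format so the top-three suggestions can be scored automatically.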

At the first level, recommending a suitable specialist, the AI was very reliable. When only symptoms were provided, the correct specialty appeared among the first three suggestions in 87 percent of cases. Giving the AI vital signs as well, as would be recorded during a visit to the family doctor, increased the success rate to 97 percent. The AI also performed very well at diagnosis: the most effective AI workflow correctly identified the disease in over 82 percent of cases, even without vital signs.

It's important to note that the aim is not to replace doctors with AI, as this is not possible. However, doctors sometimes need a sparring partner to exchange professional ideas with, and AI could provide exactly that. In most cases, the diagnosis is clear, so doctors don't need any help. However, in complicated cases in particular, it can be helpful for doctors to get a second opinion directly from AI. Incidentally, it was a complicated case in my personal life that first made me consider investigating AI as a diagnostic aid. My wife had various symptoms, numerous tests were carried out, and we were sent to different doctors. The amount of documentation grew with each visit. It took many months before we discovered that my wife had a rare disease. At that point, I thought: wouldn't it be a good idea to feed all the findings into an artificial intelligence system for a second opinion?

AI can certainly be helpful when it comes to diagnosing rare diseases, of which there are thousands. If a patient has various abnormal findings that do not seem to correspond to any known diseases, a doctor may be able to identify something by searching the specialist literature. This process can take more than an hour, which is often not feasible in everyday clinical practice. AI can perform this task in seconds if it has access to all the patient's data. While we did not specifically examine such complex cases in the study, it stands to reason that AI can often help in these situations.

AI was less accurate in this area. The good news is that it never misclassified a serious, life-threatening condition as harmless. This means that it would not be responsible for patients in life-threatening situations not receiving quick treatment. However, AI misjudged moderate cases more often. This resulted in some patients being sent to the emergency room unnecessarily, while others were not recommended for the emergency room despite it being advisable. In emergency care triage, such misjudgements should account for less than 5 percent of cases, a threshold that AI has not yet achieved.

Put simply, we need to fine-tune the AI further. AI is a learning system, so it needs to process more cases in order to learn whether its decisions were right or wrong, and why. Then it will be able to make better decisions. There is therefore still a lot of potential for improvement in triage with AI.

This will certainly not happen overnight, but rather step by step. For AI to be accepted, a user-friendly interface is essential: the AI should make interaction with medical professionals as easy as possible. For example, it would be helpful if you could simply talk to the AI instead of having to type everything into a chat window. This would probably lower the threshold for use significantly. The framework conditions would also need to improve: findings must, of course, be available in digital form so they can easily be passed to the AI and processed.

This is, of course, a major issue, particularly in Germany. Data protection must be considered from the outset, even when AI interacts with patients. This can be challenging because the field is developing rapidly, but I think adequate protection of patient data can be ensured if you stay on top of it.

We have two main areas of research. Firstly, we are conducting a study on how AI can support therapy decisions for cancer patients. Secondly, we are developing various AI second opinion tools to support both doctors and patients. All of these research activities are challenging because the field is evolving extremely quickly. For example, we may be training an AI language model today, but there may be a better one in six months. Nevertheless, we are adapting to this dynamic environment and are eager to embrace this challenge in order to continue our research in this exciting and promising field.

Altuna Akalin is head of the Bioinformatics and Omics Data Science Platform at the Berlin Institute for Medical Systems Biology of the Max Delbrück Center in Berlin.

The study: Farieda Gaber, Maqsood Shaik, Fabio Allega, et al. (2025) “Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis,” npj Digital Medicine. DOI: 10.1038/s41746-025-01684-1
