Interview

AI predicts disease risk

Picture: Greenbutterfly/Shutterstock

A new AI model can predict personal risk for over 1,000 diseases years in advance. It was developed by an international research team led by the European Molecular Biology Laboratory (EMBL) and the German Cancer Research Center (DKFZ). In an interview, bioinformatician Moritz Gerstung explains how the AI works, what it can do, and why it is not yet being used in clinics.

Originally, we only wanted to develop models for cancer, with the aim of calculating the risk of different types of cancer for individuals. However, when the first large language models with diverse capabilities emerged a few years ago, we asked ourselves: why limit our modelling to cancer? We therefore included most of the diseases listed in the International Classification of Diseases (ICD-10) in the AI. This resulted in more than 1,000 diseases, and our AI can now provide a personalised risk prognosis for each one.

Not quite. Our model, Delphi-2M, is unique in that it is designed specifically for diagnoses; you could call it a large diagnostic model. It processes sequences of diseases that have occurred over a lifetime in the same way that language models process sequences of words. It recognises patterns in the sequence of diagnoses, in a similar way to how grammatical structures are learnt in texts.

A language model proceeds word by word. It considers each word individually and, when presented with many sequences of words, it is trained to recognise the logic of a language. In a sense, it can speak for itself. The principle is similar with the medical data and lifestyle information that we used to train our model. The AI looks at diagnosis after diagnosis in sequence and recognises patterns, learning which diagnoses, i.e. which diseases, are likely to follow. A major challenge here was the time component: with words, only the order matters, but with disease risks, time is also important. We therefore had to develop a model that takes into account not only the sequence of diagnoses, but also the time intervals between them.
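The idea of treating a medical history as a timed sequence can be illustrated with a minimal Python sketch. This is not Delphi-2M's actual input format: the `Event` class, the `to_training_pairs` function and the example history are hypothetical, chosen only to show how each diagnosis could become a prediction target together with the waiting time since the previous one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One diagnosis in a person's history (hypothetical structure)."""
    age_years: float  # age at which the diagnosis was recorded
    code: str         # ICD-10 style code, e.g. "I21"


def to_training_pairs(history):
    """Turn a history into (context, next_code, waiting_time) examples.

    The context is everything recorded so far, the target is the next
    diagnosis, and the waiting time is the gap in years since the
    previous diagnosis. A plain word sequence has no such gap.
    """
    events = sorted(history, key=lambda e: e.age_years)
    pairs = []
    for i in range(1, len(events)):
        context = [(e.age_years, e.code) for e in events[:i]]
        waiting_time = events[i].age_years - events[i - 1].age_years
        pairs.append((context, events[i].code, waiting_time))
    return pairs


# Made-up example: hypertension (I10), type 2 diabetes (E11),
# then a heart attack (I21), deliberately given out of order.
history = [Event(52.0, "E11"), Event(45.5, "I10"), Event(61.25, "I21")]
pairs = to_training_pairs(history)
for context, next_code, waiting_time in pairs:
    print(context, "->", next_code, "after", waiting_time, "years")
```

Each example asks two questions at once: which diagnosis comes next, and how long until it occurs. The second question is what distinguishes this setting from ordinary text.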

Professor Dr Moritz Gerstung is the head of the Artificial Intelligence in Oncology department at the German Cancer Research Center (DKFZ) in Heidelberg, and is one of the authors of the new study. Picture: Jutta Jung / DKFZ

We trained the AI using data from the UK Biobank, which covers 400,000 people in total. This data naturally included diagnoses, as well as information such as body mass index, alcohol consumption and nicotine use. Put simply, the AI examined all the data and recognised patterns within it. In this way, the AI identified many facts that are now considered basic medical knowledge. For instance, it recognised that smoking is associated with an increased risk of lung cancer. However, the AI was also able to calculate this risk on an individual basis over time. Of course, there is always a risk of bias because a model is only as good as the underlying data.

The AI can also pick up patterns that do not correspond to reality, but which arise from the peculiarities of the dataset. All 400,000 participants in this UK Biobank selection, for instance, were recruited between the ages of 40 and 70. The AI therefore concluded that it is practically impossible to die before the age of 40. This is clearly untrue, and it remains challenging for us to detect and correct such distortions.

We tested the accuracy of the AI's predictions by trying it out on a further 100,000 people from the UK Biobank whose data the model had not seen during training. For example, we provided the AI with data on patients up to the age of 60, and then asked it to calculate the annual risk of heart attack based on their respective diagnoses and lifestyles. For men aged 60–65, for example, the AI model calculated a heart attack risk ranging from 4 in 10,000 to 100 in 10,000, depending on previous diagnoses and lifestyles. We then compared these individual risks with what actually happened to the patients, and found that the predicted risks matched the observed rates almost exactly. We also tested the model on Danish registry data from 1.9 million people and saw that, with a few minor adjustments, it also works across national borders. That was the acid test – and the AI passed it!
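Comparing predicted risks with what actually happened is commonly called a calibration check. The following Python sketch shows the general idea under stated assumptions: the function name `calibration_table`, the grouping into equal-sized bins and the toy numbers are illustrative only, and it is not the evaluation procedure used in the study.

```python
def calibration_table(predicted, observed, n_bins=2):
    """Compare mean predicted risk with the observed event rate per group.

    People are sorted by predicted risk and split into n_bins groups of
    roughly equal size. In a well-calibrated model, the two numbers in
    each returned (mean_predicted, observed_rate) row are close.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        group = order[start:end]
        mean_predicted = sum(predicted[i] for i in group) / len(group)
        observed_rate = sum(observed[i] for i in group) / len(group)
        rows.append((mean_predicted, observed_rate))
    return rows


# Toy data, invented for readability: eight people, with risks far higher
# than the 4 to 100 in 10,000 per year mentioned in the interview.
predicted = [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5]
observed = [0, 1, 0, 0, 1, 0, 1, 0]  # 1 = event occurred, 0 = it did not
rows = calibration_table(predicted, observed)
for mean_predicted, observed_rate in rows:
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}")
```

With real data, each group would need to contain many thousands of people, because an annual risk of 4 in 10,000 produces very few events in a small group.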

There are still a few hurdles to overcome before that can happen. The requirements for medical use are high, and rightly so. First, instead of evaluating the accuracy of the AI's risk predictions on past data, we need prospective studies that follow people forward in time. Of course, that takes a long time. Secondly, we need to establish whether the model is effective at a national population level. If we want to apply the model in Germany, for example, we should ideally train it using German health data.

A lot is happening in this area at the moment, and things are moving in the right direction: the data is being standardised, a corresponding register is being set up, and the data may soon be available in a similar format to that in Denmark or the UK.

We are currently working on this, as it is an important step that we need to take. Our aim is to incorporate blood values, detailed lifestyle information and, potentially, data from fitness trackers and other sources. After all, the more data the AI can process, the better the predictions will be.

Of course, this must all be completely voluntary, and people also have the right not to know. Our work shows that AI has great potential and that there are various possible applications. First, information could be provided about individual risks and modifiable risk factors. Based on this, diagnostic tests could be carried out for clarification, or participation in preventive care programmes could be planned. These could be carried out earlier or more frequently for high-risk patients.

That's right. For example, people who have very low disease risks can have the corresponding preventive examinations carried out later or at longer intervals. Ultimately, the aim is to provide more targeted and individualised preventive care and early detection. If successful, this can help on two levels: it can strengthen individual health and it can reduce costs in some areas of the healthcare system.

AI model predicts disease risks decades in advance (DKFZ)

Artem Shmatko, Alexander Wolfgang Jung, Kumar Gaurav, Søren Brunak, Laust Mortensen, Ewan Birney, Tom Fitzgerald & Moritz Gerstung: Learning the natural history of human disease with generative transformers. Nature 2025
