Virtual Cell
A digital twin for the future of medicine

AI-generated image (ChatGPT)
Helmholtz researchers are developing a precise, adaptive digital model of the cell. Trained on millions of biological data points, the model will be ready for virtual experiments.
Cells are the building blocks of life — tiny yet infinitely complex. For decades, researchers have been trying to understand how they work and communicate with each other. They have also sought to understand what goes wrong when diseases develop. However, the view inside cells has often remained blurred. It's like trying to watch a clockwork mechanism through a frosted glass pane. The VirtualCell project aims to change that. The goal is to create a digital twin of the human cell — a model that simulates processes in real time — not just for one cell, but millions.
Fabian Theis, head of the Computational Health Center and director of the Institute of Computational Biology at Helmholtz Munich, is leading the project. He believes that the vast amounts of data produced in biomedicine hold great potential in the era of artificial intelligence (AI) and machine learning. The Human Genome Project (HGP), which ran from 1990 to 2003, needed 13 years and nearly US$3 billion to produce the first complete sequence of the human genome. Today, sequencing is faster, cheaper, and far more powerful. The Human Cell Atlas, in which Theis is involved, is another large-scale data project. It has already mapped nearly 60 million human and mouse cells, showing which cell types exist in the body, where they occur, and which genes are active in which cells.
With VirtualCell, Theis and experts from the Max Delbrück Center, the Forschungszentrum Jülich, and the chip manufacturer Nvidia aim to take this even further. Theis describes it as a "multimodal foundation model": a digital twin of the cell that processes not only genome data, but also information about proteins, spatial structures, and other components and processes within cells. Various levels of information are incorporated, such as the transcriptome, which reveals which genes are active in a cell, even among cells of the same type. The DNA of active genes is transcribed into RNA; by analyzing the RNA present in a cell, researchers can infer gene activity and the cell's location within a tissue. Together, these analyses provide a comprehensive picture of the cell and its environment. "If we can use VirtualCell to visualize this expanded analysis of complex cell processes and interactions, it would revolutionize our understanding of cell functions and disease progression," says Theis. "Multimodal foundation models can capture the molecular states of cells much more accurately than before, across different cell types and conditions," he adds. This would allow for the creation of comprehensive, interconnected maps of cells, genes, and tissues, providing new insights into the organization and function of living systems.
Fabian Theis. Image: Helmholtz / Till Budde
The three-year project, funded as part of the Helmholtz Foundation Model Initiative (HFMI), will begin by optimizing an existing model from Theis's lab. NicheFormer is an AI system trained on over 110 million cell samples from humans and mice to predict spatial patterns in tissues. NicheFormer can already recognize disease patterns and simulate how cell clusters react to drugs, eliminating the need for complex microscopy.
VirtualCell surpasses NicheFormer in several ways. It processes a wider variety of data, classifies specific cell types, links cell behavior with spatial structures, and has a broader range of applications in medical practice. VirtualCell could set new standards in medicine by enabling us to understand the cellular basis of diseases ranging from cancer to autoimmune disorders. A realistic simulation model of these processes opens up enormous opportunities: "Drugs can be developed more precisely, disease progression can be predicted more accurately, and therapies can be tailored to individual patients," Theis explains. Similar to how pilots practice critical maneuvers risk-free in a flight simulator, researchers could use VirtualCell to simulate how cells behave when genetically modified, under stress, or in contact with active substances. This can be done without animal testing or risk to patients.
Some of the data entered into VirtualCell is initially masked for training purposes. Certain details, such as individual genes or gene characteristics, are hidden, and the system must reconstruct them from the context. Through countless repetitions, VirtualCell learns to independently fill in these gaps and recognize connections and patterns that would escape even the most trained human eye. Later, VirtualCell will be applied to specific clinical questions. For example, it could be used to predict the course of a disease or design biomarkers for personalized medicine.
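This masked-reconstruction training can be sketched in a few lines. The toy example below (the data, the fixed masking scheme, and the mean-imputation "model" are all illustrative stand-ins, not the project's actual architecture) hides one expression value per gene and scores a reconstruction against the hidden ground truth; a real foundation model would predict the masked values from learned cross-gene context rather than a column mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 4 cells x 6 genes (values stand in for read counts).
expression = rng.poisson(lam=5.0, size=(4, 6)).astype(float)

# Hide one entry per gene, as in masked pretraining: the model sees only the rest.
mask = np.zeros_like(expression, dtype=bool)
mask[rng.choice(4, 6), np.arange(6)] = True
visible = np.where(mask, np.nan, expression)

# Stand-in "model": impute each masked value from its gene's mean over visible cells.
col_means = np.nanmean(visible, axis=0)
predicted = np.where(mask, col_means, visible)

# Reconstruction error on the hidden entries is the training signal.
mse = np.mean((predicted[mask] - expression[mask]) ** 2)
print(f"masked entries: {mask.sum()}, reconstruction MSE: {mse:.2f}")
```

Repeated over millions of cells, minimizing this kind of reconstruction error is what forces a model to internalize the dependencies between genes.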
"In the future, many major research questions will probably be solved using foundation models. We at the Helmholtz Association want to ensure that these models and data do not end up exclusively in industrial hands, where there is potentially a high degree of opacity," says Theis. As with other HFMI projects, all VirtualCell components — from the code and training data to the results — will be made available to the entire research community as open source, in accordance with the FAIR principles.
VirtualCell attempts to make the invisible visible. It is a digital twin that shows not only what is, but also what could be. If successful, VirtualCell could open new avenues for precisely simulating the effects of molecular perturbations, with the potential to fundamentally rethink disease mechanisms and therapies.
What are foundation models?
Foundation models are powerful AI models trained on large, diverse data sets. Like a Swiss Army knife for data analysis, they form the basis for many different tasks. Rather than building a separate model for each task, foundation models are fed so much information that they recognize general patterns: how language works, how images are structured, how cells behave. They learn to spot similarities, fill gaps, and capture connections on their own, without task-specific supervision. Foundation models can then be fine-tuned to answer specific questions, such as predicting disease progression, modeling climate change, or tracking aspects of the global carbon cycle.
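The pretrain-then-fine-tune pattern described above can be illustrated with a deliberately simplified sketch (all data is synthetic, and the SVD embedding plus least-squares head merely stand in for a learned representation and task-specific layer):

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pretraining": learn a 2-D embedding from a large unlabeled dataset via SVD,
# a stand-in for the general representation a foundation model acquires.
broad_data = rng.normal(size=(500, 10))
_, _, vt = np.linalg.svd(broad_data, full_matrices=False)
embed = vt[:2].T  # frozen 10 -> 2 projection, reused for downstream tasks

# "Fine-tuning": fit a small task-specific head on a few labeled examples,
# using the frozen embedding as features (here, ordinary least squares).
task_x = rng.normal(size=(20, 10))
task_y = task_x @ rng.normal(size=10)  # synthetic regression target
features = task_x @ embed
head, *_ = np.linalg.lstsq(features, task_y, rcond=None)

# Predictions for new task data reuse the pretrained embedding plus the head.
new_x = rng.normal(size=(3, 10))
pred = (new_x @ embed) @ head
print(pred.shape)  # (3,)
```

The point of the pattern is economy: the expensive, general-purpose representation is learned once, while each new question only requires fitting a small head on modest amounts of labeled data.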