Helmholtz researcher Fabian Theis teaches machines to learn so they can support the work of doctors and scientists. Theis spoke about the benefits offered by big data and machine learning at the Symposium for Individualized Infection Medicine in Hanover, Germany.
Laboratory technologies in the life sciences have progressed rapidly in recent years. A process that took an entire decade and consumed enormous resources in the 1990s – the sequencing of the human genome – is now carried out thousands of times each day in labs around the world. Gene transcripts are likewise sequenced on a routine basis. These short-lived copies of genes carry the blueprint for the proteins that a cell is producing at the time. Analyzing the transcriptome, which is the set of all the transcripts, provides researchers with information on the current condition of a cell, a tissue, or even an entire organism. Micro-scale methods have now developed to the point that the transcriptome or characteristics of individual cells can be examined in detail. This enables researchers to characterize different types of cells, their stages of development, or the ways they react to medications, for example.
However, these analyses create vast quantities of data – a phenomenon also referred to as big data. In addition to almost limitless series of genetic sequences, big data can also include other measured values or microscopic images. But most biologists and medical experts are not specialists in statistics or computer science. In other words, they need support to manage this flood of data.
Interpreting big data
One person they can turn to for this support is Fabian Theis, Director of the Institute for Computational Biology at the Helmholtz Zentrum München. “Big data doesn’t just mean large quantities of data, of course. The term also indicates that the data is complex and heterogeneous, and that it would be virtually impossible to interpret without the help of computers,” Theis explains. And while this data presents a challenge to many researchers in the life sciences, it is a big advantage for Theis’ research in the truest sense of the word: The more data he has access to, the more precise his results.
Theis holds a doctorate in physics and computer science, and one thing about him stands out in particular: He is incredibly enthusiastic about his work. During his lectures, he juggles figures and formulas that would make the average person’s head spin. And expert audiences, whether at Harvard University in the U.S. or in Hanover, Germany, value his expertise very highly. This is because Theis and his team have already developed numerous methods that make it possible to efficiently trawl through mountains of data in search of the latest findings.
His colleague Niklas Köhler, for example, is working on methods for searching through thousands of medical images of the ocular fundus for signs of diseased retinal tissue in order to prevent patients from losing their sight. And Alexander Wolf is mapping the development of stem cells in an organism with the help of big data analytics.
Teaching machines to learn
Theis carries out his work using machine learning, a method applied in the field of artificial intelligence. While computer programs that enable machines to learn from data have existed since the 1960s, a type of algorithm known as artificial neural networks has seen a revival in recent years. Thanks to the increase in their computing power, modern computers can work with software that is significantly more complex. Today’s neural networks sort and categorize characteristics on a number of hierarchical levels and “learn” from their experiences. This method is referred to as "deep learning" due to the numerous levels of learning involved. Deep learning enables neural networks to independently grasp the concepts underlying biological processes, for example.
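To make the idea of "hierarchical levels" more concrete, here is a minimal sketch of data passing through two stacked layers of a toy neural network. It is an illustration only, not code from Theis' group: the function names, the layer sizes, and the hand-picked weights are all assumptions, and a real network would learn its weights from data rather than have them written down.

```python
import math

def dense_layer(inputs, weights, biases):
    """Apply one fully connected layer followed by a tanh activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs

def forward(inputs, layers):
    """Pass the inputs through every layer in order and keep all activations."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(dense_layer(activations[-1], weights, biases))
    return activations

# Hypothetical, hand-picked weights; a real network learns these from data.
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # level 1: 3 inputs -> 2 features
    ([[1.0, -1.0]], [0.0]),                              # level 2: 2 features -> 1 output
]

activations = forward([1.0, 2.0, 3.0], LAYERS)
for level, values in enumerate(activations):
    print(f"level {level}: {values}")
```

Each level works only with the output of the level before it, which is what allows deeper levels to represent more abstract characteristics than the raw measurements.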
The researchers in Theis’ working group develop algorithms for machine learning and deep learning so the team can understand how diseases progress or how an organism develops as a whole. And this is where big data comes in: The denser the data, the better the software can learn from it – and the higher the resolution of the pattern it detects in the data. As a result, greater quantities of data lead to more precise descriptions, particularly when continuous processes are being examined.
First, however, Theis and his colleagues need to train their algorithms for each phenomenon they want to research. They do this by feeding training data into the computer. "We supply this data with our own predictions," says Theis. "The neural network then interpolates them. It essentially gathers experiences and learns to make its own predictions based on the new data we input."
The researchers therefore divide each dataset into two parts: a training set for the new algorithm and an analysis set for gleaning insights. The more data provided to the program for learning purposes, the more precisely it can depict the process being studied. And the more data included in the analysis set, the more detailed the result.
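The split described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the group's actual procedure: the function name `split_dataset`, the 80/20 ratio, and the fixed random seed are all hypothetical choices.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and split it into training and analysis sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in data: 100 numbered "cells" instead of real measurements.
cells = list(range(100))
training_set, analysis_set = split_dataset(cells)
print(len(training_set), len(analysis_set))
```

Shuffling before the cut keeps both parts representative of the whole dataset, and no sample ends up in both sets.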
Using this approach, Theis and his colleagues have been able to teach computers to sort cells according to their stage of cell division based on microscopic images. Modern flow cytometry devices collect thousands of images of this type in a very short space of time. Niklas Köhler and Alexander Wolf programmed deep learning algorithms to evaluate cell characteristics such as size, shape, and texture. They then trained their system, which they called “DeepFlow,” on images of more than 30,000 cells whose stage of division they had already determined. This enabled DeepFlow to classify cells according to their specific stage of division. But that’s not all. The system also reconstructed the entire process of cell division based on similarities between the individual cells. DeepFlow had used its observations to draw its own logical conclusions.
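The principle of learning from labeled cells and then classifying new ones can be shown with a much simpler stand-in. The sketch below is not DeepFlow and uses no deep learning: it is a nearest-centroid classifier, chosen only because it fits in a few lines. The feature values for size, roundness, and texture, the stage labels, and all function names are invented for illustration.

```python
import math
from collections import defaultdict

def train_centroids(labeled_cells):
    """Average the feature vectors of all training cells that share a stage."""
    grouped = defaultdict(list)
    for features, stage in labeled_cells:
        grouped[stage].append(features)
    return {
        stage: tuple(sum(column) / len(column) for column in zip(*vectors))
        for stage, vectors in grouped.items()
    }

def classify(features, centroids):
    """Return the stage whose average feature vector is closest to the cell."""
    return min(centroids, key=lambda stage: math.dist(features, centroids[stage]))

# Hypothetical training data: (size, roundness, texture) with a known stage.
TRAINING = [
    ((10.0, 0.9, 0.2), "interphase"),
    ((11.0, 0.8, 0.3), "interphase"),
    ((15.0, 0.5, 0.7), "metaphase"),
    ((16.0, 0.4, 0.8), "metaphase"),
]

centroids = train_centroids(TRAINING)
print(classify((10.4, 0.85, 0.25), centroids))
print(classify((15.8, 0.45, 0.75), centroids))
```

A deep network differs in that it works out useful characteristics from the raw images itself instead of relying on hand-chosen measurements, but the workflow of training on labeled examples and then predicting is the same.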
"The process of training deep learning based on image recognition has been established for some time," says Theis. Given his curiosity, he is looking forward to new challenges. "What’s exciting for us is applying these methods to data from the genome or proteome as well – for example, the sequences from the serial analysis of entire transcriptomes gained from many individual cells."
Applying deep learning in the field of medicine
Learning algorithms like those developed by Theis’ working group can also prove useful in the diagnostic evaluation of medical data. A wide range of examinations is taken into account when deciding how a patient will be treated. The results may take the form of images, text, or simply measured values. Going through this material requires a great deal of time and effort on the part of the doctors caring for the patient. Deep learning algorithms can simplify this process by evaluating the available data and presenting the results to the doctors in a clear way.
Nonetheless, medical experts tend to regard these computer-based evaluations with skepticism. "Doctors justify their proposals when discussing various treatment strategies with their colleagues. But the software doesn’t do that, and doctors won’t necessarily understand why it’s come up with that result right away," Theis explains. "They view self-learning algorithms as a black box and are reluctant to put the suggestions into practice during treatment."
Shedding light on the black box
For this reason, Theis came up with a method that lends transparency to the decision-making paths used by a deep learning algorithm. He uses visualization software to show the respective status of the evaluation at the deep levels of the neural network. The visualization illustrates how the algorithm weighed individual characteristics in order to grasp the underlying concept. In this way, Theis hopes to shed light on what is going on in the black box and provide medical experts with reliable software that can support them when making decisions.
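What it means to show how an algorithm "weighed individual characteristics" can be illustrated with a deliberately simple case. The sketch below assumes a linear scoring model, in which each characteristic contributes its weight multiplied by its value. This is a simplification for illustration and not the visualization method described above; deep networks are not linear, and the weights, feature names, and values here are hypothetical.

```python
def feature_contributions(weights, features):
    """Return each feature's contribution (weight * value), largest magnitude first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical model weights and measurements for a single cell.
WEIGHTS = {"size": 0.5, "shape": -2.0, "texture": 1.0}
cell = {"size": 4.0, "shape": 0.5, "texture": 0.25}

ranking = feature_contributions(WEIGHTS, cell)
for name, contribution in ranking:
    print(f"{name:8s} {contribution:+.2f}")
```

A ranked list like this gives a doctor a reason to go with the result: it shows which characteristics pushed the score up or down, and by how much.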
In particular, cancer patients or people with viral or bacterial infections could benefit from this. Thanks to modern analysis methods and a smart approach to interpreting data, their treatment could be ideally adapted to meet their individual requirements and the course of the disease.