One more SSF grant to CST in Big Data

Large-scale Machine learning for single cell revolution

Published Feb 17, 2017

This project aims to provide large-scale computational methods for high-throughput single-cell medical investigations, for instance in cancer research and regenerative medicine.

Most people are aware that their genes control traits such as their appearance and susceptibility to disease. The same, or at least a very similar, set of about twenty thousand genes resides in the DNA of each of the tens of trillions of cells in your body. Still, individual cells differ greatly, both in their appearance and in the function they fulfil. The difference between cells instead lies in how their genes are expressed, that is, how many times a gene's DNA is read and used as a template for the production of proteins, which are the actors in the cell that perform most of its functions. For instance, the genes controlling your eye color are present in all your cells, but are possibly only expressed in your eyes.
Recent technological developments have made methods available for determining which genes individual cells are expressing, so-called single-cell techniques. As the methods measure a very large and complex system, these technologies are very data intensive by nature. The resulting datasets are so large and complex that traditional data processing methods are inadequate.
Meanwhile, a branch of computer science known as machine learning has made spectacular progress over the last couple of years. Machine learning offers methods for computers to learn from the data presented to them rather than from explicit instructions from a programmer. In many areas of modern life, machine learning algorithms are used for making predictions and for helping their users interpret their surroundings. Such methods are already a prerequisite for the interpretation of molecular biological data. In this application, we propose machine learning methods for the interpretation of single-cell data, an area where we will undoubtedly see a deluge of data over the next couple of years.
Such techniques will be particularly useful in cancer research, where the technology will help us understand the interaction between the different cells of a tumor and distinguish between different types and stages of cancer, which will greatly aid clinical decisions. In regenerative medicine, such technology will help us understand how to alter cells so that they can perform new tasks, potentially replacing damaged cells. For instance, one could produce neurons for neurologically impaired patients or insulin-producing beta cells for diabetics.


Lukas Käll (Bioskolan KTH)
Johan Hartman (Oncology-Pathology, KI)
Jonas Frisén (Cell and Molecular Biology, KI)
