New reference model will be crucial for understanding the effect of mutations on early embryos
The model is a neural network that learns and expands over time, so it can evolve as new data become available. Even now, it is an important reference for interpreting the consequences of genetic mutations and for benchmarking stem cell models. Because infertility and IVF are rooted in this stage of development, these models are an essential reference for benchmarking infertility research and improvements to IVF.
The model integrates single-cell RNA sequencing data sets from different experiments to identify cell types and study gene expression. The resource is already attracting interest from potential collaborators who want to contribute further data, and its value to the scientific community will only increase as it grows. The model currently covers mouse and human, and it will be extended to other species with the help of data donated by collaborating scientists from various fields.
“We start individually, with each species having its own model. I worked on the mouse, while Naz worked on the human. There have been discussions about combining them, but these are all future projects, and for now each species will be its own entity,” said Martin Proks.
Martin Proks’s computational background was crucial to the project’s success, as developing the model required programming and applying new technologies from the field of artificial intelligence. He used publicly available methods, including the nf-core pipelines (which he helped develop during his master’s degree) and scvi-tools. He also developed a new approach for understanding how these models make decisions using SHAP (SHapley Additive exPlanations) values.
“We picked a bunch of different technologies and tested them in the context of the early embryo. We settled on the nf-core pipelines for data processing, scvi-tools for data integration and cell type classification, and SHAP for model interpretability.
“Afterwards, it was about writing many notebooks and adjusting the data analysis. The classifier we used was not designed to work with SHAP, so we had to develop a new, adapted version,” said Martin Proks.
The researchers attribute the model’s success to the willingness of scientists to share their data and collaborate towards common goals.
“We're in a competitive field, and yet people really want to work together too. And there's a very positive view towards this sort of data. People want their research to make more of a difference to more people. They want their data to become as useful as possible and that’s also what we want from these models,” said Professor Brickman.