COSINUS - Conception et Simulation

New sequential and distributed approaches in algorithmics and computational biology for the analysis of data generated by Next Generation Sequencers. – MAPPI

Submission summary

In recent years, a breakthrough in DNA sequencing has been achieved through new approaches and new devices known collectively as Next Generation Sequencing (NGS). These sequencers produce short raw sequences (called reads) at such a rate that the volume of data to process becomes staggering. Managing these data for mapping and assembly has become the main bottleneck of the new technology, since even the fastest current software cannot scale to that volume. Moreover, these new sequencers also allow heterogeneous sources to be sequenced together. This opens the way to meta-analysis, that is, considering at a glance the genomes or transcriptomes of an entire population of living organisms.

This is the motivation for our 36-month project, which aims to propose new, relevant and efficient sequential, distributed and parallel algorithms to meet the challenge of the intensive computation required for mapping, assembling and meta-assembling the massive volume of reads produced by Next Generation Sequencers.

This project gathers four partners. LIAFA (University Paris-Diderot), LIFL (Lille) and IRISA (Rennes) are recognized computer science research groups with complementary expertise in the data and techniques to be developed, each being a recognized specialist in at least one of the topics of the project: index data structures, string algorithms, parallel algorithms, biological sequence analysis, and so on. These groups will design new algorithms and develop open source software implementing them. So that these algorithms do not remain theoretical or mere prototypes, and so that the whole project has a direct application domain, the project is linked to a large biological project named Tara Oceans through the Genoscope laboratory (CEA), which is the fourth partner of the MAPPI project. Genoscope specialises in sequencing techniques and has the latest sequencing devices at its disposal.

Tara Oceans is a unique multidisciplinary project gathering oceanographers, ecologists, biologists and physicists who are experts in marine life, and whose goal is to exhaustively study phytoplankton in several oceans. The role of Genoscope is to sequence DNA and RNA samples of protists collected from phytoplankton at each site of the expedition. Tara will provide metagenomic data (DNA from the cells of a whole sample) and metatranscriptomic data (RNA that captures gene expression). The total number of such samples will range between 2000 and 4000 for the whole project, which should generate more than 100 TB of data. The software we plan to deliver will be integrated into a bioinformatics pipeline at Genoscope.

Project coordination

Mathieu RAFFINOT (UNIVERSITE DE PARIS 7) – raffinot@liafa.jussieu.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.

Partners

Génoscope COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
LIAFA UNIVERSITE DE PARIS 7
LIFL UNIVERSITE DE LILLE I [SCIENCES ET TECHNOLOGIES]
INRIA Rennes - Bretagne Atlantique INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - INRIA

ANR funding: 456,830 euros
Beginning and duration of the scientific project: - 39 Months
