Methods for efficient detection of biological information from non assembled HTS data. – Colib'read
A few years ago, genomics underwent an unprecedentedly deep change with the advent of High Throughput Sequencing (HTS), also known as Next Generation Sequencing (NGS). These technologies generate a new type of data in huge volumes, and crucial computational developments are needed to take full advantage of them. Our project proposes an original way of extracting information from such data. Usually, a generic assembly is applied to the data as a pre-treatment, and any information of interest is then extracted in a second step. Our aim is to avoid this protocol, which leads to a significant loss of information or generates chimeric results because of the heuristics used in the assembly. Instead, we will develop a set of innovative methods for extracting information of biological interest from HTS data that bypass any costly and often inaccurate assembly phase. Importantly, the developed methods will not require the availability of a reference genome, which considerably broadens their spectrum of applications. In short, for each biological question, our general approach will consist of 1) defining a model for the searched elements; 2) detecting, in one or several HTS datasets, those elements that fit the model; 3) outputting them together with a score and their genomic neighborhood. From a computational viewpoint, our proposal relies on a formal model based on the de Bruijn graph structure to develop algorithms able to handle huge amounts of data. Among other results, Colib'read will deliver algorithms based on the de Bruijn graph and tools validated by biologists.
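To illustrate the general idea of searching raw reads through a de Bruijn graph, here is a minimal Python sketch. It is not one of the project's algorithms: the function names, the toy reads, and the value of k are assumptions chosen for illustration. The model used here is deliberately simple: a node with more than one successor is taken as a candidate variation site, found without assembling the reads and without a reference genome.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of size k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor, sorted for stable output."""
    return sorted(node for node, succs in graph.items() if len(succs) > 1)


# Two hypothetical reads that differ only at their fifth base (G vs T).
reads = ["ACGTGCA", "ACGTTCA"]
graph = build_de_bruijn(reads, k=4)
print(branching_nodes(graph))  # prints ['CGT']
```

The branching node `CGT` is the last (k-1)-mer shared by both reads before they diverge. A real tool would also need to handle sequencing errors, reverse complements, and memory-efficient indexing, all of which this sketch ignores.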
This project is at the interface between (i) fundamental computational questions, (ii) algorithmic developments, including the design of ad hoc indexes and parallelization, and (iii) biological applications for validation. Finally, (iv) it also includes broad public and educational dissemination.
Project coordination
Pierre Peterlongo (INRIA, centre de recherche de Rennes - Bretagne Atlantique) – pierre.peterlongo@inria.fr
The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.
Partners
INRIA Rennes - Bretagne Atlantique INRIA, centre de recherche de Rennes - Bretagne Atlantique
CR INRIA Grenoble - Rhône Alpes INRIA, Centre de recherche de Grenoble - Rhône-Alpes, EPI Bamboo
CNRS-LIRMM Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
ANR funding: 362,391 euros
Beginning and duration of the scientific project: February 2013 - 36 months