Agence nationale de la recherche (ANR): Projects for science

Investissements d'avenir: Funded projects

Closed project

Algorithmic, Bioinformatics and Software Solutions for High-Throughput Sequencing (ABS4NGS)

Action: Bioinformatics

Agreement no.: 11-BINF-0001

General information

  • Project reference: 11-BINF-0001
  • Scientific and technical lead (RST): Emmanuel BARILLOT
  • Coordinating institution: Institut Curie
  • Project region: Île-de-France
  • Discipline: 5 - Bio Med
  • Funding awarded: €2,000,000
  • Project start date: 1 October 2012
  • Project end date: 30 March 2017
  • Project website: https://sites.google.com/site/abs4ngs/
  • Keywords: NGS; epigenomics; genome; transcriptome; single cell sequencing; latent variables; change-point detection; chromosome conformation capture; software; read alignment

Project summary

Next Generation Sequencing (NGS) technologies are profoundly impacting many domains of biomedical research, ranging from basic biology to translational research and now to personalized medicine. This will increase our understanding of disease etiology, leading to improved prevention, diagnosis and treatment, and to major discoveries of novel drug targets.

NGS raises huge challenges in bioinformatics. The particular nature of the data and the complexity and variety of the problems addressed call for new statistical and machine learning approaches, while the scale of the data produced by NGS experiments (terabytes) calls for extremely efficient computational procedures. As a result, bioinformatics is increasingly the bottleneck in biomedical and industrial applications of NGS. Both life science research and health activities urgently need efficient tools to make the promises of NGS a reality.

ABS4NGS provides innovative bioinformatics solutions for NGS users, including academic groups and private companies. We identified a series of applications for which new bioinformatics methods are needed: genome resequencing, RNA-Seq, ChIP-Seq, Methyl-Seq, High-Throughput Chromosome Conformation Capture (HT-C, such as 5C and HiC) and Ori-Seq. We also recently added single-cell data analysis and visualisation, which have benefited from recent rapid developments in NGS and single-cell sorting.

ABS4NGS is an original academic-industry multidisciplinary consortium of mathematicians, algorithm specialists, bioinformaticians, computer scientists, biologists and software developers, who are at the forefront of research in their fields and collectively cover all aspects of NGS analysis.
To keep our developments focused and to make our methods useful to a broad audience of biologists, the project is centered on precise biological questions that drive the methodological developments. These include the reconstruction of germline and tumor genomes from a series of patients; the study of their transcriptome; the joint study of chromosome conformation, small RNA expression, mono-allelic expression and histone modifications, and of their inter-relationships during development in mouse stem cells; the investigation of expression control by DNA methylation and small RNA; and the study of replication origins in human and chicken. Technological transfer is ensured through the usual open distribution of academic tools, and through the participation of a software development SME (Genostar) specialized in bioinformatics solutions and services.

Below we list a series of remarkable outputs of the ABS4NGS partnership:

  • A series of novel, state-of-the-art algorithms for read mapping and variant calling.
  • A published in-depth analysis of the ERBB2 amplification mechanism in HER2-positive breast cancers, based on the analysis of SNVs, CNVs and SVs.
  • A series of novel, state-of-the-art algorithms for change-point detection, making it possible to re-analyze transcription boundaries.
  • New algorithms and statistical assessment techniques for the identification, quantification and differential analysis of isoforms based on RNA-Seq.
  • The ABS4NGS partners now constitute a leading consortium in the field of chromatin conformation data modeling and analysis (on both the methodological and the biological sides).
  • Cutting-edge results on the spatio-temporal program of replication origins in humans and its chromatin environment.
  • ABS4NGS fostered the creation of groups led by young researchers (F. Picard, V. Boeva).
  • The first methods for single-cell data analysis were developed.
  • The Variant application is routinely used by Fondation Mérieux: hundreds of M. tuberculosis genomes have been analyzed and their annotations stored in the database together with the associated metadata.

Finally, we summarize the project per work package (WP) below.

Genome: A series of novel methodologies have been developed to compare the performance of various read mapping tools (RNFtools). We introduced a novel, flexible approach to NGS read mapping, called dynamic mapping, compared different strategies of dynamic mapping and proposed implementation solutions for each of them (DyMas and Ococo). We obtained a major improvement in the compact representation of de Bruijn graphs for genome assembly (used in the Minia assembler). We produced an in-depth analysis of the ERBB2 amplification process in HER2-positive breast cancers, which was published in Nature Communications. Although the sequencing part of this latter work was funded through INCa's participation in the ICGC program, the contribution of the ABS4NGS consortium substantially improved its quality.

Transcriptome: A series of novel methodologies have been developed to investigate the transcriptional landscape. For the most typical tasks, such as differential transcriptomic analysis, we took advantage of the consortium's expertise in latent variable models to propose state-of-the-art differential analysis tools. The consortium's expertise in change-point detection techniques (on both the algorithmic and the statistical sides) also enabled us to provide a series of cutting-edge algorithms and statistical analyses. This expertise also has a great impact on WP3 for the analysis of HiC data. As for alternative splicing, ABS4NGS combined its strengths in bioinformatics, optimization and statistics to provide a collection of databases, algorithms and statistical assessments for the detection and functional analysis of alternative splicing events.
Finally, the emergence of single-cell data gave us the opportunity to develop new tools for the multivariate analysis of count data, as a new task for the project. An informal working group has been built by several partners of the consortium to collaborate further on this topic, which is critical for other sequencing-based analyses such as metagenomics.

Epigenome: We developed many methodologies to investigate different aspects of gene expression regulation, such as chromosome conformation, DNA methylation, histone positioning and small RNAs. We produced software such as the HiTC/HiCPro/HiCSeq/PASTIS packages for conformation data analysis (published in Bioinformatics, NAR, Genome Research, Genome Biology) and the Nebula software for analyzing ChIP-Seq data (Bioinformatics). We also developed new statistical methodologies, such as 2D segmentation for conformation data (Bioinformatics, Machine Learning and Data Mining in Pattern Recognition) and a functional statistical model describing read enrichment along the genome (Journal of Machine Learning Research). Most importantly, this WP produced high-impact biological results, such as the fine mapping of replication origins in the human genome (PLoS Genetics), the reconstruction of the 3D structure of several genomes, including that of P. falciparum (Genome Research, NAR, Bioinformatics), and the characterization of the developmental dynamics and disease potential of random monoallelic gene expression (Dev Cell). Finally, the ABS4NGS project was the starting point for the creation of the “High Dimensional Statistics for Genomics” group in Lyon, headed by F. Picard, who was the WP3 task manager. During the project, V. Boeva (Institut Curie) also created her group, Computational (Epi)-genetics of Cancer, at the Institut Cochin.

Software product: The work carried out in WP4 has validated a generic, versatile and expandable software architecture for bioinformatics applications. Access to the application through a web browser (i.e. as SaaS) has been confirmed as an excellent solution for both the users and the application provider. In this context, hosting the data and the computing resources in the cloud proved efficient and flexible. A typical example of a software component of such applications has also been developed, together with its user interface. The generic architecture has been validated through th

(The author of this summary is the project coordinator, who is responsible for its content. The ANR therefore declines all responsibility for its content.)