MN - Numerical Models (Modèles Numériques)

Efficient algorithms for realistic large-scale models in the probabilistic context: Fundamental developments and applications. – ProbAlg

Submission summary

Biology today provides some of the most challenging problems for numerical and algorithmic development. In particular, large-scale genomic projects and the current effort to understand the functioning of a cell at the structural level call for new concepts and algorithmic solutions, beyond what could be achieved by simply importing ideas from other fields. The project positions itself in this perspective. It builds on original algorithmic methods, based on probabilistic concepts, that the members of the consortium developed for problems in biology. The aim of the project is to generalize these developments, notably by elaborating new tools that emerge from the convergence of these earlier developments. Extensions of the generic algorithmic ideas beyond their original biological fields will also be explored.

More precisely, the project builds on two major methodological developments:
1. SIMEX (SIMulations with EXponentials) and Padé-Laplace methods: The SIMEX method concerns ‘dynamic programming’ models in the probabilistic context. It makes it possible to handle realistic long-range constraints with dramatically reduced calculation times (by up to six orders of magnitude). Originally formulated in the field of structural biophysics, the method does not resort to the usual oversimplifications to make computations tractable. Its basic ‘computational trick’ is the numerical representation of long-range effects as multi-exponential functions. Such expressions are obtained with the Padé-Laplace method, which addresses a general and long-standing problem in signal analysis: decomposing a signal into a sum of exponentials. The SIMEX ideas were recently extended from biophysics to bioinformatics (sequence alignments), as a first illustration of their potential generality.

2. ISD (Inferential Structure Determination) method: The ISD approach was first developed in structural biology to convert experimental data on the structure of a folded protein (obtained by nuclear magnetic resonance) reliably and without bias into statistically meaningful distributions of three-dimensional structures. To achieve this, ISD relies entirely on Bayesian probability theory. The underlying algorithm is a multi-parameter generalization of replica-exchange Monte Carlo schemes, and it uses a hybrid Monte Carlo algorithm to generate new proposal states for the coordinates. More generally, the ISD method is a sampling algorithm for exploring probability densities in a variety of Bayesian data analysis problems.
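To make the multi-exponential idea behind the SIMEX and Padé-Laplace methods (item 1) concrete, here is a minimal Python sketch under stated assumptions; it is not the consortium's SIMEX or Padé-Laplace code. It does two things. First, it applies the textbook Padé-Laplace recipe to a noise-free kernel made of a known number K of exponentials: compute the Taylor coefficients of the Laplace transform around a point p0, build the [K-1/K] Padé approximant, and read the decay rates from its poles and the amplitudes from its residues. Second, it shows why such a representation pays off: a long-range sum s_i = sum over j < i of x_j * w(i - j), with w(d) = sum_k a_k exp(-lam_k d), can be updated with one running sum per exponential, in O(N K) instead of O(N^2) operations. The function names, the toy kernel, the expansion point p0 = 1 and the trapezoidal moments are all assumptions made for illustration; in practice K also has to be determined from the data, whereas here it is simply given.

```python
import math
import numpy as np


def pade_laplace(f, n_exp, p0=1.0, t_max=40.0, n_grid=200_001):
    """Estimate a_k, lam_k such that f(t) ~ sum_k a_k exp(-lam_k t).

    Sketch assumptions: f is noise-free, n_exp (K) is known, and the
    moments are computed with a plain trapezoidal rule on [0, t_max].
    """
    t = np.linspace(0.0, t_max, n_grid)
    h = t[1] - t[0]
    g = f(t) * np.exp(-p0 * t)
    # Taylor coefficients of the Laplace transform F(p) around p = p0:
    # c_n = F^(n)(p0) / n! = (1/n!) * integral of (-t)^n e^{-p0 t} f(t) dt
    c = []
    for n in range(2 * n_exp):
        y = (-t) ** n * g
        c.append(h * (y.sum() - 0.5 * (y[0] + y[-1])) / math.factorial(n))
    c = np.array(c)
    # Pade [K-1/K] in x = p - p0: denominator Q(x) = 1 + q_1 x + ... + q_K x^K.
    # Matching orders K..2K-1 gives sum_{j=1..K} q_j c_{n-j} = -c_n.
    a_mat = np.array([[c[n - j] for j in range(1, n_exp + 1)]
                      for n in range(n_exp, 2 * n_exp)])
    den = np.concatenate(([1.0], np.linalg.solve(a_mat, -c[n_exp:2 * n_exp])))
    # Numerator P(x) coefficients: num_n = sum_{j<=n} q_j c_{n-j}, n < K
    num = np.array([sum(den[j] * c[n - j] for j in range(n + 1))
                    for n in range(n_exp)])
    roots = np.roots(den[::-1])   # poles of P/Q in the variable x
    lam = -(p0 + roots)           # a pole at p = -lam means x = -(p0 + lam)
    # Residue of P/Q at each simple pole gives the amplitude a_k
    amp = np.polyval(num[::-1], roots) / np.polyval(np.polyder(den[::-1]), roots)
    order = np.argsort(lam.real)
    return amp.real[order], lam.real[order]


def long_range_naive(x, amp, lam):
    """O(N^2): s_i = sum_{j<i} x_j * w(i - j), w(d) = sum_k a_k exp(-lam_k d)."""
    s = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(i):
            s[i] += x[j] * np.sum(amp * np.exp(-lam * (i - j)))
    return s


def long_range_recursive(x, amp, lam):
    """O(N*K): one running sum per exponential, updated in a single pass."""
    decay = np.exp(-lam)
    r = np.zeros(len(amp))  # r_k(i) = sum_{j<i} x_j exp(-lam_k (i - j))
    s = np.zeros(len(x))
    for i in range(len(x)):
        if i > 0:
            r = decay * (r + x[i - 1])
        s[i] = np.dot(amp, r)
    return s


if __name__ == "__main__":
    # Hypothetical kernel with two exponential components
    kernel = lambda t: 2.0 * np.exp(-0.5 * t) + 1.0 * np.exp(-3.0 * t)
    amp, lam = pade_laplace(kernel, n_exp=2)
    print("amplitudes:", amp, "decay rates:", lam)
    x = np.random.default_rng(0).random(200)
    same = np.allclose(long_range_naive(x, amp, lam), long_range_recursive(x, amp, lam))
    print("naive and recursive sums agree:", same)
```

The two long-range functions return the same values up to rounding; only their cost differs. This illustrates only the exponential trick itself, not the full dynamic-programming models to which SIMEX applies it.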
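The ISD description (item 2) names two standard building blocks: replica-exchange Monte Carlo and hybrid (Hamiltonian) Monte Carlo. The Python sketch below combines them on a one-dimensional toy problem. It is a hypothetical illustration, not ISD itself: here the replicas differ only in a single inverse temperature `beta` applied to an assumed double-well "negative log-posterior", whereas ISD generalizes the exchange to several parameters and samples full three-dimensional structures. Each replica performs one HMC move (leapfrog integration followed by a Metropolis test), then neighbouring replicas attempt to swap their states with acceptance probability min(1, exp((beta_i - beta_j) * (E(x_i) - E(x_j)))). The energy function, the `beta` ladder, the step size and all function names are assumptions.

```python
import math
import numpy as np


def energy(x):
    """Hypothetical double-well 'negative log-posterior', modes at x = -1 and x = +1."""
    return 5.0 * (x * x - 1.0) ** 2


def grad_energy(x):
    return 20.0 * x * (x * x - 1.0)


def hmc_step(x, beta, rng, eps=0.1, n_leapfrog=10):
    """One hybrid Monte Carlo move targeting exp(-beta * energy(x))."""
    p = rng.normal()
    x_new, p_new = x, p
    # Leapfrog integration of Hamiltonian dynamics, H = beta*E(x) + p^2/2
    p_new -= 0.5 * eps * beta * grad_energy(x_new)
    for step in range(n_leapfrog):
        x_new += eps * p_new
        if step < n_leapfrog - 1:
            p_new -= eps * beta * grad_energy(x_new)
    p_new -= 0.5 * eps * beta * grad_energy(x_new)
    h_old = beta * energy(x) + 0.5 * p * p
    h_new = beta * energy(x_new) + 0.5 * p_new * p_new
    # Metropolis test on the change in H corrects the integration error
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return x


def replica_exchange(betas, n_iter=4000, seed=0):
    """Replica exchange over a ladder of inverse temperatures.

    Returns the samples of the replica at betas[0] (the target distribution).
    """
    rng = np.random.default_rng(seed)
    xs = [1.0] * len(betas)  # every replica starts in the right-hand well
    samples = []
    for it in range(n_iter):
        xs = [hmc_step(x, b, rng) for x, b in zip(xs, betas)]
        # Propose swaps between neighbouring replicas (alternating even/odd pairs)
        for i in range(it % 2, len(betas) - 1, 2):
            j = i + 1
            log_acc = (betas[i] - betas[j]) * (energy(xs[i]) - energy(xs[j]))
            if rng.random() < math.exp(min(0.0, log_acc)):
                xs[i], xs[j] = xs[j], xs[i]
        samples.append(xs[0])
    return np.array(samples)


if __name__ == "__main__":
    samples = replica_exchange(betas=[1.0, 0.5, 0.25, 0.1])
    print("fraction of target-replica samples with x > 0:", np.mean(samples > 0))
```

Although every replica starts in the right-hand well, the target replica (`beta = 1`) should also visit the left-hand well, because states reach it from the hotter replicas, where the barrier is easy to cross. Plain HMC at `beta = 1` alone would cross the barrier far less often.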

These numerical methods constitute an ideal template on which to build a series of generalizations and to address difficult and important methodological issues with potentially generic implications.

1. Conceptual generalizations of the methodological components: These generalizations will be implemented while treating a series of model problems of increasing complexity. For example, one such model problem for the SIMEX developments will concern RNA molecules; for ISD, it will be the treatment of data generated from large heterogeneous ensembles.
2. Elaboration of new interfaces between the components: A SIMEX_ISD interface will be developed, leading to new (fully probabilistic) tools. One such concerted development will concern homology modeling with ISD, taking as inputs probabilistic sequence alignments generated with SIMEX.
3. Extensions and adaptation to new application fields: Notably, ISD will be extended to the treatment of low-resolution data, in order to reconstruct chromosome organization and dynamics in the nucleus from heterogeneous sources of data.
4. Assessment of generic modeling paradigms: The context of these developments and applications makes it possible to address a generic question that does not appear to have been explored systematically: the comparative virtues of probabilistic and optimization approaches for handling increasingly complex models with non-linear constraints.

Project coordination

Michael Nilges (INSTITUT PASTEUR) – nilges@pasteur.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partners

IP INSTITUT PASTEUR
Imod INSTITUT PASTEUR

ANR funding: 232,000 euros
Beginning and duration of the scientific project: February 2012 - 36 months
