The French National Research Agency Projects for science


ANR funded project

Blanc - SIMI 2 - Science informatique et applications (Blanc SIMI 2)
Edition 2012


ContNomina


Exploitation of context for proper names recognition in the diachronic audio documents

In the context of diachronic data (data that change over time), new names appear constantly, which requires dynamic updates of the lexicons and language models used by the speech recognition system.
The ContNomina project therefore focuses on the problem of proper names in automatic audio processing systems, exploiting the context of the processed documents as efficiently as possible.

To do this, the project will address:
• the statistical modeling of contexts and of the relationships between contexts and proper names;
• the contextualisation of the recognition module through dynamic adjustment of the lexicon and of the language model, to make them more accurate and more relevant in terms of lexical coverage, particularly with respect to proper names;
• the detection of proper names, on the one hand in text documents, to build lists of proper names, and on the other hand in the output of the recognition system, to identify the proper names spoken in the audio/video data.

Results

in progress

Outlook

in progress

Scientific outputs and patents

D. Fohr, O. Mella, "Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition", Interspeech 2013
A. Lorenzo, C. Cerisara, "Weakly supervised joint SRL and Dependency Parsing", submitted to EMNLP 2013

Partners

LIA

LORIA LABORATOIRE LORRAIN DE RECHERCHE EN INFORMATIQUE ET SES APPLICATIONS

ANR grant: 317,117 euros
Beginning and duration: February 2013, 42 months

Submission abstract

The technologies used for information retrieval in large audio/video databases are often based on the analysis of large but closed corpora, and on machine learning and statistical modeling of written and spoken language. The effectiveness of these approaches is now widely acknowledged, but they nevertheless have major flaws, particularly with regard to new words and proper names: two types of input that are crucial for interpreting the content but extremely difficult to model from the analysis of closed corpora.
In the context of diachronic data (data that change over time), new names appear constantly, which requires dynamic updates of the lexicons and language models used by the speech recognition system.
The ContNomina project therefore focuses on the problem of proper names in automatic audio processing systems, exploiting the context of the processed documents as efficiently as possible. To do this, the project will address:
• the statistical modeling of contexts and of the relationships between contexts and proper names;
• the contextualisation of the recognition module through dynamic adjustment of the lexicon and of the language model, to make them more accurate and more relevant in terms of lexical coverage, particularly with respect to proper names;
• the detection of proper names, on the one hand in text documents, to build lists of proper names, and on the other hand in the output of the recognition system, to identify the proper names spoken in the audio/video data.
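The abstract does not say which models ContNomina will use. As a rough illustration of the second and third objectives only, the following Python sketch (an assumption, not the project's method) collects capitalized out-of-vocabulary tokens from contemporaneous text documents as proper-name candidates, and adapts a background unigram language model by linear interpolation with a unigram model estimated on the same documents. The helper names `oov_name_candidates` and `adapt_unigram`, the capitalization heuristic, the interpolation weight `lam` and the name "Dupont" are all hypothetical; a real system would more likely rely on a named-entity detector, an n-gram or neural language model, and a grapheme-to-phoneme step to phonetize the new names.

```python
import re
from collections import Counter

# Words made of letters (accented letters included), optionally joined by
# hyphens or apostrophes, e.g. "Jean-Marc" or "d'Arc".
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def oov_name_candidates(context_docs, vocabulary, min_count=2):
    """Hypothetical helper: capitalized tokens absent from the recognizer's
    (lowercased) vocabulary and seen at least min_count times.

    Sentence-initial tokens are skipped because their capital letter says
    little about whether they are names; this is a crude heuristic.
    """
    counts = Counter()
    for doc in context_docs:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            tokens = TOKEN_RE.findall(sentence)
            for tok in tokens[1:]:
                if tok[0].isupper() and tok.lower() not in vocabulary:
                    counts[tok] += 1
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def adapt_unigram(background, context_docs, lam=0.3):
    """Hypothetical helper: linear interpolation of a background unigram
    model with a unigram model estimated on the context documents,
        P(w) = lam * P_ctx(w) + (1 - lam) * P_bg(w).
    Words seen only in the context (e.g. new names) get nonzero mass.
    """
    ctx_counts = Counter(
        tok.lower() for doc in context_docs for tok in TOKEN_RE.findall(doc)
    )
    total = sum(ctx_counts.values())
    if total == 0:
        return dict(background)  # no context available: keep the background model
    vocab = set(background) | set(ctx_counts)
    return {
        w: lam * ctx_counts[w] / total + (1 - lam) * background.get(w, 0.0)
        for w in vocab
    }


if __name__ == "__main__":
    docs = [
        "Yesterday the minister met Dupont in Paris. Later, Dupont spoke to reporters.",
        "Reporters asked Dupont about the budget.",
    ]
    vocabulary = {"yesterday", "the", "minister", "met", "in", "paris", "later",
                  "spoke", "to", "reporters", "asked", "about", "budget"}
    print(oov_name_candidates(docs, vocabulary))  # ['Dupont']
    background = {"the": 0.5, "minister": 0.25, "budget": 0.25}
    adapted = adapt_unigram(background, docs, lam=0.5)
    print(round(adapted["dupont"], 4))  # 0.0833
```

Linear interpolation is used here only because it is the simplest adaptation scheme: if the background model sums to one, the adapted model does too, and a name absent from the background model still receives probability proportional to its frequency in the context documents.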
Resources developed during this project will be made available to the scientific community: a lexicon of phonetized proper names (no such lexicon is currently available for French) and annotations of an audio/video corpus.
A web demonstrator will be implemented to validate the scientific developments achieved in the project.

ANR Programme: Blanc - SIMI 2 - Science informatique et applications (Blanc SIMI 2) 2012

Project ID: ANR-12-BS02-0009

Project coordinator:
Ms Irina Illina (LABORATOIRE LORRAIN DE RECHERCHE EN INFORMATIQUE ET SES APPLICATIONS)
illina@loria.fr

Project web site: https://wiki.inria.fr/contNomina/Accueil

The project coordinator is the author of this abstract and is therefore responsible for the content of the summary. The ANR disclaims all responsibility in connection with its content.