The French National Research Agency Projects for science


ANR funded project

Interactions between the physical, human and digital worlds (DS0707) 2014

Syntactic parsing and multiword expressions in French

The PARSEME-FR project aims to improve the linguistic representativeness, precision and computational efficiency of Natural Language Processing (NLP) applications, notably parsing. The project focuses on a major bottleneck of these applications: Multi-Word Expressions (MWEs), i.e. groups of words with a certain degree of idiomaticity, such as "hot dog", "to kick the bucket", "San Francisco 49ers" or "to take a haircut".
Despite recent advances, the state of the art in Multi-Word Expression (MWE) representation and processing remains largely unsatisfactory. Current research on MWEs concentrates either on creating MWE lexicons or on the automatic recognition of MWEs in text. Only a few approaches address the links between MWEs and a comprehensive linguistic analysis of text. These approaches confirm that a proper treatment of MWEs increases both linguistic precision and robustness. However, they are mostly limited to certain MWE classes and to syntactic parsing. This unsatisfactory situation is mainly due to a lack of linguistic resources encoding MWE information that could feed linguistic analysers (in particular, parsers). For French, such resources exist, but they are incomplete in terms of syntactic and semantic representation, coverage and/or suitability for use in NLP tools.
In this project, we propose to bridge the gap between linguistic precision and computational efficiency in NLP applications by investigating the syntactic and semantic representation of MWEs in language resources, the integration of MWE analysis into (deep) syntactic parsing, and its links to semantic processing. Expected deliverables include enhanced language resources (lexicons, grammars and annotated corpora), MWE-aware (deep) parsers, and tools linking predicted MWEs to knowledge bases. This proposal is a spin-off of the European COST Action IC1207 PARSEME on the same topic.



Inria Paris - Rocquencourt Centre

LIF Laboratoire d'Informatique Fondamentale de Marseille

LIFO Laboratoire d'Informatique Fondamentale d'Orléans

LIGM Laboratoire d'informatique Gaspard-Monge

LI Laboratoire d’Informatique de l’Université de Tours

LLF Laboratoire de Linguistique Formelle

ANR grant: 732,028 euros
Start date and duration: January 2016, 48 months


ANR Programme: Interactions between the physical, human and digital worlds (DS0707) 2014

Project ID: ANR-14-CERA-0001

Project coordinator:




The project coordinator is the author of this abstract and is therefore responsible for the content of this summary. The ANR disclaims all responsibility for its content.