The French National Research Agency Projects for science

ANR funded project

Interactions des mondes physiques, de l'humain et du monde numérique (DS0707) 2014
Project ITALODISCO

Innovative Techniques for the Advanced Learning Of Distributional Compositionality

This project aims to model semantic compositionality in a fully automatic and unsupervised way. Until now, most work on the automatic acquisition of semantics has dealt only with individual words. The modeling of meaning beyond the level of individual words, i.e. the combination of words into larger units, has been explored much less thoroughly. This project proposes a data-driven approach that combines three techniques. First, we rely on mathematical objects called tensors, the generalization of matrices to more than two dimensions, to model the multi-way co-occurrences that come into play when dealing with compositionality. In combination with a latent factorization model, tensors make it possible to induce latent semantics from multi-way co-occurrences, which can subsequently be used to model compositional expressions. Second, we combine the tensor-based approach with advanced machine learning techniques, notably neural networks. Neural network techniques have recently shown impressive performance in a number of natural language processing tasks; by integrating them with our tensor-based approach, we aim to model more thoroughly the multi-way interaction of the words within a compositional expression. Third, we aim to combine the strengths of distributional and formal semantics within one integrated, complementary framework, which we expect to yield algorithms that grasp the meaning of larger textual entities in greater depth. The proposed model aims to provide an implementation of compositionality that is entirely data-driven: the model is automatically constructed from large text corpora, and its performance is evaluated quantitatively.
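To make the idea of inducing latent semantics from multi-way co-occurrences more concrete, here is a minimal sketch in Python with NumPy. It is not the project's implementation: the toy subject-verb-object triples, the function names (`build_tensor`, `nonneg_cp`, `reconstruct`), the rank, and the choice of a non-negative CP factorization with multiplicative updates are all assumptions made for illustration. The abstract does not say which factorization model is used.

```python
import numpy as np

# Hypothetical toy data: (subject, verb, object) co-occurrence triples.
TRIPLES = [
    ("dog", "chase", "cat"),
    ("dog", "chase", "ball"),
    ("cat", "chase", "mouse"),
    ("student", "read", "book"),
    ("teacher", "read", "book"),
    ("student", "write", "essay"),
]


def build_tensor(triples):
    """Count triples into a 3-way tensor indexed by (subject, verb, object)."""
    subjects = sorted({s for s, _, _ in triples})
    verbs = sorted({v for _, v, _ in triples})
    objects = sorted({o for _, _, o in triples})
    s_idx = {w: i for i, w in enumerate(subjects)}
    v_idx = {w: i for i, w in enumerate(verbs)}
    o_idx = {w: i for i, w in enumerate(objects)}
    tensor = np.zeros((len(subjects), len(verbs), len(objects)))
    for s, v, o in triples:
        tensor[s_idx[s], v_idx[v], o_idx[o]] += 1.0
    return tensor, (subjects, verbs, objects)


def reconstruct(A, B, C):
    """Rebuild the tensor as a sum of rank-one terms a_r * b_r * c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def nonneg_cp(tensor, rank, n_iter=200, seed=0, eps=1e-9):
    """Non-negative CP factorization via multiplicative updates
    (squared-error objective). An illustrative choice, not the
    project's confirmed method."""
    rng = np.random.default_rng(seed)
    n_s, n_v, n_o = tensor.shape
    A = rng.random((n_s, rank)) + 0.1  # subject factors
    B = rng.random((n_v, rank)) + 0.1  # verb factors
    C = rng.random((n_o, rank)) + 0.1  # object factors
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", tensor, B, C) / (
            A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", tensor, A, C) / (
            B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", tensor, A, B) / (
            C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


if __name__ == "__main__":
    X, (subjects, verbs, objects) = build_tensor(TRIPLES)
    A, B, C = nonneg_cp(X, rank=2)
    # Each row of A is a latent representation of one subject noun.
    for word, row in zip(subjects, A):
        print(word, np.round(row, 2))
```

In this sketch, each row of `A`, `B`, and `C` is a low-dimensional latent representation of a subject, verb, or object, learned jointly from three-way counts rather than from pairwise word co-occurrences. A real system would build the tensor from a large parsed corpus and would typically use a sparse representation.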

Partners

IRIT Institut de Recherche en Informatique de Toulouse

ANR grant: 158,222 euros
Beginning and duration: October 2014 - 36 months

ANR Programme: Interactions des mondes physiques, de l'humain et du monde numérique (DS0707) 2014

Project ID: ANR-14-CE24-0014

Project coordinator:
Mr. Tim Van de Cruys (Institut de Recherche en Informatique de Toulouse)

The project coordinator is the author of this abstract and is therefore responsible for the content of the summary. The ANR disclaims all responsibility in connection with its content.