DS0708 - Big data, knowledge, decision, high-performance computing and numerical simulation

Contextual and Aggregated Information Retrieval – CAIR

Submission summary

The CAIR project lies in the field of data management; its objectives are search and the intelligible organization of query results. These objectives are in line with the technologies highlighted, for example, by Serge Abiteboul in his inaugural lecture at the Academy of Sciences: "one must develop technologies to assess, validate, prioritize and organize in intelligent and intelligible way the information." The CAIR project focuses on a specific class of queries, called aggregative queries, whose execution requires a complex chain of operations to compose relevant pieces of information, each contributing only partially to the answer but together forming a complete response. Aggregation therefore aims to select and integrate fragments of information into a richer object that carries new knowledge about a subject or an event. Such queries look for objects that do not exist as such in the sources but are built by assembling fragments. This type of need is widespread, especially in analytical tasks such as opinion analysis, trend analysis, product comparison, risk analysis and event summarization. Some specialized systems, such as bibliometric systems, already provide aggregative answers similar to those targeted by these queries: in addition to an individual's list of publications, they provide more analytical information such as the citation rate of each publication, indicators like the h-index, and the list of co-authors. In this project, we aim to produce, among other things, algorithms and models to support the following types of needs:
• Analytical query: an OLAP-like query producing numerical values that result from an analysis of documentary sources (e.g., h-index, number of consulted books).
• Entity query: a query exploring a set of data sources to extract the salient elements about an individual (e.g., a politician, a scientist), a phenomenon (e.g., global warming) or a concrete or abstract entity (e.g., the characteristics of a smartphone model).
• Summary query: a query exploring a set of sources mainly to extract the relevant points of "what is said" about a person, an object or an event (e.g., extracting from blogs what is said about same-sex marriage, analyzing tweets to follow a rumor).
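As an illustration of the kind of numerical value an analytical query returns, the h-index mentioned above can be computed from a list of citation counts. The following is a minimal sketch using the standard definition of the h-index (the largest h such that at least h publications have at least h citations each); the function name and the sample data are hypothetical and are not taken from the project.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author (illustrative data only).
publications = {"paper A": 10, "paper B": 8, "paper C": 5, "paper D": 4, "paper E": 3}
print(h_index(publications.values()))  # prints 4
```

In an aggregative setting, the citation counts themselves would first have to be collected and reconciled from several sources; the sketch covers only the final numerical step.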
We will focus on two fundamental challenges. The first is semantic: it concerns both the interpretation of the query (problems related to "vocabulary mismatch" and to capturing the user's intent) and the qualification of the results with regard to the initial user query. The second challenge is computational: it relates to the combinatorial problem raised by the choice of fragments and the multiple ways of aggregating them.
The expected results will be of three types:
• Methodological: we will develop a reference workflow (the different tasks, their arrangement in time and the underlying constraints) for evaluating an aggregative query, from its specification to its complete evaluation.
• Algorithmic: we will produce query analysis and enrichment algorithms that take into account the context and preferences of the user, query decomposition algorithms, and aggregation operators based on the type of data involved. We will also work on the definition of a formal model for measuring the relevance of the resulting aggregate and on metrics for assessing the quality of the result.
• Experimental: we will produce benchmarks (data repositories and queries) to assess the effectiveness of our algorithms. We will make these tools available as open source.

Mohand Boughanem (Université Toulouse III [Paul Sabatier]-IRIT)

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

TSP-SAMOVAR Télécom SudParis laboratoire SAMOVAR
LAMSADE Laboratoire d'Analyse et Modélisation de Systèmes pour l'Aide à la DEcision
PRiSM Laboratoire d'informatique
LIRIS Laboratoire d'InfoRmatique en Image et Systèmes d'information
UT3-IRIT (UMR5505) Université Toulouse III [Paul Sabatier]-IRIT

ANR funding: 490,192 euros
Beginning and duration of the scientific project: September 2014 - 36 months
