Agence nationale de la recherche (ANR): Projects for science

Contenus et Interactions (CONTINT) 2010
DataLift project

An elevator for data: from published raw data to interlinked semantic data.

DataLift's ambition is to act as a catalyst for the emergence of the Web of data. Made up of large raw data sources linked to one another, the Web of data relies on semantic Web technologies to keep the data interoperable and intelligible. Adding data to the Web of data consists of:

* publishing data as RDF graphs: a very simple data format,
* linking these data sets together, by identifying equivalent resources in other data sources,
* describing the vocabulary used in published data through ontologies.
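
The three steps above can be illustrated with a minimal sketch, assuming a graph is represented simply as a list of (subject, predicate, object) triples. The `example.org` URIs, the `Literal` type, and the helper names `build_graph` and `to_ntriples` are hypothetical and chosen for illustration only; the `rdf:`, `rdfs:` and `owl:` URIs are the standard W3C ones.

```python
from typing import NamedTuple

# Standard W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical namespace, invented for this example.
EX = "http://example.org/"


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


def build_graph():
    """Return a tiny RDF graph as a list of (subject, predicate, object) triples."""
    city = EX + "city/grenoble"
    city_class = EX + "vocab#City"
    return [
        # 1. Publishing: the data itself, expressed as triples.
        (city, RDF_TYPE, city_class),
        (city, RDFS_LABEL, Literal("Grenoble")),
        # 2. Linking: the same resource as described in another (made-up) source.
        (city, OWL_SAME_AS, "http://other.example.org/place/grenoble"),
        # 3. Describing the vocabulary: declare the class used above.
        (city_class, RDF_TYPE, RDFS_CLASS),
    ]


def to_ntriples(graph):
    """Serialize the triples in N-Triples syntax, one statement per line."""
    lines = []
    for s, p, o in graph:
        if isinstance(o, Literal):
            # Escape backslashes and double quotes inside the literal.
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)
```

Calling `print(to_ntriples(build_graph()))` writes one statement per line. In practice a dedicated RDF library would replace these hand-written helpers.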


The Web of data has recently accelerated sharply with the publication of public data by the UK and US governments (data.gov, data.gov.uk). Similar initiatives are flourishing across the world and, in France, data providers such as INSEE and IGN have already started experiments. Citizen groups such as the Fondation internet nouvelle génération (FING) and RegardCitoyen.org want to take advantage of such data, and the Agence du Patrimoine Immatériel de l'État (APIE) aims to provide a "portal" for such public data.

However, although isolated data publication initiatives using semantic Web technologies exist, they remain limited for several reasons:

1. Like the Web, whose power comes from the interconnection of pages through hyperlinks, the Web of data will only make sense if the data it contains are interconnected. A few interlinking tools already exist, but they require too much manual intervention to reach Web scale.
2. A large number of ontologies covering various domains are quickly appearing, which raises several problems. Many ontologies overlap and need to be aligned with one another for proper interoperability between the data they describe. Selecting the appropriate ontology for describing a dataset is a tedious task. Once an ontology is selected, the data to be published must usually be converted in order to be linked to the ontology. Solving these technical problems requires expertise, which leads to publication processes that are not suited to publishing large amounts of heterogeneous data.
3. To ensure a publication space that is open while still giving each publisher its rights over the published data, methods for rights management and data access are needed.
4. Finally, and again by analogy with the Web, a critical mass of published data is needed to create a snowball effect similar to the one that gave the Web the importance it has today.

The goal of DataLift is to address these four challenges in an integrated way. More specifically, it will provide a complete path from raw data to fully interlinked, identified, qualified and "certified" linked data sets; it will develop a platform for supporting the processes of:

* selecting ontologies for publishing data;
* converting data to the appropriate format (RDF using the selected ontology);
* interlinking data with other data sources;
* publishing linked data.
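
As a rough illustration of the conversion and interlinking steps listed above, the following sketch turns CSV rows into triples through a column-to-property mapping, then links them to another source by exact label matching. It is not DataLift's method: the mapping, the `example.org` URIs, the sample rows, and the function names `convert` and `interlink` are all assumptions made for this example, and real interlinking tools use far more robust matching.

```python
import csv
import io

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical mapping from CSV columns to properties of the selected ontology.
COLUMN_TO_PROPERTY = {
    "name": RDFS_LABEL,
    "population": "http://example.org/vocab#population",
}

# Invented sample data, not taken from any real dataset.
RAW = "id,name,population\na1,Alpha Town,100\nb2,Beta City,200\n"


def convert(csv_text, base_uri):
    """Convert CSV rows to (subject, predicate, object) triples.

    The 'id' column is assumed to identify each row and is used to mint its URI.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base_uri + row["id"]
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


def interlink(triples, other_labels):
    """Naive interlinking: emit owl:sameAs when labels match exactly
    after trimming whitespace and lowercasing.

    other_labels maps URIs from another data source to their labels.
    """
    index = {label.strip().lower(): uri for uri, label in other_labels.items()}
    links = []
    for s, p, o in triples:
        if p == RDFS_LABEL:
            match = index.get(o.strip().lower())
            if match is not None:
                links.append((s, OWL_SAME_AS, match))
    return links
```

For instance, `interlink(convert(RAW, "http://example.org/city/"), {"http://other.example.org/place/1": "alpha town"})` yields a single `owl:sameAs` link for the first row. Publishing would then amount to serializing the data triples and the links together and exposing them on the Web.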

To carry out this ambitious program, DataLift will remove key obstacles to the development of the Web of data through research on ontology selection and evaluation, on automatic link generation and evolution, and on rights expression and management.

Partners

AOI ATOS ORIGIN INTEGRATION

CR INRIA - Grenoble - Rhône-Alpes - EXMO INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - (INRIA Siège)

EURECOM EURECOM

FING FONDATION INTERNET NOUVELLE GENERATION

IGN INSTITUT GEOGRAPHIQUE NATIONAL

INRIA Sophia Antipolis Méditerranée INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - (INRIA Siège)

INSEE INSTITUT NATIONAL STATISTIQUE ETUDES ECONOMIQUES (INSEE) - DIRECTION GENERALE

LIRMM UNIVERSITE DE MONTPELLIER II [SCIENCES TECHNIQUES DU LANGUEDOC]

Mondeca MONDECA

ANR funding: 1,101,605 euros
Start date and duration of the scientific project: - 36 months

 

ANR programme: Contenus et Interactions (CONTINT) 2010

Project reference: ANR-10-CORD-0009

Project coordinator:
Mr François Scharffe (INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - (INRIA Siège))
francois.scharffe@nullinria.fr

 


The author of this summary is the project coordinator, who is responsible for its content. The ANR therefore declines any responsibility for its content.