Blanc SHS 2 - Human development and cognition, language and communication

Designing spoken corpora for cross-linguistic research – CORTYPO

Submission summary

The CORTYPO project aims to build an innovative system for the linguistic annotation of natural-language corpora in lesser-described spoken languages. Its goal is to provide data and tools for testing linguistic hypotheses from a typological perspective.
Achieving this goal requires resolving several fundamental theoretical questions about language form and language function. Crucially, the project addresses what kind of theoretical apparatus is needed to compare languages that use different formal means and encode different functions.
By implementing theoretical solutions in corpus design and database design, the project provides a basis for empirically testing and falsifying hypotheses, and makes it possible to formulate new hypotheses on language structure and cross-linguistic comparison. By proposing solutions to the problem of linguistic interoperability, it paves the way for large-scale typological work based on first-hand natural-language data.

The innovative nature of the project is twofold:
(1) an annotation of sound-indexed texts based on the formal means available in each language (prosody, linear order, and phonological and morphological processes), which makes it possible to identify the syntactic and functional units of that language;
(2) the development of a functional database linked to the corpus. The database contains detailed information about the functions grammaticalized in each language and the forms that encode those functions. It is linked to the corpus through a query engine, so that forms, and ultimately examples in context, can be retrieved.
The data set composed of the corpus and the database is complemented by a Category table that provides terminological information and definitions. This table ensures the transparency and replicability of analyses, and provides input for the ISOcat registry.

The project's deliverables constitute a pilot solution for cross-linguistic comparison based on empirical data from typologically different languages.

Project coordination

Amina METTOUCHI (Langage, Langues et Cultures d'Afrique Noire (LLACAN)) – mettouchi@vjf.cnrs.fr

The author of this summary is the project coordinator, who is responsible for its content. The ANR declines any responsibility for its contents.

Partner

LLACAN Langage, Langues et Cultures d'Afrique Noire (LLACAN)

ANR funding: 229,992 euros
Start date and duration of the scientific project: February 2013, 36 months
