Corpus - Corpus, données et outils de la recherche en sciences humaines et sociales (Corpora, data and tools for research in the humanities and social sciences)

Parallel corpora in languages of the Greater Himalayan area – HimalCo

Submission summary

This project proposes to build parallel corpora for three sub-groups of the Tibeto-Burman family, covering a total of ten little-described oral languages. These corpora will be made up of native texts of similar or near-identical content for each sub-group (Kiranti, from Nepal; Rgyalrong and Naish, from China), drawing from the strong mythological traditions of the Greater Himalayan region. Cross-language comparison of highly similar native material will allow the typologically salient features of each sub-group to come to the fore with greater precision than can be obtained through elicitation.
HimalCo includes the two essential steps of (i) first-hand data collection in the field (in Nepal and China) and (ii) state-of-the-art transcription, annotation and formatting of the entire data set. In addition to classical interlinear morphemic glossing, translation and sound synchronization as implemented in the LACITO Archive (to which the participants are regular contributors), the narratives will be organized into parallel corpora in each subgroup, and the lexical data will serve to prepare talking dictionaries: dictionaries combined with sound recordings of individual entries and of example sentences. The team, which includes an IT specialist, will develop simple interfaces for parallel corpora and for talking dictionaries, allowing the comparison of material in individual languages, across languages of a subgroup, and across subgroups. This project will provide the solid empirical basis which has heretofore been lacking for research on these languages: all the materials will be made freely available online through the LACITO Archive, whose interface will be enriched. These objectively verifiable data will be available for the in-depth investigation of a range of topical issues on these languages, including diachronic as well as synchronic research.

Project coordination

Guillaume Jacques (Centre de recherches sur l'Asie Orientale)

The author of this summary is the project coordinator, who is responsible for its content. The ANR accepts no responsibility for its contents.


CRLAO-CNRS Centre de recherches sur l'Asie Orientale
LACITO-CNRS Langues et civilisations à tradition orale

ANR funding: 198,000 euros
Start and duration of the scientific project: December 2012, 36 months
