Project NaijaSynCor (A Corpus-based Macro-Syntactic Study of Naija (Nigerian Pidgin)) | ANR - Agence Nationale de la Recherche

The French National Research Agency Projects for science


ANR funded project

Défi des autres savoirs (DS10) 2016
Project NaijaSynCor

A Corpus-based Macro-Syntactic Study of Naija (Nigerian Pidgin)

NaijaSynCor takes an exhaustive, in-depth look at the structure of Naija (Nigerian Pidgin) as spoken in Nigeria today. Deuber (2005) has shown that Naija, as spoken by educated Nigerians, is developing in Lagos as a discrete language, separate from Nigerian English. This study proposes to assess whether this holds true for the rest of Nigeria, where Naija has over 75 million speakers. It examines diachronic, diatopic, diaphasic, diastratic and genre variation.
The project is a collaborative effort between two research units with proven expertise in corpus annotation from previous programmes and two leading Nigerian experts on Naija (F. Egbokhare & C. Ofulue). The two units are Llacan, which works on lesser-described languages (Corpafroas and Cortypo), and Modyco, which works on the interaction of prosody and syntax in French (ANR Rhapsodie) and on the development of large treebanks (ANR Orféo). The macrosyntactic framework developed in the ANR Rhapsodie project (Lacheret, Kahane et al. 2014) has proved particularly effective in dealing with the specificities of oral corpora, e.g. piles stacking, disfluencies, repetitions, discourse markers, overlaps, co-enunciation, false starts, self-repairs and truncations. This method is data-driven, inductive (the relevant units are identified through annotation) and modular.
The tools developed by the research team in these previous corpus study programmes are robust and mature enough to let the project focus on the linguistic problem posed by Naija: in its geographical and functional expansion, does Naija maintain its status as a discrete language, separate from Nigerian English, or does it undergo decreolization? While answering this question, the research programme aims to overcome two remaining technological challenges: (i) automatically identifying illocutionary units on the basis of intonation data; (ii) building a parser that integrates intonation data as a parameter.
Through the creation of a deeply annotated corpus, the project documents the emergence of Naija as a language at the national level, challenging existing theories of the development of creoles and languages in contact. Capitalizing on the latest developments in the area of corpus annotation, this innovative approach to the dynamics of contact and change in the areas of human behaviour and sociology of language will powerfully impact the methodology and technology of research on emerging languages.


LLACAN Langage, langues et cultures d'Afrique noire

Modyco Modèles, Dynamiques, Corpus, UMR7114

ANR grant: 356 643 euros
Beginning and duration: February 2017 - 42 months


ANR Programme: Défi des autres savoirs (DS10) 2016

Project ID: ANR-16-CE27-0007

Project coordinator:
Mr Bernard CARON (Langage, langues et cultures d'Afrique noire)




The project coordinator is the author of this abstract and is therefore responsible for the content of the summary. The ANR disclaims all responsibility in connection with its content.