Methodological and technical challenges of a corpus-based study of Naija
Abstract
This paper presents early refl ections on the NaijaSynCor survey (NSC) fi nanced by the French Agence Nationale de la Recherche. The nature of the language surveyed (Naija, a post-creole spoken in Nigeria as a second language by close to 100 million speakers) has induced a specifi c choice oftheoretical framework (variationist sociolinguistics) and methodology (a corpus-based study using Natural Language Processing). Half-way through the 4 year-study, the initial methodological choices are assessed taking into account the nature of the data that has been collected, and the problems that occurred as early as the initial stages of their annotation.