The RefLex project: documenting and exploring lexical resources in Africa
Abstract
The RefLex project aims to test a set of fundamental hypotheses about the structure and evolution of African languages that are often mentioned in the literature but whose validity has never been demonstrated on an empirical basis. These hypotheses share the peculiarity that they can only be tested by means of a quantitative approach, which in turn presupposes the existence of comprehensive documentation. The more than 2,200 languages spoken in Africa are characterized by great typological diversity, but they also share certain characteristics, at every level of linguistic analysis, that cut across linguistic phyla and areas. So far, it has never been possible to conduct an in-depth study of these characteristics (e.g., logophoric pronouns, labiovelar consonants), mainly because data on the majority of African languages are lacking. RefLex addresses this problem by fully exploiting the existing lexical documentation, which is in fact much larger than the grammatical documentation and yet is often ignored, especially in typological studies. One of the goals of RefLex is to make this scattered and hard-to-find lexical documentation available to interested researchers. The lexical corpus of African languages, which is available online to the whole scientific community, gives immediate access to a considerable wealth of data: as of June 2013, 460,000 lexical units for more than 370 languages, and we expect more than 1,000,000 entries, representing 1,000 languages, within the next two years. This corpus will allow dramatic progress in several domains: typology, phylogeny, lexical semantics, lexical spread, and areal linguistics. RefLex will be the largest online comparative database worldwide. Moreover, the database will differ from other existing databases in two crucial respects: (i) direct online access to the original documents on which the digital data are based, which makes this corpus a true r