Decolonising Scientific Writing for Africa

When it comes to scientific communication, language matters. Jantjies (2016) demonstrates this for STEM education: students perform better when taught mathematics in their home language. Language also matters in how scientific communication can dehumanise the people it chooses to study. As Robyn Humphreys noted at the #LanguageMatters seminar at UCT Heritage 2020: “During the continent’s colonial past, language – including scientific language – was used to control and subjugate and justify marginalisation and invasive research practices.”

Discussing science in local indigenous languages not only reaches more people who do not speak English as a first language, it can also integrate the facts and methods of science into cultures that have been denied them in the past. As sociology professor Kwesi Kwaa Prah put it in a 2007 report to the Foundation for Human Rights in South Africa, “Without literacy in the languages of the masses, science and technology cannot be culturally-owned by Africans. Africans will remain mere consumers, incapable of creating competitive goods, services and value-additions in this era of globalization.” (Prah, 2007). When science becomes "foreign" or something non-African, when one has to assume another identity just to theorise and practise science, the result is a subjugation of the mind: mental colonisation.

There is substantial distrust of science, particularly among many black South Africans, who can cite many examples of how it has been abused for oppression in the past. In addition, the communication and teaching of science were weaponised by the oppressive apartheid government in South Africa, which has left many seeds of distrust in citizens who only experience science being discussed in English.

Through government-funded efforts, European-derived languages such as Afrikaans, English, French, and Portuguese have been used as vessels of science, but African indigenous languages have not been given the same treatment. Modern digital tools like machine learning offer new, low-cost opportunities for scientific terms and ideas to be communicated in African indigenous languages.

During the COVID-19 pandemic, many African governments did not communicate about COVID-19 in the most widely spoken languages in their countries. ∀ et al. (2020) demonstrated the difficulty of translating COVID-19 surveys, since the only data available to train the translation models was religious text. Furthermore, they noted that scientific terms did not exist in the respective African languages.

Thus, we propose to build a multilingual parallel corpus of African research by translating African papers released on AfricArxiv into multiple African languages.

Use cases:
- A machine translation tool for AfricArxiv to aid translation of their research to and from African languages
- Terminology developed will be submitted to the respective language boards for addition to official glossaries, further improving scientific communication
- A machine translation tool for African universities to ensure accessibility of their publications
- A machine translation tool for scientific journalists to assist in widely distributing their work on the African continent
- More generally, the datasets developed would be a welcome addition to the open-source translation data available for these languages outside religious contexts

The selection of languages for this grant was based on the following factors:
- Prevalence of usage of the languages in question
- Existing relationships with co-ordinators, trusted translation partners, journalists and linguists for the languages in the Masakhane community
- The lack of existing open-source translation data in non-religious contexts
- The geographic diversity of the languages to be representative of the continent

Based on the above, we have selected the following six languages: Zulu, Northern Sotho, Yoruba, Hausa, Luganda, and Amharic.
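
To make the idea of a parallel corpus concrete, the following is a minimal sketch of how one aligned sentence and its translations might be stored and paired for training. It is not the project's actual data format: the `ParallelSentence` schema, the `sentence_pairs` helper, the `paper-001` identifier, and the use of ISO 639 codes as language keys are all assumptions made for illustration, and the translated text is a placeholder rather than a real translation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Tuple

# The six languages named in the proposal, keyed by ISO 639 code.
# Using these codes as identifiers is an assumption for illustration.
TARGET_LANGUAGES = {
    "zu": "Zulu",
    "nso": "Northern Sotho",
    "yo": "Yoruba",
    "ha": "Hausa",
    "lg": "Luganda",
    "am": "Amharic",
}


@dataclass
class ParallelSentence:
    """One source sentence and its translations (hypothetical schema)."""
    paper_id: str        # hypothetical identifier of the source paper
    sentence_index: int  # position of the sentence within the paper
    source_lang: str     # language code of the original text, e.g. "en"
    source_text: str
    translations: Dict[str, str] = field(default_factory=dict)

    def add_translation(self, lang: str, text: str) -> None:
        """Attach a translation, rejecting languages outside the selected set."""
        if lang not in TARGET_LANGUAGES:
            raise ValueError(f"unsupported target language: {lang}")
        self.translations[lang] = text


def sentence_pairs(
    records: Iterable[ParallelSentence], src: str, tgt: str
) -> Iterator[Tuple[str, str]]:
    """Yield (src_text, tgt_text) for every record available in both languages."""
    for record in records:
        texts = {record.source_lang: record.source_text, **record.translations}
        if src in texts and tgt in texts:
            yield texts[src], texts[tgt]


# Usage with placeholder text (not a real translation).
record = ParallelSentence("paper-001", 0, "en", "Example source sentence.")
record.add_translation("zu", "<Zulu translation>")
pairs = list(sentence_pairs([record], "en", "zu"))
```

Because every translation of a sentence is stored on the same record, pairs between two African languages (for example Zulu and Yoruba) can be drawn from the same data without routing through the source language.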

