I am a Research Associate at the University of Pretoria’s Data Science for Social Impact Research Group, where I work on improving Natural Language Processing (NLP) for underrepresented African languages. I hold a PhD from Bayero University Kano; my doctoral research used domain awareness, quality estimation, and transfer learning to make better use of monolingual data for machine translation in low-resource languages. I currently serve as Principal Investigator of AfriGemma, advancing African-language NLP for research, healthcare, and societal benefit.
I have led and contributed to projects that develop NLP resources and techniques for LLM development, sentiment analysis, emotion analysis, semantic relatedness, hate and offensive speech analysis, machine translation, named-entity recognition, and sentence alignment. I actively collaborate with international research communities such as Masakhane, HausaNLP, LITHME, and the Open Language Data Initiative, and have contributed to several SemEval (Semantic Evaluation) shared tasks: AfriSenti 2023, SemRel 2024, BRIGHTER 2025, Dim-ABSA 2026, and POLAR 2026.
I am passionate about mentoring students and early-career researchers, and I strive to bring ethical, human-centric approaches to my work as part of my commitment to NLP for social good. My research is driven by the belief that AI should be inclusive and accessible, and it aims to address the needs of underrepresented communities.
Research focus
Translation & Speech
Low-resource neural machine translation, back-translation, and automatic speech recognition & synthesis for African languages.
Sentiment, Emotion & Safety
Datasets and models for sentiment, emotion, semantic relatedness, and hate & offensive language analysis.
LLMs & Open Resources
African-centric large language models (e.g. AfriGemma), benchmarks, and openly available language resources.
News
| Date | Update |
|---|---|
| Jun 02, 2026 | Returning to AIMS South Africa as Visiting Professor to teach NLP & LLMs for the upcoming Nov–Dec 2026 cohort. |
| May 30, 2026 | New preprint: Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora. 📄 |
| May 28, 2026 | New preprint: AfriScience-MT — Towards Decolonizing Science in Africa through Text Translation. 📄 |
| Mar 25, 2026 | Several papers at AfricaNLP 2026 (Rabat), where I also serve as a workshop co-organiser. |
| Jan 15, 2026 | Co-organising two SemEval-2026 shared tasks — multilingual online polarization detection, and dimensional aspect-based sentiment & stance. |
Selected publications
- (ACL) BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
- (LREC) Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation. In Proceedings of the Language Resources and Evaluation Conference, Jun 2022