Idris Abdulmumin, PhD

Research Associate, University of Pretoria DSFSI.

I build open datasets, models, and benchmarks that bring Natural Language Processing (machine translation, speech, and sentiment, emotion, and safety analysis) to African and other low-resource languages.

I am a Research Associate at the University of Pretoria’s Data Science for Social Impact Research Group (DSFSI). My work focuses on improving Natural Language Processing (NLP) for underrepresented African languages. I hold a PhD from Bayero University Kano, where I used domain awareness, quality estimation, and transfer learning to make better use of monolingual data for machine translation in low-resource languages. I currently serve as Principal Investigator of AfriGemma, advancing African-language NLP for research, healthcare, and societal benefit.

I have led and contributed to projects that develop NLP resources and techniques for LLM building, sentiment analysis, emotion analysis, semantic relatedness, hate and offensive speech analysis, machine translation, named-entity recognition, and sentence alignment. I actively collaborate with international research communities such as Masakhane, HausaNLP, LITHME, and the Open Language Data Initiative, and have contributed to several SemEval (Semantic Evaluation) shared tasks: AfriSenti 2023, SemRel 2024, BRIGHTER 2025, Dim-ABSA 2026, and POLAR 2026.

I’m passionate about mentoring students and early-career researchers, and I strive to integrate ethical, human-centric approaches into my work, aligning with my commitment to advancing NLP for social good. My research is driven by the belief that AI should be inclusive and accessible, and I aim to make meaningful contributions to the field while addressing the needs of underrepresented communities.

Research focus

Translation & Speech

Low-resource neural machine translation, back-translation, and automatic speech recognition & synthesis for African languages.

Sentiment, Emotion & Safety

Datasets and models for sentiment, emotion, semantic relatedness, and hate & offensive language analysis.

LLMs & Open Resources

African-centric large language models (e.g. AfriGemma), benchmarks, and openly available language resources.

News

Jun 02, 2026 Returning to AIMS South Africa as Visiting Professor to teach NLP & LLMs for the upcoming Nov–Dec 2026 cohort.
May 30, 2026 New preprint: "Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora."
May 28, 2026 New preprint: "AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation."
Mar 25, 2026 Several papers at AfricaNLP 2026 (Rabat), where I also serve as a workshop co-organiser.
Jan 15, 2026 Co-organising two SemEval-2026 shared tasks: multilingual online polarization detection, and dimensional aspect-based sentiment & stance.

Selected publications

  1. ACL
    BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
    Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, and 45 more authors
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
  2. DiB
    ZASCA-Sum: A Dataset of the South Africa Supreme Courts of Appeal Judgments and Media Summaries for Legal Documents Summarization Research
    Idris Abdulmumin and Vukosi Marivate
    Data in Brief, Jul 2025
  3. Mach. Trans.
    Tag-less back-translation
    Idris Abdulmumin, Bashir Shehu Galadanci, and Garba Aliyu
    Machine Translation, Dec 2021
  4. LREC
    Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, and 7 more authors
    In Proceedings of the Language Resources and Evaluation Conference, Jun 2022
  5. arXiv
    AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation
    Idris Abdulmumin, Tajuddeen Gwadabe, Shamsuddeen Hassan Muhammad, and 11 more authors
    Jun 2026
  6. arXiv
    Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora
    Idris Abdulmumin, Mokgadi Penelope Matloga, Tadesse Destaw Belay, and 5 more authors
    Jun 2026

Latest posts