Publications

My publications.

2026

  1. arXiv
    AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation
    Idris Abdulmumin, Tajuddeen Gwadabe, Shamsuddeen Hassan Muhammad, and 11 more authors
    2026
  2. arXiv
    Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora
    Idris Abdulmumin, Mokgadi Penelope Matloga, Tadesse Destaw Belay, and 5 more authors
    2026
  3. arXiv
    DimStance: Multilingual Datasets for Dimensional Stance Analysis
    Jonas Becker, Liang-Chih Yu, Shamsuddeen Hassan Muhammad, and 14 more authors
    2026
  4. arXiv
    CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
    Pedro Ortiz Suarez, Laurie Burchell, Catherine Arnett, and 94 more authors
    2026
  5. arXiv
    Afri-MCQA: Multimodal Cultural Question Answering for African Languages
    Atnafu Lambebo Tonja, Srija Anand, Emilio Villa-Cueva, and 16 more authors
    2026
  6. arXiv
    Swivuriso: The South African Next Voices Multilingual Speech Dataset
    Vukosi Marivate, Kayode Olaleye, Sitwala Mundia, and 19 more authors
    2026
  7. AfricaNLP
    Full Fine-Tuning vs. Parameter-Efficient Adaptation for Low-Resource African ASR: A Controlled Study with Whisper-Small
    Sukairaj Hafiz Imam, Muhammad Yahuza Bello, Hadiza Ali Umar, and 4 more authors
    In Proceedings of the 7th Workshop on African Natural Language Processing (AfricaNLP 2026), Mar 2026
  8. AfricaNLP
    Trust but Check: LLM-Assisted Review of Human Translations in African Languages
    Tadesse Destaw Belay, Henok Biadglign Ademtew, Idris Abdulmumin, and 24 more authors
    In 7th Workshop on African Natural Language Processing, Mar 2026
  9. AfricaNLP
    The Rise of AfricaNLP: Contributions, Contributors, and Community Impact (2005–2025)
    Tadesse Destaw Belay, Kedir Yassin Hussen, Sukairaj Hafiz Imam, and 10 more authors
    In 7th Workshop on African Natural Language Processing, Mar 2026
  10. arXiv
    Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks
    Tadesse Destaw Belay, Ibrahim Said Ahmad, Idris Abdulmumin, and 6 more authors
    May 2026
  11. arXiv
    NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages
    Marie Maltais, Yejin Jeon, Min Ma, and 7 more authors
    Apr 2026
  12. SemEval
    SemEval-2026 Task 3: Dimensional Aspect-Based Sentiment Analysis (DimABSA)
    Liang-Chih Yu, Jonas Becker, Shamsuddeen Hassan Muhammad, and 14 more authors
    Apr 2026
  13. SemEval
    SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization
    Usman Naseem, Robert Geislinger, Juan Ren, and 31 more authors
    Apr 2026

2025

  1. NAACL
    AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, and 24 more authors
    In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Apr 2025
  2. ACL
    BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
    Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, and 45 more authors
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025
  3. DiB
    ZASCA-Sum: A Dataset of the South Africa Supreme Courts of Appeal Judgments and Media Summaries for Legal Documents Summarization Research
    Idris Abdulmumin and Vukosi Marivate
    Data in Brief, Jul 2025
  4. WMT
    Findings of the WMT 2025 Shared Task of the Open Language Data Initiative
    David Dale, Laurie Burchell, Jean Maillard, and 4 more authors
    In Proceedings of the Tenth Conference on Machine Translation, Nov 2025
  5. EMNLP
    AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text
    Tadesse Destaw Belay, Israel Abebe Azime, Ibrahim Said Ahmad, and 5 more authors
    In Findings of the Association for Computational Linguistics: EMNLP 2025, Nov 2025
  6. SemEval
    SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
    Shamsuddeen Hassan Muhammad, Nedjma Ousidhoum, Idris Abdulmumin, and 18 more authors
    In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), Jul 2025
  7. IWSLT
    Findings of the IWSLT 2025 Evaluation Campaign
    Idris Abdulmumin, Victor Agostinelli, Tanel Alumäe, and 49 more authors
    In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), Jul 2025
  8. IWSLT
    QUESPA Submission for the IWSLT 2025 Dialectal and Low-resource Speech Translation Task
    John E. Ortega, Rodolfo Joel Zevallos, William Chen, and 1 more author
    In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), Jul 2025
  9. AfricaNLP
    Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
    Sukairaj Hafiz Imam, Babangida Sani, Dawit Ketema Gete, and 6 more authors
    In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), Jul 2025
  10. SemEval
    HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning vs. Prompt Engineering in Entity-Aware Machine Translation
    Abdulhamid Abubakar, Hamidatu Abdulkadir, Rabiu Ibrahim, and 9 more authors
    In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), Jul 2025
  11. SemEval
    HausaNLP at SemEval-2025 Task 3: Towards a Fine-Grained Model-Aware Hallucination Detection
    Maryam Bala, Amina Abubakar, Abdulhamid Abubakar, and 6 more authors
    In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), Jul 2025
  12. AfricaNLP
    Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa
    Babangida Sani, Aakansha Soy, Sukairaj Hafiz Imam, and 5 more authors
    In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), Jul 2025
  13. AfricaNLP
    HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing
    Shamsuddeen Hassan Muhammad, Ibrahim Said Ahmad, Idris Abdulmumin, and 8 more authors
    In Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025), Jul 2025
  14. arXiv
    POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
    Usman Naseem, Juan Ren, Saba Anwar, and 14 more authors
    Jul 2025
  15. arXiv
    Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review
    Sukairaj Hafiz Imam, Tadesse Destaw Belay, Kedir Yassin Hussen, and 7 more authors
    Jul 2025

2024

  1. SemEval
    SemEval Task 1: Semantic Textual Relatedness for African and Asian Languages
    Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, and 14 more authors
    In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), Jun 2024
  2. LREC-COLING
    Mitigating Translationese in Low-resource Languages: The Storyboard Approach
    Garry Kuwanto, Eno-Abasi E. Urua, Priscilla Amondi Amuok, and 21 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  3. SIGIR
    CIRAL: A Test Collection for CLIR Evaluation in African Languages
    Mofetoluwa Adeyemi, Akintunde Oladipo, Xinyu Zhang, and 20 more authors
    In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2024
  4. SACAIR
    Analysing Public Transport User Sentiment on Low Resource Multilingual Data
    Rozina Myoya, Vukosi Marivate, and Idris Abdulmumin
    In Proceedings of the Fifth Southern African Conference for Artificial Intelligence Research, Jul 2024
  5. WOAH
    HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection
    Francielle Vargas, Samuel Guimarães, Shamsuddeen Hassan Muhammad, and 6 more authors
    In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), Jun 2024
  6. WMT
    Correcting FLORES Evaluation Dataset for Four African Languages
    Idris Abdulmumin, Sthembiso Mkhwanazi, Mahlatse Mbooi, and 7 more authors
    In Proceedings of the Ninth Conference on Machine Translation, Nov 2024
  7. WMT
    Findings of WMT2024 English-to-Low Resource Multimodal Translation Task
    Shantipriya Parida, Ondřej Bojar, Idris Abdulmumin, and 2 more authors
    In Proceedings of the Ninth Conference on Machine Translation, Nov 2024
  8. ACL
    SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages
    Nedjma Ousidhoum, Shamsuddeen Muhammad, Mohamed Abdalla, and 24 more authors
    In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024

2023

  1. ACL
    HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
    Shantipriya Parida, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, and 7 more authors
    In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023
  2. SemEval
    HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-information for Multi-level Sexism Classification
    Saminu Mohammad Aliyu, Idris Abdulmumin, Shamsuddeen Hassan Muhammad, and 4 more authors
    In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Jul 2023
  3. ICCAIT
    Analyzing COVID-19 Vaccination Sentiments in Nigerian Cyberspace: Insights from a Manually Annotated Twitter Dataset
    Ibrahim Said Ahmad, Lukman Jibril Aliyu, Auwal Abubakar Khalid, and 6 more authors
    In Proceedings of the International Conference on Computing and Advances in Information Technology (ICCAIT 2023), Nov 2023
  4. ICCAIT
    Leveraging Closed-Access Multilingual Embedding for Automatic Sentence Alignment in Low Resource Languages
    Idris Abdulmumin, Auwal Abubakar Khalid, Shamsuddeen Hassan Muhammad, and 5 more authors
    In Proceedings of the International Conference on Computing and Advances in Information Technology (ICCAIT 2023), Nov 2023
  5. SemEval
    SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
    Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, and 7 more authors
    In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Jul 2023
  6. IJCNLP
    MasakhaNEWS: News Topic Classification for African languages
    David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, and 62 more authors
    In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, Nov 2023
  7. EMNLP
    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
    Shamsuddeen Muhammad, Idris Abdulmumin, Abinew Ayele, and 24 more authors
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

2022

  1. LREC
    NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
    Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani, Sebastian Ruder, and 8 more authors
    In Proceedings of the Language Resources and Evaluation Conference, Jun 2022
  2. NAACL
    A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
    David Adelani, Jesujoba Alabi, Angela Fan, and 42 more authors
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jul 2022
  3. AfricaNLP
    NECAT-CLWE: A Simple But Efficient Parallel Data Generation Approach for Unsupervised and Semi-Supervised Neural Machine Translation
    Rabiu Abdullahi Ibrahim and Idris Abdulmumin
    In 3rd Workshop on African Natural Language Processing, Jul 2022
  4. AfricaNLP
    The African Stopwords Project: Curating Stopwords for African Languages
    Chris Chinenye Emezue, Hellina Hailu Nigatu, Cynthia Thinwa, and 12 more authors
    In 3rd Workshop on African Natural Language Processing, Jul 2022
  5. WiNLP
    Domain-Specific Lexicon-Based Sentiment Analysis using Contextual Shifter Patterns
    Shamsuddeen Muhammad, Pavel Brazdil, and Idris Abdulmumin
    In Proceedings of the Sixth Workshop on Widening Natural Language Processing, Dec 2022
  6. WiNLP
    HERDPhobia: A Dataset for Hate Speech Detection against Fulani Herdsmen in Nigeria
    Saminu Aliyu, Gregory Wajiga, Muhammad Murtala, and 3 more authors
    In Proceedings of the Sixth Workshop on Widening Natural Language Processing, Dec 2022
  7. EMNLP
    MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
    David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, and 42 more authors
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022
  8. IEEE
    Quantity vs. Quality of Monolingual Source Data in Automatic Text Translation: Can It Be Too Little If It Is Too Good?
    Idris Abdulmumin, Bashir Shehu Galadanci, Shamsuddeen Hassan Muhammad, and 1 more author
    In 2022 IEEE Nigeria 4th International Conference on Disruptive Technologies for Sustainable Development (NIGERCON), Dec 2022
  9. LREC
    Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, and 7 more authors
    In Proceedings of the Language Resources and Evaluation Conference, Jun 2022
  10. WMT
    Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
    Idris Abdulmumin, Michael Beukman, Jesujoba Alabi, and 8 more authors
    In Proceedings of the Seventh Conference on Machine Translation, Dec 2022
  11. arXiv
    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
    Teven Le Scao, Angela Fan, Christopher Akiki, and 387 more authors
    Dec 2022

2021

  1. Mach. Trans.
    Tag-less back-translation
    Idris Abdulmumin, Bashir Shehu Galadanci, and Garba Aliyu
    Machine Translation, Dec 2021
  2. IAENG EL
    A hybrid approach for improved low resource neural machine translation using monolingual data
    Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isah, and 2 more authors
    Engineering Letters, Nov 2021
  3. LNCS
    Data Selection as an Alternative to Quality Estimation in Self-Learning for Low Resource Neural Machine Translation
    Idris Abdulmumin, Bashir Shehu Galadanci, Ibrahim Said Ahmad, and 1 more author
    In Computational Science and Its Applications – ICCSA 2021, Nov 2021
  4. CCIS
    Enhanced Back-Translation for Low Resource Neural Machine Translation Using Self-training
    Idris Abdulmumin, Bashir Shehu Galadanci, and Abubakar Isa
    In Information and Communication Technology and Applications, Nov 2021

2019

  1. IEEE
    HauWE: Hausa Words Embedding for Natural Language Processing
    Idris Abdulmumin and Bashir Shehu Galadanci
    In 2019 2nd International Conference of the IEEE Nigeria Computer Chapter, NigeriaComputConf 2019, Nov 2019