Machine learning approaches for detecting hate speech in African languages on social media: a systematic literature review

Banchale Adhi Gufu, Audrey Mbogho, Edward Ombui

Abstract


Significant research efforts have been made towards developing machine learning models to detect hate speech worldwide. Africa, however, is home to over 2,000 languages and many dialects, and there is an urgent need for inclusive natural language processing (NLP) tools tailored to the continent's linguistic diversity. The literature review reveals that limited research has been conducted on hate speech detection in African languages, which provides a strong justification for this study. Although hate speech continues to affect African communities, its detection has been hampered by the complexity of the continent's many languages, calling for a localised approach to the problem.

The study followed the PRISMA guidelines for systematic literature reviews (SLRs), synthesising findings from research published between 2019 and 2024 on machine learning techniques for detecting hate speech in low-resourced African languages. It contributes to the theoretical literature and to the development of NLP for African languages by providing a comprehensive review of research gaps in machine learning models and datasets, highlighting the importance of combining multiple approaches and the need for collaborative, community-based measures to counter hate speech on social media. The findings reveal that machine learning models, including SVM, BiLSTM, mBERT, and XLM-RoBERTa, show significant potential for detecting hate speech in African languages. However, their performance is often constrained by the scarcity and limitations of available datasets. These findings offer insight into the current state of hate speech detection for African languages and underscore the need to develop more comprehensive machine learning models and datasets for widely spoken African languages.
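The abstract lists SVMs among the surveyed models, and at least one cited study pairs them with n-gram features (Oriola and Kotzé, ISCMI 2019). As a minimal, hypothetical sketch of that classical baseline, and not the pipeline of any reviewed paper, the following trains a linear SVM on character n-gram features using the Pegasos stochastic sub-gradient method, in plain Python so it needs no third-party libraries. The function names (char_ngrams, train_linear_svm, predict), the hyperparameters (lam, epochs) and the toy strings are illustrative assumptions.

```python
import math
import random
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> dict[str, float]:
    """L2-normalised character n-gram counts; character n-grams are one
    common way to cope with spelling variation and code-switching."""
    padded = f" {text.lower()} "
    counts = Counter(
        padded[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {gram: c / norm for gram, c in counts.items()}


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    """Sparse dot product between the weight vector and a feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(texts, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal hinge-loss
    objective of a linear SVM (no bias term, a simplification for brevity).

    labels must be +1 (e.g. hateful) or -1 (e.g. not hateful).
    """
    xs = [char_ngrams(t) for t in texts]
    rng = random.Random(seed)
    w: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = labels[i] * dot(w, xs[i])  # computed before the update
            # Shrink all weights: gradient step on the (lam/2)*||w||^2 regulariser.
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss sub-gradient step, taken only when the margin is violated.
            if margin < 1.0:
                for f, v in xs[i].items():
                    w[f] = w.get(f, 0.0) + eta * labels[i] * v
    return w


def predict(w: dict[str, float], text: str) -> int:
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if dot(w, char_ngrams(text)) >= 0 else -1


if __name__ == "__main__":
    # Synthetic placeholder strings, not real data: class +1 uses only the
    # letters x, y, z, q and class -1 only a, b, c, d, so the toy set is separable.
    texts = ["xyz qzx", "zzq xy", "qxy zyx", "abc dab", "bad cab", "dcb acd"]
    labels = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(texts, labels)
    print([predict(w, t) for t in texts])  # expected: [1, 1, 1, -1, -1, -1]
```

In practice a library implementation (for example scikit-learn's LinearSVC on TfidfVectorizer features) would usually replace this hand-rolled trainer, while the transformer models named in the abstract, such as mBERT and XLM-RoBERTa, are fine-tuned end to end instead.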

 

Received: 30 July 2025

Accepted: 18 October 2025

Published: 26 November 2025


Keywords


Machine Learning; Hate Speech Detection; African Languages; Social Media; Natural Language Processing; Deep Learning; Text Classification; Multilingual NLP


References


C. Sinyangwe, D. Kunda, and W. P. Abwino, “Detecting Hate Speech and Offensive Language using Machine Learning in Published Online Content,” Zambia ICT Journal, vol. 7, no. 1, pp. 79–84, Mar. 2023, doi: 10.33260/zictjournal.v7i1.143.

F. M. Ndahinda and A. S. Mugabe, “Streaming Hate: Exploring the Harm of Anti-Banyamulenge and Anti-Tutsi Hate Speech on Congolese Social Media,” J Genocide Res, vol. 26, no. 1, pp. 48–72, Jan. 2024, doi: 10.1080/14623528.2022.2078578.

U. K. Schmid, A. S. Kümpel, and D. Rieger, “How social media users perceive different forms of online hate speech: A qualitative multi-method study,” New Media Soc, vol. 26, no. 5, pp. 2614–2632, May 2024, doi: 10.1177/14614448221091185.

C. Silva and P. Carvalho, “When Can Compliments and Humour Be Considered Hate Speech? A Perspective From Target Groups in Portugal,” Comunicação e Sociedade, vol. 43, 2023, doi: 10.17231/COMSOC.43(2023).4135.

Y. Musa and G. Asuquo, “Hate Speech and Human Society: A Critical Analysis,” Humanities and Education Journal (SHE Journal), vol. 2, no. 3, 2021.

E. Ombui, L. Muchemi, and P. Wagacha, “Building and Annotating a Codeswitched Hate Speech Corpora,” International Journal of Information Technology and Computer Science, vol. 13, no. 3, 2021, doi: 10.5815/ijitcs.2021.03.03.

G. Njovangwa and G. Justo, “Automated Detection of Bilingual Obfuscated Abusive Words on Social Media Forums: A Case of Swahili and English Texts,” Tanzania Journal of Science, vol. 47, no. 4, 2021, doi: 10.4314/tjs.v47i4.2.

S. H. Muhammad et al., “AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages,” in EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings, 2023. doi: 10.18653/v1/2023.emnlp-main.862.

F. Vargas et al., “HausaHate: An Expert Annotated Corpus for Hausa Hate Speech Detection,” in Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 52–58. doi: 10.18653/v1/2024.woah-1.5.

S. H. Muhammad et al., “NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis,” in 2022 Language Resources and Evaluation Conference, LREC 2022, 2022.

T. M. Ababu, M. M. Woldeyohannis, and E. B. Getaneh, “Bilingual hate speech detection on social media: Amharic and Afaan Oromo,” J Big Data, vol. 12, no. 1, p. 30, Feb. 2025, doi: 10.1186/s40537-024-01044-y.

A. L. Tonja et al., “InkubaLM: A small language model for low-resource African languages,” arXiv preprint arXiv:2408.17024, 2024.

A. G. Debele and M. M. Woldeyohannis, “Multimodal Amharic Hate Speech Detection Using Deep Learning,” in 2022 International Conference on Information and Communication Technology for Development for Africa, ICT4DA 2022, 2022. doi: 10.1109/ICT4DA56482.2022.9971436.

E. Ombui, L. Muchemi, and P. Wagacha, “Psychosocial Features for Hate Speech Detection in Code-switched Texts,” International Journal of Information Technology and Computer Science, vol. 13, no. 6, 2021, doi: 10.5815/ijitcs.2021.06.03.

A. O. Ridwanullah, S. Y. Sule, B. Usman, and L. U. Abdulsalam, “Politicization of Hate and Weaponization of Twitter/X in a Polarized Digital Space in Nigeria,” J Asian Afr Stud, vol. 60, no. 5, 2025, doi: 10.1177/00219096241230500.

E. Kotzé and B. Senekal, “Employing sentiment analysis for gauging perceptions of minorities in multicultural societies: An analysis of Twitter feeds on the Afrikaner community of Orania in South Africa,” The Journal for Transdisciplinary Research in Southern Africa, vol. 14, no. 1, 2018, doi: 10.4102/td.v14i1.564.

S. M. Gashe, S. M. Yimam, and Y. Assabie, “Hate Speech Detection and Classification in Amharic Text with Deep Learning,” arXiv:2408.03849, 2024, doi: 10.48550/arXiv.2408.03849.

D. I. Adelani et al., “MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, 2022. doi: 10.18653/v1/2022.emnlp-main.298.

O. Oriola and E. Kotzé, “Automatic detection of toxic South African tweets using support vector machines with N-gram features,” in 2019 6th International Conference on Soft Computing and Machine Intelligence, ISCMI 2019, 2019. doi: 10.1109/ISCMI47871.2019.9004298.

A. L. Tonja et al., “EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation,” arXiv:2403.13737, 2024, doi: 10.48550/arXiv.2403.13737.

S. Ranathunga, E. S. A. Lee, M. Prifti Skenduli, R. Shekhar, M. Alam, and R. Kaur, “Neural Machine Translation for Low-resource Languages: A Survey,” ACM Comput Surv, vol. 55, no. 11, 2023, doi: 10.1145/3567592.

M. S. Jahan and M. Oussalah, “A systematic review of hate speech automatic detection using natural language processing,” Neurocomputing, 2023. doi: 10.1016/j.neucom.2023.126232.

A. M. U. D. Khanday, B. Bhushan, R. H. Jhaveri, Q. R. Khan, R. Raut, and S. T. Rabani, “NNPCov19: Artificial Neural Network-Based Propaganda Identification on Social Media in COVID-19 Era,” Mobile Information Systems, vol. 2022, 2022, doi: 10.1155/2022/3412992.

N. S. Mullah and W. M. N. W. Zainon, “Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review,” IEEE Access, 2021. doi: 10.1109/ACCESS.2021.3089515.

R. Duwairi, A. Hayajneh, and M. Quwaider, “A Deep Learning Framework for Automatic Detection of Hate Speech Embedded in Arabic Tweets,” Arab J Sci Eng, vol. 46, no. 4, 2021, doi: 10.1007/s13369-021-05383-3.

A. Tontodimamma, E. Nissi, A. Sarra, and L. Fontanella, “Thirty years of research into hate speech: topics of interest and their evolution,” Scientometrics, vol. 126, no. 1, 2021, doi: 10.1007/s11192-020-03737-6.

F. E. Ayo, O. Folorunso, F. T. Ibharalu, and I. A. Osinuga, “Machine learning techniques for hate speech classification of twitter data: State-of-The-Art, future challenges and research directions,” Computer Science Review, 2020. doi: 10.1016/j.cosrev.2020.100311.

A. S. Alammary, “BERT Models for Arabic Text Classification: A Systematic Review,” Applied Sciences, 2022. doi: 10.3390/app12115720.

A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems (NeurIPS 2017), Curran Associates, Inc., 2017. [Online]. Available: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

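J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), 2019.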

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving Language Understanding by Generative Pre-Training,” 2018. [Online]. Available: https://cdn.openai.com/research-covers/language

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

O. Oriola and E. Kotzé, “Exploring Neural Embeddings and Transformers for Isolation of Offensive and Hate Speech in South African Social Media Space,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2022. doi: 10.1007/978-3-031-10522-7_44.

S. M. Aliyu, “Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages,” in 5th Workshop on African Natural Language Processing, 2022.

A. A. Sosimi, O. Ipinnimo, C. O. Folorunso, B. A. Adim, and E. Onoyom-Ita, “Hate Speech Identification in West Africa using Machine-Learning Techniques,” Arid Zone Journal of Engineering, Technology and Environment (AZOJETE), vol. 20, no. 2, pp. 491–508, 2024.

S. G. Tesfaye and K. Kakeba, “Automated Amharic Hate Speech Posts and Comments Detection Model Using Recurrent Neural Network,” Research Square preprint, Dec. 01, 2020. doi: 10.21203/rs.3.rs-114533/v1.

T. M. Ababu and M. M. Woldeyohannis, “Afaan Oromo Hate Speech Detection and Classification on Social Media,” in 2022 Language Resources and Evaluation Conference, LREC 2022, 2022.

A. S. Mohammad, G. M. Wajiga, M. Murtala, S. H. Muhammad, I. Abdulmumin, and I. S. Ahmad, “HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria,” 2022. [Online]. Available: https://arxiv.org/abs/2211.15262

E. Ombui, L. Muchemi, and P. Wagacha, “Hate Speech Detection in Code-switched Text Messages,” in 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019 - Proceedings, 2019. doi: 10.1109/ISMSIT.2019.8932845.

M. J. Page et al., “The PRISMA 2020 statement: an updated guideline for reporting systematic reviews,” BMJ, p. n71, Mar. 2021, doi: 10.1136/bmj.n71.

M. Diallo, C. Fourati, and H. Haddad, “Bambara Language Dataset for Sentiment Analysis,” 2021. [Online]. Available: https://arxiv.org/abs/2108.02524

G. O. Ganfure, “Comparative analysis of deep learning based Afaan Oromo hate speech detection,” J Big Data, vol. 9, no. 1, 2022, doi: 10.1186/s40537-022-00628-w.

F. N. Njung’e, A. M. Oirere, and R. N. Ndung’u, “A Comparative Study of Transformer-based Models for Hate-Speech Detection in English-Kiswahili Code-Switched Social Media Text,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 13, no. 5, pp. 181–186, Oct. 2024, doi: 10.30534/ijatcse/2024/011352024.

S. Akrah and T. Pedersen, “DuluthNLP at SemEval-2023 Task 12: AfriSenti-SemEval: Sentiment Analysis for Low-resource African Languages using Twitter Dataset,” in Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Stroudsburg, PA, USA: Association for Computational Linguistics, 2023, pp. 1697–1701. doi: 10.18653/v1/2023.semeval-1.236.

I. Ahmed, M. Abbas, R. Hatem, A. Ihab, and M. W. Fahkr, “Fine-Tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification,” in Proceedings of the 20th Conference on Language Engineering, ESOLEC 2022, 2022. doi: 10.1109/ESOLEC54569.2022.10009167.

M. Tonneau et al., “NaijaHate: Evaluating Hate Speech Detection on Nigerian Twitter Using Representative Data,” 2024. [Online]. Available: https://arxiv.org/abs/2403.19260

C. E. Ilevbare, J. O. Alabi, D. I. Adelani, F. D. Bakare, O. B. Abiola, and O. A. Adeyemo, “EkoHate: Abusive Language and Hate Speech Detection for Code-switched Political Discussions on Nigerian Twitter,” 2024. [Online]. Available: https://arxiv.org/abs/2404.18180

N. Arlim, S. Riyanto, R. Rodiah, and A. Hafiz, “GunadarmaXBRIN at SemEval-2023 Task 12: Utilization of SVM and AfriBERTa for Monolingual, Multilingual, and Zero-shot Sentiment Analysis in African Languages,” in Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, Nov. 2023, pp. 869–877, doi: 10.18653/v1/2023.semeval-1.120.

A. A. Ayele, E. A. Jalew, A. C. Ali, S. M. Yimam, and C. Biemann, “Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse,” 2024. [Online]. Available: https://arxiv.org/abs/2404.12042

T. Davidson, D. Bhattacharya, and I. Weber, “Racial Bias in Hate Speech and Abusive Language Detection Datasets,” arXiv:1905.12516 [cs], Nov. 2019, [Online]. Available: https://arxiv.org/abs/1905.12516

N. James et al., “Limits of Language in Nigeria: The Hatred on Igbo Language,” SSRN Electronic Journal, 2024, doi: 10.2139/ssrn.4946190.

A. C. Mazari and H. Kheddar, “Deep Learning-based Analysis of Algerian Dialect Dataset Targeted Hate Speech, Offensive Language and Cyberbullying,” International Journal of Computing and Digital Systems, vol. 13, no. 1, pp. 965–972, Apr. 2023, doi: 10.12785/ijcds/130177.

F. M. Adam, A. Y. Zandam, and I. Inuwa-Dutse, “Detection and Analysis of Offensive Online Content in Hausa Language,” in Proceedings of the Second IJCAI AI for Good Symposium in Africa hosted by Deep Learning Indaba, California: International Joint Conferences on Artificial Intelligence Organization, Aug. 2024, pp. 2–14. doi: 10.24963/ijcai.aai4g.2024/1.

M. Degu, A. Tesfahun, and H. Takele, “Amharic Language Hate Speech Detection System from Facebook Memes Using Deep Learning System,” SSRN Electronic Journal, 2023, doi: 10.2139/ssrn.4389914.

C. Fourati, H. Haddad, A. Messaoudi, M. B. H. Hmida, A. B. E. Mabrouk, and M. Naski, “Introducing A Large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis,” in WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop, 2021.

A. Benali, M. H. Maaloul, and L. H. Belguith, “Automatic Processing of Algerian Dialect: Corpus Construction and Segmentation,” SN Comput Sci, vol. 4, no. 5, p. 597, Aug. 2023, doi: 10.1007/s42979-023-02097-1.

S. H. Muhammad et al., “SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval),” in Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Association for Computational Linguistics, 2023. doi: 10.18653/v1/2023.semeval-1.315.

D. I. Adelani et al., “IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models,” 2024. [Online]. Available: https://arxiv.org/abs/2406.03368

E. D. Kingawa, K. Tasew, M. Sholaye, and S. Hailu, “Hate Speech Detection Using Machine Learning: A Survey,” Academy Journal of Science and Engineering (AJSE), vol. 17, no. 1, pp. 88–109, 2023.

O. Oriola and E. Kotzé, “Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets,” IEEE Access, vol. 8, pp. 21496–21509, 2020, doi: 10.1109/ACCESS.2020.2968173.

J. Mussandi and A. Wichert, “NLP Tools for African Languages: Overview,” in Proceedings of the 16th International Conference on Computational Processing of Portuguese (PROPOR 2024), Volume 2, Lisbon, Portugal: Association for Computational Processing of Portuguese Languages, 2024, pp. 73–82.

C. Jacobs, N. C. Rakotonirina, E. A. Chimoto, B. A. Bassett, and H. Kamper, “Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2023. doi: 10.21437/Interspeech.2023-421.

W. B. Demilie and A. O. Salau, “Detection of fake news and hate speech for Ethiopian languages: a systematic review of the approaches,” J Big Data, vol. 9, no. 1, p. 66, Dec. 2022, doi: 10.1186/s40537-022-00619-x.

United Nations, “United Nations Strategy and Plan of Action on Hate Speech,” 2019. [Online]. Available: https://www.un.org/en/hate-speech/strategy-plan-action




DOI: https://dx.doi.org/10.21622/ACE.2025.05.2.1524



Copyright (c) 2025 Banchale Adhi Gufu, Audrey Mbogho, Edward Ombui


Advances in Computing and Engineering

E-ISSN: 2735-5985

P-ISSN: 2735-5977

 

Published by:

Academy Publishing Center (APC)

Arab Academy for Science, Technology and Maritime Transport (AASTMT)

Alexandria, Egypt

ace@aast.edu