Extremism Detection in the Iraqi Dialect Based on Machine Learning

Redhaa Fadhil Sabri; Nada A. Z. Abdullah

doi:10.24996/ijs.2025.66.2.25

Authors

Redhaa Fadhil Sabri, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq https://orcid.org/0009-0007-1693-6494
Nada A. Z. Abdullah, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2025.66.2.25

Keywords:

Extremism detection, NLP, word embedding, machine learning, Iraqi dialect

Abstract

Extremism detection is an important area of natural language processing (NLP). It is used to detect hate speech, sectarianism, and terrorism on social media. The problem has been studied for many international languages, especially Arabic and English, but existing studies focus on the languages themselves and have not addressed dialects, even though users of social networking sites write in their own dialect. The Iraqi dialect is one of the most difficult Arabic dialects, and few data sources in it are available on the Internet for researchers. This research therefore aims to detect extremism in Iraqi texts using machine learning. The data were pre-processed by removing the prefixes and suffixes of Iraqi words, removing repeated letters within a word, and removing Iraqi stop words. In the embedding step, words were represented using pre-trained embeddings as well as embeddings built with Gensim Word2Vec and FastText. Four machine learning classifiers were used: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbor (KNN), and Gaussian Naive Bayes (GNB). The experiments were conducted on two Iraqi datasets related to extremism that were collected from social media platforms: the Iraqi Facebook Comments Dataset (IFCD) and the Iraqi Tweets Dataset (ITD). The performance of all models was evaluated using accuracy, macro-average precision, macro-average recall, and macro-average F1-score. The best F1-score was 0.9521, with a recall of 0.95 and a precision of 0.955. In addition, the models presented in this research were tested on a publicly available Iraqi hate speech dataset, and the results were compared with those reported in the work that introduced that dataset.
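The macro-averaged metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name macro_scores and the labels "extremist" and "normal" are hypothetical, and macro F1 is assumed here to be the unweighted mean of the per-class F1-scores, which may differ from the definition used in the paper.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging computes each metric per class and then takes the
    unweighted mean, so every class counts equally regardless of size.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, for illustration only (not taken from IFCD or ITD).
y_true = ["extremist", "extremist", "normal", "normal"]
y_pred = ["extremist", "normal", "normal", "normal"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Because each class contributes equally to the average, a classifier that performs well only on the majority class scores lower under macro averaging than under plain accuracy, which matters when extremist and non-extremist texts are imbalanced.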