Gender Classification Based on Iraqi Names Using Machine Learning

Authors

  • Huda Hallawi Department of Information Technology, College of Computers Science & Information Technology, University of Kerbala, Karbala, Iraq
  • Ahmed F Almukhtar Department of Information Technology, College of Computers Science & Information Technology, University of Kerbala, Karbala, Iraq
  • Dhamyaa A. Nasrawi Department of Computer Science, College of Computers Science & Information Technology, University of Kerbala, Karbala, Iraq
  • Ali Durr Salah Department of Computer Science, College of Computers Science & Information Technology, University of Kerbala, Karbala, Iraq
  • Tariq Zaid Faisal Department of Computer Science, College of Computers Science & Information Technology, University of Kerbala, Karbala, Iraq

DOI:

https://doi.org/10.24996/ijs.2024.65.11.42

Keywords:

Gender classification, machine learning techniques, Iraqi names, multi features, unique dataset

Abstract

In machine learning, classification is the task of building a model that predicts the class of an element from its attributes, given a set of labeled examples. This work aims to classify people by gender based on their names. Two models were developed: the first is based on a single feature, the name itself, whereas the second is built on nine features derived from the name: is_longname, is_vowelend, is_vowelbegin, 2_gramend, 2_grambegin, 1_gramend, 1_grambegin, is_contain_abo, and is_contain_abed. Two datasets were used: the first was collected from the Ministry of Labor and Social Affairs, and the second was gathered from an Iraqi university website. Both datasets contain many unusual Iraqi names as well as spelling errors, which pose a real challenge for classification. Five machine learning methods were applied and tested within the developed models: Random Forest, Naive Bayes, Logistic Regression, Multilayer Perceptron, and Extreme Gradient Boost. The experimental results show higher accuracy when the models are applied to the original dataset, which includes names and their frequencies: the Multilayer Perceptron achieved 97% accuracy in the single-feature model, and Extreme Gradient Boost achieved 97% accuracy in the multi-feature model. In contrast, accuracy does not exceed 79% when the models are applied to the unique dataset (names without their frequencies).
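
The nine name-derived features listed in the abstract could be computed along the following lines. This is a minimal sketch, not the authors' implementation: only the feature names come from the abstract. The function name extract_features, the assumption that names are written in Arabic script, the set of letters treated as vowels, the length cutoff for is_longname, and the substrings used for the "abo" and "abed" checks are all hypothetical choices made for illustration.

```python
# Sketch of the nine features named in the abstract. Every definition below is
# an ASSUMPTION; the abstract gives the feature names but not their definitions.

# Assumed vowel set: Arabic long-vowel letters alef, waw, yeh.
ARABIC_LONG_VOWELS = {"\u0627", "\u0648", "\u064a"}

# Assumed cutoff (in characters) above which a name counts as "long".
LONG_NAME_THRESHOLD = 5

# Assumed markers: "abo" as alef-beh-waw, "abed" as ain-beh-dal.
ABO_MARKER = "\u0627\u0628\u0648"
ABED_MARKER = "\u0639\u0628\u062f"


def extract_features(name: str) -> dict:
    """Map a single name to a dict holding the nine features."""
    name = name.strip()
    if not name:
        raise ValueError("name must be non-empty")
    return {
        "is_longname": len(name) > LONG_NAME_THRESHOLD,
        "is_vowelend": name[-1] in ARABIC_LONG_VOWELS,
        "is_vowelbegin": name[0] in ARABIC_LONG_VOWELS,
        "2_gramend": name[-2:],    # last two characters
        "2_grambegin": name[:2],   # first two characters
        "1_gramend": name[-1],     # last character
        "1_grambegin": name[0],    # first character
        "is_contain_abo": ABO_MARKER in name,
        "is_contain_abed": ABED_MARKER in name,
    }


# Example name: ain, beh, dal, alef, lam, lam, heh (seven characters).
example = extract_features("\u0639\u0628\u062f\u0627\u0644\u0644\u0647")
```

The character n-gram features are categorical, so a classifier would need them encoded numerically (for example, one-hot encoded) before training; the abstract does not say which encoding was used.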

Published

2024-11-30

Section

Computer Science

How to Cite

Gender Classification Based on Iraqi Names Using Machine Learning. (2024). Iraqi Journal of Science, 65(11), 6725-6737. https://doi.org/10.24996/ijs.2024.65.11.42
