Data Mining Methods for Extracting Rumors Using Social Analysis Tools

Manahil Zayno; Abdulkareem Merhej Radhi

doi:10.24996/ijs.2022.63.8.36

Authors

Manahil Zayno, Department of Computer Science, College of Science, Al-Nahrain University, Jadriya, Baghdad, Iraq, https://orcid.org/0000-0002-2062-529X
Abdulkareem Merhej Radhi, Department of Computer Science, College of Science, Al-Nahrain University, Jadriya, Baghdad, Iraq


Keywords:

Machine learning, Text classification, Naïve Bayes, RF, KNN, DT, Natural language processing, SGD

Abstract

Rumors are typically described as statements whose truth value is unknown. A rumor on social media can spread erroneous information to a large group of individuals, and such false information can influence decision-making in many societies. In online social media, where enormous amounts of information are easily distributed over a large network of sources with unverified authority, detecting rumors is critical. This research proposes performing rumor detection using Natural Language Processing (NLP) tools together with six distinct Machine Learning (ML) methods: Naïve Bayes (NB), Random Forest (RF), K-Nearest Neighbor (KNN), Logistic Regression (LR), Stochastic Gradient Descent (SGD), and Decision Tree (DT). The data set used in the experiment contained 16,865 samples. For pre-processing, tokenization was used to separate the tokens from one another, normalization was used to remove all non-word tokens, stop-word removal was used to delete unnecessary words, and stemming was used to obtain the stem of each token. Before the six classification algorithms were applied, features were extracted using Term Frequency-Inverse Document Frequency (TF-IDF), the main feature extraction approach. According to the results, the RF classifier outperformed all other classifiers, with an accuracy of 99%.
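The pre-processing and feature extraction steps named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem` helper (a stand-in for a real stemmer such as Porter's), the function names, and the unsmoothed textbook TF-IDF weighting are all assumptions made for illustration. In practice, the resulting vectors would be passed to classifiers such as the six listed above.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "that", "it"}


def tokenize(text):
    """Split raw text into whitespace-delimited tokens."""
    return text.split()


def normalize(tokens):
    """Lowercase, strip surrounding punctuation, and drop non-word tokens."""
    cleaned = (t.lower().strip(string.punctuation) for t in tokens)
    return [t for t in cleaned if t.isalpha()]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Hypothetical stemmer: strip one common suffix, keeping at least 3 letters."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, normalize, remove stop words, then stem."""
    tokens = remove_stop_words(normalize(tokenize(text)))
    return [crude_stem(t) for t in tokens]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df), the plain
    textbook form; libraries often apply smoothed variants instead.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    posts = ["Rumors are spreading fast!", "Officials confirmed the rumors."]
    processed = [preprocess(p) for p in posts]
    for tokens, vector in zip(processed, tfidf(processed)):
        print(tokens, vector)
```

With this unsmoothed weighting, a term that occurs in every document receives a weight of zero, which is why terms shared across all posts contribute nothing to distinguishing them.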
