Detecting Fake News in Social Media: An Approach Utilizing Machine Learning to Uncover Disinformation

Mustafa Abdul-Razzaq Kareem; Amer Abdulmajeed Abdulrahman

doi:10.24996/ijs.2025.66.7.27

Authors

Mustafa Abdul-Razzaq Kareem, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq
Amer Abdulmajeed Abdulrahman, Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq

DOI:

https://doi.org/10.24996/ijs.2025.66.7.27

Keywords:

fake news detection, short text classification, machine learning, natural language processing, social media platforms, Random Forest, K-Nearest Neighbors, Logistic Regression, Stochastic Gradient Descent, Decision Tree, Naive Bayes

Abstract

The growing volume of false information on social media has become a critical concern. Misleading content that spreads widely across the Internet and social platforms shapes public sentiment, influences political discussion, and erodes the reliability of information sources. Identifying false information on the X platform, previously known as Twitter, is an intricate task because of the network's attributes, such as brevity, rapid spread, and varied user engagement. Extracting crucial information from short texts such as tweets is challenging, even with precise labeling. This study focuses on recognizing misinformation on social media platforms. It uses the CIC Truth-Seeker Dataset 2023, one of the most extensive datasets in its category, which contains over 134,000 labeled tweets. The study introduces novel methods for short text classification that combine machine learning and natural language processing (NLP) techniques. After the dataset is preprocessed, features are extracted using the term frequency-inverse document frequency (TF-IDF) algorithm. The study then tests several machine learning models, namely Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), and Stochastic Gradient Descent (SGD), to determine which distinguishes real from fake tweets most accurately. The findings demonstrate significant advances in models designed to handle short text, addressing the practical problem of automatically identifying fake content on social media platforms. Furthermore, we achieved a significant advantage over previous research on the same dataset. When the models were applied to the news data, Random Forest attained the highest accuracy at 93%, while K-Nearest Neighbor yielded a lower accuracy of 68%.
This paper aims to offer useful information and practical solutions for recognizing and reducing false news on social media platforms, with a specific focus on the X platform. Using the Truth-Seeker dataset, we apply machine learning methods to improve on previous text classification models.
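The pipeline described in the abstract (preprocessing, TF-IDF feature extraction, then a classifier) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the toy tweets and labels are invented for illustration, the tokenizer is a simplistic stand-in for the paper's preprocessing, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 is one common variant assumed here, and K-Nearest Neighbor with cosine similarity is used only because it is the simplest of the six listed models to write by hand.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a stand-in for real preprocessing).
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (assumed variant).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(doc, idf):
    # Term frequency times IDF, restricted to the training vocabulary, then L2-normalized.
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are L2-normalized, so the dot product equals cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def knn_predict(query_vec, train_vecs, train_labels, k=3):
    # Majority vote among the k training vectors most similar to the query.
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data, NOT taken from the Truth-Seeker dataset.
train_texts = [
    "Scientists confirm vaccine passed clinical trials",
    "City council approves new budget for schools",
    "Official report shows unemployment rate fell",
    "Miracle cure hidden by doctors, share before deleted",
    "Shocking secret: celebrity replaced by clone",
    "Aliens rigged the election, share this secret",
]
train_labels = ["real", "real", "real", "fake", "fake", "fake"]

train_docs = [tokenize(t) for t in train_texts]
idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

def classify(text, k=3):
    return knn_predict(tfidf_vector(tokenize(text), idf), train_vecs, train_labels, k)

pred_fake = classify("Shocking secret cure doctors hide")
pred_real = classify("Council report on school budget approved")
print(pred_fake, pred_real)
```

On a corpus the size of Truth-Seeker, a library implementation of TF-IDF and of the six classifiers would be the practical choice; the hand-written version above is meant only to make the feature-extraction and voting steps explicit.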