Leveraging Vectorization Techniques for Malicious Website Detection With Machine Learning

Saleem Raja Abdul Samad; Sundaravadivazhagan Balasubramaniyan; Pradeepa Ganesan; Chitra P; Poongothai K

doi:10.24996/ijs.2025.66.2.27

Authors

Saleem Raja Abdul Samad, IT Department, University of Technology and Applied Sciences-Shinas, Sultanate of Oman
Sundaravadivazhagan Balasubramaniyan, IT Department, University of Technology and Applied Sciences-Shinas, Sultanate of Oman
Pradeepa Ganesan, IT Department, University of Technology and Applied Sciences-Musannah, Sultanate of Oman
Chitra P, Department of Computer Science, School of Sciences, GITAM University, Bangalore, India
Poongothai K, Department of Computer Science, Shri Sakthikailassh Women's College, Salem, Tamil Nadu, India

Keywords:

Vectorizer, Malicious URL, Machine learning, Phishing, Word2Vec

Abstract

Malicious websites are those that are created to harm visitors or exploit their information for illegal purposes. These websites are commonly used in attacks such as phishing, malware distribution, and scams. Clicking on a malicious URL can have severe consequences, such as data breaches, financial losses, and identity theft. Detecting and blocking these websites is essential for protecting individuals and organizations from online threats, preserving data security, and sustaining confidence in online platforms. Researchers have proposed several methods for detecting malicious websites, but because the threat keeps evolving, the problem remains unsolved. This paper presents a machine learning model for malicious website identification. To process the textual content, the experiment uses four text vectorization techniques: Count, TF-IDF, Hashing, and word embedding (average Word2Vec). Eight machine learning models are tested to assess performance. The results show that the average Word2Vec embedding with extreme gradient boosting and with random forest obtains 94.76% and 94.70% accuracy, respectively.
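To make the four vectorization techniques named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that the text being vectorized is a URL string split on non-alphanumeric characters. The function names, the example URLs, the 16-bucket hash size, and the 2-dimensional toy embedding table (standing in for a trained Word2Vec model) are all hypothetical choices made for illustration.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def count_vectors(docs):
    """Count vectorization: token counts over a vocabulary built from the corpus."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0] * len(vocab)
        for token, c in Counter(d).items():
            row[index[token]] = c
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF: raw term counts weighted by a smoothed inverse document frequency."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def hashing_vectors(docs, n_features=16):
    """Hashing: map each token to a bucket with a stable hash; no vocabulary is stored."""
    rows = []
    for d in docs:
        row = [0] * n_features
        for token in d:
            row[zlib.crc32(token.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


def average_embedding(doc, embeddings, dim):
    """Average word embedding: mean vector of the known tokens, zeros if none are known."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


# Hypothetical example URLs (not taken from the paper's dataset).
urls = [
    "http://secure-login.example.com/verify/account",
    "https://example.org/docs/index.html",
]
docs = [tokenize(u) for u in urls]

# Hypothetical 2-dimensional embeddings standing in for a trained Word2Vec model.
toy_embeddings = {"login": [1.0, 0.0], "verify": [0.0, 1.0], "docs": [0.5, 0.5]}

vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashing_vectors(docs)
averaged = [average_embedding(d, toy_embeddings, dim=2) for d in docs]
```

Each of the four outputs is a fixed-length numeric vector per URL, which is the form a classifier such as random forest or extreme gradient boosting expects. The count and TF-IDF vectors grow with the vocabulary, while the hashed and averaged vectors have a fixed width chosen in advance, at the cost of hash collisions and of losing word order, respectively.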