Leveraging Vectorization Techniques for Malicious Website Detection With Machine Learning
DOI:
https://doi.org/10.24996/ijs.2025.66.2.27
Keywords:
Vectorizer, Malicious URL, Machine learning, Phishing, Word2Vec
Abstract
Malicious websites are created to harm visitors or to exploit their information for illegal purposes. They are commonly used in attacks such as phishing, malware distribution, and scams. Clicking on a malicious URL can have severe consequences, including data breaches, financial losses, and identity theft. Detecting and blocking these websites is essential for protecting individuals and organizations from online threats, preserving data security, and sustaining confidence in online platforms. Researchers have proposed several methods for detecting malicious websites, but because the threat keeps evolving, the problem remains unsolved. This paper presents a machine learning model for malicious website identification. To process the textual content, the experiment uses four text vectorization techniques: Count, TF-IDF, Hashing, and word embedding (average Word2Vec). Eight machine learning models are tested to assess performance. The results show that the average Word2Vec embedding with extreme gradient boosting and with random forest achieves 94.76% and 94.70% accuracy, respectively.
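As a rough illustration of the four vectorization schemes named in the abstract, the following plain-Python sketch turns short text samples into numeric vectors. It is not the authors' implementation: the tokenizer, the sample strings, the CRC32 hash, the bucket count, the TF-IDF variant, and the tiny embedding table are all assumptions made for illustration. In practice the embedding table would come from a trained Word2Vec model rather than being written by hand.

```python
import math
import re
import zlib


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def count_vectors(docs):
    """Bag-of-words counts over a vocabulary built from the corpus."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for t in toks:
            row[index[t]] += 1
        rows.append(row)
    return vocab, rows


def tfidf_vectors(docs):
    """Counts reweighted by idf = ln(N / df) + 1 (one common variant)."""
    vocab, counts = count_vectors(docs)
    n = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n / d) + 1.0 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


def hashing_vectors(docs, n_features=16):
    """Vocabulary-free counts: each token is hashed into a fixed bucket."""
    rows = []
    for d in docs:
        row = [0] * n_features
        for t in tokenize(d):
            row[zlib.crc32(t.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


def average_embedding(doc, embeddings, dim):
    """Mean of the token vectors found in the table; zeros if none are found."""
    vecs = [embeddings[t] for t in tokenize(doc) if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Hypothetical samples and a hand-written 2-dimensional embedding table.
docs = [
    "http://secure-login.example.com/verify/account",
    "https://example.org/docs/index.html",
]
toy_embeddings = {
    "login": [1.0, 0.0],
    "verify": [0.0, 1.0],
    "docs": [0.2, 0.8],
}

vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashing_vectors(docs)
averaged = [average_embedding(d, toy_embeddings, dim=2) for d in docs]
```

Each scheme yields one fixed-length vector per sample, which is the form a classifier such as random forest or extreme gradient boosting expects as input. The hashing variant needs no stored vocabulary, at the cost of possible bucket collisions, while the averaged embedding gives a dense vector whose length does not depend on vocabulary size.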