Email spam remains a significant challenge because of the growing volume of unsolicited and malicious messages. This paper presents a comparative analysis of classical machine learning algorithms for email spam detection using the UCI Spambase dataset. Five widely used classifiers (Naïve Bayes, Support Vector Machine, Logistic Regression, Decision Tree, and Random Forest) are evaluated under identical experimental conditions. Performance is measured by accuracy, precision, recall, and F1-score. Experimental results show that the Random Forest classifier outperforms the other models, achieving an accuracy of 94.57% with high precision and balanced recall. In addition to performance evaluation, model interpretability is enhanced using SHAP (SHapley Additive exPlanations) to analyse how individual features contribute to spam classification decisions. The findings indicate that classical machine learning models, when combined with explainability techniques, can provide reliable and interpretable solutions for email spam filtering.
Keywords
Email Spam Detection, Machine Learning, Random Forest, SHAP, Text Classification.
Conclusion
This study presented a comparative evaluation of classical machine learning algorithms for email spam detection using the UCI Spambase dataset. Multiple classifiers were analysed to assess their effectiveness in distinguishing spam from legitimate emails. Among the evaluated models, the Random Forest classifier demonstrated superior performance in terms of accuracy, precision, recall, and F1-score, indicating its robustness and reliability for spam filtering tasks. Furthermore, the application of SHAP provided meaningful insights into feature importance and model behaviour, improving the transparency and interpretability of the classification decisions. The results indicate that classical machine learning approaches, when supported by explainable AI techniques, remain effective for real-world spam detection systems. Future work may explore hybrid models or deep learning approaches to further improve detection performance on more diverse and evolving datasets.
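The comparison protocol summarised above (five classifiers, one shared train/test split, four metrics) could be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the experimental code behind the reported results: the function name `compare_classifiers`, the 80/20 stratified split, the feature scaling for the SVM and Logistic Regression models, and all hyperparameters are assumptions. A synthetic dataset with 57 features (the same feature count as Spambase) stands in for the real data so that the example runs without a download; its scores are not comparable to the reported 94.57%.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Fit five classical classifiers on one shared split and score each one.

    Hypothetical helper: the split ratio and hyperparameters are assumptions.
    """
    # A single stratified split keeps the conditions identical for every model.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Naive Bayes": GaussianNB(),
        # Scaling is assumed here because SVM and LR are scale-sensitive.
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Precision, recall and F1 are computed for the positive (spam) class.
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="binary", zero_division=0
        )
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Synthetic stand-in for Spambase (assumption): 57 features, roughly 40% positives.
X, y = make_classification(
    n_samples=1000, n_features=57, n_informative=20,
    weights=[0.6, 0.4], random_state=0,
)
results = compare_classifiers(X, y)
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

To run the same loop on the real data, `X` and `y` would be replaced with the Spambase feature matrix and labels. Fixing the split and the random seeds is what makes the comparison between models like-for-like.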
How to Cite This Paper
Dhanna Singh (2025). A Comparative Analysis of Classical Machine Learning Algorithms for Email Spam Detection. International Journal of Computer Techniques, 12(6). ISSN: 2394-2231.