AI-Based Fake News Detection System Using TF-IDF Vectorization and Logistic Regression | IJCT Volume 13 – Issue 3 | IJCT-V13I3P21

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 2  |  Published: March – April 2026

Authors

Mr. K. Senthil Kumararaja, A. TharunTeja Reddy, B. Sairam Krishna, Ch. Abhiram Swarup

Abstract

The rapid proliferation of misinformation across digital news platforms poses a significant threat to public discourse, democratic institutions, and societal well-being. With billions of users consuming information through social media and online portals, the distinction between credible journalism and fabricated content has become increasingly difficult. This paper presents a comprehensive AI-based Fake News Detection System that leverages Term Frequency-Inverse Document Frequency (TF-IDF) vectorization combined with a Logistic Regression classifier to automatically distinguish real news from fabricated content. The proposed system is trained and evaluated on a combined dataset of over 44,000 labeled news articles drawn from both reputable journalistic outlets and known misinformation sources. The complete pipeline incorporates a rigorous multi-stage text preprocessing module, TF-IDF feature extraction with corpus-level normalization, and a discriminatively trained Logistic Regression model with L2 regularization. Experimental results demonstrate an overall classification accuracy of 98.73%, with an F1-score of 0.99 achieved for both the real and fake news classes, confirming balanced detection performance. A detailed ablation study further validates each preprocessing step’s contribution to final performance. The system additionally incorporates model persistence via Python’s pickle module, enabling efficient real-time inference without retraining overhead. An interactive command-line interface (CLI) supports real-world deployment for end-user news verification. Comparative benchmarking against Naive Bayes, Decision Tree, and Random Forest classifiers under identical experimental conditions demonstrates that Logistic Regression achieves the best accuracy-efficiency tradeoff.
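The pipeline described in the abstract (preprocessing, TF-IDF weighting with normalization, then an L2-regularized Logistic Regression) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, and the tokenizer, the smoothed IDF formula, the stochastic gradient descent loop, the hyperparameters, and the four toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a stand-in for real preprocessing)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector scaled to unit L2 norm; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Probability that a vectorized document belongs to class 1 (fake)."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in x.items()))

def train(vectors, labels, epochs=200, lr=0.5, l2=0.01):
    """Logistic regression fitted by SGD; the L2 penalty is applied to the
    weights of the features present in each sample (a simplification)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            bias -= lr * err
            for t, v in x.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (err * v + l2 * w)
    return weights, bias

# Toy corpus (hypothetical): label 1 = fake, 0 = real.
docs = [
    "SHOCKING miracle cure doctors hate",
    "You won't believe this shocking secret",
    "Parliament passes annual budget bill",
    "Central bank holds interest rates steady",
]
labels = [1, 1, 0, 0]

idf = fit_idf(docs)
weights, bias = train([tfidf(d, idf) for d in docs], labels)

p_fake = predict_proba(tfidf("shocking miracle secret", idf), weights, bias)
p_real = predict_proba(tfidf("bank budget rates", idf), weights, bias)
print(f"P(fake | sensational text) = {p_fake:.2f}")
print(f"P(fake | sober text)       = {p_real:.2f}")
```

On a corpus of the size reported in the paper, a library implementation with sparse matrices and a batch solver would be the practical choice; the sketch only shows how the stages fit together.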

Keywords

Fake News Detection; TF-IDF Vectorization; Logistic Regression; Natural Language Processing; Misinformation; Text Classification; Machine Learning; Feature Engineering; Binary Classification.

Conclusion

This paper presented an end-to-end AI-based Fake News Detection System employing TF-IDF vectorization and Logistic Regression. The system was evaluated on a labeled dataset of 44,919 articles, achieving 98.73% overall accuracy with an F1-score of 0.99 for both the real and fake news classes. Ablation experiments validated the contribution of each preprocessing stage, and comparative benchmarking against Naive Bayes, Decision Tree, and Random Forest indicated that Logistic Regression provides the best accuracy-efficiency tradeoff among the classical classifiers tested. The system’s computational efficiency (training in about 4 seconds, inference in under 50 ms) and its pickle-based model persistence make it suitable for deployment in real-world fake news verification applications.
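The persistence step mentioned above can be sketched as a save and load round trip with Python's pickle module, followed by a timed prediction. The model contents, the file name model.pkl, and the simplified scoring function are hypothetical placeholders, not the authors' artifacts; the point is only that a restored model can serve predictions without retraining.

```python
import math
import os
import pickle
import tempfile
import time

# Hypothetical trained artifacts: IDF table, classifier weights, and bias.
model = {
    "idf": {"shocking": 1.5, "budget": 1.9},
    "weights": {"shocking": 2.0, "budget": -2.0},
    "bias": 0.0,
}

def save_model(model, path):
    """Serialize the trained artifacts to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Restore the artifacts. Only unpickle files from a trusted source:
    loading a pickle can execute arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)

def score(text, model):
    """Simplified probability of the fake class (no vector normalization)."""
    z = model["bias"]
    for token in text.lower().split():
        if token in model["weights"]:
            z += model["idf"][token] * model["weights"][token]
    return 1.0 / (1.0 + math.exp(-z))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    save_model(model, path)
    restored = load_model(path)

start = time.perf_counter()
probability = score("shocking budget shocking", restored)
elapsed_ms = (time.perf_counter() - start) * 1000.0
print(f"P(fake) = {probability:.3f}, scored in {elapsed_ms:.3f} ms")
```

Measured latency depends on hardware and on the size of the real vocabulary, so the timing printed here is not comparable to the figure reported in the paper.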

References

[1] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, Mar. 2018.
[2] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake news detection on social media: A data mining perspective,” ACM SIGKDD Explor. Newsl., vol. 19, no. 1, pp. 22–36, 2017.
[3] N. Ruchansky, S. Seo, and Y. Liu, “CSI: A hybrid deep model for fake news detection,” in Proc. ACM CIKM, 2017, pp. 797–806.
[4] H. Ahmed, I. Traore, and S. Saad, “Detection of online fake news using N-gram analysis and machine learning techniques,” in Proc. ISDDC, 2017, pp. 127–138.
[5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT, 2019, pp. 4171–4186.
[6] W. Y. Wang, “‘Liar, liar pants on fire’: A new benchmark dataset for fake news detection,” in Proc. ACL, 2017, pp. 422–426.
[7] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[8] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea, “Automatic detection of fake news,” in Proc. COLING, 2018, pp. 3391–3401.
[9] C. Castillo, M. Mendoza, and B. Poblete, “Information credibility on Twitter,” in Proc. WWW, 2011, pp. 675–684.
[10] S. Kula, M. Choraś, R. Kozik, P. Ksieniewicz, and M. Woźniak, “Sentiment analysis for fake news detection by means of neural networks,” in Proc. ICCS, 2020, pp. 653–666.
[11] A. Zubiaga, A. Aker, K. Bontcheva, M. Liakata, and R. Procter, “Detection and resolution of rumours in social media: A survey,” ACM Comput. Surv., vol. 51, no. 2, pp. 1–36, 2018.
[12] M. Granik and V. Mesyura, “Fake news detection using naive Bayes classifier,” in Proc. UKRPROG, 2017, pp. 1–4.

How to Cite This Paper

Mr. K. Senthil Kumararaja, A. TharunTeja Reddy, B. Sairam Krishna, Ch. Abhiram Swarup (2026). AI-Based Fake News Detection System Using TF-IDF Vectorization and Logistic Regression. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.

© 2026 International Journal of Computer Techniques (IJCT). All rights reserved.
