Ransomware has emerged as one of the most damaging cybersecurity threats, causing billions of dollars in losses annually and disrupting critical infrastructure worldwide. Traditional signature-based detection is inadequate against evolving ransomware variants that employ obfuscation techniques and zero-day exploits. This research presents a machine learning approach to ransomware detection based on static analysis of Portable Executable (PE) file features, which allows threats to be identified proactively without executing the malware.
A correlation heatmap of the PE features revealed complex inter-feature relationships, with DllCharacteristics emerging as the most discriminative feature for ransomware classification. Random Forest feature importance analysis showed that DllCharacteristics accounted for over 20% of the total feature importance, followed by DebugRVA, DebugSize, and MajorLinkerVersion, indicating the central role of PE structural characteristics in malware identification.
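To make the static-analysis step concrete, the sketch below reads the four header fields named above (DllCharacteristics, DebugRVA, DebugSize, MajorLinkerVersion) directly from the raw bytes of a PE file, without executing it. It is a minimal illustration and not the extraction pipeline used in this study: the function names `read_pe_header_features` and `build_synthetic_pe32` are hypothetical, the offsets follow the published PE/COFF layout, and the code is not hardened against truncated or malformed files.

```python
import struct

# Position of the debug entry in the optional header's data-directory table.
DEBUG_DIRECTORY_INDEX = 6


def read_pe_header_features(data: bytes) -> dict:
    """Read four header fields from the raw bytes of a PE file (hypothetical helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")

    # The optional header follows the 4-byte signature and the 20-byte COFF header.
    opt = pe_offset + 24
    magic, major_linker = struct.unpack_from("<HB", data, opt)
    if magic == 0x10B:        # PE32: data directories start at offset 96
        directories = opt + 96
    elif magic == 0x20B:      # PE32+: data directories start at offset 112
        directories = opt + 112
    else:
        raise ValueError(f"unexpected optional header magic {magic:#x}")

    # DllCharacteristics sits at offset 70 in both PE32 and PE32+.
    (dll_characteristics,) = struct.unpack_from("<H", data, opt + 70)
    # NumberOfRvaAndSizes is the 4-byte field just before the directory table.
    (directory_count,) = struct.unpack_from("<I", data, directories - 4)

    debug_rva = debug_size = 0
    if directory_count > DEBUG_DIRECTORY_INDEX:
        debug_rva, debug_size = struct.unpack_from(
            "<II", data, directories + 8 * DEBUG_DIRECTORY_INDEX
        )
    return {
        "DllCharacteristics": dll_characteristics,
        "MajorLinkerVersion": major_linker,
        "DebugRVA": debug_rva,
        "DebugSize": debug_size,
    }


def build_synthetic_pe32(dll_characteristics, major_linker, debug_rva, debug_size):
    """Build a header-only PE32 buffer for demonstration (not a loadable file)."""
    pe_offset = 0x80
    buf = bytearray(pe_offset + 24 + 96 + 16 * 8)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    opt = pe_offset + 24
    struct.pack_into("<HB", buf, opt, 0x10B, major_linker)
    struct.pack_into("<H", buf, opt + 70, dll_characteristics)
    struct.pack_into("<I", buf, opt + 92, 16)  # NumberOfRvaAndSizes
    struct.pack_into(
        "<II", buf, opt + 96 + 8 * DEBUG_DIRECTORY_INDEX, debug_rva, debug_size
    )
    return bytes(buf)


sample = build_synthetic_pe32(0x8160, 14, 0x2000, 0x38)
print(read_pe_header_features(sample))
```

For a real file, the same function could be called on `open(path, "rb").read()`. A production extractor would more likely rely on a dedicated PE parsing library and would validate every offset before reading.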
Three ensemble learning algorithms were implemented and evaluated: Random Forest, Gradient Boosting Classifier, and XGBoost. The models were trained and tested on a stratified 70-30 train-test split, which preserves the class proportions in both subsets and provides a held-out set for estimating generalization. Performance was evaluated using accuracy, precision, recall, and F1-score to give a comprehensive assessment of classification effectiveness.
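The effect of stratification can be illustrated with a short standard-library sketch. It splits each class separately at the same 70-30 ratio so that the training and test sets keep the original class proportions. The function name `stratified_split`, the seed, and the synthetic labels are assumptions made for illustration; they do not reproduce the study's dataset or tooling. Libraries such as scikit-learn offer the same behavior through `train_test_split(..., stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=42):
    """Return (train_indices, test_indices), splitting every class at the same ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Synthetic labels for illustration only: 1 = ransomware, 0 = benign.
labels = [1] * 400 + [0] * 600
train_idx, test_idx = stratified_split(labels)

for name, idx in (("train", train_idx), ("test", test_idx)):
    positives = sum(labels[i] for i in idx)
    print(f"{name}: {len(idx)} samples, {positives / len(idx):.1%} ransomware")
```

Both subsets report the same ransomware share as the full label list, which is the property that a purely random split does not guarantee on imbalanced data.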
All three models performed strongly. Random Forest was the top-performing classifier, with 99.58% accuracy, 99.69% precision, 99.35% recall, and 99.52% F1-score. XGBoost followed closely with 99.55% accuracy, 99.63% precision, 99.32% recall, and 99.48% F1-score. Gradient Boosting Classifier obtained 98.94% accuracy, 99.19% precision, 98.37% recall, and 98.78% F1-score. These results exceed the benchmarks reported in the ransomware detection literature and demonstrate the efficacy of PE-based feature engineering.
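Because the F1-score is the harmonic mean of precision and recall, the reported figures can be cross-checked against one another. The sketch below recomputes F1 from the precision and recall values quoted above; the helper name `f1_from` is an illustrative assumption. The recomputed values agree with the reported F1-scores to within about 0.01 percentage points, which is consistent with rounding of the published inputs.

```python
# (precision %, recall %, reported F1 %) as quoted in the results above.
reported = {
    "Random Forest": (99.69, 99.35, 99.52),
    "XGBoost": (99.63, 99.32, 99.48),
    "Gradient Boosting": (99.19, 98.37, 98.78),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2 * precision * recall / (precision + recall)


for model, (precision, recall, f1_reported) in reported.items():
    f1_recomputed = f1_from(precision, recall)
    print(
        f"{model}: recomputed F1 = {f1_recomputed:.2f}%, "
        f"reported F1 = {f1_reported:.2f}%"
    )
```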
This research presents a machine learning approach for ransomware detection using static analysis of Portable Executable (PE) file features. By combining a large and diverse dataset, careful feature engineering, and three ensemble classifiers (Random Forest, Gradient Boosting, and XGBoost), the study achieves classification accuracy above 99.5% for Random Forest and XGBoost and 98.94% for Gradient Boosting, with Random Forest emerging as the top-performing model. The exploratory data analysis clarified which PE features matter most, highlighting DllCharacteristics, DebugRVA, and DebugSize as primary indicators of ransomware presence. The consistently high precision, recall, and F1-score confirm the models’ ability to balance false positives and false negatives, a critical requirement for operational cybersecurity environments.
Moreover, the models generalized well to a stratified held-out test set and proved practical in deployment and real-time single-sample prediction experiments. This underscores the potential for integrating static PE feature-based machine learning detectors into existing endpoint and network security infrastructures, providing fast, reliable, and scalable ransomware identification without executing suspicious code. The work advances the state of the art in static malware detection by combining statistical analysis, ensemble modeling, and practical deployment considerations.
Despite these promising results, several areas remain open for future research. First, adversarial robustness against sophisticated evasion tactics, such as PE headers crafted to bypass static detectors, remains a pressing concern; integrating dynamic behavior analysis alongside static features may improve resilience against such threats. Second, lightweight and optimized model architectures would facilitate deployment in resource-constrained environments such as IoT and mobile platforms. Third, extending the current binary classification framework to multiclass settings would enable differentiation among ransomware families, improving threat attribution and response strategies. Fourth, continual learning approaches that adapt to evolving ransomware variants can mitigate model degradation due to concept drift. Lastly, evaluating on real-world, chronologically ordered ransomware samples would strengthen empirical validation and deployment readiness.
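The chronological evaluation suggested above can be sketched as a time-ordered split: the model is trained on the oldest samples and tested on the newest, so the test set mimics variants that appear after deployment. This is an illustration of the proposed future work, not an experiment performed in this study; the function name `chronological_split`, the `first_seen` field, and the sample records are hypothetical.

```python
from datetime import date, timedelta


def chronological_split(samples, train_fraction=0.70):
    """Train on the oldest samples and test on the newest (hypothetical helper)."""
    ordered = sorted(samples, key=lambda sample: sample["first_seen"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical records: one sample first seen every 30 days, listed newest first.
start = date(2023, 1, 1)
samples = [
    {"name": f"sample_{i}", "first_seen": start + timedelta(days=30 * i)}
    for i in reversed(range(10))
]

train, test = chronological_split(samples)
print("train until:", max(sample["first_seen"] for sample in train))
print("test from:  ", min(sample["first_seen"] for sample in test))
```

Unlike a stratified random split, this arrangement never lets the model see samples newer than those it is tested on, which makes any accuracy drop caused by concept drift visible.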
In summary, this work corroborates the efficacy of machine learning-based static analysis for ransomware detection and lays a strong foundation for future efforts that incorporate hybrid analysis techniques, adversarial defenses, and real-time adaptive learning to address the rapidly evolving cybersecurity landscape.
How to Cite This Paper
Rohan Tyagi, Eshan Sharma, Gaurav Yadav, Toshit, Sonika Jalhotra (2026). RANSOMWARE DETECTION USING MACHINE LEARNING. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.