Predicting Sepsis Using Correlation Based Clustering of Patient Features | IJCT Volume 13 – Issue 3 | IJCT-V13I3P91

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 3  |  Published: May – June 2026

Authors

Kavya M, Sindhu S, Savitha C K, Venkatesh U C

Abstract

Sepsis is a life-threatening condition in which the body’s extreme reaction to infection can cause organ damage and septic shock, leading to death. This paper presents a machine learning framework for early sepsis prediction using clinical data from the PhysioNet Computing in Cardiology Challenge 2019, comprising approximately 40,000 patient records with 41 parameters. The proposed system addresses the challenges of high-dimensional clinical data through a correlation-based hierarchical clustering approach. Features are filtered by a 60% missing-value threshold and imputed using the fancyimpute library. A mixed-type correlation matrix is computed using Pearson’s correlation coefficient for pairs of numeric features, Cramér’s V for pairs of categorical features, and the correlation ratio for numeric-categorical pairs. The resulting clusters are scored and fed into a Decision Tree Classifier trained with balanced class weights and evaluated using Stratified Group K-Fold Cross-Validation. Results demonstrate an AUC-ROC exceeding 0.84, indicating strong predictive capability for early sepsis detection.
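
As an illustration of the mixed-type correlation step and the clustering built on it, the following is a minimal sketch rather than the authors' implementation. It assumes that absolute association strength is converted to a distance as 1 minus the association, and that average linkage is used; the paper's actual linkage criterion, number of clusters, and scoring rule are not stated here. The helper names (`cramers_v`, `correlation_ratio`, `association_matrix`, `cluster_features`) and the synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cramers_v(a, b):
    """Cramér's V between two categorical series (no bias correction)."""
    table = pd.crosstab(a, b).to_numpy().astype(float)
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def correlation_ratio(categories, values):
    """Correlation ratio (eta) of a numeric series grouped by a categorical one."""
    values = np.asarray(values, dtype=float)
    categories = np.asarray(categories)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = sum(
        values[categories == c].size * (values[categories == c].mean() - grand_mean) ** 2
        for c in np.unique(categories)
    )
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0


def association_matrix(df, categorical):
    """Symmetric matrix of association strengths in [0, 1] for mixed-type columns."""
    cols = list(df.columns)
    m = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            a_cat, b_cat = a in categorical, b in categorical
            if a_cat and b_cat:
                v = cramers_v(df[a], df[b])
            elif a_cat:
                v = correlation_ratio(df[a], df[b])
            elif b_cat:
                v = correlation_ratio(df[b], df[a])
            else:
                # Absolute Pearson correlation, so the sign does not affect grouping.
                v = abs(np.corrcoef(df[a], df[b])[0, 1])
            m.loc[a, b] = m.loc[b, a] = v
    return m


def cluster_features(assoc, n_clusters):
    """Group features by average-linkage clustering on distance = 1 - association."""
    dist = np.clip(1.0 - assoc.to_numpy(), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(assoc.index, labels))


# Synthetic stand-in for imputed clinical data (values are made up).
rng = np.random.default_rng(0)
n = 500
hr = rng.normal(85, 12, n)
demo = pd.DataFrame({
    "HR": hr,
    "Resp": 0.2 * hr + rng.normal(0, 1, n),  # constructed to track HR closely
    "Temp": rng.normal(37, 0.6, n),          # independent of the others
    "Gender": rng.integers(0, 2, n),         # categorical
    "Unit1": rng.integers(0, 2, n),          # categorical
})
assoc = association_matrix(demo, categorical={"Gender", "Unit1"})
clusters = cluster_features(assoc, n_clusters=3)
print(assoc.round(2))
print(clusters)
```

In this toy data the two strongly related vital signs end up in the same cluster, which is the behaviour the clustering step relies on to reduce dimensionality.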

Keywords

Sepsis prediction, machine learning, hierarchical clustering, decision tree, correlation matrix, PhysioNet, clinical data, dimensionality reduction.

Conclusion

This work demonstrates a correlation-based hierarchical clustering approach for early sepsis prediction. The system condenses 41 clinical parameters into a smaller set of cluster scores, enabling a Decision Tree Classifier to achieve an AUC-ROC above 0.84. The deployed Streamlit application brings the model into clinical practice, providing real-time, interpretable risk assessments backed by a data persistence layer. Future work will integrate centralized encrypted databases (e.g., PostgreSQL) for multi-ward synchronization, explore LSTM-based temporal modeling of vital sign trends, implement IoT-based automated data ingestion from ICU monitors, and extend the diagnostic scope with NLP analysis of clinical notes.

References

[1] M. A. Reyna et al., “Early Prediction of Sepsis from Clinical Data: The PhysioNet Computing in Cardiology Challenge 2019,” Critical Care Medicine, vol. 48, no. 2, pp. 210–217, Feb. 2020.
[2] C. W. Seymour et al., “Assessment of Clinical Criteria for Sepsis: For the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3),” JAMA, vol. 315, no. 8, pp. 762–774, 2016.
[3] M. Singer et al., “The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3),” JAMA, vol. 315, no. 8, pp. 801–810, 2016.
[4] J. H. Ward, “Hierarchical Grouping to Optimize an Objective Function,” J. Amer. Statistical Assoc., vol. 58, no. 301, pp. 236–244, 1963.
[5] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[6] O. Troyanskaya et al., “Missing Value Estimation Methods for DNA Microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.
[7] R. G. Acharya et al., “Machine Learning for Sepsis Prediction in ICU Patients: A Systematic Review,” J. Critical Care, vol. 68, pp. 42–51, 2022.
[8] A. E. W. Johnson et al., “MIMIC-III, a Freely Accessible Critical Care Database,” Scientific Data, vol. 3, p. 160035, 2016.
[9] P. Mao et al., “Feature Selection for Sepsis Diagnosis Using a Gradient Boosting Classifier,” IEEE J. Biomedical and Health Informatics, vol. 24, no. 5, pp. 1477–1484, 2020.

How to Cite This Paper

Kavya M, Sindhu S, Savitha C K, Venkatesh U C (2026). Predicting Sepsis Using Correlation Based Clustering of Patient Features. International Journal of Computer Techniques, 13(3). ISSN: 2394-2231.

© 2026 International Journal of Computer Techniques (IJCT). All rights reserved.
