Multi-dimensional Correlation Analysis for Pollution Process Identification: Advanced Clustering and Source Fingerprinting in the Santiago River Basin | IJCT Volume 12 – Issue 5 | IJCT-V12I5P72

International Journal of Computer Techniques
ISSN 2394-2231
Volume 12, Issue 5  |  Published: September – October 2025
Authors
José Miguel Morán Loza, Alicia García Arreola, J. de Jesús Hernández Barragán, Jaime F. Almaguer Medina

Abstract

Monitoring water quality in contaminated river systems requires an understanding of the complex relationships between multiple pollutants. This study presents a novel multidimensional correlation analysis framework for identifying similar pollution processes in the Santiago River Basin in Mexico. Clustering algorithms that combine linear correlations, temporal lag analysis, and spatial correlation patterns were developed using water quality data from 13 monitoring stations covering 39 physicochemical and biological parameters over three years (2012–2015). The methodology goes beyond traditional Pearson correlation by adding a nonlinear correlation measure (the Maximal Information Coefficient), dynamic time warping for temporal analysis, and graph-based clustering techniques. The results revealed seven distinct clusters of pollution processes with correlation coefficients ranging from 0.57 to 0.91, suggesting common contamination sources or transport mechanisms. The total chlorides group (alkalinity, sodium, total dissolved solids, and sulfates) exhibited the strongest internal correlations (R > 0.82), indicating industrial discharge patterns. Temporal lag analysis identified cascade contamination processes with delays of two to seven days between related pollutants. Spatial correlation mapping revealed three contamination zones with distinct profiles along the 475-km river system. The proposed Pollution Process Similarity Index (PPSI) classified 89% of contamination events into recognized origin categories. The framework enables automated identification of pollution sources, optimized monitoring strategies, and the development of early warning systems. It shows potential for transfer to other contaminated watersheds.
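
As an illustration of the temporal lag analysis mentioned in the abstract, the following is a minimal sketch of a lagged Pearson correlation search, not the authors' implementation. The function name `best_lag`, the synthetic series, and the assumption of one sample per day are all hypothetical choices made for the example.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (lag, r): the delay of y relative to x, in samples,
    that gives the highest Pearson correlation within 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (0, -np.inf)
    for lag in range(max_lag + 1):
        # Align x[t] with y[t + lag] by trimming both ends.
        a = x[: len(x) - lag]
        b = y[lag:]
        r = np.corrcoef(a, b)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic data (assumption): a "downstream" series that repeats the
# "upstream" series three samples later, plus measurement noise.
rng = np.random.default_rng(0)
upstream = rng.normal(size=200)
downstream = np.roll(upstream, 3) + 0.1 * rng.normal(size=200)

lag, r = best_lag(upstream, downstream, max_lag=7)
print(f"estimated delay: {lag} samples, r = {r:.2f}")
```

With daily sampling, the returned lag would be read directly as a delay in days. Real monitoring data would also need handling for gaps and irregular sampling, which this sketch omits.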

Keywords

Water quality monitoring, pollution correlation analysis, source identification, machine learning, environmental clustering, Santiago River, contamination fingerprinting.

Conclusion

This study presents a comprehensive multi-dimensional correlation analysis framework for identifying pollution processes in contaminated river systems, demonstrating significant advances over traditional single-parameter approaches. Application to the Santiago River Basin revealed seven distinct pollution process clusters with unique temporal, spatial, and correlation signatures.
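
To make the idea of correlation-based process clusters concrete, the sketch below groups parameters whose absolute Pearson correlation meets a threshold, using connected components of the resulting graph. This is a deliberately simplified stand-in for the graph-based clustering named in the abstract, not the authors' algorithm. The data are synthetic, `param_c` and `param_d` are hypothetical parameter names, and the 0.57 threshold is borrowed from the lower end of the reported correlation range purely for illustration.

```python
import numpy as np

def correlation_clusters(data, names, threshold):
    """Group columns of `data` whose |Pearson r| >= threshold,
    using connected components found with union-find."""
    corr = np.corrcoef(data, rowvar=False)
    n = len(names)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())

# Synthetic data (assumption): two independent latent processes,
# each observed through two noisy parameters.
rng = np.random.default_rng(1)
process_a = rng.normal(size=300)
process_b = rng.normal(size=300)
data = np.column_stack([
    process_a + 0.2 * rng.normal(size=300),
    process_a + 0.2 * rng.normal(size=300),
    process_b + 0.2 * rng.normal(size=300),
    process_b + 0.2 * rng.normal(size=300),
])
names = ["chlorides", "sodium", "param_c", "param_d"]

clusters = correlation_clusters(data, names, threshold=0.57)
print(clusters)
```

Connected components merge any two parameters linked by a chain of strong correlations, so a community detection method would usually give finer groups on dense correlation graphs.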



© 2025 International Journal of Computer Techniques (IJCT).