Journal of Machine and Computing


Integrating Canonical Correlation Analysis with Random Forest for Heart Disease Prediction




Received: 22 March 2024

Revised: 01 July 2024

Accepted: 29 August 2024

Published: 05 October 2024

Volume 04, Issue 04

Pages: 1180-1194


Abstract


Heart disease, a leading global cause of death over the past several decades, encompasses a range of disorders affecting the heart. Researchers use various data mining and machine learning techniques to analyze complex medical data, helping healthcare professionals predict cardiac conditions. Despite these advances, existing models often struggle to model non-linear relationships effectively, to maximize feature correlation, and to cope with high dimensionality and overfitting. This paper introduces the Hybrid CCRF model for heart disease prediction, which integrates Canonical Correlation Analysis (CCA) with Random Forest. The proposed model generates polynomial features to capture non-linear relationships and applies CCA to identify canonical variables that maximize the correlation between heart disease features and chronic condition features. Combining these canonical variables into a single feature set improves prediction accuracy. The objectives of the Hybrid CCRF model are threefold: 1) to capture complex non-linear relationships between heart disease and chronic condition features by integrating polynomial feature generation with CCA, improving the model’s ability to represent intricate data patterns; 2) to use CCA to identify and integrate canonical variables that enhance feature correlation, creating a more informative feature set; and 3) to address high dimensionality and overfitting by combining the canonical variables with the polynomial features in a Random Forest model, balancing complexity and performance for better generalization and robustness across datasets. The proposed model achieved an accuracy of 99.45%, a sensitivity of 98.53%, a specificity of 99.54%, a precision of 95.73%, and an F1 score of 0.9711, outperforming the existing models it was compared against.


Keywords


Heart Disease, Disease Prediction, Canonical Correlation, Random Forest, Non-Linear Relationship.



Acknowledgements


The authors thank the reviewers for their helpful comments on the manuscript.


Funding


No funding was received to assist with the preparation of this manuscript.


Ethics declarations


Conflict of interest



Availability of data and materials


Data sharing is not applicable to this article as no new data were created or analysed in this study.


Author information


Contributions

All authors contributed equally to the paper, and all authors have read and agreed to the published version of the manuscript.




Rights and permissions


Open Access: This article is licensed under a Creative Commons Attribution NoDerivs license, which is more restrictive than the standard Attribution license. It allows you to redistribute the material commercially or non-commercially, but you cannot make any changes whatsoever to the original, i.e. no derivatives of the original work may be made. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/


Cite this article


Vetrithangam D, Sivaneasan Bala Krishnan, Siva Shankar S and Prasun Chakrabarti, “Integrating Canonical Correlation Analysis with Random Forest for Heart Disease Prediction”, Journal of Machine and Computing, pp. 1180-1194, October 2024. doi:10.53759/7669/jmc202404109.


Copyright


© 2024 Vetrithangam D, Sivaneasan Bala Krishnan, Siva Shankar S and Prasun Chakrabarti. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.