Explainable machine learning for coronary artery disease risk assessment and prevention

Authors

DOI:

https://doi.org/10.56294/dm202365

Keywords:

Coronary Artery Disease, Explainable Machine Learning, Risk Factors, Data Augmentation

Abstract

Coronary Artery Disease (CAD) is an increasingly prevalent ailment that has a significant impact on both longevity and quality of life. Lifestyle, genetics, nutrition, and stress are all significant contributors to rising mortality rates. CAD is preventable through early intervention and lifestyle changes. As a result, low-cost automated solutions are required to detect CAD early and help healthcare professionals treat chronic diseases efficiently. Machine learning applications in medicine have increased due to their ability to detect data patterns. Employing machine learning to classify the occurrence of coronary artery disease could assist doctors in reducing misinterpretation. The research project entails the creation of a coronary artery disease diagnosis system based on machine learning. Using patient medical records, we demonstrate how machine learning can help identify if an individual will acquire coronary artery disease. Furthermore, the study highlights the most critical risk factors for coronary artery disease. We used two machine learning approaches, Catboost and LightGBM classifiers, to predict the patient with coronary artery disease. We employed various data augmentation methods, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAE), to solve the imbalanced data problem. Optuna was applied to optimize hyperparameters. The proposed method was tested on the real-world dataset Z-Alizadeh Sani. The acquired findings were satisfactory, as the model could predict the likelihood of cardiovascular disease in a particular individual by combining Catboost with VAE, which demonstrated good accuracy compared to the other approaches. The proposed model is evaluated using a variety of metrics, including accuracy, recall, f-score, precision, and ROC curve. Furthermore, we used the SHAP values and Boruta Feature Selection (BFS) to determine essential risk factors for coronary artery disease

References

1. Coronary Heart Disease in Morocco [Internet]. World Life Expectancy. Available from: https://www.worldlifeexpectancy.com/morocco-coronary-heart-disease.

2. Roth GA, Mensah GA, Johnson CO, Addolorato G, Ammirati E, Baddour LM, et al. Global Burden of Cardiovascular Diseases and Risk Factors, 1990-2019: Update From the GBD 2019 Study. Journal of the American College of Cardiology 2020;76:2982–3021. https://doi.org/10.1016/j.jacc.2020.11.010. DOI: https://doi.org/10.1016/j.jacc.2020.11.010

3. Edgardo Olvera Lopez, Jan A. Cardiovascular Disease. National Library of Medicine 2019. https://www.ncbi.nlm.nih.gov/books/NBK535419/.

4. H. Yang, Z. Chen, H. Yang and M. Tian . Predicting coronary heart disease using an improved LightGBM model: Performance analysis and comparison. IEEE Access, vol. 11, pp. 23366-23380, 2023. DOI: https://doi.org/10.1109/ACCESS.2023.3253885

5. Rahaman A, Ashit Kumar Dutta. Developing a Deep-Learning-Based Coronary Artery Disease Detection Technique Using Computer Tomography Images. Diagnostics 2023;13:1312–2. https://doi.org/10.3390/diagnostics13071312. DOI: https://doi.org/10.3390/diagnostics13071312

6. A. Rajkomar, J. Dean, I. Kohane, Machine learning in medicine, N. Engl. J. Med. 380 (14) (2019) 1347–1358. DOI: https://doi.org/10.1056/NEJMra1814259

7. M. Motwani, D. Dey, D.S. Berman, et al., Machine learning for prediction of allcause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis, Eur. Heart J. 38 (2016) 500–507. DOI: https://doi.org/10.1093/eurheartj/ehw188

8. C. Frederic, P.J. Slomka, G. Markus, et al., Machine learning to predict the longterm risk of myocardial infarction and cardiac death based on clinical risk, coronary calcium, and epicardial adipose tissue: a prospective study, Cardiovasc. Res. 116 (14) (2019) 2216–2225. DOI: https://doi.org/10.1093/cvr/cvz321

9. B. Saa, C. Bjm, D. Ag, et al., Machine learning prediction of mortality and hospitalization in heart failure with preserved ejection fraction, JACC (J. Am. Coll. Cardiol.): Heart Fail. 8 (1) (2020) 12–21. DOI: https://doi.org/10.1016/j.jchf.2019.06.013

10. E. Zihni, V.I. Madai, M. Livne, et al., Opening the black box of artificial intelligence for clinical decision support: a study predicting stroke outcome, PloS One (2020) 15. DOI: https://doi.org/10.1371/journal.pone.0231166

11. M. Athanasiou, K. Sfrintzeri, K. Zarkogianni, et al., An Explainable XGBoost–Based Approach towards Assessing the Risk of Cardiovascular Disease in Patients with Type 2 Diabetes Mellitus[C]//2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), IEEE, 2020. DOI: https://doi.org/10.1109/BIBE50027.2020.00146

12. S.M. Lundberg, B. Nair, M.S. Vavilala, et al., Explainable machine-learning predictions for the prevention of hypoxaemia during surgery, Nature Biomedical Engineering 2 (10) (2018) 749–760. DOI: https://doi.org/10.1038/s41551-018-0304-0

13. F. Cabitza, R. Rasoini, G.F. Gensini, Unintended consequences of machine learning in medicine, J. Am. Med. Assoc. 318 (2017) 517–518. DOI: https://doi.org/10.1001/jama.2017.7797

14. S. Lundberg, S.I. Lee, A Unified Approach to Interpreting Model Predictions[C]// Nips, 2017, pp. 4765–4774.

15. Danso SO, Zeng Z, Muniz-Terrera G, Ritchie CW. Developing an Explainable Machine Learning-Based Personalised Dementia Risk Prediction Model: A Transfer Learning Approach With Ensemble Learning Algorithms. Frontiers in Big Data 2021;4. https://doi.org/10.3389/fdata.2021.613047. DOI: https://doi.org/10.3389/fdata.2021.613047

16. O. Goldman, O. Raphaeli, E. Goldman, and M. Leshno, ‘‘Improvement in the prediction of coronary heart disease risk by using artificial neural networks,’’ Qual. Manage. Health Care, vol. 30, no. 4, pp. 244–250, Jul. 2021, doi: 10.1097/qmh.0000000000000309. DOI: https://doi.org/10.1097/QMH.0000000000000309

17. Z. Du, Y. Yang, J. Zheng, Q. Li, D. Lin, Y. Li, J. Fan, W. Cheng, X.-H. Chen, and Y. Cai, ‘‘Accurate prediction of coronary heart disease for patients with hypertension from electronic health records with big data and machine-learning methods: Model development and performance evaluation,’’ JMIR Med. Informat., vol. 8, no. 7, Jul. 2020, Art. no. e17257, doi: 10.2196/17257. DOI: https://doi.org/10.2196/17257

18. D. Han, K. K. Kolli, S. J. Al’Aref et al., “Machine learning framework to identify individuals at risk of rapid progression of coronary atherosclerosis: from the PARADIGM registry,” Journal of American Heart Association, vol. 9, no. 5, Article ID e013958, 2020. DOI: https://doi.org/10.1161/JAHA.119.013958

19. G. Joo, Y. Song, H. Im, and J. Park, “Clinical implication of machine learning in predicting the occurrence of cardiovascular disease using big data (Nationwide Cohort Data in Korea),” IEEE Access, vol. 8, pp. 157643–157653, 2020. DOI: https://doi.org/10.1109/ACCESS.2020.3015757

20. A. Akella and S. Akella, ‘‘Machine learning algorithms for predicting coronary artery disease: Efforts toward an open source solution,’’ Future Sci. OA, vol. 7, no. 6, Jul. 2021, Art. no. FSO698, doi: 10.2144/fsoa-2020- 0206. DOI: https://doi.org/10.2144/fsoa-2020-0206

21. L. J. Muhammad, I. Al-Shourbaji, A. A. Haruna, I. A. Mohammed, A. Ahmad, and M. B. Jibrin, ‘‘Machine learning predictive models for coronary artery disease,’’ Social Netw. Comput. Sci., vol. 2, no. 5, p. 350, Sep. 2021, doi: 10.1007/s42979-021-00731-4. DOI: https://doi.org/10.1007/s42979-021-00731-4

22. C. A. U. Hassan, J. Iqbal, R. Irfan, S. Hussain, A. D. Algarni, S. S. H. Bukhari, N. Alturki, and S. S. Ullah, ‘‘Effectively predicting the presence of coronary heart disease using machine learning classifiers,’’ Sensors, vol. 22, no. 19, p. 7227, Sep. 2022, doi: 10.3390/s22197227. DOI: https://doi.org/10.3390/s22197227

23. Louridi, Nabaouia Douzi, Samira Ouahidi, Bouabid. (2021). Machine learning-based identification of patients with a cardiovascular defect. Journal of Big Data. 8. 10.1186/s40537-021-00524-9. DOI: https://doi.org/10.1186/s40537-021-00524-9

24. Louridi, Nabaouia Amar, Meryem Ouahidi, Bouabid. (2019). Identification of Cardiovascular Diseases Using Machine Learning. 1-6. 10.1109/CMT.2019.8931411 DOI: https://doi.org/10.1109/CMT.2019.8931411

25. Benchaji, Ibtissam Douzi, Samira Ouahidi, Bouabid. (2019). NOVEL LEARNING STRATEGY BASED ON GENETIC PROGRAMMING FOR CREDIT CARD FRAUD DETECTION IN BIG DATA. 3-10. 10.33965/bigdaci2019201907L001. DOI: https://doi.org/10.33965/bigdaci2019_201907L001

26. El Asry, Chadia Douzi, Samira Ouahidi, Bouabid.Toward a new IDS based on PV-DM (Paragraph Vector-Distributed Memory Approach).

27. Moreno-Sanchez, P.A. Development of an Explainable Prediction Model of Heart Failure Survival by Using Ensemble Trees. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 4902–4910. DOI: https://doi.org/10.1109/BigData50022.2020.9378460

28. Graham, S.A.; Lee, E.E.; Jeste, D.V.; Van Patten, R.; Twamley, E.W.; Nebeker, C.; Depp, C.A. Artificial intelligence approaches to predicting and detecting cognitive decline in older adults: A conceptual review. Psychiatry Res. 2020, 284, 112732. DOI: https://doi.org/10.1016/j.psychres.2019.112732

29. Peng, J.; Zou, K.; Zhou, M.; Teng, Y.; Zhu, X.; Zhang, F.; Xu, J. An Explainable Artificial Intelligence Framework for the Deterioration Risk Prediction of Hepatitis Patients. J. Med. Syst. 2021, 45, 61. DOI: https://doi.org/10.1007/s10916-021-01736-5

30. https://archive.ics.uci.edu/dataset/412/z+alizadeh+sani

31. Alizadehsani, R.; Habibi, J.; Hosseini, M.J.; Mashayekhi, H.; Boghrati, R.; Ghandeharioun, A.; Bahadorian, B.; Sani, Z.A. A Data Mining Approach for Diagnosis of Coronary Artery Disease. Comput. Methods Programs Biomed. 2013, 111, 52–61. [Google Scholar] [CrossRef] [PubMed] DOI: https://doi.org/10.1016/j.cmpb.2013.03.004

32. https://databasecamp.de/en/ml/minmax-scaler-en

33. Plesovskaya, E. & Ivanov, S. An empirical analysis of KDE-based generative models on small datasets. Procedia Comput. Sci. 193, 442–452 (2021). DOI: https://doi.org/10.1016/j.procs.2021.10.046

34. Hernandez-Matamoros, A., Fujita, H. & Perez-Meana, H. A novel approach to create synthetic biomedical signals using BiRNN. Inf. Sci. 541, 218–241 (2020). DOI: https://doi.org/10.1016/j.ins.2020.06.019

35. Han, C., Hayashi, H., Rundo, L., Araki, R., Shimoda, W., Muramatsu, S., Nakayama, H. et al. GAN-based synthetic brain MR image generation, in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), 734–738 (IEEE, 2018). DOI: https://doi.org/10.1109/ISBI.2018.8363678

36. Guan, J., Li, R., Yu, S., & Zhang, X. Generation of synthetic electronic medical record text, in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 374–380 (IEEE, 2018). DOI: https://doi.org/10.1109/BIBM.2018.8621223

37. Xu, L., Skoularidou, M., Cuesta-Infante, A., &Veeramachaneni, K. Modeling tabular data using conditional gan. Adv. Neural Inform. Process. Syst 32 (2019).

38. Kellner, L., Stender, M., Polach, F. V. B. & Ehlers, S. Predicting compressive strength and behavior of ice and analyzing feature importance with explainable machine learning models. Ocean Eng. 255, 111396 (2022). DOI: https://doi.org/10.1016/j.oceaneng.2022.111396

39. Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative Adversarial Networks: An Overview. IEEE Signal Process. Mag. 2018, 35, 53–65. DOI: https://doi.org/10.1109/MSP.2017.2765202

40. Park, S.-W.; Ko, J.-S.; Huh, J.-H.; Kim, J.-C. Review on Generative Adversarial Networks: Focusing on Computer Vision and Its Applications. Electronics 2021, 10, 1216. DOI: https://doi.org/10.3390/electronics10101216

41. Cauli, N.; Recupero, D.R. Survey on Videos Data Augmentation for Deep Learning Models. Futur. Internet 2022, 14, 93. DOI: https://doi.org/10.3390/fi14030093

42. Ali-Gombe, A.; Elyan, E.; Savoye, Y.; Jayne, C. Few-shot classifier GAN. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. DOI: https://doi.org/10.1109/IJCNN.2018.8489387

43. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006). DOI: https://doi.org/10.1126/science.1127647

44. Higgins, I. et al. Beta-vae: Learning basic visual concepts with a constrained variational framework (2016).

45. Kingma, D. P., Max, W. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)

46. Srinivas and R. Katarya, "hyOPTXg: OPTUNA hyper-parameter optimization framework for predicting cardiovascular disease using XGBoost", Biomed. Signal Process. Control, vol. 73, Mar. 2022. DOI: https://doi.org/10.1016/j.bspc.2021.103456

47. https://machinelearningmastery.com/repeated-k-fold-cross-validation-withpython

48. 48. Shahid, A.H.; Singh, M.P. A Novel Approach for Coronary Artery Disease Diagnosis using Hybrid Particle Swarm Optimization based Emotional Neural Network. Biocybern. Biomed. Eng. 2020, 40, 1568–1585. DOI: https://doi.org/10.1016/j.bbe.2020.09.005

49. Zhang, S.; Yuan, Y.; Yao, Z.; Wang, X.; Lei, Z. Improvement of the Performance of Models for Predicting Coronary Artery Disease Based on XGBoost Algorithm and Feature Processing Technology. Electronics 2022, 11, 315. DOI: https://doi.org/10.3390/electronics11030315

50. Molnar, C. Interpretable Machine Learning. Lulu.com (2020).

51. S. Lundberg and S. Lee, “A Unified approach to interpreting model predictions,” in 31st Conf. on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, pp. 1–10, 2017.

52. Kursa MB, Rudnicki WR. Feature selection with the boruta package. J Stat Softw. 2010;36:1–13. DOI: https://doi.org/10.18637/jss.v036.i11

53. Brown JC, Gerhardt TE, Kwon E. Risk Factors For Coronary Artery Disease. PubMed 2023. https://www.ncbi.nlm.nih.gov/books/NBK554410/.

54. Jousilahti P, Vartiainen E, Tuomilehto J, Puska P. Sex, Age, Cardiovascular Risk Factors, and coronary heart disease. Circulation 1999;99:1165–72. https://doi.org/10.1161/01.cir.99.9.1165. DOI: https://doi.org/10.1161/01.CIR.99.9.1165

55. Xia T, Li Y, Huang F, Chai H, Huang B, Li Q, et al. The triglyceride paradox in the mortality of coronary artery disease. Lipids in Health and Disease 2019;18. https://doi.org/10.1186/s12944-019-0972-0. DOI: https://doi.org/10.1186/s12944-019-0972-0

56. Dzoyem JP, Kuete V, Eloff JN. 23 - Biochemical Parameters in Toxicological Studies in Africa: Significance, Principle of Methods, Data Interpretation, and Use in Plant Screenings. ScienceDirect 2014:659–715. https://www.sciencedirect.com/science/article/abs/pii/B9780128000182000236. DOI: https://doi.org/10.1016/B978-0-12-800018-2.00023-6

57. Kim Y, Kim TJ, Lee S-H. Cardiac wall motion abnormality as a predictor for undetermined stroke with embolic lesion-pattern. Clinical Neurology and Neurosurgery 2020;191:105677. https://doi.org/10.1016/j.clineuro.2020.105677. DOI: https://doi.org/10.1016/j.clineuro.2020.105677

58. National Heart, Lung and Blood Institute . Angina (Chest Pain) - Causes and Risk Factors | NHLBI, NIH. Wwwnhlbinihgov 2022. https://www.nhlbi.nih.gov/health/angina/causes.

59. Moezi A, Soltani M, Kazemi T, Bizahem SK, Amirabadizadeh N, Hanafi N, et al. Risk Factors Associated With the Extent of Coronary Vessel Involvement Across the Spectrum of Coronary Artery Disease. Modern Care Journal 2020;17. https://doi.org/10.5812/modernc.104261. DOI: https://doi.org/10.5812/modernc.104261

60. Petrie JR, Sattar N. Excess Cardiovascular Risk in Type 1 Diabetes Mellitus. Circulation 2019;139:744–7. https://doi.org/10.1161/circulationaha.118.038137. DOI: https://doi.org/10.1161/CIRCULATIONAHA.118.038137

61. Ye Z, Lu H, Li L. Reduced Left Ventricular Ejection Fraction Is a Risk Factor for In-Hospital Mortality in Patients after Percutaneous Coronary Intervention: A Hospital-Based Survey. Biomed Res Int. 2018 Dec 5;2018:8753176. doi: 10.1155/2018/8753176. DOI: https://doi.org/10.1155/2018/8753176

62. Gaviño Contreras, J., Ultreras Rodríguez, A., & Sánchez Gaviño, A. (2023). Organizational Behavior for the Integral Human Balance since NOM-035 in post-COVID-19 pandemic scenario. Revista Científica Empresarial Debe-Haber, 1(2), 41–57.

63. Rodríguez-Pérez JA. Strengthening the Implementation of the One Health Approach in the Americas: Interagency Collaboration, Comprehensive Policies, and Information Exchange. Seminars in Medical Writing and Education 2022;1:11-11. https://doi.org/10.56294/mw202211. DOI: https://doi.org/10.56294/mw202211

64. Farhaoui, Y., “Intrusion prevention system inspired immune systems” Indonesian Journal of Electrical Engineering and Computer Science 2016; 2(1):168–179. DOI: https://doi.org/10.11591/ijeecs.v2.i1.pp168-179

65. Inastrilla CRA. Big Data in Health Information Systems. Seminars in Medical Writing and Education 2022;1:6-6. https://doi.org/10.56294/mw20226. DOI: https://doi.org/10.56294/mw20226

66. Farhaoui, Y. and All, Big Data Mining and Analytics, 2022, 5(4), pp. I IIDOI: 10.26599/BDMA.2022.9020004 DOI: https://doi.org/10.26599/BDMA.2022.9020004

67. Alaoui, S.S., and all. "Hate Speech Detection Using Text Mining and Machine Learning", International Journal of Decision Support System Technology, 2022, 14(1), 80. DOI: 10.4018/IJDSST.286680 DOI: https://doi.org/10.4018/IJDSST.286680

68. Alaoui, S.S., and all. ,"Data openness for efficient e-governance in the age of big data", International Journal of Cloud Computing, 2021, 10(5-6), pp. 522–532, https://doi.org/10.1504/IJCC.2021.120391 DOI: https://doi.org/10.1504/IJCC.2021.120391

69. El Mouatasim, A., and all. "Nesterov Step Reduced Gradient Algorithm for Convex Programming Problems", Lecture Notes in Networks and Systems, 2020, 81, pp. 140–148. https://doi.org/10.1007/978-3-030-23672-4_11 DOI: https://doi.org/10.1007/978-3-030-23672-4_11

70. Tarik, A., and all."Recommender System for Orientation Student" Lecture Notes in Networks and Systems, 2020, 81, pp. 367–370. https://doi.org/10.1007/978-3-030-23672-4_27 DOI: https://doi.org/10.1007/978-3-030-23672-4_27

71. Sossi Alaoui, S., and all. "A comparative study of the four well-known classification algorithms in data mining", Lecture Notes in Networks and Systems, 2018, 25, pp. 362–373. https://doi.org/10.1007/978-3-319-69137-4_32 DOI: https://doi.org/10.1007/978-3-319-69137-4_32

Downloads

Published

2023-12-29

Issue

Section

Original

How to Cite

1.
Nabaouia L, Douzi S, Bouabid EO. Explainable machine learning for coronary artery disease risk assessment and prevention. Data and Metadata [Internet]. 2023 Dec. 29 [cited 2026 Jan. 9];2:65. Available from: https://dm.ageditor.ar/index.php/dm/article/view/159