Advanced Ensemble Machine Learning Techniques for Optimizing Diabetes Mellitus Prognostication: A Detailed Examination of Hospital Data
DOI:
https://doi.org/10.56294/dm2024.363
Keywords:
Machine Learning Algorithms, Ensemble Models, Diabetes Prediction, Data Mining, Predictive Accuracy, Health Informatics
Abstract
Diabetes is a chronic disease that affects millions of people worldwide. Early diagnosis and effective management are crucial for reducing its complications. Diabetes is the fourth-highest cause of mortality because of its association with comorbidities that include heart disease, nerve damage, blood vessel damage, and blindness. Machine learning algorithms show significant potential for predicting diabetes and related conditions, and mining diabetes data is an efficient way to extract new insights.
The primary objective of this study is to develop an enhanced ensemble model that predicts diabetes with improved accuracy by combining several machine learning algorithms.
This study tested several popular machine learning algorithms commonly used in diabetes prediction, including Naive Bayes (NB), Generalized Linear Model (GLM), Logistic Regression (LR), Fast Large Margin (FLM), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), Gradient Boosted Trees (GBT), and Support Vector Machine (SVM). The performance of these algorithms was compared, and two different ensemble techniques—stacking and voting—were used to build a more accurate predictive model.
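The difference between the two ensemble techniques named above can be illustrated with a minimal sketch. It is not the study's implementation. The three base "models" are hand-written threshold rules standing in for trained classifiers, the records are synthetic, and all names (`by_age`, `by_conditions`, `by_both`, `fit_stacker`) are hypothetical. Voting takes the majority label of the base models. Stacking trains a second-level model on the base models' outputs, here a simple lookup table fitted on a held-out split.

```python
from collections import Counter, defaultdict

# Synthetic records: ((age, chronic_condition_count), label), 1 = diabetes.
# Invented for illustration only; not drawn from the study's hospital data.
META_TRAIN = [((34, 0), 0), ((45, 1), 0), ((52, 3), 1), ((61, 4), 1),
              ((29, 2), 0), ((70, 1), 0), ((58, 0), 0), ((66, 5), 1),
              ((48, 4), 1)]
TEST = [((40, 0), 0), ((63, 3), 1), ((72, 1), 0),
        ((31, 2), 0), ((68, 2), 1), ((55, 0), 0)]

# Hypothetical base models: fixed rules standing in for trained classifiers.
def by_age(x):
    return int(x[0] >= 50)

def by_conditions(x):
    return int(x[1] >= 2)

def by_both(x):
    return int(x[0] + 10 * x[1] >= 75)

BASE_MODELS = [by_age, by_conditions, by_both]

def base_outputs(x):
    """Level-0 predictions, used as the feature vector for the meta-model."""
    return tuple(model(x) for model in BASE_MODELS)

def vote(x):
    """Hard voting: return the label predicted by most base models."""
    return Counter(base_outputs(x)).most_common(1)[0][0]

def fit_stacker(rows):
    """Stacking: learn which true label goes with each pattern of base outputs."""
    table = defaultdict(Counter)
    for x, y in rows:
        table[base_outputs(x)][y] += 1
    meta = {key: counts.most_common(1)[0][0] for key, counts in table.items()}

    def predict(x):
        key = base_outputs(x)
        # Fall back to voting for patterns never seen while fitting.
        return meta[key] if key in meta else vote(x)

    return predict

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

stacked = fit_stacker(META_TRAIN)
print("voting accuracy:  ", accuracy(vote, TEST))
print("stacking accuracy:", accuracy(stacked, TEST))
```

The toy data are constructed so that one pattern of base outputs is wrong by majority yet learnable by the meta-model. This shows only the mechanism by which stacking can overrule a vote and says nothing about the accuracies reported in the study.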
The three most accurate individual algorithms were Deep Learning, Naive Bayes, and Gradient Boosted Trees. The models indicated that the number of chronic conditions, gender, and age are significant factors for individuals with diabetes. The ensemble models, particularly the stacking method, provided higher accuracy than the individual algorithms. The stacking ensemble model achieved an accuracy of 99.94%, compared with 99.34% for the voting method.
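One way to read the two reported accuracies is to convert them to error rates. The short calculation below uses only the two figures stated in the abstract. The variable names are illustrative, and the comparison assumes both figures were measured on the same evaluation data, which the abstract does not state.

```python
# Accuracies as reported in the abstract, in percent.
stacking_acc = 99.94
voting_acc = 99.34

# Error rate is the complement of accuracy.
stacking_err = 100 - stacking_acc
voting_err = 100 - voting_acc

gap_points = stacking_acc - voting_acc   # difference in percentage points
error_ratio = voting_err / stacking_err  # how many times more errors voting makes

print(f"gap: {gap_points:.2f} percentage points")
print(f"error rates: stacking {stacking_err:.2f}%, voting {voting_err:.2f}%")
print(f"voting makes about {error_ratio:.0f}x as many errors as stacking")
```

Under that assumption, a gap of 0.60 percentage points in accuracy corresponds to error rates of 0.06% and 0.66%.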
Building an ensemble model significantly increased the accuracy of predicting diabetes and related conditions. The stacking ensemble model, in particular, demonstrated superior performance, highlighting the importance of combining multiple machine learning approaches to enhance predictive accuracy.
License
Copyright (c) 2024 Najah Al-shanableh, Mazen Alzyoud, Raya Yousef Al-husban, Nail M. Alshanableh, Ashraf Al-Oun, Mohammad Subhi Al-Batah, Mowafaq Salem Alzboon (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.
The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.