Improving Student Graduation Timeliness Prediction Using SMOTE and Ensemble Learning with Stacking and GridSearchCV Optimization

DOI:

https://doi.org/10.56294/dm2025917

Keywords:

Student Graduation, Ensemble Learning, SMOTE, Stacking, GridSearchCV, Machine Learning

Abstract

Introduction: Timely graduation is a key performance indicator in higher education. This study aims to improve the accuracy of predicting student graduation timeliness using ensemble machine learning techniques combined with SMOTE and hyperparameter optimization.
Methods: This is a quantitative predictive study. The population comprises students and alumni of Universitas Islam Riau, from which a sample of 160 respondents was drawn by purposive sampling. Data were collected with structured questionnaires covering academic variables (e.g., GPA, credits, attendance) and non-academic variables (e.g., stress, social support, extracurricular activity). After preprocessing and label encoding, SMOTE was applied to balance the class distribution. Four base classifiers (Naïve Bayes, SVM, Decision Tree, KNN) were evaluated and then combined in voting and stacking ensembles, which were optimized using GridSearchCV.
Results: The stacking ensemble optimized with GridSearchCV performed best, with an accuracy of 99.37%, precision and recall above 0.99, and few misclassifications. It outperformed the individual models and previously reported approaches in the literature.
Conclusions: Integrating SMOTE, ensemble methods, and GridSearchCV substantially improves predictive accuracy for student graduation timeliness. The resulting model provides a robust framework for academic risk detection and early intervention.
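
The class-balancing step named in the Methods can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is not the authors' code. The function names `smote` and `balance`, the two features (GPA and credits), the labels, and all values are hypothetical, and a real study would more likely use a library implementation such as the one in imbalanced-learn. The abstract does not state whether oversampling was applied before or after the train/test split, so the sketch takes no position on that.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows
    and their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority rows by distance to x and keep the k closest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment from x to the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            rows = [row for row, lab in zip(X, y) if lab == label]
            new_rows = smote(rows, target - count, k=k, seed=seed)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out


# Hypothetical toy data: [GPA, credits earned]; labels are illustrative only.
X = [[3.6, 144.0], [3.4, 140.0], [3.8, 146.0], [3.1, 138.0],
     [2.4, 110.0], [2.2, 104.0]]
y = ["on_time"] * 4 + ["late"] * 2

X_bal, y_bal = balance(X, y, k=1)
print(y_bal.count("on_time"), y_bal.count("late"))
```

Because the synthetic rows are interpolations, they always fall between existing minority samples. This is also why label-encoded categorical features need care: interpolating between two category codes can produce a value that corresponds to no real category.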

Published

2025-04-25

Section

Original

How to Cite

Efendi A, Fitri I, Nurcahyo GW. Improving Student Graduation Timeliness Prediction Using SMOTE and Ensemble Learning with Stacking and GridSearchCV Optimization. Data and Metadata [Internet]. 2025 Apr 25 [cited 2025 May 23];4:917. Available from: https://dm.ageditor.ar/index.php/dm/article/view/917