Predicting Surabaya's Rainfall: A Comparative Study of Naïve Bayes, K-Nearest Neighbor, and Random Forest

Authors

  • Arip Ramadan, Information System Study Program, School of Industrial and System Engineering, Telkom University Surabaya Campus. Surabaya, 60231, Indonesia. ORCID: https://orcid.org/0009-0000-6762-7498
  • Muhammad Axel Syahputra, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia.
  • Dwi Rantini, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0000-0002-5408-3038
  • Ratih Ardiati Ningrum, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0000-0002-2659-9328
  • Muhammad Noor Fakhruzzaman, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0000-0002-9981-0809
  • Aziz Fajar, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0000-0002-1393-3470
  • Maryamah, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0000-0001-9540-4427
  • Muhammad Mahdy Yandra, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0009-0007-1843-1465
  • Najma Attaqiya Alya, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia. ORCID: https://orcid.org/0009-0002-6262-6212
  • Mochammad Fahd Ali Hillaby, Data Science Technology Study Program, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga. Surabaya, 60115, Indonesia.
  • Alhassan Sesay, Faculty of Transformative Education, the United Methodist University, Sierra Leone.

DOI:

https://doi.org/10.56294/dm20251075

Keywords:

Rainfall, Classification, Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Climate Prediction

Abstract

Introduction:
Accurate rainfall prediction plays a critical role in climate change adaptation, particularly in mitigating the risks of extreme droughts and floods. Reliable forecasts support sustainable water resource and agricultural management, contributing to reduced socio-economic vulnerability. This study aims to analyze rainfall conditions in Surabaya City and evaluate the performance of three classification methods to determine the most effective model for rainfall classification.
Methods:
This is a descriptive observational study using secondary data from the Meteorology, Climatology, and Geophysics Agency Maritime Station in Surabaya, covering the period from January 2019 to December 2023. The dataset consists of 1,822 daily weather observations, including rainfall, sunshine duration, temperature, wind speed, and humidity. After preprocessing, the rainfall variable was categorized into multiple classes. Three classification methods—Naïve Bayes, K-Nearest Neighbor, and Random Forest—were applied. Model performance was evaluated using accuracy, precision, recall, AUC-ROC, and loss function values.
Results:
All models achieved high accuracy, exceeding 0.93. Although Naïve Bayes showed slightly lower accuracy than the other two methods, it had the highest AUC-ROC and the lowest loss function value, indicating better class discrimination and generalization.
Conclusions:
The Naïve Bayes classifier is the most effective method for rainfall classification in Surabaya City. Among the predictor variables, sunshine duration is identified as the most influential factor in rainfall classification, followed by humidity, temperature, and wind speed.
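
Illustrative note:
The result that a model can have slightly lower accuracy yet a lower loss may look counterintuitive, so a minimal Python sketch of how the two metrics can disagree is given here. It is not the authors' code or data. The three class labels, the probability tables for `model_a` and `model_b`, and the choice of log loss (cross-entropy) as the loss function are all assumptions made for illustration; the abstract does not say which loss function was used. AUC-ROC is left out to keep the sketch short.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall(y_true, y_pred, n_classes):
    """Unweighted mean of per-class precision and recall.
    A class with no predicted (or no true) samples contributes 0."""
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / n_classes, sum(recalls) / n_classes

def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, proba):
        p = min(max(row[t], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)

def predict(proba):
    """Pick the class with the highest predicted probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]

# Hypothetical labels: 0 = no rain, 1 = light rain, 2 = heavy rain.
y_true = [0, 0, 0, 1, 1, 2]

# Hypothetical model A: five correct, one confidently wrong prediction.
model_a = [
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
    [0.98, 0.01, 0.01],  # true class is 2
]

# Hypothetical model B: four correct, two narrow misses.
model_b = [
    [0.80, 0.10, 0.10],
    [0.80, 0.10, 0.10],
    [0.45, 0.50, 0.05],  # true class is 0
    [0.10, 0.80, 0.10],
    [0.10, 0.80, 0.10],
    [0.45, 0.15, 0.40],  # true class is 2
]

for name, proba in (("A", model_a), ("B", model_b)):
    y_pred = predict(proba)
    prec, rec = macro_precision_recall(y_true, y_pred, n_classes=3)
    print(name, round(accuracy(y_true, y_pred), 3), round(prec, 3),
          round(rec, 3), round(log_loss(y_true, proba), 3))
```

In this toy setup model A is more accurate, but its single confident mistake is penalized heavily by the log loss, so model B ends up with the lower loss. Accuracy only counts hits and misses, while log loss also scores how much probability the model placed on the true class.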

Published

2025-06-12

Section

Original

How to Cite

Ramadan A, Syahputra MA, Rantini D, Ningrum RA, Fakhruzzaman MN, Fajar A, et al. Predicting Surabaya’s Rainfall: A Comparative Study of Naïve Bayes, K-Nearest Neighbor, and Random Forest. Data and Metadata [Internet]. 2025 Jun. 12 [cited 2025 Jul. 4];4:1075. Available from: https://dm.ageditor.ar/index.php/dm/article/view/1075