Predicting Surabaya's Rainfall: A Comparative Study of Naïve Bayes, K-Nearest Neighbor, and Random Forest
DOI:
https://doi.org/10.56294/dm20251075
Keywords:
Rainfall, Classification, Naïve Bayes Classifier, K-Nearest Neighbor, Random Forest, Climate Prediction
Abstract
Introduction:
Accurate rainfall prediction plays a critical role in climate change adaptation, particularly in mitigating the risks of extreme droughts and floods. Reliable forecasts support sustainable water resource and agricultural management, contributing to reduced socio-economic vulnerability. This study aims to analyze rainfall conditions in Surabaya City and evaluate the performance of three classification methods to determine the most effective model for rainfall classification.
Methods:
This is a descriptive observational study using secondary data from the Meteorology, Climatology, and Geophysics Agency Maritime Station in Surabaya, covering the period from January 2019 to December 2023. The dataset consists of 1,822 daily weather observations, including rainfall, sunshine duration, temperature, wind speed, and humidity. After preprocessing, the rainfall variable was categorized into multiple classes. Three classification methods—Naïve Bayes, K-Nearest Neighbor, and Random Forest—were applied. Model performance was evaluated using accuracy, precision, recall, AUC-ROC, and loss function values.
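As an illustration of the evaluation step, the sketch below computes accuracy together with macro-averaged precision and recall for a multi-class problem. It is a minimal example under stated assumptions, not the authors' code: the class names ("none", "light", "heavy"), the macro averaging, and the function name classification_report are hypothetical, since the abstract does not name the rainfall classes or the averaging scheme, and the labels are made up rather than taken from the study's data.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    # Count each (actual, predicted) pair; missing pairs count as 0.
    pair_counts = Counter(zip(y_true, y_pred))

    accuracy = sum(pair_counts[(c, c)] for c in labels) / len(y_true)

    precisions, recalls = [], []
    for c in labels:
        true_positives = pair_counts[(c, c)]
        predicted_as_c = sum(pair_counts[(t, c)] for t in labels)
        actually_c = sum(pair_counts[(c, p)] for p in labels)
        precisions.append(true_positives / predicted_as_c if predicted_as_c else 0.0)
        recalls.append(true_positives / actually_c if actually_c else 0.0)

    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
    }


# Hypothetical labels for six days (assumption: not the study's data).
y_true = ["none", "none", "none", "light", "light", "heavy"]
y_pred = ["none", "none", "light", "light", "light", "heavy"]
report = classification_report(y_true, y_pred)
print(report)
```

Macro averaging weights every class equally, which matters when dry days far outnumber heavy-rain days; a different averaging scheme would give different precision and recall values on the same predictions.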
Results:
All models achieved high accuracy, exceeding 0.93. Although Naïve Bayes showed slightly lower accuracy than the other two methods, it had the highest AUC-ROC and the lowest loss function value, indicating better class discrimination and generalization.
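A model can have lower accuracy and still a lower loss, because accuracy only checks the predicted class while a probabilistic loss also scores how well calibrated the predicted probabilities are. The sketch below shows this with two hypothetical models on four made-up samples. It assumes the loss is cross-entropy (log loss), which the abstract does not specify, and the probabilities are invented for illustration.

```python
import math


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class is the true class."""
    hits = 0
    for true_class, p in zip(y_true, probs):
        predicted_class = max(range(len(p)), key=p.__getitem__)
        hits += predicted_class == true_class
    return hits / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for true_class, p in zip(y_true, probs):
        clipped = min(max(p[true_class], eps), 1.0)  # avoid log(0)
        total -= math.log(clipped)
    return total / len(y_true)


y_true = [0, 0, 1, 1]

# Hypothetical model A: confident on three samples, narrowly wrong on the last.
model_a = [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.55, 0.45]]
# Hypothetical model B: every sample correct, but only just.
model_b = [[0.55, 0.45], [0.55, 0.45], [0.45, 0.55], [0.45, 0.55]]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 4))
```

Here model A misclassifies one sample yet obtains the lower loss, because it assigns high probability to the true class wherever it is right. This is the kind of pattern the Results describe for Naïve Bayes.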
Conclusions:
The Naïve Bayes classifier is the most effective method for rainfall classification in Surabaya City. Among the predictor variables, sunshine duration is the most influential factor in rainfall classification, followed by humidity, temperature, and wind speed.
Copyright (c) 2025 Dwi Rantini, Mochammad Fahd Ali Hillaby, Najma Attaqiya Alya, Muhammad Mahdy Yandra, Maryamah, Aziz Fajar, Muhammad Noor Fakhruzzaman, Ratih Ardiati Ningrum, Arip Ramadan, Muhammad Axel Syahputra, Alhassan Sesay (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License. Unless otherwise stated, associated published material is distributed under the same licence.