Comparing the performance of DTID3 and DTID3-Smote methods in predicting the rain events with unbalanced classes

Authors

  • Des A. Zayanti Universitas Sriwijaya
  • Ning Eliyati Universitas Sriwijaya
  • Yulia Resti
  • Sajiril Hoiri Universitas Sriwijaya
  • Endang S. Kresnawati Universitas Sriwijaya
  • Novi R. Dewi Universitas Sriwijaya
  • Ali Amran Universitas Sriwijaya
  • Irsyadi Yani Universitas Sriwijaya

DOI:

https://doi.org/10.12928/bamme.v5i1.13122

Keywords:

decision tree method, unbalanced classes, weather events

Abstract

Predicting rainfall events in a region is important for many aspects of life. However, most datasets used to predict rainfall events have an unbalanced distribution of observations across their classes, including the dataset for Prabumulih City, South Sumatra. DTID3 performs very satisfactorily in many prediction tasks, while the SMOTE technique is useful for balancing the distribution of data classes. This study compares the performance of the DTID3 and DTID3-SMOTE methods in predicting rainfall events in Prabumulih City. Its main contribution relative to previous studies is that DTID3 and SMOTE are used together to predict rainfall events, specifically in Prabumulih City. Using training data from 2017-2022 and test data from 2023, the results show that the DTID3-SMOTE method performs better than the DTID3 method in predicting rainfall events in Prabumulih City. For the DTID3 method, the accuracy, precision, recall, specificity, and F1-score are 73.56%, 81.91%, 50.94%, 91.22%, and 62.81%, respectively. For the DTID3-SMOTE method, the corresponding values are 74.66%, 82.61%, 53.44%, 91.22%, and 64.9%.
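The two ideas in the abstract, oversampling the minority class and scoring the classifier with confusion-matrix metrics, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote_like`, `confusion_metrics`, and `f1_from`, the neighbour count `k`, and the toy data are all hypothetical, and the oversampler is only a simplified SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours.

```python
import random


def confusion_metrics(tp, fp, tn, fn):
    """Return the five metrics named in the abstract as fractions (0 to 1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (an assumption, not the paper's code).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy minority class with two features per observation.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_synthetic = smote_like(toy_minority, n_new=5)

# Cross-check of the reported F1-scores from the reported precision and recall.
f1_dtid3 = 100 * f1_from(0.8191, 0.5094)
f1_dtid3_smote = 100 * f1_from(0.8261, 0.5344)
```

Recomputing F1 from the reported precision and recall gives about 62.8% for DTID3 and 64.9% for DTID3-SMOTE, which agrees with the values stated in the abstract up to rounding.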

Published

2025-05-31

Section

Articles