Comparing the performance of DTID3 and DTID3-Smote methods in predicting the rain events with unbalanced classes
DOI:
https://doi.org/10.12928/bamme.v5i1.13122
Keywords:
decision tree method, unbalanced classes, weather events
Abstract
Predicting rainfall events in a region matters for many aspects of life. However, most datasets used to predict rainfall events have an unbalanced distribution of observations across their classes, including the dataset for Prabumulih City, South Sumatra. The decision tree ID3 (DTID3) method performs very well in many prediction tasks, while the SMOTE technique balances the class distribution of the data. This study compares the performance of the DTID3 and DTID3-SMOTE methods in predicting rainfall events in Prabumulih City. Its main contribution relative to previous studies is that DTID3 and SMOTE are used together to predict rainfall events, specifically in Prabumulih City. Using training data from 2017-2022 and test data from 2023, the results show that the DTID3-SMOTE method outperforms the DTID3 method in predicting rainfall events in Prabumulih City. For DTID3, the accuracy, precision, recall, specificity, and F1-score are 73.56%, 81.91%, 50.94%, 91.22%, and 62.81%, respectively. For DTID3-SMOTE, the corresponding values are 74.66%, 82.61%, 53.44%, 91.22%, and 64.9%; specificity is unchanged, and the other four metrics improve.
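The five reported metrics all derive from a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code. The function names and the example counts are hypothetical, since the abstract gives only the final percentages and not the underlying confusion matrices. The last lines check that the reported F1-scores agree with the reported precision and recall, up to rounding.

```python
def metrics_from_confusion(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: rain events predicted as rain / as no rain.
    tn/fp: no-rain events predicted as no rain / as rain.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the study).
example = metrics_from_confusion(tp=8, fp=2, tn=6, fn=4)
print(example)

# Consistency check using the precision and recall reported in the abstract.
f1_dtid3 = f1_from(0.8191, 0.5094)        # reported F1: 62.81%
f1_dtid3_smote = f1_from(0.8261, 0.5344)  # reported F1: 64.9%
print(f"DTID3 F1: {100 * f1_dtid3:.2f}%")
print(f"DTID3-SMOTE F1: {100 * f1_dtid3_smote:.2f}%")
```

Because the inputs here are already rounded to two decimals, the recomputed F1-scores can differ from the reported ones in the last digit.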
License
Copyright (c) 2025 Des A. Zayanti, Ning Eliyati, Yulia Resti, Sajiril Hoiri, Endang S. Kresnawati, Novi R. Dewi, Ali Amran, Irsyadi Yani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

