Performance Analysis of Random Forest Algorithm with SMOTE for Multi-Class Attack Detection

Authors

  • Ratna Komalasari, Universitas Muhammadiyah Purwokerto
  • Mukhlis Prasetyo Aji, Universitas Muhammadiyah Purwokerto
  • Agung Purwo Wicaksono, Universitas Muhammadiyah Purwokerto
  • Maulida Ayu Fitriani, Universitas Muhammadiyah Purwokerto

DOI:

https://doi.org/10.12928/mf.v8i1.14584

Keywords:

cyber attack detection, multi-class classification, Random Forest, SMOTE, Stratified Random Sampling

Abstract

The increasing sophistication of cyberattacks calls for detection systems that can accurately identify multiple threat types. Class imbalance in attack logs is a substantial challenge that can undermine the effectiveness of detection models. This study presents a multi-class cyberattack detection model that uses the Random Forest algorithm together with the Synthetic Minority Over-sampling Technique (SMOTE) to address data imbalance. The novelty of this research lies in combining Random Forest and SMOTE to improve multi-class classification accuracy on local attack log datasets, an approach that remains sparsely explored in academic research. The dataset consists of 3000 cyberattack logs from the Information Systems Bureau of Muhammadiyah University Purwokerto, spanning 10 cyberattack categories. The research process involved data collection, preprocessing, data splitting, model training, and evaluation. Results indicate that the model achieved a macro-averaged F1 score of 76% and a weighted average of 93%, with the "Threat Level Medium" feature identified as the most influential predictor. These findings suggest that the combination of Random Forest and SMOTE enhances multi-class detection performance and is promising for log-based cybersecurity systems in educational and industrial environments.
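The gap between the macro and weighted scores reported above is typical of imbalanced data: the macro average gives every class equal weight, while the weighted average scales each class by its support, so a weak minority class lowers the macro score far more than the weighted one. The following is a minimal illustrative sketch in plain Python, not the authors' evaluation code. The two-class toy labels ("ddos", "sqli"), their counts, and the helper names are assumptions made up for this example and are not taken from the study's dataset.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for each label present in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support (count in y_true)."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical imbalanced toy data: 90 majority-class and 10 minority-class logs.
y_true = ["ddos"] * 90 + ["sqli"] * 10
# The classifier gets every "ddos" right but recovers only 4 of 10 "sqli".
y_pred = ["ddos"] * 90 + ["ddos"] * 6 + ["sqli"] * 4

print(round(macro_f1(y_true, y_pred), 3))     # 0.77
print(round(weighted_f1(y_true, y_pred), 3))  # 0.928
```

In this toy case the majority class scores an F1 of about 0.97 and the minority class about 0.57, so the macro average falls to roughly 0.77 while the weighted average stays near 0.93. This is why the macro score is usually the more informative figure when minority attack classes matter.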

Published

2025-12-15

Section

Articles