Air Quality Prediction Model Using Support Vector Machines with GridSearch CV Hyperparameter Optimization

Authors

  • Ahmad Toha, Universitas Nusa Mandiri
  • Purwono Purwono, Universitas Harapan Bangsa
  • Windu Gata, Universitas Nusa Mandiri

DOI:

https://doi.org/10.12928/biste.v4i1.6079

Keywords:

Classification, Air Quality, Data Science, SVM, Grid Search

Abstract

Air pollution continues to increase in Jakarta. The city ranks 12th in the world among national capitals with high levels of pollution. The Jakarta Environmental Service needs to process the air quality data generated by its Air Quality Monitoring Stations in order to produce information that supports decision-making. Data mining techniques can be applied to this data to extract new knowledge and find patterns that are valid, useful, and easy to understand. This study proposes an SVM classification model. Our contribution is an SVM classification workflow that improves each stage from data processing through hyperparameter tuning. Previous studies focused only on achieving high accuracy scores; in contrast, we apply the grid search CV hyperparameter optimization technique to the SVM classification model. The grid search recommends a polynomial kernel of degree 2 as the best parameter setting. Accuracy is 73.31% before optimization and 94.8% after optimization, an increase of about 21.5 percentage points from applying grid search CV to air quality classification with the SVM model.
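The workflow described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the Jakarta monitoring station records, and the parameter grid values are hypothetical. It fits a default SVM as the "before optimization" baseline, then runs a cross-validated grid search over kernels, `C`, and polynomial degree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data (3 classes, 6 numeric features),
# not the air quality dataset used in the paper.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline: scaled features plus an SVM with default hyperparameters.
baseline = make_pipeline(StandardScaler(), SVC())
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Hypothetical grid: the degree parameter only matters for the poly kernel,
# so it gets its own sub-grid to avoid redundant fits.
param_grid = [
    {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["poly"], "svc__degree": [2, 3], "svc__C": [0.1, 1, 10]},
]

# 5-fold cross-validated grid search; the best setting is refit on X_train.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
tuned_acc = search.score(X_test, y_test)

print("best parameters:", search.best_params_)
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"tuned accuracy:    {tuned_acc:.3f}")
```

Scaling is placed inside the pipeline so that each cross-validation fold is standardized using only its own training portion. On synthetic data the selected kernel and the size of any accuracy gain will differ from the figures reported in the paper.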

Published

2022-05-26

How to Cite

[1]
A. Toha, P. Purwono, and W. Gata, “Model Prediksi Kualitas Udara dengan Support Vector Machines dengan Optimasi Hyperparameter GridSearch CV”, Buletin Ilmiah Sarjana Teknik Elektro, vol. 4, no. 1, pp. 12–21, May 2022.

Section

Article