Air Quality Prediction Model Using Support Vector Machines with GridSearchCV Hyperparameter Optimization
DOI:
https://doi.org/10.12928/biste.v4i1.6079
Keywords:
Classification, Air Quality, Data Science, SVM, Grid Search
Abstract
Air pollution in Jakarta continues to increase. The city ranks 12th in the world among national capitals with high pollution levels. The Jakarta Environmental Service needs to process the air quality data produced by its Air Quality Monitoring Stations into information that supports decision-making. Data mining techniques can be applied to this data to extract new knowledge from the database and uncover patterns that are valid, useful, and easy to interpret. This study proposes an SVM classification model. Our contribution is an SVM classification workflow with improvements that run from data processing through to hyperparameter tuning. Previous studies, as we see it, pursued high accuracy scores only; in contrast, we apply the GridSearchCV hyperparameter optimization technique to the SVM classification model. Grid search recommends a polynomial kernel of degree 2 as the best parameter setting. Accuracy is 73.31% before optimization and 94.8% after optimization, an increase of about 21.5 percentage points from applying grid search CV to SVM-based classification of air quality monitoring data.
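The tuning procedure summarized in the abstract can be sketched with scikit-learn as follows. This is only an illustrative sketch, not the authors' code: the data is a synthetic stand-in generated with make_classification, the baseline is assumed to be an SVC with default settings, and the parameter grid (kernels, degrees, and C values) is hypothetical, since the abstract reports only that a polynomial kernel of degree 2 was selected.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in data; the study's air quality dataset
# is not reproduced here.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline (assumed): feature scaling followed by an SVC with default settings.
baseline = make_pipeline(StandardScaler(), SVC())
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# ASSUMPTION: hypothetical search grid; "degree" only affects the poly kernel.
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__degree": [2, 3],
    "svc__C": [0.1, 1, 10],
}

# Exhaustive search over the grid, scored by 5-fold cross-validated accuracy
# on the training split only.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training split,
# so the fitted search object can predict on held-out data directly.
tuned_acc = accuracy_score(y_test, search.predict(X_test))

print("best parameters:", search.best_params_)
print("baseline test accuracy:", round(baseline_acc, 4))
print("tuned test accuracy:", round(tuned_acc, 4))
print("change (percentage points):", round(100 * (tuned_acc - baseline_acc), 2))
```

On synthetic data the selected kernel and the size of the accuracy change will differ from the figures reported in the abstract. The sketch only shows the mechanics: the grid is evaluated by cross-validation on the training split, and the held-out split is used once to compare the baseline with the tuned model.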
License
Copyright (c) 2022 Purwono Purwono, Ahmad Toha, Windu Gata
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.