Naive Bayes for Thesis Labeling

Fitria Nurhayati, Arfiani Nur Khusna, Dimas Chaerul Ekty Saputra


Thesis work in the Department of Informatics, Universitas Ahmad Dahlan, is divided into two areas of interest: Intelligent Systems, and Software and Data Engineering. Existing thesis title data has been used only as an archive and has never been processed or classified to determine the yearly trend of thesis topics based on student interest. This study proceeds in stages: data collection, division of the data into training and test sets, manual labeling of the training data, text preprocessing, and classification using Naive Bayes. The results show that thesis titles taken from 2013 to 2018 trend toward the field of Intelligent Systems and Software. Evaluation using a confusion matrix and K-fold cross-validation with k = 10 yields an accuracy of 94.60%, a precision of 97.30%, and a recall of 85.70%.
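The stages above (preprocessing, Naive Bayes classification, and evaluation with a confusion matrix and K-fold splits) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the Naive Bayes variant, the smoothing, or the preprocessing steps, so a multinomial model with add-one smoothing and simple lowercase tokenization are assumed here. The example titles, the labels `IS` and `SDE`, and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(title):
    """Toy preprocessing (assumed): lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed variant)."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, title):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(title):
                if word in self.vocab:  # skip words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall from the 2x2 confusion matrix cells."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test_idx = list(range(fold, n_items, k))
        train_idx = [i for i in range(n_items) if i % k != fold]
        yield train_idx, test_idx

# Hypothetical, manually labeled training titles (not taken from the study).
train_titles = [
    "Neural network for image classification",
    "Expert system for diagnosing plant disease",
    "Fuzzy logic decision support system",
    "Web based inventory information system",
    "Mobile application for library management",
    "Database design for academic information system",
]
train_labels = ["IS", "IS", "IS", "SDE", "SDE", "SDE"]

model = NaiveBayes().fit(train_titles, train_labels)
test_titles = ["Image classification using neural network",
               "Web based library information system"]
test_labels = ["IS", "SDE"]
predictions = [model.predict(t) for t in test_titles]
accuracy, precision, recall = binary_metrics(test_labels, predictions, positive="IS")
```

In a 10-fold setup such as the one reported, `k_fold_splits(len(titles), 10)` would supply the index sets, with the model refit on each training split and the metrics averaged over the folds.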


Confusion Matrix; Thesis Title; K-Fold Cross Validation; Classification; Naïve Bayes






Copyright (c) 2021 Dimas Chaerul Ekty Saputra

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Mobile and Forensics (MF)

ISSN Online: 2714-6685 | Print: 2656-6257
Organized by Department of Magister Teknik Informatika
Published by Universitas Ahmad Dahlan 
