Machine Learning Approaches for Binary Classification of Portion Size and Cooking Time in Indonesian Recipes

Devi Dwi Purwanto; Aji Prasetya Wibawa; Mazarina Devi

doi:10.12928/biste.v8i3.16013

Authors

Devi Dwi Purwanto, Universitas Negeri Malang
Aji Prasetya Wibawa, Universitas Negeri Malang
Mazarina Devi, Universitas Negeri Malang

Keywords:

XGBoost, Feature Engineering, Computational Gastronomy, Predictive Analytics, Indonesian Cuisine

Abstract

Estimating portion sizes and cooking times is a goal of smart kitchen assistants, since it enables better meal planning and reduces food waste caused by over-portioning. Existing approaches in computational gastronomy often struggle to produce such estimates from prepared ingredient data. This study engineers features from a dataset of 1,400 Indonesian recipes and uses XGBoost to predict two binary classification targets: portion size and required cooking time. The recipe dataset includes ingredients, their quantities, and preparation steps. The TKPI dataset is used alongside it to help determine food ingredient categories, protein content, and cooking technique complexity. Model hyperparameters are then tuned to maximize performance. Six models were evaluated. The best portion-size model achieved an accuracy of 0.7821, a balanced accuracy of 0.4929, and an F1 score of 0.8763, while the best cooking-time model achieved an accuracy of 0.6929, a balanced accuracy of 0.6445, and an F1 score of 0.7737. In the best models, weighted ingredient quantity and the distribution of ingredients per step were among the most influential features for portion size, while step-based and technique-based features were the most important for cooking time. The contribution of this research is an interpretable model that supports efficient meal planning in culinary applications. These results indicate that feature aggregation combined with XGBoost provides actionable insights for smart kitchen assistants and recommendation systems.
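
The abstract reports accuracy, balanced accuracy, and F1 score side by side, and these three metrics can diverge sharply on imbalanced binary targets. The following is a minimal sketch of how each is computed from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, balanced accuracy, F1) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Recall per class; balanced accuracy is their unweighted mean.
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (recall_pos + recall_neg) / 2

    # F1 is the harmonic mean of precision and positive-class recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall_pos
    f1 = 2 * precision * recall_pos / denom if denom else 0.0
    return accuracy, balanced_accuracy, f1


# Hypothetical, illustrative counts: 80 positive and 20 negative recipes,
# with a classifier that labels every recipe as positive.
acc, bal_acc, f1 = binary_metrics(tp=80, fp=20, fn=0, tn=0)
print(round(acc, 4), round(bal_acc, 4), round(f1, 4))
```

In this illustrative case, accuracy and F1 are high while balanced accuracy sits at chance level, because the minority class is never predicted. A gap between accuracy and balanced accuracy is therefore worth checking against the class distribution before interpreting a model as reliable.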
