A Hybrid Weighted Soft-Voting Ensemble with Integrated Preprocessing for Breast Cancer Classification on the WDBC Dataset
DOI:
https://doi.org/10.12928/biste.v8i3.14587
Keywords:
Breast Cancer Classification, Machine Learning, Ensemble Learning, Hybrid Model, XGBoost, Precision Medicine, Diagnostic Pipeline
Abstract
Breast cancer remains one of the leading causes of cancer death among women worldwide, which makes accurate diagnosis essential. This study presents a machine learning pipeline built around a hybrid weighted soft-voting ensemble for binary classification of breast tumors on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. Preprocessing consists of balancing the classes with the Synthetic Minority Over-sampling Technique (SMOTE), removing highly correlated attributes, and reducing dimensionality through Principal Component Analysis (PCA). Under stratified 10-fold cross-validation, the ensemble achieved 99.3% accuracy and 100% recall for the malignant class on PCA-transformed data, outperforming every individual classifier. Logistic Regression also performed well, with 98.83% accuracy and 99.81% ROC AUC, which suggests that the data are nearly linearly separable. Feature importance analysis identified “worst concave points” and “mean radius” as the most important features, a result that is consistent with clinical understanding. Overall, this work presents an effective methodology for diagnosing breast cancer.
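The preprocessing and ensemble steps named in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation. The abstract does not list the base learners, the voting weights, the correlation threshold, or the number of principal components, so the choices below (Logistic Regression, Random Forest, and SVC weighted 2:1:1, a 0.95 correlation threshold, PCA retaining 95% of the variance) are assumptions, and the `DropCorrelated` helper is hypothetical. SMOTE is left out because it is provided by the separate imbalanced-learn package; if it is added, it should be fitted on the training folds only, for example through that package's own pipeline class.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class DropCorrelated(BaseEstimator, TransformerMixin):
    """Hypothetical helper: drop each feature that is highly correlated
    (|Pearson r| > threshold) with any earlier feature."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold

    def fit(self, X, y=None):
        corr = np.abs(np.corrcoef(np.asarray(X), rowvar=False))
        upper = np.triu(corr, k=1)  # keep only pairs (i, j) with i < j
        self.keep_ = ~(upper > self.threshold).any(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]


# WDBC as shipped with scikit-learn: target 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Weighted soft voting: class probabilities are averaged using the weights.
# Base learners and weights are assumptions, not taken from the paper.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
    weights=[2, 1, 1],
)

# Every preprocessing step sits inside the pipeline, so it is re-fitted
# on the training portion of each fold and never sees the held-out data.
pipeline = Pipeline(
    [
        ("drop_corr", DropCorrelated(threshold=0.95)),
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=0.95)),
        ("vote", ensemble),
    ]
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(
    pipeline,
    X,
    y,
    cv=cv,
    scoring={
        "accuracy": "accuracy",
        "malignant_recall": make_scorer(recall_score, pos_label=0),
        "roc_auc": "roc_auc",
    },
)

for name in ("accuracy", "malignant_recall", "roc_auc"):
    print(f"{name}: {scores['test_' + name].mean():.4f}")
```

Keeping the correlation filter, scaling, and PCA inside the cross-validated pipeline is a deliberate design choice in this sketch: fitting them once on the full dataset would leak information from the validation folds into training. The scores it prints depend on the assumed settings and should not be expected to match the figures reported in the abstract.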
License
Copyright (c) 2026 Mohammad Khalaf Rahim Al-Juaifari, Israa Mohammed Rahi Alabudy, Ammar Rasoul Mohammad Alsaachi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

