Enhancing Facial Emotion Recognition on FER2013 Using Attention-based CNN and Sparsemax-Driven Class-Balanced Architectures
DOI:
https://doi.org/10.12928/biste.v7i4.14510
Keywords:
Facial Emotion Recognition, FER2013, Attention CNN, Sparsemax, Poly-Focal Loss
Abstract
Facial emotion recognition plays a critical role in various human–computer interaction applications, yet remains challenging due to class imbalance, label noise, and subtle inter-class visual similarities. The FER2013 dataset, containing seven emotion classes, is particularly difficult because of its low resolution and heavily skewed label distribution. This study presents a comparative investigation of advanced deep learning architectures against traditional machine-learning baselines on FER2013 to address these challenges and improve recognition performance. Two novel architectures are proposed. The first is an attention-based convolutional neural network (CNN) that integrates Mish activations and squeeze-and-excitation (SE) channel recalibration to enhance the discriminative capacity of intermediate features. The second, FastCNN-SE, is a refined extension designed for computational efficiency and minority-class robustness, incorporating Sparsemax activation, Poly-Focal loss, class-balanced reweighting, and MixUp augmentation. The main contribution is to demonstrate how combining attention, sparse activations, and imbalance-aware learning improves FER performance under challenging real-world conditions. Both models were extensively evaluated: the attention-based CNN under 10-fold cross-validation, achieving 0.6170 accuracy and 0.555 macro-F1, and FastCNN-SE on the held-out test set, achieving 0.5960 accuracy and 0.5138 macro-F1. Both deep models significantly outperform PCA-based Logistic Regression, Linear SVC, and Random Forest baselines (≤0.37 accuracy and ≤0.29 macro-F1). We additionally justify the differing evaluation protocols, emphasizing cross-validation for architectural stability and held-out testing for generalization, and note that FastCNN-SE contains ~3M parameters, enabling efficient inference. These findings demonstrate that architecture-level fusion of SE attention, Sparsemax, and Poly-Focal loss improves balanced emotion recognition, offering a strong foundation for future studies on efficient and robust affective-computing systems.
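The abstract names several architectural and loss-level components. The following is a minimal PyTorch sketch of two of them, given only as an illustration of the techniques involved: a squeeze-and-excitation (SE) channel-recalibration block with Mish activation, and a Poly-1 Focal loss combined with class-balanced (effective-number) weights in the spirit of Cui et al. (CVPR 2019). The layer widths, reduction ratio, and the gamma, epsilon, and beta values are assumptions, not the paper's reported configuration; Sparsemax and MixUp are omitted for brevity.

```python
# Minimal sketch with assumed hyperparameters, not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel recalibration with Mish in the bottleneck."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # squeeze: global average pool
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.Mish(),                                     # Mish instead of the usual ReLU
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                  # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gates                                   # excitation: reweight channels


def class_balanced_weights(counts: torch.Tensor, beta: float = 0.999) -> torch.Tensor:
    """Effective-number-of-samples class weights (Cui et al., CVPR 2019)."""
    effective_num = 1.0 - torch.pow(beta, counts.float())
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(counts)           # normalize to a mean weight of 1


def poly1_focal_loss(logits, targets, class_weights, gamma=2.0, eps=1.0):
    """Poly-1 focal loss: focal term plus eps * (1 - p_t)^(gamma + 1)."""
    ce_weighted = F.cross_entropy(logits, targets, weight=class_weights, reduction="none")
    p_t = torch.exp(-F.cross_entropy(logits, targets, reduction="none"))  # prob. of true class
    focal = (1.0 - p_t) ** gamma * ce_weighted
    return (focal + eps * (1.0 - p_t) ** (gamma + 1)).mean()


if __name__ == "__main__":
    feats = torch.randn(8, 64, 12, 12)                     # batch of intermediate feature maps
    print(SEBlock(64)(feats).shape)                        # torch.Size([8, 64, 12, 12])
    # FER2013 training-set class counts: angry, disgust, fear, happy, sad, surprise, neutral
    counts = torch.tensor([3995, 436, 4097, 7215, 4830, 3171, 4965])
    w = class_balanced_weights(counts)
    logits, y = torch.randn(8, 7), torch.randint(0, 7, (8,))
    print(poly1_focal_loss(logits, y, w))
```

In this formulation the class-balanced weights up-weight the rare "disgust" class, while the focal and polynomial terms reduce the contribution of easy, well-classified samples, consistent with the imbalance-aware objective described in the abstract.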
License
Copyright (c) 2025 Christiany Suwartono, Julius Victor Manuel Bata, Gregorius Airlangga

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
This journal is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

