An Empirical Study of Pre-trained CNN Models for Multiclass Emotion Recognition | IJCT Volume 13 – Issue 2 | IJCT-V13I2P7

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 2  |  Published: March – April 2026

Authors

Dr. Megha Bansal, Ms. Tanvi Dalal, Dr. Mitanshi Rastogi, Dr. Neha Goel

Abstract

Emotion recognition from facial expressions has emerged as a significant research area in computer vision and affective computing due to its wide range of applications in healthcare, human–computer interaction, surveillance, and intelligent systems. Recent advancements in deep learning have demonstrated that pre-trained Convolutional Neural Network (CNN) models can effectively extract discriminative features for emotion classification tasks. This study presents an empirical evaluation of multiple pre-trained CNN architectures for multiclass facial emotion recognition. In this work, widely adopted deep learning models are fine-tuned using transfer learning to classify facial images into seven fundamental emotion categories: anger, disgust, fear, happiness, sadness, surprise, and neutral. A standardized experimental framework is employed, including uniform preprocessing, data augmentation, and hyperparameter settings to ensure fair comparison. Model performance is evaluated using accuracy, precision, recall, F1-score, confusion matrix analysis, and computational efficiency metrics such as training time and parameter complexity. The experimental findings reveal notable variations in classification performance and computational cost among the evaluated architectures. While deeper networks demonstrate strong feature extraction capability, optimized models provide a superior balance between accuracy and efficiency. The results highlight the effectiveness of transfer learning in improving multiclass emotion recognition performance, particularly when training data is limited. This study provides practical insights into selecting suitable pre-trained CNN models for robust and scalable emotion recognition systems and contributes to the development of efficient deep learning solutions for real-world affective computing applications.

Keywords

Multiclass Emotion Recognition, CNN, FER

Conclusion

This empirical study evaluated the performance of multiple pre-trained convolutional neural network (CNN) architectures—EfficientNet, ResNet, and VGG—for multiclass facial emotion recognition using the FER-2013 dataset. The comparative analysis demonstrated that transfer learning significantly enhances classification performance in emotion recognition tasks, even when trained on relatively limited and imbalanced datasets. Among the evaluated models, EfficientNet achieved the highest overall accuracy and demonstrated better generalization capability than ResNet and VGG. ResNet showed competitive performance with stable convergence, while VGG, though architecturally simpler, is parameter-heavy and therefore required longer training time and exhibited slightly lower accuracy.

The confusion matrix analysis revealed that emotions such as Happy and Neutral were classified with higher precision, whereas Fear, Disgust, and Surprise showed higher misclassification rates, primarily due to subtle inter-class similarities and dataset imbalance. These findings confirm that deeper and computationally optimized architectures like EfficientNet provide superior feature extraction and improved performance in multiclass emotion recognition scenarios. Moreover, fine-tuning pre-trained models proves to be an effective strategy for improving robustness and convergence speed compared to training CNNs from scratch.

Overall, this study highlights the effectiveness of transfer learning-based CNN frameworks in advancing automated facial emotion recognition systems. The results can contribute to the development of intelligent human–computer interaction systems, affect-aware applications, and real-time emotion analysis platforms.
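The per-class behavior described above (higher precision for Happy and Neutral, more confusion among Fear, Disgust, and Surprise) is the kind of conclusion typically read off a confusion matrix. As a minimal sketch of how per-class precision, recall, and F1 are derived from such a matrix — using hypothetical counts for three classes, not the paper's actual results:

```python
# Per-class precision, recall, and F1 from a confusion matrix.
# The labels and counts below are illustrative placeholders only;
# they are NOT the FER-2013 results reported in the paper.

LABELS = ["Happy", "Fear", "Disgust"]

# Rows = true class, columns = predicted class (hypothetical counts).
CM = [
    [90,  6,  4],   # true Happy
    [10, 70, 20],   # true Fear
    [ 5, 25, 70],   # true Disgust
]

def per_class_metrics(cm):
    """Return {label: (precision, recall, f1)} for a square confusion matrix."""
    n = len(cm)
    metrics = {}
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # rest of row i: missed positives
        fp = sum(cm[r][i] for r in range(n)) - tp   # rest of column i: false alarms
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[LABELS[i]] = (precision, recall, f1)
    return metrics

if __name__ == "__main__":
    for label, (p, r, f1) in per_class_metrics(CM).items():
        print(f"{label:8s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Off-diagonal mass concentrated between the Fear and Disgust rows, as in this toy matrix, is exactly the inter-class confusion pattern the study attributes to subtle visual similarity between those expressions.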


How to Cite This Paper

Dr. Megha Bansal, Ms. Tanvi Dalal, Dr. Mitanshi Rastogi, Dr. Neha Goel (2026). An Empirical Study of Pre-trained CNN Models for Multiclass Emotion Recognition. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.

© 2026 International Journal of Computer Techniques (IJCT). All rights reserved.
