International Journal of Computer Techniques – Volume 12 Issue 3, May – June 2025
Authors
Md. Amaan Khan – Dept. of IT, NIET, Greater Noida, India.
Anurag Verma – Dept. of IT, NIET, Greater Noida, India.
Harshit Singh – Dept. of IT, NIET, Greater Noida, India.
Shekhar Yadav – Dept. of IT, NIET, Greater Noida, India.
Abstract
Speech Emotion Recognition (SER) technology enables machines to **detect and classify human emotions from voice data**. This study evaluates various deep learning models, including **CNN, LSTM, ANN, and a hybrid CNN-LSTM model**, to analyze speech-based emotional cues. The **CNN-LSTM hybrid model achieved a 98% classification accuracy**, effectively capturing **short- and long-term dependencies in voice data** using **MFCC features extracted via the Librosa toolkit**.
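For illustration, the sketch below shows one plausible way such a pipeline could be put together in Python with Librosa and Keras. The MFCC count, frame length, number of emotion classes, and layer sizes are assumptions chosen for readability, not the exact configuration used in this study.

```python
# Minimal sketch of an MFCC + CNN-LSTM pipeline for speech emotion recognition.
# All hyperparameters below (40 MFCCs, 174 frames, 8 classes, layer sizes)
# are illustrative assumptions, not values reported in the paper.
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC = 40          # MFCC coefficients per frame (assumed)
MAX_FRAMES = 174     # pad/truncate every clip to a fixed frame count (assumed)
N_CLASSES = 8        # number of emotion labels, e.g. RAVDESS-style (assumed)

def extract_mfcc(path: str) -> np.ndarray:
    """Load an audio file and return a fixed-size (MAX_FRAMES, N_MFCC) MFCC matrix."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
    if mfcc.shape[0] < MAX_FRAMES:                                  # pad short clips
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]                                        # truncate long clips

def build_cnn_lstm() -> models.Model:
    """Conv1D layers capture short-term spectral patterns in the MFCC frames;
    the LSTM then models longer-range temporal dependencies across the clip."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dropout(0.3),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```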
The **CNN-LSTM-based Speech Emotion Recognition model** improves **real-time speech emotion classification** over the standalone architectures evaluated in this study. Its high accuracy of **98%** highlights its potential for applications such as **online education, telemedicine, human-computer interaction, and customer service automation**. Future enhancements may include **multilingual speech emotion recognition and real-time deployment optimization**.