Emotion Recognition Using Speech Processing
Abstract
Emotion recognition through speech processing is a pivotal component of human-machine interface applications, and researchers have dedicated significant effort to the field over the years. The ability to recognise emotions is important to human interaction and mental well-being, since it allows people to communicate their viewpoints and emotional states to others. Speech Emotion Recognition (SER) is one technique that addresses this: it extracts emotional markers from voice signals. These cues can represent a range of typical emotions, such as neutrality, happiness, sadness, and anger. Despite advances, accurately discerning and categorising these complex emotional states remains challenging. This work takes a comprehensive approach to these issues by combining a number of feature extraction techniques: spectral contrast, the chromagram, the Mel-scaled spectrogram, tonal centroid (tonnetz) features, and Mel-frequency cepstral coefficients (MFCCs). Together, these features provide a rich representation of the emotional content of speech signals. For robust emotion classification, a Deep Neural Network (DNN) architecture is employed, underscoring the role of advanced machine learning techniques in handling the intricate nuances of emotion recognition. The goal of this study is to improve emotion identification systems by utilising these approaches, ultimately aiding the creation of more responsive and user-friendly human-machine interfaces. The study integrates several feature types and investigates state-of-the-art methods to improve the efficacy and accuracy of emotion recognition from speech, thereby improving human-machine interaction and communication.
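The abstract does not give implementation details, but the five named features map directly onto standard audio tooling. The sketch below assumes librosa for feature extraction and Keras for the DNN (neither library is named in the paper); it pools each feature over time into a single 193-dimensional vector (40 MFCC + 12 chroma + 128 mel + 7 contrast + 6 tonnetz) and feeds it to a small fully connected classifier. The file path, number of MFCCs, layer sizes, dropout rates, and four-class label set are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a feature-extraction-plus-DNN pipeline for SER.
# librosa and Keras are assumed tools; hyperparameters are illustrative.
import numpy as np
import librosa
from tensorflow import keras

def extract_features(path):
    """Return a 193-dim vector: 40 MFCC + 12 chroma + 128 mel + 7 contrast + 6 tonnetz."""
    y, sr = librosa.load(path, sr=None)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(y=y, sr=sr), axis=1)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr), axis=1)
    return np.concatenate([mfcc, chroma, mel, contrast, tonnetz])

# A plain fully connected DNN classifier over the pooled features.
NUM_CLASSES = 4  # e.g. neutral, happy, sad, angry (assumed label set)
model = keras.Sequential([
    keras.layers.Input(shape=(193,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_features, y_labels, ...) once vectors and integer labels are assembled.
```

Averaging each feature over time is the simplest way to obtain a fixed-length input for a dense network; sequence models over the frame-level features are a common alternative when temporal dynamics matter.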
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.