Intelligent Engineering Systems through Artificial Neural Networks, Volume 20
Recognition of Emotions from Human Speech
The classification of emotions such as joy, anger, and interest from tonal variations in human speech is an important task for research and applications in human-computer interaction. Current approaches extract several standard global values from a temporal sequence of power spectra, such as pitch, formants, energy, and attack and decay rates. The values from each spectrum are combined into a single feature vector, and the complete set of feature vectors is then used to classify the emotion of a test utterance. In this new approach, the frequency dimension of the spectrogram is quantized to simulate the Bark filters of the human auditory system. The linear regression coefficients of the surface of each spectrogram segment are combined into a feature vector. In this way, a large sample of feature vectors is extracted from the training samples for each category of emotion, providing a robust basis for a statistical classifier. Several classifiers are evaluated to achieve the best performance on test utterances, using 10-fold cross-validation to ensure statistical significance. Preliminary results on three databases show significant improvements over results from other methods.
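The feature extraction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Bark-scale conversion uses Traunmüller's approximation, the band count and segment shapes are placeholder choices, and the "surface regression" is modeled as fitting a plane z = a·t + b·f + c to the log-power surface of a spectrogram segment, with the coefficients serving as the feature vector.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmüller's approximation of the Bark scale (an assumed choice;
    # the paper only says the frequency axis is Bark-quantized)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_band_energies(power_spec, freqs, n_bands=24):
    """Quantize one power spectrum into n_bands Bark-spaced bands
    by summing the power of the FFT bins falling in each band."""
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark.min(), bark.max(), n_bands + 1)
    bands = np.empty(n_bands)
    for i in range(n_bands):
        if i == n_bands - 1:
            mask = (bark >= edges[i]) & (bark <= edges[i + 1])
        else:
            mask = (bark >= edges[i]) & (bark < edges[i + 1])
        bands[i] = power_spec[mask].sum() if mask.any() else 0.0
    return bands

def segment_features(segment):
    """Fit a plane z = a*t + b*f + c to the log-power surface of a
    spectrogram segment (time x Bark bands) by least squares and
    return the regression coefficients [a, b, c] as the feature vector."""
    T, F = segment.shape
    t, f = np.meshgrid(np.arange(T), np.arange(F), indexing="ij")
    A = np.column_stack([t.ravel(), f.ravel(), np.ones(T * F)])
    z = np.log(segment.ravel() + 1e-10)   # small floor avoids log(0)
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs
```

Sliding `segment_features` over overlapping segments of a Bark-quantized spectrogram yields the large sample of feature vectors per utterance that the abstract describes, which can then feed any standard statistical classifier under 10-fold cross-validation.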