ASME Press Select Proceedings
Intelligent Engineering Systems through Artificial Neural Networks, Volume 20
Cihan H. Dagli
ASME Press
The classification of emotions such as joy, anger, and interest from tonal variations in human speech is an important task for research and applications in human-computer interaction. Current approaches extract several standard global values from a temporal sequence of power spectra, such as pitch, formants, energy, and attack and decay rates. The values from each spectrum are combined into a single feature vector, and the complete set of feature vectors is then used to classify the emotion of a test utterance. In the new approach presented here, the frequency dimension of the spectrogram is quantized to simulate the Bark filters of the human auditory system. The linear regression coefficients of the surface of each spectrogram segment are combined into a feature vector. In this way, a large sample of feature vectors is extracted from the training samples for each category of emotion, providing a robust basis for a statistical classifier. Several classifiers are evaluated to achieve the best performance on test utterances, using 10-fold cross-validation to ensure statistical significance. Preliminary results on three databases show significant improvements over results from other methods.
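The feature-extraction pipeline described in the abstract (power spectrogram, Bark-scale frequency quantization, per-segment surface regression) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the Hann window, Traunmüller's Bark approximation, the number of bands (18), and the segment length (10 frames) are all assumptions made here for concreteness, since the chapter text does not specify them.

```python
import numpy as np

def stft_power(signal, sr, win_len=512, hop=256):
    """Power spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, 1.0 / sr)
    return power, freqs

def bark_band_edges(sr, n_bands=18):
    """Band edges equally spaced on the Bark scale (Traunmüller's formula)."""
    def hz_to_bark(f):
        return 26.81 * f / (1960.0 + f) - 0.53
    barks = np.linspace(0.0, hz_to_bark(sr / 2), n_bands + 1)
    return 1960.0 * (barks + 0.53) / (26.28 - barks)  # inverse mapping to Hz

def bark_quantize(power, freqs, sr, n_bands=18):
    """Average FFT bins into Bark-like bands along the frequency axis."""
    edges = bark_band_edges(sr, n_bands)
    out = np.zeros((power.shape[0], n_bands))
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if mask.any():
            out[:, b] = power[:, mask].mean(axis=1)
    return out

def segment_regression_features(bark_spec, seg_len=10):
    """Fit a plane z = a*t + b*f + c to each log-spectrogram segment surface;
    the coefficients (a, b, c) form one feature vector per segment."""
    feats = []
    for start in range(0, bark_spec.shape[0] - seg_len + 1, seg_len):
        seg = np.log(bark_spec[start:start + seg_len] + 1e-10)
        t, f = np.meshgrid(np.arange(seg.shape[0]),
                           np.arange(seg.shape[1]), indexing="ij")
        A = np.column_stack([t.ravel(), f.ravel(), np.ones(t.size)])
        coef, *_ = np.linalg.lstsq(A, seg.ravel(), rcond=None)
        feats.append(coef)
    return np.array(feats)
```

Pooling the per-segment feature vectors from all training utterances of one emotion category yields the large feature sample that the abstract says makes the statistical classifier robust; any standard classifier can then be evaluated on these vectors under 10-fold cross-validation.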

The Short Time Fourier Transform
The Feature Extraction Method
The Experiment Design
Conclusions and Future Work