End-to-end speech emotion recognition based on time and frequency information using deep neural networks

Bakhshi, Ali; Wong, Aaron S. W.; Chalup, Stephan

Relation: ECAI 2020 24th European Conference on Artificial Intelligence. Proceedings of the ECAI 2020 24th European Conference on Artificial Intelligence (Santiago de Compostela, Spain 29 August, 2020 - 08 September, 2020) p. 969-975
Relation: ARC.LE170100032 http://purl.org/au-research/grants/arc/LE170100032
Publisher Link: http://dx.doi.org/10.3233/FAIA200190
Publisher: IOS Press
Resource Type: conference paper
Date: 2020
Description: We propose a speech emotion recognition system based on deep neural networks that operates on raw speech data in an end-to-end manner to predict continuous emotions in arousal-valence space. The model is trained on time and frequency information from speech recordings in the publicly available part of the multi-modal RECOLA database. Following the Audio-Visual Emotion Challenges, we use the Concordance Correlation Coefficient (CCC) to measure the agreement between the network's predictions and the gold standard. Our model achieves higher CCC scores than other state-of-the-art end-to-end models. The novel aspect of our study is that it applies an end-to-end approach to data that was previously handled mostly by approaches relying on pre-processing or post-processing steps. Although our study used only a small subset of the RECOLA dataset, it obtained better results than previous studies that used the full dataset.
Subject: deep neural networks; end-to-end manner; speech recordings; pre-processing; post-processing
Identifier: http://hdl.handle.net/1959.13/1471336
Identifier: uon:48653
Identifier: ISBN:9781643681009
Language: eng
Reviewed
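The evaluation metric named in the description, the Concordance Correlation Coefficient, can be illustrated with a minimal sketch. It assumes the standard definition by Lin, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), computed with population statistics. The function name `concordance_correlation` is hypothetical, and this is not the authors' code.

```python
from typing import Sequence


def concordance_correlation(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumption: population (divide-by-n) variances and covariance are used.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Population variances of each sequence and their covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    # The squared mean difference penalises a constant offset,
    # which plain Pearson correlation would ignore.
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2 * cov / denom
```

Unlike Pearson correlation, the value drops below 1 when a prediction tracks the gold standard perfectly in shape but is shifted or rescaled, which is why it suits continuous arousal and valence traces.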