- Title
- End-to-end speech emotion recognition based on time and frequency information using deep neural networks
- Creator
- Bakhshi, Ali; Wong, Aaron S. W.; Chalup, Stephan
- Relation
- ECAI 2020 24th European Conference on Artificial Intelligence. Proceedings of the ECAI 2020 24th European Conference on Artificial Intelligence (Santiago de Compostela, Spain 29 August, 2020 - 08 September, 2020) p. 969-975
- Relation
- ARC.LE170100032 http://purl.org/au-research/grants/arc/LE170100032
- Publisher Link
- http://dx.doi.org/10.3233/FAIA200190
- Publisher
- IOS Press
- Resource Type
- conference paper
- Date
- 2020
- Description
- We propose a speech emotion recognition system based on deep neural networks that operates on raw speech data in an end-to-end manner to predict continuous emotions in arousal-valence space. The model is trained on time and frequency information from speech recordings in the publicly available part of the multi-modal RECOLA database. We use the Concordance Correlation Coefficient (CCC), as proposed by the Audio-Visual Emotion Challenges, to measure the similarity between the network prediction and the gold standard. The CCC results of our model outperform those achieved by other state-of-the-art end-to-end models. The innovative aspect of our study is an end-to-end approach to data that was previously handled mostly by approaches combining pre-processing or post-processing. Our study used only a small subset of the RECOLA dataset and obtained better results than previous studies that used the full dataset.
- Subject
- deep neural networks; end-to-end manner; speech recordings; pre-processing; post-processing
- Identifier
- http://hdl.handle.net/1959.13/1471336
- Identifier
- uon:48653
- Identifier
- ISBN:9781643681009
- Language
- eng
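The description names the Concordance Correlation Coefficient (CCC) as the measure of agreement between the network prediction and the gold standard. As a rough illustration, the sketch below computes the standard CCC definition, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with NumPy. The function name `concordance_cc` and the use of population (biased) variance are assumptions made for this example; the record does not state how the authors implemented the metric.

```python
import numpy as np


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two 1-D sequences.

    Hypothetical helper for illustration only. Uses population variance
    and covariance (divide by N), which is an assumption of this sketch.
    The result is undefined (NaN) when both inputs are constant and equal.
    """
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)

    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()  # ddof=0, i.e. population variance
    cov = np.mean((pred - mean_p) * (gold - mean_g))

    # Unlike Pearson correlation, the squared mean difference in the
    # denominator penalises a constant offset between the two series.
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)
```

For example, `concordance_cc([1, 2, 3], [1, 2, 3])` gives 1.0, while shifting the second series by a constant, as in `concordance_cc([1, 2, 3], [2, 3, 4])`, lowers the score to 4/7 even though the two series remain perfectly correlated.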