- Title
- Enhanced embeddings in zero-shot learning for environmental audio
- Creator
- Sims, Ysobel; Mendes, Alexandre; Chalup, Stephan
- Relation
- ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (Rhodes Island, Greece, 04-10 June 2023)
- Publisher Link
- http://dx.doi.org/10.1109/ICASSP49357.2023.10096134
- Publisher
- Institute of Electrical and Electronics Engineers (IEEE)
- Resource Type
- conference paper
- Date
- 2023
- Description
- Zero-shot learning is a scenario in machine learning where the classes used in the training and test sets are disjoint. This work considers zero-shot learning for environmental audio and improves results by enhancing audio and word embeddings. Previous works use the VGGish model for audio embeddings, and textual class labels are often used as input for word embedding networks such as Word2Vec. This study instead uses a modified YAMNet network to obtain semantic audio embeddings for zero-shot learning. Moreover, part of this study involves adding linguistic devices, such as synonyms, semantic broadening and onomatopoeia, to the input of the word embeddings. With these two modifications, top-1 accuracy is increased on average by over five percentage points compared to the state-of-the-art on ESC-50. This emerging area of research has applications in robot awareness, security systems and wildlife conservation in situations where no data is available for some classes.
- Subject
- environmental audio; zero-shot learning; machine learning; word embeddings; audio embeddings
- Identifier
- http://hdl.handle.net/1959.13/1501165
- Identifier
- uon:55089
- Identifier
- ISBN:9781728163277
- Language
- eng
- Reviewed