Enhanced embeddings in zero-shot learning for environmental audio

Sims, Ysobel; Mendes, Alexandre; Chalup, Stephan

Relation: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (Rhodes Island, Greece, 4-10 June 2023)
Publisher Link: http://dx.doi.org/10.1109/ICASSP49357.2023.10096134
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Resource Type: conference paper
Date: 2023
Description: Zero-shot learning is a machine learning scenario in which the classes in the training and test sets are disjoint. This work considers zero-shot learning for environmental audio and improves results by enhancing both the audio embeddings and the word embeddings. Previous works use the VGGish model for audio embeddings and often feed textual class labels into word embedding networks such as Word2Vec. This study instead uses a modified YAMNet network to obtain semantic audio embeddings for zero-shot learning. In addition, it adds linguistic devices, such as synonyms, semantic broadening and onomatopoeia, to the input of the word embeddings. With these two modifications, top-1 accuracy on ESC-50 increases on average by over five percentage points compared to the state of the art. This emerging area of research has applications in robot awareness, security systems and wildlife conservation, in situations where no data is available for some classes.
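The record does not describe the classifier itself. The sketch below illustrates the general idea under stated assumptions, and it is not the authors' implementation. It assumes a common zero-shot setup in which a linear map W, learned on the seen classes, projects an audio embedding into word-embedding space. The predicted label is then the unseen class whose label embedding is most cosine-similar to the projection. It also shows one plausible way to "enrich" a class embedding: average the vector for the label with vectors for added terms (synonyms, broader terms, onomatopoeia). Every vector, term list and the matrix W are toy, hypothetical values, not real YAMNet or Word2Vec outputs.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enriched_class_embedding(terms: Sequence[str], word_vectors: Dict[str, Vector]) -> Vector:
    """Hypothetical enrichment: average the vectors of a class label and its
    added terms. Terms missing from the vocabulary are skipped."""
    known = [word_vectors[t] for t in terms if t in word_vectors]
    if not known:
        raise ValueError("none of the terms are in the vocabulary")
    return mean_vector(known)


def project(audio_embedding: Vector, W: List[Vector]) -> Vector:
    """Assumed linear map from audio-embedding space to word space: W @ x."""
    return [sum(w * x for w, x in zip(row, audio_embedding)) for row in W]


def zero_shot_predict(audio_embedding: Vector, W: List[Vector],
                      class_embeddings: Dict[str, Vector]) -> str:
    """Pick the unseen class whose embedding is closest (cosine) to the projection."""
    z = project(audio_embedding, W)
    return max(class_embeddings, key=lambda c: cosine(z, class_embeddings[c]))


# Toy, hypothetical 3-d "word vectors" for two unseen classes and their added terms.
word_vectors = {
    "dog": [1.0, 0.0, 0.0],
    "bark": [0.8, 0.2, 0.0],
    "woof": [0.9, 0.0, 0.1],           # onomatopoeia
    "rain": [0.0, 1.0, 0.0],
    "drizzle": [0.0, 0.9, 0.1],        # synonym-like term
    "pitter-patter": [0.1, 0.8, 0.0],  # onomatopoeia
}
class_terms = {"dog": ["dog", "bark", "woof"], "rain": ["rain", "drizzle", "pitter-patter"]}
class_embeddings = {c: enriched_class_embedding(t, word_vectors) for c, t in class_terms.items()}

# Hypothetical W mapping a 2-d audio embedding into the 3-d word space.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

print(zero_shot_predict([0.9, 0.1], W, class_embeddings))  # dog
```

Averaging is only one way to combine the extra terms. Because it keeps the class embedding in the same space as the individual word vectors, no change to the projection or the similarity step is needed.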
Subject: environmental audio; zero-shot learning; machine learning; word embeddings; audio embeddings
Identifier: http://hdl.handle.net/1959.13/1501165
Identifier: uon:55089
Identifier: ISBN:9781728163277
Language: eng
Reviewed
