In this paper, we investigate the use of formant and anti-formant measurements of nasal consonants for speaker verification. The features are obtained from a pole-zero vocal tract model estimate, optimized by minimizing a logarithmic criterion motivated by the perception of amplitude by the human auditory system. A GMM-UBM approach is used to perform speaker comparisons within the likelihood-ratio framework. Results are compared with systems based on Mel-frequency cepstral coefficients (MFCCs), as well as on formant center frequencies and bandwidths obtained using the Snack Toolkit. The formant and anti-formant based system attains results comparable to the MFCC system and outperforms the formant-based approach, while offering a more straightforward interpretation in terms of a physical speech production model.
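As an illustration of the GMM-UBM likelihood-ratio scoring mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes diagonal-covariance Gaussian mixtures, means-only MAP adaptation of the speaker model from the UBM, and a relevance factor of 16; none of these choices is stated in the abstract. All function names, the toy two-component UBM, and the synthetic two-dimensional feature vectors (standing in for formant and anti-formant measurements) are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each frame under each diagonal Gaussian; shape (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (T, K, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)              # (K,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + maha)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a GMM; shape (T,)."""
    log_joint = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return logsumexp(log_joint, axis=1)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Means-only MAP adaptation of a UBM towards enrollment data X.
    The relevance factor is an assumed value, not taken from the paper."""
    log_joint = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    post = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])  # (T, K)
    n = post.sum(axis=0)                                     # soft counts (K,)
    first_moment = post.T @ X / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]
    return alpha * first_moment + (1.0 - alpha) * means

def llr_score(X_test, ubm, speaker_means):
    """Average per-frame log-likelihood ratio: speaker model vs. UBM."""
    weights, means, variances = ubm
    speaker_ll = gmm_frame_loglik(X_test, weights, speaker_means, variances)
    ubm_ll = gmm_frame_loglik(X_test, weights, means, variances)
    return float(np.mean(speaker_ll - ubm_ll))

# Toy demonstration with synthetic data (hypothetical values throughout).
rng = np.random.default_rng(0)
ubm = (np.array([0.5, 0.5]),
       np.array([[-1.0, 0.0], [1.0, 0.0]]),
       np.ones((2, 2)))
enroll = rng.normal(loc=[1.5, 0.5], scale=0.5, size=(200, 2))
target_test = rng.normal(loc=[1.5, 0.5], scale=0.5, size=(200, 2))
impostor_test = rng.normal(loc=[-1.5, -0.5], scale=0.5, size=(200, 2))

speaker_means = map_adapt_means(enroll, *ubm)
target_score = llr_score(target_test, ubm, speaker_means)
impostor_score = llr_score(impostor_test, ubm, speaker_means)
print("target score:", target_score)
print("impostor score:", impostor_score)
```

In this sketch, a positive score favors the same-speaker hypothesis and a negative score favors the different-speaker hypothesis. In a real system the UBM would be trained on a background population and the scores would typically be calibrated before being interpreted as likelihood ratios.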
Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2011), Prague, 22-27 May 2011, pp. 4820-4823.