Training Utterance-level Embedding Networks for Speaker Identification and Verification

Interspeech (2018)


Encoding speaker-specific characteristics from speech signals into fixed-length vectors is a key component of speaker identification and verification systems. This paper presents a deep neural network architecture for speaker embedding models in which the similarity between embedded utterance vectors explicitly approximates the similarity between the vocal patterns of speakers. The proposed architecture contains an additional speaker embedding lookup table, which is used to compute a loss based on embedding similarities. Furthermore, we propose a new feature sampling method for data augmentation. Experiments on two databases demonstrate that our model is more effective at speaker identification and verification than a fully connected classifier and an end-to-end verification model.
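
To make the idea of a loss computed from embedding similarities concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the similarity measure or the form of the loss, so cosine similarity, the softmax cross-entropy, the `scale` factor, and the names `cosine` and `similarity_loss` are all assumptions chosen for illustration. Each row of `speaker_table` plays the role of one entry in the speaker embedding lookup table.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_loss(utt_emb, speaker_table, speaker_id, scale=10.0):
    """Cross-entropy over scaled similarities to every speaker embedding.

    Hypothetical loss: the abstract only states that the loss is based on
    embedding similarities; the softmax form and `scale` are assumptions.
    """
    logits = [scale * cosine(utt_emb, spk) for spk in speaker_table]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_z - logits[speaker_id]


# Toy lookup table with three speakers and 3-dimensional embeddings.
speaker_table = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
utterance = [0.9, 0.1, 0.0]  # utterance embedding close to speaker 0

loss_correct = similarity_loss(utterance, speaker_table, 0)
loss_wrong = similarity_loss(utterance, speaker_table, 1)
```

Under these assumptions the loss is small when the utterance embedding lies close to its own speaker's table entry and large otherwise, so minimizing it pulls utterances toward the embedding of the correct speaker and away from the others.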


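Heewoong Park (Seoul National University), Sukhyun Cho (Seoul National University), Kyubyong Park (Kakao Brain), Namju Kim (Kakao Brain), Jonghun Park (Seoul National University)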


