Speaker Recognition

Research directions in automatic speaker recognition (verification, identification):

1) Text-independent speaker recognition. Small-, medium- and large-scale voice biometric systems.

Nationwide and international voice biometric systems. A biometric engine with high performance and throughput. Decision making based on additional metadata (signal SNR, reverberation time, speech length, channel type, etc.). Handles thousands to millions of speaker templates. Works with various channels, including telephone (analog, digital, landline, cell, GSM, CDMA, etc.), radio and microphone (room acoustics).

2) Text-dependent speaker recognition. Voice keys with fixed and prompted phrases. Works with various channels, including telephone (analog, digital, landline, cell, GSM, CDMA, etc.) and microphone (room acoustics).

3) Automatic recognition of speaker language, gender, emotion and age.

4) Multimodal (voice and face) recognition. Feature-level, score-level and decision-level fusion.

Search with high performance and throughput.

5) Audio and speech enhancement and restoration. Music detection and characterization, ambient sound detection and characterization.
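
As an illustration of the score-level fusion named in item 4, the following is a minimal sketch, not the method used in this research. It assumes that each modality (voice, face) already produces a raw match score, that a set of impostor (cohort) scores is available for normalization, and that a weighted sum is an acceptable fusion rule. The function names, the weight and the threshold are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(score, cohort_scores):
    """Map a raw score to a z-score using impostor (cohort) scores
    from the same modality, so that voice and face scores become comparable."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        # Degenerate cohort: all scores identical, no scale to normalize by.
        return 0.0
    return (score - mu) / sigma


def fuse_scores(voice_score, face_score, voice_weight=0.5):
    """Weighted-sum fusion of two normalized scores.
    voice_weight is an assumed tuning parameter in [0, 1]."""
    if not 0.0 <= voice_weight <= 1.0:
        raise ValueError("voice_weight must be between 0 and 1")
    return voice_weight * voice_score + (1.0 - voice_weight) * face_score


def accept(fused_score, threshold=0.0):
    """Accept the identity claim if the fused score reaches the threshold.
    The threshold value here is a placeholder, not a recommended setting."""
    return fused_score >= threshold


if __name__ == "__main__":
    # Hypothetical raw scores and cohorts, for illustration only.
    voice = z_normalize(3.0, [1.0, 2.0, 3.0])
    face = z_normalize(0.9, [0.2, 0.4, 0.6])
    fused = fuse_scores(voice, face, voice_weight=0.6)
    print(fused, accept(fused, threshold=1.0))
```

Feature-level fusion would instead combine the voice and face feature vectors before scoring, and decision-level fusion would combine the separate accept/reject decisions of each modality; the sketch covers only the score-level case.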

Speech/data materials used in this research:

1) Telephone records (8 kHz, 16 bit) with thousands of speakers. The database is constantly updated and speaker segmentation has been manually checked.

2) Microphone records (11 kHz, 16 bit) with thousands of speakers. The database is constantly updated and speaker segmentation has been manually checked.