Speech Recognition

Research directions in automatic speech recognition (ASR):

1) Large vocabulary continuous speech recognition (LVCSR). Very large vocabularies (over 500,000 words) and large language models (hundreds of millions of N-grams) for real-time, non-cloud applications. WFST decoders, on-the-fly (online) graph composition, class-based language models.

2) Training of acoustic models and new model topologies: TANDEM, TRAP, deep belief networks, discriminative training, speaker/condition adaptive training.

3) Keyword search. Current work focuses on full-text indexing of speech material, followed by word-lattice-based spoken-document indexing. The vocabulary contains hundreds of thousands of words.
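
To illustrate the word-lattice-based indexing mentioned in item 3, here is a minimal sketch of an inverted index built from lattice arcs. It is an assumption-laden illustration, not the group's actual system: the `Arc` and `Hit` types, the `build_index` and `search` functions, the document IDs, and the posterior threshold are all hypothetical, and a lattice is simplified to a flat list of arcs with precomputed posteriors.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One word hypothesis in a lattice (hypothetical, simplified form)."""
    word: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # arc posterior probability in [0, 1]


class Hit(NamedTuple):
    """One indexed occurrence of a word in a recording."""
    doc_id: str
    start: float
    end: float
    posterior: float


def build_index(lattices):
    """Map each word to every lattice arc where it was hypothesized."""
    index = defaultdict(list)
    for doc_id, arcs in lattices.items():
        for arc in arcs:
            index[arc.word].append(Hit(doc_id, arc.start, arc.end, arc.posterior))
    return index


def search(index, keyword, threshold=0.5):
    """Return hits for keyword with posterior >= threshold, best first."""
    hits = [h for h in index.get(keyword, []) if h.posterior >= threshold]
    return sorted(hits, key=lambda h: h.posterior, reverse=True)


# Toy lattices: competing hypotheses ("hello" vs. "yellow") share a time span.
lattices = {
    "call_001": [
        Arc("hello", 0.0, 0.4, 0.9),
        Arc("yellow", 0.0, 0.4, 0.1),
        Arc("world", 0.4, 0.9, 0.8),
    ],
    "news_007": [Arc("world", 1.2, 1.6, 0.6)],
}
index = build_index(lattices)
hits = search(index, "world")
```

Unlike indexing only the single best transcript, indexing all lattice arcs keeps lower-ranked hypotheses searchable, so lowering `threshold` trades precision for recall.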

Speech data/materials used in this research:

1) Hundreds of hours of telephone recordings (8 kHz, 16-bit) of call center operators' speech. The database is constantly updated and includes manually verified transcripts.

2) Over 5,000 hours of radio/TV speech recordings (16 kHz, 16-bit) from 1TV, Radio Liberty, Russia TV, etc. Includes text transcripts, some of them segmented by speaker.

3) A 1.5-million-word database of text materials for training language models (from Internet news and digital libraries).
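
As a rough illustration of how text materials feed N-gram language model training, the sketch below counts N-grams in a handful of sentences and derives unsmoothed maximum-likelihood probabilities. The function names (`ngrams`, `count_ngrams`, `mle_probs`), the whitespace tokenization, and the toy sentences are assumptions made for the example; production systems typically apply smoothing (for example Kneser-Ney) and pruning, which are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of one sentence, padded with <s> and </s>."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def count_ngrams(sentences, n):
    """Count n-grams over sentences (naive lowercase + whitespace split)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def mle_probs(counts):
    """Unsmoothed estimate: P(w | history) = count(history, w) / count(history, *)."""
    history_totals = Counter()
    for ngram, c in counts.items():
        history_totals[ngram[:-1]] += c
    return {ngram: c / history_totals[ngram[:-1]] for ngram, c in counts.items()}


# Hypothetical two-sentence corpus standing in for news text.
sentences = ["the news today", "the news tonight"]
bigram_counts = count_ngrams(sentences, 2)
bigram_probs = mle_probs(bigram_counts)
```

Because the estimate is unsmoothed, any N-gram absent from the training text gets no probability at all, which is one reason larger text collections and smoothing matter for very large vocabularies.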