Alexey A. Karpov

Doctor of Science

Department of Speech Information Systems,
National Research University of Information Technologies, Mechanics and Optics (ITMO University)

Leading Researcher
Speech and Multimodal Interfaces Lab.
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS)


Education

  • Dr.Sc.Tech.: 10.2013 (SPIIRAS, Computer Science), thesis “Audio-visual speech interfaces in assistive information technologies”
  • Cand.Sc.Tech. (Ph.D.): 03.2007 (SPIIRAS, Computer Science), thesis “Models and software realization for Russian speech recognition based on morphemic analysis”
  • Engineer (M.Sc.): 02.2002 (SPbSUAI, Computer Science), graduation work “Development of the client/server architecture for object-oriented database management system”

Work experience

  • 2014 – Professor, Department of Speech Information Systems, ITMO University
  • 2014 – Leading Researcher, SPIIRAS, St. Petersburg, Russia
  • 2008-14 Senior Researcher, Speech and Multimodal Interfaces Lab., SPIIRAS
  • 2011-12 Senior Researcher, Applied Phonetics Lab., SPIIRAS
  • 2003-06 PhD student, SPIIRAS
  • 2002-07 Junior Researcher, Speech Informatics Group, SPIIRAS


Publications

  • More than 200 refereed journal papers, refereed conference papers, patents and certificates

Organizational duties

  • Organizing and Programme Committee member of the International Conference on Speech and Computer (SPECOM) series since 2002
  • General Chairman of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU-2014, St. Petersburg, ITMO)
  • Technical/Scientific Committee member of the INTERSPEECH, ICPR, ISCSLP, SPECOM and SLTU international conferences
  • Reviewer for international journals: IEEE/ACM Transactions on Audio, Speech and Language Processing; Speech Communication; Computer Speech & Language (Elsevier); Language Resources and Evaluation; Journal on Multimodal User Interfaces (Springer); International Journal of Engineering
  • Member of the Scientific Council of SPIIRAS, 2013–


Memberships

  • ISCA member (06–)
  • IAPR member (10–)
  • IEEE member (12–)
  • EURASIP member (05–)

Supervised PhDs

  • Alexander Ronzhin “Methods and software for automation of audio-visual monitoring of meeting participants in an intelligent room” (SPIIRAS, St. Petersburg, 2013)
  • Irina Kipyatkova “Methods and software for phonetic and language modeling in automatic Russian speech systems” (SPIIRAS, St. Petersburg, 2011)

Current PhD students

  • Anara Zhambaeva (Bilingual speech recognition for Cyrillic-based languages: Kazakh and Russian; ITMO University – Kazakhstan)

Current M.Sc. students

  • Dmitry Ryumin (Automated recognition of gestures using Kinect; ITMO University – Kazakhstan)

Major research projects

  • Development of software for a multimodal assistive technology for helping people with disabilities (grant of the President of Russia, 15-16, SPIIRAS)
  • Models and methods of audio-visual signal processing for bimodal Russian speech recognition (RFBR project, 15-17, SPIIRAS)
  • Mathematical methods and software for automatic analysis and recognition of conversational Russian speech and speaker diarization (state contract, 12-13, SPIIRAS)
  • Development of software for assistive multimodal living environment (state contract, 11-13, SPIIRAS)
  • Development of methods and models of automatic processing of speech signals in intelligent information-communication systems (state contract, 11-13, SPIIRAS)


Recent publications

  • Kipyatkova I.S., Ronzhin A.L., Karpov A.A. Automatic Processing of Conversational Russian Speech. – SPb.: SUAI, 2013. – 314 p. (In Rus.)

Journal papers:

  • Karpov A. An Automatic Multimodal Speech Recognition System with Audio and Video Information // Automation and Remote Control. 2014, Vol. 75, № 12, pp. 2190-2200.
  • Karpov A., Markov K., Kipyatkova I., Vazhenina D., Ronzhin A. Large vocabulary Russian speech recognition using syntactico-statistical language modeling // Speech Communication. 2014, Vol. 56, pp. 213-228.
  • Besacier L., Barnard E., Karpov A., Schultz T. Automatic speech recognition for under-resourced languages: A survey // Speech Communication. 2014, Vol. 56, pp. 85-100.
  • Kipyatkova I., Karpov A., Verkhodanova V., Zelezny M. Modeling of Pronunciation, Language and Nonverbal Units at Conversational Russian Speech Recognition // International Journal of Computer Science and Applications. 2013, Vol. 10, № 1, pp. 11-30.
  • Karpov A.A., Zelezny M. Bilingual multimodal system for text-to-audiovisual speech and sign language synthesis // Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2014, № 5, pp. 92-98 (In Rus.).
  • Karpov A.A. Assistive information technologies based on audio-visual speech interfaces // SPIIRAS Proceedings, 2013, Issue 27, pp. 114-128 (In Rus.).

Refereed conference papers:

  • Karpov A., Akarun L., Yalçın H., Ronzhin Al., Demiröz B., Çoban A., Zelezny M. Audio-Visual Signal Processing in a Multimodal Assisted Living Environment. Proc. 15th International Conference INTERSPEECH-2014, Singapore, 2014, pp. 1023-1027.
  • Karpov A., Ronzhin A. A Universal Assistive Technology with Multimodal Input and Multimedia Output Interfaces. Proc. 16th International Conference on Human-Computer Interaction, Heraklion, Greece, Springer LNCS 8513, 2014, pp. 369-378.
  • Karpov A., Kipyatkova I., Zelezny M. A Framework for Recording Audio-Visual Speech Corpora with a Microphone and a High-Speed Camera. Proc. 16th International Conference on Speech and Computer SPECOM-2014, Novi Sad, Serbia, Springer LNAI 8773, 2014, pp. 50-57.
  • Kipyatkova I., Karpov A. Study of Morphological Factors of Factored Language Models for Russian ASR. Proc. 16th International Conference on Speech and Computer SPECOM-2014, Novi Sad, Serbia, Springer LNAI 8773, 2014, pp. 451-458.
  • Kipyatkova I., Verkhodanova V., Karpov A. Rescoring N-Best Lists for Russian Speech Recognition using Factored Language Models. Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages SLTU-2014, St. Petersburg, Russia, 2014, pp. 81-86.
  • Karpov A., Krnoul Z., Zelezny M., Ronzhin A. Multimodal Synthesizer for Russian and Czech Sign Languages and Audio-Visual Speech. Proc. 15th International Conference on Human-Computer Interaction, Las Vegas, Nevada, USA, Springer LNCS 8009, 2013, pp. 520-529.