CUED Publications database

Automatic transcription of conversational telephone speech

Hain, T and Woodland, PC and Evermann, G and Gales, MJF and Liu, X and Moore, GL and Povey, D and Wang, L (2005) Automatic transcription of conversational telephone speech. IEEE Transactions on Speech and Audio Processing, 13. pp. 1173-1185. ISSN 1063-6676


Abstract

This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
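The abstract names conversation-side-based cepstral normalization without giving details. The sketch below is a minimal illustration that assumes the technique means cepstral mean and variance normalization. It also assumes the statistics are pooled over every frame of one conversation side, meaning one speaker's channel of a call, rather than computed per segment. The function name `side_based_cmvn` and the list-of-frames data layout are hypothetical. This is not the CU-HTK implementation.

```python
import math
from typing import List

Frame = List[float]  # one cepstral feature vector (hypothetical layout)


def side_based_cmvn(segments: List[List[Frame]], eps: float = 1e-8) -> List[List[Frame]]:
    """Mean/variance-normalize cepstra using statistics pooled over a whole
    conversation side (assumption: all `segments` come from the same side).

    Each segment is a list of frames; each frame is a list of coefficients.
    The segment structure is preserved in the output.
    """
    # Pool every frame from every segment of this conversation side.
    frames = [f for seg in segments for f in seg]
    if not frames:
        raise ValueError("conversation side contains no frames")
    n = len(frames)
    dim = len(frames[0])

    # Per-dimension mean and (population) standard deviation over the side.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    std = [math.sqrt(v) for v in var]

    # Apply the same side-level statistics to every segment;
    # eps guards against division by zero for constant dimensions.
    return [
        [[(f[d] - mean[d]) / (std[d] + eps) for d in range(dim)] for f in seg]
        for seg in segments
    ]
```

Because the statistics are pooled over the whole side, a short segment is normalized with the side's mean and variance rather than its own. Normalizing a one-frame segment with its own statistics would instead collapse every coefficient to zero.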

Item Type: Article
Uncontrolled Keywords: Large-vocabulary conversational speech recognition; Telephone speech recognition
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 07 Mar 2014 11:43
Last Modified: 08 Dec 2014 02:38
DOI: 10.1109/TSA.2005.852999