CUED Publications database

Improving multiple-crowd-sourced transcriptions using a speech recogniser

Van Dalen, RC and Knill, KM and Tsiakoulis, P and Gales, MJF (2015) Improving multiple-crowd-sourced transcriptions using a speech recogniser. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2015, 2015-4-19 to 2015-4-24, Brisbane, Australia, pp. 4709-4713.

Full text not available from this repository.


© 2015 IEEE. This paper introduces a method to produce high-quality transcriptions of speech data from only two crowd-sourced transcriptions. Crowd-sourced transcriptions, produced cheaply by people on the Internet, for example through Amazon Mechanical Turk, are often of low quality. Multiple crowd-sourced transcriptions are therefore commonly combined to form one transcription of higher quality. However, the state of the art is essentially a form of majority voting, which requires at least three transcriptions for each utterance. This paper shows how to refine this approach to work with only two transcriptions. It then introduces a method that uses a speech recogniser (bootstrapped on a simple combination scheme) to combine transcriptions. On a noisy data set where only two crowd-sourced transcriptions are available, this reduces the word error rate measured against gold-standard transcriptions by 21% relative.
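
As a rough illustration of two ideas in the abstract, the sketch below shows why plain majority voting cannot settle a disagreement between only two transcriptions, and how a relative word error rate reduction is computed. It is not the paper's method: the function names, the toy sentences, the example error rates, and the assumption that the transcriptions are already word-aligned to equal length are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(aligned_transcripts):
    """Pick the most common word in each slot; None marks an undecided slot.

    Assumption: the transcripts are already word-aligned to equal length.
    A real combination scheme has to perform that alignment first.
    """
    result = []
    for slot in zip(*aligned_transcripts):
        counts = Counter(slot).most_common()
        # A tie for first place means voting cannot decide this slot.
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append(None)
        else:
            result.append(counts[0][0])
    return result


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error rate that has been removed."""
    return (baseline_wer - new_wer) / baseline_wer


# Three transcriptions: the disputed middle word is outvoted.
three = [["the", "cat", "sat"], ["the", "cat", "sat"], ["the", "hat", "sat"]]
# Two transcriptions: the same dispute ends in a tie.
two = [["the", "cat", "sat"], ["the", "hat", "sat"]]

print(majority_vote(three))
print(majority_vote(two))
print(word_error_rate("the cat sat on the mat", "the hat sat on mat"))
# Hypothetical error rates, not figures from the paper.
print(relative_reduction(0.20, 0.15))
```

With two transcriptions, every disagreement leaves a slot undecided, so some further source of evidence is needed to break the tie; the abstract's proposal is to use a speech recogniser for this purpose.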

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:41
Last Modified: 22 May 2018 07:18