CUED Publications database

Multi-task ensembles with teacher-student training

Wong, JHM and Gales, MJF (2017) Multi-task ensembles with teacher-student training. In: UNSPECIFIED pp. 84-90.

Full text not available from this repository.


Ensemble methods often yield significant gains for automatic speech recognition. One method to obtain a diverse ensemble is to separately train models with a range of context-dependent targets, often implemented as state clusters. However, decoding the complete ensemble can be computationally expensive. To reduce this cost, the ensemble can be generated using a multi-task architecture. Here, the hidden layers are shared across all members of the ensemble, leaving only separate output layers for each set of targets. Previous investigations of this form of ensemble have used cross-entropy training, which is shown in this paper to produce only limited diversity between members of the ensemble. This paper extends the multi-task framework in several ways. First, the multi-task ensemble can be trained in a teacher-student fashion toward the ensemble of separate models, with the aim of increasing diversity. Second, the multi-task ensemble can be trained with a sequence discriminative criterion. Finally, a student model, with a single output layer, can be trained to emulate the combined ensemble, to further reduce the computational cost of decoding. These methods are evaluated on the Babel conversational telephone speech, AMI meeting transcription, and HUB4 English broadcast news tasks.
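The shared-trunk architecture and the teacher-student criterion described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the class and function names (`MultiTaskEnsemble`, `teacher_student_loss`), the single shared hidden layer, and the use of a frame-level KL divergence toward each teacher's posteriors are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MultiTaskEnsemble:
    """Hidden layers shared across ensemble members; one output
    layer (head) per set of context-dependent state-cluster targets.
    A hypothetical sketch, not the paper's exact model."""
    def __init__(self, dim_in, dim_hidden, head_sizes):
        self.W_h = rng.standard_normal((dim_in, dim_hidden)) * 0.1
        self.heads = [rng.standard_normal((dim_hidden, n)) * 0.1
                      for n in head_sizes]

    def forward(self, x):
        h = np.tanh(x @ self.W_h)                    # shared trunk
        return [softmax(h @ W) for W in self.heads]  # one posterior per head

def teacher_student_loss(student_posts, teacher_posts):
    """Frame-level KL(teacher || student), averaged over ensemble members.
    Each teacher is a separately trained model with matching targets."""
    kl = 0.0
    for p_s, p_t in zip(student_posts, teacher_posts):
        kl += np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    return kl / len(teacher_posts)
```

For example, a multi-task student with three heads would be trained so that each head's posteriors match the corresponding separately trained teacher, rather than only matching hard cross-entropy targets; minimising this KL toward diverse teachers is what the abstract proposes to recover the diversity lost under plain cross-entropy training.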

Item Type: Conference or Workshop Item (UNSPECIFIED)
Uncontrolled Keywords: Multi-task, teacher-student, random forest, ensemble, speech recognition
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 28 Mar 2018 20:09
Last Modified: 10 Apr 2021 22:34
DOI: 10.1109/ASRU.2017.8268920