CUED Publications database

Improved DNN-based segmentation for multi-genre broadcast audio

Wang, L and Zhang, C and Woodland, PC and Gales, MJF and Karanasou, P and Lanchantin, P and Liu, X and Qian, Y (2016) Improved DNN-based segmentation for multi-genre broadcast audio. In: UNSPECIFIED pp. 5700-5704.

Full text not available from this repository.


© 2016 IEEE. Automatic segmentation is a crucial first step in processing multi-genre broadcast (MGB) audio. It is very challenging because the data exhibits a wide range of both speech types and background conditions, with many types of non-speech audio. This paper describes a segmentation system for MGB audio with deep neural network (DNN) based speech/non-speech detection. A further stage of change-point detection and clustering is used to obtain homogeneous segments. Suitable DNN inputs, context window sizes and architectures are studied in a series of experiments using a large corpus of MGB television audio. For MGB transcription, the increase in word error rate over manual segmentation is roughly halved with the improved segmenter, compared to the baseline DNN segmenter supplied for the 2015 ASRU MGB challenge.
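
As an illustration only of the speech/non-speech detection stage mentioned in the abstract, the Python sketch below shows one common way to turn per-frame speech posteriors (such as a DNN might output) into speech segments: threshold each frame, merge runs separated by short gaps, and drop very short runs. This is not the paper's method. The function name `frames_to_segments`, the threshold, and the frame-count parameters are all hypothetical, and the change-point detection and clustering stage is not shown.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    speech_probs: Sequence[float],
    threshold: float = 0.5,       # assumed decision threshold, not from the paper
    max_gap_frames: int = 10,     # assumed: bridge non-speech gaps up to this length
    min_speech_frames: int = 20,  # assumed: discard speech runs shorter than this
) -> List[Tuple[int, int]]:
    """Return speech segments as (start_frame, end_frame) pairs, end exclusive."""
    # 1. Frame-level decision: speech if the posterior reaches the threshold.
    labels = [p >= threshold for p in speech_probs]

    # 2. Collect maximal runs of consecutive speech frames.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))

    # 3. Merge runs that are separated by a short non-speech gap.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap_frames:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # 4. Drop segments too short to be useful for transcription.
    return [(s, e) for s, e in merged if e - s >= min_speech_frames]


if __name__ == "__main__":
    # Synthetic posteriors: two speech bursts with a 3-frame dip, then a short blip.
    probs = [0.1] * 5 + [0.9] * 30 + [0.2] * 3 + [0.8] * 20 + [0.1] * 50 + [0.9] * 4 + [0.1] * 5
    print(frames_to_segments(probs))
```

With a 10 ms frame shift (also an assumption), multiplying the frame indices by 0.01 gives segment times in seconds.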

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:00
Last Modified: 22 May 2018 06:59