Liu, X and Gales, MJF and Hieronymus, JL and Woodland, PC (2011) Investigation of acoustic units for LVCSR systems. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. pp. 4872-4875. ISSN 1520-6149
Full text not available from this repository.
One important issue in designing state-of-the-art LVCSR systems is the choice of acoustic units. Context-dependent (CD) phones remain the dominant form of acoustic unit. They capture co-articulatory effects in speech via explicit modelling. For other, more complicated phonological processes, however, they rely on the implicit modelling ability of the underlying statistical models. Alternatively, acoustic models can be built on higher-level linguistic units, such as syllables, to capture these complex patterns explicitly. When sufficient training data is available, this approach may show an advantage over implicit acoustic modelling. In this paper a wide range of acoustic units is investigated to improve LVCSR system performance. Significant error rate reductions of up to 7.1% relative (0.8% absolute) were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using word- and syllable-position-dependent triphone and quinphone models. © 2011 IEEE.
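To make the idea of word- and syllable-position-dependent triphones concrete, here is a minimal Python sketch of how such unit labels might be derived from a syllabified pronunciation. Everything in it is an illustrative assumption rather than the paper's method: the tag set (`B`, `I`, `E`, `S`), the `left-phone+right/w?/s?` label format, the use of `sil` as context at word edges, and the function names `position_tags` and `position_dependent_triphones` are all hypothetical.

```python
from typing import List


def position_tags(n: int) -> List[str]:
    """Return one position tag per phone for a unit of n phones.

    Hypothetical tag set (not taken from the paper):
    B = first phone, I = internal, E = final, S = single-phone unit.
    """
    if n <= 0:
        return []
    if n == 1:
        return ["S"]
    return ["B"] + ["I"] * (n - 2) + ["E"]


def position_dependent_triphones(word: List[List[str]]) -> List[str]:
    """Label each phone of a word with its context and positions.

    `word` is a list of syllables, each a list of phone symbols.
    Assumption: context stops at the word boundary, where "sil" is used.
    """
    phones = [p for syl in word for p in syl]
    # Position of each phone inside its own syllable.
    syl_tags = [t for syl in word for t in position_tags(len(syl))]
    # Position of each phone inside the whole word.
    word_tags = position_tags(len(phones))

    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else "sil"
        right = phones[i + 1] if i + 1 < len(phones) else "sil"
        labels.append(f"{left}-{phone}+{right}/w{word_tags[i]}/s{syl_tags[i]}")
    return labels


if __name__ == "__main__":
    # Toneless, illustrative phone symbols for a two-syllable word.
    for label in position_dependent_triphones([["b", "ei"], ["j", "i", "ng"]]):
        print(label)
```

In this sketch the same phone in the same left and right context still maps to different units when its position in the syllable or word differs, which is the sense in which position information is modelled explicitly. A quinphone variant would widen the context to two phones on each side.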
|Divisions:||Div F > Machine Intelligence|
|Depositing User:||Unnamed user with email email@example.com|
|Date Deposited:||09 Dec 2016 17:20|
|Last Modified:||01 May 2017 00:42|