CUED Publications database

Statistical parametric speech synthesis based on speaker and language factorization

Zen, H and Braunschweiler, N and Buchholz, S and Gales, MJF and Knill, K and Krstulović, S and Latorre, J (2012) Statistical parametric speech synthesis based on speaker and language factorization. IEEE Transactions on Audio, Speech and Language Processing, 20. pp. 1713-1724. ISSN 1558-7916

Full text not available from this repository.

Abstract

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to train a synthesis system, to synthesize speech in one language using speaker characteristics estimated in a different language, and to adapt to a new language. © 2012 IEEE.
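As a rough illustration of the two transform types named in the abstract, the sketch below computes a language-dependent Gaussian mean by cluster mean interpolation and scores an observation under a speaker CMLLR transform. It is not the authors' implementation. The function names, the dict lookup standing in for each cluster's decision tree, and the diagonal-covariance Gaussian are all assumptions made for illustration. The CMLLR score uses the standard feature-space form, log N(Ao + b; mu, Sigma) + log|det A|.

```python
import math
from typing import List, Mapping, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cat_mean(context: str,
             cluster_trees: Sequence[Mapping[str, Vector]],
             language_weights: Sequence[float]) -> Vector:
    """Language-dependent mean as a weighted sum of per-cluster means.

    Hypothetical stand-in: each mapping plays the role of one
    cluster-dependent decision tree, returning the leaf mean for a context.
    The language_weights vector acts as the language "transform".
    """
    if len(cluster_trees) != len(language_weights):
        raise ValueError("one interpolation weight per cluster is required")
    dim = len(cluster_trees[0][context])
    mean = [0.0] * dim
    for tree, weight in zip(cluster_trees, language_weights):
        leaf_mean = tree[context]
        for d in range(dim):
            mean[d] += weight * leaf_mean[d]
    return mean


def log_abs_det(a: Matrix) -> float:
    """log|det(A)| via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]  # work on a copy
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("CMLLR transform must be non-singular")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total


def cmllr_log_likelihood(obs: Vector, a: Matrix, b: Vector,
                         mean: Vector, var: Vector) -> float:
    """Speaker-adapted score: log N(A o + b; mean, diag(var)) + log|det A|."""
    transformed = [sum(a_ij * o_j for a_ij, o_j in zip(row, obs)) + b_i
                   for row, b_i in zip(a, b)]
    ll = log_abs_det(a)  # Jacobian term of the feature transform
    for x, mu, v in zip(transformed, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return ll
```

For example, with two hypothetical clusters `[{"a": [1.0, 0.0]}, {"a": [0.5, 0.5]}]` and language weights `[1.0, 2.0]`, `cat_mean("a", ...)` gives `[2.0, 1.0]`; that mean can then be passed to `cmllr_log_likelihood` together with a speaker's `A` and `b`. Keeping the language weights and the speaker transform as separate parameters is what lets one be swapped while the other is held fixed, in the spirit of the factorization the abstract describes.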

Item Type: Article
Uncontrolled Keywords: Hidden Markov models (HMMs); Speaker and language factorization; Statistical parametric speech synthesis
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 07 Mar 2014 11:21
Last Modified: 12 Dec 2014 19:04
DOI: 10.1109/TASL.2012.2187195