CUED Publications database

Generating multiple-accent pronunciations for TTS using joint sequence model interpolation

Kolluru, BK and Wan, V and Latorre, J and Yanagisawa, K and Gales, MJF (2014) Generating multiple-accent pronunciations for TTS using joint sequence model interpolation. In: UNSPECIFIED pp. 1273-1277.

Full text not available from this repository.


Copyright © 2014 ISCA. Standard grapheme-to-phoneme (G2P) systems are trained using a homogeneous lexicon, for example one associated with a particular accent. In practice, a synthesis system may be required to handle multiple accents. Furthermore, a speaker rarely has a pure accent; accents vary continuously within and between regions of a country. Generating phonetic sequences for each accent is possible, but combining them to yield a single synthesis pronunciation is highly challenging. To address this problem, this paper considers a space of accents. The bases of this space are defined by statistical G2P models in the form of graphone models. A linear combination of these models defines the accent space. By selecting a point in this continuous space, it is possible to specify the accent for an individual speaker. The performance of this approach is evaluated using an accent space defined by American, Scottish and British English. By moving around the accent space, it is shown that it is possible to synthesize speech from all these accents as well as a range of intermediate points.
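
The linear combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each accent model is reduced to a toy table of graphone probabilities (a real joint sequence model is an n-gram over graphone sequences), and it assumes the combination weights are non-negative and sum to one, so that a point in the accent space is again a probability distribution. The names `interpolate`, `AMERICAN`, `BRITISH` and `SCOTTISH` are hypothetical, and the probabilities are invented for illustration only.

```python
from typing import Dict, List, Tuple

# A graphone pairs a grapheme chunk with a phoneme chunk, e.g. ("r", "r").
Graphone = Tuple[str, str]
# Toy stand-in for an accent model: one probability per graphone.
Model = Dict[Graphone, float]


def interpolate(models: List[Model], weights: List[float]) -> Model:
    """Return the weighted sum of the given models.

    Assumption (not stated in the abstract): the weights are
    non-negative and sum to 1, i.e. a convex combination.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Collect every graphone seen in any model; a graphone missing
    # from a model contributes probability 0 for that model.
    graphones = set().union(*models)
    return {
        g: sum(w * m.get(g, 0.0) for m, w in zip(models, weights))
        for g in graphones
    }


# Invented numbers: how often a word-final "r" is pronounced ("r", "r")
# or dropped ("r", "") in each toy accent model.
AMERICAN: Model = {("r", "r"): 0.9, ("r", ""): 0.1}
BRITISH: Model = {("r", "r"): 0.1, ("r", ""): 0.9}
SCOTTISH: Model = {("r", "r"): 0.8, ("r", ""): 0.2}

if __name__ == "__main__":
    # A point in the toy accent space: mostly British, some Scottish.
    speaker = interpolate([AMERICAN, BRITISH, SCOTTISH], [0.0, 0.7, 0.3])
    for graphone, prob in sorted(speaker.items()):
        print(graphone, round(prob, 3))
```

Under these assumptions, placing all the weight on one model reproduces that accent, while intermediate weights give a blended distribution from which a single pronunciation could be chosen.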

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:32
Last Modified: 22 May 2018 07:18