CUED Publications database

Noise-robust TTS speaker adaptation with statistics smoothing

Yanagisawa, K and Chen, L and Gales, MJF (2014) Noise-robust TTS speaker adaptation with statistics smoothing. In: UNSPECIFIED pp. 1519-1523..

Full text not available from this repository.


Copyright © 2014 ISCA. In practical scenarios for speaker adaptation of speech synthesis systems, the quality of adaptation audio data may be poor. In these situations, it is necessary to make use of the available audio to capture the speaker attributes, whilst aiming to obtain a synthesis voice which does not have any of the lowquality attributes of the audio. One approach to achieving this is to define a sub-space of parametric synthesis parameters in which the adapted system must lie. Though this yields reasonable synthesis quality, target speaker similarity degrades. Quality is also affected in severe noise conditions. This paper describes a smoothing approach that addresses this problem. For a noisy target speaker, first a 'similar speaker' is selected from a database of speakers. Statistics from this speaker are then smoothed with those obtained from the target speaker. By appropriately combining the two sources of information, it is possible to balance similarity and quality. Results indicate that both the quality and similarity can be improved by smoothing, especially for severe noise conditions. The similarity performance, however, varies from speaker to speaker, indicating the importance of a reasonable automatic speaker selection method and the coverage of the candidate speaker pool.

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:32
Last Modified: 22 May 2018 07:18