CUED Publications database

A Log Domain Pulse Model for Parametric Speech Synthesis

Degottex, G and Lanchantin, P and Gales, M (2017) A Log Domain Pulse Model for Parametric Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26. pp. 57-70. ISSN 2329-9290

Full text not available from this repository.

Abstract

Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder, and one of the main causes of this degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it adopts a combination of speech components that are additive in the log domain. Also, the same representation is used for voiced and unvoiced segments, rather than relying on binary voicing decisions. This avoids the voicing-error discontinuities that can occur in many current vocoders. A simple binary mask is used to denote the presence of noise in the time-frequency domain, which is less sensitive to classification errors. Four experiments have been carried out to evaluate this new model. The first experiment examines the noise reconstruction issue. The other three are listening tests that demonstrate the advantages of this model: a comparison with the STRAIGHT vocoder; the direct prediction of the binary noise mask using a mixed output configuration; and partial improvements in creakiness using a mask correction mechanism.
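As a rough illustration of what "components additive in the log domain" combined with a binary time-frequency noise mask can mean, the following toy sketch builds the spectrum of a single frame. It is not the paper's model: the function name `log_domain_frame`, its per-bin amplitude-envelope and 0/1 mask inputs, and the use of a complex Gaussian noise sample per bin are all hypothetical simplifications chosen only to show the idea.

```python
import cmath
import random


def log_domain_frame(amplitude_env, noise_mask, rng=None):
    """Toy single-frame spectrum built by adding components in the log domain.

    Hypothetical inputs (assumptions, not the paper's parametrisation):
      amplitude_env: positive amplitudes, one per frequency bin.
      noise_mask:    0/1 flags per bin; 1 marks bins where noise is present.
    Returns a list of complex spectral values, one per bin.
    """
    if rng is None:
        rng = random.Random(0)
    spectrum = []
    for amp, m in zip(amplitude_env, noise_mask):
        # Draw a complex Gaussian noise sample for every bin (kept even when
        # masked out, so the random sequence does not depend on the mask).
        noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        # Complex log: log|N| + j*arg(N).
        log_noise = cmath.log(noise)
        # Components add in the log domain; the binary mask switches the
        # noise term on (m = 1) or off (m = 0) in this bin.
        log_s = cmath.log(amp) + m * log_noise
        spectrum.append(cmath.exp(log_s))
    return spectrum


# Example: noise only in the middle bin.
frame = log_domain_frame([1.0, 2.0, 0.5], [0, 1, 0])
```

Because addition in the log domain is multiplication in the linear domain, a bin with mask 1 receives the envelope multiplied by a random complex value, while a bin with mask 0 keeps the deterministic envelope unchanged. The same expression is evaluated for every bin, so there is no separate voiced/unvoiced code path in this toy.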

Item Type: Article
Uncontrolled Keywords: speech; speech processing; speech synthesis; text-to-speech; parametric speech synthesis; acoustic model; voice; pulse model
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 19 Jan 2018 20:13
Last Modified: 10 Apr 2021 01:04
DOI: 10.1109/TASLP.2017.2761546