CUED Publications database

Expressive visual text to speech and expression adaptation using deep neural networks

Parker, J and Maia, R and Stylianou, Y and Cipolla, R (2017) Expressive visual text to speech and expression adaptation using deep neural networks. In: UNSPECIFIED pp. 4920-4924.

Full text not available from this repository.

Abstract

© 2017 IEEE. In this paper, we present an expressive visual text to speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus. Furthermore, we present a method of adapting a previously trained DNN to include a new expression using a small amount of training data. Experiments show that the proposed DNN-based VTTS is preferred by 57.9% over the baseline hidden Markov model based VTTS which uses cluster adaptive training.

Item Type: Conference or Workshop Item (UNSPECIFIED)
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 03 Aug 2017 02:37
Last Modified: 08 Aug 2017 01:52