CUED Publications database

On-line policy optimisation of Bayesian spoken dialogue systems via human interaction

Gasic, M and Breslin, C and Henderson, M and Kim, D and Szummer, M and Thomson, B and Tsiakoulis, P and Young, S (2013) On-line policy optimisation of Bayesian spoken dialogue systems via human interaction. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. pp. 8367-8371. ISSN 1520-6149

A partially observable Markov decision process has been proposed as a dialogue model that enables robustness to speech recognition errors and automatic policy optimisation using reinforcement learning (RL). However, conventional RL algorithms require a very large number of dialogues, necessitating a user simulator. Recently, Gaussian processes have been shown to substantially speed up the optimisation, making it possible to learn directly from interaction with human users. However, early studies have been limited to very low-dimensional spaces, and the learning has exhibited convergence problems. Here we investigate learning from human interaction using the Bayesian Update of Dialogue State system. This dynamic Bayesian network-based system has an optimisation space covering more than one hundred features, allowing a wide range of behaviours to be learned. Using an improved policy model and a more robust reward function, we show that stable learning can be achieved and that the resulting policy significantly outperforms a simulator-trained policy. © 2013 IEEE.

Item Type: Article
Divisions: Div F > Machine Intelligence
Date Deposited: 17 Jul 2017 19:12
Last Modified: 22 May 2018 07:18