CUED Publications database

On-line policy optimisation of spoken dialogue systems via live interaction with human subjects

Gašić, M and Jurčiček, F and Thomson, B and Yu, K and Young, S (2011) On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. pp. 312-317.

Full text not available from this repository.

Abstract

Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions, and in significant development costs for the simulator, thereby mitigating many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment in which a policy for a real-world task was learnt directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues, without needing a user simulator, by using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards. © 2011 IEEE.
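To make the idea in the abstract concrete, the sketch below shows one simplified way a Gaussian process can serve as a Q-function estimator over (belief, action) features, with a variance-aware action selection rule that discourages actions whose value is still uncertain. This is an illustrative assumption, not the authors' GP-SARSA implementation: the feature map, the toy action set, and the random stand-in for user-provided rewards are all hypothetical.

```python
# Illustrative sketch only (not the paper's implementation): a GP regressor
# models Q(belief, action) from observed per-dialogue returns, and actions
# are chosen pessimistically (mean - k*std) to reduce the risk of taking
# bad, poorly-understood actions. Requires numpy and scikit-learn.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

N_ACTIONS = 3      # hypothetical summary-action set
BELIEF_DIM = 4     # hypothetical belief-state feature dimension


def features(belief, action):
    """Joint (belief, action) feature vector: belief features tiled per action."""
    phi = np.zeros(BELIEF_DIM * N_ACTIONS)
    phi[action * BELIEF_DIM:(action + 1) * BELIEF_DIM] = belief
    return phi


class GPQPolicy:
    def __init__(self):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        self.X, self.y = [], []   # (belief, action) features and observed returns
        self.fitted = False

    def select_action(self, belief, risk_penalty=1.0):
        """Maximise a pessimistic value estimate so uncertain actions are avoided."""
        if not self.fitted:
            return np.random.randint(N_ACTIONS)
        cands = np.stack([features(belief, a) for a in range(N_ACTIONS)])
        mean, std = self.gp.predict(cands, return_std=True)
        return int(np.argmax(mean - risk_penalty * std))

    def update(self, belief, action, ret):
        """Add one (belief, action, return) sample and refit the GP.
        In the live setting the return would come from user-provided rewards."""
        self.X.append(features(belief, action))
        self.y.append(ret)
        self.gp.fit(np.array(self.X), np.array(self.y))
        self.fitted = True


# Toy usage loop: random beliefs and rewards stand in for a real dialogue system.
policy = GPQPolicy()
for dialogue in range(50):
    belief = np.random.rand(BELIEF_DIM)
    action = policy.select_action(belief)
    reward = np.random.randn()   # placeholder for noisy user feedback
    policy.update(belief, action, reward)
```

Refitting the GP on every dialogue scales cubically in the number of samples, which is tolerable only for the few hundred dialogues discussed in the abstract; longer-running adaptation would need the sparse approximations used in GP reinforcement learning.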

Item Type: Article
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 07 Mar 2014 12:12
Last Modified: 16 Dec 2014 19:06
DOI: 10.1109/ASRU.2011.6163950