Gašić, M and Jurčiček, F and Thomson, B and Yu, K and Young, S (2011) On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. pp. 312-317.
Full text not available from this repository.
Statistical dialogue models have required a large number of dialogues to optimise the dialogue policy, relying on the use of a simulated user. This results in a mismatch between training and live conditions and in significant development costs for the simulator, thereby diminishing many of the claimed benefits of such models. Recent work on Gaussian process reinforcement learning has shown that learning can be substantially accelerated. This paper reports on an experiment to learn a policy for a real-world task directly from human interaction, using rewards provided by users. It shows that a usable policy can be learnt in just a few hundred dialogues without needing a user simulator, using a learning strategy that reduces the risk of taking bad actions. The paper also investigates adaptation behaviour when the system continues learning for several thousand dialogues and highlights the need for robustness to noisy rewards. © 2011 IEEE.
|Divisions:||Div F > Machine Intelligence|
|Depositing User:||Unnamed user with email email@example.com|
|Date Deposited:||15 Dec 2015 13:26|
|Last Modified:||29 Apr 2016 23:06|