CUED Publications database

Domain-independent user satisfaction reward estimation for dialogue policy learning

Ultes, S and Budzianowski, P and Casanueva, I and Mrkšić, N and Rojas-Barahona, L and Su, PH and Wen, TH and Gašić, M and Young, S (2017) Domain-independent user satisfaction reward estimation for dialogue policy learning. In: UNSPECIFIED pp. 1721-1725.

Full text not available from this repository.


Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure such as task success to model the reward signal, we propose to use a reward based on user satisfaction. We show in simulated experiments that a live user satisfaction estimation model can be applied, resulting in higher estimated satisfaction whilst achieving similar success rates. Moreover, we show that a single satisfaction estimation model trained on one domain can be applied in many other domains that cover a similar task. We verify our findings by applying the model in one of the domains to learn a policy from real users, and compare its performance to policies that use the user satisfaction and task success acquired directly from the users as reward.

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 18 Jan 2018 01:41
Last Modified: 10 Apr 2021 22:20
DOI: 10.21437/Interspeech.2017-1032