Jurčíček, F and Keizer, S and Gašić, M and Mairesse, F and Thomson, B and Yu, K and Young, S (2011) Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. pp. 3061-3064. ISSN 1990-9772
Full text not available from this repository.
This paper describes a framework for the evaluation of spoken dialogue systems. Typically, dialogue systems are evaluated in a controlled test environment with carefully selected and instructed users. However, this approach is very demanding. An alternative is to recruit a large group of users who evaluate the dialogue systems remotely, with virtually no supervision. Crowdsourcing technology, for example Amazon Mechanical Turk (AMT), provides an efficient way of recruiting subjects. This paper describes an evaluation framework for spoken dialogue systems that uses AMT users, and compares the results obtained with those of a recent trial in which the systems were tested by locally recruited users. The results suggest that the use of crowdsourcing technology is feasible and that it can provide reliable results. Copyright © 2011 ISCA.
|Uncontrolled Keywords:||Crowdsourcing, Evaluation, Spoken dialogue systems|
|Divisions:||Div F > Machine Intelligence|
|Date Deposited:||07 Mar 2014 12:12|
|Last Modified:||16 Dec 2014 19:06|