CUED Publications database

Cross-domain paraphrasing for improving language modelling using out-of-domain data

Liu, X and Gales, MJF and Woodland, PC (2013) Cross-domain paraphrasing for improving language modelling using out-of-domain data. In: UNSPECIFIED pp. 3424-3428.

Full text not available from this repository.


In natural languages, variability in the underlying linguistic generation rules significantly alters the observed surface word sequence they create, and thus introduces a mismatch against other data generated via alternative realizations associated with, for example, a different domain. Hence, direct modelling of out-of-domain data can result in poor generalization to the in-domain data of interest. To handle this problem, this paper investigated using cross-domain paraphrastic language models to improve in-domain language modelling (LM) using out-of-domain data. Phrase-level paraphrase models learnt from each domain were used to generate paraphrase variants for the data of other domains. These were used both to improve the context coverage of in-domain data and to reduce the domain mismatch of the out-of-domain data. A significant error rate reduction of 0.6% absolute was obtained on a state-of-the-art conversational telephone speech recognition task using a cross-domain paraphrastic multi-level LM trained on a billion words of mixed conversational and broadcast news data. Consistent improvements in the context coverage of the in-domain data were also obtained. Copyright © 2013 ISCA.
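
The idea of generating paraphrase variants to improve context coverage can be illustrated with a toy sketch. This is not the method of the paper, which learns statistical phrase-level paraphrase models from data; here the paraphrase table is hand-written, the function names (`segment_options`, `paraphrase_variants`, `ngrams`, `coverage`) are hypothetical, and coverage is assumed to mean the fraction of distinct test n-grams that also occur in the training text.

```python
from itertools import product

def segment_options(words, table, max_len=3):
    """Greedy left-to-right segmentation. At each position, take the longest
    phrase found in the (hypothetical) paraphrase table and offer the original
    phrase plus its alternatives; otherwise keep the single word unchanged."""
    options, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                options.append([phrase] + [tuple(a) for a in table[phrase]])
                i += n
                break
        else:
            options.append([(words[i],)])
            i += 1
    return options

def paraphrase_variants(sentence, table, limit=100):
    """Enumerate up to `limit` variants; the first one is the original sentence."""
    variants = []
    for choice in product(*segment_options(sentence.split(), table)):
        variants.append(" ".join(w for phrase in choice for w in phrase))
        if len(variants) >= limit:
            break
    return variants

def ngrams(sentences, n=2):
    """Set of distinct n-grams, with sentence boundary markers."""
    grams = set()
    for s in sentences:
        w = ["<s>"] + s.split() + ["</s>"]
        grams.update(tuple(w[i:i + n]) for i in range(len(w) - n + 1))
    return grams

def coverage(train, test, n=2):
    """Fraction of distinct test n-grams that also appear in the training text."""
    test_grams = ngrams(test, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(train, n)) / len(test_grams)

# Toy data (illustrative only, not taken from the paper).
table = {("i", "think"): [("i", "guess")], ("very",): [("really",)]}
train = ["i think it is very good"]
test = ["i guess it is really good"]

expanded = [v for s in train for v in paraphrase_variants(s, table)]
before = coverage(train, test)
after = coverage(expanded, test)
print(f"bigram coverage before: {before:.2f}, after: {after:.2f}")
```

In this toy case the expanded training text contains contexts such as "i guess" and "really good" that the original sentence lacks, so more of the test bigrams are covered. A real system would weight the variants by paraphrase probabilities rather than treating them all equally.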

Item Type: Conference or Workshop Item (UNSPECIFIED)
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:01
Last Modified: 22 May 2018 06:59