CUED Publications database

Source sentence simplification for statistical machine translation

Hasler, EC and de Gispert, A and Stahlberg, F and Waite, A and Byrne, WJ (2017) Source sentence simplification for statistical machine translation. Computer Speech & Language, 45. pp. 221-235. ISSN 0885-2308 (Unpublished)

Full text not available from this repository.


Long sentences with complex syntax and long-distance dependencies pose difficulties for machine translation systems. Short sentences, on the other hand, are usually easier to translate. We study the potential of addressing this mismatch using text simplification: given a simplified version of the full input sentence, can we use it in addition to the full input to improve translation? We show that the spaces of original and simplified translations can be effectively combined using translation lattices and compare two decoding approaches to process both inputs at different levels of integration. We demonstrate on source-annotated portions of WMT test sets and on top of strong baseline systems combining hierarchical and neural translation for two language pairs that source simplification can help to improve translation quality.

Item Type: Article
Uncontrolled Keywords: hierarchical machine translation, text simplification, neural machine translation
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:04
Last Modified: 15 Apr 2021 06:31
DOI: 10.1016/j.csl.2016.12.001