CUED Publications database

Improving speech recognition and keyword search for low resource languages using web data

Mendels, G and Cooper, E and Soto, V and Hirschberg, J and Gales, M and Knill, K and Ragni, A and Wang, H (2015) Improving speech recognition and keyword search for low resource languages using web data. In: UNSPECIFIED pp. 829-833.

Full text not available from this repository.

Abstract

Copyright © 2015 ISCA. We describe the use of text data scraped from the web to augment language models for Automatic Speech Recognition and Keyword Search for Low Resource Languages. We scrape text from multiple genres including blogs, online news, translated TED talks, and subtitles. Using linearly interpolated language models, we find that blogs and movie subtitles are more relevant for language modeling of conversational telephone speech and obtain large reductions in out-of-vocabulary keywords. Furthermore, we show that the web data can improve Term Error Rate Performance by 3.8% absolute and Maximum Term-Weighted Value in Keyword Search by 0.0076-0.1059 absolute points. Much of the gain comes from the reduction of out-of-vocabulary items.
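
As an illustration of the linear interpolation mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes toy unigram models and hand-picked weights (0.7 and 0.3); the abstract does not state the model order or the interpolation weights used. The function names and sample sentences are hypothetical. The sketch also shows how adding web text can lower the out-of-vocabulary (OOV) rate for a keyword list.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weights[i] * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = set().union(*models)  # union of the vocabularies of all models
    return {
        word: sum(lam * model.get(word, 0.0) for lam, model in zip(weights, models))
        for word in vocab
    }


def oov_rate(keywords, vocab):
    """Fraction of keywords that do not appear in the vocabulary."""
    return sum(1 for k in keywords if k not in vocab) / len(keywords)


# Toy data (hypothetical): in-domain transcripts and scraped web text.
transcripts = "hello how are you hello fine".split()
web_text = "hello movie subtitles are fun".split()

p_base = unigram_model(transcripts)
p_web = unigram_model(web_text)

# Assumed weights; in practice they would be tuned on held-out data.
p_mix = interpolate([p_base, p_web], [0.7, 0.3])

keywords = ["hello", "movie", "weather"]
oov_before = oov_rate(keywords, p_base)  # vocabulary from transcripts only
oov_after = oov_rate(keywords, p_mix)    # vocabulary after adding web text
```

In this toy setting the interpolated model still sums to one, and the keyword "movie" moves into the vocabulary only because it occurs in the web text, which mirrors the OOV reduction the abstract attributes to web data.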

Item Type: Conference or Workshop Item (UNSPECIFIED)
Subjects: UNSPECIFIED
Divisions: Div F > Machine Intelligence
Depositing User: Cron Job
Date Deposited: 17 Jul 2017 19:42
Last Modified: 26 Oct 2017 01:48