Jourlin, P and Johnson, SE and Spärck Jones, K and Woodland, PC (2000) Spoken document representations for probabilistic retrieval. Speech Communication, 32. pp. 21-36. ISSN 0167-6393Full text not available from this repository.
This paper presents some developments in query expansion and document representation of our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to that which we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines with a WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.
|Divisions:||Div F > Machine Intelligence|
|Depositing User:||Cron Job|
|Date Deposited:||02 Sep 2016 17:11|
|Last Modified:||27 Oct 2016 03:24|