Rainton, D and Young, SJ (1992) Time-frequency spectral estimation of speech. Computer Speech and Language, 6. pp. 15-36. ISSN 0885-2308Full text not available from this repository.
In recent years there has been a growing interest amongst the speech research community into the use of spectral estimators which circumvent the traditional quasi-stationary assumption and provide greater time-frequency (t-f) resolution than conventional spectral estimators, such as the short time Fourier power spectrum (STFPS). One distribution in particular, the Wigner distribution (WD), has attracted considerable interest. However, experimental studies have indicated that, despite its improved t-f resolution, employing the WD as the front end of speech recognition system actually reduces recognition performance; only by explicitly re-introducing t-f smoothing into the WD are recognition rates improved. In this paper we provide an explanation for these findings. By treating the spectral estimation problem as one of optimization of a bias variance trade off, we show why additional t-f smoothing improves recognition rates, despite reducing the t-f resolution of the spectral estimator. A practical adaptive smoothing algorithm is presented, which attempts to match the degree of smoothing introduced into the WD with the time varying quasi-stationary regions within the speech waveform. The recognition performance of the resulting adaptively smoothed estimator is found to be comparable to that of conventional filterbank estimators, yet the average temporal sampling rate of the resulting spectral vectors is reduced by around a factor of 10.
|Divisions:||Div F > Machine Intelligence|
|Depositing User:||Cron Job|
|Date Deposited:||07 Mar 2014 11:31|
|Last Modified:||18 Dec 2014 11:46|