Machine-Learning Algorithm for Improved Speech Intelligibility in Noise
A monaural machine-learning algorithm that classifies time-frequency units in an unknown signal, yielding marked speech-intelligibility improvements in noisy signals.
A primary complaint of hearing-impaired (HI) listeners is poor speech recognition in background noise. This problem can be quite debilitating and persists despite considerable efforts to improve hearing technology. In particular, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for HI listeners, given their particular difficulty in noisy backgrounds.
Researchers at The Ohio State University, led by Dr. Eric Healy, have developed an algorithm that uses time-frequency masking to separate speech from noise in audio signals across a range of signal-to-noise ratios. The algorithm combines the computational simplicity of an Ideal Binary Mask (IBM) with the sound quality of an Ideal Ratio Mask (IRM) to attain intelligibility results equal or superior to those of the IRM, at a computational load only marginally larger than that of the IBM.
An IBM is a binary system that assigns a value of 0 or 1 to each time-frequency (t-f) unit based on its signal-to-noise ratio (SNR). Units with a poor SNR are assigned a 0 and attenuated, resulting in an output signal containing only t-f units dominated by speech. In the IRM, units are again attenuated based on SNR, but each can be assigned any value between 0 and 1, resulting in a smoother output. The Ideal Quantized Mask (IQM) developed by OSU’s research team draws on both methods. Instead of the IBM’s two attenuation levels, the IQM classifies each t-f unit into one of any number of discrete categories. Although the IQM could therefore theoretically have an unlimited number of categories, with just eight attenuation levels it achieves IRM-level intelligibility, far higher than that of the IBM, without the need to engage in the IRM’s regression calculations. Like the other masks, the IQM can be estimated directly from the speech-plus-noise mixture using a machine-learning algorithm.
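The relationship among the three masks can be illustrated with a minimal NumPy sketch. It is not the OSU implementation. It assumes the premixed speech and noise energies in each t-f unit are known (the "ideal" case), and the function names, the 0 dB threshold for the IBM, the square-root form of the IRM, and the uniform spacing of the IQM levels are all assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Assign 1 to t-f units whose SNR exceeds the threshold, else 0.

    The 0 dB default threshold is an assumption for this sketch.
    """
    snr_db = 10.0 * np.log10((speech_energy + EPS) / (noise_energy + EPS))
    return (snr_db > threshold_db).astype(float)


def ideal_ratio_mask(speech_energy, noise_energy):
    """Assign each t-f unit a continuous gain in [0, 1].

    Uses the square root of speech energy over total energy, one common
    formulation (assumed here, not taken from the source).
    """
    return np.sqrt(speech_energy / (speech_energy + noise_energy + EPS))


def ideal_quantized_mask(speech_energy, noise_energy, levels=8):
    """Assign each t-f unit one of `levels` discrete gains in [0, 1].

    Uniformly spaced levels are an assumption; the source does not say
    how the attenuation steps are chosen.
    """
    irm = ideal_ratio_mask(speech_energy, noise_energy)
    # Bin index 0 .. levels-1, with a ratio of exactly 1.0 placed in the top bin.
    index = np.minimum(np.floor(irm * levels), levels - 1)
    return index / (levels - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical energies for 4 frequency channels x 6 time frames.
    speech = rng.random((4, 6))
    noise = rng.random((4, 6))
    print(ideal_binary_mask(speech, noise))
    print(ideal_quantized_mask(speech, noise, levels=8))
```

In each case the mask would be multiplied unit by unit with the t-f representation of the noisy mixture before resynthesis. In practical use the premixed energies are unavailable, so a trained classifier would predict the mask category for each unit from features of the mixture alone.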