Spectral enhancement
All techniques operating in the SNR domain have a decisive impact on the properties of the spectral gain function applied to denoise the noisy speech, and consequently on the quality of the denoising procedure. Thus, finding an optimal spectral gain function with a better trade-off between high noise suppression and low speech distortion is still a topic of ongoing research.
![](/fileadmin-hni/user_upload/csm_SpectralDenoising_System_0b75dd1496.jpg)
Three modules of spectral enhancement
Starting with the seminal paper by Boll (1979), which introduced the spectral subtraction algorithm for noise suppression on the short-time spectral amplitudes of a noisy speech signal, much research has been devoted to finding an optimal gain function. This requires balancing high noise suppression against low speech distortion in the denoised speech signal, while avoiding so-called musical noise. The minimum mean squared error (MMSE) log-spectral amplitude (LSA) estimator was shown to successfully reduce the musical noise phenomenon. However, a closer look at the shapes of the MMSE-LSA gain curves revealed that the price to pay for the good quality of the enhanced speech signals is weaker noise suppression in regions with low speech energy. It was further proposed to carry out the enhancement in domains other than the magnitude or power spectral domain. For example, the MMSE-based generalized spectral subtraction (GSS) gain functions proposed by Sim (1998) were derived in the domain of the spectral amplitudes raised to a generalized power exponent, whose values 1 and 2 correspond to the magnitude and the power spectral domain, respectively. Investigations have shown that the MMSE-GSS constrained parametric estimator suppresses noise well, but at the cost of speech quality.
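To make the role of the power exponent concrete, the following is a minimal sketch of the plain (non-MMSE) generalized spectral subtraction rule, not the MMSE-GSS estimator of Sim (1998). It assumes that the noise term in the exponent domain can be approximated by the noise standard deviation raised to the exponent, so that the gain depends only on the a posteriori SNR. The function name `gss_gain`, the parameter names, and the gain floor value are illustrative assumptions.

```python
def gss_gain(gamma, alpha=2.0, floor=0.1):
    """Gain of plain generalized spectral subtraction for one time-frequency slot.

    gamma : a posteriori SNR |Y|^2 / sigma_n^2 (linear scale)
    alpha : power exponent (1 = magnitude domain, 2 = power domain)
    floor : lower bound on the gain (assumed value, limits over-suppression)
    """
    if gamma <= 0.0:
        return floor
    # |S_hat|^alpha = |Y|^alpha - sigma_n^alpha, divided by |Y|^alpha
    residual = 1.0 - gamma ** (-alpha / 2.0)
    if residual <= 0.0:
        # Subtraction would yield a non-positive amplitude: fall back to the floor
        return floor
    return max(residual ** (1.0 / alpha), floor)


if __name__ == "__main__":
    # Compare magnitude-domain and power-domain gains over a few SNR values
    for gamma in (0.5, 2.0, 4.0, 16.0):
        print(gamma, gss_gain(gamma, alpha=1.0), gss_gain(gamma, alpha=2.0))
```

For the same a posteriori SNR, the magnitude-domain rule (exponent 1) attenuates more strongly than the power-domain rule (exponent 2), which illustrates how the exponent shifts the balance between noise suppression and speech distortion.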
![](/fileadmin-hni/user_upload/csm_3_GainFunctions_GSS_LGSA_9c6b6f21b5.png)
While the a posteriori SNR, calculated from the noise power spectral density (PSD) estimates, is considered a correction parameter of the gain function, the a priori SNR is recommended as its dominant parameter. The a priori SNR is usually calculated from the a posteriori SNR using the well-known decision-directed (DD) approach as a weighted sum of two terms. The first is the a priori SNR estimate calculated from the spectral magnitude of the enhanced speech signal of the previous frame, and the second is the maximum likelihood (ML) estimate of the a priori SNR based on the current a posteriori SNR estimate. Thus, the a priori SNR estimation exploits information from both the noise PSD tracker and the gain function in use, and it can be considered a central component of a spectral enhancement system. However, the DD approach suffers from one well-known drawback: a slow response to abrupt changes in the instantaneous SNR, also known as the reverberation effect. To overcome this shortcoming, novel approaches are being developed in our department.
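The weighted sum described above can be sketched for a single frequency bin as follows. This is an illustration under stated assumptions, not the approach developed in the department: the Wiener gain stands in for the gain function, the noise PSD is taken as constant between consecutive frames, the first frame is initialized with the ML estimate, and the weighting factor 0.98 and the lower limit of -25 dB are assumed values. The name `decision_directed` is hypothetical.

```python
def decision_directed(gammas, alpha=0.98, xi_min=10.0 ** (-25.0 / 10.0)):
    """Track the a priori SNR of one frequency bin with the DD approach.

    gammas : a posteriori SNR per frame (linear scale)
    alpha  : weight of the previous-frame term (assumed value)
    xi_min : lower limit of the a priori SNR (assumed value)
    """
    xi_track = []
    prev_clean_snr = None  # |S_hat(l-1)|^2 / sigma_n^2 of the previous frame
    for gamma in gammas:
        # ML estimate of the a priori SNR from the current a posteriori SNR
        ml_estimate = max(gamma - 1.0, 0.0)
        if prev_clean_snr is None:
            xi = ml_estimate  # assumed initialization for the first frame
        else:
            xi = alpha * prev_clean_snr + (1.0 - alpha) * ml_estimate
        xi = max(xi, xi_min)
        # Wiener gain used as a stand-in for the actual gain function
        gain = xi / (1.0 + xi)
        # Enhanced-speech power normalized by the noise PSD, reused next frame
        prev_clean_snr = gain * gain * gamma
        xi_track.append(xi)
    return xi_track


if __name__ == "__main__":
    # Abrupt SNR jump: noise only (gamma = 1), then speech onset (gamma = 11)
    track = decision_directed([1.0] * 5 + [11.0] * 5)
    for frame, xi in enumerate(track):
        print(frame, xi)
```

In the example, the ML estimate jumps to 10 at the onset frame, while the DD estimate in that frame stays far below it and rises only gradually afterwards, which is the slow response discussed above.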
![](/fileadmin-hni/user_upload/csm_1_AprioriSNR_WMMoptPar_2ba4214c9f.png)
The estimation of the speech presence probability (SPP) for each individual time-frequency slot is an important part of many speech processing systems. In particular, the widespread speech enhancement approaches based on estimating the short-time spectral amplitude of the clean speech signal crucially depend on an SPP estimator. However, a reliable SPP estimator is difficult to obtain in a noisy scenario. It is well known that speech signals exhibit characteristic temporal and spectral correlations in the time-frequency domain. Usually, this fact is exploited by smoothing the estimated quantities, such as the SPP estimates themselves, the a priori SNR, or even the gain factor of an individual time-frequency slot, across time, frequency, or both. However, rather than smoothing the estimates with heuristically chosen filter parameters in a postprocessing step, the correlations can be employed directly in the estimation of the SPP by applying statistical inference to the a posteriori SNR estimates averaged over a set of adjacent time-frequency slots. Developing SPP estimators that take into account the correlations between neighbouring time-frequency slots has a high priority in our group.
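The idea of inferring the SPP from locally averaged a posteriori SNR values can be sketched as follows. This is a simplified illustration, not the estimator developed in the group. It assumes complex Gaussian spectral coefficients, statistically independent slots within the neighbourhood, a fixed a priori SNR under speech presence (15 dB here), and equal prior probabilities; real neighbouring slots are correlated, so a practical estimator would have to account for that. Under these assumptions the averaged a posteriori SNR is gamma distributed under both hypotheses, which gives a closed-form posterior. The names `local_average` and `spp_from_average` are hypothetical.

```python
import math


def local_average(gamma, t, k, half_t=1, half_k=1):
    """Mean a posteriori SNR over the neighbourhood of slot (t, k).

    gamma is a list of frames, each a list of frequency bins. The window
    is clipped at the edges; returns (mean, number of slots averaged).
    """
    rows = range(max(t - half_t, 0), min(t + half_t + 1, len(gamma)))
    cols = range(max(k - half_k, 0), min(k + half_k + 1, len(gamma[0])))
    values = [gamma[i][j] for i in rows for j in cols]
    return sum(values) / len(values), len(values)


def spp_from_average(gamma_bar, n, xi_h1=10.0 ** (15.0 / 10.0), prior_h1=0.5):
    """Posterior speech presence probability given an average over n slots.

    xi_h1    : fixed a priori SNR under speech presence (assumed value)
    prior_h1 : prior probability of speech presence (assumed value)
    """
    # log of p(gamma_bar | H0) / p(gamma_bar | H1) for gamma-distributed averages
    log_ratio = n * (math.log1p(xi_h1) - gamma_bar * xi_h1 / (1.0 + xi_h1))
    # Clamp the exponent to avoid floating-point overflow for large n
    ratio = (1.0 - prior_h1) / prior_h1 * math.exp(min(log_ratio, 700.0))
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    gamma = [[1.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 1.0]]
    gamma_bar, n = local_average(gamma, 1, 1)
    print(gamma_bar, n, spp_from_average(gamma_bar, n))
```

Because the evidence of all slots in the window enters the likelihood ratio, an isolated outlier in a single slot has less influence on the resulting SPP than it would in a per-slot estimate.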
![](/fileadmin-hni/user_upload/csm_2_SPPestimator_NoisySpec_IdealSPP_e0bca99884.png)