
Statistical Methods for Noisy Speech Spectral Enhancement: Research and Outlook

Posted: 2019-02-21 04:43:21

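  Abstract: Single-microphone spectral enhancement of noisy speech has long received close attention from industry and academia, and is widely applied in areas such as speech recognition, hearing aids, and hands-free communication terminals. This paper systematically discusses the basic building blocks of a noisy speech spectral enhancement system. It describes in some detail the statistical techniques and methods for modules such as speech signal estimation, speech presence probability estimation, a priori signal-to-noise ratio (SNR) estimation, and noise power spectrum estimation. The paper also discusses how to choose the components of a noisy speech spectral enhancement algorithm and outlines possible future research directions.
  Keywords: speech enhancement; statistical model; signal estimation; signal presence probability; a priori SNR; noise power spectrum; choice of spectral enhancement algorithm; research outlook
  CLC number: TN912.3   Document code: A   Article ID: 1007-9416(2012)01-0128-13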
  
  Statistical Methods for Noisy Speech Spectral Enhancement
  
  LEI Guangzhi1 LIANG Min2*
  (1. Panocom Communication Systems, Inc., Guangzhou 510300, China;
  2. Lantai Eastern Technologies, Inc., Guangzhou 510300, China)
  
  Abstract: The problem of spectral enhancement of noisy speech signals based on a single microphone has attracted considerable research effort for over 30 years. It has numerous applications, ranging from speech recognition to hearing aids and hands-free mobile communication. In this paper, the statistical methods for the fundamental components of a noisy speech spectral enhancement system are described and discussed. In Section 2, the problem of speech spectral enhancement is formulated mathematically. In Section 3, the time-frequency correlations of the spectral coefficients of speech and noise signals are addressed, and statistical models that conform to these characteristics are presented. In Section 4, estimators for speech spectral coefficients under speech presence uncertainty are given for various fidelity criteria. The problem of speech presence probability estimation is addressed in Section 5. Section 6 presents useful estimators for the a priori signal-to-noise ratio (SNR) under speech presence uncertainty, such as the decision-directed approach and the recursive estimation approach. In addition, some typical and useful noise power spectrum estimators are described in Section 7. In Section 8, the main types of spectral enhancement components are surveyed. That section also discusses why the choice of statistical model, fidelity criterion, a priori SNR estimator, and noise spectrum estimator matters. Finally, concluding remarks and future research directions are given in Section 9.
  Keywords: speech enhancement; statistical model; signal estimation; signal presence probability; a priori SNR; noise power spectrum; choice of spectral enhancement components; future directions
  
  1. Introduction
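  Single-microphone spectral enhancement of noisy speech is an important applied research area for speech recognition, hearing aids, and hands-free communication terminals. It has long attracted strong attention and research interest from academia and industry [1-3], and a large number of techniques have been proposed.

  The best known is arguably spectral subtraction [4-5], which works in three steps:
  - It estimates the short-time power spectral density (PSD) of the background noise from the noisy speech.
  - It subtracts this estimated noise PSD from the short-time PSD of the noisy speech.
  - It takes the square root of the difference as the spectral magnitude and combines it with the phase of the noisy spectrum to estimate the clean speech signal.

  This technique usually leaves randomly fluctuating narrowband residual noise in the enhanced speech that degrades listening quality, known as musical noise (musical tones). To reduce this musical noise and improve the performance of spectral subtraction, Boll [4], Berouti [6], Goh [7], Sim [8], Gustafsson [9] and others successively proposed a number of effective methods. Starting from the properties of the human auditory system, Tsoukalas [10] and Virag [11] each proposed spectral subtraction techniques based on auditory masking. Spectral-subtraction-type techniques make only minimal a priori assumptions about speech and noise. With a suitable choice of parameters in a practical implementation, they can deliver acceptable enhancement in some applications.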
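  The spectral subtraction steps described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the exact rule of any of the cited papers. The function name `spectral_subtraction` and the `floor` argument are our own assumptions, and `floor` acts as a simple spectral floor loosely in the spirit of Berouti et al. [6]. With `floor=0`, negative differences are clipped to zero. The isolated spectral peaks that survive this clipping are what is heard as musical noise.

```python
import numpy as np


def spectral_subtraction(Y, noise_psd, floor=0.0):
    """Power spectral subtraction (illustrative sketch).

    Y         : complex STFT coefficients of the noisy speech y(n)
    noise_psd : estimated short-time noise PSD, broadcastable to Y
    floor     : hypothetical spectral floor, as a fraction of noise_psd;
                0 means plain half-wave rectification of the difference
    """
    noisy_psd = np.abs(Y) ** 2
    # Subtract the estimated noise PSD; negative differences are raised to the floor.
    clean_psd = np.maximum(noisy_psd - noise_psd, floor * noise_psd)
    # The square root of the difference is used as the spectral magnitude...
    magnitude = np.sqrt(clean_psd)
    # ...and is combined with the phase of the noisy spectrum.
    return magnitude * np.exp(1j * np.angle(Y))
```

  For example, a bin with |Y|² = 25 and an estimated noise PSD of 9 keeps its phase and gets magnitude 4. A bin whose power falls below the noise estimate is zeroed.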
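  In contrast, another class of speech enhancement techniques, known as statistical methods, seeks an estimate of the speech signal that minimizes the distortion between the estimate and the clean (undegraded) speech [12-16]. These methods require a priori assumptions about reliable statistical models of speech and noise. They also require a predefined, perceptually meaningful distortion measure. Neither the statistical model of speech nor a perceptually meaningful distortion measure has yet been fully established. Existing statistical methods therefore differ mainly in three respects: the speech statistical model [12-15], the distortion measure [17-19], and the particular implementation of the spectral enhancement algorithm [2].

  Spectral enhancement techniques based on hidden Markov processes (HMPs) try to avoid specific a priori assumptions about the statistical distributions of the speech and noise processes [20-23]. They first estimate the probability distributions of speech and noise from long training sequences of noise and clean speech samples. They then apply both estimated distributions, together with a given distortion measure, to derive a speech signal estimator. Vectors generated by a given state sequence are usually assumed to be statistically independent. The HMP can be extended to handle time-frequency correlations of the speech signal by using a non-diagonal covariance matrix for each subsource, with the vectors generated by a given state sequence modeled as a nonzero-order autoregressive (AR) process [21,24]. HMP-based speech enhancement depends heavily on the type of training data [25]. It performs well for noise types covered by the training set and poorly for other noise types, and better performance generally requires more complex models and more computation. HMP models have been applied successfully to automatic recognition of clean speech [26-27], but their accuracy does not yet meet the specific requirements of speech enhancement [3].

  Subspace methods [28-31] decompose the vector space of the noisy signal into a "signal-plus-noise" subspace and a "noise" subspace. Speech spectral enhancement is achieved by removing the noise subspace and estimating the speech signal in the remaining subspace. There are currently two ways to decompose the signal space:
  - the Karhunen-Loève transform (KLT), based on eigenvalue decomposition of a Toeplitz covariance estimate of the noisy vectors [28,30];
  - the singular value decomposition (SVD) of a data matrix [32-33].

  In the "signal-plus-noise" subspace, linear estimation is used to minimize signal distortion while keeping the residual noise masked by the signal. Building on this idea, Jabloun [34] and Hu [35] each proposed perceptual signal subspace methods for noisy speech enhancement. Both start from the masking properties of the human auditory system and aim to reduce the perceived residual noise.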
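  A KLT-based signal subspace estimator can be sketched as follows, under simplifying assumptions of our own:
  - white noise of known variance `noise_var`;
  - a biased autocorrelation estimate arranged as a Toeplitz covariance matrix;
  - eigen-domain gains λ/(λ + μσ²) applied on the signal-plus-noise subspace, in the spirit of [28], and zero gain on the noise subspace.

  All function and parameter names are hypothetical. For brevity the frame is processed as non-overlapping vectors of length `order`.

```python
import numpy as np


def toeplitz_cov(frame, order):
    """Biased autocorrelation estimate arranged as an (order x order) Toeplitz matrix."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) / n for k in range(order)])
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return r[idx]


def klt_subspace_enhance(frame, noise_var, order=16, mu=1.0):
    """Sketch of a KLT signal-subspace estimator for white noise of known variance."""
    R_y = toeplitz_cov(frame, order)
    eigvals, V = np.linalg.eigh(R_y)   # ascending eigenvalues, orthonormal eigenvectors
    lam_x = eigvals - noise_var        # clean-speech eigenvalues under the white-noise assumption
    # Keep the "signal + noise" subspace (lam_x > 0); zero out the "noise" subspace.
    gains = np.where(lam_x > 0, lam_x / (lam_x + mu * noise_var), 0.0)
    H = V @ np.diag(gains) @ V.T       # linear estimator H = V G V^T (symmetric)
    # Apply H to consecutive non-overlapping vectors of length `order`.
    m = len(frame) // order
    blocks = frame[: m * order].reshape(m, order)
    return (blocks @ H.T).reshape(-1)
```

  Two limiting cases help check the design. If `noise_var` exceeds every eigenvalue, the whole space is treated as noise and the output is zero. If `noise_var` is zero, every gain is 1 and the input passes through unchanged.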
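  This paper discusses and describes the basic building blocks of a noisy speech spectral enhancement system and the corresponding statistical methods. It first formulates the spectral enhancement problem. It then discusses the time-frequency correlations of speech and noise spectral coefficients and presents statistical models consistent with these properties. Next, it describes speech spectral coefficient estimators derived under speech presence uncertainty for different fidelity criteria, and discusses how to estimate the speech presence probability. It also covers a priori SNR estimators based on the decision-directed technique and on recursive estimation. For noise power spectrum estimation, it covers minimum statistics, minima controlled recursive averaging (MCRA) and its improved version (IMCRA), continuous spectral minimum tracking, and weighted averaging. Finally, it discusses how to choose the components of a noisy speech spectral enhancement algorithm and outlines possible future research directions.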
  2. Mathematical Formulation of Noisy Speech Spectral Enhancement
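  Let x(n) denote the clean speech signal, d(n) an uncorrelated additive noise, and y(n) = x(n) + d(n) the observed degraded speech signal. Applying the short-time Fourier transform (STFT) to transform y(n) into the time-frequency domain gives $Y(k,\ell) = X(k,\ell) + D(k,\ell)$, where $k$ is the frequency-bin index and $\ell$ is the frame index.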
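  Because the STFT is linear, the additive model y(n) = x(n) + d(n) carries over exactly from the time domain to the time-frequency domain. The minimal sketch below checks this numerically. It assumes a 256-sample Hann window with a hop of 128; the paper's framing parameters are not given in this excerpt. `stft` is a hypothetical helper, not a library function.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Minimal STFT: Hann-windowed frames, one-sided FFT.

    Returns an array of shape (frames, bins): rows are frame indices l,
    columns are frequency-bin indices k.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[l * hop: l * hop + frame_len] * window for l in range(n_frames)])
    return np.fft.rfft(frames, axis=1)
```

  Applying `stft` to x + d gives the same coefficients as adding `stft(x)` and `stft(d)`, up to floating-point rounding.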
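  Below, we discuss how to choose the components of a speech spectral enhancement system: the statistical model, fidelity criterion, a priori SNR estimator, and noise spectrum estimator.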
  8.1 Choice of Statistical Model and Fidelity Criterion
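  At present, the Gaussian speech model underlies the design of many speech enhancement algorithms [12,17,18,42,64-66]. The model is rooted in the central limit theorem: each Fourier expansion coefficient of the speech signal can be expressed as a weighted sum of random variables drawn from a random sequence [12]. When the span of correlation within the signal is sufficiently small compared with the frame length, the distribution of the spectral coefficients approaches a Gaussian distribution as the frame length increases. However, this Gaussian approximation holds only in the central region of the Gaussian curve near the mean. In the tails, far from the mean, the approximation is poor [67].

  Porter and Boll [46] pointed out that the a priori speech spectrum has a gamma rather than a Gaussian probability density. They proposed computing optimal estimators directly from speech data instead of from parametric models of speech statistics. Martin [40] considered a gamma speech model in which the real and imaginary parts of the clean speech spectral components are modeled as independent, identically distributed gamma random variables. Assuming statistically independent spectral components, Martin derived MMSE estimators of the complex speech spectral coefficients under Gaussian and Laplacian noise models. Under the Gaussian noise model, the gamma speech model gave a large improvement in segmental SNR over the Gaussian speech model. Under the Laplacian noise model, it gave less residual musical noise than the Gaussian speech model.

  Martin and Breithaupt [45] modeled the real and imaginary parts of the speech spectral coefficients as Laplacian random variables. The resulting MMSE estimator of the complex speech spectral coefficients behaves similarly to the one derived under the gamma speech model, but is easier to compute and implement. Breithaupt and Martin [68] used the same statistical model to derive an MMSE estimator of the squared magnitude of the spectral coefficients. Compared with the corresponding estimator under the Gaussian speech model, its improvement in segmental SNR comes at the cost of a higher level of residual musical noise. Lotter and Vary [69] derived a maximum a posteriori (MAP) estimator of the speech spectral amplitude, based on a Gaussian noise model and a super-Gaussian speech model. They proposed a parametric probability density function (pdf) for the speech spectral amplitude that, with suitably chosen parameters, approximates gamma and Laplacian densities. Compared with the Ephraim-Malah MMSE spectral amplitude estimator [12], their MAP estimator under the Laplacian speech model clearly improves noise suppression.

  The Gaussian, gamma, and Laplacian speech models all take into account the temporal correlation between successive speech spectral components. In the STFT domain, analysis frames have finite length and successive frames overlap. Spectral components are therefore usually assumed to be statistically correlated across both time and frequency [15]. Speech enhancement experiments [16,43] show that the usefulness of the Gaussian, gamma, and Laplacian models depends largely on the choice of a priori SNR estimator:
  - With the decision-directed a priori SNR estimator, the gamma speech model has more advantages than the Gaussian speech model.
  - With the noncausal recursive a priori SNR estimator, the Laplacian speech model achieves higher segmental SNR and lower log-spectral distortion (LSD) than the other models, while the Gaussian speech model yields the lowest residual musical noise level.
  - The differences among the three models are smaller with the noncausal recursive estimator than with the decision-directed estimator.

  Estimators that minimize the MSE of the spectral amplitude or of the log-spectral amplitude are better suited to speech enhancement than the MMSE estimator of the complex spectral coefficients. Moreover, a closed-form expression for the MMSE-LSA estimator exists only under the Gaussian speech model. For the gamma and Laplacian speech models, its derivation is extremely difficult or impossible, and such an expression may not exist at all. Consequently, the MMSE-LSA estimator under the Gaussian speech model is the most common choice [2].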
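  For reference, the MMSE-LSA gain of Ephraim and Malah [17] under the Gaussian model has the closed form G = ξ/(1+ξ) · exp(½ E1(v)), with v = ξγ/(1+ξ). Here ξ is the a priori SNR, γ the a posteriori SNR |Y|²/λ_d, and E1 the exponential integral. The sketch below evaluates this gain with `scipy.special.exp1`; the function names are our own. Since E1(v) > 0, the LSA gain is never smaller than the Wiener gain ξ/(1+ξ), and it approaches 1 at high SNR.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1


def wiener_gain(xi):
    """Wiener gain xi / (1 + xi) for a priori SNR xi."""
    return xi / (1.0 + xi)


def mmse_lsa_gain(xi, gamma):
    """MMSE log-spectral amplitude gain under Gaussian speech and noise models.

    xi    : a priori SNR
    gamma : a posteriori SNR |Y|^2 / lambda_d
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = xi * gamma / (1.0 + xi)
    return wiener_gain(xi) * np.exp(0.5 * exp1(v))
```

  Multiplying the noisy STFT coefficients Y(k,ℓ) by this gain gives the MMSE-LSA estimate of the clean speech spectrum, with the noisy phase retained.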
  8.2 Choice of A Priori SNR Estimator
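  Ephraim and Malah [12,70] proposed three a priori SNR estimators:
  - The maximum likelihood (ML) estimator treats the speech spectral variance as a slowly time-varying parameter. This results in residual musical noise that degrades the perceived quality of the enhanced speech.
  - The decision-directed (DD) estimator is particularly suited for use with the MMSE-SA or MMSE-LSA estimators, because the residual noise in the enhanced speech is then perceptually colorless. However, the DD estimator was derived heuristically, and because it is highly nonlinear its performance is not yet understood theoretically.
  - The maximum a posteriori (MAP) estimator relies on a first-order Markov model for the sequence of speech spectral variances. It involves a set of nonlinear equations that can be solved recursively with the Viterbi algorithm. It is computationally more demanding than the DD estimator, yet it does not significantly improve the quality of the enhanced speech [70].

  For more than two decades, the DD approach has been widely used to estimate the variances of speech spectral coefficients. Its parameters, however, are usually tuned by simulations and subjective listening tests for each particular enhancement algorithm and time-frequency transform setting. Note that the DD approach needs no information about the speech statistical model. Its parameters therefore need not adapt to the speech spectral components and can be set in advance.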
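  The decision-directed rule is usually written as ξ(k,ℓ) = α G²(k,ℓ−1) γ(k,ℓ−1) + (1−α) max{γ(k,ℓ)−1, 0}, with the result floored at ξ_min. The sketch below follows that form under several assumptions of our own:
  - The Wiener gain ξ/(1+ξ) stands in for G, whereas the paper pairs DD with MMSE-SA or MMSE-LSA.
  - α = 0.98 and ξ_min = −25 dB are common values, not ones prescribed here.
  - The first frame uses the ML-style term alone.

```python
import numpy as np


def decision_directed_xi(gamma, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Decision-directed a priori SNR for a (frames x bins) array of a posteriori SNRs.

    Assumption: the spectral gain G is the Wiener gain xi / (1 + xi), so the
    previous-frame term |A_hat(k, l-1)|^2 / lambda_d equals G^2 * gamma(k, l-1).
    """
    gamma = np.asarray(gamma, dtype=float)
    xi = np.empty_like(gamma)
    # First frame: ML-style estimate only, floored at xi_min.
    xi[0] = np.maximum(gamma[0] - 1.0, xi_min)
    for l in range(1, gamma.shape[0]):
        g_prev = xi[l - 1] / (1.0 + xi[l - 1])
        prev_term = g_prev ** 2 * gamma[l - 1]      # amplitude estimate of the previous frame
        ml_term = np.maximum(gamma[l] - 1.0, 0.0)   # instantaneous (ML) estimate
        xi[l] = np.maximum(alpha * prev_term + (1.0 - alpha) * ml_term, xi_min)
    return xi
```

  In noise-only bins (γ ≈ 1) the estimate settles at the floor ξ_min. The heavy weight α on the previous frame is what smooths out the frame-to-frame fluctuations that produce musical noise.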
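  Cohen [53,78] proposed two further a priori SNR estimators: causal recursive estimation (CRE) and noncausal recursive estimation (NCRE). CRE of the a priori SNR consists of two steps, a propagation step and an update step. Following the principle of Kalman filtering, it recursively predicts and updates the estimate of the speech spectral variance as new data arrive. CRE is closely related to the DD estimator: a DD estimator with a time-varying, frequency-dependent smoothing factor is in fact a special case of CRE. This smoothing factor is a monotonically decreasing function of the instantaneous SNR. It therefore takes larger values during speech absence and smaller values during speech presence, which reduces both residual musical noise and distortion of the enhanced speech. However, the improvement of CRE over the DD estimator is not substantial.

  NCRE of the a priori SNR uses future spectral measurements to better predict the spectral variance of the clean speech. Experimental comparisons show that CRE and NCRE differ mainly at speech onsets. Neither CRE nor the DD estimator can respond quickly to a sudden increase in the instantaneous SNR, since responding quickly would raise the level of residual musical noise. NCRE, in contrast, uses a few available future noisy spectral observations. It can therefore tell whether a sudden increase in the instantaneous SNR comes from a speech onset or from a noise irregularity, and respond quickly to genuine onsets.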
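  Therefore, when the delay between the enhanced signal and the noisy observation must be minimized, the DD estimator is recommended. In applications such as digital speech recording, monitoring, and speech recognition, a delay of a few frames between the enhanced signal and the noisy observation is acceptable, and NCRE is recommended.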
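  The CRE discussion above describes a DD-style update whose smoothing factor decreases monotonically with the instantaneous SNR; the sketch below illustrates that mechanism. The mapping `snr_dependent_alpha` is hypothetical and chosen only for illustration. It is not the smoothing factor derived by Cohen [53,78]. The sketch rests on these assumptions:
  - The instantaneous SNR is taken as max(γ−1, 0).
  - `prev_term` stands for G²(k,ℓ−1)γ(k,ℓ−1) from the previous frame.
  - The parameter values are arbitrary.

```python
import numpy as np


def snr_dependent_alpha(gamma, alpha_max=0.98, alpha_min=0.6, snr0=3.0):
    """Hypothetical smoothing factor, monotonically decreasing in the
    instantaneous SNR max(gamma - 1, 0): near alpha_max in noise-only bins,
    approaching alpha_min in strong-speech bins."""
    inst_snr = np.maximum(np.asarray(gamma, dtype=float) - 1.0, 0.0)
    return alpha_min + (alpha_max - alpha_min) / (1.0 + inst_snr / snr0)


def dd_update(prev_term, gamma, xi_min=10 ** (-25 / 10)):
    """One DD-style a priori SNR update with the SNR-dependent smoothing factor.

    prev_term : G^2(k, l-1) * gamma(k, l-1) from the previous frame
    gamma     : a posteriori SNR of the current frame
    """
    alpha = snr_dependent_alpha(gamma)
    ml_term = np.maximum(np.asarray(gamma, dtype=float) - 1.0, 0.0)
    return np.maximum(alpha * prev_term + (1.0 - alpha) * ml_term, xi_min)
```

  With γ = 1 the factor stays at 0.98 and the estimate remains at the floor. With γ = 101 after a silent frame, the factor drops to about 0.61, so the estimate jumps to roughly 39 instead of the roughly 2 that a fixed α = 0.98 would give.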
  8.3 Choice of Noise Estimator
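  Traditional noise estimation methods usually estimate the noise by recursive averaging during speech pauses and hold the estimate fixed during speech activity. These methods therefore generally require a voice activity detector (VAD), and the noise estimate can be updated only during speech pauses. It is well known that VAD reliability degrades severely for weak speech components and low input SNR [65,71,72], which in turn sharply degrades traditional noise estimators.

  Another class of noise estimation techniques is based on histograms in the power spectral domain [55,73,74]. These techniques avoid a VAD, but they are computationally expensive, require a large amount of memory, and perform poorly at low SNR. Moreover, the signal segments used to build the histograms are typically several hundred milliseconds long. The update rate of the noise estimate is therefore inherently moderate rather than fast.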
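  The traditional VAD-gated scheme can be sketched as follows, with the VAD decisions taken as given. The smoothing constant `alpha = 0.9` and the assumption that the first few frames contain only noise are our own choices. The function name is hypothetical.

```python
import numpy as np


def vad_gated_noise_estimate(noisy_psd, speech_active, alpha=0.9, init_frames=5):
    """Traditional noise PSD tracking (illustrative sketch).

    Recursive averaging in frames the VAD marks as noise-only; the estimate
    is frozen in frames marked as speech.

    noisy_psd     : (frames x bins) array of |Y(k, l)|^2
    speech_active : boolean array of length `frames`, from any VAD
    """
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    noise = np.empty_like(noisy_psd)
    # Assumption: the first `init_frames` frames are noise-only.
    lam = noisy_psd[:init_frames].mean(axis=0)
    for l in range(noisy_psd.shape[0]):
        if not speech_active[l]:
            lam = alpha * lam + (1.0 - alpha) * noisy_psd[l]
        noise[l] = lam
    return noise
```

  Consider a PSD that jumps from 1 to 100 at frame 10. A VAD that flags the jump keeps the estimate at 1. A VAD that misses it lets the estimate climb to roughly 65 after ten frames, which is the failure mode described above.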
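  Martin proposed an effective noise estimation technique called minimum statistics [59]. It tracks the minimum of the smoothed power spectrum of the noisy signal and multiplies that minimum by a bias compensation factor to obtain an unbiased noise estimate. However, the variance of this estimate is about twice that of traditional methods [59]. It may also occasionally attenuate low-energy phonemes, especially when the minimum search window is short [75]. To overcome these drawbacks, the smoothing parameter and the bias compensation factor must vary adaptively in time and frequency [56].

  Doblinger proposed a computationally efficient minimum tracking method [57]. Its drawbacks are (1) a slow update of the noise estimate when the noise level increases suddenly, and (2) a tendency to cancel the signal. Related techniques include low-energy envelope tracking [55] and quantile-based estimation [76]. Instead of taking the minimum of the smoothed periodogram, these methods estimate the noise from a temporal quantile of the unsmoothed periodogram of the noisy signal. Their drawbacks are high computational complexity and the extra memory needed to store past spectral power values.

  The IMCRA noise estimator proposed by Cohen [54] combines the simplicity of recursive averaging with the robustness of minimum tracking. Its smoothing parameter varies adaptively in time and frequency according to the speech presence probability, so the noise estimate is updated continuously, even during weak speech activity. The estimator is controlled by the minima of the smoothed periodogram of the noisy measurements. It jointly uses conditions on the instantaneous and local measured power to provide a soft transition between speech presence and absence, which prevents occasional increases of the noise estimate during speech activity. Furthermore, smoothing and minimum tracking are carried out in two separate iterations. This allows a larger smoothing window and a shorter minimum search window while still tracking the minima reliably during strong speech activity. As a result, the variance of the minima is reduced and the delay in responding to increases in noise power is shortened. IMCRA is particularly useful in nonstationary noise environments and at low SNR [54].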
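  A simplified sketch of two of these ideas is given below. It assumes a (frames × bins) array of periodogram values |Y(k,ℓ)|².
  - `minimum_statistics_noise` scales the minimum of the recursively smoothed periodogram by a fixed bias factor.
  - `mcra_noise` follows the MCRA structure of Cohen and Berdugo [77]. It compares the smoothed power with that minimum to make a hard speech-presence decision, smooths the decision into a presence probability, and lets this probability set the recursive-averaging factor.

  This is not IMCRA. The following are illustrative assumptions: the fixed bias of 1.5 (Martin [56] derives an adaptive compensation instead), the sliding-window minimum, the omitted frequency smoothing, and the parameter values (α_s = 0.8, α_d = 0.95, α_p = 0.2, δ = 5, a 100-frame window).

```python
import numpy as np


def smoothed_minimum(noisy_psd, alpha_s=0.8, win=100):
    """Recursively smoothed periodogram S(k, l) and its minimum over the last `win` frames."""
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    S = np.empty_like(noisy_psd)
    S[0] = noisy_psd[0]
    for l in range(1, len(noisy_psd)):
        S[l] = alpha_s * S[l - 1] + (1.0 - alpha_s) * noisy_psd[l]
    S_min = np.empty_like(S)
    for l in range(len(S)):
        S_min[l] = S[max(0, l - win + 1): l + 1].min(axis=0)
    return S, S_min


def minimum_statistics_noise(noisy_psd, bias=1.5, **kw):
    """Minimum-statistics style estimate: bias-compensated minimum of the
    smoothed periodogram (fixed `bias` is a placeholder assumption)."""
    _, S_min = smoothed_minimum(noisy_psd, **kw)
    return bias * S_min


def mcra_noise(noisy_psd, alpha_d=0.95, alpha_p=0.2, delta=5.0, **kw):
    """MCRA-style estimate: recursive averaging with a smoothing factor driven
    by a speech-presence probability derived from the ratio S / S_min."""
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    S, S_min = smoothed_minimum(noisy_psd, **kw)
    p = np.zeros(noisy_psd.shape[1])
    lam = noisy_psd[0].copy()
    noise = np.empty_like(noisy_psd)
    for l in range(len(noisy_psd)):
        speech = (S[l] / np.maximum(S_min[l], 1e-12)) > delta  # hard per-bin decision
        p = alpha_p * p + (1.0 - alpha_p) * speech            # smoothed presence probability
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p           # time-varying smoothing factor
        lam = alpha_tilde * lam + (1.0 - alpha_tilde) * noisy_psd[l]
        noise[l] = lam
    return noise
```

  On a stationary input, the MCRA-style estimate stays at the noise level. During a 20-frame burst 1000 times stronger than the noise, it rises only to roughly 13. The presence probability quickly pushes the smoothing factor toward 1, so the burst barely leaks into the estimate.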
  9. Conclusions
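  This paper has systematically discussed and described statistical models of speech and noise signals in the STFT domain, and has derived speech spectral coefficient estimators under speech presence uncertainty. The statistical models take into account the temporal correlation between successive speech spectral components. The speech spectral estimators involve noise power spectrum estimation, computation of the speech presence probability, and a priori SNR estimation under speech presence uncertainty.

  We discussed the behavior of the MMSE, MMSE-SA, and MMSE-LSA spectral gain functions, and why MMSE-LSA resists musical noise. In noise-only frames, local outliers of the a posteriori SNR are pulled back to the average noise level, which avoids local accumulations of noise above the average. The a priori speech presence probability estimator exploits the strong correlation of speech presence across neighboring frequency bins of consecutive frames. It can therefore reduce noise components further while avoiding clipping of speech onsets and missed detection of weak speech tails.

  We also presented and discussed several a priori SNR estimators under speech presence uncertainty. In a particular case, the causal recursive estimator reduces to a decision-directed estimator with a time-varying, frequency-dependent smoothing factor. In applications that tolerate a delay of a few frames between the enhanced speech and the noisy observation, the noncausal recursive estimator achieves lower signal distortion and less residual musical noise than the causal recursive estimator. In addition, several noise power spectrum estimation methods were discussed in detail, with emphasis on the MS, MCRA, and IMCRA estimators. Finally, we gave our views on choosing the statistical model and fidelity criterion, the a priori SNR estimator, and the noise power spectrum estimator in speech enhancement algorithms.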
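  Speech enhancement remains an active research field. As our understanding and use of the statistical properties of speech and of the human auditory system deepen, the performance of speech enhancement systems will continue to improve. We believe that future research on speech enhancement is likely to focus on the following directions:
  (1) What are meaningful optimality criteria for speech enhancement, and how can they be described and expressed mathematically?
  (2) Which signal analysis transform (for example, the short-time Fourier transform, subband transforms, or wavelet transforms) is best suited to speech enhancement?
  (3) How can the perceived quality of the enhanced signal be improved without compromising its intelligibility? And how can its intelligibility be improved without compromising its perceived quality?
  (4) How can signal theory and perceptual techniques be integrated effectively in speech enhancement?
  (5) What processing is optimal for signals perceived by listeners with normal or impaired hearing? What processing is optimal for signals handled by speech coders or speech recognition systems? How are these separately optimal processing techniques related to one another?
  (6) What processing takes place in the higher stages of the human auditory system, and how can it be modeled mathematically?
  We firmly believe that speech enhancement will undergo revolutionary change, driven by the rapid development of electronics, information technology, and artificial intelligence, and by deeper research into the auditory system of the human brain.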
  References
  [1]J. Benesty, S. Makino & J. Chen (Eds.): “Speech Enhancement,” Springer, Berlin, Heidelberg 2005.
  [2]Y. Ephraim & I. Cohen, “Recent advancements in speech enhancement,” In: The Electrical Engineering Handbook, Circuits, Signals, and Speech and Image Processing, 3rd Edn., Ed. by R.C. Dorf (CRC, Boca Raton 2006), pp. 15-12 to 15-26, Chap. 15.
  [3]Y. Ephraim, H. Lev-Ari & W.J.J. Roberts, “A brief survey of speech enhancement,” In: The Electronic Handbook, 2nd Edn. (CRC Press, Boca Raton 2005).
  [4]S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust. Speech Signal Processing, Vol. 27, No. 2, 1979, pp. 113-120.
  [5]J. S. Lim & A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of IEEE, Vol. 67, No. 12, 1979, pp. 1586-1604.
  [6]M. Berouti, R. Schwartz, & J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc. 4th ICASSP 79, 1979, pp. 208-211.
  [7]Z. Goh, K.-C. Tan, & T. G. Tan, “Postprocessing method for suppression musical noise generated by spectral subtraction,” IEEE Trans. Speech Audio Process., Vol. 6, No. 3, 1998, pp. 287-292.
  [8]B. L. Sim, Y. C. Tong, J. S. Chang, & C. T. Tan, “A parametric formulation of the generalized spectral subtraction method,” IEEE Trans. Speech Audio Process., Vol. 6, No. 3, 1998, pp. 328-337.
  [9]H. Gustafsson, S. E. Nordholm, & I. Claesson, “Spectral subtraction using reduced delay convolution and adaptive averaging,” IEEE Trans. Speech Audio Process., Vol. 9, No. 8, 2001, pp. 799-807.
  [10]D. E. Tsoukalas, J. N. Mourijopoulos & G. Kokkinakis, “Speech enhancement based on audible noise suppression,” IEEE Trans. Speech Audio Process., Vol. 5, No. 6, 1997, pp. 497-514.
  [11]N. Virag, “Single channel speech enhancement based on masking properties of the human auditory system,” IEEE Trans. Speech Audio Process., Vol. 7, No. 2, 1999, pp. 126-137.
  [12]Y. Ephraim & D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust. Speech Signal Process., Vol. 32, No. 6, 1984, pp. 1109-1121.
  [13]D. Burshtein & S. Gannot, “Speech enhancement using a mixture-maximum model,” IEEE Trans. Speech Audio Process., Vol. 10, No. 6, 2002, pp. 341-351.
  [14]R. Martin, “Speech enhancement based on minimum mean-square error estimation and super-gaussian priors,” IEEE Trans. Speech Audio Process., Vol. 13, No. 5, 2005, pp. 845-856.
  [15]I. Cohen, “Relaxed statistical model for speech enhancement and a priori SNR estimation,” IEEE Trans. Speech Audio Process., Vol. 13, No. 5, 2005, pp. 870-881.
  [16]I. Cohen, “Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models,” Signal Process., Vol. 86, No. 4, 2006, pp. 698-709.
  [17]Y. Ephraim & D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Trans. Acoust. Speech Signal Process. Vol. 33, No. 2, 1985, pp. 443-445.
  [18]P. J. Wolfe & S. J. Godsill, “Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement,” Special Issue EURASIP JASP Digital Audio Multim. Commun., No. 10, 2003, pp. 1043-1051.
  [19]P. C. Loizou, “Speech enhancement based on perceptually motivated Bayesian estimators of the magnitude spectrum,” IEEE Trans. Speech Audio Process., Vol. 13, No. 5, 2005, pp. 857-869.
  [20]B. H. Juang & L. R. Rabiner, “Mixture autoregressive hidden Markov models for speech signals,” IEEE Trans. Acoust. Speech Signal Process. Vol. 33, No. 6, 1985, pp. 1404-1413.
  [21]Y. Ephraim, “Statistical-model-based speech enhancement systems,” Proceedings of IEEE, Vol. 80, No. 10, 1992, pp. 1526-1555.
  [22]H. Sheikhzadeh & L. Deng, “Waveform-based speech recognition using hidden filter models: Parameter selection and sensitivity to power normalization,” IEEE Trans. Speech Audio Process., Vol. 2, 1994, pp. 80-91.
  [23]Y. Ephraim & N. Merhav, “Hidden Markov processes,” IEEE Trans. Inform. Theory, Vol. 48, No. 6, 2002, pp. 1518-1568.
  [24]C. J. Wellekens, “Explicit time correlations in hidden Markov models for speech recognition,” Proc. 12th ICASSP 87, 1987, pp.384-386.
  [25]H. Sameti, H. Sheikhzadeh, L. Deng & R. L. Brennan, “HMM-based strategies for enhancement of speech signals embedded in nonstationary noise,” IEEE Trans. Speech Audio Process., Vol. 6, No. 5, 1998, pp. 445-455.
  [26]L. R. Rabiner & B.-H. Juang, “Fundamentals of speech recognition,” Prentice-Hall, Upper Saddle River, 1993.
  [27]F. Jelinek, “Statistical methods for speech recognition,” MIT Press, Cambridge 1998.
  [28]Y. Ephraim & H. L. V. Trees, “A signal subspace approach for speech enhancement,” IEEE Trans. Speech Audio Process., Vol. 3, No. 4, 1995, pp. 251-266.
  [29]F. Asano, S. Hayamizu, T. Yamada & S. Nakamura, “Speech enhancement based on the subspace method,” IEEE Trans. Speech Audio Process., Vol. 8, No. 5, 2000, pp. 497-507.
  [30]U. Mittal & N. Phamdo, “Signal / noise KLT based approach for enhancing speech degraded by colored noise,” IEEE Trans. Speech Audio Process., Vol. 8, No. 2, 2000, pp. 159-167.
  [31]Y. Hu & P. C. Loizou, “A generalized subspace approach for enhancing speech corrupted by colored noise,” IEEE Trans. Speech Audio Process., Vol. 11, No. 4, 2003, pp. 334-341.
  [32]S. H. Jensen, P. C. Hansen, S. D. Hansen & J. A. Sørensen, “Reduction of broad-band noise in speech by truncated QSVD,” IEEE Trans. Speech Audio Process., Vol. 3, No. 6, 1995, pp. 439-448.
  [33]S. Doclo & M. Moonen, “GSVD-based optimal filtering for single and multimicrophone speech enhancement,” IEEE Trans. Signal Process., Vol. 50, No. 9, 2002, pp. 2230-2244.
  [34]F. Jabloun & B. Champagne, “Incorporating the human hearing properties in the signal subspace approach for speech enhancement,” IEEE Trans. Speech Audio Process., Vol. 11, No. 6, 2003, pp. 700-708.
  [35]Y. Hu & P. C. Loizou, “A perceptually motivated approach for speech enhancement,” IEEE Trans. Speech Audio Process., Vol. 11, No. 5, 2003, pp. 457-465.
  [36]J. Wexler & S. Raz, “Discrete Gabor expansions,” Signal Process., Vol. 21, No. 3, 1990, pp. 207-220.
  [37]R. E. Crochiere & L. R. Rabiner, “Multirate digital signal processing,” Prentice-Hall, Englewood Cliffs, 1983.
  [38]J. S. Garofolo, “Getting Started with the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database,” NIST, Gaithersburg 1988.
  [39]A. Stuart & J. K. Ord, “Kendall’s Advanced Theory of Statistics,” Vol. 1, 6th Edn, Arnold, London, 1994.
  [40]R. Martin, “Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors,” Proc. 27th ICASSP’02, 2002, pp. 253-256.
  [41]I. Cohen, “Modeling speech signals in time-frequency domain using GARCH,” Signal Process., Vol. 84, No. 12, 2004, pp. 2453-2459.
  [42]I. Cohen & B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal Process., Vol. 81, No. 11, 2001, pp. 2403-2418.
  [43]I. Cohen, “Speech enhancement using super-Gaussian speech models and noncausal a priori SNR estimation,” Speech Commun., Vol. 47, No. 3, 2005, pp. 336-350.
  [44]I. S. Gradshteyn & I. M. Ryzhik, “Table of Integrals, Series, and Products,” 4th Edn., Academic Press, New York, 1980.
  [45]R. Martin & C. Breithaupt, “Speech enhancement in the DFT domain using Laplacian speech priors,” in Proc. 8th Int. Workshop on Acoustic Echo and Noise Control, Kyoto, Japan, 2003, pp. 87-90.
  [46]J. Porter & S. Boll, “Optimal estimators for spectral restoration of noisy speech,” in Proc. ICASSP’84, 1984, pp. 18A.2.1-18A.2.4.
  [47]O. Cappé, “Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor,” IEEE Trans. Speech Audio Process., Vol. 2, No. 2, 1994, pp. 345-349.
  [48]P. Scalart & J. Vieira-Filho, “Speech enhancement based on a priori signal to noise estimation,” Proc. 21st ICASSP’96, 1996, pp. 629-632.
  [49]D. Malah, R. V. Cox & A. J. Accardi, “Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments,” Proc. 24th ICASSP’99, 1999, pp. 789-792.
  [50]I. Cohen, “On speech enhancement under signal presence uncertainty,” Proc. 26th ICASSP’01, 2001, pp.167-170.
  [51]I. Y. Soon, S. N. Koh, & C. K. Yeo, “Improved noise suppression filter using self-adaptive estimator of probability of speech absence,” Signal Process., Vol. 75, No. 2, 1999, pp. 151-159.
  [52]M. Marzinzik, “Noise reduction schemes for digital hearing aids and their use for the hearing impaired,” Ph.D. Thesis, Oldenburg Univ., Oldenburg, 2000.
  [53]I. Cohen, “Speech enhancement using a noncausal a priori SNR estimator,” IEEE Signal Process. Lett., Vol. 11, No. 9, 2004, pp. 725-728.
  [54]I. Cohen, “Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging,” IEEE Trans. Speech Audio Process., Vol. 11, No. 5, 2003, pp. 466-475.
  [55]C. Ris & S. Dupont, “Assessing local noise level estimation methods: Application to noise robust ASR,” Speech Commun., Vol. 34, No.1-2, 2001, pp. 141-158.
  [56]R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Trans. Speech Audio Process., Vol. 9, No. 5, 2001, pp. 504-512.
  [57]G. Doblinger, “Computationally efficient speech enhancement by spectral minima tracking in subbands,” Proc. 4th Eurospeech’95, 1995, pp. 1513-1516.
  [58]S. Qian & D. Chen, “Discrete Gabor transform,” IEEE Trans. Signal Process., Vol. 41, No. 7, 1993, pp. 2429-2438.
  [59]R. Martin, “Spectral subtraction based on minimum statistics,” Proc. 7th EUSIPCO’94, 1994, pp. 1182-1185.
  [60]A. Varga & H. J. M. Steeneken, “Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Commun., Vol. 12, No. 3, 1993, pp. 247-251.
  [61]S. R. Quackenbush, T. P. Barnwell & M. A. Clements, “Objective Measures of Speech Quality,” Prentice-Hall, Englewood Cliffs, 1988.
  [62]J. R. Deller, J. H. L. Hansen & J. G. Proakis, “Discrete-Time Processing of Speech Signals,” 2nd Edn., IEEE Press, New York, 2000.
  [63]P. E. Papamichalis, “Practical Approaches to Speech Coding,” Prentice-Hall, Englewood Cliffs, 1987.
  [64]A. J. Accardi & R. V. Cox, “A modular approach to speech enhancement with an application to speech coding,” Proc. 24th ICASSP’99, 1999, pp. 201-204.
  [65]J. Sohn, N. S. Kim & W. Sung, “A statistical model-based voice activity detector,” IEEE Signal Process. Lett., Vol. 6, No. 1, 1999, pp. 1-3.
  [66]T. Lotter, C. Benien & P. Vary, “Multichannel speech enhancement using Bayesian spectral amplitude estimation,” Proc. 28th ICASSP’03, 2003, pp. 832-835.
  [67]J. W. B. Davenport, “Probability and Random Processes: An Introduction for Applied Scientists and Engineers,” McGraw-Hill, New York, 1970.
  [68]C. Breithaupt & R. Martin, “MMSE estimation of magnitude-squared DFT coefficients with super-Gaussian priors,” Proc. 28th ICASSP’03, 2003, pp. 896-899.
  [69]T. Lotter & P. Vary, “Noise reduction by maximum a posteriori spectral amplitude estimation with super-Gaussian speech modeling,” Proc. 8th Internat. Workshop on Acoustic Echo and Noise Control, 2003, pp. 83-86.
  [70]Y. Ephraim & D. Malah, “Signal to Noise Ratio Estimation for Enhancing Speech Using the Viterbi Algorithm,” Tech. Rep., EE PUB 489, Technion-Israel Institute of Technology, Haifa, 1984.
  [71]J. Meyer, K. U. Simmer & K. D. Kammeyer, “Comparison of one- and two-channel noise-estimation techniques,” Proc. 5th IWAENC’97, 1997, pp. 134-145.
  [72]B. L. McKinley & G. H. Whipple, “Model based speech pause detection,” Proc. 22nd ICASSP’97, 1997, pp. 1179-1182.
  [73]R. J. McAulay & M. L. Malpass, “Speech enhancement using a soft-decision noise suppression filter,” IEEE Trans. Acoust. Speech Signal Process., Vol. 28, No. 2, 1980, pp. 137-145.
  [74]H. G. Hirsch & C. Ehrlicher, “Noise estimation techniques for robust speech recognition,” Proc. 20th ICASSP’95, 1995, pp. 153-156.
  [75]I. Cohen & B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal Process., Vol. 81, No. 11, 2001, pp. 2403-2418.
  [76]V. Stahl, A. Fischer & R. Bippus, “Quantile based noise estimation for spectral subtraction and Wiener filtering,” Proc. 25th ICASSP’00, 2000, pp. 1875-1878.
  [77]I. Cohen & B. Berdugo, “Noise estimation by minima controlled recursive averaging for robust speech enhancement,” IEEE Signal Process. Lett., Vol. 9, No. 1, 2002, pp. 12-15.
  [78]I. Cohen, “Relaxed statistical model for speech enhancement and a priori SNR estimation,” IEEE Trans. Speech Audio Process., Vol. 13, No. 5, 2005, pp. 870-881.