Forensic Speech and Acoustics Research in 2017

2018-02-21 16:54 Kang Jintao, Wang Xiaodi, Li Jingyang, Huang Wenlin

Forensic Science and Technology, 2018, Issue 3

Kang Jintao, Wang Li, Wang Xiaodi, Sheng Hui, Li Jingyang, Huang Wenlin

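(Institute of Forensic Science, Ministry of Public Security; 2011 Plan Collaborative Innovation Center of Judicial Civilization, Beijing 100038)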

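In China, forensic speech and acoustics corresponds to voiceprint examination in the broad sense. It covers forensic phonetic examinations (speaker identification, speaker profiling, and speech content recognition) and forensic acoustic examinations (audio authenticity examination, noise reduction and speech enhancement, noise analysis, sound source identification, and recording device identification) [1]. The research scope abroad is broadly the same as in China [2]. In 2017, speaker identification remained the core of the field, with new results in auditory analysis, phonetic-acoustic analysis, automatic recognition, and quality control. In speaker profiling, speech emotion analysis became an important topic alongside traditional attributes such as sex and age, and automatic methods developed rapidly. Researchers in many countries also broke new ground in audio authenticity examination and in noise reduction and speech enhancement. This paper introduces representative 2017 results in four active areas of the field: speaker identification, speaker profiling, audio authenticity examination, and noise reduction and speech enhancement.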

1 Speaker identification

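In China, speaker identification corresponds to voiceprint examination in the narrow sense [3], and it is the main branch of forensic speech and acoustics practice [4]. Internationally, the great majority of institutions and practitioners use expert examination that combines auditory and acoustic analysis [5], though some institutions have begun to bring automatic recognition into the field, working with semi-automatic (expert-supervised) or fully automatic methods [6-7]. In 2017, the literature on speaker identification concentrated on auditory analysis methods, the analysis of phonetic and acoustic features, the evidential value of speech features, the expression of conclusions, automatic recognition technology, and quality control and standardization of the examination process.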

1.1 Auditory analysis

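Auditory analysis is an important component of current speaker identification methods [1,8-10] and has long been specified in many standards in China and abroad [11-16]. In 2017, Sundqvist et al. [17] designed an auditory analysis procedure and applied it in casework at the Swedish National Forensic Centre (NFC). To make auditory analysis more systematic and standardized, Lindh et al. [18] examined its reliability by comparing auditory analysis with automatic recognition on Finnish speakers, and used the results to improve the speaker identification workflow at the Finnish National Bureau of Investigation (NBI). Leinonen et al. [19] proposed building language-specific sets of auditory features and made a first attempt for Swedish and Finnish. Land et al. [20] discussed the value of laughter in auditory analysis. On disguised speech, Skarnitzl and Růžičková [21-22] studied the disguise strategies commonly used by Czech speakers and gave a preliminary analysis of the auditory and acoustic features under different disguise strategies, while Delvaux et al. [23] examined how auditory and acoustic features differ between disguise and impersonation.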

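The application of Vocal Profile Analysis (VPA) to speaker identification has been a focus of auditory analysis research in recent years [24-29], and many researchers continued to explore it in 2017. To simplify the analysis, Segundo [30] designed a simplified VPA protocol and applied it to the auditory analysis of identical twins; Segundo et al. [31] verified that the protocol remains valid in Spanish, German, and English contexts. Klug [32] discussed improvements to the VPA scheme and proposed refining its feature categories on the basis of strengthened training. Hughes et al. [33-34] examined VPA scores together with automatic recognition. Fusing automatic systems based on Mel-frequency cepstral coefficients (MFCC) and on long-term formant distributions (LTFD) gave only a limited gain, whereas adding the VPA scores increased recognition accuracy significantly.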

1.2 Phonetic-acoustic analysis

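Auditory analysis and phonetic-acoustic analysis are complementary [35-36]: phonetic-acoustic analysis provides quantitative support for auditory analysis and can also supply new features [3]. In this area, Heuven et al. and Gold et al. [37-38] continued to analyze the acoustic features of filled pauses and hesitation markers in order to further explore their value for speaker identification. He et al. [39] studied how far between-speaker intensity variability is affected by noise and by the choice of frequency band, and found that speaker-specific intensity characteristics are well preserved across all frequency bands. How the acoustic features of bilingual speakers behave in each of their two languages is a long-standing research question; Dorreen et al. [40] studied long-term fundamental frequency (F0) distributions in this setting. Arantes et al. [41] examined how language, speaking style, and other factors affect the time needed for long-term F0 to stabilize, and found that speaking style has the largest effect. Dimos et al. and Lopez et al. [42-43] studied the rhythm, prosody, and spectral features of shouted speech. He et al. [44] studied the evidential value of intensity contours. Vowel spaces differ between languages; Varošanec-Škarić et al. [45] compared the vowel spaces of male speakers of Croatian, Serbian, and Slovenian, which provides some basis for cross-language speaker identification. McDougall et al. [46] compared syllable-based and time-based approaches to describing fluency. Wang et al. [47] studied the dynamic features of diphthongs in Standard Chinese and showed that diphthongs also have high evidential value. Heeren [48] discussed how the acoustic properties of [s] in telephone recordings vary across contexts. On the construction of voice profiles, Franchini [49] took the acoustic features of [l] as a case study, and Fingerling [50] explored reconstructing the vowel set of a second-language (L2) speaker.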

1.3 Value of speech features

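In speaker identification, the evidential value of a speech feature is a key consideration. Because speech features are dynamic, they show both within-speaker variation (differences within the same speaker) and between-speaker variation (differences across speakers); a feature with small within-speaker variation and large between-speaker variation has high evidential value. In 2017, attention focused on how speech features are distributed in the population. Rhodes et al. [51] argued that research on population distributions should at this stage be tied to actual casework. Hughes and Wormald [52] proposed WikiDialects, a database that collects the high-value features of dialects. Hughes et al. [53] raised four issues to consider when studying population distributions of speech features (control factors, specificity, error, and degree of certainty) and used the formant trajectories of the English diphthong [ai] to show how the feature distribution under different conditions can affect the outcome of a comparison. On whether speech features are stable within the questioned and known samples, Ajili et al. [57], building on earlier studies [54-56], proposed measuring the stability of acoustic parameters with an information-theoretic homogeneity measure.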
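The idea that a valuable feature varies little within a speaker and a lot between speakers can be illustrated with a simple variance ratio. The sketch below is not taken from any of the cited studies: the function name `f_ratio` and the F0 values are hypothetical, and the ratio of between-speaker to within-speaker variance is only one of several possible ways to quantify this property.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_speaker):
    """Between-speaker variance of the speaker means divided by the
    average within-speaker variance. Larger means more discriminating."""
    speaker_means = [mean(values) for values in samples_by_speaker.values()]
    between = pvariance(speaker_means)
    within = mean(pvariance(values) for values in samples_by_speaker.values())
    return between / within


# Hypothetical mean-F0 measurements (Hz), three recordings per speaker.
stable_feature = {
    "A": [110, 112, 111],
    "B": [140, 141, 139],
    "C": [170, 169, 171],
}
# Same speakers, but the feature now drifts strongly between recordings.
unstable_feature = {
    "A": [110, 150, 130],
    "B": [120, 160, 140],
    "C": [125, 165, 145],
}

print(f_ratio(stable_feature))    # large ratio: speakers are well separated
print(f_ratio(unstable_feature))  # small ratio: recordings overlap heavily
```

In this toy data the first feature separates the speakers cleanly while the second does not, which is the pattern the paragraph above describes in words.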

1.4 Expression of voiceprint examination conclusions

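How to express voiceprint examination conclusions has long been debated. Internationally, Rose and Morrison have consistently advocated a quantitative likelihood-ratio framework; Nolan and the great majority of practitioners in the UK use the UK Position Statement format; and most practitioners in continental Europe use a scale of probability grades. China mostly uses a five-level probability scale [11].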

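In 2017, French [58] in the UK adjusted his form of expression, moving from the consistency and distinctiveness judgments of the UK Position Statement framework [59] towards a graded probability scale. Under this framework, conclusions fall into 13 levels, in line with the standard recommended by the Association of Forensic Science Providers [60]. Vermeulen [61] of the Netherlands Forensic Institute (NFI) described the grounds on which NFI reaches a “strong support” conclusion: in casework, NFI gives this conclusion only when the features of the questioned and known samples are almost identical or when the speaker shows highly distinctive features such as a speech disorder.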

1.5 Speech databases and automatic recognition technology

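Dedicated speech databases for forensic speech and acoustics currently include DyViS, built by Nolan in the UK [62]; FVCD, built by Morrison in Australia [63]; AHUMADA, built by Ramos in Spain [64]; NFI-FRITS, built by van der Vloed in the Netherlands [65]; and FABIOLE, built by Ajili in France [66]. In China, the National Public Security Voiceprint Database remains the forensic speech database with the most enrolled speakers in the world. VoxCeleb [67], created in 2017, is a more recent example. Mainstream frameworks for automatic speaker recognition currently fall into two classes: the Gaussian mixture model with a universal background model (GMM-UBM), and probabilistic linear discriminant analysis (PLDA) in the i-vector space, where deep neural networks (DNN) have also begun to be used for feature extraction. The latter framework is newer and was therefore the focus of research in 2017. DNN-based feature extraction performs well but needs larger amounts of training data; the National Public Security Voiceprint Database already uses DNN-based feature extraction. Park et al. [68] added acoustic voice quality features to an automatic system of this kind, combined them with MFCC features, and significantly improved recognition of short utterances. To address the weakness of the conventional log-likelihood ratio (LLR) in handling within-speaker variation, Solewicz et al. [69] proposed a new performance indicator for automatic speaker recognition, the Null-Hypothesis LLR. Tschäpe [70] examined the errors of an i-vector system and found that the error rate falls sharply when regional information is added. Alexander [71] designed an i-vector-based system for automatic recognition in multi-speaker recordings. Milošević et al. [72] combined segmental features (SF) such as F0, formant frequencies, and formant bandwidths with an existing GMM-MFCC system and improved its recognition accuracy.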
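As a rough illustration of the first framework, the sketch below scores a set of feature frames against a speaker model and a universal background model and returns the average log-likelihood ratio. It is a minimal toy under stated assumptions, not the implementation of any cited system: the two-dimensional "features", the two-component models, and all function names are hypothetical, and real systems train the UBM on large corpora and adapt it to each speaker.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal GMM.
    `gmm` is a list of (weight, mean, variance) tuples."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """GMM-UBM score: speaker log-likelihood minus background log-likelihood."""
    return gmm_loglik(frames, speaker_gmm) - gmm_loglik(frames, ubm)


# Hypothetical background model and a speaker model whose first
# component mean has been shifted towards the speaker's data.
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
speaker = [(0.5, [0.5, 0.5], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]

same_speaker_frames = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.7]]
other_speaker_frames = [[-0.5, -0.5], [-0.4, -0.6], [-0.7, -0.3]]

print(llr_score(same_speaker_frames, speaker, ubm))   # positive
print(llr_score(other_speaker_frames, speaker, ubm))  # negative
```

A positive score means the frames fit the speaker model better than the background model; how such a score should be calibrated and reported is a separate question, which Section 1.4 touches on.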

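The role of automatic speaker recognition in speaker identification is still disputed. Courts in Germany, Spain, Sweden, and other countries have accepted conclusions from expert-supervised automatic methods in some cases, but given the performance of current systems this acceptance is limited in degree and hard to extend. In the UK, for example, French and Harrison of the JP French laboratory, acting as defence expert witnesses in the appeal R v Slade & Ors, submitted two sets of speaker identification evidence, one from expert examination and one from an automatic system, and the Court of Appeal rejected the conclusion of the automatic system. French [58] notes that this ruling does not directly end the prospect of using automatic systems in the UK. However, given the common-law tradition of precedent, unless automatic speaker recognition achieves a major technical breakthrough, not only the UK but also Canada, New Zealand, Australia, and the other Commonwealth countries (52 in total) will reject conclusions produced by automatic systems.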

2 Quality control and standardization

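On quality control, French et al. [73] called for transparency in the casework of voiceprint laboratories, an initiative they call “opening the blinds”, and described the examination workflow of the JP French laboratory in detail. Wagner [74] of the German BKA presented its standard operating procedure for speaker identification and demonstrated it on real cases. This trend towards transparency and standardization is the main direction of quality control in forensic speech and acoustics.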

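On standardization, China's Ministry of Public Security issued four public security industry standards for forensic speech and acoustics, covering speaker identification [11], audio authenticity examination [12], noise reduction and speech enhancement [13], and speaker profiling [14].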

3 Speaker profiling

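Speaker profiling means characterizing a speaker's social group attributes and individual attributes when only the voice is available and the person is not seen, or judging the person's social group attributes through the same combined analysis when the person is seen but the identity is unknown [4]. In casework it also involves analyzing the speaker's temporary and momentary states, for example inferring from speech whether the speaker smokes or uses drugs, or inferring the speaker's psychological state (speech emotion analysis) [75]. We also treat these as part of speaker profiling.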

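The frequency response of a cochlear implant has its own characteristics. Kovačić [76] studied how cochlear implants process sound signals and explored their potential for recognizing a speaker's sex, body size, and identity. Georg [77] studied how different German dialects affect age estimation. Tomić [78] studied inferring a speaker's regional origin from negative accent transfer. Jong-Lendle et al. [79] studied inferring a foreigner's native language from the accent in their German. Schwab et al. [80] studied the effect of smoking on the voice, and Rodmonga et al. [81] studied the auditory features of speech after drug use; both results can be used to analyze a speaker's physical state. On automatic profiling, Kelly et al. [82] designed an i-vector-based system that automatically estimates a speaker's sex, age, and language. Watt et al. [83] compared automatic and human accent recognition.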

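On speech emotion analysis, Kathiresan et al. [84] studied the emotional information carried by MFCCs. Hippey et al. [85] explored ways of detecting remorse in the voice. Bizozzero et al. [86] studied fear in a female speaker's voice, focusing on how F0, speech rate, and pitch affect the perception of fear. Satt et al. [87] designed a method that recognizes emotion directly from spectrograms using two kinds of neural network, convolutional and recurrent. Zhang et al. [88] designed an emotion interaction and transition (EIT) model for conversational speech that exploits the emotional information in interactions and turn transitions; compared with conventional methods, their algorithm improved accuracy and precision by 18.8% and 22.6% respectively. Parthasarathy et al. and Le et al. [89-90] explored multi-task learning in deep learning for speech emotion recognition. Beyond general emotion recognition, voice-based deception detection is another active topic. Schroder et al. [91] used analysis-by-synthesis to combine different phonation types, speech rates, tremolo, and F0 with neutral utterances, and had the credibility of each resulting utterance judged. Credibility rose sharply when tremolo and breathiness increased, and fell when pauses and F0 increased. Mendels et al. [92] used the CXD corpus to compare how well spectral, acoustic-prosodic, and lexical feature sets represent deception, and tested these sets with a hybrid deep model.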

4 Audio authenticity examination

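Audio authenticity examination means analyzing a recording by phonetic, acoustic, electromagnetic, and signal-processing methods to reach a conclusion on whether it has been edited [4].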

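In 2017, Ali et al. [93] developed an automatic system based on psychoacoustic principles with an accuracy of 99.2%. To address cases where the original recording device is unavailable, Grigoras et al. [94] gave a comprehensive account of the file structures and formats of 125 recording devices and 40 commercial recording programs spanning 18 years. Smith et al. [95] studied audio files on iOS and built a decision-tree-based examination workflow for them. Patole et al. [96] discussed the value of reverberation and other noise information in a recording for authenticity examination.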

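Electric network frequency (ENF) analysis is a focus of audio authenticity examination; for its principle and details, see the earlier literature [97-99]. Hua et al. [100] discussed some common practical issues in ENF examination. James et al. [101] developed a cloud-based portable ENF system that removes geographic constraints on examination. Hua et al. [102] proposed using an absolute error map to relate the questioned audio to the ENF records in a reference database, and built two algorithms on that basis. Reis et al. [103] developed an ESPRIT-Hilbert-based method for ENF analysis whose results were far better than those of other methods.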
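The first step of any ENF examination is to recover the mains-hum frequency track from the recording. The sketch below shows one simple way to do that, assuming a 50 Hz nominal grid: for each frame it evaluates the spectrum on a fine grid of candidate frequencies around the nominal value and keeps the strongest one. This is a plain grid-search DFT, not the ESPRIT-Hilbert method or any other algorithm from the cited papers; the function name, parameters, and the synthetic test signal are all hypothetical.

```python
import math


def enf_track(signal, sample_rate, nominal_hz=50.0, search_hz=1.0,
              step_hz=0.05, frame_seconds=2.0):
    """Return one dominant-frequency estimate (Hz) per non-overlapping frame,
    searched within nominal_hz +/- search_hz on a grid of step_hz."""
    frame_len = int(frame_seconds * sample_rate)
    n_steps = int(round(2 * search_hz / step_hz))
    candidates = [nominal_hz - search_hz + k * step_hz
                  for k in range(n_steps + 1)]
    track = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        best_freq, best_power = candidates[0], -1.0
        for f in candidates:
            w = 2 * math.pi * f / sample_rate
            re = sum(s * math.cos(w * n) for n, s in enumerate(frame))
            im = sum(s * math.sin(w * n) for n, s in enumerate(frame))
            power = re * re + im * im
            if power > best_power:
                best_freq, best_power = f, power
        track.append(best_freq)
    return track


def tone(freq_hz, n_samples, sample_rate):
    """Synthetic hum used only for the demonstration."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


# Two seconds of hum at 49.9 Hz followed by two seconds at 50.2 Hz.
demo_rate = 400
demo_signal = tone(49.9, 800, demo_rate) + tone(50.2, 800, demo_rate)
demo_track = enf_track(demo_signal, demo_rate)
print(demo_track)  # two estimates, close to 49.9 and 50.2
```

In an examination, a track extracted like this would then be compared with reference grid records, or inspected for discontinuities that may indicate an edit; real recordings need band-pass filtering and much longer frames than this toy signal.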

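In China, Cao Wencheng [104] proposed two new algorithms for detecting speech forgery, both with a miss rate below 10%. Sun Mengmeng [105] proposed a co-occurrence vector feature suited to audio detection; methods based on it reach an accuracy of 95%. Shen Xiaohu et al. [106] systematically analyzed the principles of digital audio tampering, used several spectral analysis methods to look for traces of tampering in audio files, and established an effective spectral examination method.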

5 Noise reduction and speech enhancement

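Noise reduction and speech enhancement combines computing and acoustic techniques to reduce the noise and strengthen the speech in a recording. The main algorithms at present include adaptive noise cancellation, statistical-model-based methods, spectral subtraction, auditory masking, short-time spectral estimation, subspace methods, and wavelet transform methods [4]. In 2017, DNN-based noise reduction and speech enhancement became a focus.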
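Of the classical algorithms listed above, spectral subtraction is the easiest to show in a few lines. The sketch below is a minimal magnitude-domain version under stated assumptions: the noise spectrum is estimated from a separate noise-only segment, frames use a Hann window with 50% overlap, and a spectral floor limits over-subtraction. The function names, frame length, and floor value are illustrative choices, and the tiny recursive FFT is included only to keep the example free of external libraries.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(spec):
    """Inverse FFT via the conjugate trick."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def spectral_subtract(noisy, noise_only, frame_len=64, floor=0.05):
    """Subtract the average noise magnitude from each frame's spectrum,
    keep the noisy phase, and rebuild the signal by overlap-add."""
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]

    def frames(sig):
        return [[sig[s + n] * window[n] for n in range(frame_len)]
                for s in range(0, len(sig) - frame_len + 1, hop)]

    noise_specs = [fft(f) for f in frames(noise_only)]
    noise_mag = [sum(abs(sp[k]) for sp in noise_specs) / len(noise_specs)
                 for k in range(frame_len)]

    out = [0.0] * len(noisy)
    for i, frame in enumerate(frames(noisy)):
        cleaned = []
        for k, c in enumerate(fft(frame)):
            mag = abs(c)
            new_mag = max(mag - noise_mag[k], floor * mag)  # spectral floor
            cleaned.append(c * (new_mag / mag) if mag > 0 else 0j)
        for n, v in enumerate(ifft(cleaned)):
            out[i * hop + n] += v.real
    return out


def mse(a, b, lo=64, hi=1984):
    """Mean squared error over the interior, away from the frame edges."""
    return sum((x - y) ** 2 for x, y in zip(a[lo:hi], b[lo:hi])) / (hi - lo)


# Hypothetical test signal: a sine tone buried in white Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * n / 8) for n in range(2048)]
noisy = [c + random.gauss(0, 0.3) for c in clean]
noise_only = [random.gauss(0, 0.3) for _ in range(1024)]
enhanced = spectral_subtract(noisy, noise_only)
print(mse(noisy, clean), mse(enhanced, clean))  # error should drop
```

The residual that this kind of processing leaves behind is often audible as "musical noise", which is one reason the statistical and DNN-based methods discussed in this section were developed.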

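On dereverberation and echo cancellation, Guzewich et al. [107] studied a new DNN-based dereverberation method. Earlier work [108-111] had already made progress on DNN-based dereverberation; with the new method, the equal error rate of a speaker comparison system on the processed audio fell from 9.2% to 6.8%. Bulling et al. [112] proposed a new method for removing echo from recordings that raises the maximum stable gain (MSG) by 30 dB. On speech enhancement, Wu et al. [113] proposed a post-filtering method based on locally linear embedding (LLE) difference compensation. Ogawa et al. [114] extracted bottleneck features from a DNN acoustic model (DNN-AM) and then used example search over noise examples to remove highly non-stationary noise from single-channel audio. Gelderblom et al. [115] proposed a subjective evaluation method for DNN-based speech enhancement algorithms. Among non-DNN methods, Qian et al. [116] applied a Bayesian WaveNet directly to the raw audio and also obtained good enhancement results. On noise reduction, Pascual et al. [117] used a generative adversarial network (GAN), a type of deep network, and demonstrated its effectiveness with both subjective and objective evaluation. Maiti et al. [118] used two networks together for concatenative resynthesis and greatly increased processing speed. Note that in forensic practice background noise can carry useful information and may need to be preserved or even enhanced during noise reduction. Practitioners therefore need to combine several methods to reduce the target noise while keeping the useful information, and some of the deep learning methods above have an advantage here because of their flexibility.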
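The equal error rate (EER) quoted above is the operating point at which a system's false acceptance rate equals its false rejection rate. The sketch below approximates it by sweeping every observed score as a threshold. The function name and the score lists are hypothetical and are not data from any cited study.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: try each observed score as the decision threshold
    and return the error rate where false accepts and false rejects
    are closest to each other."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Hypothetical comparison scores (higher means "more likely same speaker").
same_speaker = [2.1, 1.8, 0.4, 1.5, 2.5, 0.9, 1.2, 3.0, 1.9, 2.2]
different_speaker = [-1.0, 0.2, 0.5, -0.3, 1.0, -2.0, 0.1, -0.8, 0.6, -1.5]

demo_eer = equal_error_rate(same_speaker, different_speaker)
print(demo_eer)  # 0.1: at threshold 0.9, one miss and one false accept out of ten each
```

With so few trials the threshold grid is coarse; evaluations on large trial lists usually interpolate between neighbouring thresholds instead.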

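［1］ LI Jingyang. Audio-visual evidence technology, Chapter 2: sound evidence technology［M］// LI Xuejun. New Physical Evidence Technology. Beijing: Beijing Jiaotong University Press, 2015: 339-360.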

［2］ HOLLIEN H．The acoustics of crime: the new science of forensic phonetics［M］．New York: Plenum Press, 1990.

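［3］ CAO Honglin, LI Jingyang, WANG Yingli, et al. On the forms of expressing voiceprint examination conclusions［J］. Evidence Science, 2013, 21(5): 605-624.

［4］ WANG Yingli, LI Jingyang, CAO Honglin. A review of voiceprint examination technology［J］. Police Technology, 2012(4): 54-56.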

［5］ Eriksson A. Aural/Acoustic vs. automatic methods in forensic phonetic casework［M］// NEUSTEIN A, PATIL H.A. In Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism. New York: Springer, 2011: 41-69．

［6］ GOLD E, FRENCH P. International practices in forensic speaker comparison［J］. International Journal of Speech Language and the Law, 2011, 18(2): 293-307.

［7］ MORRISON G S, SAHITO F H, JARDINE G, et al. Interpol survey of the use of speaker identification by law enforcement agencies［J］. Forensic Science International, 2016, 263(3): 92-100.

［8］ NOLAN F. The phonetic bases of speaker recognition［M］. Cambridge, UK: Cambridge University Press, 1983.

［9］ HOLLIEN H, DIDLA G, HARNSBERGER J D, et al. The case for aural perceptual speaker identification［J］. Forensic Science International, 2016, 269(3): 8-20.

［10］ ROSE P. Forensic speaker identification［M］. London: Taylor and Francis, 2002.

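［11］ Ministry of Public Security of the People's Republic of China. Forensic science: technical specification for speaker identification: GA/T 1433-2017［S］. Beijing: Standards Press of China, 2017.

［12］ Ministry of Public Security of the People's Republic of China. Technical specification for authenticity examination of forensic recordings: GA/T 1432-2017［S］. Beijing: Standards Press of China, 2017.

［13］ Ministry of Public Security of the People's Republic of China. Forensic science: technical specification for noise reduction and speech enhancement: GA/T 1431-2017［S］. Beijing: Standards Press of China, 2017.

［14］ Ministry of Public Security of the People's Republic of China. Forensic science: technical specification for speaker profiling: GA/T 1430-2017［S］. Beijing: China Zhijian Publishing House, 2017.

［15］ Bureau of Forensic Expertise Administration, Ministry of Justice of the People's Republic of China. Specification for the examination of audio recordings: SF/Z JD0301001-2010［S］. Beijing: Standards Press of China, 2010.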

［16］ CAIN S. American Board of Recorded Evidence-Voice Comparison Standards［EB/OL］. (1998)［ 2017-10-15］. http://www.forensictapeanalysisinc.com/Articles/voice_comp.htm

［17］ SUNDQVIST M, LEINONEN T, LINDH J, et al. Blind test procedure to avoid bias in perceptual analysis for forensic speaker comparison casework［C］// IAFPA . Proceedings of IAFPA2017.Split, Croatia:IAFPA,2017: 45-47.

［18］ LINDH J, NAUTSCH A, LEINONEN T, et al. Comparison between perceptual and automatic systems on Finnish phone speech data (FinEval1) - a pilot test using score simulations［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 86-87.

［19］ LEINONEN T, LINDH J, AKESSON J. Creating linguistic feature set templates for perceptual forensic speaker comparison in Finnish and Swedish［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 126-128.

［20］ LAND E, GOLD E. Speaker identification using laughter in a close social network［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 99-101.

［21］ SKARNITZL R, RŮŽIČKOVÁ A. The malleability of speech production: An examination of sophisticated voice disguise［C］ // IAFPA. Proceedings of IAFPA2017. Split,Croatia:IAFPA,2017:59-60.

［22］ RŮŽIČKOVÁ A, SKARNITZL R. Voice disguise strategies in Czech male speakers［J］. AUC Philologica, Phonetica Pragensia.2017.

［23］ DELVAUX V, CAUCHETEUX L, HUET K, et al. Voice disguise vs. impersonation: Acoustic and perceptual measurements of vocal flexibility in non-experts［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3777-3781.

［24］ JESSEN M. Speaker-specific information in voice quality parameters［J］. Forensic Linguistics, 1997, 4(1): 84-103.

［25］ KÖSTER O, KÖSTER J P. The auditory-perceptual evaluation of voice quality in forensic speaker recognition［J］. The Phonetician, 2004,89: 9–37.

［26］ NOLAN F. Voice quality and forensic speaker identification［J］. GOVOR XXIV, 2007, 24(2): 111-128.

［27］ KÖSTER O, JESSEN M, KHAIRI F, et al. Auditory-perceptual identification of voice quality by expert and non-expert listeners［C］. ICPhS XVI, 2007: 1845-1848.

［28］ SEGUNDO E, ALVES H, TRINIDAD M F. CIVIL corpus: voice quality for speaker forensic comparison［J］. Procedia - Social and Behavioral Sciences, 2013, 95(4): 587-593.

［29］ FRENCH P. Developing the vocal profile analysis scheme for forensic voice comparison［C］. York, UK: IAFPA, 2016.

［30］ SEGUNDO E. A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity［J］. Journal of Voice, 2017, 31(5): 11-27.

［31］ SEGUNDO E, BRAUN A, HUGHES V, et al. Speaker-similarity perception of Spanish twins and non-twins by native speakers of Spanish, German and English［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia:IAFPA,2017:159-162.

［32］ KLUG K. Refining the Vocal Profile Analysis (VPA) scheme for forensic purposes［C］// IAFPA. Proceedings of IAFPA2017.Split, Croatia:IAFPA,2017. 190-191.

［33］ HUGHES V, HARRISON P , FOULKES P, et al. Mapping across feature spaces in forensic voice comparison: the contribution of auditory-based voice quality to (semi-)automatic system testing［C］// ISCA. Proceedings of Interspeech2017. Stockholm,Sweden: ISCA , 2017:3892-3896.

［34］ HUGHES V, HARRISON P, FOULKES P, et al. The complementarity of automatic, semi-automatic, and phonetic measures of vocal tract output in forensic voice comparison［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 83-85.

［35］ NOLAN F. Speaker identification evidence: its forms, limitations, and roles［C］//Proceedings of the conference’ Law and Language: Prospect and Retrospect’ . University of Lapland,2001.

［36］ NOLAN F. Voice［M］// BOGAN P S, ROBERTS A. Identification: investigation, trial and scientific evidence. Jordan Publishing, 2011: 381-390.

［37］ HEUVEN V, CORTES P. Speaker specificity of filled pauses compared with vowels and consonants in Dutch［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia:IAFPA,2017: 48-49.

［38］ GOLD E, ROSS S, EARNSHAW K. Delimiting the West Yorkshire population: Examining the regional-specificity of hesitation markers［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 50-52.

［39］ HE L, DELLWO V. Between-speaker intensity variability is maintained in different frequency bands of amplitude demodulated signal［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 55-58.

［40］ DORREEN K, PAPP V. Bilingual speakers’ long-term fundamental frequency distributions as cross-linguistic speaker discriminants［C］ // IAFPA .Proceedings of IAFPA2017. Split,Croatia:IAFPA,2017:61-64.

［41］ ARANTES P, ERIKSSON A, GUTZEIT. Effect of language, speaking style and speaker on long-term f0 estimation［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3897-3901.

［42］ DIMOS K, DELLWO V, HE L. Rhythm and speaker-specific variability in shouted speech［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 102-104.

［43］ LOPEZ A, SAEIDI R, JUVELA L, et al. Normal-to-shouted speech spectral mapping for speaker recognition under vocal effort mismatch［C］// ICASSP. Proceedings of ICASSP2017.ICASSP,2017:4940-4944.

［44］ HE L, DELLWO V. Speaker-specific temporal organizations of intensity contours［C］ // IAFPA .Proceedings of IAFPA2017.Split, Croatia:IAFPA,2017:163-166.

［45］ VAROŠANEC-ŠKARIĆ G, BAŠIĆ I, KIŠIČEK G. Comparison of vowel space of male speakers of Croatian, Serbian and Slovenian language［C］ // IAFPA .Proceedings of IAFPA2017.Split, Croatia:IAFPA,2017: 142-146.

［46］ MCDOUGALL K, DUCKWORTH M. Fluency profiling for forensic speaker comparison: a comparison of syllable- and time-based approaches［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 129-131.

［47］ WANG L, KANG J, LI J, et al. Speaker-specific dynamic features of diphthongs in Standard Chinese［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 91-95.

［48］ HEEREN W. Speaker-dependency of /s/ in spontaneous telephone conversation［C］ // IAFPA . Proceedings of IAFPA2017.Split, Croatia:IAFPA,2017:68-71.

［49］ FRANCHINI S. Construction of a voice profile: An acoustic study of /l/［C］ // IAFPA . Proceedings of IAFPA2017. Split,Croatia:IAFPA,2017:183-186.

［50］ FINGERLING B. Constructing a voice profile: Reconstruction of the L1 vowel set for a L2 speaker［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 197-199.

［51］ RHODES R, FRENCH P, HARRISON P, et al. Which questions,propositions and ‘relevant populations’ should a speaker comparison expert assess［C］// IAFPA. Proceedings of IAFPA2017.Split, Croatia: IAFPA. 2017: 40-44.

［52］ HUGHES V, WORMALD J. WikiDialects: a resource for assessing typicality in forensic voice comparison［C］ // IAFPA.Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 154-155.

［53］ HUGHES V, FOULKES P. What is the relevant population? Considerations for the computation of likelihood ratios in forensic voice comparison［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3772-3776.

［54］ AJILI M, BONASTRE J, KHEDER W, et al. Phonetic content impact on forensic voice comparison［C］// Spoken Language Technology Workshop(SLT), 2016 IEEE. IEEE, 2016:210–217.

［55］ AJILI M, BONASTRE J, ROSSATTO S, et al. Inter-speaker variability in forensic voice comparison: a preliminary evaluation［C］//2016 IEEE International Conference on Acoustics,Speech and Signal Processing (ICASSP). IEEE, 2016:2114–2118.

［56］ DODDINGTON G, LIGGETT W, MARTIN A, et al. Sheep,goats, lambs and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation［C］//Tech. Rep. DTIC Document, 1998.

［57］ AJILI M, BONASTRE J, KHEDER W, et al. Homogeneity measure impact on target and non-target trials in forensic voice comparison［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2844-2848.

［58］ FRENCH P. A developmental history of forensic speaker comparison in the UK［J］. English Phonetics, 2017: 271-286.

［59］ FRENCH P, HARRISON P. Position statement concerning use of impressionistic likelihood terms in forensic speaker comparison cases［J］. International Journal of Speech Language and the Law, 2007, 14(1): 137-144.

［60］ Association of Forensic Science Providers. Standards for the formulation of evaluative forensic science expert opinion［J］. Science and Justice 2009(49):161-164.

［61］ VERMEULEN J, CAMBIER-LANGEVELD T. Outstanding cases: about case reports with a “strong” conclusion［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 31-33.

［62］ NOLAN F, MCDOUGALL K, JONG G D, et al. A forensic phonetic study of ‘dynamic’ sources of variability in speech: the dyvis project［C］//Proceedings of the 11th Australian International Conference on Speech Science & Technology．University of Auckland, 2006: 13-18.

［63］ MORRISON G S, ZHANG C, ENZINGER E, et al. Forensic voice comparison databases［DB/OL］, 2015. http://www.forensic-voice-comparison.net/

［64］ RAMOS D, GONZALEZ-RODRIGUEZ J, LUCENA-MOLINA J J. Addressing database mismatch in forensic speaker recognition with Ahumada III: a public real-casework database in Spanish［C］. International Speech Communication Association.2008.

［65］ VLOED V D, BOUTEN J, LEEUWEN D. NFI-FRITS: A forensic speaker recognition database and some first experiments［C］// Proceedings of Odyssey: The Speaker and Language Recognition Workshop. 2014: 6-13.

［66］ AJILI M, BONASTRE J, ROSSATO S. FABIOLE, a Speech database for forensic speaker comparison［C］// Proceedings of LREC-Conference, Slovenia. 2016:726-733.

［67］ NAGRANI A, CHUNG J, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset［J］. Sound. 2017.

［68］ PARK S J, YEUNG G, KREIMAN J, et al. Using voice quality features to improve short-utterance, text-independent speaker verification systems［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1522-1526.

［69］ SOLEWICZ Y, JESSEN M, VAN DER VLOED. Null-Hypothesis LLR: a proposal for forensic automatic speaker recognition［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2849-2853.

［70］ TSCHÄPE N. Analysis of i-vector-based false-accept trials in a dialect labelled telephone corpus［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 65-67.

［71］ ALEXANDER A. Not a lone voice: automatically identifying speakers in multi-speaker recordings［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 80-82.

［72］ MILOŠEVIĆ M, GLAVITSCH U. Combining Gaussian mixture models and segmental feature models for speaker recognition［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2042-2043.

［73］ FRENCH J, HARRISON P, KIRCHHÜBEL C, et al. From receipt of recordings to dispatch of report: opening the blinds on lab practices［C］ // IAFPA .Proceedings of IAFPA2017. Split,Croatia: IAFPA. 2017: 29-30.

［74］ WAGNER I. The BKA standard operation procedure of forensic speaker comparison and examples of case work［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 34-36.

［75］ HAN Wenjing, LI Haifeng, RUAN Huabin, et al. A review of research progress in speech emotion recognition［J］. Journal of Software, 2014, 25(1): 37-50.

［76］ KOVAČIĆ D. Voice gender identification in cochlear implant users ［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia:IAFPA. 2017: 23-25.

［77］ GEORG A. The effect of dialect on age estimation［C］ // IAFPA.Proceedings of IAFPA2017. Split, Croatia: IAFPA. 2017: 118-121.

［78］ TOMIĆ K. Cross-language accent analysis for determination of origin［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia:IAFPA. 2017: 171-173.

［79］ JONG-LENDLE G, KEHREIN R, URKE F, et al. Language identification from a foreign accent in German［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 135-138.

［80］ SCHWAB S, AMATO M, DELLWO V, et al. Can we hear nicotine craving［C］ // IAFPA .Proceedings of IAFPA2017. Split,Croatia: IAFPA. 2017: 115-117.

［81］ RODMONGA P, TATIANA A, NIKOLAY B, et al. Perceptual auditory speech features of drug-intoxicated female speakers(preliminary results)［C］ // IAFPA .Proceedings of IAFPA2017.Split, Croatia: IAFPA. 2017: 118-121.

［82］ KELLY F, FORTH O, ATREYA A, et al. What your voice says about you: automatic speaker profiling using i-vectors［C］ //IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA.2017: 72-75.

［83］ WATT D, JENKINS M, BROWN G. Performance of human listeners vs. the Y-ACCDIST automatic accent classifier in an accent authentication task［C］// IAFPA. Proceedings of IAFPA2017. Split, Croatia: IAFPA, 2017: 139-141.

［84］ KATHIRESAN T, DELLWO V. Cepstral dynamics in MFCCs using conventional deltas for emotion and speaker recognition［C］ // IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA.2017: 105-108.

［85］ HIPPEY F, GOLD E. Detecting remorse in the voice: A preliminary investigation into the perception of remorse using a voice line-up methodology［C］ // IAFPA .Proceedings of IAFPA2017.Split, Croatia: IAFPA. 2017: 179-182.

［86］ BIZOZZERO S, NETZSCHWITZ N, LEEMANN A. The effect of fundamental frequency f0, syllable rate and pitch range on listeners’ perception of fear in a female speaker’s voice［C］// IAFPA .Proceedings of IAFPA2017. Split, Croatia: IAFPA.2017: 174-178.

［87］ SATT A, ROZENBERG S, HOORY R. Efficient emotion recognition from speech using deep learning on spectrograms［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1089-1093.

［88］ ZHANG R, ATSUSHI A, KOBASHIKAWA S, et al. Interaction and transition model for speech emotion recognition in dialogue［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1094-1097.

［89］ PARTHASARATHY S, BUSSO C. Jointly predicting arousal, valence and dominance with multi-task learning［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1103-1107.

［90］ LE D, ALDENEH Z, PROVOST E. Discretized continuous speech emotion recognition with multi-task deep recurrent neural network［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1108-1112.

［91］ SCHRODER A, STONE S, BIRKHOLZ P. The sound of deception - what makes a speaker credible［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1467-1471.

［92］ MENDELS G, LEVITAN S, LEE K. Hybrid acoustic-lexical deep learning approach for deception detection［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1472-1476.

［93］ ALI Z, IMRAN M, ALSULAIMAN M. An automatic digital audio authentication/forensics system［J］. Digital Object Identifier. 2017(5): 2994-3007.

［94］ GRIGORAS C, SMITH J. Large scale test of digital audio file structure and format for forensic analysis［C］. 2017 AES International Conference on Audio Forensics, 2017.

［95］ SMITH J, LACEY D, KOENIG B, et al. Triage approach for the forensic analysis of Apple iOS audio files recorded using the “voice memos” app［C］. 2017 AES International Conference on Audio Forensics, 2017.

［96］ PATOLE R, KORE G, REGE P. Reverberation based tampering detection in audio recordings［C］. 2017 AES International Conference on Audio Forensics,2017.

［97］ Advisory Panel of White House Tapes. The EOB Tape of June 20, 1972: Report on a Technical Investigation Conducted for the U.S. District Court for the District of Columbia［R］. 1974.

［98］ GRIGORAS C. Application of ENF Analysis Method in Forensic Authentication of Digital Audio and Video Recordings［J］. Journal of the Audio Engineering Society, 2007, 57 (9) :643-661.

［99］ GRIGORAS C. Statistical Tools for Multimedia Forensics［C］.39th International Conference: Audio Forensics: Practices and Challenges, 2010.

［100］ HUA G, THING V. On practical issues of electric network frequency based audio forensics［J］. IEEE Transactions on Information Forensics & Security,2017(5): 20640-20651.

［101］ JAMES Z, GRIGORAS C, SMITH J. A low cost, cloud based, portable, remote ENF system［C］. 2017 AES International Conference on Audio Forensics ,2017.

［102］ HUA G, ZHANG Y, GOH J. Audio authentication by exploring the absolute-error-map of ENF signals［J］. IEEE Transactions on Information Forensics & Security,2016(5)：1003-1016.

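［103］ REIS P M G, MIRANDA R, GALDO G. ESPRIT-Hilbert based audio tampering detection with SVM classifier for forensic analysis via electrical network frequency［J］. IEEE Transactions on Information Forensics & Security, 2017(4): 853-864.

［104］ CAO Wencheng. Research on blind detection of speech forgery［D］. Chengdu: Southwest Jiaotong University, 2017.

［105］ SUN Mengmeng. Audio authenticity identification and re-recording detection［D］. Shenzhen: Shenzhen University, 2017.

［106］ SHEN Xiaohu, JIN Tian, ZHANG Changzhen, et al. Spectral examination techniques for authenticating audio recordings［J］. Forensic Science and Technology, 2017, 42(3): 173-177.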

［107］ GUZEWICH P, ZAHORIAN S. Improving speaker verification for reverberant conditions with deep neural network dereverberation processing［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 171-175.

［108］ HAN K, WANG Y, WANG D. Learning spectral mapping for speech dereverberation［C］// 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP),2014:4661-4665.

［109］ HAN K, WANG Y, WANG D, et al. Learning spectral mapping for speech dereverberation and denoising［J］. IEEE/ACM Transactions on Audio, Speech,and Language Processing,2015,23 (6) :982-992.

［110］ WU B, LI K, YANG M, et al. A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems［C］. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016.

［111］ WU B, LI K, YANG M, et al. A reverberation-time-aware approach to speech dereverberation based on deep neural networks［J］. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(1): 102-111.

［112］ BULLING P, LINHARD K, WOLF A, et al. Stepsize control for acoustic feedback cancellation based on the detection of reverberant signal periods and the estimated system distance［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 176-180.

［113］ WU Y C, HWANG H, WANG S, et al. A post-filtering approach based on locally linear embedding difference compensation for speech enhancement［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1953-1957.

［114］ OGAWA A, KINOSHITA K, DELCROIX M, et al. Improved example-based speech enhancement by using deep neural network acoustic model for noise robust example search［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1963-1967.

［115］ GELDERBLOM F B, GRONSTAD T, VIGGEN E. Subjective intelligibility of deep neural network-based speech enhancement［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 1968-1972.

［116］ QIAN K, ZHANG Y, CHANG S, et al. Speech enhancement using Bayesian WaveNet［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 2013-2017.

［117］ PASCUAL S, BONAFONTE A, SERRA J. SEGAN: Speech enhancement generative adversarial network［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3642-3646.

［118］ MAITI S, MANDEL M. Concatenative resynthesis using twin networks［C］// ISCA. Proceedings of Interspeech 2017. Stockholm, Sweden: ISCA, 2017: 3647-3651.