HUO Yingxiang, TENG Shaohua*. The Online Dictionary Learning Method of Incremental Frequency Shift for Speech Formant Envelopes [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2019, (04): 394-401. [doi:10.16357/j.cnki.issn1000-5862.2019.04.11]

The Online Dictionary Learning Method of Incremental Frequency Shift for Speech Formant Envelopes

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN: 1000-5862]

Volume:
Issue:
2019, No. 04
Pages:
394-401
Section:
Information Science and Technology
Publication Date:
2019-08-10

Article Info

Title:
The Online Dictionary Learning Method of Incremental Frequency Shift for Speech Formant Envelopes
Article ID:
1000-5862(2019)04-0394-08
Authors:
霍颖翔 (HUO Yingxiang), 滕少华 (TENG Shaohua)*
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
Keywords:
voice stream; lossy compression; online dictionary learning
CLC Number:
TP 301.6
DOI:
10.16357/j.cnki.issn1000-5862.2019.04.11
Document Code:
A
Abstract:
Digital voice is ubiquitous today, and the vast number of continuously generated voice streams consumes enormous network bandwidth and server storage. It is therefore important to compress speech lossily, lowering its bit rate while leaving the perceptual quality essentially unaffected. This paper proposes a novel online dictionary learning method for compressing speech formant envelopes. Unlike common linear methods, the proposed method shifts the atoms of the dictionary in frequency so that they fit the formants more closely, and by using the Hilbert transform the optimal frequency shift can be determined quickly and precisely. Experimental results show that, with the lower bound on reconstruction similarity set to 99.5%, the method reduces the bit count by 99% on average compared with the uncompressed envelopes. It is therefore suitable for scenarios with strict limits on transmission bandwidth or storage space, while keeping the decompressed speech perceptually natural.
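
The frequency-shifting step summarized above can be sketched in a few lines of Python. This is a minimal illustration under assumed details, not the authors' implementation: the atom shape, shift amount, and sampling rate below are invented, and the paper's fast search for the optimal shift is not reproduced. It only shows how the analytic signal from scipy.signal.hilbert lets a dictionary atom's spectrum be translated by an arbitrary frequency offset.

import numpy as np
from scipy.signal import hilbert

def shift_atom(atom, shift_hz, fs):
    """Translate the spectral content of a real-valued atom by shift_hz.

    The analytic signal obtained via the Hilbert transform contains only
    non-negative frequencies, so multiplying it by a complex exponential
    moves its spectrum rigidly; the real part is then a real-valued atom
    whose spectral peaks sit shift_hz higher (or lower) in frequency.
    """
    analytic = hilbert(atom)                      # atom + j * H{atom}
    n = np.arange(len(atom))
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * n / fs))

# Hypothetical usage: move a Gaussian-windowed 800 Hz atom up by 150 Hz.
fs = 8000.0
t = np.arange(256) / fs
atom = np.exp(-0.5 * ((t - t.mean()) / 0.004) ** 2) * np.cos(2 * np.pi * 800.0 * t)
shifted = shift_atom(atom, 150.0, fs)

Because the shift is applied to the analytic signal rather than by circularly rotating FFT bins, shift_hz need not be an integer number of bins, which is one way a precise optimal shift could be searched for.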


Memo:
Received: 2019-03-10
Funding: This work was supported by the National Natural Science Foundation of China (61772141, 61402118, 61673123, 61603100, 61702110), the Science and Technology Program of Guangdong Province (2016B010108007), the Department of Education of Guangdong Province (Yue Jiao Gao Han [2018] No. 179, [2018] No. 1, [2015] No. 113 and [2014] No. 97) and the Science and Technology Program of Guangzhou (201802030011, 201802010026, 201802010042, 201604020145, 201604046017).
Corresponding author: TENG Shaohua.
Last Update: 2019-08-10