[1] ZHANG Wei, WANG Yang, LIU Dongning, et al. The Sequence Analysis Method Based on Random Clustering Model [J]. Journal of Jiangxi Normal University (Natural Science Edition), 2017, (05): 470-475.

The Sequence Analysis Method Based on Random Clustering Model

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2017, No. 05
Pages:
470-475
Column:
Publication date:
2017-11-01

Article Info

Title:
The Sequence Analysis Method Based on Random Clustering Model
Author(s):
ZHANG Wei, WANG Yang, LIU Dongning, TENG Shaohua, ZHANG Li, XU Xinai
1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong 510006, China; 2. Nanchang Normal University, Nanchang, Jiangxi 330032, China
Keywords:
random clustering algorithm; sequence analysis; phylogeny
CLC number:
TP 391
Document code:
A
Abstract:
Phylogeny estimation on large data sets is a combinatorial optimization problem, and existing algorithms can rarely deliver an optimal solution for large volumes of sequence data within limited computing time. Building on earlier heuristic algorithms, this paper proposes a tree-construction approach, the random clustering method, which involves several rounds of tree pruning. In a relatively short time, the method provides the optimal solution together with the other evolutionarily meaningful solutions for large-scale sequence data produced by phylogenetic processes, and thereby reveals the evolutionary relationships among the sequences. Experimental results show that the random clustering method is effective and that it is of value to research in biocomputing and phylogenetic analysis.

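Memo

Received: 2017-02-19
Funding: National Natural Science Foundation of China (61402118, 61673123); Guangdong Provincial Science and Technology Program (2015B090901016, 2016B010108007); Department of Education of Guangdong Province projects (Yue Jiao Gao Han [2014] No. 97, Yue Jiao Gao Han [2015] No. 133); Guangzhou Science and Technology Program (2016201604030034, 201508010067, 201604046017); Science and Technology Research Project of the Department of Education of Jiangxi Province (GJJ151255); Nanchang Normal University Fund (15KJZD39).
About the author: ZHANG Wei (1964-), female, born in Nanchang, Jiangxi; professor.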