[1] XIAO Cunwei, SHI Haihe*, WANG Lan, et al. The Construction of de novo Sequence Assembly Algorithm Based on Hybrid Strategy[J]. Journal of Jiangxi Normal University (Natural Science Edition), 2022, (03): 300-307. [doi:10.16357/j.cnki.issn1000-5862.2022.03.13]

The Construction of de novo Sequence Assembly Algorithm Based on Hybrid Strategy

Journal of Jiangxi Normal University (Natural Science Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Issue:
2022, No. 03
Pages:
300-307
Section:
Information Science and Technology
Publication date:
2022-05-25

Article Info

Title:
The Construction of de novo Sequence Assembly Algorithm Based on Hybrid Strategy
Article ID:
1000-5862(2022)03-0300-08
Author(s):
XIAO Cunwei, SHI Haihe*, WANG Lan, CHENG Bailiang
School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
Keywords:
de novo sequence assembly; hybrid strategy; domain feature modeling; generative programming; formal method
CLC number:
Q 344; TP 312
DOI:
10.16357/j.cnki.issn1000-5862.2022.03.13
Document code:
A
Abstract:
Based on an analysis of the three basic strategies for de novo sequence assembly, namely the greedy strategy, the OLC (Overlap-Layout-Consensus) strategy, and the DBG (De Bruijn Graph) strategy, this paper studies how to construct a hybrid-strategy sequence assembly algorithm that integrates the advantages of the individual strategies. Drawing on formal methods and a formal development platform, and combining domain analysis and modeling with generative programming, two OLC-based algorithms (OLC_assembly_1, OLC_assembly_2) and one DBG-based algorithm (DBG_assembly) are constructed and then assembled into an algorithm under the (OLC+DBG)→OLC hybrid mode (the ODO algorithm). Finally, three experimental samples are selected from GenBank; the assembly results of the three single-strategy algorithms and of the constructed ODO algorithm are compared in terms of N50, number of contigs, and coverage, and the effect of coverage depth and the k value on the assembly results is analyzed. The experimental results show that the ODO algorithm has a certain advantage over the single-strategy algorithms on metrics such as N50 and coverage.
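To make two of the terms in the abstract concrete, here is a minimal Python sketch of how a de Bruijn graph is typically built from reads for a given k, and how the N50 metric is commonly computed from contig lengths. It is not the authors' DBG_assembly or ODO implementation; the function names `build_de_bruijn_graph` and `n50` and the toy inputs are assumptions made only for illustration.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Illustrative only: map each (k-1)-mer prefix to the (k-1)-mer
    suffixes that follow it in some k-mer of the reads."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def n50(contig_lengths):
    """Return the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly: no contigs at all


if __name__ == "__main__":
    print(build_de_bruijn_graph(["ACGT"], 3))
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A larger k yields fewer, more specific edges in the graph, which is one reason the choice of k can change the assembly result.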

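Memo

Received: 2022-01-12
Funding: Supported by the National Natural Science Foundation of China (62062039, 61662035) and the Natural Science Foundation of Jiangxi Province (20202BAB202024, 20212BAB202017).
Corresponding author: SHI Haihe (born 1979), female, native of Leping, Jiangxi; professor, Ph.D.; research interests: bioinformatics and formal verification. E-mail: haiheshi@jxnu.edu.cn
Last Update: 2022-05-25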