J4 ›› 2016, Vol. 1 ›› Issue (6): 4-7.

• Mathematics and Computer Science •

Improvement of Genome Assembly Algorithm Based on MapReduce

  1. (College of Mathematics and Computer, Dali University, Dali, Yunnan 671003, China)
  • Received: 2016-01-25  Online: 2016-06-15  Published: 2016-06-15
  • About the author: He Yuan (何远), laboratory technician, mainly engaged in computer science research.
  • Supported by:

    Young Teachers' Scientific Research Fund of Dali University (KYQN201218)

Abstract:

The rapid growth of biological data calls for new techniques in bioinformatics. Current genome assembly algorithms still suffer
from limited accuracy and insufficient parallelization. After analyzing existing assembly algorithms, this paper proposes an assembly
algorithm based on MapReduce. Erroneous data are removed during assembly by statistical filtering, and repeated data are eliminated
by increasing the k-mer length. Finally, the parallel assembly algorithm is implemented on the MapReduce platform. Experimental
results show that the algorithm improves both the accuracy and the speed of assembly.

Key words: genome assembly, high-throughput sequencing, de Bruijn, MapReduce
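
The abstract names three ingredients: counting k-mers in a map/reduce fashion, discarding statistically rare k-mers as likely errors, and building a de Bruijn graph from the rest. The sketch below illustrates that general idea in plain Python, with the map and reduce phases simulated in memory. It is not the authors' implementation: the function names, the toy reads, and the `min_count` threshold are assumptions chosen for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit (k-mer, 1) for every k-length window of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def reduce_counts(pairs):
    """Shuffle + reduce step: sum the counts emitted for each k-mer."""
    counts = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)


def filter_errors(counts, min_count):
    """Drop k-mers seen fewer than min_count times (assumed to be errors)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


def de_bruijn_edges(kmers):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def build_graph(reads, k, min_count):
    """Run map, reduce, and filtering, then build the graph."""
    pairs = chain.from_iterable(map_kmers(r, k) for r in reads)
    solid = filter_errors(reduce_counts(pairs), min_count)
    return de_bruijn_edges(solid)


# Toy data (hypothetical): the last read differs from the others by one base.
reads = ["ACGTAC", "ACGTAC", "CGTACG", "CGTACG", "ACGTTC"]
graph_k4 = build_graph(reads, k=4, min_count=2)
graph_k5 = build_graph(reads, k=5, min_count=2)
```

In this toy input, the k-mers that occur only in the erroneous read fall below the threshold and are dropped. With k=4 the remaining graph contains a cycle (ACG, CGT, GTA, TAC, back to ACG), while with k=5 it is a simple path, which is one small illustration of why a longer k-mer can help with repeated sequence. On a real cluster the map and reduce functions would run as distributed tasks rather than in a single process.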

CLC number: