Gap Filling as Exact Path Length Problem

Wed, 01.04.2015
An article titled "Gap Filling as Exact Path Length Problem" is to be presented at RECOMB 2015 - a leading computational biology conference with 21% acceptance ratio this year. 
 
The problem considered is one of the last steps in a genome assembly project, that is, filling the gaps between consecutive contigs in the scaffolds. This problem can be naturally stated as finding a s-t path in a directed graph whose sum of arc costs belongs to a given range (the estimate on the gap length). Here s and t are any two contigs flanking a gap. This problem is known to be NP-hard in general. The article develops a simple pseudo-polynomial dynamic programming solution along with its implementation with various practical optimizations. The new exact gap filling solution is experimentally compared to popular gap filling tools. Summing over all the bacterial assemblies considered in the experiments, the new tool can in total fill 28% more gaps than the best previous tool and the gaps filled by the new method span 80% more sequence. Furthermore, the error level of the newly introduced sequence is comparable to that of the previous tools.
 
The research was conducted by Leena Salmela, Alexandru I. Tomescu, and Veli Mäkinen from the Department of Computer Science at the University of Helsinki, Finland, in collaboration with Kristoffer Sahlin from the KTH Royal Institute of Technology, Sweden. The main ideas for the article were invented while Kristoffer was visiting Helsinki in Autumn 2014.
 

Contact person: Veli Mäkinen


Last updated on 1 Apr 2015 by Veli Mäkinen - Page created on 1 Apr 2015 by Veli Mäkinen