
EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices

Published 8 Aug 2016 in cs.DS | (1608.02413v2)

Abstract: We introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in $O(1)$ time per step while using $O(n \log \sigma) + o(n \sigma \log \sigma)$ bits of space. This is done by replacing the binary wavelet tree with a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition, we compared our implementation with other freely available implementations of bidirectional indices and show that ours is approximately $2.6$ to $4.8$ times faster. This will have a large impact on many bioinformatics applications that rely on practical implementations of (2)FM indices, e.g., for read mapping. To our knowledge, this is the first implementation of a constant-time method for a search step in 2FM indices.
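To make the notion of a "search step" concrete, the following is a minimal sketch of backward search in a unidirectional FM index, where each step costs two rank lookups. It is not the EPR-dictionary: as a stand-in it uses a plain, uncompressed prefix-sum occurrence table, which also answers rank queries in $O(1)$ but needs far more space than the bound stated in the abstract. The function names (`build_fm_index`, `backward_search`, `find`) and the `$` sentinel are assumptions made for illustration and do not come from the paper or from SeqAn.

```python
def build_fm_index(text):
    """Build the suffix array, C array, and a full prefix-sum rank table.

    Illustrative only: the rank table stores one counter per symbol and
    position, which is much larger than the EPR-dictionary's space bound.
    """
    text = text + "$"  # sentinel, assumed smaller than every other symbol
    n = len(text)
    # Naive suffix array construction; fine for a small example.
    sa = sorted(range(n), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    # C[c] = number of symbols in the text strictly smaller than c.
    C = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[0:i] (prefix sums).
    occ = {c: [0] * (n + 1) for c in alphabet}
    for i, b in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if b == c else 0)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        # One search step: two constant-time rank lookups.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find(text, pattern):
    """Return the sorted start positions of pattern in text."""
    sa, C, occ = build_fm_index(text)
    lo, hi = backward_search(pattern, C, occ, len(text) + 1)
    return sorted(sa[lo:hi])
```

For example, `find("ACGTACGT", "ACG")` walks the pattern right to left in three steps and reports the two start positions 0 and 4. Per the abstract, the paper's contribution is a rank structure that keeps each of these steps constant-time within the stated space bound, replacing the binary wavelet tree typically used for the rank lookups.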

Citations (9)
