An efficient similarity search based on indexing in large DNA databases
Authors: In-Seon Jeong, Kyoung-Wook Park, Seung-Ho Kang, Hyeong-Seok Lim
Affiliation: School of Electronics & Computer Eng., Chonnam National University, 300 YongBong-Dong, Buk-Gu, Gwangju 500-757, Republic of Korea
Abstract: Index-based search algorithms are an important part of genomic search, and the way the index is constructed largely determines how well such an algorithm computes the similarity between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. The method requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We then select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm next transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those algorithms.
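The three steps named in the abstract (equal-length windows, a Hamming-distance filter, and a frequency-plus-position vector) can be illustrated with a minimal Python sketch. The abstract does not give the exact transformation, so everything specific here is an assumption made for illustration: the function names, the choice of window length equal to the query length, the threshold `max_dist`, and the use of the mean character position as the "positional information". It is not the authors' implementation.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet


def windows(seq, w):
    """Split seq into consecutive, non-overlapping windows of length w.

    A trailing fragment shorter than w is dropped (an assumption).
    """
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def candidate_windows(seq, query, max_dist):
    """Keep windows whose Hamming distance to the query is at most max_dist.

    Returns (start_offset, window) pairs. Using the query length as the
    window length is an assumption of this sketch.
    """
    w = len(query)
    return [
        (i * w, win)
        for i, win in enumerate(windows(seq, w))
        if hamming(win, query) <= max_dist
    ]


def feature_vector(sub):
    """Map a subsequence to an 8-dimensional vector.

    For each character in ALPHABET the vector holds its count followed by
    the mean of its 0-based positions (0.0 if the character is absent).
    The mean position is a hypothetical stand-in for the positional
    information mentioned in the abstract.
    """
    vec = []
    for ch in ALPHABET:
        positions = [i for i, c in enumerate(sub) if c == ch]
        vec.append(len(positions))
        vec.append(sum(positions) / len(positions) if positions else 0.0)
    return vec


if __name__ == "__main__":
    hits = candidate_windows("ACGTTTTTACGA", "ACGT", max_dist=1)
    for offset, win in hits:
        print(offset, win, feature_vector(win))
```

In a full index-based search, vectors like these would be stored in a multidimensional index so that similar subsequences can be retrieved by vector distance; the sketch stops before that step because the abstract does not specify the index structure.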
This article is indexed in ScienceDirect and other databases.