
1.

Bit-parallel approximate string matching algorithms with transposition (total citations: 1, self-citations: 0, citations by others: 1)

Heikki Hyyrö 《Journal of Discrete Algorithms》2005,3(2-4):215-229

Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the O(k⌈m/w⌉n) algorithm of Wu and Manber, the O(⌈(k+2)(m−k)/w⌉n) algorithm of Baeza-Yates and Navarro, and the O(⌈m/w⌉n) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit also a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as Damerau edit distance. In the end we also present an experimental comparison of the resulting algorithms.
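To make the transposition operation concrete, here is a minimal sketch of approximate searching under the restricted Damerau distance (adjacent transpositions counted as one edit). It is a plain column-by-column dynamic program, not one of the bit-parallel algorithms the abstract discusses; the function name `damerau_search` and its return convention (0-based end positions) are assumptions made for illustration.

```python
def damerau_search(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions in text where pattern matches with <= k edits.

    Edits are insertion, deletion, substitution, and transposition of two
    adjacent characters (restricted Damerau distance). Plain O(mn) DP sketch.
    """
    m = len(pattern)
    prev2 = None                 # DP column from two text characters ago
    prev = list(range(m + 1))    # column before any text is read: D[i] = i
    hits = []
    for j, c in enumerate(text):
        cur = [0] * (m + 1)      # D[0] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,   # match or substitution
                         prev[i] + 1,          # extra character in the text
                         cur[i - 1] + 1)       # pattern character skipped
            # Transposition: pattern "xy" aligned against text "yx".
            if (prev2 is not None and i >= 2
                    and pattern[i - 1] == text[j - 1]
                    and pattern[i - 2] == c):
                cur[i] = min(cur[i], prev2[i - 2] + 1)
        if cur[m] <= k:
            hits.append(j)
        prev2, prev = prev, cur
    return hits
```

For example, `damerau_search("abcd", "xxacbdxx", 1)` reports a match ending at position 5, because "acbd" differs from "abcd" by a single transposition; under plain Levenshtein distance the same occurrence would cost two edits.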

2.

Average-optimal string matching (total citations: 2, self-citations: 0, citations by others: 2)

Kimmo Fredriksson, Szymon Grabowski 《Journal of Discrete Algorithms》2009,7(4):579-594

The exact string matching problem is to find the occurrences of a pattern of length m in a text of n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads to very simple algorithms that are optimal on average. We then show how our basic method can be used to solve multiple string matching as well as several approximate matching problems in average optimal time. The general method can be applied to many existing string matching algorithms. Our experimental results show that the algorithms perform very well in practice.

3.

Practical algorithms for transposition-invariant string-matching

Kjell Lemström, Gonzalo Navarro, Yoan Pinzon 《Journal of Discrete Algorithms》2005,3(2-4):267-292

We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2) transposition-invariant text searching using indel distance. These problems have applications in music comparison and retrieval. We introduce two novel techniques to solve these problems efficiently. The first is based on the branch and bound method, the second on bit-parallelism. Our branch and bound algorithm computes the longest common transposition-invariant subsequence (LCTS) in time O((m² + log log σ) log σ) in the best case and O((m² + log σ)σ) in the worst case, where m and σ, respectively, are the length of the strings and the size of the alphabet. On the other hand, we show that the same problem can be solved by using bit-parallelism and thus obtain a speedup of O(w/log m) over the classical algorithms, where the computer word has w bits. The advantage of this latter algorithm over the present bit-parallel ones is that it allows the use of more complex distances, including general integer weights. Since our branch and bound method is very flexible, it can be further improved by combining it with other efficient algorithms such as our novel bit-parallel algorithm. We experiment on several combination possibilities and discuss which are the best settings for each of those combinations. Our algorithms are easily extended to other musically relevant cases, such as δ-matching and polyphony (where there are several parallel texts to be considered). We also show how our bit-parallel algorithm is adapted to text searching and illustrate its effectiveness in complex cases where the only known competing method is the use of brute force.
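As a point of reference for the LCTS problem, the following is a minimal brute-force sketch: it tries every shift that aligns at least one pair of symbols and runs the classical LCS dynamic program for each. It is not the branch and bound or bit-parallel method of the paper; the names `lcs_length` and `lcts`, and the choice to represent strings as lists of integers (for example, MIDI pitches), are assumptions for illustration.

```python
def lcs_length(a: list[int], b: list[int]) -> int:
    """Classical LCS length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcts(a: list[int], b: list[int]) -> tuple[int, int]:
    """Return (best LCS length, shift t) over all shifts t applied to a.

    Only shifts of the form y - x (x in a, y in b) can produce a match,
    so at most len(a) * len(b) candidates are tried. Ties are resolved
    in favour of the smallest shift.
    """
    best_len, best_shift = 0, 0
    for t in sorted({y - x for x in a for y in b}):
        length = lcs_length([x + t for x in a], b)
        if length > best_len:
            best_len, best_shift = length, t
    return best_len, best_shift
```

With `a = [60, 62, 64, 65]` and `b = [67, 69, 71, 72]`, the call `lcts(a, b)` returns `(4, 7)`: shifting the first melody up by seven semitones makes the two sequences identical.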

4.

Bit-parallel (δ,γ)-matching and suffix automata

Maxime Crochemore, Costas S. Iliopoulos, Gonzalo Navarro, Yoan J. Pinzon, Alejandro Salinger 《Journal of Discrete Algorithms》2005,3(2-4):198-214

(δ,γ)-matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P_{1…m} and a text T_{1…n} over an alphabet of integers, to find the occurrences P′ of the pattern in the text such that (i) |P_i − P′_i| ≤ δ for all 1 ≤ i ≤ m, and (ii) Σ_{1≤i≤m} |P_i − P′_i| ≤ γ. The problem makes sense for δ ≤ γ ≤ δm. Several techniques for (δ,γ)-matching have been proposed, based on bit-parallelism or on skipping characters. We first present an O(mn log(γ)/w) worst-case time and O(n) average-case time bit-parallel algorithm (where w is the number of bits in the computer word). It improves the previous O(mn log(δm)/w) worst-case time algorithm of the same type. Second, we combine our bit-parallel algorithm with suffix automata to obtain the first algorithm that skips characters using both δ and γ. This algorithm examines fewer characters than any previous approach, as the others do just δ-matching and check the γ-condition on the candidates. We implemented our algorithms and ran experiments on real music, showing that our algorithms are superior to current alternatives for high values of δ.
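The two conditions can be checked directly with a naive O(mn) scan, sketched below under the usual definition of (δ,γ)-matching: every position may differ by at most δ, and the differences may sum to at most γ. This is only a reference baseline, not the bit-parallel or suffix-automaton algorithm of the paper; the function name `delta_gamma_matches` and the 0-based start positions it returns are assumptions.

```python
def delta_gamma_matches(pattern: list[int], text: list[int],
                        delta: int, gamma: int) -> list[int]:
    """Return 0-based start positions of (delta, gamma)-occurrences.

    A window matches when every |P_i - T_i| <= delta and the sum of
    those absolute differences is <= gamma.
    """
    m = len(pattern)
    if m == 0:
        return []
    starts = []
    for s in range(len(text) - m + 1):
        diffs = [abs(p - t) for p, t in zip(pattern, text[s:s + m])]
        if max(diffs) <= delta and sum(diffs) <= gamma:
            starts.append(s)
    return starts
```

For the pattern `[60, 62, 64]` and the text `[61, 62, 66, 60, 63, 64]`, the call with `delta=2, gamma=3` reports starts 0 and 3. Tightening to `gamma=2` keeps only start 3, since the window at 0 has differences summing to 3.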

5.

Regular expression searching on compressed text

Gonzalo Navarro 《Journal of Discrete Algorithms》2003,1(5-6):423-443

We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv–Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2^m + mn + Rm log m) worst case time. On average this drops to O(m² + (n + Rm) log m) or O(m² + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.

6.

Practical and flexible pattern matching over Ziv–Lempel compressed text

Gonzalo Navarro, Mathieu Raffinot 《Journal of Discrete Algorithms》2004,2(3):347-371

We address the problem of string matching on Ziv–Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is highly relevant for maintaining compressed text databases that can still be searched efficiently. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv–Lempel compression. We then apply the scheme to each particular type of compression. We present an algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching it. Finally, we propose a new hybrid compression scheme that lies between LZ77 and LZ78, which in practice compresses as well as LZ77 and can be searched as fast as LZ78. We also show how to search for some extended patterns on Ziv–Lempel compressed text, such as classes of characters and approximate string matching.
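To illustrate what "a sequence of blocks" means for LZ78, here is a minimal sketch of a textbook LZ78 parse, in which each block is a pair (index of an earlier phrase, one new character). It shows only the block structure that compressed-text search would operate on, not the search algorithm itself; the function names and the convention that index 0 denotes the empty phrase are assumptions for illustration.

```python
def lz78_compress(text: str) -> list[tuple[int, str]]:
    """Parse text into LZ78 blocks (phrase_index, next_char).

    Index 0 is the empty phrase; block number i (1-based) defines phrase i.
    """
    dictionary = {"": 0}
    blocks = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch                      # keep extending a known phrase
        else:
            blocks.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover phrase is already known: emit it as prefix + last char.
        blocks.append((dictionary[current[:-1]], current[-1]))
    return blocks


def lz78_decompress(blocks: list[tuple[int, str]]) -> str:
    """Rebuild the text by expanding each block from earlier phrases."""
    phrases = [""]
    for index, ch in blocks:
        phrases.append(phrases[index] + ch)
    return "".join(phrases)
```

For instance, `lz78_compress("abababa")` yields `[(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]`, that is, the phrases "a", "b", "ab" and "aba". Because every block refers to exactly one earlier block plus one character, information about a pattern can be propagated from block to block without expanding the text.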