
1.

Simple, compact and robust approximate string dictionary

《Journal of Discrete Algorithms》2014


2.

Efficient exponential-time algorithms for edit distance between unordered trees

《Journal of Discrete Algorithms》2014

The edit distance problem for rooted unordered trees is known to be NP-hard. Based on this fact, this paper studies exponential-time algorithms for the problem. For the general case, an $O(\min(1.26^{n_1 + n_2}, 2^{b_1 + b_2} \cdot \mathrm{poly}(n_1, n_2)))$ time algorithm is presented, where $n_1$ and $n_2$ are the numbers of nodes and $b_1$ and $b_2$ are the numbers of branching nodes in the two input trees. This algorithm is obtained by a combination of dynamic programming, exhaustive search, and maximum weighted bipartite matching. For bounded degree trees over a fixed alphabet, it is shown that the problem can be solved in $O((1 + \epsilon)^{n_1 + n_2})$ time for any fixed $\epsilon > 0$. This result is achieved by avoiding duplicate calculations for identical subsets of small subtrees.

3.

Bit-parallel approximate string matching algorithms with transposition (cited by: 1; self-citations: 0; citations by others: 1)

Heikki Hyyrö 《Journal of Discrete Algorithms》2005,3(2-4):215-229

Using bit-parallelism has resulted in fast and practical algorithms for approximate string matching under Levenshtein edit distance, which permits a single edit operation to insert, delete or substitute a character. Depending on the parameters of the search, currently the fastest non-filtering algorithms in practice are the $O(k \lceil m/w \rceil n)$ algorithm of Wu and Manber, the $O(\lceil (k+2)(m-k)/w \rceil n)$ algorithm of Baeza-Yates and Navarro, and the $O(\lceil m/w \rceil n)$ algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to also permit a fourth type of edit operation: transposing two adjacent characters in the pattern. This type of edit distance is also known as Damerau edit distance. Finally, we present an experimental comparison of the resulting algorithms.
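To make the four edit operations concrete, here is a minimal sketch of edit distance with adjacent transpositions. It uses the plain quadratic dynamic program for the restricted ("optimal string alignment") variant, not the bit-parallel algorithms the abstract discusses; the function name `osa_distance` is a hypothetical choice for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance allowing insert, delete, substitute and the
    transposition of two adjacent characters (restricted variant:
    a transposed pair is not edited again afterwards)."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Fourth operation: swap two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

For example, `osa_distance("ab", "ba")` is 1 because a single transposition suffices, whereas plain Levenshtein distance would need two operations.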

4.

Text indexing with errors

Moritz G. Maaß Johannes Nowak 《Journal of Discrete Algorithms》2007,5(4):662-681


5.

Approximate search of short patterns with high error rates using the 01*0 lossless seeds

《Journal of Discrete Algorithms》2016


6.

Hélène Touzet 《Journal of Discrete Algorithms》2007,5(4):696-705


7.

Distributed suffix trees

Raphaël Clifford 《Journal of Discrete Algorithms》2005,3(2-4):176-197

We present a new variant of the suffix tree called a distributed suffix tree (DST) which allows for much larger databases of strings to be handled efficiently. The method is based on a new linear time construction algorithm for subtrees of a suffix tree. The new data structure tackles the memory bottleneck problem by constructing these subtrees independently and in parallel. It is designed for distributed memory parallel computing environments (e.g., Beowulf clusters). The central advantage is that standard operations of biological importance on suffix trees are shown to be easily translatable to this new data structure. While none of these operations on the DST require inter-process communication, many have optimal expected parallel running times.

8.

A dynamic edit distance table

Sung-Ryul Kim Kunsoo Park 《Journal of Discrete Algorithms》2004,2(2):303

In this paper we consider the incremental/decremental version of the edit distance problem: given a solution to the edit distance between two strings A and B, find a solution to the edit distance between A and B′ where B′=aB (incremental) or bB′=B (decremental). As a solution for the edit distance between A and B, we define the difference representation of the D-table, which leads to a simple and intuitive algorithm for the incremental/decremental edit distance problem.
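As background for the D-table mentioned above, the following sketch builds the standard unit-cost edit distance table and lists the differences between horizontally adjacent entries, which always lie in {-1, 0, 1}. This only illustrates the table itself; the paper's own difference representation and its incremental update are not reproduced here, and the names `d_table` and `horizontal_differences` are hypothetical.

```python
def d_table(a: str, b: str) -> list[list[int]]:
    """Return the table D where D[i][j] is the unit-cost edit distance
    between the prefixes a[:i] and b[:j]."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i
    for j in range(n + 1):
        table[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,        # delete a[i-1]
                table[i][j - 1] + 1,        # insert b[j-1]
                table[i - 1][j - 1] + cost  # match or substitute
            )
    return table


def horizontal_differences(table: list[list[int]]) -> list[list[int]]:
    """For each row, the differences D[i][j] - D[i][j-1]."""
    return [[row[j] - row[j - 1] for j in range(1, len(row))] for row in table]
```

Because every difference fits in a constant number of values, a table can be stored and updated in terms of these small differences rather than absolute distances.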

9.

Bit-parallel (δ,γ)-matching and suffix automata

Maxime Crochemore Costas S. Iliopoulos Gonzalo Navarro Yoan J. Pinzon Alejandro Salinger 《Journal of Discrete Algorithms》2005,3(2-4):198-214

(δ,γ)-matching is a string matching problem with applications to music retrieval. The goal is, given a pattern $P_{1 \ldots m}$ and a text $T_{1 \ldots n}$ on an alphabet of integers, to find the occurrences $P'$ of the pattern in the text such that (i) $|P_i - P'_i| \le \delta$ for all $1 \le i \le m$, and (ii) $\sum_{1 \le i \le m} |P_i - P'_i| \le \gamma$. The problem makes sense for $\delta \le \gamma \le \delta m$. Several techniques for (δ,γ)-matching have been proposed, based on bit-parallelism or on skipping characters. We first present an $O(mn \log(\gamma)/w)$ worst-case time and $O(n)$ average-case time bit-parallel algorithm (where w is the number of bits in the computer word). It improves the previous $O(mn \log(\delta m)/w)$ worst-case time algorithm of the same type. Second, we combine our bit-parallel algorithm with suffix automata to obtain the first algorithm that skips characters using both δ and γ. This algorithm examines fewer characters than any previous approach, as the others do just δ-matching and check the γ-condition on the candidates. We implemented our algorithms and drew experimental results on real music, showing that our algorithms are superior to current alternatives with high values of δ.
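The matching condition can be checked directly with a naive O(mn) scan. The sketch below assumes the usual definition of (δ,γ)-matching: every aligned pair of integers differs by at most δ, and the differences sum to at most γ. It is a reference checker only, not the bit-parallel or suffix-automaton algorithm of the paper, and the function name is hypothetical.

```python
def delta_gamma_matches(pattern: list[int], text: list[int],
                        delta: int, gamma: int) -> list[int]:
    """Return the 0-based start positions where `pattern` occurs in
    `text` under the (delta, gamma) conditions."""
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        diffs = [abs(p - t) for p, t in zip(pattern, text[start:start + m])]
        # (i) each position is within delta, (ii) total error is within gamma
        if all(d <= delta for d in diffs) and sum(diffs) <= gamma:
            hits.append(start)
    return hits
```

With `pattern = [3, 5, 7]`, `text = [3, 6, 7, 4, 5, 8, 1]` and `delta = 1`, the window at position 3 has total error 2, so it is reported for `gamma = 2` but rejected for `gamma = 1`.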

10.

Indexing text with approximate q-grams

Gonzalo Navarro Erkki Sutinen Jorma Tarhio 《Journal of Discrete Algorithms》2005,3(2-4):157-175

We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the index points out the text areas that could contain occurrences and must be verified. The index parameters permit load balancing between filtering and verification work, and provide a compromise between the space requirement of the index and the error level for which the filtration is still efficient. We show experimentally that the index is competitive against others that take more space, being in fact the fastest choice for intermediate error levels, an area where no current index is useful.
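The collection step described above can be sketched as follows. The code only gathers disjoint q-samples at a fixed interval and records their positions; the approximate matching of samples inside the pattern and the verification phase are not implemented. The parameter name `h` for the sampling interval and the function name are assumptions made for this illustration.

```python
from collections import defaultdict


def build_q_sample_index(text: str, q: int, h: int) -> dict[str, list[int]]:
    """Map each sampled substring of length q to the text positions
    where it was sampled. Samples start at 0, h, 2h, ... and are
    disjoint as long as h >= q."""
    if q <= 0 or h < q:
        raise ValueError("need q > 0 and h >= q so that samples are disjoint")
    index = defaultdict(list)
    for pos in range(0, len(text) - q + 1, h):
        index[text[pos:pos + q]].append(pos)
    return dict(index)
```

A larger interval `h` shrinks the index but leaves fewer samples inside any text window, which is the space versus filtering trade-off the abstract refers to.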

11.

A graph-theoretic model to solve the approximate string matching problem allowing for translocations

《Journal of Discrete Algorithms》2013

In this paper, we study the approximate string matching problem under a string distance whose edit operations are translocations of equal length factors. We extend a graph-theoretic approach proposed by Rahman and Iliopoulos (2008) to model our problem. In the sequel, we devise efficient algorithms based on this model to solve a number of variants of the string matching problem with translocations.

12.

Faster algorithms for string matching with k mismatches

Amihood Amir Moshe Lewenstein Ely Porat 《Journal of Algorithms in Cognition, Informatics and Logic》2004,50(2):257-275

The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length m substring of the text T. Currently, the fastest algorithms for this problem are the following. The Galil–Giancarlo algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time $O(nk)$. The Abrahamson algorithm finds the number of mismatches at every location in time $O(n\sqrt{m \log m})$. We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time $O(n\sqrt{k \log k})$. We also show an algorithm that solves the above problem in time $O((n + nk^3/m)\log k)$.
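The problem statement itself can be illustrated with a naive O(nm) baseline that counts mismatches at every alignment. This is only a reference implementation of the problem definition, not any of the faster algorithms named in the abstract; both function names are hypothetical.

```python
def mismatch_counts(pattern: str, text: str) -> list[int]:
    """Number of mismatching positions between `pattern` and every
    length-m substring of `text` (m = len(pattern))."""
    m = len(pattern)
    return [
        sum(1 for p, t in zip(pattern, text[i:i + m]) if p != t)
        for i in range(len(text) - m + 1)
    ]


def k_mismatch_positions(pattern: str, text: str, k: int) -> list[int]:
    """0-based locations where the pattern matches with at most k errors."""
    return [i for i, c in enumerate(mismatch_counts(pattern, text)) if c <= k]
```

For instance, with pattern `"abc"` and text `"abcabdxbc"`, the alignments at positions 0, 3 and 6 have at most one mismatch.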

13.

Intuitionistic Fuzzy Automaton for Approximate String Matching

《Foshan University (佛山科学技术学院)》2014,6(1):29-39

This paper introduces an intuitionistic fuzzy automaton model for computing the similarity between pairs of strings. The model details the possible edit operations needed to transform any input (observed) string into a target (pattern) string by providing a membership and non-membership value between them. In the end, an algorithm is given for approximate string matching and the proposed model computes the similarity and dissimilarity between the pair of strings leading to better approximation.

14.

Dynamic edit distance table under a general weighted cost function

《Journal of Discrete Algorithms》2015


15.

Approximating reversal distance for strings with bounded number of duplicates

Petr Kolman Tomasz Waleń 《Discrete Applied Mathematics》2007,155(3):327-336

For a string $A = a_1 \ldots a_n$, a reversal $\rho(i,j)$, $1 \le i \le j \le n$, transforms the string $A$ into a string $A' = a_1 \ldots a_{i-1} a_j a_{j-1} \ldots a_i a_{j+1} \ldots a_n$, that is, the reversal $\rho(i,j)$ reverses the order of symbols in the substring $a_i \ldots a_j$ of $A$. In the case of signed strings, where each symbol is given a sign + or -, the reversal operation also flips the sign of each symbol in the reversed substring. Given two strings, A and B, signed or unsigned, sorting by reversals (SBR) is the problem of finding the minimum number of reversals that transform the string A into the string B. Traditionally, the problem was studied for permutations, that is, for strings in which every symbol appears exactly once. We consider a generalization of the problem, k-SBR, and allow each symbol to appear at most k times in each string, for some $k \ge 1$. The main result of the paper is an $O(k^2)$-approximation algorithm running in time $O(n)$. For instances with , this is the best known approximation algorithm for k-SBR and, moreover, it is faster than the previous best approximation algorithm.
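The reversal operation defined above is easy to state in code. The sketch below applies a single reversal with 1-based inclusive indices i and j, in an unsigned and a signed form; it illustrates the operation only and does not attempt the approximation algorithm. Representing signed symbols as nonzero integers whose sign is the symbol's sign is an assumption made for this example, as are the function names.

```python
def reversal(a: str, i: int, j: int) -> str:
    """Apply the reversal rho(i, j) to an unsigned string,
    with 1-based inclusive indices i <= j."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    return a[:i - 1] + a[i - 1:j][::-1] + a[j:]


def signed_reversal(a: list[int], i: int, j: int) -> list[int]:
    """Apply rho(i, j) to a signed string: reverse the segment and
    flip the sign of every symbol in it."""
    if not 1 <= i <= j <= len(a):
        raise ValueError("need 1 <= i <= j <= len(a)")
    return a[:i - 1] + [-x for x in reversed(a[i - 1:j])] + a[j:]
```

Applying the same reversal twice restores the original string in both variants, so every reversal is its own inverse.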

16.

Space efficient linear time construction of suffix arrays (cited by: 1; self-citations: 0; citations by others: 1)

Pang Ko Srinivas Aluru 《Journal of Discrete Algorithms》2005,3(2-4):143-156

We present a linear time algorithm to sort all the suffixes of a string over a large alphabet of integers. The sorted order of suffixes of a string is also called suffix array, a data structure introduced by Manber and Myers that has numerous applications in pattern matching, string processing, and computational biology. Though the suffix tree of a string can be constructed in linear time and the sorted order of suffixes derived from it, a direct algorithm for suffix sorting is of great interest due to the space requirements of suffix trees. Our result is one of the first linear time suffix array construction algorithms, which improve upon the previously known O(nlogn) time direct algorithms for suffix sorting. It can also be used to derive a different linear time construction algorithm for suffix trees. Apart from being simple and applicable for alphabets not necessarily of fixed size, this method of constructing suffix trees is more space efficient.
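To show what a suffix array contains, here is a deliberately naive construction that sorts suffix start positions by comparing the suffixes themselves. It is far slower than the linear-time algorithm of the paper (each comparison can cost O(n)) and is meant only to define the output; the function name is hypothetical.

```python
def suffix_array(s) -> list[int]:
    """Return the start positions of the suffixes of `s` in
    lexicographic order. Works for strings and for lists of
    integers, since both slice and compare lexicographically."""
    return sorted(range(len(s)), key=lambda i: s[i:])
```

For `"banana"` the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, giving the array `[5, 3, 1, 0, 4, 2]`.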

17.

Applying an edit distance to the matching of tree ring sequences in dendrochronology

Carola Wenk 《Journal of Discrete Algorithms》2003,1(5-6):367-385

In dendrochronology wood samples are dated according to the tree rings they contain. The dating process consists of comparing the sequence of tree ring widths in the sample to a dated master sequence. Assuming that a tree forms exactly one ring per year a simple sliding algorithm solves this matching task.

But sometimes a tree produces no ring or even two rings in a year. If a sample sequence contains this kind of inconsistency, it cannot be dated correctly by the simple sliding algorithm. We therefore introduce a algorithm for dating such a sample sequence against an error-free master sequence, where n and m are the lengths of the sequences. Our algorithm takes into account that the sample might contain up to missing or double rings and suggests possible positions for these kinds of inconsistencies. This is done by employing an edit distance as the distance measure.
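The simple sliding algorithm mentioned in this abstract can be sketched as below. The abstract does not say how two ring-width sequences are compared at a given offset, so the sum of absolute differences used here is an assumption, as is the function name. The sketch handles only the case of exactly one ring per year and does not model missing or double rings.

```python
def best_slide_offset(sample: list[float], master: list[float]):
    """Slide `sample` along `master` and return (offset, cost) for the
    alignment with the smallest sum of absolute width differences,
    or None if the sample is longer than the master sequence."""
    best = None
    for offset in range(len(master) - len(sample) + 1):
        window = master[offset:offset + len(sample)]
        # Assumed dissimilarity measure: sum of absolute differences.
        cost = sum(abs(s - w) for s, w in zip(sample, window))
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best
```

A single missing or doubled ring shifts every later ring by one position, which is why a rigid slide of this kind fails on such samples and an edit distance is used instead.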

18.

Scenario tree generation and multi-asset financial optimization problems

Alois Geyer Michael Hanke Alex Weissensteiner 《Operations Research Letters》2013,41(5):494-498

We compare two popular scenario tree generation methods in the context of financial optimization: moment matching and scenario reduction. Using a simple problem with a known analytic solution, moment matching, when ensuring absence of arbitrage, replicates this solution precisely. On the other hand, even if the scenario trees generated by scenario reduction are arbitrage-free, the solutions are biased and highly variable. These results hold for correlated and uncorrelated asset returns, as well as for normal and non-normal returns.

19.

Noisy colored point set matching

Yago Diez J. Antoni Sellarès 《Discrete Applied Mathematics》2011,159(6):433-449

We propose a process for determining approximated matches, in terms of the bottleneck distance, under color preserving rigid motions, between two colored point sets A,B∈R², |A|≤|B|. We solve the matching problem by generating all representative motions that bring A close to a subset B^′ of set B and then using a graph matching algorithm. We also present an approximate matching algorithm with improved computational time. In order to get better running times for both algorithms we present a lossless filtering preprocessing step. By using it, we determine some candidate zones which are regions that contain a subset S of B such that A may match one or more subsets B^′ of S. Then, we solve the matching problem between A and every candidate zone. Experimental results using both synthetic and real data are reported to prove the effectiveness of the proposed approach.

20.

An integer linear programming approach for approximate string comparison

Marcus Ritt Alysson M. Costa Sergio Mergen Viviane M. Orengo 《European Journal of Operational Research》2009

We introduce a problem called maximum common characters in blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We show that this problem is NP-complete, but can nevertheless be solved satisfactorily using integer linear programming for instances of practical interest. Two integer linear formulations are proposed and compared in terms of their linear relaxations. We also compare the results of the approximate matching with other known measures such as the Levenshtein (edit) distance.