首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
New text indexing functionalities of the compressed suffix arrays are proposed. The compressed suffix array proposed by Grossi and Vitter is a space-efficient data structure for text indexing. It occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies bits for the alphabet . In this paper we modify the data structure so that pattern matching can be done without any access to the text. In addition to the original functions of the compressed suffix array, we add new operations search, decompress and inverse to the compressed suffix arrays. We show that the new index can find occ occurrences of any substring P of the text in O(|P|logn+occlogεn) time for any fixed 1ε>0 without access to the text. The index also can decompress a part of the text of length m in O(m+logεn) time. For a text of length n on an alphabet such that , our new index occupies only bits where is the order-0 entropy of the text. Especially for ε=1 the size is bits. Therefore the index will be smaller than the text, which means we can perform fast queries from compressed texts.  相似文献   

2.
Let a text of u characters over an alphabet of size σ be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv–Lempel trie, called the LZ-index, that takes 4nlog2n(1+o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time O(m3logσ+(m+R)logn). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found.  相似文献   

3.
The compressed matching problem is the problem of finding all occurrences of a pattern in a compressed text. In this paper we discuss the 2-dimensional compressed matching problem in Lempel–Ziv compressed images. Given a pattern P of (uncompressed) size m×m, and a text T of (uncompressed) size n×n, both in 2D-LZ compressed form, our algorithm finds all occurrences of P in T. The algorithm is strongly inplace, that is, the amount of extra space used is proportional to the best possible compression of a pattern of size m2. The best compression that the 2D-LZ technique can obtain for a file of size m2 is O(m). The time for performing the search is O(n2) and the preprocessing time is O(m3). Our algorithm is general in the sense that it can be used for any 2D compression which can be sequentially decompressed in small space.  相似文献   

4.
Denis S. Krotov   《Discrete Mathematics》2008,308(22):5289-5297
An n-ary operation Q:ΣnΣ is called an n-ary quasigroup of order |Σ| if in the relation x0=Q(x1,…,xn) knowledge of any n elements of x0,…,xn uniquely specifies the remaining one. Q is permutably reducible if Q(x1,…,xn)=P(R(xσ(1),…,xσ(k)),xσ(k+1),…,xσ(n)) where P and R are (n-k+1)-ary and k-ary quasigroups, σ is a permutation, and 1<k<n. An m-ary quasigroup S is called a retract of Q if it can be obtained from Q or one of its inverses by fixing n-m>0 arguments. We prove that if the maximum arity of a permutably irreducible retract of an n-ary quasigroup Q belongs to {3,…,n-3}, then Q is permutably reducible.  相似文献   

5.
For fixed integers m,k2, it is shown that the k-color Ramsey number rk(Km,n) and the bipartite Ramsey number bk(m,n) are both asymptotically equal to kmn as n→∞, and that for any graph H on m vertices, the two-color Ramsey number is at most (1+o(1))nm+1/(logn)m-1. Moreover, the order of magnitude of is proved to be nm+1/(logn)m if HKm as n→∞.  相似文献   

6.
Let Rn×p, (n), Gl(p) and +(p) denote respectively the set of n×p matrices, the set of n×n orthogonal matrices, the set of p×p nonsingular matrices and the set of p × p positive definite matrices. In this paper, it is first shown that a bijective and bimeasurable transformation (BBT) g on RpRp×1 preserving the multivariate normality of Np(μ, Σ) for fixed μ=μ1, μ21≠μ2) and for all Σ +(p) is of the form g(x)=Ax+b a.e. for some (A, b)Gl(pRp. Second, a BBT g on Rn×p preserving the form for certain 's and all Σ +(p) is shown to be of the form g(x)=QxA+E a.e. for some (Q, A, E) (nGl(p)×Rn×p. Third, a BBT h on +(p) preserving the Wishart-ness of Wp(Σ, m) (mp) for all Σ +(p) is shown to be of the form h(w)=AwA a.e. for some AGl(p). Fourth, a BBT k(x, w)=(k1(x, w), k2(x, w)) on Rn×p× +(p) which preserves the form of for certain 's and all Σ +(p) is shown to be of the form k(x, w)=(QxA+E, AwA) a.e. for some (Q, A, E) (nGl(p)×Rn×p.  相似文献   

7.
The indexing problem is where a text is preprocessed and subsequent queries of the form “Find all occurrences of pattern P in the text” are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form “Find all occurrences of dictionary patterns in text T” are answered in time proportional to the length of the text and the number of occurrences.There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bound edit (or Hamming) distance from the appropriate text location.In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log2 n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log3 d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.  相似文献   

8.
Greedily Finding a Dense Subgraph   总被引:3,自引:0,他引:3  
Given an n-vertex graph with nonnegative edge weights and a positive integer k ≤ n, our goal is to find a k-vertex subgraph with the maximum weight. We study the following greedy algorithm for this problem: repeatedly remove a vertex with the minimum weighted-degree in the currently remaining graph, until exactly k vertices are left. We derive tight bounds on the worst case approximation ratio R of this greedy algorithm: (1/2 + n/2k)2 − O(n − 1/3) ≤ R ≤ (1/2 + n/2k)2 + O(1/n) for k in the range n/3 ≤ k ≤ n and 2(n/k − 1) − O(1/k) ≤ R ≤ 2(n/k − 1) + O(n/k2) for k < n/3. For k = n/2, for example, these bounds are 9/4 ± O(1/n), improving on naive lower and upper bounds of 2 and 4, respectively. The upper bound for general k compares well with currently the best (and much more complicated) approximation algorithm based on semidefinite programming.  相似文献   

9.
Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length n string derived from T by a series of local swaps (i.e., t ← tℓ + 1 and tℓ + 1 ← t), where each element can participate in no more than one swap. The pattern matching with swaps problem is that of finding all locations i for which there exists a swapped version T′ of T with an exact matching of P in location i of T′. It has been an open problem whether swapped matching can be done in less than O(nm) time. In this paper we show the first algorithm that solves the pattern matching with swaps problem in time o(nm). We present an algorithm whose time complexity is O(nm1/3 log m log σ) for a general alphabet Σ, where σ = min(m,Σ).  相似文献   

10.
An efficient fixed-parameter algorithm for 3-Hitting Set   总被引:1,自引:0,他引:1  
Given a collection C of subsets of size three of a finite set S and a positive integer k, the 3-Hitting Set problem is to determine a subset SS with |S′|k, so that S′ contains at least one element from each subset in C. The problem is NP-complete, and is motivated, for example, by applications in computational biology. Improving previous work, we give an O(2.270k+n) time algorithm for 3-Hitting Set, which is efficient for small values of k, a typical occurrence in some applications. For d-Hitting Set we present an O(ck+n) time algorithm with c=d−1+O(d−1).  相似文献   

11.
(δ,γ)-matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P1…m and a text T1…n on an alphabet of integers, find the occurrences P of the pattern in the text such that (i) , and (ii) . The problem makes sense for δγδm. Several techniques for (δ,γ)-matching have been proposed, based on bit-parallelism or on skipping characters. We first present an O(mnlog(γ)/w) worst-case time and O(n) average-case time bit-parallel algorithm (being w the number of bits in the computer word). It improves the previous O(mnlog(δm)/w) worst-case time algorithm of the same type. Second, we combine our bit-parallel algorithm with suffix automata to obtain the first algorithm that skips characters using both δ and γ. This algorithm examines less characters than any previous approach, as the others do just δ-matching and check the γ-condition on the candidates. We implemented our algorithms and drew experimental results on real music, showing that our algorithms are superior to current alternatives with high values of δ.  相似文献   

12.
Let denote the set of continuous n×n matrices on an interval . We say that is a nontrivial k-involution if where ζ=e-2πi/k, d0+d1++dk-1=n, and with . We say that is R-symmetric if R(t)A(t)R-1(t)=A(t), , and we show that if A is R-symmetric then solving x=A(t)x or x=A(t)x+f(t) reduces to solving k independent d×d systems, 0k-1. We consider the asymptotic behavior of the solutions in the case where . Finally, we sketch analogous results for linear systems of difference equations.  相似文献   

13.
Let ω be a primitive element of GF(2n), where . Let d=(22k+2s+1-2k+1-1)/(2s-1), where n=2k, and s is such that 2s divides k. We prove that the binary m-sequences s(t)=tr(ωt) and s(dt) have a four-level cross-correlation function and give the distribution of the values.  相似文献   

14.
Given an immersion of a manifoldf: M→R n+k , dimensionM=n, the parallel groupP(f) off is formed by the diffeomorphisms ofM such that the normalk-planes at points of each orbit are parallel. In [3] we studied the parallel group of a plane closed curve. Here we concentrate on immersionsf: R n R n+1, special attention being paid to graphs of smooth maps fromR toR. Graphs of smooth mapsf: S n R m are also dealt with and we characterise those maps of which the graph has nontrivial parallel group. To end up we find a sufficient condition for the triviality of the tangent group.  相似文献   

15.
In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for the above problem that works in O(n log m) time, where n and m is the length of the text and the pattern respectively. We also suggest an efficient implementation of our algorithm, which works in linear time when the pattern size is small. Finally, we also describe how our approach can be used to solve the distributed pattern matching problem. The preliminary version of this paper appeared in [26].  相似文献   

16.
It follows from the theory of trace identities developed by Procesi and Razmyslov that the trace cocharacters arising from the trace identities of the algebra Mr(F) of r×r matrices over a field F of characteristic zero are given by TCr,n=∑λΛr(n)χλχλ where χλχλ denotes the Kronecker product of the irreducible characters of the symmetric group associated with the partition λ with itself and Λr(n) denotes the set of partitions of n with r or fewer parts, i.e. the set of partitions λ=(λ1λk) with kr. We study the behavior of the sequence of trace cocharacters TCr,n. In particular, we study the behavior of the coefficient of χ(ν,nm) in TCr,n as a function of n where ν=(ν1νk) is some fixed partition of m and nmνk. Our main result shows that such coefficients always grow as a polynomial in n of degree r−1.  相似文献   

17.
We propose succinct data structures for text retrieval systems supporting document listing queries and ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents. Traditional data structures for these problems support queries only for some predetermined keywords. Recently Muthukrishnan proposed a data structure for document listing queries for arbitrary patterns at the cost of data structure size. For computing the tf*idf scores there has been no efficient data structures for arbitrary patterns.Our new data structures support these queries using small space. The space is only 2/ times the size of compressed documents plus 10n bits for a document collection of length n, for any 0<1. This is much smaller than the previous O(nlogn) bit data structures. Query time is O(m+qlogn) for listing and computing tf*idf scores for all q documents containing a given pattern of length m. Our data structures are flexible in a sense that they support queries for arbitrary patterns.  相似文献   

18.
In [4] we constructed certain homology representations of a finite group G of type An, Bn or Cn, and showed that these representations can be used to sift out the reflection compound characters of G. In the present note, we show that for a group G of type Dn, each reflection compound character π(k), 2 k n − 2, determines a unique “obstruction” character θ(k), which occurs with positive multiplicity in every homology representation containing π(k).  相似文献   

19.
The string matching with mismatches problem requires finding the Hamming distance between a pattern P of length m and every length m substring of text T with length n. Fischer and Paterson's FFT-based algorithm solves the problem without error in O(σnlogm), where σ is the size of the alphabet Σ [SIAM–AMS Proc. 7 (1973) 113–125]. However, this in the worst case reduces to O(nmlogm). Atallah, Chyzak and Dumas used the idea of randomly mapping the letters of the alphabet to complex roots of unity to estimate the score vector in time O(nlogm) [Algorithmica 29 (2001) 468–486]. We show that the algorithm's score variance can be substantially lowered by using a bijective mapping, and specifically to zero in the case of binary and ternary alphabets. This result is extended via alphabet remappings to deterministically solve the string matching with mismatches problem with a constant factor of 2 improvement over Fischer–Paterson's method.  相似文献   

20.
We consider the class of primitive stochastic n×n matrices A, whose exponent is at least (n2−2n+2)/2+2. It is known that for such an A, the associated directed graph has cycles of just two different lengths, say k and j with k>j, and that there is an α between 0 and 1 such that the characteristic polynomial of A is λn−αλnj−(1−α)λnk. In this paper, we prove that for any mn, if α1/2, then Am+kAmAm1wT, where 1 is the all-ones vector and wT is the left-Perron vector for A, normalized so that wT1=1. We also prove that if jn/2, n31 and , then Am+jAmAm1wT for all sufficiently large m. Both of these results lead to lower bounds on the rate of convergence of the sequence Am.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号