首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper generalizes the dynamic text indexing problem, introduced in [15], to insertion and deletion of strings. The problem is to quickly answer on-line queries about the occurrences of arbitrary pattern strings in a text that may change due to insertion or deletion of strings from it. To treat strings as atomic objects, we provide new sequential techniques and related data structures, which combine the suffix tree with the naming technique used in parallel computation on strings. We also introduce a geometric interpretation of the problem of finding the occurrences of a pattern in a given substring of the text. As a result, the algorithm allows the insertion in the text of a stringSinO(|S| log(n + |S|)) amortized time, and the deletion from the text of a stringSinO(|S| log n) amortized time, wherenis the length of the current text. A pattern search requiresO(p log p + upd ( + log p) + pocc) worst-case time, wherepis the length of the pattern andpoccis the number of its occurrences in the current text, obtained after the execution ofupdupdate operations. This solution requiresO(n2 log n) space, which is not initialized.We also provide a technique to reduce the space toO(n log n), yielding a solution that requiresO((p + upd) log p + pocc) query time in the worst-case,O(|S| log3/2(|S| + n)) amortized time to insert a stringSin, andO(|S| log3/2n) amortized time to delete a stringSfrom the current text.Furthermore, we use our techniques to solve the novel on-line dynamic tree matching problem that requires the on-line detection of the occurrences of an arbitrary subtree in a forest of ordered labeled trees. The forest may change due to insertion or deletion of subtrees or by renaming of some nodes. Such a problem is solved by a simple reduction to the dynamic text indexing problem.  相似文献   

2.
The indexing problem is where a text is preprocessed and subsequent queries of the form “Find all occurrences of pattern P in the text” are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem a set of patterns is preprocessed and subsequent queries of the form “Find all occurrences of dictionary patterns in text T” are answered in time proportional to the length of the text and the number of occurrences.There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bound edit (or Hamming) distance from the appropriate text location.In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log2 n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log3 d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.  相似文献   

3.
Multilingual text compression exploits the existence of the same text in several languages to compress the second and subsequent copies by reference to the first. This is done based on bilingual text alignment, a mapping of words and phrases in one text to their semantic equivalents in the translation. A new multilingual text compression scheme is suggested, which improves over an immediate generalization of bilingual algorithms. The idea is to store the necessary markup data within the source language text; the incurred compression loss due to this overhead is smaller than the savings in the compressed target language texts, for a large enough number of the latter. Experimental results are presented for a parallel corpus in six languages extracted from the EUR-Lex website of the European Union. These results show the superiority of the new algorithm as a function of the number of languages.  相似文献   

4.
基于动态规划,利用反向搜索的方法,通过计算词语的最大“花费”给出了中文文本的切分算法,从而建立了一个能够消除中文分词中切分歧义的中文分词模型。通过对模型中算法求解的运行效率及空间耗费进行分析得出,在统计意义上,该算法具有接近与文本规模成线性关系的复杂度,空间的耗费是常数规模的。  相似文献   

5.
6.
A hash structure, Overflow Indexing (OVI), using an index for the overflows is presented. The index contains one entry (key, bucket number) for each overflow. Formulas for computing the expected number of entries in the index and the standard deviation are derived and the numerical results obtained using these formulae are presented in a graph. It is concluded that storing the index in the main memory when operating on the file is feasible for small to medium-sized, and sometimes even large files. The number of probes for both a successful and unsuccessful search is one. Deletion requires two probes and insertion two or three probes. Details of OVI are presented and illustrated by simulation experiments. The structure of the index is discussed and one possible structure, hashing with dynamic buckets, is presented.  相似文献   

7.
We study some structural properties for tree-decompositions of 2-connected planar graphs that we use to improve upon the runtime of tree-decomposition based dynamic programming approaches for several NP-hard planar graph problems. E.g., we derive the fastest algorithm for Planar Dominating Set of runtime 3twnO(1), when we take the width tw of a given tree-decomposition as the measure for the exponential worst case behavior. We also introduce a tree-decomposition based approach to solve non-local problems efficiently, such as Planar Hamiltonian Cycle in runtime 6twnO(1). From any input tree-decomposition of a 2-connected planar graph, one computes in time O(nm) a tree-decomposition with geometric properties, which decomposes the plane into disks, and where the graph separators form Jordan curves in the plane.  相似文献   

8.
结构动力方程求解的改进-精细Runge-Kutta方法   总被引:4,自引:2,他引:2  
在已有精细Runge-Kutta(龙格-库塔)方法的基础上,考虑了状态空间方程非齐次项的特点和外荷载的特殊性,提出了求解结构动力方程的改进精细Runge-Kutta方法.通过对矩阵进行分块计算,在利用原有精细Runge-Kutta方法高精度的同时进一步提高了计算效率,有利于大型结构的长时间仿真.将改进精细Runge-Kutta方法应用于结构动力方程求解,为其求解提供一种新方法.数值算例表明了改进方法的正确性和有效性.  相似文献   

9.
LSI潜在语义信息检索模型   总被引:5,自引:0,他引:5  
本文介绍了基于向量空间的信息检索方法 ,检索词和文件之间的关系表示成一个矩阵 ,查寻信息表示为检索词权重的向量 ,通过求查寻和文件向量的夹角余弦确定出数据库中的相关文件 .使用矩阵的 QR分解和奇异值分解 ( SVD)来处理数据库本身的不确定性 ,本文的目的是说明线性代数中的基本概念可以很好解决信息检索 ( IR)问题  相似文献   

10.
Based upon the understanding of the global topologies of the singular subset, its complement, and the hyperbolic subset in the symplectic group, in this paper we study the domains of instability for hyperbolic Hamiltonian systems and define a characteristic index for such domains. This index is defined via the Maslov-type index theory for symplectic paths starting from the identity defined by C. Conley, E. Zehnder, and Y. Long, and the hyperbolic index of symplectic matrices. The old problem of the relation between the non-degenerate local minimality and the instability of hyperbolic extremal loops in the calculus of variation is also studied via this new index for the domains of instability. Received July 4, 1997  相似文献   

11.
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the index points out the text areas that could contain occurrences and must be verified. The index parameters permit load balancing between filtering and verification work, and provide a compromise between the space requirement of the index and the error level for which the filtration is still efficient. We show experimentally that the index is competitive against others that take more space, being in fact the fastest choice for intermediate error levels, an area where no current index is useful.  相似文献   

12.
考虑到轨道结构长度随系统响应持时的增加而增长,提出了一种改进的车辆 轨道垂向耦合系统的动力响应求解算法.该算法事先选定某一定长度的轨道结构,并获得该轨道结构的质量矩阵、阻尼矩阵和刚度矩阵;通过在求解过程中不断地对车辆子系统定位,判断是否需要对车辆子系统的位置和轨道结构的响应矩阵进行调整,以此来达到仅增加系统响应持时而不增加轨道结构长度的目的.算例表明:该改进加快算法是精确、高效的,不仅可以真实地模拟车辆在轨道上的前进运行状态,而且可以保证轨道子系统的轨道单元数量不随系统响应持时的增加而增长,这为快速求解车辆 轨道垂向耦合系统提供了一种有效的计算方法.  相似文献   

13.
14.
Abstract

There are many examples of text data bases, including literary corpora and computer source code, in which statistics are associated with each line. A visualization technique for this class of data represents the text lines as thin colored rows within columns. The position, length, and indentation of each row corresponds to that of the text. The color of each row is determined by a statistic associated with each line. The display looks like a miniature picture of the text with the color showing the spatial distribution of the statistic within the text. Using this technique, SeeSoft?, a dynamic graphics software tool, can easily display 50,000 lines of text simultaneously on a high-resolution monitor.  相似文献   

15.
针对建设项目的复杂性和动态性,建立基于改进微粒群算法的多目标动态优化模型.首先,为提高算法性能,引入外部归档集和阈值并构建基于理想点法的适应度函数;其次,分别建立工期模型、加入系统可靠度的质量模型以及加入费用现值的成本模型,由其得到综合优化模型;最后结合工程实例对算法进行验证并与非劣分类遗传算法(NSGA-Ⅱ算法)对比.结果表明:方法比NSGA-Ⅱ算法的优化结果更科学、收敛速度更快.  相似文献   

16.
Methods and algorithms for physical compression of text are considered. Their comparative efficiency is analyzed, allowing for the tradeoff between volume compression and the coding/decoding time and for their robustness to various changes in the statistical characteristics of the texts being processed.Translated from Itogi Nauki i Tekhniki, Seriya Teoriya Veroyatnostei, Matematicheskaya Statistika, Teoreticheskaya Kibernetika, Vol. 28, pp. 85–109, 1988.  相似文献   

17.
In this paper, we study a semigroup induced by the index morphism on finite trees. This index morphism classify the set of all finite trees under certain equivalence relation, and this classification forms a semigroup. We define the indexing α by a map from this semigroup into the family of all von Neumann algebras. Then the images of α become new von Neumann algebras containing their natural diagonal W*-subalgebras. For naturally determined conditional expectations with affiliation, the Watatani’s extended Jones index of each image of α is preserved by the elements of the semigroup.  相似文献   

18.
将黄金数据的尖峰厚尾、异方差性及杠杆效应等统计特征与马尔科夫概率转移矩阵所具有的动态变化规律结合,提出一种改进的灰色马尔科夫模型.模型首先对数据进行统计分析,建立相应的概率统计模型并用此模型对系统发展变化趋势进行拟合.在拟合序列的基础上利用马尔科夫链的动态转移变化建立状态转移概率矩阵,采用动态数据驱动原理对未来每一步数据进行动态预测.模型既是统计方法与数据动态驱动的结合,克服了传统的灰色马尔科夫模型中对数据内在统计规律的忽视,实证表明其预测精度较灰色马尔科夫模型预测高,具有较好的实用性.  相似文献   

19.
20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号