Similar Documents
20 similar documents found.
1.
Building an effective index over XML data is a key factor that determines the performance of XML data processing. Most existing indexing and storage strategies sacrifice structural information, which hampers structural queries and updates. XML Schema, the standard for describing the structural information of XML documents, provides the basis for validating documents and query paths. Building on this, we propose SBXI, an XML document data indexing technique based on XML Schema constraints, which guides document data storage and query processing. It improves storage and query efficiency, offers high space utilization and low index maintenance cost, and supports complex queries containing multiple predicates.

2.
Many data dissemination techniques have been proposed for wireless sensor networks (WSNs) to facilitate data dissemination and query processing. However, these techniques may not work well in a large-scale sensor network where a huge amount of sensing data is generated. In this paper, we propose an integrated distributed connected dominating set based indexing (CBI) data dissemination scheme to support scalable handling of large amounts of sensing data in large-scale WSNs. Our CBI can minimize the use of limited network and computational resources while providing timely responses to queries. Moreover, our data dissemination framework ensures scalability and load balance as well as adaptivity in the presence of dynamic changes. Analysis and simulations are conducted to evaluate the performance of our CBI scheme. The results show that the CBI scheme outperforms the external storage-based, local storage-based and data-centric storage-based schemes in overall performance.
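The abstract stays at the architecture level; purely as an illustration of the kind of backbone a CDS-based index relies on, the following sketch (our own, not the paper's CBI scheme) builds a connected dominating set greedily over a toy sensor topology given as an adjacency dictionary.

```python
def greedy_cds(adj):
    """Greedy connected dominating set over an undirected graph.

    adj: dict mapping node -> set of neighbour nodes.
    Returns a set of nodes that dominates the graph and induces a
    connected subgraph (assuming the input graph is connected).
    """
    nodes = set(adj)
    start = max(nodes, key=lambda v: len(adj[v]))       # highest-degree seed
    cds = {start}
    covered = {start} | adj[start]
    while covered != nodes:
        # candidates are dominated nodes adjacent to the current backbone
        candidates = {w for v in cds for w in adj[v]} - cds
        best = max(candidates, key=lambda w: len(adj[w] - covered))
        cds.add(best)
        covered |= {best} | adj[best]
    return cds

# toy 6-node sensor topology
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 5}, 4: {2, 6}, 5: {3, 6}, 6: {4, 5}}
print(greedy_cds(adj))
```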

3.
We consider efficient indexing methods for conditioning graphs, which are a form of recursive decomposition for Bayesian networks. We compare two well-known methods for indexing, a top-down method and a bottom-up method, and discuss the redundancy that each of them suffers from. We present a new indexing method that combines the advantages of each model in order to reduce this redundancy. We also introduce the concept of an update manager, a node in the conditioning graph that controls when other nodes update their current index. Empirical evaluations over a suite of standard test networks show a considerable reduction both in the amount of indexing computation that takes place and in the overall runtime required by the query algorithm.

4.
Similarity search has proved suitable for searching large collections of unstructured data objects, and a number of practical index data structures for this purpose have been proposed. All of them have been devised to process single queries sequentially. However, in large-scale systems such as Web search engines indexing multimedia content, it is critical to deal efficiently with streams of queries rather than with single queries. In this paper we show how to achieve efficient and scalable performance in this context. To this end we transform a sequential index based on clustering into a distributed one and devise algorithms and optimizations specially tailored to support high-performance parallel query processing.

5.
Graphs are important structures for modeling complex relationships such as chemical compounds, proteins, geometric or hierarchical parts, and XML documents. Given a query graph, indexing has become a necessity for quickly retrieving similar graphs from large databases. We propose a novel technique for indexing databases whose entries can be represented as graph structures. Our method starts by representing the topological structure of a graph, as well as that of its subgraphs, as vectors whose components are the sorted Laplacian eigenvalues of the graph or subgraph. By performing a nearest neighbor search around the query spectra, similar but not necessarily isomorphic graphs are retrieved.
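As a minimal sketch of the spectral signature idea described above (not of the paper's full index structure or subgraph decomposition), the following code turns each graph into its sorted Laplacian spectrum, padded to a fixed length, and retrieves the nearest database graphs around the query spectrum; the function names and toy database are ours.

```python
import numpy as np

def laplacian_spectrum(adj_matrix, k):
    """Sorted Laplacian eigenvalues, padded/truncated to a fixed length k."""
    A = np.asarray(adj_matrix, dtype=float)
    L = np.diag(A.sum(axis=1)) - A                  # combinatorial Laplacian
    eig = np.sort(np.linalg.eigvalsh(L))[::-1]      # descending spectrum
    sig = np.zeros(k)
    sig[:min(k, len(eig))] = eig[:k]
    return sig

def nearest_graphs(query_adj, database, k=8, top=3):
    """Indices of the database graphs whose spectra are closest to the query."""
    q = laplacian_spectrum(query_adj, k)
    dists = [np.linalg.norm(q - laplacian_spectrum(g, k)) for g in database]
    return np.argsort(dists)[:top]

# toy database: a triangle, a path on 3 vertices, and a 4-cycle
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path3    = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
cycle4   = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(nearest_graphs(triangle, [triangle, path3, cycle4]))  # triangle ranks first
```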

6.
A rough set evaluation method for tobacco leaf quality from a usability perspective
Tan Xu, Mao Taitian, Zou Kai. 运筹与管理 (Operations Research and Management Science), 2015, 24(3): 219-226
Based on an understanding and synthesis of the new requirements on tobacco leaf quality in the modern cigarette industry, a new comprehensive index system for evaluating tobacco leaf quality from a usability perspective is constructed. Considering the complexity of the data and the particular nature of problem solving in practical tobacco leaf quality evaluation, the equivalence-relation-based Pawlak rough set model is extended to a rough set model based on upper- and lower-approximation similarity relations, and a corresponding intelligent rough-set evaluation model of tobacco leaf quality under the usability view is designed. The model derives objective weights for each indicator without relying on subjective prior information, and further incorporates experts' subjective weights so that the "usability" requirements of tobacco leaf can be adjusted dynamically. This paper is a first attempt to apply and interpret the concept of tobacco leaf "usability" through a quantitative approach; the empirical analysis at the end verifies the feasibility and, to a certain extent, the superiority of the proposed method.
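The paper's extended model is built on upper- and lower-approximation similarity relations; as a hedged illustration of the underlying rough-set machinery only, the sketch below computes the classical Pawlak lower and upper approximations of a set of "usable" samples from the equivalence classes induced by discretized attributes. The attribute names and sample data are invented.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Pawlak lower/upper approximations of `target` with respect to the
    indiscernibility relation induced by `attributes`.

    objects: dict id -> dict of (already discretized) attribute values
    target:  set of object ids (e.g. samples judged 'usable')
    """
    classes = defaultdict(set)
    for oid, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(oid)
    lower, upper = set(), set()
    for eq_class in classes.values():
        if eq_class <= target:          # entirely inside the target set
            lower |= eq_class
        if eq_class & target:           # intersects the target set
            upper |= eq_class
    return lower, upper

# invented, discretized tobacco-leaf samples
samples = {
    1: {"color": "orange", "oil": "high"},
    2: {"color": "orange", "oil": "high"},
    3: {"color": "lemon",  "oil": "low"},
    4: {"color": "orange", "oil": "low"},
}
usable = {1, 4}
print(approximations(samples, ["color", "oil"], usable))  # ({4}, {1, 2, 4})
```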

7.
Li Qian, Sun Linyan, Bao Liang. 运筹与管理 (Operations Research and Management Science), 2009, 18(6): 117-125
Based on the clonal selection theory and the mechanisms of the biological immune response, this paper proposes an immune-memory clonal algorithm for index investing and applies it to solving an optimization model for constructing index-tracking portfolios, aiming to explore optimized construction strategies for index investing. The paper first formulates a multi-objective model for constructing index-tracking portfolios. It then designs the antigen, antibody, affinity function, clonal selection operator, immune memory operator and corresponding evolutionary algorithm tailored to the index-tracking portfolio construction strategy. The algorithm effectively avoids the drawbacks of traditional genetic algorithms, such as poor solution diversity in later stages, premature convergence and slow convergence. A heuristic algorithm is also proposed to limit the number of stocks in the portfolio. Finally, the model and algorithm are tested on historical data from six major world stock market indices and their constituent stocks, including the SSE 180 index. The results show that the algorithm has good solving ability and convergence speed, the rationality and effectiveness of the model are demonstrated, and both the model and the algorithm have strong practical value.
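The following is a rough sketch, not the paper's algorithm: a plain clonal-selection loop that evolves normalized weight vectors (antibodies) to minimize the tracking error of a portfolio against a synthetic index. It omits the immune memory operator, the multi-objective formulation and the cardinality heuristic described above; all data and parameter choices are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def tracking_error(weights, stock_returns, index_returns):
    """Root-mean-square deviation between portfolio and index returns."""
    return np.sqrt(np.mean((stock_returns @ weights - index_returns) ** 2))

def clonal_selection(stock_returns, index_returns,
                     pop=30, clones=5, gens=100, sigma=0.05):
    """Minimal clonal-selection loop: keep the highest-affinity antibodies,
    clone and mutate them, and let selection act again next generation."""
    n = stock_returns.shape[1]
    antibodies = rng.random((pop, n))
    antibodies /= antibodies.sum(axis=1, keepdims=True)   # normalized weights
    for _ in range(gens):
        affinity = np.array([-tracking_error(w, stock_returns, index_returns)
                             for w in antibodies])
        survivors = antibodies[np.argsort(-affinity)[:pop // 2]]  # best first
        offspring = []
        for rank, w in enumerate(survivors):
            # higher-affinity antibodies (smaller rank) mutate less
            scale = sigma * (rank + 1) / len(survivors)
            for _ in range(clones):
                mutant = np.abs(w + scale * rng.normal(size=n))
                offspring.append(mutant / mutant.sum())
        antibodies = np.vstack([survivors, offspring])
    return min(antibodies,
               key=lambda w: tracking_error(w, stock_returns, index_returns))

# synthetic market: 5 stocks, the "index" is their equally weighted average
stock_returns = rng.normal(0.001, 0.02, size=(250, 5))
index_returns = stock_returns.mean(axis=1)
best = clonal_selection(stock_returns, index_returns)
print(np.round(best, 3), tracking_error(best, stock_returns, index_returns))
```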

8.
This paper generalizes the dynamic text indexing problem, introduced in [15], to insertion and deletion of strings. The problem is to quickly answer on-line queries about the occurrences of arbitrary pattern strings in a text that may change due to insertion or deletion of strings from it. To treat strings as atomic objects, we provide new sequential techniques and related data structures, which combine the suffix tree with the naming technique used in parallel computation on strings. We also introduce a geometric interpretation of the problem of finding the occurrences of a pattern in a given substring of the text. As a result, the algorithm allows the insertion in the text of a string S in O(|S| log(n + |S|)) amortized time, and the deletion from the text of a string S in O(|S| log n) amortized time, where n is the length of the current text. A pattern search requires O(p log p + upd ( + log p) + pocc) worst-case time, where p is the length of the pattern and pocc is the number of its occurrences in the current text, obtained after the execution of upd update operations. This solution requires O(n^2 log n) space, which is not initialized. We also provide a technique to reduce the space to O(n log n), yielding a solution that requires O((p + upd) log p + pocc) query time in the worst case, O(|S| log^{3/2}(|S| + n)) amortized time to insert a string S into the current text, and O(|S| log^{3/2} n) amortized time to delete a string S from the current text. Furthermore, we use our techniques to solve the novel on-line dynamic tree matching problem, which requires the on-line detection of the occurrences of an arbitrary subtree in a forest of ordered labeled trees. The forest may change due to insertion or deletion of subtrees or by renaming of some nodes. Such a problem is solved by a simple reduction to the dynamic text indexing problem.

9.
In the dynamic text indexing problem, a text string has to be maintained under string insertions and deletions in order to answer on-line queries about arbitrary pattern occurrences. By means of some new techniques and data structures, we achieve improved worst-case bounds. We show that finding all pocc occurrences of a pattern of length p in the current text of length n takes O(p + pocc + upd log p + log n) time, where upd is the number of text updates performed so far; inserting or deleting a string of length s from the current text takes O(s log(s + n)) time.

10.
The Multidimensional B-tree (MDBT) is a new method for multiple-attribute indexing which uses B-trees to maintain the filial sets at each level and imposes an ordering on these filial sets in order to ensure efficient searching for various associative queries. In this paper, we show that the MDBT provides an attractive alternative to other indexing structures when frequent changes to the database occur. We present algorithms for maintaining the MDBT structure when insertions or deletions are posted, which also account for some storage reclamation. Procedures for evaluating the average and worst-case times of our algorithms are given, showing that the maintenance of the MDBT structure can be done at relatively low cost.

11.
The suffix tree data structure was intensively described, studied and used in the eighties and nineties, its linear-time construction counterbalancing its space-consuming requirements. An equivalent data structure, the suffix array, was described by Manber and Myers in 1990. This space-economical structure was neglected for more than a decade because its construction was too slow. Since 2003, several linear-time suffix array construction algorithms have been proposed, and this structure has slowly replaced the suffix tree in many string processing problems. All these constructions build the suffix array from the text, and any edit operation on the text leads to the construction of a brand new suffix array. In this article, we present an algorithm that modifies the suffix array and the Longest Common Prefix (LCP) array when the text is edited (insertion, substitution or deletion of a letter or a factor). This algorithm is based on a recent four-stage algorithm developed for dynamic Burrows–Wheeler Transforms (BWT). To minimize the space complexity, we sample the suffix array, a technique used in BWT-based compressed indexes. We furthermore explain how this technique can be adapted to maintain a sample of the extended suffix array, containing a sample of the suffix array, a sample of the inverse suffix array and the whole LCP array. Our practical experiments show that the algorithm performs very well in practice, being quicker than the fastest suffix array construction algorithm.
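The incremental maintenance algorithm itself is involved; as background for the structures it maintains, here is a compact (quadratic-time, fine only for short texts) construction of the suffix array together with Kasai's linear-time LCP computation.

```python
def suffix_array(text):
    """Indices of the suffixes of `text` in lexicographic order.

    Simple comparison-based construction; the linear-time algorithms
    mentioned in the abstract are considerably more intricate.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    """Kasai-style LCP: lcp[r] = longest common prefix of sa[r] and sa[r-1]."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):                    # process suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                    # the LCP can drop by at most one
        else:
            h = 0
    return lcp

text = "banana"
sa = suffix_array(text)
print(sa)                   # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```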

12.
The dynamic planar point location problem is the task of maintaining a dynamic set S of n nonintersecting (except possibly at endpoints) line segments in the plane under the following operations:
• Locate (point): report the segment immediately above the query point, i.e., the first segment intersected by an upward vertical ray starting at that point;
• Insert (segment): add the given segment to the collection of segments;
• Delete (segment): remove the given segment from the collection of segments.
We present a solution which requires O(n) space and has query and insertion time O(log n log log n) and deletion time O(log^2 n). The bounds for insertions and deletions are amortized. A query time below O(log^2 n) was previously known only for monotone subdivisions and subdivisions consisting of horizontal segments, and required nonlinear space.
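The data structures achieving these bounds are intricate; purely to make the Locate/Insert/Delete interface concrete, here is a naive baseline with O(1) updates and O(n) queries (class and method names are ours).

```python
class NaivePointLocation:
    """Brute-force stand-in for the Locate/Insert/Delete interface:
    constant-time updates and linear-time Locate, versus the
    polylogarithmic bounds achieved by the structure in the abstract."""

    def __init__(self):
        self.segments = set()   # each segment: ((x1, y1), (x2, y2)) with x1 < x2

    def insert(self, seg):
        self.segments.add(seg)

    def delete(self, seg):
        self.segments.discard(seg)

    def locate(self, x, y):
        """First segment hit by an upward vertical ray starting at (x, y)."""
        best, best_y = None, float("inf")
        for (x1, y1), (x2, y2) in self.segments:
            if x1 <= x <= x2:
                y_at_x = (y1 + (y2 - y1) * (x - x1) / (x2 - x1)
                          if x2 != x1 else max(y1, y2))
                if y <= y_at_x < best_y:
                    best, best_y = ((x1, y1), (x2, y2)), y_at_x
        return best

pl = NaivePointLocation()
pl.insert(((0, 2), (4, 2)))
pl.insert(((0, 5), (4, 1)))
print(pl.locate(1, 0))      # ((0, 2), (4, 2)) lies immediately above (1, 0)
```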

13.
The index (or spectral radius) of a simple graph is the largest eigenvalue of its adjacency matrix. For connected graphs of fixed order and size, the graphs with maximal index have not yet been identified in the general case. It has long been known that these graphs are nested split graphs (or threshold graphs). In this paper we use eigenvector techniques to obtain some new lower and upper bounds on the index of nested split graphs. In addition, we give some computational results in order to compare these bounds.
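As a quick reminder of the quantity being bounded, the index of a graph is simply the largest eigenvalue of its 0/1 adjacency matrix; a minimal numerical check:

```python
import numpy as np

def graph_index(adj_matrix):
    """Index (spectral radius) of a simple graph = largest eigenvalue
    of its symmetric 0/1 adjacency matrix."""
    return float(np.max(np.linalg.eigvalsh(np.asarray(adj_matrix, dtype=float))))

# complete graph K4: index = n - 1 = 3
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
# star K1,3 (a threshold / nested split graph): index = sqrt(3)
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(graph_index(K4), graph_index(star))   # 3.0  1.732...
```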

14.
In recent years we have witnessed remarkable progress in providing efficient algorithmic solutions to the problem of computing best journeys (or routes) in schedule-based public transportation systems. We now have models to represent timetables that allow us to answer queries for optimal journeys in a few milliseconds, even at very large scale. Such models can be classified into two types: those representing the timetable as an array, and those representing it as a graph. Array-based models have been shown to be very effective in terms of query time, while graph-based ones usually answer queries by computing shortest paths, and hence they are suitable to be combined with the speed-up techniques developed for road networks. In this paper, we study the behavior of graph-based models in the prominent case of dynamic scenarios, i.e., when delays might occur to the original timetable. In particular, we make the following contributions. First, we consider the graph-based reduced time-expanded model and give a simplified and optimized routine for handling delays, and a re-engineered and fine-tuned query algorithm. Second, we propose a new graph-based model, namely the dynamic timetable model, natively tailored to efficiently incorporate dynamic updates, along with a query algorithm and a routine for handling delays. Third, we show how to adapt the ALT algorithm to such graph-based models. We have chosen this speed-up technique since it supports dynamic changes, and a careful implementation of it can significantly boost its performance. Finally, we provide an experimental study to assess the effectiveness of all proposed models and algorithms, and to compare them with the array-based state-of-the-art solution for the dynamic case. We evaluate both new and existing approaches by implementing and testing them on real-world timetables subject to synthetic delays. Our experimental results show that: (i) the dynamic timetable model is the best model for handling delays; (ii) graph-based models are competitive with array-based models with respect to query time in the dynamic case; (iii) the dynamic timetable model compares favorably with both the original and the reduced time-expanded model regarding space; (iv) combining the graph-based models with speed-up techniques designed for road networks, such as ALT, is a very promising approach.
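The abstract only names ALT; as a reminder of the mechanism it relies on, the sketch below computes the landmark-based lower bound that A* uses as its heuristic, derived from the triangle inequality over precomputed landmark distances. The toy graph and the single landmark are invented, and missing distance entries are treated conservatively so the bound never overestimates.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from `source` in a directed graph given as
    {u: {v: weight}} with nonnegative weights."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def alt_bound(v, t, fwd, bwd):
    """ALT heuristic: a lower bound on dist(v, t) from landmark distances.
    Unreachable entries default so that the bound can only shrink."""
    bound = 0
    for d_from_L, d_to_L in zip(fwd, bwd):
        bound = max(bound,
                    d_from_L.get(t, 0) - d_from_L.get(v, float("inf")),
                    d_to_L.get(v, 0) - d_to_L.get(t, float("inf")))
    return max(bound, 0)

# toy directed graph and a single landmark 'a'
adj = {"a": {"b": 2, "c": 5}, "b": {"c": 1, "d": 4}, "c": {"d": 1}, "d": {}}
rev = {"a": {}, "b": {"a": 2}, "c": {"a": 5, "b": 1}, "d": {"b": 4, "c": 1}}
landmarks = ["a"]
fwd = [dijkstra(adj, L) for L in landmarks]   # d(L -> v)
bwd = [dijkstra(rev, L) for L in landmarks]   # d(v -> L)
print(alt_bound("b", "d", fwd, bwd))          # never exceeds dist(b, d) = 2
```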

15.
We propose algorithms to perform two new operations on an arrangement of line segments in the plane, represented by a trapezoidal map: the split of the map along a given vertical line D, and the union of two trapezoidal maps computed in two vertical slabs of the plane that are adjacent through a vertical line D. The data structure we use is a modified Influence Graph, still allowing dynamic insertions and deletions of line segments in the map. The algorithms for both operations run in O(s_D log n + log^2 n) time, where n is the number of line segments in the map and s_D is the number of line segments intersected by D.

16.
Scheduling problems in agriculture are often solved using techniques such as linear programming (the multi-period formulation) and dynamic programming. However, it is difficult to obtain an optimal schedule with these techniques for any but the smallest problems, because the model is unwieldy and much time is needed to solve it. Therefore, a new heuristic algorithm has been developed to handle scheduling problems in agriculture. It is based on a search technique (hill climbing) supported by a strong heuristic evaluation function. In this paper the heuristic's performance is compared with dynamic programming. The heuristic offers near-optimal solutions and is much faster than the dynamic programming model: when tested against dynamic programming, the difference in results was about 3%. This heuristic could probably also be applied in an industrial environment (e.g. agribusiness or road construction).
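The paper's evaluation function is domain-specific; the sketch below only shows the generic hill-climbing skeleton it builds on, applied to a toy single-machine problem (minimize total weighted tardiness by swapping job positions). Jobs and parameters are invented.

```python
import random

def weighted_tardiness(order, jobs):
    """Total weighted tardiness of processing `jobs` in the given order.
    Each job is (processing_time, due_date, weight)."""
    t, cost = 0, 0
    for j in order:
        p, due, w = jobs[j]
        t += p
        cost += w * max(0, t - due)
    return cost

def hill_climb(jobs, iters=2000, seed=0):
    """Repeatedly try swapping two positions; keep the swap only if it
    improves the evaluation function (greedy neighbourhood search)."""
    rng = random.Random(seed)
    order = list(range(len(jobs)))
    best = weighted_tardiness(order, jobs)
    for _ in range(iters):
        i, j = rng.sample(range(len(jobs)), 2)
        order[i], order[j] = order[j], order[i]
        cost = weighted_tardiness(order, jobs)
        if cost < best:
            best = cost
        else:
            order[i], order[j] = order[j], order[i]   # undo the swap
    return order, best

jobs = [(3, 4, 2), (2, 3, 1), (4, 10, 3), (1, 2, 4), (5, 12, 1)]
print(hill_climb(jobs))
```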

17.

Privacy-preserving data splitting is a technique that aims to protect data privacy by storing different fragments of data in different locations. In this work we give a new combinatorial formulation of the data splitting problem. We see it as a purely combinatorial problem, in which we have to split data attributes into different fragments in a way that satisfies certain combinatorial properties derived from processing and privacy constraints. Using this formulation, we develop new combinatorial and algebraic techniques to obtain solutions to the data splitting problem. We present an algebraic method which builds an optimal data splitting solution by using Gröbner bases. Since this method is not efficient in general, we also develop a greedy algorithm for finding solutions that are not necessarily minimally sized.
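The greedy algorithm in the paper works against processing and privacy constraints; as a simplified, hypothetical version of that idea, the sketch below treats each constraint as a set of attributes that must not be co-located and assigns every attribute to the first fragment that keeps all constraints unsatisfied, opening a new fragment when none fits.

```python
def greedy_split(attributes, constraints):
    """Assign attributes to fragments so that no constraint (a set of
    attributes that must not be stored together) is fully contained in
    a single fragment. Greedy: use the first compatible fragment."""
    fragments = []
    for attr in attributes:
        placed = False
        for frag in fragments:
            candidate = frag | {attr}
            if not any(c <= candidate for c in constraints):
                frag.add(attr)
                placed = True
                break
        if not placed:
            fragments.append({attr})
    return fragments

# invented example: some attribute combinations are too sensitive to co-locate
attributes = ["name", "zip", "birthdate", "diagnosis", "salary"]
constraints = [{"name", "diagnosis"}, {"zip", "birthdate", "salary"}]
print(greedy_split(attributes, constraints))
# [{'name', 'zip', 'birthdate'}, {'diagnosis', 'salary'}]
```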


18.
19.
The indexing problem is one where a text is preprocessed and subsequent queries of the form "Find all occurrences of pattern P in the text" are answered in time proportional to the length of the query and the number of occurrences. In the dictionary matching problem, a set of patterns is preprocessed and subsequent queries of the form "Find all occurrences of dictionary patterns in text T" are answered in time proportional to the length of the text and the number of occurrences. There exist efficient worst-case solutions for the indexing problem and the dictionary matching problem, but none that find approximate occurrences of the patterns, i.e., where the pattern is within a bounded edit (or Hamming) distance from the appropriate text location. In this paper we present a uniform deterministic solution to both the indexing and the general dictionary matching problem with one error. We preprocess the data in time O(n log^2 n), where n is the text size in the indexing problem and the dictionary size in the dictionary matching problem. Our query time for the indexing problem is O(m log n log log n + tocc), where m is the query string size and tocc is the number of occurrences. Our query time for the dictionary matching problem is O(n log^3 d log log d + tocc), where n is the text size and d the dictionary size. The time bounds above apply to both bounded and unbounded alphabets.
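The bounds above come from non-trivial preprocessing; purely to make the "one error" query concrete, here is a brute-force baseline that reports every text position matching the pattern within Hamming distance at most 1.

```python
def occurrences_one_error(text, pattern):
    """All positions i where pattern matches text[i:i+len(pattern)] with at
    most one mismatching character (Hamming distance <= 1). Brute force,
    O(n*m) time -- the indexes in the abstract answer such queries far faster."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > 1:
                    break
        if mismatches <= 1:
            hits.append(i)
    return hits

print(occurrences_one_error("abcabd", "abd"))   # [0, 3]: one approximate, one exact
```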

20.