首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
This paper studies the design of a system to handle partial-match queries from a file. A partial-match query is a specification of the value of zero or more fields in a record. An answer to a query consists of a listing of all records in the file satisfying the values specified.Aho and Ullman have considered the case when the probability that a field is specified in a query is independent of which other fields are specified. We consider here the more realistic case where the independence assumption is dropped. This leads to an optimization problem more general than that considered by Aho and Ullman. The major part of the paper is the presentation of an efficient algorithm for solving this optimization problem.  相似文献   

2.
Recursive linear hashing is a hashing technique proposed for files which can grow and shrink dynamically. The scheme is an extension of linear hashing, a method originally proposed by Litwin, but unlike Litwin's scheme, it does not require conventional overflow pages. In this paper, we investigate the application of recursive linear hashing to partial match retrieval problems. Consistent with the results for primary key retrieval, recursive linear hashing performs better than the conventional scheme on these problems, especially at high load factors.  相似文献   

3.
This paper examines a partial match retrieval scheme which supports range queries for highly dynamic databases. The scheme relies on order preserving multi-attribute hashing. In general, designing optimal indexes is NP-hard. Greedy algorithms used to determine the optimal indexes for simple partial match queries are not directly applicable because there are a larger number of queries to consider in determining the optimal indexes. In this paper we present heuristic algorithms which provide near-optimal solutions. The optimisation scheme we propose can be used to design other dynamic file structures such as the grid file, BANG file and multilevel grid file to further enhance their retrieval performance taking into consideration the query distribution.  相似文献   

4.
Cartesian product (CP) files have been shown to be very effective for partial match retrieval. Further the Disk Modulo (DM) allocation method has been shown to be a simple and effective method for allocating CP files onto multiple independently accessible disks to further facilitate partial match retrieval. In this paper, a useful criterion for the DM method to be the best allocation method for a given CP file is presented. The presented criterion is much more general than that suggested previously which makes the DM allocation method much more applicable for real applications.  相似文献   

5.
A hashing method is presented where the amount of storage required for a file can expand and shrink by very large factors. The performance of this method as measured by lookup time, insertion time and deletion time is very good even when the total storage utilization is as high as 90 percent. The user can completely control the storage utilization between two chosen bounds so that the storage requirement varies linearly with the number of records currently in the file.Unlike previous methods, no separate overflow storage pool is involved and one need not be concerned with expected and worst case requirements for overflow space. Indeed, the absence of requirements for such a separate overflow pool could allow the use of this method with primitive microprocessor operating systems.The choice of hashing functions is discussed and simulation results show great danger in blindly using the popular remainder method.Both an elementary analysis and simulation results are given.This research was supported by the National Science and Engineering Research Council of Canada.  相似文献   

6.
A new access control scheme for the growth of users and files in file protection systems is proposed. Our scheme associates each user with a user key and each file with a file key. For each key, there are some corresponding locks, that can be extracted from a nonsingular matrix. Through simple operations on keys and locks, privacy decisions of the protection system can easily be revealed. Furthermore, by employing our method, whenever a new user or file is joined, the corresponding key values and lock values will be determined immediately without changing any previously defined keys and locks.  相似文献   

7.
The study of partial match file designs is continued. This study is concerned with a class of file designs which properly contains theABD designs of Rivest. A new family of partial match files is presented, the worst case performance is determined, and the implementation of these files is discussed.  相似文献   

8.
A performance analysis of an overflow handling method for hash files, here called repeated hashing, is reported. The basic idea of repeated hashing is to rehash the overflow records into a smaller separate storage area; the overflow records from this area are in turn hashed into a still smaller separate storage area, etc. The expected retrieval performance and the storage requirements are analysed, both for initial loading and steady state. The problem of optimally partitioning the total storage area is considered and the optimal solution is given. It is concluded, however, that the usefulness of repeated hashing is in doubt because there are methods having the same performance but requiring less maintenance.  相似文献   

9.
The average cost for answering partial-match queries can be dramatically reduced by storing multiple copies of the data, each with a different clustering. We analyse the cost benefits (in terms of page accesses) of this arrangement and present heuristic algorithms for determining a near-minimal-cost file organisation for a given probability distribution of queries. We also show how constraining the range of values for specific attributes affects the usefulness of maintaining multiple copies.  相似文献   

10.
Since a file is usually stored on a disk, the response time to a query is dominated by the disk access time. In order to reduce the disk access time, a file can be stored on several independently accessible disks. In this paper, we discuss the problem of allocating buckets in a file among disks such that the maximum disk accessing concurrency can be achieved. We are particularly concerned with the disk allocation problem for binary Cartesian product files. A new allocation method is first proposed for the cases when the number (m) of available disks is a power of 2. Then it is extended to fit the cases wherem is not a power of 2. The proposed algorithm has a near strict optimal performance for a partial match query in which the number of unspecified attributes is greater than a small number (5 or 6).This work was supported in part by NSF Grant DCR-8405498.  相似文献   

11.
The dynamic external hashing proposed in this paper allocates records according to the spiral storage technique. Separators derived from the signature technique are used for distinguishing primary from overflow records and for subdividing overflow chains into segments allocated into the primary file. Single access retrieval is obtained by means of a main memory index with an entry per bucket and containing separators and pointers. While this method uses a larger index than other recent proposals, it is much more convenient regarding load factor and insertion cost. Furthermore, file expansion is directed by various control parameters, thus allowing the user to choose the most suitable policy for his application.  相似文献   

12.
13.
Dynamic hashing     
A new file organisation called dynamic hashing is presented. The organisation is based on normal hashing, but the allocated storage space can easily be increased and decreased without reorganising the file, according to the number of records actually stored in the file. The expected storage utilisation is analysed and is shown to be approximately 69% all the time. Algorithms for inserting and deleting a record are presented and analysed. Retrieval of a record is fast, requiring only one access to secondary storage. There are no overflow records. The proposed scheme necessitates maintenance of a relatively small index structured as a forest of binary trees or slightly modified binary tries. The expected size of the index is analysed and a compact representation of the index is suggested.  相似文献   

14.
It is often necessary to join (naturally) two or more fuzzy relations to answer a user's query. However, the result of a join may not be what one expects. In this paper, we study the decomposition problem of fuzzy relation schemes. We show that the algorithm proposed by Aho et al. [1] to test lossless join decomposition for classical relations can be applied in the fuzzy domain when particularizations are induced by functional dependencies.  相似文献   

15.
Signature file is a well-studied method in information retrieval for indexing large text databases. Because of the small index size in this method, it is a good candidate for environments where memory is scarce. This small index size, however, comes at the cost of high false positive error rate. In this paper we address the problem of high false positive error rate of signature files by introducing COCA filters, a new variation of Bloom filters which exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that by using this technique in real document collections we can reduce the false positive error by up to 21 times, for the same index size. It is also shown that in some extreme cases this technique is even able to completely eliminate the false positive error. COCA filters can be considered as a good replacement for Bloom filters wherever the co-occurrence of any two members of the universe is identifiable.  相似文献   

16.
17.
In commercial and administrative ADP it often happens that there is one large master file and many output files are to be made from this. One possible solution to avoid using the master file many times is to construct from it a few extraction files and from these compile the output files. In the following paper a method is described to define the extraction files and to optimize their contents by zero-one programming. An algorithm and an example are presented in a simplified case.  相似文献   

18.
LSI潜在语义信息检索模型   总被引:5,自引:0,他引:5  
本文介绍了基于向量空间的信息检索方法 ,检索词和文件之间的关系表示成一个矩阵 ,查寻信息表示为检索词权重的向量 ,通过求查寻和文件向量的夹角余弦确定出数据库中的相关文件 .使用矩阵的 QR分解和奇异值分解 ( SVD)来处理数据库本身的不确定性 ,本文的目的是说明线性代数中的基本概念可以很好解决信息检索 ( IR)问题  相似文献   

19.
In the present paper, we answer a question raised in the paper Constructions and Bounds for Unconditionally Secure Non-Interactive Commitment Schemes, by Blundo et al., 2002, showing that there is a close relation between unconditionally secure commitment schemes and unconditionally secure authentication schemes, and that an unconditionally secure commitment scheme can be built from such an authentication scheme and an unconditionally secure cipher system. To investigate the opposite direction, we define optimal commitment systems and show that these must be resolvable design commitment schemes. Then, a proof is given that the resolvable design commitment schemes are a composition of an authentication system and a cipher system and the conclusion follows that this is the case for all optimal commitment systems. We also show how to build optimal schemes from transversal designs that are easy to build and can be more efficiently implemented than the proposal in the previously cited paper.  相似文献   

20.
In the last years we have witnessed remarkable progress in providing efficient algorithmic solutions to the problem of computing best journeys (or routes) in schedule-based public transportation systems. We have now models to represent timetables that allow us to answer queries for optimal journeys in a few milliseconds, also at a very large scale. Such models can be classified into two types: those representing the timetable as an array, and those representing it as a graph. Array-based models have been shown to be very effective in terms of query time, while graph-based ones usually answer queries by computing shortest paths, and hence they are suitable to be combined with the speed-up techniques developed for road networks.In this paper, we study the behavior of graph-based models in the prominent case of dynamic scenarios, i.e., when delays might occur to the original timetable. In particular, we make the following contributions. First, we consider the graph-based reduced time-expanded model and give a simplified and optimized routine for handling delays, and a re-engineered and fine-tuned query algorithm. Second, we propose a new graph-based model, namely the dynamic timetable model, natively tailored to efficiently incorporate dynamic updates, along with a query algorithm and a routine for handling delays. Third, we show how to adapt the ALT algorithm to such graph-based models. We have chosen this speed-up technique since it supports dynamic changes, and a careful implementation of it can significantly boost its performance. Finally, we provide an experimental study to assess the effectiveness of all proposed models and algorithms, and to compare them with the array-based state of the art solution for the dynamic case. We evaluate both new and existing approaches by implementing and testing them on real-world timetables subject to synthetic delays.Our experimental results show that: (i) the dynamic timetable model is the best model for handling delays; (ii) graph-based models are competitive to array-based models with respect to query time in the dynamic case; (iii) the dynamic timetable model compares favorably with both the original and the reduced time-expanded model regarding space; (iv) combining the graph-based models with speed-up techniques designed for road networks, such as ALT, is a very promising approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号