Similar Documents
20 similar documents found (search time: 625 ms)
1.
Dynamic hashing     
A new file organisation called dynamic hashing is presented. The organisation is based on normal hashing, but the allocated storage space can easily be increased and decreased, according to the number of records actually stored in the file, without reorganising the file. The expected storage utilisation is analysed and is shown to be approximately 69% at all times. Algorithms for inserting and deleting a record are presented and analysed. Retrieval of a record is fast, requiring only one access to secondary storage. There are no overflow records. The proposed scheme necessitates maintenance of a relatively small index structured as a forest of binary trees or slightly modified binary tries. The expected size of the index is analysed and a compact representation of the index is suggested.
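To make the index structure concrete, here is a minimal Python sketch of the scheme as the abstract describes it: a forest of binary tries whose leaves are buckets, with a leaf split (driven by a hash-derived bit string) whenever a bucket overflows. Bucket capacity B, forest size M, and the use of SHA-256 as the bit source are illustrative assumptions, not parameters from the paper.

```python
# Hedged sketch of dynamic hashing: a forest of binary tries over buckets.
import hashlib

B = 4   # bucket capacity (assumed)
M = 8   # number of tries in the forest (assumed)

def bits(key):
    """Pseudo-random bit string derived from the key, used to walk a trie."""
    h = hashlib.sha256(str(key).encode()).digest()
    return ''.join(f'{byte:08b}' for byte in h)

class Node:
    def __init__(self):
        self.bucket = []                 # leaf: holds records
        self.zero = self.one = None      # internal node: children

    def is_leaf(self):
        return self.zero is None

forest = [Node() for _ in range(M)]

def _descend(key):
    node, depth, b = forest[hash(key) % M], 0, bits(key)
    while not node.is_leaf():
        node = node.zero if b[depth] == '0' else node.one
        depth += 1
    return node, depth

def insert(key):
    node, depth = _descend(key)
    node.bucket.append(key)
    while len(node.bucket) > B:          # overflow: split the leaf
        node.zero, node.one = Node(), Node()
        for k in node.bucket:
            child = node.zero if bits(k)[depth] == '0' else node.one
            child.bucket.append(k)
        node.bucket = []
        # at most one child can still overflow; keep splitting it
        node = node.zero if len(node.zero.bucket) > B else node.one
        depth += 1

def lookup(key):
    node, _ = _descend(key)
    return key in node.bucket            # exactly one bucket access
```

Deletion would merge two sibling leaves back into one bucket once their combined contents fit, which is what lets the allocated storage shrink as records are removed.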

2.
The problem of file organization which we consider involves altering the placement of records on pages of a secondary storage device. In addition, we want this reorganization to be done in place, i.e., using the file's original storage space for the newly reorganized file. The motivation for such a physical change is to improve the database system's performance. For example, by placing frequently and jointly accessed records on the same page or pages, we can try to minimize the number of page accesses made in answering a set of queries. The optimal assignment (or reassignment) of records to clusters is exactly what record clustering algorithms attempt to do. However, record clustering algorithms usually do not solve the entire problem, i.e., they do not specify how to efficiently reorganize the file to reflect the clustering assignment which they determine. Our algorithm is a companion to general record clustering algorithms, since it actually transforms the file. The problem of optimal file reorganization is NP-hard. Consequently, our reorganization algorithm is based on heuristics. The algorithm's time and space requirements are reasonable and its solution is near optimal. In addition, the reorganization problem which we consider in this paper is similar to the problem of join processing when indexes are used. The research of this author was partially supported by the National Science Foundation under grant IST-8696157.
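The abstract does not spell out the reorganization step, but the in-place constraint it describes can be illustrated with the classic cycle-following technique: given a target assignment of page contents to slots, pages are moved along permutation cycles so that only one spare page buffer is needed. This is a hypothetical illustration of the in-place requirement, not the paper's heuristic.

```python
# Cycle-following moves: in-place page reorganization with one spare buffer.
def reorganize_in_place(pages, target):
    """pages: list of page contents; target[i]: index in `pages` whose
    content should end up at slot i after reorganization."""
    done = [False] * len(pages)
    for start in range(len(pages)):
        if done[start] or target[start] == start:
            done[start] = True
            continue
        spare = pages[start]          # one-page buffer: the in-place budget
        i = start
        while not done[i]:
            done[i] = True
            src = target[i]
            if src == start:
                pages[i] = spare      # cycle closes: drain the buffer
            else:
                pages[i] = pages[src]
            i = src

pages = ['D', 'B', 'A', 'C']
reorganize_in_place(pages, [2, 1, 3, 0])   # want A, B, C, D
print(pages)                                # ['A', 'B', 'C', 'D']
```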

3.
A hashing method is presented where the amount of storage required for a file can expand and shrink by very large factors. The performance of this method as measured by lookup time, insertion time and deletion time is very good even when the total storage utilization is as high as 90 percent. The user can completely control the storage utilization between two chosen bounds, so that the storage requirement varies linearly with the number of records currently in the file. Unlike previous methods, no separate overflow storage pool is involved, and one need not be concerned with expected and worst-case requirements for overflow space. Indeed, the absence of requirements for such a separate overflow pool could allow the use of this method with primitive microprocessor operating systems. The choice of hashing functions is discussed, and simulation results show great danger in blindly using the popular remainder method. Both an elementary analysis and simulation results are given. This research was supported by the Natural Sciences and Engineering Research Council of Canada.
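The warning about the remainder method can be reproduced in a few lines: when the keys share a common factor with the table size, h(k) = k mod m uses only a fraction of the buckets. The key pattern and table sizes below are invented for the demonstration.

```python
# Why blindly using the remainder method h(k) = k mod m is dangerous:
# structured keys sharing a factor with m pile up in a few buckets.
from collections import Counter

m = 100                                # table size sharing a factor with keys
keys = range(0, 1000000, 10)           # e.g. record ids issued in steps of 10

loads = Counter(k % m for k in keys)
print(len(loads), "of", m, "buckets used")        # 10 of 100 buckets used

# A prime table size spreads the same keys evenly across all buckets.
m_prime = 101
loads2 = Counter(k % m_prime for k in keys)
print(len(loads2), "of", m_prime, "buckets used")  # 101 of 101 buckets used
```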

4.
A performance analysis of an overflow handling method for hash files, here called repeated hashing, is reported. The basic idea of repeated hashing is to rehash the overflow records into a smaller separate storage area; the overflow records from this area are in turn hashed into a still smaller separate storage area, etc. The expected retrieval performance and the storage requirements are analysed, both for initial loading and steady state. The problem of optimally partitioning the total storage area is considered and the optimal solution is given. It is concluded, however, that the usefulness of repeated hashing is in doubt because there are methods having the same performance but requiring less maintenance.
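A minimal sketch of the idea, assuming a geometric sequence of area sizes and per-level hash functions (both invented here): each record is tried in the largest area first, and overflows cascade into successively smaller areas.

```python
# Hedged sketch of repeated hashing: a cascade of shrinking storage areas.
class RepeatedHashFile:
    def __init__(self, sizes=(64, 16, 4), bucket_capacity=2):
        self.areas = [[[] for _ in range(s)] for s in sizes]
        self.cap = bucket_capacity
        self.last_resort = []          # tiny final overflow list

    def _bucket(self, level, key):
        area = self.areas[level]
        return area[hash((level, key)) % len(area)]

    def insert(self, key):
        for level in range(len(self.areas)):
            b = self._bucket(level, key)
            if len(b) < self.cap:
                b.append(key)
                return
        self.last_resort.append(key)   # overflow from the smallest area

    def lookup(self, key):
        for level in range(len(self.areas)):
            b = self._bucket(level, key)
            if key in b:
                return True
            if len(b) < self.cap:      # not full: an earlier insert would
                return False           # have stopped here (no deletions)
        return key in self.last_resort
```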

5.
For an ordered file of records with uniformly distributed key values, we examine an existing batched searching algorithm based on recursive use of interpolation searches. The algorithm, called Recursive Batched Interpolation Search (RBIS) in this paper, uses a divide-and-conquer technique for batched searching. The expected-case complexity of the algorithm is shown to be O(m log log(2n/m) + m), where n is the size of the file and m is the size of the query batch. Simulations are performed to demonstrate the savings of batched searching using RBIS. Also, simulations are performed to compare alternative batched searching algorithms which are based on either interpolation search or binary search. When the file's key values are uniformly distributed, the simulation results confirm that interpolation-search-based algorithms are superior to binary-search-based algorithms. However, when the file's key values are not uniformly distributed, a straightforward batched interpolation search deteriorates quickly as the batch size increases, but algorithm RBIS still outperforms binary-search-based algorithms when the batch size passes a threshold value.
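A hedged Python sketch of the divide-and-conquer pattern the abstract describes: the middle query of the sorted batch is located by interpolation search, and the two halves of the batch are then searched recursively in the two halves of the file. The helper names and the boundary convention are our own.

```python
def ipos(a, lo, hi, x):
    """Interpolation search in sorted a[lo:hi+1]. Returns (found, boundary):
    found is the index of x or -1; a[lo:boundary] < x <= a[boundary:hi+1]."""
    while lo <= hi:
        if x < a[lo]:
            return -1, lo
        if x > a[hi]:
            return -1, hi + 1
        if a[hi] == a[lo]:
            pos = lo
        else:                                  # interpolate the probe position
            pos = lo + int((x - a[lo]) * (hi - lo) / (a[hi] - a[lo]))
        if a[pos] == x:
            return pos, pos
        if a[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, lo

def rbis(a, lo, hi, queries, out):
    """queries: sorted batch of keys; out: dict key -> index (or -1)."""
    if not queries:
        return
    if lo > hi:                                # empty file slice: all misses
        for q in queries:
            out[q] = -1
        return
    mid = len(queries) // 2
    found, boundary = ipos(a, lo, hi, queries[mid])
    out[queries[mid]] = found
    rbis(a, lo, boundary - 1, queries[:mid], out)
    rbis(a, boundary + (1 if found != -1 else 0), hi, queries[mid + 1:], out)

a = list(range(0, 1000, 3))                    # sorted, roughly uniform keys
out = {}
rbis(a, 0, len(a) - 1, [3, 4, 300, 999], out)
print(out)   # 3 -> 1, 4 -> -1 (absent), 300 -> 100, 999 -> 333
```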

6.
A hash structure, Overflow Indexing (OVI), using an index for the overflows is presented. The index contains one entry (key, bucket number) for each overflow record. Formulas for computing the expected number of entries in the index and the standard deviation are derived, and the numerical results obtained using these formulas are presented in a graph. It is concluded that storing the index in main memory when operating on the file is feasible for small to medium-sized, and sometimes even large, files. The number of probes for both a successful and an unsuccessful search is one. Deletion requires two probes and insertion two or three probes. Details of OVI are presented and illustrated by simulation experiments. The structure of the index is discussed and one possible structure, hashing with dynamic buckets, is presented.
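The description translates almost directly into code; the sketch below assumes main-memory buckets and a Python dict as the in-memory overflow index (bucket count, capacity, and the random placement policy are illustrative).

```python
# Hedged sketch of Overflow Indexing (OVI): one bucket probe per search.
import random

NBUCKETS, CAP = 100, 4
buckets = [[] for _ in range(NBUCKETS)]
overflow_index = {}                    # key -> bucket number (kept in RAM)

def home(key):
    return hash(key) % NBUCKETS

def insert(key):
    h = home(key)
    if len(buckets[h]) < CAP:
        buckets[h].append(key)
        return
    while True:                        # random search for empty space,
        b = random.randrange(NBUCKETS) # then remember where the record went
        if len(buckets[b]) < CAP:
            buckets[b].append(key)
            overflow_index[key] = b
            return

def lookup(key):
    b = overflow_index.get(key, home(key))   # exactly one bucket probe
    return key in buckets[b]
```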

7.
This paper studies file designs for answering partial-match queries for dynamic files. A partial-match query is a specification of the value of zero or more fields in a record. An answer to a query consists of a listing of all records in the file satisfying the values specified. The main contribution is a general method whereby certain primary-key hashing schemes can be extended to partial-match retrieval schemes. These partial-match retrieval designs can handle arbitrarily dynamic files and can be optimized with respect to the number of page faults required to answer a query. We illustrate the method by considering in detail the extension of two recent dynamic primary-key hashing schemes.
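The abstract does not give the method's details, but the standard way a primary-key hashing scheme is extended to partial match is to let each field contribute a fixed slice of bits to the bucket address; a query then enumerates all addresses consistent with its specified fields. The bit allocation below is an invented example.

```python
# Hedged sketch: per-field hash bits concatenated into a bucket address.
from itertools import product

FIELD_BITS = [3, 2, 2]                 # bits taken from fields 0, 1, 2 (assumed)

def field_hash(i, value, nbits):
    return hash((i, value)) & ((1 << nbits) - 1)

def bucket(record):
    addr = 0
    for i, (v, nb) in enumerate(zip(record, FIELD_BITS)):
        addr = (addr << nb) | field_hash(i, v, nb)
    return addr

def match_buckets(query):
    """query: tuple with None for unspecified fields; yields every bucket
    address that may hold a matching record."""
    slices = []
    for i, (v, nb) in enumerate(zip(query, FIELD_BITS)):
        if v is None:
            slices.append(range(1 << nb))        # all bit patterns
        else:
            slices.append([field_hash(i, v, nb)])
    for parts in product(*slices):
        addr = 0
        for p, nb in zip(parts, FIELD_BITS):
            addr = (addr << nb) | p
        yield addr

# A query specifying only field 1 touches 2**(3+2) = 32 of the 128 buckets.
print(len(list(match_buckets((None, 'x', None)))))
```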

8.
This paper examines a partial match retrieval scheme which supports range queries for highly dynamic databases. The scheme relies on order-preserving multi-attribute hashing. In general, designing optimal indexes is NP-hard. Greedy algorithms used to determine the optimal indexes for simple partial match queries are not directly applicable, because a much larger number of queries must be considered in determining the optimal indexes. In this paper we present heuristic algorithms which provide near-optimal solutions. The optimisation scheme we propose can also be used to design other dynamic file structures, such as the grid file, BANG file and multilevel grid file, to further enhance their retrieval performance, taking the query distribution into consideration.
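The order-preserving ingredient can be shown in isolation (this is an illustration, not the paper's algorithm): if each attribute contributes the leading bits of its value's position within the attribute's range, bucket addresses preserve attribute order, so a range query maps to a contiguous run of bit patterns per attribute.

```python
# Order-preserving hash slice for one attribute: leading bits of the
# value's position in [lo, hi). Parameters are invented for illustration.
def op_hash(value, nbits, lo, hi):
    return ((value - lo) << nbits) // (hi - lo)

# Attribute range [0, 100), 3 bits: values map to slices 0..7 in key order.
print(op_hash(10, 3, 0, 100), op_hash(55, 3, 0, 100), op_hash(99, 3, 0, 100))
# -> 0 4 7 ; the range query 40 <= v < 70 only needs slices 3, 4 and 5.
```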

9.
The signature file is a well-studied method in information retrieval for indexing large text databases. Because of its small index size, the method is a good candidate for environments where memory is scarce. This small index size, however, comes at the cost of a high false positive error rate. In this paper we address the problem of the high false positive error rate of signature files by introducing COCA filters, a new variation of Bloom filters which exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that by using this technique on real document collections we can reduce the false positive error by up to 21 times for the same index size. It is also shown that in some extreme cases this technique is even able to completely eliminate the false positive error. COCA filters can be considered a good replacement for Bloom filters wherever the co-occurrence of any two members of the universe is identifiable.
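The paper's co-occurrence machinery cannot be reconstructed from the abstract, but the baseline it improves on, a plain Bloom filter, fits in a short sketch (the parameters m and k are arbitrary here).

```python
# A plain Bloom filter: the structure COCA filters are a variation of.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        for i in range(self.k):                # k independent hash positions
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May return false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add("hashing")
print("hashing" in bf, "btree" in bf)          # True, almost surely False
```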

10.
We design and analyze integrated ways of applying the signature file approach to text and attributes simultaneously. In traditional signature file methods, the records are stored sequentially in the main file; for every record, a hash-coded abstraction of it (the record signature) is created and stored in the signature file (usually sequentially). To resolve a query, the signature file is scanned; the signatures retrieved correspond to all the qualifying records, plus some false drops. Here, we extend some signature file methods, namely superimposed coding and disjoint coding, to handle text and attributes. We develop a mathematical model and derive formulas for the optimal choice of parameters. The proposed methods achieve significant performance improvements because they can take advantage of the skewed distribution of the queries. Depending on the query frequencies, the false drop probability can be reduced by a factor of 40–45 (about 97% savings) for the same overhead. Also with the University of Maryland Institute for Advanced Computer Studies (U.M.I.A.C.S.). This research was sponsored partially by the National Science Foundation under grant DCR-86-16833.
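A minimal sketch of superimposed coding, the first of the two extended methods: each word sets a few bits in a fixed-width record signature, and a query qualifies a record when the record signature covers the query signature. The signature width F and the per-word weight are illustrative assumptions.

```python
# Superimposed coding: OR together a few hash-selected bits per word.
import hashlib

F, BITS_PER_WORD = 64, 3               # signature width and weight (assumed)

def word_bits(word):
    h = hashlib.sha256(word.encode()).digest()
    return {int.from_bytes(h[3*i:3*i+3], "big") % F
            for i in range(BITS_PER_WORD)}

def signature(words):
    sig = 0
    for w in words:
        for b in word_bits(w):
            sig |= 1 << b
    return sig

def may_contain(record_sig, query_words):
    q = signature(query_words)
    return record_sig & q == q         # all query bits must be set

rec = signature(["dynamic", "hashing", "file"])
print(may_contain(rec, ["hashing"]))   # True
print(may_contain(rec, ["btree"]))     # False (or a rare false drop)
```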

11.
12.
Compression of a formatted file by a minimal spanning tree (MST) is studied. Here the records of the file are considered as the nodes of a weighted undirected graph. Each record pair is connected in the graph, and the corresponding arc is weighted by the sum of the field lengths of those fields which differ in the two records. The actual compression is made by constructing an MST of the graph and by storing it in an economical way that preserves the information of the file. The length of the MST is a useful measure in estimating the power of the compression. In the paper we study upper bounds on this length, especially in the case where the field lengths of the different fields may vary. The upper bounds are derived by analyzing so-called Gray-code sequences of the records. These sequences may be considered as spanning paths of the graph, and their lengths give upper bounds on the length of the MST. In the study we show how a short spanning path can be constructed in this way. The results are also tested experimentally.
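A hedged sketch of the compression step: with the arc weight defined as above, an MST (computed here with a naive Prim's algorithm) lets every record be stored as a small diff against its tree parent. The toy records are invented.

```python
# MST over records, arc weight = total length of the differing fields.
def diff_weight(r1, r2):
    return sum(len(str(a)) for a, b in zip(r1, r2) if a != b)

def prim_mst(records):
    """Return parent[i] for an MST over the complete record graph."""
    n = len(records)
    in_tree, parent = [False] * n, [None] * n
    cost = [float("inf")] * n
    cost[0] = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=cost.__getitem__)
        in_tree[u] = True
        for v in range(n):             # relax arcs out of the new tree node
            if not in_tree[v]:
                w = diff_weight(records[u], records[v])
                if w < cost[v]:
                    cost[v], parent[v] = w, u
    return parent

records = [("smith", "oslo", 1970), ("smith", "oslo", 1971),
           ("smith", "bergen", 1970), ("jones", "oslo", 1970)]
print(prim_mst(records))   # [None, 0, 0, 0]: store diffs against record 0
```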

13.
The records of a database can be accessed from other records or from a set of data items (inverted access, primary and secondary indexes of IMS, search keys of CODASYL, etc.) which we call selectors. The implementation of these selectors can use different techniques, such as hash coding, inverted lists or hierarchical indexes (indexed sequential, B-trees, etc.). We consider here the last one, and we search for an optimal index structure for a given set of selectors. We show how this problem can be cast as the search for an optimal rooted tree among the partial subgraphs of a given graph G (this problem is known in graph theory as the Steiner problem), and we give several properties which allow the graph G to be notably reduced. Then we present a branch-and-bound algorithm to solve this problem.

14.
Recursive linear hashing is a hashing technique proposed for files which can grow and shrink dynamically. The scheme is an extension of linear hashing, a method originally proposed by Litwin, but unlike Litwin's scheme, it does not require conventional overflow pages. In this paper, we investigate the application of recursive linear hashing to partial match retrieval problems. Consistent with the results for primary key retrieval, recursive linear hashing performs better than the conventional scheme on these problems, especially at high load factors.
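The recursively-hashed overflow layer is the paper's contribution and is omitted here, but the underlying address computation and split step that recursive linear hashing inherits from Litwin's linear hashing look like this sketch (the split-on-overflow policy is simplified):

```python
# Litwin-style linear hashing: split pointer `next` sweeps the buckets.
class LinearHashTable:
    def __init__(self, n0=4, cap=4):
        self.n0, self.level, self.next = n0, 0, 0   # next bucket to split
        self.buckets = [[] for _ in range(n0)]
        self.cap = cap

    def _addr(self, key):
        h = hash(key)
        a = h % (self.n0 * 2 ** self.level)
        if a < self.next:                 # bucket already split this round
            a = h % (self.n0 * 2 ** (self.level + 1))
        return a

    def insert(self, key):
        self.buckets[self._addr(key)].append(key)
        if len(self.buckets[self._addr(key)]) > self.cap:
            self._split()                 # uncontrolled split of bucket `next`

    def _split(self):
        self.buckets.append([])
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.next += 1
        if self.next == self.n0 * 2 ** self.level:   # round finished
            self.level, self.next = self.level + 1, 0
        for k in old:                     # redistribute with the new state
            self.buckets[self._addr(k)].append(k)
```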

15.
The problem examined in this report is the calculation of the average wasted space at the end of a block when variable-length records are inserted in the file. Previous efforts are based on approximations. Here, a detailed analysis based on Markov chains gives the exact solution. A framework is presented which shows the relations between the previous approaches. The proposed model includes the previous models as special limiting cases. Simulation results close to the analytic results are also presented. This research was sponsored partially by the National Science Foundation under grants DCR-86-16833, IRI-8719458 and IRI-8958546, and by the Air Force Office of Scientific Research under grant AFOSR-89-0303.
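As a quick plausibility check (Monte Carlo, not the paper's exact Markov-chain analysis), the average end-of-block waste under sequential filling can be simulated directly; the block size and record-length distribution are invented.

```python
# Simulate the average wasted space at the end of fixed blocks when
# variable-length records are packed sequentially without splitting.
import random

def average_waste(block_size, n_records, rec_len):
    waste, free = [], block_size
    for _ in range(n_records):
        r = rec_len()
        if r > free:                   # record does not fit: close the block
            waste.append(free)
            free = block_size
        free -= r
    return sum(waste) / len(waste)

random.seed(1)
avg = average_waste(4096, 200000, lambda: random.randint(100, 500))
print(f"average waste per closed block: {avg:.1f} bytes")
```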

16.
A new access control scheme that accommodates the growth of users and files in file protection systems is proposed. Our scheme associates each user with a user key and each file with a file key. For each key there are some corresponding locks, which can be extracted from a nonsingular matrix. Through simple operations on keys and locks, the privacy decisions of the protection system can easily be revealed. Furthermore, by employing our method, whenever a new user or file is added, the corresponding key values and lock values can be determined immediately, without changing any previously defined keys and locks.
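The abstract leaves the operations unspecified; one common key-lock realization (shown here as an assumed illustration, not necessarily the paper's construction) recovers the access right of user i on file j as the inner product of a key vector and a lock vector modulo a prime.

```python
# Key-lock illustration: rights recovered as inner products mod a prime.
P = 257                                   # small prime modulus (assumed)

def inner(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

# Rows of a nonsingular matrix serve as keys; locks are solved so that
# inner(keys[i], locks[j]) equals the right a[i][j] (0=none,1=read,2=write).
keys = [(1, 0), (0, 1)]                   # trivially nonsingular 2x2 matrix
rights = [[2, 0],                         # user 0: write file 0, no access file 1
          [1, 1]]                         # user 1: read both files
# With identity keys, lock j is simply column j of the rights matrix:
locks = [(rights[0][j], rights[1][j]) for j in range(2)]

for i in range(2):
    for j in range(2):
        assert inner(keys[i], locks[j]) == rights[i][j]
print([[inner(k, l) for l in locks] for k in keys])   # [[2, 0], [1, 1]]
```

With a general nonsingular key matrix, the lock vectors are obtained by solving the corresponding linear system, and a new user or file only adds one row or column without disturbing the existing keys and locks.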

17.
This paper studies the design of a system to handle partial-match queries from a file. A partial-match query is a specification of the value of zero or more fields in a record. An answer to a query consists of a listing of all records in the file satisfying the values specified. Aho and Ullman have considered the case when the probability that a field is specified in a query is independent of which other fields are specified. We consider here the more realistic case where the independence assumption is dropped. This leads to an optimization problem more general than that considered by Aho and Ullman. The major part of the paper is the presentation of an efficient algorithm for solving this optimization problem.

18.
The distribution of wasted space at the end of fixed-size file blocks is studied when the blocks are filled with variable-length records in some prescribed sequence, e.g. in key order, without splitting records across consecutive blocks. The distribution of the length of the wasted space is shown to be the same as the distribution of the residual life of components in renewal theory. Previous estimates of the size of this kind of wasted space are shown to be erroneous. The validity of the assumptions is verified by simulation, and the range of applicability of the methods presented is explored.
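The renewal-theory connection pins the limiting law down precisely. In standard notation (this is the textbook residual-life result the abstract appeals to, with L the record length and F, μ its distribution and mean):

```latex
\Pr\{W \le x\} \;=\; \frac{1}{\mu}\int_{0}^{x}\bigl(1 - F(t)\bigr)\,dt,
\qquad
\mathbb{E}[W] \;=\; \frac{\mathbb{E}[L^{2}]}{2\,\mathbb{E}[L]} .
```

Since E[L²] ≥ (E[L])², the expected waste E[W] is at least E[L]/2, with strict inequality whenever the record lengths vary (the inspection paradox); naive "half a record" estimates therefore understate the waste, consistent with the abstract's claim that previous estimates are erroneous.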

19.
Sections and individual fields in variable-format records of a file are identified by means of a hierarchy of label symbols in the records and located by means of direct table lookup. The same identification system is used in transaction records, which may contain operation codes with data sections as their operands.

20.
The expected performance of hashing with chaining in the prime area is analyzed. The method analyzed is briefly characterized as hashing with chaining of overflow records in the prime storage area, using one or several noncoalescing chains per bucket, and with random search for empty space. The analysis is asymptotic, and it is assumed that deletions do not occur. Simple closed formulas cannot be obtained, except in the case of bucket size one, but numerical results are readily computed. The expected performance compares favorably with that of other methods for handling overflow records.

