Similar Documents (20 results found)
1.
A hashing method is presented where the amount of storage required for a file can expand and shrink by very large factors. The performance of this method as measured by lookup time, insertion time and deletion time is very good even when the total storage utilization is as high as 90 percent. The user can completely control the storage utilization between two chosen bounds, so that the storage requirement varies linearly with the number of records currently in the file. Unlike previous methods, no separate overflow storage pool is involved, and one need not be concerned with expected and worst-case requirements for overflow space. Indeed, the absence of any requirement for a separate overflow pool could allow the use of this method with primitive microprocessor operating systems. The choice of hashing functions is discussed, and simulation results show great danger in blindly using the popular remainder method. Both an elementary analysis and simulation results are given. This research was supported by the Natural Sciences and Engineering Research Council of Canada.
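
The abstract does not name the scheme, but its description (growth and shrinkage by one bucket at a time, utilization held between two bounds, no separate overflow pool) matches the linear-hashing family. Below is a minimal Python sketch in that spirit; the class name, bucket capacity, and use of Python's built-in hash are illustrative assumptions, with the built-in merely standing in for a carefully chosen hash function (the abstract itself warns against blindly using the remainder method).

class LinearHashFile:
    def __init__(self, bucket_capacity=4, min_util=0.5, max_util=0.9):
        self.cap = bucket_capacity
        self.min_util, self.max_util = min_util, max_util
        self.level = 0          # current doubling round
        self.split = 0          # next bucket to split
        self.buckets = [[]]     # overflow chains live inside the buckets
        self.n = 0              # number of stored records

    def _addr(self, key):
        h = hash(key)
        a = h % (1 << self.level)            # coarse address this round
        if a < self.split:                   # bucket already split: refine
            a = h % (1 << (self.level + 1))
        return a

    def _util(self):
        return self.n / (self.cap * len(self.buckets))

    def lookup(self, key):
        for k, v in self.buckets[self._addr(key)]:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        self.buckets[self._addr(key)].append((key, value))
        self.n += 1
        while self._util() > self.max_util:  # expand one bucket at a time
            self._expand()

    def delete(self, key):
        b = self.buckets[self._addr(key)]
        for i, (k, _) in enumerate(b):
            if k == key:
                del b[i]
                self.n -= 1
                while len(self.buckets) > 1 and self._util() < self.min_util:
                    self._contract()
                return True
        return False

    def _expand(self):
        old, self.buckets[self.split] = self.buckets[self.split], []
        self.buckets.append([])              # image bucket for the split
        self.split += 1
        if self.split == 1 << self.level:    # doubling round complete
            self.level += 1
            self.split = 0
        for k, v in old:                     # re-address with the finer hash
            self.buckets[self._addr(k)].append((k, v))

    def _contract(self):                     # undo the most recent split
        if self.split == 0:
            self.level -= 1
            self.split = 1 << self.level
        self.split -= 1
        self.buckets[self.split].extend(self.buckets.pop())

f = LinearHashFile()
for i in range(100):
    f.insert(f"k{i}", i)
print(len(f.buckets), f.lookup("k42"))      # storage grew with the file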

2.
A performance analysis of an overflow handling method for hash files, here called repeated hashing, is reported. The basic idea of repeated hashing is to rehash the overflow records into a smaller separate storage area; the overflow records from this area are in turn hashed into a still smaller separate storage area, etc. The expected retrieval performance and the storage requirements are analysed, both for initial loading and steady state. The problem of optimally partitioning the total storage area is considered and the optimal solution is given. It is concluded, however, that the usefulness of repeated hashing is in doubt because there are methods having the same performance but requiring less maintenance.
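
A minimal sketch of the repeated-hashing idea as described: records overflowing one storage area are rehashed into a smaller separate area, and so on down a cascade. The area sizes, bucket capacity, and per-area hash functions are assumptions made for illustration.

class RepeatedHashFile:
    def __init__(self, area_sizes=(64, 16, 4), bucket_capacity=2):
        self.cap = bucket_capacity
        self.areas = [[[] for _ in range(m)] for m in area_sizes]

    def _bucket(self, level, key):
        # mixing the level into the hash stands in for the paper's
        # separate hash function per area (an illustrative choice)
        return self.areas[level][hash((level, key)) % len(self.areas[level])]

    def insert(self, key, value):
        for level in range(len(self.areas)):
            b = self._bucket(level, key)
            if len(b) < self.cap:            # room at this level
                b.append((key, value))
                return
        raise RuntimeError("last area overflowed; file must be reorganized")

    def lookup(self, key):                   # probe the areas in order
        for level in range(len(self.areas)):
            for k, v in self._bucket(level, key):
                if k == key:
                    return v
        return None

f = RepeatedHashFile()
for i in range(60):
    f.insert(i, i * i)
print(f.lookup(42))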

3.
The dynamic external hashing scheme proposed in this paper allocates records according to the spiral storage technique. Separators derived from the signature technique are used to distinguish primary from overflow records and to subdivide overflow chains into segments allocated within the primary file. Single-access retrieval is obtained by means of a main-memory index with one entry per bucket, containing separators and pointers. While this method uses a larger index than other recent proposals, it is much more favorable with respect to load factor and insertion cost. Furthermore, file expansion is directed by various control parameters, allowing the user to choose the policy best suited to a given application.

4.
The expected performance of hashing with chaining in the prime area is analyzed. The method analyzed is briefly characterized as hashing with chaining of overflow records in the prime storage area, using one or several noncoalescing chains per bucket, and with random search for empty space. The analysis is asymptotic, and it is assumed that deletions do not occur. Simple closed formulas cannot be obtained, except in the case of bucket size one, but numerical results are readily computed. The expected performance compares favorably with that of other methods for handling overflow records.

5.
This paper studies file designs for answering partial-match queries on dynamic files. A partial-match query is a specification of the values of zero or more fields in a record; an answer to a query consists of a listing of all records in the file satisfying the values specified. The main contribution is a general method whereby certain primary-key hashing schemes can be extended to partial-match retrieval schemes. These partial-match retrieval designs can handle arbitrarily dynamic files and can be optimized with respect to the number of page faults required to answer a query. We illustrate the method by considering in detail the extension of two recent dynamic primary-key hashing schemes.
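
The paper's general extension method is not reproduced here, but the following Python sketch shows the underlying idea of multi-attribute (partitioned) hashing for partial match: each field contributes a fixed slice of the bucket address, so a query that fixes only some fields enumerates exactly the buckets consistent with those slices. The bit widths per field and the sample records are invented for the example.

from itertools import product

BITS = (3, 2, 2)   # address bits contributed by fields 0, 1, 2

def address(record):
    addr = 0
    for value, bits in zip(record, BITS):
        addr = (addr << bits) | (hash(value) & ((1 << bits) - 1))
    return addr

def candidate_addresses(query):
    # query: tuple with None for unspecified fields
    slices = []
    for value, bits in zip(query, BITS):
        if value is None:
            slices.append(range(1 << bits))              # any slice value
        else:
            slices.append([hash(value) & ((1 << bits) - 1)])
    out = []
    for parts in product(*slices):
        addr = 0
        for part, bits in zip(parts, BITS):
            addr = (addr << bits) | part
        out.append(addr)
    return out

# store records in 2**sum(BITS) buckets, then answer a query by scanning
# only the buckets returned by candidate_addresses
buckets = [[] for _ in range(1 << sum(BITS))]
for rec in [("ann", "cs", "f"), ("bob", "ee", "m"), ("cal", "cs", "m")]:
    buckets[address(rec)].append(rec)

query = (None, "cs", None)       # all records whose second field is "cs"
hits = [r for a in candidate_addresses(query) for r in buckets[a]
        if all(q is None or q == f for q, f in zip(query, r))]
print(hits)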

6.
A hash structure, Overflow Indexing (OVI), which uses an index for the overflows, is presented. The index contains one entry (key, bucket number) for each overflow record. Formulas for computing the expected number of entries in the index and its standard deviation are derived, and the numerical results obtained using these formulas are presented in a graph. It is concluded that storing the index in main memory while operating on the file is feasible for small to medium-sized files, and sometimes even for large ones. The number of probes for both a successful and an unsuccessful search is one; deletion requires two probes and insertion two or three. Details of OVI are presented and illustrated by simulation experiments. The structure of the index is discussed and one possible structure, hashing with dynamic buckets, is presented.
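
A hypothetical rendering of the OVI idea in Python: every overflow record gets a main-memory index entry (key, bucket number), so any lookup costs exactly one bucket probe. The overflow placement policy used here (next bucket with room) is an assumption; the abstract does not specify one.

class OVIFile:
    def __init__(self, n_buckets=8, capacity=2):
        self.buckets = [[] for _ in range(n_buckets)]
        self.capacity = capacity
        self.overflow_index = {}             # key -> bucket that holds it

    def _home(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        h = self._home(key)
        if len(self.buckets[h]) < self.capacity:
            self.buckets[h].append((key, value))
            return
        for step in range(1, len(self.buckets)):     # find room elsewhere
            b = (h + step) % len(self.buckets)
            if len(self.buckets[b]) < self.capacity:
                self.buckets[b].append((key, value))
                self.overflow_index[key] = b         # remember placement
                return
        raise RuntimeError("file full")

    def lookup(self, key):                   # exactly one bucket probe
        b = self.overflow_index.get(key, self._home(key))
        for k, v in self.buckets[b]:
            if k == key:
                return v
        return None

f = OVIFile()
for i in range(14):
    f.insert(f"r{i}", i)
print(f.lookup("r9"), len(f.overflow_index), "overflow entries")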

7.
The problem of file organization which we consider involves altering the placement of records on the pages of a secondary storage device. In addition, we want this reorganization to be done in place, i.e., using the file's original storage space for the newly reorganized file. The motivation for such a physical change is to improve the database system's performance. For example, by placing frequently and jointly accessed records on the same page or pages, we can try to minimize the number of page accesses made in answering a set of queries. The optimal assignment (or reassignment) of records to clusters is exactly what record clustering algorithms attempt to do. However, record clustering algorithms usually do not solve the entire problem, i.e., they do not specify how to efficiently reorganize the file to reflect the clustering assignment which they determine. Our algorithm is a companion to general record clustering algorithms, since it actually transforms the file. The problem of optimal file reorganization is NP-hard; consequently, our reorganization algorithm is based on heuristics. The algorithm's time and space requirements are reasonable and its solution is near optimal. In addition, the reorganization problem which we consider in this paper is similar to the problem of join processing when indexes are used. The research of this author was partially supported by the National Science Foundation under grant IST-8696157.
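
The paper's heuristic is not reproduced here, but the flavor of in-place reorganization can be illustrated at page granularity: given a target slot for each page, the file is permuted within its own storage using a single buffer page by following the permutation's cycles.

def permute_pages_in_place(pages, target):
    # pages: list of page contents; target[i] = final slot of page i.
    # Uses one spare buffer page and follows the permutation's cycles.
    done = [False] * len(pages)
    for start in range(len(pages)):
        if done[start]:
            continue
        buffer = pages[start]                # lift one page into the buffer
        i = start
        while True:
            done[i] = True
            j = target[i]                    # where the lifted page belongs
            if j == start:
                pages[j] = buffer            # cycle closed
                break
            pages[j], buffer = buffer, pages[j]   # place page, evict occupant
            i = j

pages = ["A", "B", "C", "D"]
permute_pages_in_place(pages, [2, 0, 1, 3])
print(pages)                                 # ['B', 'C', 'A', 'D']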

8.
A new interpolation-based, order-preserving hashing algorithm suitable for on-line maintenance of large dynamic external files under sequences of four kinds of operations (insertion, update, deletion, and orthogonal range query) is proposed. The scheme, an adaptation of linear hashing, requires no index or address directory structure and utilizes O(n) space for files containing n records; all of the benefits of linear hashing are inherited by this new scheme. File implementations yielding average successful search lengths much less than 2 and average unsuccessful search lengths much less than 4 for individual records are obtainable; the actual storage required is controllable by the implementor.
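
A sketch of the interpolation (order-preserving) address function on which such schemes build: keys normalized to [0, 1) map to bucket floor(u * n), so key order is preserved and a range query scans a contiguous run of buckets. For brevity the sketch keeps the bucket count fixed; the paper's scheme grows it by linear-hashing splits, without any index.

import random

class OrderPreservingHashFile:
    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _addr(self, key):                    # key in [0, 1)
        return int(key * len(self.buckets))

    def insert(self, key, value):
        self.buckets[self._addr(key)].append((key, value))

    def range_query(self, lo, hi):           # contiguous buckets only
        out = []
        for b in range(self._addr(lo), self._addr(hi) + 1):
            out.extend((k, v) for k, v in self.buckets[b] if lo <= k <= hi)
        return sorted(out)

f = OrderPreservingHashFile()
for _ in range(20):
    f.insert(random.random(), "payload")
print(f.range_query(0.25, 0.5))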

9.
A model is developed to describe the growth of a medical records store subject to rules for microfilming or disposing of record files when they have not been used within a given period of time. Separate projections are given for the number of files in the system and for the volume of file contents. The model allows different microfilm policies to be compared and aids the long term planning of storage facilities.
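
A toy Monte Carlo rendering of this kind of model, with invented rates: files enter the store, are re-used at random each year, and are microfilmed (removed from the active store) once unused for T years. It shows how alternative values of T could be compared.

import random

def simulate(years=20, new_per_year=1000, reuse_prob=0.3, T=5, seed=1):
    rng = random.Random(seed)
    last_used = []                           # last year each file was used
    history = []
    for year in range(years):
        last_used = [year if rng.random() < reuse_prob else lu
                     for lu in last_used]            # some files get used
        last_used = [lu for lu in last_used
                     if year - lu < T]               # microfilm the rest
        last_used += [year] * new_per_year           # newly created files
        history.append(len(last_used))
    return history

# compare two microfilm policies by the size of the active store
print(simulate(T=5)[-1], simulate(T=10)[-1])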

10.
In a non-expansive hashing scheme, similar inputs are stored in memory locations which are close. We develop a non-expansive hashing scheme wherein any set of size … from a large universe may be stored in a memory of size … (for any …), and where retrieval takes … operations. We explain how to use non-expansive hashing schemes for efficient storage and retrieval of noisy data. A dynamic version of this hashing scheme is presented as well.

11.
This paper is concerned with the allocation of multi-attribute records across several disks so as to achieve a high degree of concurrency of disk access when responding to partial-match queries. An algorithm to distribute a set of multi-attribute records onto different disks is presented. Since our allocation method uses principal component analysis, this concept is first introduced. We then use it to generate a set of real numbers, the projections on the first principal component direction, which can be viewed as hashing addresses. We then propose an algorithm based upon these hashing addresses to allocate multi-attribute records onto different disks. Experimental results show that our method can indeed be used to solve the multi-disk data allocation problem for concurrent accessing.
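
A compact sketch of the allocation idea using NumPy: project the records onto the first principal component and use the ordering of the projections, here via a round-robin assignment, to spread nearby records across different disks. The round-robin step is an illustrative policy, not necessarily the paper's.

import numpy as np

def allocate(records, n_disks):
    X = np.asarray(records, dtype=float)
    X = X - X.mean(axis=0)                   # center the attributes
    # first principal component = top right singular vector
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ vt[0]                         # the "hashing addresses"
    order = np.argsort(proj)
    disk = np.empty(len(records), dtype=int)
    disk[order] = np.arange(len(records)) % n_disks   # spread neighbors
    return disk

recs = [(1, 2, 0), (2, 1, 1), (8, 9, 4), (9, 8, 5), (4, 5, 2), (5, 4, 3)]
print(allocate(recs, 3))                     # disk number for each record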

12.
A computer system manages disc storage of finite capacity, c blocks. This storage must be divided among N files in such a way that the expected number of disc accesses accomplished before reorganization becomes necessary is maximized. Each access to the disc appends a record of fixed length to the i-th file with probability p_i (i = 1, …, N). Reorganization is needed when the chosen file has run out of space. It is shown that this problem is a generalization of Banach's match-box problem, known from probability theory. A detailed separate analysis for the N = 2 case and for the multivariate case is performed, and some approximate results for large c are given.
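
The analysis itself is not reproduced here, but the setting is easy to explore by simulation: split the capacity c among the files, append according to the probabilities, and count accesses until the chosen file is full. The brute-force search over two-file splits below is purely illustrative.

import random

def expected_accesses(caps, p, trials=20000, seed=7):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        free = list(caps)
        n = 0
        while True:
            i = rng.choices(range(len(p)), weights=p)[0]
            if free[i] == 0:                 # chosen file has no space left
                break
            free[i] -= 1                     # append one record
            n += 1
        total += n
    return total / trials

c, p = 30, [0.7, 0.3]
best = max(((a, c - a) for a in range(1, c)),
           key=lambda caps: expected_accesses(caps, p, trials=2000))
print(best, expected_accesses(best, p))      # best split and its value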

13.
We propose a binary quantum hashing technique that represents binary inputs by quantum states. We prove the cryptographic properties of the quantum hashing, including its collision resistance and preimage resistance. We also give an efficient quantum algorithm that performs the quantum hashing, and altogether this means that the function is quantum one-way. The proposed construction is asymptotically optimal in the number of qubits used.

14.
A file of records, each with an associated request probability, is dynamically maintained as a serial list. Successive requests are mutually independent. The list is reordered according to the move-to-front (MTF) rule: the requested record is moved to the front of the list. We derive the stationary distribution of search cost (= depth of the requested item) by embedding in Poisson processes, and derive certain finite-time stochastic ordering results for the MTF chain so embedded. A connection with cache fault probabilities is discussed. We also establish a Schur-concavity result for the stationary expected search cost.
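
The classical stationary expected search cost for MTF, 1 + 2·Σ_{i<j} p_i·p_j/(p_i + p_j), provides a cross-check for a direct simulation of the rule, as in this sketch (request probabilities are illustrative):

import random

def mtf_costs(p, n_requests=100000, seed=3):
    rng = random.Random(seed)
    items = list(range(len(p)))
    lst = items[:]
    total = 0
    for _ in range(n_requests):
        x = rng.choices(items, weights=p)[0]     # independent request
        d = lst.index(x) + 1                     # search cost = depth
        total += d
        lst.insert(0, lst.pop(d - 1))            # move-to-front
    return total / n_requests

# for these probabilities the formula above gives about 2.18
print(mtf_costs([0.5, 0.25, 0.15, 0.1]))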

15.
Compression of a formatted file by a minimal spanning tree (MST) is studied. Here the records of the file are considered as the nodes of a weighted undirected graph. Each record pair is connected in the graph and the corresponding arc is weighted by the sum of field lengths of those fields which differ in the two records. The actual compression is made by constructing an MST of the graph and by storing it in an economic way to preserve the information of the file. The length of the MST is a useful measure in the estimation of the power of the compression. In the paper we study upper bounds of this length, especially in the case where the field lengths of the different fields may vary. The upper bounds are derived by analyzing the so-called Gray-code sequences of the records. These sequences may be considered as spanning paths of the graph and their lengths give upper bounds of the length of the MST. In the study we show how a short spanning path can be constructed in this way. The results are also experimentally tested.
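
A small sketch of the construction: records are nodes of a complete graph, edge weights sum the lengths of the differing fields, and Prim's algorithm yields an MST whose parent pointers say which record each record should be stored as a difference against. The weight function (using the first record's field lengths) and the sample records are illustrative choices.

def diff_weight(r1, r2):
    # total length of the fields in which the two records differ
    return sum(len(str(a)) for a, b in zip(r1, r2) if a != b)

def mst_parent(records):                     # Prim on the complete graph
    n = len(records)
    in_tree = [False] * n
    dist = [float('inf')] * n
    parent = [-1] * n
    dist[0] = 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=dist.__getitem__)        # cheapest node to attach
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                w = diff_weight(records[u], records[v])
                if w < dist[v]:
                    dist[v], parent[v] = w, u
    return parent, sum(dist)                 # tree edges and MST length

recs = [("ann", "cs", 21), ("ann", "cs", 22), ("bob", "cs", 22)]
parent, length = mst_parent(recs)
print(parent, length)    # store record 0 fully, others as diffs to parent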

16.
This paper examines a partial-match retrieval scheme which supports range queries for highly dynamic databases. The scheme relies on order-preserving multi-attribute hashing. In general, designing optimal indexes is NP-hard, and the greedy algorithms used to determine optimal indexes for simple partial-match queries are not directly applicable because a much larger number of queries must be considered. In this paper we present heuristic algorithms which provide near-optimal solutions. The optimisation scheme we propose can also be used to design other dynamic file structures, such as the grid file, BANG file and multilevel grid file, to further enhance their retrieval performance by taking the query distribution into consideration.

17.
We study the expected value of the maximum number of accesses needed to locate an element in a hash file constructed by using an order-preserving hashing function, with collision resolution by separate chaining. It is assumed that X1, …, Xn are independent [0,1]-valued random variables with common density f, and that Xi is hashed to the (⌊nXi⌋ + 1)-st bucket (chain). For all densities that are bounded, the expected value of the maximum number of accesses is shown to be asymptotic to log n / log log n, and the dependency of this expected value on f is made explicit by exhibiting the first few terms of the asymptotic expansion. For unbounded f, a tight upper bound is given for the expected value.
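
The asymptotics are easy to observe empirically: hashing n uniform keys order-preservingly into n chains and recording the longest chain gives values of the same order as log n / log log n. A quick simulation sketch (sample size is arbitrary):

import math, random

def max_chain(n, seed=0):
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(n):
        counts[int(n * rng.random())] += 1   # chain floor(n*X) (0-based)
    return max(counts)

n = 100000
print(max_chain(n), math.log(n) / math.log(math.log(n)))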

18.
This paper studies the problem of finding efficient deletion algorithms for the coalesced hashing method, in which a portion of memory (called the address region) serves as the range of the hash function while the rest of memory (called the cellar) is devoted solely to storing records that collide when inserted. First we present a deletion algorithm, which solves the open problem described in [6, Sect. 6.4–23]. The main result of this paper, Theorem 3, shows that the deletion algorithm preserves randomness for the special case of standard coalesced hashing (when there is no cellar), in that deleting a record is in some sense like never having inserted it. This means that the formulas for the search times (which are analyzed in [8, 9]) are still valid after deletions. There is as yet no known deletion algorithm that preserves randomness for the general case (when there is a cellar). We give some reasons why and then discuss some heuristics that seem to make deletions practical anyway.
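
For context, a minimal Python sketch of standard coalesced hashing (no cellar): a colliding record goes into the highest-numbered empty slot and is linked onto the chain through its hash address, so chains with different home addresses may coalesce. The randomness-preserving deletion algorithm that is the paper's subject is deliberately not attempted here.

class CoalescedHash:
    def __init__(self, size=11):
        self.key = [None] * size
        self.next = [-1] * size
        self.free = size - 1                 # scan for empty slots from top

    def insert(self, k):
        h = hash(k) % len(self.key)
        if self.key[h] is None:
            self.key[h] = k
            return
        i = h
        while self.next[i] != -1:            # walk to the end of the chain
            if self.key[i] == k:
                return
            i = self.next[i]
        if self.key[i] == k:
            return
        while self.free >= 0 and self.key[self.free] is not None:
            self.free -= 1                   # highest-numbered empty slot
        if self.free < 0:
            raise RuntimeError("table full")
        self.key[self.free] = k
        self.next[i] = self.free             # chains through this slot may
                                             # now coalesce with others

    def search(self, k):
        i = hash(k) % len(self.key)
        if self.key[i] is None:
            return False
        while i != -1:
            if self.key[i] == k:
                return True
            i = self.next[i]
        return False

t = CoalescedHash()
for k in ["ape", "bat", "cat", "dog", "eel", "fox"]:
    t.insert(k)
print(t.search("cat"), t.search("gnu"))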

19.
Cuckoo hashing is a hash table data structure introduced by Pagh and Rodler that offers constant worst-case search time. As a major contribution of this paper, we analyze modified versions of this algorithm with improved performance. Further, we provide an asymptotic analysis of the search costs of all these variants of cuckoo hashing and compare the results with the well-known properties of double hashing and linear probing. The analysis is supported by numerical results. Finally, our analysis shows that the expected number of steps of search operations can be reduced by using a modified version of cuckoo hashing instead of standard algorithms based on open addressing.
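
A minimal sketch of standard cuckoo hashing, the baseline against which the paper's modified variants are measured: each key lives in one of two possible slots, so a search probes at most two positions; insertion evicts and reinserts until it succeeds or triggers a rebuild. Table size, eviction bound, and salting are illustrative choices.

import random

class CuckooHash:
    def __init__(self, size=16):
        self.size = size
        self.tables = [[None] * size, [None] * size]
        self.salts = [random.randrange(1 << 30) for _ in range(2)]

    def _h(self, t, key):
        return hash((self.salts[t], key)) % self.size

    def contains(self, key):                 # worst case: two probes
        return any(self.tables[t][self._h(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        t = 0
        for _ in range(32):                  # bounded eviction sequence
            i = self._h(t, key)
            self.tables[t][i], key = key, self.tables[t][i]
            if key is None:
                return
            t = 1 - t                        # reinsert the evicted key
        self._rehash()                       # likely cycle: rebuild
        self.insert(key)

    def _rehash(self):
        keys = [k for tab in self.tables for k in tab if k is not None]
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        self.salts = [random.randrange(1 << 30) for _ in range(2)]
        for k in keys:
            self.insert(k)

h = CuckooHash()
for i in range(20):
    h.insert(i)
print(all(h.contains(i) for i in range(20)))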

20.
A search tree grown from an n-long random file of numerical records is studied. Each node of the tree accommodates an ordered subfile consisting of at most (m − 1) records; no particular assumptions are made about how the local search within a node is executed. The depth and the total number of comparisons of the search are shown to be asymptotically Gaussian with means a1·ln n and a2·ln n, and covariance matrix (aij·ln n). The a's depend on m and on the first- and second-order moments of the local search time. The locally binary and locally sequential cases serve as an illustration.
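
A sketch of the tree being analyzed: each node holds up to m − 1 ordered keys, and a key that finds its node full descends into the child subtree bracketed by the neighboring keys. The average depth printed at the end is the quantity whose asymptotics the paper derives; class and parameter names are invented.

import bisect, random

class MTreeNode:
    def __init__(self, m):
        self.m, self.keys, self.children = m, [], []

    def insert(self, key):
        if len(self.keys) < self.m - 1 and not self.children:
            bisect.insort(self.keys, key)    # room in this node's subfile
            return
        if not self.children:                # node full: create m children
            self.children = [MTreeNode(self.m) for _ in range(self.m)]
        self.children[bisect.bisect(self.keys, key)].insert(key)

    def depth(self, key):                    # nodes visited to find key
        if key in self.keys:
            return 1
        return 1 + self.children[bisect.bisect(self.keys, key)].depth(key)

root = MTreeNode(4)
data = random.sample(range(10000), 1000)     # distinct random records
for x in data:
    root.insert(x)
print(sum(root.depth(x) for x in data) / len(data))   # average depth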
