Similar documents
1.
Topic analysis of search engine user queries is an important task, since successful exploitation of the topic of queries can lead to the design of new information retrieval algorithms for more efficient search engines. Identification of topic changes within a user search session is a key issue in the analysis of search engine user queries. This study presents an application of Markov chains in the area of search engine research to automatically identify topic changes in a user session by using statistical characteristics of queries, such as time intervals, query reformulation patterns and the continuation/shift status of the previous query. The findings show that Markov chains provide fairly successful results for automatic new topic identification, with a high level of estimation accuracy for both topic continuations and topic shifts. Copyright © 2009 John Wiley & Sons, Ltd.
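As a rough illustration of the Markov-chain idea, the Python sketch below estimates first-order transition probabilities between two query states, topic continuation ("C") and topic shift ("S"), from hand-labelled sessions and predicts the most likely status of the next query. This is a simplified assumption, not the study's model: the study also conditions on time intervals and reformulation patterns, which are omitted here, and the session data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sessions):
    """Maximum-likelihood estimate of P(next status | current status).

    Each session is a list of query statuses such as "C" (continuation)
    or "S" (shift); these labels are hypothetical.
    """
    counts = defaultdict(Counter)
    for statuses in sessions:
        # Count every consecutive pair of statuses within one session.
        for prev, nxt in zip(statuses, statuses[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {status: c / total for status, c in nexts.items()}
    return probs


def predict_next(probs, current):
    """Return the most probable status of the next query."""
    return max(probs[current], key=probs[current].get)


# Hypothetical labelled sessions.
sessions = [
    ["C", "C", "S", "C"],
    ["C", "S", "S", "C"],
    ["C", "C", "C", "C"],
]
probs = estimate_transitions(sessions)
print(probs["C"])                # P(C|C) = 4/6, P(S|C) = 2/6
print(predict_next(probs, "S"))  # C
```

In the study's setting the state would also carry the query features listed above, which turns this into a larger conditional table but leaves the counting procedure unchanged.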

2.
This paper presents an algebraic formalism for reasoning on finite increasing sequences over Boolean algebras in general and on generalizations of rough set concepts in particular. We argue that these generalizations are suitable for modeling the relevance of documents in an information retrieval system.

3.
Math search is a new area of research with many enabling technologies but also many challenges. Some of the enabling technologies include XML, XPath, XQuery, and MathML. Some of the challenges involve enabling search systems to recognize mathematical symbols and structures. Several math search projects have made considerable progress in meeting those challenges. One of the remaining challenges is the creation and implementation of a math query language that enables general users to express their information needs intuitively yet precisely. This paper presents such a language and details its features. The new math query language offers an alternative way to describe mathematical expressions that is more consistent and less ambiguous than conventional mathematical notation. In addition, the language goes beyond the Boolean and proximity query syntax found in standard text search systems. It defines a powerful set of wildcards that are deemed important for math search. These wildcards allow more precise structural search and multiple levels of abstraction. Three new sets of wildcards and their implementation details are also discussed.

4.
This paper studies file designs for answering partial-match queries on dynamic files. A partial-match query is a specification of the value of zero or more fields in a record. An answer to a query consists of a listing of all records in the file satisfying the values specified. The main contribution is a general method whereby certain primary-key hashing schemes can be extended to partial-match retrieval schemes. These partial-match retrieval designs can handle arbitrarily dynamic files and can be optimized with respect to the number of page faults required to answer a query. We illustrate the method by considering in detail the extension of two recent dynamic primary-key hashing schemes.
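To make partial-match retrieval concrete, here is a minimal Python sketch of static multi-attribute hashing, assuming three fields and a fixed, hypothetical allocation of hash bits per field (`BITS`). A bucket address concatenates the per-field hash bits; a partial-match query enumerates every address consistent with its specified fields and then filters the records found there. The paper's actual contribution (extending dynamic primary-key hashing schemes and optimizing page faults) is not modelled here.

```python
from collections import defaultdict
from itertools import product

BITS = (2, 1, 1)  # hypothetical: hash bits allotted to each of three fields


def field_hash(value, bits):
    """Deterministic toy hash of one field value into `bits` bits."""
    return sum(ord(ch) for ch in str(value)) % (1 << bits)


def pack(hashes):
    """Concatenate per-field hash bits into a single bucket address."""
    addr = 0
    for h, bits in zip(hashes, BITS):
        addr = (addr << bits) | h
    return addr


def build_file(records):
    buckets = defaultdict(list)
    for rec in records:
        buckets[pack(field_hash(v, b) for v, b in zip(rec, BITS))].append(rec)
    return buckets


def candidate_buckets(query):
    """All addresses consistent with a query; None marks an unspecified field."""
    choices = [range(1 << b) if v is None else [field_hash(v, b)]
               for v, b in zip(query, BITS)]
    return [pack(combo) for combo in product(*choices)]


def partial_match(buckets, query):
    hits = []
    for addr in candidate_buckets(query):
        for rec in buckets.get(addr, []):
            # Hash collisions are possible, so re-check the specified fields.
            if all(q is None or q == v for q, v in zip(query, rec)):
                hits.append(rec)
    return hits


store = build_file([("alice", "x", 1), ("bob", "y", 2), ("carol", "x", 3)])
print(partial_match(store, (None, "x", None)))
```

Each unspecified field multiplies the number of buckets probed by 2 to the power of its bit allocation, which is why the choice of bits per field matters for query cost.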

5.
We study iterative methods for solving a set of sparse non-negative tensor equations (multivariate polynomial systems) arising from data mining applications such as information retrieval by query search and community discovery in multi-dimensional networks. By exploiting the sparse and non-negative tensor structure, we develop Jacobi and Gauss-Seidel methods for solving tensor equations. Each iteration of these methods requires tensor-vector multiplications, so the cost per iteration depends on the number of non-zeros in the sparse tensors. We show linear convergence of the Jacobi and Gauss-Seidel methods under suitable conditions; therefore, the set of sparse non-negative tensor equations can be solved very efficiently. Experimental results on information retrieval by query search and community discovery in multi-dimensional networks are presented to illustrate the application of tensor equations and the effectiveness of the proposed methods.
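The difference between Jacobi-style (simultaneous) and Gauss-Seidel-style (in-place) updates can be sketched in Python for an assumed fixed-point form x_i = b_i + sum over (j, k) of T[i, j, k] * x_j * x_k, with a sparse non-negative third-order tensor T stored as lists of non-zeros. This equation form, the data and the helper names are illustrative assumptions, not the paper's exact formulation; the point is that each sweep touches only the stored non-zeros, so its cost is proportional to their number.

```python
def tensor_apply(nonzeros, x, i):
    """Row i of the tensor-vector product: sum of T[i, j, k] * x[j] * x[k]."""
    return sum(v * x[j] * x[k] for j, k, v in nonzeros.get(i, ()))


def jacobi(nonzeros, b, tol=1e-12, max_iter=1000):
    """Update every component from the previous iterate (simultaneous update)."""
    x = list(b)
    for _ in range(max_iter):
        new = [b[i] + tensor_apply(nonzeros, x, i) for i in range(len(b))]
        if max(abs(p - q) for p, q in zip(new, x)) < tol:
            return new
        x = new
    return x


def gauss_seidel(nonzeros, b, tol=1e-12, max_iter=1000):
    """Update components in place, so later rows see the newest values."""
    x = list(b)
    for _ in range(max_iter):
        diff = 0.0
        for i in range(len(b)):
            updated = b[i] + tensor_apply(nonzeros, x, i)
            diff = max(diff, abs(updated - x[i]))
            x[i] = updated
        if diff < tol:
            return x
    return x


# Hypothetical sparse non-negative tensor: row i -> list of (j, k, value).
T = {0: [(0, 1, 0.2)], 1: [(1, 1, 0.1)]}
b = [0.5, 0.5]
print(jacobi(T, b))
print(gauss_seidel(T, b))
```

With these small entries the map is a contraction near the solution, so both variants converge linearly to the same point.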

6.
LSI: a latent semantic information retrieval model   Cited by: 5 (self-citations: 0, by others: 5)
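This paper introduces an information retrieval method based on the vector space model. The relationship between index terms and documents is represented as a matrix, and a query is represented as a vector of term weights; the relevant documents in the database are identified by computing the cosine of the angle between the query vector and each document vector. The QR decomposition and the singular value decomposition (SVD) of the matrix are used to deal with the uncertainty inherent in the database itself. The purpose of this paper is to show that basic concepts from linear algebra can solve information retrieval (IR) problems well.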
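The vector-space step described above (ranking documents by the cosine of the angle between the query vector and each document column of the term-document matrix) can be sketched in plain Python. The toy matrix, term labels and function names are assumptions for illustration; the QR and SVD steps that the paper uses to cope with uncertainty in the database are not shown.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_documents(A, query):
    """Score every document column of the term-document matrix A against the query."""
    n_docs = len(A[0])
    scores = [(cosine(query, [row[j] for row in A]), j) for j in range(n_docs)]
    # Highest cosine first; ties broken by document index.
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Rows are terms, columns are documents (hypothetical toy data).
A = [
    [1, 0, 1],  # "matrix"
    [1, 1, 0],  # "retrieval"
    [0, 1, 1],  # "svd"
]
query = [1, 1, 0]  # a query containing "matrix" and "retrieval"
print(rank_documents(A, query))
```

In practice a cutoff on the cosine decides which documents are returned; an LSI variant would first replace A by a low-rank approximation from its SVD.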

7.
This paper studies topological properties of the different topologies that can be defined on the space of documents, as induced by queries in a query space together with a similarity function between queries and documents. The main topologies studied here are the retrieval topology (introduced by Everett and Cater) and the similarity topology (introduced by Egghe and Rousseau). The properties studied are the separation properties T0, T1 and T2 (Hausdorff), proximity, and connectedness. Full characterizations are given of when each of these topologies is T0, T1 or T2. It is shown that the retrieval topology is not necessarily a proximity space, while the similarity topology and the pseudo-metric topology are always proximity spaces. A characterization of connectedness in terms of the Boolean NOT operator is given, thereby showing the intimate relationship between IR and topology.

8.
This article presents a survey of techniques for ranking results in search engines, with emphasis on link-based ranking methods and the PageRank algorithm. The problem of selecting, in relation to a user search query, the most relevant documents from an unstructured source such as the WWW is discussed in detail. The need for extending classical information retrieval techniques such as Boolean searching and vector space models with link-based ranking methods is demonstrated. The PageRank algorithm is introduced, and its numerical and spectral properties are discussed. The article concludes with an alternative means of computing PageRank, along with some example applications of this new method.
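For reference, the standard power-iteration formulation of PageRank can be sketched in Python as below, assuming a damping factor of 0.85 and that dangling pages (pages with no out-links) spread their rank uniformly over all pages. This is the textbook method, not the alternative computation the article concludes with, and the link graph is hypothetical.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank; links maps each page to its out-links."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += alpha * rank[u] / len(outs)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

Each sweep preserves the total rank of 1, and the damping term guarantees convergence to a unique vector regardless of the graph's structure.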

9.
This paper studies the design of a system to handle partial-match queries on a file. A partial-match query is a specification of the value of zero or more fields in a record. An answer to a query consists of a listing of all records in the file satisfying the values specified. Aho and Ullman have considered the case where the probability that a field is specified in a query is independent of which other fields are specified. We consider here the more realistic case where the independence assumption is dropped. This leads to an optimization problem more general than that considered by Aho and Ullman. The major part of the paper is the presentation of an efficient algorithm for solving this optimization problem.

10.
Fuzzy Sets and Systems, 2004, 148(1): 85-104
In this paper, we present an application of association rules to query refinement. Starting from an initial set of documents retrieved from the web, text transactions are constructed and association rules are extracted. A fuzzy extension of text transactions and association rules is employed, in which the presence of each term (item) in a document (transaction) is given by a degree between 0 and 1. The obtained rules offer the user additional terms to add to the query, with the purpose of guiding the search and improving retrieval.
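A minimal sketch of how fuzzy association rules could suggest query expansion terms, assuming each document is a mapping from terms to membership degrees in [0, 1] and using the minimum as the t-norm for fuzzy support (a common choice, but an assumption here). Confidence is the fuzzy support of the whole rule divided by the fuzzy support of its antecedent; the documents, threshold and function names are hypothetical.

```python
def fuzzy_support(transactions, itemset):
    """Mean over transactions of the minimum membership of the itemset's terms."""
    total = sum(min(t.get(term, 0.0) for term in itemset) for t in transactions)
    return total / len(transactions)


def fuzzy_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    sup_a = fuzzy_support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return fuzzy_support(transactions, antecedent | consequent) / sup_a


def suggest_terms(transactions, query_terms, min_conf=0.5):
    """Terms t such that the rule query_terms -> {t} reaches min_conf, best first."""
    vocab = {term for t in transactions for term in t} - query_terms
    scored = [(fuzzy_confidence(transactions, query_terms, {term}), term) for term in vocab]
    return sorted([s for s in scored if s[0] >= min_conf], key=lambda s: (-s[0], s[1]))


# Hypothetical fuzzy text transactions: term -> degree of presence.
docs = [
    {"fuzzy": 1.0, "query": 0.8, "rules": 0.6},
    {"fuzzy": 0.5, "query": 0.4, "web": 0.9},
    {"query": 1.0, "web": 0.7},
]
print(suggest_terms(docs, {"query"}, min_conf=0.4))
```

Here "fuzzy" and "web" pass the threshold while "rules", which co-occurs strongly with the query in only one document, does not.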

11.
This paper proposes an information retrieval (IR) model based on possibilistic directed networks. The relevance of a document with respect to a query is interpreted by two degrees: the necessity and the possibility. The necessity degree evaluates the extent to which a given document is relevant to a query, whereas the possibility degree evaluates the reasons for eliminating irrelevant documents. This new interpretation of relevance led us to revisit the term weighting scheme by explicitly distinguishing between informative and non-informative terms in a document. Experiments carried out on three standard TREC collections show the effectiveness of the model.

12.
In this paper, we are interested in taking preferences into account in a family of queries inspired by the antidivision. An antidivision query aims at retrieving the elements associated with none of the elements of a specified set of values. We suggest introducing preferences into such queries with the following specificities: (i) the user expresses his/her preferences in an ordinal way, and (ii) the preferences apply to the divisor, which is defined as a hierarchy of sets. Different uses of the hierarchy are investigated, leading to queries that convey different semantics, and the properties of the delivered result are characterized. Furthermore, the case where a conjunctive stratified antidivision query returns an empty set of answers is dealt with, and an approach aimed at relaxing such queries is proposed.
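The basic antidivision, and one possible way of exploiting an ordered hierarchy of divisor sets, can be sketched in Python. The ranking rule in `stratified_antidivision` (an element's level is the number of leading strata it avoids entirely) is a hypothetical illustration, not necessarily one of the semantics studied in the paper; it does, however, yield a graded answer even when no element avoids every stratum.

```python
def antidivision(relation, divisor):
    """Elements x such that (x, v) is in the relation for no v in the divisor."""
    elements = {x for x, _ in relation}
    return {x for x in elements if not any((x, v) in relation for v in divisor)}


def stratified_antidivision(relation, strata):
    """Hypothetical ranking: count the leading strata an element avoids entirely.

    strata is ordered from the most to the least important set of values to avoid.
    """
    elements = {x for x, _ in relation}
    levels = {}
    for x in elements:
        level = 0
        for stratum in strata:
            if any((x, v) in relation for v in stratum):
                break
            level += 1
        levels[x] = level
    return levels


# Hypothetical relation: (recipe, ingredient) pairs.
relation = {("r1", "peanut"), ("r2", "gluten"), ("r3", "soy"),
            ("r3", "peanut"), ("r4", "dairy")}
strata = [{"peanut"}, {"gluten"}, {"soy"}]
print(antidivision(relation, {"peanut", "gluten"}))  # {'r4'}
print(stratified_antidivision(relation, strata))
```

Higher levels indicate better satisfaction of the ordinal preferences, so sorting elements by level gives a ranked answer rather than a yes/no one.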

13.
This paper examines a partial match retrieval scheme that supports range queries for highly dynamic databases. The scheme relies on order-preserving multi-attribute hashing. In general, designing optimal indexes is NP-hard. Greedy algorithms used to determine the optimal indexes for simple partial match queries are not directly applicable, because a larger number of queries must be considered in determining the optimal indexes. In this paper we present heuristic algorithms that provide near-optimal solutions. The optimisation scheme we propose can also be used to design other dynamic file structures, such as the grid file, the BANG file and the multilevel grid file, to further enhance their retrieval performance by taking the query distribution into consideration.

14.
15.
When a query is submitted to multiple search engines, finding the relevant results is an important task. This paper formulates the problem of aggregating and ranking the results of multiple search engines as a minimax linear programming model. Besides this novel application, the study detects the most relevant information among a returned set of ranked document lists retrieved by distinct search engines. Furthermore, two numerical examples are used to illustrate the usefulness of the proposed approach.

16.
B-fuzzy set algebra and a generalized mutual information formula   Cited by: 1 (self-citations: 0, by others: 1)
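Based on a distinction between two kinds of probability, a generalized Shannon entropy formula and a generalized mutual information formula are derived. The latter is related to fuzziness and can be used to measure information in language and perception. To obtain the conditional probability that a compound sentence is true from the conditional probabilities that its atomic sentences are true, a fuzzy set algebra that obeys Boolean operations is proposed. So-called fuzzy information is thereby reduced to probabilistic information. The new theory is easy to understand on the basis of the classical theories: probability theory, set theory and Shannon information theory.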

17.
Semantic hashing   Cited by: 1 (self-citations: 0, by others: 1)
We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. This way of extending the efficiency of hash-coding to approximate matching is much faster than locality-sensitive hashing, which is the fastest current method. By using semantic hashing to filter the documents given to TF-IDF, we achieve higher accuracy than applying TF-IDF to the entire document set.
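The lookup step of semantic hashing, probing every address within a small Hamming distance of the query's code, can be sketched in Python as below. Learning the binary codes with the deep graphical model is not shown: the 4-bit codes and document IDs are hypothetical stand-ins for the (e.g. 32-bit) codes the model would produce.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each binary code (an int used as a memory address) to its documents."""
    table = defaultdict(list)
    for doc_id, code in codes.items():
        table[code].append(doc_id)
    return table


def hamming_ball_lookup(table, query_code, n_bits, radius):
    """Return documents whose code differs from query_code in at most `radius` bits."""
    hits = []
    for r in range(radius + 1):
        # Flip every combination of r bit positions to get each probe address.
        for positions in combinations(range(n_bits), r):
            probe = query_code
            for p in positions:
                probe ^= 1 << p
            hits.extend(table.get(probe, []))
    return hits


# Hypothetical 4-bit codes produced by a trained model.
codes = {"d1": 0b0000, "d2": 0b0001, "d3": 0b0011, "d4": 0b1111}
table = build_table(codes)
print(hamming_ball_lookup(table, 0b0000, n_bits=4, radius=1))  # ['d1', 'd2']
```

The number of probes grows as the sum of C(n_bits, r) up to the radius, which stays small for short codes and small radii; the retrieved shortlist can then be re-ranked, for example with TF-IDF.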

18.
This paper surveys methods for querying systems of social accounts that are constructed from collections of intersecting data sets. Such query methods are used to ascertain, e.g., the number of people who possess certain attribute characteristics not explicitly provided by an original account or survey. Three approaches to the query problem are considered: (1) a generalized inverse solution suggested by Ijiri (1965); (2) a row-reduced echelon matrix formulation devised by Hellerman and Cavallo (1977); and (3) a Boolean matrix operation introduced by Homer (1974). The latter two methods are extended to provide a general approach for using existing multi-attribute information to construct new systems of social accounts.

19.
We propose succinct data structures for text retrieval systems supporting document listing queries and ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents. Traditional data structures for these problems support queries only for some predetermined keywords. Recently, Muthukrishnan proposed a data structure for document listing queries for arbitrary patterns at the cost of data structure size. For computing tf*idf scores, there have been no efficient data structures for arbitrary patterns. Our new data structures support these queries using small space. The space is only $2/\varepsilon$ times the size of the compressed documents plus $10n$ bits for a document collection of length $n$, for any $0 < \varepsilon \le 1$. This is much smaller than the previous $O(n \log n)$-bit data structures. Query time is $O(m + q \log^{\varepsilon} n)$ for listing and computing tf*idf scores for all $q$ documents containing a given pattern of length $m$. Our data structures are flexible in the sense that they support queries for arbitrary patterns.
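To pin down what the two query types return, here is a brute-force Python sketch of document listing and tf*idf scoring for an arbitrary substring pattern. It scans every document, so it has none of the space or time guarantees of the succinct structures described above, and it assumes idf = log(D / df), one common variant among several; the documents and function names are hypothetical.

```python
import math


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def tfidf_scores(docs, pattern):
    """Document listing plus tf*idf score for every document containing pattern."""
    tf = {i: count_occurrences(d, pattern) for i, d in enumerate(docs)}
    listing = [i for i, c in tf.items() if c > 0]  # the document listing query
    if not listing:
        return {}
    idf = math.log(len(docs) / len(listing))  # assumed idf variant
    return {i: tf[i] * idf for i in listing}


docs = ["abab", "bab", "ccc", "xyz"]
print(tfidf_scores(docs, "ab"))  # documents 0 and 1; doc 0 scores twice doc 1
```

The brute-force scan costs time proportional to the total text length per query, which is exactly the cost the paper's indexes avoid by answering in time depending only on $m$ and $q$.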

20.
This paper presents a new image retrieval scheme using visually significant point features. The clusters of points around significant curvature regions (high, medium, and weak type) are extracted using a fuzzy set theoretic approach. Some invariant color features are computed from these points to evaluate the similarity between images. A set of relevant and non-redundant features is selected using the mutual information based minimum redundancy-maximum relevance framework. The relative importance of each feature is evaluated using a fuzzy entropy based measure, which is computed from the sets of retrieved images marked relevant and irrelevant by the users. The performance of the system is evaluated using different sets of examples from a general purpose image database. The robustness of the system is also shown when the images undergo different transformations.
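The minimum redundancy-maximum relevance (mRMR) selection step can be sketched in Python for discrete features, using the greedy "relevance minus mean redundancy" criterion with plug-in mutual information estimates. The paper's features are colour invariants computed from point clusters; the tiny binary features and labels here, and the function names, are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_score(name, features, labels, selected):
    """Relevance to the labels minus mean redundancy with already selected features."""
    relevance = mutual_information(features[name], labels)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
    return relevance - redundancy / len(selected)


def mrmr(features, labels, k):
    """Greedily pick k feature names in mRMR order."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda name: mrmr_score(name, features, labels, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "f_a": [0, 0, 0, 0, 1, 1, 1, 1],       # most relevant
    "f_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # duplicate of f_a: relevant but redundant
    "f_b": [0, 0, 1, 1, 0, 0, 1, 1],       # weakly relevant, independent of f_a
}
print(mrmr(features, labels, 2))  # ['f_a', 'f_b']
```

The redundant copy of `f_a` is skipped in favour of the weaker but complementary `f_b`, which is the behaviour the "non-redundant" requirement in the abstract calls for.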
