Similar literature
Found 20 similar documents; search took 921 ms
1.
In any literature search one must be able to predict which modes of expression have been used to represent the concepts of interest in the search file. It is these lingual expressions which will be looked up or will constitute the search parameters in a mechanized search. In the case of general concepts and statements, uncontrolled natural language lacks the required representational predictability, as is demonstrated, for example, for classes of chemical compounds and for types of reactions. Hence, considerable loss of relevant information is bound to occur in searches for these concepts in free text files, regardless of how advanced a computer program may be. By restricting the modes of expression, as is achieved by means of an indexing language and its reliable employment, predictability is increased and, consequently, information loss is drastically reduced. In contrast, the modes of expression for individual concepts (e.g. individual chemical substances, authors, institutions) are, by their very nature, often sufficiently predictable in uncontrolled natural language. In these cases, the value of translating them into indexing-language expressions is often questionable. This holds true in particular if an indexing language does not (yet) provide adequate expressions to represent the concepts, and also in the case of ambiguous natural language expressions. Here, preserving the original, uncontrolled terms in the search file is advisable, at least in addition to an attempted translation into the indexing language. The weaknesses of uncontrolled natural language files become apparent only at a rather late stage in the continual expansion of an information system. In its early stages the necessity of predictability is reduced through the effectiveness of human memory and the initially relatively high homogeneity of the experts' linguistic usage during that short period of time. Furthermore, loss of relevant information as a consequence of the lack of predictability will at first be minimal and will not yet be noticed as a steadily increasing source of search failure. Nor does the steady rise in costs for devising and phrasing queries with more and more alternatives become apparent at this stage. It is largely for these reasons that uncontrolled natural language systems have so often been preferred without justification. The typical weaknesses and strengths of both kinds of language suggest combining them so that their inherent strengths are retained and utilized as far as possible.

2.
Eijkel J, Lab on a Chip, 2008, 8(11): 1781-1783
This paper investigates the problem of searching literature in a multidisciplinary environment. It finds that much relevant literature is missed because other disciplines use a different terminology, different units, or slightly different (but related) concepts. The paper suggests some approaches to enhance interdisciplinary understanding and improve the exchange of ideas and literature.

3.
In the mechanized documentation of chemical literature, the definition of structural concepts is very important. The usual definitions of ring structures are inadequate. Essential ring structures are sometimes not recognized on the basis of these definitions and are therefore missed in a literature search. This is particularly true of bridged ring systems. The ring concept and ring condensation types are now redefined on a topological basis in the closest possible analogy to the intuitive approach of the chemist. In complicated molecular structures, these “fundamental rings” can be easily determined, either manually or by means of a programmed computer. The concept of the “ring complex” is defined and suggested as a preliminary screen in literature searches for ring structures. This will save machine time, and so reduce the cost of searches.
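As a rough illustration of what a topological ring count looks like on a bridged system, the sketch below computes the cyclomatic number of a molecular graph (bonds minus atoms plus connected components). This is the standard graph-theoretic count of independent rings, used here as an assumption; it is not necessarily the "fundamental ring" definition proposed in the abstract, and the function and variable names are hypothetical.

```python
def count_independent_rings(num_atoms, bonds):
    """Return the cyclomatic number: bonds - atoms + connected components."""
    parent = list(range(num_atoms))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Heavy-atom skeleton of norbornane, a bridged bicyclic system:
# a six-membered ring (atoms 0-5) with atom 6 bridging atoms 0 and 3.
norbornane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 3)]
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(count_independent_rings(7, norbornane))   # two independent rings
print(count_independent_rings(6, cyclohexane))  # one ring
```

A count like this could serve as a cheap pre-screen before a more expensive ring-by-ring comparison, which is the role the abstract assigns to its "ring complex".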

4.
The encoding and searching of generic chemical structures, so-called Markush structures, have received little attention in the literature of late. The ability to encode and search these complex entities is of use in various branches of chemoinformatics. We describe a general language for encoding Markush structures and algorithms for searching them and give three examples of the utility of such a system: development of general Free-Wilson analyses of chemical series, detection of controlled substances within a large database of molecular structures, and searching of large databases of virtual compounds.

5.
6.
7.
Combination of fingerprint-based similarity coefficients using data fusion   Cited by: 3 (self-citations: 0, citations by others: 3)
Many different types of similarity coefficients have been described in the literature. Since different coefficients take into account different characteristics when assessing the degree of similarity between molecules, it is reasonable to combine them to further optimize the measures of similarity between molecules. This paper describes experiments in which data fusion is used to combine several binary similarity coefficients to get an overall estimate of similarity for searching databases of bioactive molecules. The results show that search performance can be improved by combining coefficients at little extra computational cost. However, no single combination gives consistently high performance for all search types.
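The idea of fusing several binary similarity coefficients can be sketched as follows. The abstract does not say which coefficients or which fusion rule were used, so this example assumes three well-known coefficients (Tanimoto, cosine, overlap) and a simple sum-of-ranks rule; fingerprints are represented as sets of on-bit positions and are assumed to be non-empty. All names and data are hypothetical.

```python
import math


def tanimoto(x, y):
    common = len(x & y)
    return common / (len(x) + len(y) - common)


def cosine(x, y):
    return len(x & y) / math.sqrt(len(x) * len(y))


def overlap(x, y):
    return len(x & y) / min(len(x), len(y))


def rank_positions(scores):
    """Map each identifier to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


def fuse_by_rank_sum(query, database, coefficients):
    """Rank database entries by the sum of their ranks under each coefficient."""
    totals = {name: 0 for name in database}
    for coefficient in coefficients:
        scores = {name: coefficient(query, fp) for name, fp in database.items()}
        for name, rank in rank_positions(scores).items():
            totals[name] += rank
    return sorted(totals, key=lambda name: (totals[name], name))


query = {1, 2, 3, 4}
database = {"A": {1, 2, 3, 4, 5}, "B": {1, 2}, "C": {7, 8, 9}}
fused = fuse_by_rank_sum(query, database, [tanimoto, cosine, overlap])
print(fused)
```

Each extra coefficient reuses the same bit counts, which is consistent with the abstract's remark that fusion adds little computational cost.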

8.
Similarity searching using a single bioactive reference structure is a well-established technique for accessing chemical structure databases. This paper describes two extensions of the basic approach. First, we discuss the use of group fusion to combine the results of similarity searches when multiple reference structures are available. We demonstrate that this technique is notably more effective than conventional similarity searching in scaffold-hopping searches for structurally diverse sets of active molecules; conversely, the technique will do little to improve the search performance if the actives are structurally homogeneous. Second, we make the assumption that the nearest neighbors resulting from a similarity search, using a single bioactive reference structure, are also active and use this assumption to implement approximate forms of group fusion, substructural analysis, and binary kernel discrimination. This approach, called turbo similarity searching, is notably more effective than conventional similarity searching.
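A minimal sketch of the two ideas, under stated assumptions: group fusion is implemented here with a MAX rule (each database molecule keeps its best Tanimoto score against any reference), and the turbo step treats the top-k nearest neighbours of a single reference as extra references. The abstract does not specify the fusion rule, the coefficient, or k, so these choices, along with all names and data, are illustrative only.

```python
def tanimoto(x, y):
    common = len(x & y)
    union = len(x) + len(y) - common
    return common / union if union else 0.0


def group_fusion(references, database):
    """Rank database entries by their best similarity to any reference (MAX rule)."""
    scores = {
        name: max(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


def turbo_search(reference, database, k=1):
    """Use the k nearest neighbours of one reference as additional references."""
    first_pass = group_fusion([reference], database)
    neighbours = [database[name] for name in first_pass[:k]]
    return group_fusion([reference] + neighbours, database)


reference = {1, 2, 3}
database = {
    "near": {1, 2, 3, 4},
    "bridge": {3, 4, 5, 6},
    "other": {1, 9, 10},
    "far": {5, 6, 7},
}
print(group_fusion([reference], database))
print(turbo_search(reference, database, k=1))
```

In this toy data the turbo pass promotes "bridge" above "other", because "bridge" resembles the assumed-active neighbour "near" more than it resembles the original reference.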

9.
Three-dimensional (3D) pharmacophore searching of structural databases has become a very effective approach for the discovery of novel lead compounds in drug discovery. Although several commercial programs are available, they are primarily used as stand-alone applications and require a local database. In recent years, the Internet has become the medium of choice for distributing multiuser application programs. Herein, we describe our development of a Web-based 3D-database pharmacophore-searching tool based on the server-client Web architecture. Both rigid and conformationally flexible searching methods are implemented. Our results show that for a typical three-center rigid pharmacophore search, the run time for searching 50 000 compounds is less than three minutes, and for four-center pharmacophore searching, the run time is less than 10 minutes on a desktop computer. For a flexible 3D-pharmacophore search, the run time for searching 50 000 compounds is generally between one and several hours. The search results are comparable to those obtained using a commercial program. We expect that this Web-based tool will be very useful for scientists who are interested in 3D-database pharmacophore searching via the Internet.
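A three-center rigid pharmacophore query can be pictured as a triangle of inter-feature distances that a stored conformation must reproduce within a tolerance. The sketch below is a simplified illustration, not the tool described in the abstract: it assumes the three query features carry distinct labels, that each molecule is a single rigid conformation given as labelled coordinates, and that a fixed distance tolerance is acceptable. Feature labels, coordinates, and the tolerance are hypothetical.

```python
import itertools
import math


def matches_three_center(query, features, tolerance=0.5):
    """Return True if some assignment of features reproduces all query distances.

    query:    {(label_a, label_b): distance} for three distinct feature labels
    features: list of (label, (x, y, z)) for one rigid conformation
    """
    labels = sorted({label for pair in query for label in pair})
    for combo in itertools.permutations(features, len(labels)):
        if [label for label, _ in combo] != labels:
            continue
        coords = {label: xyz for label, xyz in combo}
        if all(
            abs(math.dist(coords[a], coords[b]) - distance) <= tolerance
            for (a, b), distance in query.items()
        ):
            return True
    return False


query = {
    ("donor", "acceptor"): 3.0,
    ("donor", "aromatic"): 4.0,
    ("acceptor", "aromatic"): 5.0,
}
hit = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]
miss = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 9, 0))]

print(matches_three_center(query, hit))
print(matches_three_center(query, miss))
```

A flexible search would repeat this test over many conformations per compound, which is in line with the much longer run times the abstract reports for flexible searching.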

10.
FTIR and Raman spectral imaging can be used to simultaneously image a latent fingerprint and detect exogenous substances deposited within it. These substances might include drugs of abuse or traces of explosives or gunshot residue. In this work, spectral searching algorithms were tested for their efficacy in finding targeted substances deposited within fingerprints. “Reverse” library searching, where a large number of possibly poor-quality spectra from a spectral image are searched against a small number of high-quality reference spectra, poses problems for common search algorithms as they are usually implemented. Out of a range of algorithms which included conventional Euclidean distance searching, the spectral angle mapper (SAM) and correlation algorithms gave the best results when used with second-derivative image and reference spectra. All methods tested performed worse with first-derivative and undifferentiated spectra. In a search against a caffeine reference, the SAM and correlation methods were able to correctly rank a set of 40 confirmed but poor-quality caffeine spectra at the top of a dataset which also contained 4,096 spectra from an image of an uncontaminated latent fingerprint. These methods also successfully and individually detected aspirin, diazepam and caffeine that had been deposited together in another fingerprint, and they did not indicate any of these substances as a match in a search for another substance which was known not to be present. The SAM was used to successfully locate explosive components in fingerprints deposited on silicon windows. The potential of other spectral searching algorithms used in the field of remote sensing is considered, and the applicability of the methods tested in this work to other modes of spectral imaging is discussed.
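The spectral angle used by SAM is the angle between two spectra treated as vectors, so it ignores overall intensity scaling. The sketch below pairs it with a plain second finite difference as a stand-in for second-derivative preprocessing; the abstract does not state how derivatives were computed, so that step is an assumption, as are the toy spectra and all names. Spectra are assumed to have non-zero second differences.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard rounding
    return math.acos(cosine)


def second_difference(spectrum):
    """Simple finite-difference estimate of the second derivative."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


def rank_by_angle(reference, spectra):
    """Order image spectra by spectral angle to the reference, best match first."""
    ref = second_difference(reference)
    angles = {
        name: spectral_angle(ref, second_difference(s)) for name, s in spectra.items()
    }
    return sorted(angles, key=lambda name: (angles[name], name))


reference = [0, 1, 4, 1, 0, 0, 0]
spectra = {
    # Same band at half intensity on a constant baseline offset of 2.
    "scaled_match": [2, 2.5, 4, 2.5, 2, 2, 2],
    # A band of the same shape at a different position.
    "other": [0, 0, 0, 1, 4, 1, 0],
}
print(rank_by_angle(reference, spectra))
```

Differentiation removes the constant baseline and the angle ignores the intensity scale, which gives some intuition for why weak, poor-quality spectra can still rank at the top.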

11.
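The analysis of phosphorylation has long been one of the focal points of proteomics research. In shotgun proteomics, phosphorylation sites can be identified directly by setting phosphorylation as a variable modification in the database search. However, introducing post-translational modifications enlarges the search space and lowers identification sensitivity. To address this problem, we constructed a site-annotated database, which contains the phosphorylation site information of proteins, and developed a new database search strategy for the reliable identification of phosphopeptides. The new strategy was evaluated with the Mascot search software on different types of data, and the results show that the method improves the sensitivity of phosphopeptide identification while maintaining the reliability of the identifications.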

12.
The commercially available methods for keeping mass spectrometrists aware of a vast literature are assessed. CA Selects: Mass Spectrometry was found convenient for browsing of current literature but disappointing in its coverage. The lack of an index makes it impracticable for retrospective searching. On the other hand, the Mass Spectrometry Bulletin contains over three times more references, is sectionalized for casual browsing, and its individual issue and annual indices make it very suitable for current and retrospective searching of specific topics. Chemical Abstracts itself contains many more mass spectrometric references than its CA Selects: Mass Spectrometry derivative, but that information is difficult to extract comprehensively using the volume indices provided. Both Mass Spectrometry Bulletin and Chemical Abstracts can be accessed by on-line systems. Use of computer searches proved to be excellent for current awareness and retrospective searching of selected topics within mass spectrometry. The mass spectrometrist can readily generate a tailor-made set of terms to extract just the information of personal interest from Mass Spectrometry Bulletin and/or Chemical Abstracts. A detailed search requires use of both databases. On-line searching provided faster and more comprehensive extraction of information than manual searching. On-line systems would, however, be too expensive for browsing in general mass spectrometry. Some problems and improvements in the various services are described and the role of some publications reviewing mass spectrometry is briefly discussed.

13.
Information retrieval for planning and executing research projects and for publishing results is considered a routine task that is usually neither mentioned explicitly in a scientific publication nor described in any detail. In the information searches for the preceding publication ('Building an Organic Zeolite from a Macrocyclic TADDOL Derivative or How to Teach an Old Dog New Tricks'), we were confronted with so many problems during retrieval of the desired information about related work that we decided to deviate from this tradition. We had to use the Cambridge Structural Database, the Chemical Abstracts structure and literature databases, and the Beilstein database to the full extent of their contents, indexing, and search facilities to retrieve the necessary information about 'organic zeolites'. In the process, we found important limitations and deficiencies in each of these databases, and we had to conceive search procedures that we considered rather unusual even after more than 20 years of experience in searching chemistry databases. The results and, particularly, the problems encountered underline the necessity for enhanced integration of individual compound and property databases and improved standardization as a prerequisite for this.

14.
15.
SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences (the SBASE domain library) and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.

16.
17.
Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious efficiency challenge, given the large numbers of mass spectra and the ever faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post‐translational modifications. In this study, we conducted systematic research on speeding up database search engines for protein identification and illustrate the key points with the specific design of the pFind 2.1 search engine as a running example. Firstly, by constructing peptide indexes, pFind achieves a two- to threefold speedup compared with searching without peptide indexes. Secondly, by constructing indexes for observed precursor and fragment ions, pFind achieves a further twofold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem. Copyright © 2010 John Wiley & Sons, Ltd.
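To give a feel for why an index helps, the sketch below keeps peptides sorted by mass so that the candidates for a precursor mass can be found by binary search instead of a full scan. This is a generic illustration of mass indexing, not the actual pFind 2.1 design, which the abstract does not detail; the class name, placeholder sequences, and mass values are hypothetical.

```python
import bisect


class PeptideMassIndex:
    """Peptides sorted by mass for fast candidate retrieval."""

    def __init__(self, peptides):
        # peptides: iterable of (mass, sequence) pairs
        self._entries = sorted(peptides)
        self._masses = [mass for mass, _ in self._entries]

    def candidates(self, precursor_mass, tolerance):
        """Return sequences whose mass lies within +/- tolerance of the precursor."""
        low = bisect.bisect_left(self._masses, precursor_mass - tolerance)
        high = bisect.bisect_right(self._masses, precursor_mass + tolerance)
        return [sequence for _, sequence in self._entries[low:high]]


# Placeholder sequences with made-up masses, for illustration only.
index = PeptideMassIndex(
    [(1500.0, "PEPD"), (1000.2, "PEPB"), (800.0, "PEPA"), (1000.4, "PEPC")]
)
print(index.candidates(1000.3, 0.5))
print(index.candidates(1200.0, 0.5))
```

Each lookup costs two binary searches plus the size of the result, so the cost per spectrum no longer grows linearly with the size of the peptide database.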

18.
In studying molecules with unusual bonding and structures, it is desirable to be able to find all the isomers that are minima on the energy surface. A stochastic search procedure, involving random kicks followed by optimization, is described for seeking all the isomers on a surface defined by quantum mechanical calculations. It has been applied to searching for singlet structures for C6 using the restricted Hartree-Fock/6-311G basis set. In addition to the linear chain and ring previously investigated, 11 further structures (A-K) were located at this level. These provide a basis for discussing qualitative bonding motifs for this carbon cluster. The application of a similar idea to searching for transition states is discussed.

19.
Cost comparisons and the cost effectiveness of on-line searching of information are reviewed. Topics discussed include on-line vs. manual searching, charge-out of search costs, efficacy of on-line searching, on-line vs. batch computer searching, vendor system comparisons, networking, searcher productivity, telecommunications, role of the intermediary, search transmission rates and on-line charges, editing of recorded searches, and increasing cost of on-line searching of chemical information.

20.
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
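The "simple rule" for significance can be illustrated with a small sketch. The abstract does not give a formula, so this example assumes a common convention for probability-based scores: the score is reported as -10·log10(P), where P is the probability that the match is a random event, and a result counts as significant when its score exceeds the score corresponding to a chosen significance level. The function names and the default level of 0.05 are assumptions.

```python
import math


def probability_score(p_random):
    """Convert the probability of a random match into a -10*log10(P) score."""
    return -10.0 * math.log10(p_random)


def is_significant(score, alpha=0.05):
    """Simple rule: a score is significant if it exceeds the score for alpha."""
    return score > probability_score(alpha)


print(probability_score(1e-3))
print(is_significant(30.0))
print(is_significant(10.0))
```

Because the score is a monotone transform of a probability, the same threshold logic applies regardless of which type of search produced the match, which is what makes scores from different search types comparable.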
