Similar articles
Found 20 similar articles (search time: 0 ms)
1.
In this paper, we propose a new method for clustering chemical databases based on the representation of structural similarity measurements in multidimensional spaces. The proposed method permits tuning of the clustering process through selection of the dimension of the projection space, the normal vectors and the sensitivity of the projection process. The structural similarity of each element with respect to the database elements is projected onto the defined spaces, generating clusters that represent the characteristics and diversity of the database and whose size and characteristics can be easily adjusted.

2.
This paper describes a program for 3D similarity searching, called CLIP (for Candidate Ligand Identification Program), that uses the Bron-Kerbosch clique detection algorithm to find those structures in a file that have large structures in common with a target structure. Structures are characterized by the geometric arrangement of pharmacophore points, and the similarity between two structures is calculated using modifications of the Simpson and Tanimoto association coefficients. These modifications take into account the fact that a distance tolerance is required to ensure that pairs of interatomic distances can be regarded as equivalent during the clique-construction stage of the matching algorithm. Experiments with HIV assay data demonstrate the effectiveness and the efficiency of this approach to virtual screening.
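As a rough illustration of the clique-detection step named in this abstract, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion (no pivoting). The toy graph and the function names are hypothetical; the abstract does not give CLIP's data structures, its distance tolerance, or its modified coefficients, so none of those are modelled here.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Append every maximal clique that extends r to the list out."""
    if not p and not x:
        # Nothing left to add and nothing excluded: r is maximal.
        out.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v has been fully explored
        x = x | {v}  # exclude v from later cliques at this level


def maximal_cliques(adj):
    """Return all maximal cliques of an undirected graph given as {node: neighbours}."""
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out


# Hypothetical toy graph (assumption, not data from the paper).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = maximal_cliques(adj)
largest = max(cliques, key=len)
```

In a 3D matching setting, the nodes would typically be pairs of matched points and an edge would mean that two pairings are geometrically compatible; that mapping is an assumption about usage, not a description of CLIP itself.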

3.
This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
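For readers unfamiliar with fingerprint-based similarity searching, a minimal sketch follows. It assumes the Tanimoto coefficient as the similarity measure and represents each fingerprint as a set of "on" bit positions; the abstract does not say which coefficient was used with the BCI, Daylight and Unity fingerprints, and the molecule names and bit sets are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query_fp, database):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints; real ones have hundreds or thousands of bits.
database = {
    "mol_1": {1, 2, 3, 4},
    "mol_2": {2, 3, 9},
    "mol_3": {7, 8},
}
ranking = rank_by_similarity({1, 2, 3}, database)
```

A virtual screen would then take the top-ranked molecules as candidates for testing.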

4.
5.
6.
This article describes the use of the ICL Distributed Array Processor (DAP) for the automatic classification of chemical structure databases using the Jarvis-Patrick clustering method. This method is based upon the calculation of a table containing the nearest neighbors for each of the molecules in the database which is to be clustered. These nearest neighbors can be identified very efficiently using the DAP since it allows up to 4096 molecules to be compared with a specified molecule in parallel. Experiments with files of 4096 and 8192 structures from the Fine Chemicals Database show that clustering with the DAP is up to 6.7 times as fast as using a highly efficient, inverted file algorithm on an IBM 3083 mainframe.
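The nearest-neighbor table described in this abstract can be sketched serially as follows. This is a plain Python illustration of one common form of the Jarvis-Patrick rule (two items join the same cluster if each is in the other's neighbor list and they share at least `k_min` neighbors); the parameter names, the distance function and the toy data are assumptions, and nothing here reflects the DAP's parallel implementation.

```python
def jarvis_patrick(items, dist, k, k_min):
    """Cluster items using a k-nearest-neighbour table and a shared-neighbour test."""
    n = len(items)
    # Nearest-neighbour table: for each item, the indices of its k closest others.
    nn = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(items[i], items[j]),
        )
        nn.append(set(others[:k]))

    parent = list(range(n))  # union-find forest used to merge clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            mutual = j in nn[i] and i in nn[j]
            if mutual and len(nn[i] & nn[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(items[i])
    return sorted(clusters.values())


# Hypothetical one-dimensional data standing in for molecules.
points = [10, 11, 12, 50, 51, 52]
result = jarvis_patrick(points, dist=lambda a, b: abs(a - b), k=2, k_min=1)
```

Building the table is the expensive step, since every item is compared with every other item, which is the part the abstract reports accelerating.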

7.
From 1970 to 1984, the U.S. Government cooperated with various organizations in supporting the development, maintenance, and distribution of a computer-based chemical information system of spectral and other numeric databases, known as the NIH/EPA Chemical Information System (CIS). This presentation discusses the history of the project and related activities in the area of numeric databases, and summarizes the current state of the project.

8.
This paper evaluates the use of the fuzzy k-means clustering method for the clustering of files of 2D chemical structures. Simulated property prediction experiments with the Starlist file of logP values demonstrate that use of the fuzzy k-means method can, in some cases, yield results that are superior to those obtained with the conventional k-means method and with Ward's clustering method. Clustering of several small sets of agrochemical compounds demonstrates the ability of the fuzzy k-means method to highlight multicluster membership and to identify outlier compounds, although the former can be difficult to interpret in some cases.
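The idea of multicluster membership can be made concrete with the standard fuzzy k-means (fuzzy c-means) membership formula, in which the membership of a point in cluster i is 1 / sum over k of (d_i / d_k)^(2/(m-1)). The sketch below assumes that textbook formula, one-dimensional data and a fuzzifier `m` of 2; the abstract does not state the paper's exact formulation or parameters.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of one point in each cluster (values sum to 1)."""
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:
        # The point coincides with one or more centres: share membership among them.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical cluster centres on a single descriptor axis.
centers = [0.0, 10.0]
halfway = fuzzy_memberships(5.0, centers)  # equidistant from both centres
near_first = fuzzy_memberships(2.0, centers)  # mostly in the first cluster
```

A point with comparable memberships in several clusters, like `halfway` here, is the kind of multicluster compound the abstract refers to; a conventional k-means assignment would force it into a single cluster.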

9.
Defining and using microbial spectral databases   (Total citations: 1; self-citations: 0; citations by others: 1)
This work shows how fingerprints of mass spectral patterns from microbial isolates are affected by variations in instrumental condition, by sample environment, and by sample handling factors. It describes a novel method by which pattern distortions can be mathematically corrected for variations in factors not amenable to experimental control. One uncontrollable variable is "between-batch" differences in culture media. Another, relevant for determination of noncultured extracts, is differences between the cells' environmental experience (e.g., starved environmental extracts versus cultured standards). The method suggests that, after a single growth cycle on a solid medium (perhaps, a selective one), pyrolysis MS spectra of microbial isolates can be algorithmically compensated and an unknown isolate identified using a spectral database defined by culture on a different (perhaps, nonselective) medium. This reduces identification time to as few as 24 h from sample collection. The concept also proposes a possible way to compensate certain noncultured, nonisolated samples (e.g., cells concentrated from urine or impacted from aerosol or semi-selectively extracted by immunoaffinity methods from heavily contaminated matrices) for identification within half an hour. Using the method, microbial mass spectra from different labs can be assembled into coherent databases similar to those routinely used to identify pure compounds. This type of data treatment is applicable for rapid detection in biowarfare and bioterror events as well as in forensic, research, and clinical laboratory contexts.

10.
11.
Molecular target identification is of central importance to drug discovery. Here, we developed a computational approach, named bioactivity profile similarity search (BASS), for associating targets with small molecules by using the known target annotations of related compounds from public databases. To evaluate BASS, a bioactivity profile database was constructed using 4296 compounds that were commonly tested in the US National Cancer Institute 60 human tumor cell line anticancer drug screen (NCI-60). Each compound was used as a query to search against the entire bioactivity profile database, and reference compounds with similar bioactivity profiles above a threshold of 0.75 were considered as neighbor compounds of the query. Potential targets were subsequently linked to the identified neighbor compounds by using the known targets of the query compound. About 45% of the predicted compound-target associations were successfully verified retrospectively, suggesting the possible application of BASS in identifying the targets of uncharacterized compounds and thus providing insight into the study of promiscuity and polypharmacology. Furthermore, BASS identified a significant fraction of structurally diverse compounds with similar bioactivities, indicating its feasibility for "scaffold hopping" when searching for novel molecules against the target of interest.
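A profile similarity search of this kind might look like the sketch below. Only the 0.75 threshold comes from the abstract; the use of Pearson correlation as the similarity measure, the compound and target names, and the four-value profiles are all assumptions made for illustration. The sketch shows the intended application (transferring known targets from neighbors to an unannotated query) rather than the retrospective evaluation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def neighbours(query, profiles, threshold=0.75):
    """Names of compounds whose profile similarity to the query exceeds the threshold."""
    return [
        name
        for name, profile in profiles.items()
        if name != query and pearson(profiles[query], profile) > threshold
    ]


def predicted_targets(query, profiles, known_targets, threshold=0.75):
    """Union of the known targets of all neighbours of the query."""
    targets = set()
    for name in neighbours(query, profiles, threshold):
        targets |= known_targets.get(name, set())
    return targets


# Hypothetical profiles (e.g. activity across four cell lines) and annotations.
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "cmpd_1": [2.0, 4.0, 6.0, 8.5],
    "cmpd_2": [4.0, 3.0, 2.0, 1.0],
}
known_targets = {"cmpd_1": {"target_A"}, "cmpd_2": {"target_B"}}
prediction = predicted_targets("query", profiles, known_targets)
```

Because similarity is computed on activity profiles rather than structures, neighbors can be structurally unrelated to the query, which is what makes scaffold hopping possible.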

12.
We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

13.
The dynamics of chemical reaction networks often take place on widely differing time scales, from the order of nanoseconds to the order of several days. This is particularly true for gene regulatory networks, which are modeled by chemical kinetics. Multiple time scales in mathematical models often lead to serious computational difficulties, such as numerical stiffness in the case of differential equations or excessively redundant Monte Carlo simulations in the case of stochastic processes. We present a model reduction method for the study of stochastic chemical kinetic systems that takes advantage of multiple time scales. The method applies to finite projections of the chemical master equation and allows for effective time scale separation of the system dynamics. We implement this method in a novel numerical algorithm that exploits the time scale separation to achieve model order reductions while enabling error checking and control. We illustrate the efficiency of our method in several examples motivated by recent developments in gene regulatory networks.

14.
A recently proposed multi-parameter correlation, $\log k\,(25\,^{\circ}\mathrm{C}) = s_f (E_f + N_f)$, where $E_f$ is electrofugality and $N_f$ is nucleofugality, for the substituent and solvent effects on the rate constants for solvolyses of benzhydryl and substituted benzhydryl substrates, is re-evaluated. A new formula, $E_f = \log k\,(\mathrm{RCl/EtOH}/25\,^{\circ}\mathrm{C}) - 1.87$, where RCl/EtOH refers to ethanolysis of chlorides, reproduces published values of $E_f$ satisfactorily, avoids multi-parameter optimisations and provides additional values of $E_f$. From the formula for $E_f$, it is shown that the term $s_f \times E_f$ is compatible with the Hammett-Brown ($\rho^{+}\sigma^{+}$) equation for substituent effects. However, the previously published values of $N_f$ do not accurately account for solvent and leaving group effects (e.g. nucleofuge Cl or X), even for benzhydryl solvolyses; alternatively, if the more exact, two-parameter term $s_f \times N_f$ is used, calculated effects are less accurate. A new formula, $N_f = 6.14 + \log k\,(\mathrm{BX/any\ solvent}/25\,^{\circ}\mathrm{C})$, where BX refers to solvolysis of the parent benzhydryl as electrofuge, defines improved $N_f$ values for benzhydryl substrates. The new formulae for $E_f$ and $N_f$ are consistent with an assumption that $s_f = 1.00$, and so improved correlations for benzhydryl substrates can be obtained from the additive formula $\log k\,(\mathrm{RX/any\ solvent}/25\,^{\circ}\mathrm{C}) = E_f + N_f$. Possible extensions of this approach are also discussed.
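The three formulae in this abstract (electrofugality from ethanolysis of the chloride, nucleofugality from solvolysis of the parent benzhydryl substrate, and their sum as the predicted log k) can be chained in a few lines. The constants 1.87 and 6.14 are taken from the abstract; the function names and the input rate constants are hypothetical. One value is not arbitrary: if both defining formulae are applied to the parent benzhydryl chloride in ethanol and the additive formula is to hold, then log k for that reaction must be 1.87 - 6.14 = -4.27, which the sketch uses as a consistency check.

```python
def electrofugality(log_k_rcl_etoh):
    """Ef from log k for ethanolysis of the chloride RCl at 25 degrees C."""
    return log_k_rcl_etoh - 1.87


def nucleofugality(log_k_bx_solvent):
    """Nf from log k for solvolysis of the parent benzhydryl substrate BX at 25 degrees C."""
    return 6.14 + log_k_bx_solvent


def predicted_log_k(ef, nf):
    """Additive formula with the sensitivity parameter sf taken as 1.00."""
    return ef + nf


# Consistency value implied by the two definitions: parent benzhydryl chloride in ethanol.
LOG_K_PARENT_CL_ETOH = 1.87 - 6.14

# Hypothetical input: log k = -3.0 for ethanolysis of some substituted chloride RCl.
ef = electrofugality(-3.0)
nf = nucleofugality(LOG_K_PARENT_CL_ETOH)  # nucleofuge Cl in ethanol
log_k = predicted_log_k(ef, nf)
```

With the nucleofuge and solvent unchanged, the prediction returns the input value of -3.0, as it should; a different leaving group or solvent would enter only through the second function.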

15.
In a constantly expanding world of chemical and environmental information sources, the need for their evaluation is becoming increasingly important. This paper presents a comparative evaluation of data sources, namely online databases and databases on CD-ROM (called CD-ROMs in this paper), in the field of environmental chemicals. The approach is based on research results gained in the years 1996/1997. The authors are aware that changes in the database industry may lead to different results. Before the actual evaluation process can be carried out, two major procedures are necessary, namely, the selection of sets of data sources and the definition of evaluation criteria. In order to perform the difficult task of an evaluation based on several criteria, a general order relation has to be introduced. Methods of partially ordered set theory are applied, and the results are visualized by the technique of Hasse diagrams. On the basis of these evaluation results, the data sources are grouped and then evaluated. It will be shown that there are groups of data sources with quite specific property profiles, and only two groups turn out to be relatively better than the others.
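To show how a multi-criteria evaluation leads to a Hasse diagram, the sketch below assumes the usual product order (one source is ranked above another if it scores at least as well on every criterion and the two score vectors differ) and computes the cover relations, which are the edges drawn in the diagram. The source names and scores are invented; the paper's actual criteria are not given in the abstract.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and not identical."""
    return a != b and all(x >= y for x, y in zip(a, b))


def hasse_edges(scores):
    """Cover relations (upper, lower) of the product order on the score vectors."""
    names = list(scores)
    edges = []
    for hi in names:
        for lo in names:
            if not dominates(scores[hi], scores[lo]):
                continue
            # Keep the edge only if no third element lies strictly between hi and lo.
            between = any(
                dominates(scores[hi], scores[mid]) and dominates(scores[mid], scores[lo])
                for mid in names
            )
            if not between:
                edges.append((hi, lo))
    return sorted(edges)


# Hypothetical data sources scored on two criteria (higher is better).
scores = {"A": (3, 3), "B": (2, 3), "C": (3, 1), "D": (1, 1)}
edges = hasse_edges(scores)
```

Sources B and C are incomparable here because each beats the other on one criterion, so the diagram has no edge between them. This is why a partial order cannot simply be flattened into a single ranking.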

16.
Recently a method (RASCAL) for determining graph similarity using a maximum common edge subgraph algorithm has been proposed which has proven to be very efficient when used to calculate the relative similarity of chemical structures represented as graphs. This paper describes heuristics which simplify a RASCAL similarity calculation by taking advantage of certain properties specific to chemical graph representations of molecular structure. These heuristics are shown experimentally to increase the efficiency of the algorithm, especially at more distant values of chemical graph similarity.

17.
18.
For the conversion of nonstructural chemical databases to structure databases, a series of algorithms to find the closest match between existing names and names in a reference database is described. On the basis of the best match, new fields such as the Chemical Abstracts Service Registry Number (CASRN) or structures were added to the database.

19.
It is shown that activation analysis is especially suited to serve as a basis for determining the chemical similarity between samples defined by their trace element concentration patterns. The general problem of classification and identification is discussed. The nature of possible classification structures and their appropriate clustering strategies is considered. A practical computer method is suggested, and its application as well as the graphical representation of classification results are given. The possibility of classification using information theory is mentioned. Classification of chemical elements is discussed and practically realized after Hadamard transformation of the concentration variation patterns in a series of samples.

20.