20 similar documents found (search time: 10 ms)
1.
Recent studies of a selection of similarity coefficients, applied to searches of chemical databases represented by binary fingerprints, have shown considerable variation both in retrieval performance and in the sets of compounds retrieved. The main factor influencing performance is the density distribution of the bitstrings for the active class, a feature closely related to molecular size. If this holds when these coefficients are applied to similarity searching, considerable variation in performance would also be expected when they are applied to dissimilarity methods, namely clustering and compound selection. Here we report several studies investigating the relative performance of 13 association and correlation coefficients, previously shown to exhibit complementary performance in similarity searches, when applied to hierarchical and nonhierarchical clustering methods and to a compound selection methodology. Results suggest that the correlation coefficients perform consistently well for clustering and compound selection, as does the Baroni-Urbani/Buser association coefficient. Surprisingly, these often outperform the Tanimoto coefficient, while the Simple Match coefficient (effectively the complement of the Squared Euclidean Distance) performs very poorly.
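The coefficients named above can all be written in terms of four bit counts for a pair of fingerprints: a (bits set in both), b and c (bits set in only one), and d (bits set in neither). The sketch below uses the standard textbook forms of four of them (Tanimoto, Simple Match, Baroni-Urbani/Buser, and the Pearson phi correlation); the exact variants of the 13 coefficients used in the study are not given here, so treat these formulas and the function names as assumptions.

```python
from math import sqrt

def bit_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 fingerprints."""
    a = sum(1 for p, q in zip(x, y) if p and q)      # on in both
    b = sum(1 for p, q in zip(x, y) if p and not q)  # on in x only
    c = sum(1 for p, q in zip(x, y) if q and not p)  # on in y only
    d = len(x) - a - b - c                            # off in both
    return a, b, c, d

def tanimoto(x, y):
    a, b, c, _ = bit_counts(x, y)
    return a / (a + b + c) if a + b + c else 0.0

def simple_match(x, y):
    # For binary vectors this equals 1 - (squared Euclidean distance) / n
    a, _, _, d = bit_counts(x, y)
    return (a + d) / len(x)

def baroni_urbani_buser(x, y):
    a, b, c, d = bit_counts(x, y)
    s = sqrt(a * d)
    denom = s + a + b + c
    return (s + a) / denom if denom else 0.0

def pearson_phi(x, y):
    # Pearson correlation of two binary vectors, expressed with bit counts
    a, b, c, d = bit_counts(x, y)
    denom = sqrt((a + b) * (a + c) * (b + d) * (c + d))
    return (a * d - b * c) / denom if denom else 0.0
```

For x = [1,1,0,0,1,0,0,0] and y = [1,0,1,0,1,0,0,0] (a=2, b=1, c=1, d=4), Tanimoto gives 0.5 while Simple Match gives 0.75, because Simple Match also credits the four shared zero bits; this sensitivity to the number of unset bits is one way bit density enters the comparison.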
2.
3.
4.
Trepalin SV, Gerasimenko VA, Kozyukov AV, Savchuk NP, Ivaschenko AA. Journal of chemical information and computer sciences, 2002, 42(2): 249-258
Some modifications were introduced into the previously described Centroid diversity sorting algorithm, which uses the cosine similarity metric. The modified algorithm is suitable for working with large databases on personal computers: for example, diversity sorting of a database of more than one million records requires less than 9 h (Pentium III, 800 MHz). The problem of selecting new compounds for addition to an existing collection so as to maximize the diversity of the collection is also examined, and a new algorithm for the selection of heterocyclic compounds is described.
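The abstract does not spell out the Centroid algorithm, so the following is only a hypothetical illustration of a centroid-driven diversity sort with cosine similarity: it starts from the compound least similar to the centroid of the whole set, then repeatedly appends the remaining compound least similar to the centroid of the compounds already chosen. The descriptor vectors and all function names are assumptions.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def diversity_sort(vectors):
    """Hypothetical greedy ordering: each step picks the compound least
    similar (cosine) to the centroid of those already selected."""
    remaining = list(range(len(vectors)))
    ref = centroid(vectors)  # first pick: least similar to the whole-set centroid
    order = []
    while remaining:
        best = min(remaining, key=lambda i: cosine(vectors[i], ref))
        order.append(best)
        remaining.remove(best)
        ref = centroid([vectors[i] for i in order])
    return order
```

For large collections the centroid would more sensibly be maintained as a running sum rather than recomputed from the selected set at every step.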
5.
6.
The computer program described uses peak intensity data for searching the ASTM infrared file. Peaks in unknown spectra are classified into five groups according to their relative intensities, and match scores against the ASTM data are calculated from these intensity data. A test search with 135 compounds demonstrated the excellent performance of the proposed method. The advantages of the system are that correct answers are found easily and that no particular care is needed in selecting the peaks to be entered.
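The abstract does not give the scoring formula or the intensity boundaries. The following sketch assumes five equal-width relative-intensity classes (5 = strongest), a ±10 cm⁻¹ position tolerance, and a per-peak score of 5 minus the intensity-class difference; spectra are dictionaries mapping wavenumber to intensity. All of these choices, and the function names, are hypothetical.

```python
def intensity_class(rel):
    """Map a relative intensity (0..1) to one of five groups, 5 = strongest (assumed bounds)."""
    for cls, bound in ((5, 0.8), (4, 0.6), (3, 0.4), (2, 0.2)):
        if rel >= bound:
            return cls
    return 1

def classify(peaks):
    top = max(peaks.values())
    return {wn: intensity_class(i / top) for wn, i in peaks.items()}

def match_score(unknown, reference, tol=10):
    """Sum over unknown peaks that have a reference peak within tol cm^-1:
    a positional match earns 5 points minus the intensity-class disagreement."""
    u, r = classify(unknown), classify(reference)
    score = 0
    for wn, cls in u.items():
        hits = [rc for rwn, rc in r.items() if abs(rwn - wn) <= tol]
        if hits:
            score += max(5 - abs(cls - rc) for rc in hits)
    return score
```

Ranking every library entry by match_score against the unknown would then give the candidate list.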
7.
8.
9.
It is practically impossible to synthesize and test all compounds of a large exhaustive chemical library in a short period of time. We discuss rational approaches to selecting representative subsets of virtual libraries that help direct experimental synthetic efforts for both targeted and diverse library design. For targeted library design, we consider principles based on similarity to lead molecules. For diverse library design, we discuss algorithms aimed at selecting both diverse and representative subsets of the entire chemical library space. The methodologies are illustrated with several practical examples.
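As an illustration of the two selection goals described above, the sketch below ranks a library by maximum Tanimoto similarity to a set of leads (targeted design) and uses MaxMin selection to build a diverse subset. MaxMin is a widely used diversity-selection algorithm chosen here only as an example; the abstract does not say which algorithms the authors use. Fingerprints are assumed to be Python sets of on-bit indices, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def targeted_subset(library, leads, n):
    """Indices of the n compounds most similar (max Tanimoto) to any lead."""
    ranked = sorted(range(len(library)),
                    key=lambda i: max(tanimoto(library[i], q) for q in leads),
                    reverse=True)
    return ranked[:n]

def maxmin_subset(library, n, seed=0):
    """MaxMin diversity selection: repeatedly add the compound whose nearest
    already-selected neighbour is farthest away (distance = 1 - Tanimoto)."""
    selected = [seed]
    while len(selected) < min(n, len(library)):
        candidates = [i for i in range(len(library)) if i not in selected]
        best = max(candidates,
                   key=lambda i: min(1 - tanimoto(library[i], library[j])
                                     for j in selected))
        selected.append(best)
    return selected
```

MaxMin tends to pick outlying compounds first; selecting representative rather than merely diverse subsets, as the abstract also mentions, usually calls for cluster- or cell-based methods instead.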
10.
Two succinct linear notation systems for encoding the structure of polybenzenoid aromatic hydrocarbons are presented and illustrated with examples. Both notation systems use a labeled dual inner graph to represent the hydrocarbon. A molecular similarity index, ranging from unity (identical molecules) to zero (completely different molecules), is defined by comparing the linear notations of a pair of compounds. The similarity index is then applied to a correlation of the carcinogenic properties of the benzenoid hydrocarbons.
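The abstract does not state how the two linear notations are compared. Purely as an assumption, the sketch below scores a pair of notation strings as one minus their Levenshtein edit distance divided by the longer length, which gives 1 for identical notations and falls to 0 for notations with nothing in common. The function names are hypothetical, and the strings in the usage line are placeholders, not real dual-inner-graph notations.

```python
def edit_distance(s, t):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def notation_similarity(s, t):
    """Hypothetical index in [0, 1]: 1 for identical notations."""
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))
```

For example, notation_similarity("abcd", "abce") returns 0.75.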
11.
A large-scale similarity search investigation was carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis used two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum for this type of fingerprint: MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied, and from these search calculations a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful: in 203 of 266 test cases (~76%), a compound recovery rate of at least 50% was observed with at least the better-performing fingerprint and one search strategy. The similarity search profile also revealed several general trends; for example, fingerprint searching was often characterized by early enrichment of active compounds in database selection sets. In addition, compound activity classes were categorized according to different levels of similarity search performance, which helps to put the results of benchmark calculations into perspective, and a compendium of activity classes falling into the different performance categories is provided. On the basis of this large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
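The abstract does not name the three nearest neighbor strategies. One common family scores each database compound by fusing its similarities to the reference actives, for example by taking the maximum (a 1-NN search) or the mean. The sketch below assumes that scheme with set-based Tanimoto similarity and computes the compound recovery rate as the fraction of active database compounds found in the top n of the ranking. All names are hypothetical.

```python
from statistics import mean

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_database(references, database, fusion=max):
    """Rank database compounds by fused similarity to the reference set.
    fusion=max gives a 1-NN search; fusion=mean averages over all references."""
    scores = {name: fusion([tanimoto(fp, ref) for ref in references])
              for name, fp in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

def recovery_rate(ranking, actives, n):
    """Fraction of the active database compounds found in the top-n selection set."""
    return len(set(ranking[:n]) & actives) / len(actives)
```

Computing recovery_rate for increasing n traces how quickly actives accumulate in the selection set, which is one way to look at the early enrichment mentioned above.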
12.
13.
Electronic nose sensor signals provide a digital fingerprint of the product under analysis, which can subsequently be investigated by chemometric methods. In this paper, the fingerprint characterisation of electronic nose data has been studied with a novel chemometric approach based on the partial ordering technique and the Hasse matrix. A Hasse matrix can be associated with each data sequence, and the similarity between two sequences can be evaluated by defining a distance between the corresponding Hasse matrices. Since all signals acquired over time are intrinsically ordered, electronic nose data can also be treated as sequential data and characterized by the proposed approach. The similarity/diversity measure was applied here to characterize the class discrimination capability of each electronic nose sensor: extra virgin olive oil samples of different geographical origins were considered, and Hasse distances were used to select the sensors best able to discriminate the olive oil origins. The Hasse matrix distance showed some useful properties and proved able to link each electronic nose time profile to a meaningful mathematical object (the Hasse matrix), which can then be studied by multivariate analysis.
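The precise construction of the Hasse matrix is not given in the abstract. As an assumption, the sketch below builds, for a time profile x, the antisymmetric matrix with entries h_ij = sign(x_j − x_i), which records only the ordering relations between time points, and compares two equal-length profiles with an L1 distance between their matrices scaled to [0, 1]. The function names are hypothetical.

```python
def hasse_matrix(x):
    """Assumed Hasse-style matrix: h[i][j] = +1 if x[j] > x[i], -1 if x[j] < x[i], else 0."""
    return [[(xj > xi) - (xj < xi) for xj in x] for xi in x]

def hasse_distance(x, y):
    """Normalized L1 distance between the Hasse matrices of two equal-length sequences.
    0 means identical ordering patterns; 1 means every strict ordering is reversed."""
    hx, hy = hasse_matrix(x), hasse_matrix(y)
    n = len(x)
    total = sum(abs(a - b) for rx, ry in zip(hx, hy) for a, b in zip(rx, ry))
    return total / (2 * n * (n - 1)) if n > 1 else 0.0
```

Because this matrix depends only on orderings, the distance is unchanged by any monotonic rescaling of a sensor signal; for instance, [1, 2, 3] and [10, 20, 30] are at distance 0.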
14.
Organometallic compounds of general formula (SCN)₂M(NCSeHgR)₂ (M = Co(II), Ni(II); R = n-C₅H₁₁, i-C₅H₁₁) have been prepared. They behave as Lewis acids, forming complexes with pyridine and 2,2′-bipyridyl, which were characterized by elemental analysis, molecular weight, molar conductance, IR spectral (4000–200 cm⁻¹), electronic spectral and magnetic susceptibility measurements. The Lewis acids are monomeric, with bridging thiocyanate or selenocyanate between M²⁺ and Hg²⁺. Cobalt and nickel acquire tetrahedral and octahedral configurations, respectively, through axial bridging, whereas mercury retains its linear geometry. Pyridine binds to the metal in the Lewis acid to form L₂(SCN)₂M(NCSeHgR)₂ complexes, whereas bipyridyl ruptures the NCX bridge and forms cationic-anionic [M(bipy)₃][(NCS)(NCSe)HgR]₂ complexes.
15.
Similarity measures for molecules are of basic importance in chemical, biological, and pharmaceutical applications. We introduce a molecular similarity measure defined directly on the annotated molecular graph, based on iterative graph similarity and optimal assignments. We give an iterative algorithm for computing the proposed measure, prove its convergence and the uniqueness of the solution, and provide an upper bound on the number of iterations required to achieve a desired precision. Empirical evidence for the positive semidefiniteness of certain parametrizations of our function is presented. We evaluated the measure by using it as a kernel in support vector machine classification and regression on several pharmaceutical and toxicological data sets, with encouraging results.
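The following is a heavily simplified sketch in the spirit of iterative similarity with optimal assignments, not the authors' exact formulation. Atom-pair similarities are updated as x_ij ← (1 − α)·[labels match] + α·(best assignment of neighbouring atoms)/max(deg i, deg j) until the largest change falls below a tolerance, and a final optimal assignment of atoms gives the molecular score. Bond labels are ignored and the assignment is brute force over permutations, so it only suits toy graphs (practical code would use the Hungarian algorithm). All names and the parameter α are assumptions. With α < 1 each update is a contraction, which is consistent with the convergence property stated above.

```python
from itertools import permutations

def best_assignment(score, rows, cols):
    """Maximum total score over injective matchings of the shorter list into
    the longer one (brute force over permutations; tiny graphs only)."""
    if not rows or not cols:
        return 0.0
    if len(rows) <= len(cols):
        return max(sum(score(r, c) for r, c in zip(rows, p))
                   for p in permutations(cols, len(rows)))
    return max(sum(score(r, c) for r, c in zip(p, cols))
               for p in permutations(rows, len(cols)))

def atom_similarities(labels_a, adj_a, labels_b, adj_b, alpha=0.5, tol=1e-9, max_iter=200):
    """Iterate x_ij = (1 - alpha) * [labels match] + alpha * neighbour-assignment term."""
    na, nb = len(labels_a), len(labels_b)
    base = [[1.0 if labels_a[i] == labels_b[j] else 0.0 for j in range(nb)]
            for i in range(na)]
    x = [row[:] for row in base]
    for _ in range(max_iter):
        new = [[0.0] * nb for _ in range(na)]
        for i in range(na):
            for j in range(nb):
                denom = max(len(adj_a[i]), len(adj_b[j]))
                nbr = (best_assignment(lambda u, v: x[u][v], adj_a[i], adj_b[j]) / denom
                       if denom else 0.0)
                new[i][j] = (1 - alpha) * base[i][j] + alpha * nbr
        change = max(abs(new[i][j] - x[i][j]) for i in range(na) for j in range(nb))
        x = new
        if change < tol:
            break
    return x

def molecule_similarity(labels_a, adj_a, labels_b, adj_b, alpha=0.5):
    """Optimal assignment of atoms, normalized by the larger molecule's size."""
    x = atom_similarities(labels_a, adj_a, labels_b, adj_b, alpha)
    na, nb = len(labels_a), len(labels_b)
    total = best_assignment(lambda i, j: x[i][j], list(range(na)), list(range(nb)))
    return total / max(na, nb)
```

Molecules are given as atom-label lists plus adjacency lists, e.g. the heavy atoms of ethanol as (["C", "C", "O"], [[1], [0, 2], [1]]).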
16.
Textural analysis of the thermotropic liquid crystal (LC) 4-heptyloxybenzylidene-4′-nonyloxyaniline (7O.O9) is performed using a polarising microscope fitted with a hot stage and a high-resolution camera. Natural images are highly structured: their pixels exhibit strong dependencies and carry important information about the structure of the objects. In this article, we compute the structural similarity index measure (SSIM) as a function of temperature. The SSIM changes abruptly with temperature, revealing the different liquid crystalline phases. This statistical image analysis was compared with differential scanning calorimetry (DSC) data, and good agreement was found. The proposed methodology is a very sensitive and reliable technique for identifying LC phases.
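The abstract does not say which image pairs were compared or which SSIM variant was used. The sketch below assumes grayscale frames stored as nested lists and evaluates the standard SSIM formula over the whole image as a single window (the usual SSIM averages the same quantity over local Gaussian windows). It then tracks SSIM between consecutive frames, so that a phase transition would appear as a dip in the profile. Function names and data_range=255 (8-bit images) are assumptions.

```python
from statistics import fmean

def global_ssim(img_a, img_b, data_range=255.0):
    """Single-window SSIM between two equal-size grayscale images (nested lists)."""
    a = [p for row in img_a for p in row]
    b = [p for row in img_b for p in row]
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants, K1 = 0.01
    c2 = (0.03 * data_range) ** 2  # and K2 = 0.03
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean([(p - mu_a) ** 2 for p in a])
    var_b = fmean([(q - mu_b) ** 2 for q in b])
    cov = fmean([(p - mu_a) * (q - mu_b) for p, q in zip(a, b)])
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_profile(frames):
    """SSIM between consecutive frames, e.g. textures recorded at increasing temperatures."""
    return [global_ssim(f0, f1) for f0, f1 in zip(frames, frames[1:])]
```

Identical frames give SSIM = 1, and the value drops when the texture changes between successive temperatures.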
17.
Richard S. Hutte, Neil G. Johansen, Marianne F. Legier. Journal of separation science, 1990, 13(6): 421-426
The analysis of sulfur-containing compounds using fused silica capillary columns and the Sulfur Chemiluminescence Detector has been investigated. This combination of an inert chromatographic system and a highly sensitive, selective detector provides significant advantages over existing techniques for the analysis of low levels of sulfur compounds in complex matrices. Capillary columns coated with thick films (1–4 μm) of methyl silicone stationary phase permit the separation of most sulfur-containing compounds and, when used at sub-ambient column temperatures, can also separate sulfur gases. The effects of stationary-phase film thickness, column length, and internal diameter on the measurement of sulfur compounds in hydrocarbon matrices have been determined.
18.
Yao YH, Dai Q, Nan XY, He PA, Nie ZM, Zhou SP, Zhang YZ. Journal of computational chemistry, 2008, 29(10): 1632-1639
On the basis of a class of 2D graphical representations of DNA sequences, a sensitivity analysis has been performed, showing that the proposed representations are highly capable of capturing small modifications of DNA sequences. The sensitivity analysis also indicates that the absolute differences between the leading eigenvalues of the L/L matrices associated with DNA sequences increase with the number of base mutations. In addition, we conclude that the similarity analysis method based on correlation angles eliminates the effects of DNA sequence length better than the method using Euclidean distances. As an application, the similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of different species were examined with our method, and the reasonable results verify its validity.
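The abstract does not define the descriptor vectors or the exact form of the correlation angle. Assuming the angle is taken between two descriptor vectors, the sketch below shows why it is less sensitive to sequence length than a Euclidean distance: if a descriptor grows in proportion to length, the angle is unchanged while the distance grows. The count-based descriptor is a stand-in chosen for brevity, not the paper's 2D graphical representation or its L/L matrix eigenvalues; all function names are hypothetical.

```python
from math import acos, sqrt

def counts(seq):
    """Hypothetical descriptor: nucleotide counts in the order A, C, G, T."""
    return [seq.count(base) for base in "ACGT"]

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle(u, v):
    """Angle (radians) between two descriptor vectors; invariant to scaling either one."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return acos(cos)
```

For example, the descriptors of ATGC and ATGCATGC are at angle 0 but at Euclidean distance 2.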
19.
Single-cell RNA sequencing technologies have revolutionized biomedical research by providing an effective means to profile gene expression in individual cells. One of the first fundamental steps in the in-depth analysis of single-cell sequencing data is cell type classification and identification. Computational methods such as clustering algorithms have been used and are gaining popularity because they can save considerable resources and time otherwise spent on experimental validation. Although selecting the optimal features (i.e., genes) is essential for obtaining accurate and reliable single-cell clustering results, the computational complexity and dropout events that can introduce zero-inflated noise make this process very challenging. In this paper, we propose an effective single-cell clustering algorithm based on ensemble feature selection and similarity measurements. We first identify a set of potential features, then measure cell-to-cell similarity on subsets of these potential features drawn by multiple feature sampling approaches. We construct an ensemble network based on the cell-to-cell similarity and finally apply a network-based clustering algorithm to obtain single-cell clusters. We evaluate the performance of the proposed algorithm through multiple assessments on real-world single-cell RNA sequencing datasets with known cell types. The results show that the proposed algorithm identifies accurate and consistent single-cell clusters. Moreover, because the algorithm takes relative expression as input, it can easily be adopted by existing analysis pipelines. The source code is publicly available at https://github.com/jeonglab/scCLUE.
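The abstract outlines four steps: candidate feature selection, cell-to-cell similarity on sampled feature subsets, an ensemble network, and network-based clustering. The sketch below is not the scCLUE implementation (the repository linked above holds that); it assumes variance-based candidate selection, Pearson correlation averaged over random gene subsets, and connected components of a thresholded similarity graph as a stand-in for the network clustering step. Parameter names and defaults are hypothetical.

```python
import random
from statistics import mean, pvariance

def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant profile
    return sxy / (sxx * syy) ** 0.5

def ensemble_similarity(expr, n_candidates, subset_size, n_samples=20, seed=0):
    """expr: cells x genes. Keep the n_candidates most variable genes, then average
    cell-to-cell Pearson correlation over random gene subsets of size subset_size."""
    rng = random.Random(seed)
    n_cells, n_genes = len(expr), len(expr[0])
    variances = [pvariance([row[g] for row in expr]) for g in range(n_genes)]
    candidates = sorted(range(n_genes), key=lambda g: variances[g],
                        reverse=True)[:n_candidates]
    sim = [[0.0] * n_cells for _ in range(n_cells)]
    for _ in range(n_samples):
        genes = rng.sample(candidates, subset_size)
        sub = [[row[g] for g in genes] for row in expr]
        for i in range(n_cells):
            for j in range(n_cells):
                sim[i][j] += pearson(sub[i], sub[j]) / n_samples
    return sim

def network_clusters(sim, threshold=0.5):
    """Connected components of the graph linking cells with similarity > threshold."""
    n = len(sim)
    labels = [-1] * n
    next_id = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = next_id
                    stack.append(j)
        next_id += 1
    return labels
```

On a toy matrix in which three cells follow one expression pattern and three follow the reversed pattern, within-group similarities are close to 1 and between-group similarities close to −1, so the two groups come out as separate components.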
20.
Methods that can screen large databases to retrieve a structurally diverse set of compounds with desirable bioactivity properties are critical in the drug discovery and development process. This paper presents a set of such methods designed to find compounds that are structurally different from a given query compound while retaining its bioactivity properties (scaffold hops). These methods use various indirect ways of measuring the similarity between the query and a compound that take into account information beyond their structure-based similarity. The techniques presented capture these indirect similarities by analyzing the similarity network formed by the query and the database compounds. Experimental evaluation shows that most of these methods substantially outperform previously developed approaches, both in their ability to identify structurally diverse active compounds and in identifying active compounds in general.
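The abstract does not detail the network-based measures. One simple way to capture an indirect similarity, assumed here purely for illustration, is to let a compound inherit similarity through the query's nearest neighbours in the similarity network: a two-step path query → n → c scores sim(query, n) × sim(n, c), and the compound keeps the better of that and its direct similarity. Set-based Tanimoto similarity, the path product, the parameter k, and the function names are all assumptions.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def indirect_scores(query, database, k=1):
    """Score = max(direct similarity, best two-step path through one of the
    query's k nearest neighbours), where a path scores sim(q, n) * sim(n, c)."""
    direct = {name: tanimoto(query, fp) for name, fp in database.items()}
    neighbours = sorted(direct, key=direct.get, reverse=True)[:k]
    scores = {}
    for name, fp in database.items():
        paths = [direct[n] * tanimoto(database[n], fp)
                 for n in neighbours if n != name]
        scores[name] = max([direct[name]] + paths)
    return scores
```

A compound sharing little with the query but much with the query's nearest neighbour can thereby rank higher than its direct similarity alone would allow.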