Similar literature
 20 similar articles found (search time: 649 ms)
1.
The analysis of structure–activity relationships (SARs) becomes rather challenging when large and heterogeneous compound data sets are studied. In such cases, many different compounds and their activities need to be compared, which quickly goes beyond the capacity of subjective assessments. For a comprehensive large-scale exploration of SARs, computational analysis and visualization methods are required. Herein, we introduce a two-layered SAR visualization scheme specifically designed for increasingly large compound data sets. The approach combines a new compound pair-based variant of generative topographic mapping (GTM), a machine learning approach for nonlinear mapping, with chemical space networks (CSNs). The GTM component provides a global view of the activity landscapes of large compound data sets, in which informative local SAR environments are identified, augmented by a numerical SAR scoring scheme. Prioritized local SAR regions are then projected into CSNs that resolve these regions at the level of individual compounds and their relationships. Analysis of CSNs makes it possible to distinguish between regions having different SAR characteristics and select compound subsets that are rich in SAR information.

2.
The extraction of SAR information from structurally diverse compound data sets is a challenging task. One of the focal points of systematic SAR analysis is the search for activity cliffs, that is, structurally similar compounds having large potency differences, from which SAR determinants can be deduced. The assessment of SAR information is usually based on pairwise similarity and potency comparisons of data set compounds. As a consequence, activity cliffs are mostly evaluated at a compound pair level. Here, we present an extension of the activity cliff concept by introducing "activity ridges" that are formed by overlapping "combinatorial" activity cliffs between participating compounds, giving rise to ridge-like structures in activity landscapes. Activity ridges are rich in SAR information. In a systematic analysis of 242 compound data sets, we have identified well-defined activity ridges in 71 different sets. In addition, an information-theoretic approach has been devised to characterize the structural composition of activity ridges. Taken together, our results show that activity ridges frequently occur in sets of active compounds and that different categories of ridges can be distinguished on the basis of their structural content. The computational identification of activity ridges provides access to compound subsets having high priority for SAR analysis.
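
The pairwise step described in this abstract (similar structures, large potency difference) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature-set representation, the Tanimoto cutoff of 0.55, and the potency gap of 2 log units are all assumptions chosen for illustration, and the ridge construction itself is not attempted.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two feature sets (defined as 1.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def activity_cliffs(compounds, sim_cutoff=0.55, potency_gap=2.0):
    """Return compound pairs that are structurally similar yet differ strongly in potency.

    compounds: dict mapping a name to (feature set, potency on a log scale).
    Both thresholds are illustrative assumptions, not values from the abstract.
    """
    cliffs = []
    for (n1, (f1, p1)), (n2, (f2, p2)) in combinations(sorted(compounds.items()), 2):
        if tanimoto(f1, f2) >= sim_cutoff and abs(p1 - p2) >= potency_gap:
            cliffs.append((n1, n2))
    return cliffs


# Toy data: integers stand in for fingerprint bits (hypothetical compounds).
toy = {
    "A": (frozenset({1, 2, 3, 4}), 9.0),
    "B": (frozenset({1, 2, 3, 5}), 6.5),
    "C": (frozenset({1, 2, 3, 4, 6}), 8.8),
    "D": (frozenset({7, 8, 9}), 5.0),
}
print(activity_cliffs(toy))
```

In the toy set only A and B qualify: they share three of five features and differ by 2.5 log units, whereas A and C are similar but nearly equipotent.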

3.
In pharmaceutical research, collections of active compounds directed against specific therapeutic targets usually evolve over time. Small molecule discovery is an iterative process. New compounds are discovered, alternative compound series explored, some series discontinued, and others prioritized. The design of new compounds usually takes into consideration prior chemical and structure-activity relationship (SAR) knowledge. Hence, historically grown compound collections represent a viable source of chemical and SAR information that might be utilized to retrospectively analyze roadblocks in compound optimization and further guide discovery projects. However, SAR analysis of large and heterogeneous sets of active compounds is inherently complicated. We have subjected evolving compound data sets to SAR monitoring using activity landscape models in order to evaluate how composition and SAR characteristics might change over time. Chemotype and potency distributions in evolving data sets directed against different therapeutic targets were analyzed and alternative activity landscape representations generated at different points in time to monitor the progression of global and local SAR features. Our results show that the evolving data sets studied here have predominantly grown around seed clusters of active compounds that often emerged early on, while other SAR islands remained largely unexplored. Moreover, increasing scaffold diversity in evolving data sets did not necessarily yield new SAR patterns, indicating a rather significant influence of "me-too-ism" (i.e., introducing new chemotypes that are similar to already known ones) on the composition and SAR information content of the data sets.

4.
As the structural diversity in a quantitative structure-activity relationship (QSAR) model increases, constructing a good model becomes increasingly difficult, and simply performing variable selection might not be sufficient to improve the model quality enough to make it practically usable. To combat this difficulty, an approach based on piecewise hypersphere modeling by particle swarm optimization (PHMPSO) is developed in this paper. It treats the linear models describing the sought-for subsets as hyperspheres which have different radii in the data space. According to the attribute of each hypersphere, all compounds in the training set are allocated to hyperspheres to construct submodels, and particle swarm optimization (PSO) is applied to search for the optimal hyperspheres in order to find satisfactory piecewise linear models. A new objective function is formulated to determine the appropriate piecewise models. The performance is assessed using three QSAR data sets. Experimental results have shown the good performance of this technique in improving QSAR modeling.

5.
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples. This can lead either to overfitting and the curse of dimensionality or even to a complete failure in the analysis of microarray data. Gene selection is an important component of gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swarm optimization (PSO) and tabu search (HPSOTS) approach to gene selection for tumor classification. The incorporation of tabu search (TS) as a local improvement procedure enables the HPSOTS algorithm to escape local optima and show satisfactory performance. The proposed approach is applied to three different microarray data sets. Moreover, we compare the performance of HPSOTS on these data sets to that of stepwise selection and of the pure TS and PSO algorithms. It has been demonstrated that HPSOTS is a useful tool for gene selection and for mining high-dimensional data.

6.
Publicly available compound activity data have been analyzed to distinguish between compounds for which single or multiple potency measurements were available and gain insight into data confidence levels. Different potency measurements with defined end points and alternative ways to represent multiple potency values for active compounds have been evaluated in the context of SAR analysis. Approximately 78% of all compounds with multiple potency measurements were found to represent high-confidence data, which corresponded to ~10% of all activity data. The use of different types of potency measurements and alternative representations of multiple potency values changed the SAR information content of compound data sets and resulted in different activity cliff distributions. Thus, the types of activity measurements that were available and how they were used substantially impacted SAR analysis. Compounds with multiple Ki measurements provided the most reliable basis for SAR exploration.

7.
It is well appreciated that the results of ligand-based virtual screening (LBVS) are strongly influenced by methodological details, given the generally strong compound class dependence of LBVS methods. It is less well understood to what extent structure-activity relationship (SAR) characteristics might influence the outcome of LBVS. We have assessed the hypothesis that the success of prospective LBVS depends on the SAR tolerance of screening targets, in addition to methodological aspects. In this context, SAR tolerance is rationalized as the ability of a target protein to specifically interact with series of structurally diverse active compounds. In compound data sets, SAR tolerance manifests itself as SAR continuity, i.e., the presence of structurally diverse compounds having similar potency. In order to analyze the role of SAR tolerance for LBVS, activity landscape representations were generated for compounds active against 16 different target proteins for which successful LBVS applications were reported. In all instances, the activity landscapes of known active compounds contained multiple regions of local SAR continuity. When analyzing the location of newly identified LBVS hits and their SAR environments, we found that these hits almost exclusively mapped to regions of distinct local SAR continuity. Taken together, these findings indicate the presence of a close link between SAR tolerance at the target level, SAR continuity at the ligand level, and the probability of LBVS success.

8.
The scaffold concept is widely applied in chemoinformatics and medicinal chemistry to organize bioactive compounds according to common core structures or associate compound classes with specific biological activities. A variety of scaffold analyses have been carried out to derive statistics for scaffold distributions, generate structural organization schemes, or identify scaffolds that preferentially occur in given compound activity classes. Herein we further extend scaffold analysis by identifying scaffolds that display defined SAR profiles consisting of multiple properties. A structural relationship-based scaffold network has been designed as the basic data structure underlying our analysis. From network representations of scaffolds extracted from compounds active against 32 different target families, scaffolds with different SAR profiles have been extracted on the basis of decision trees that capture structural and functional characteristics of scaffolds in different ways. More than 600 scaffolds and 100 scaffold clusters were assigned to 10 SAR profiles. These scaffold sets represent different activity and target selectivity profiles and are provided for further SAR investigations including, for example, the exploration of alternative analog series for a given target or target family or the design of novel compounds on the basis of scaffold(s) with desired SAR profiles.

9.
10.

An activity cliff (AC) is formed by a pair of structurally similar compounds with a large difference in potency. Accordingly, ACs reveal structure–activity relationship (SAR) discontinuity and provide SAR information for compound optimization. Herein, we have investigated whether ACs could be predicted from image data. To this end, pairs of structural analogs that formed or did not form ACs were extracted from different compound activity classes. From these compound pairs, consistently formatted images were generated. Image sets were used to train and test convolutional neural network (CNN) models to systematically distinguish between ACs and non-ACs. The CNN models were found to predict ACs with overall high accuracy, as assessed using alternative performance measures, hence establishing proof-of-principle. Moreover, gradient weights from convolutional layers were mapped to test compounds and identified characteristic structural features that contributed to successful predictions. Weight-based feature visualization revealed the ability of CNN models to learn chemistry from images at a high level of resolution and aided in the interpretation of model decisions with intrinsic black box character.


11.
The transformation of high-dimensional bioactivity spaces into activity landscape representations is as yet an unsolved problem in computational medicinal chemistry. High-dimensional activity spaces result from the experimental evaluation of compound sets on large numbers of targets. We introduce a first concept to represent and navigate high-dimensional activity landscapes that is based on a data structure termed ligand-target differentiation (LTD) map. This approach is designed to reduce the complexity of high-dimensional bioactivity spaces and enable the identification and further analysis of compound subsets with interesting activity and structural relationships. Its utility has been demonstrated using a set of more than 1400 inhibitors with exact activity measurements for varying numbers of 172 kinases.

12.
Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution to the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray data sets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.

13.
14.
15.
In this paper, we propose a wavelength selection method based on random decision particle swarm optimization with attractor for the quantitative analysis of near‐infrared (NIR) spectra. The proposed method is combined with partial least squares (PLS) to construct a prediction model. The proposed method chooses either the particle's current personal best or the current global best to calculate the attractor. The particle then updates its flight velocity using the attractor, and the particle state is updated by a random decision based on the new velocity. Moreover, the root‐mean‐square error of cross‐validation is adopted as the fitness function for the proposed method. In order to demonstrate the usefulness of the proposed method, PLS with all wavelengths, uninformative variable elimination by PLS, elastic net, genetic algorithm combined with PLS, the discrete particle swarm optimization combined with PLS, the modified particle swarm optimization combined with PLS, the neighboring particle swarm optimization combined with PLS, and the proposed method are used to build quantitative component analysis models for NIR spectral data sets, and the effectiveness of these models is compared. Two application studies are presented, which involve NIR data obtained from an experiment on meat content determination using NIR and from a combustion procedure. Results verify that the proposed method has higher predictive ability for NIR spectral data and selects fewer wavelengths. The proposed method converges faster and can overcome the premature convergence problem. Furthermore, although improving the prediction precision may sacrifice model complexity to a certain extent, the proposed method overfits only slightly. Copyright © 2015 John Wiley & Sons, Ltd.

16.
《Analytical letters》2012,45(18):2849-2859
ABSTRACT

A novel method was developed for the quality control of Ephedrae herba by near-infrared (NIR) spectroscopy. First, qualitative models established by discriminant analysis and support vector machine were used for the preliminary screening of unqualified samples of E. herba. Then quantitative models of ephedrine and the total alkali (ephedrine and pseudoephedrine) were established by partial least squares regression and by particle swarm optimization based least squares support vector machine. The contents of test samples were predicted by the established NIR quantitative models. As a result, the accuracies of unqualified identification were 98.9% by discriminant analysis and 100% by support vector machine. The performance of the particle swarm optimization based least squares support vector machine models was better than that of the partial least squares regression models. The correlation coefficients were both more than 0.98 and the relative standard errors of calibration were less than 9% in the calibration sets of the particle swarm optimization based least squares support vector machine models. As for the test sets, the correlation coefficients were both more than 0.93 and the relative standard errors of prediction were less than 13%, indicating satisfactory predicted results. All of these results demonstrated that NIR spectroscopy may be a powerful tool for the quality control of E. herba.

17.
A multivariate curve resolution-particle swarm optimization (MCR-PSO) algorithm is proposed to extract pure chromatographic and spectroscopic information from multi-component hyphenated chromatographic signals. This new MCR method is based on the rotation of mathematically unique PCA solutions into chemically meaningful MCR solutions. To obtain a proper rotation matrix, an objective function based on non-fulfillment of constraints is defined and optimized using the particle swarm optimization (PSO) algorithm. Initial values of the rotation matrix are calculated using local rank analysis and the heuristic evolving latent projection (HELP) method. The ability of MCR-PSO to resolve chromatographic data is evaluated using simulated gas chromatography–mass spectrometry (GC–MS) and high-performance liquid chromatography–diode array detection (HPLC–DAD) data. To present a comprehensive study, different numbers of components and various levels of noise under proper constraints of non-negativity, unimodality and spectral normalization are considered. Calculation of the extent of rotational ambiguity in MCR solutions for different chromatographic systems using the MCR-BANDS method showed that MCR-PSO solutions, like the true solutions, always lie in the range of feasible solutions. In addition, the performance of MCR-PSO is compared with that of other popular MCR methods, namely multivariate curve resolution-objective function minimization (MCR-FMIN) and multivariate curve resolution-alternating least squares (MCR-ALS). The results showed that MCR-PSO solutions are similar to or, in some cases, better than those of the other MCR methods in terms of statistical parameters. Finally, MCR-PSO is successfully applied to the resolution of real GC–MS data. It should be pointed out that, in addition to the multivariate resolution of hyphenated chromatographic signals, the MCR-PSO algorithm can be straightforwardly applied to other types of separation, spectroscopic and electrochemical data.

18.
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis–Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
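
The grouping idea behind matched molecular pairs and single-site analogue series can be sketched as follows. This is a simplified illustration, not any of the reviewed algorithms: it assumes a fragmentation step (not shown) has already produced (compound, core, substituent) records, and the core and substituent labels below are hypothetical placeholders rather than real structures.

```python
from collections import defaultdict
from itertools import combinations


def analogue_series(fragmentations):
    """Group compounds by shared core.

    fragmentations: iterable of (compound_id, core, substituent) tuples, as an
    assumed upstream fragmentation step might emit them.
    Returns core -> sorted list of (compound_id, substituent), keeping only
    cores shared by at least two distinct compounds.
    """
    by_core = defaultdict(set)
    for cid, core, sub in fragmentations:
        by_core[core].add((cid, sub))
    return {
        core: sorted(members)
        for core, members in by_core.items()
        if len({cid for cid, _ in members}) >= 2
    }


def matched_pairs(series):
    """Enumerate compound pairs that share a core but differ in the substituent."""
    pairs = []
    for core, members in sorted(series.items()):
        for (c1, s1), (c2, s2) in combinations(members, 2):
            if c1 != c2 and s1 != s2:
                pairs.append((c1, c2, core))
    return pairs


# Hypothetical records: labels are placeholders, not validated structures.
records = [
    ("m1", "core_A", "Cl"),
    ("m2", "core_A", "F"),
    ("m3", "core_A", "OMe"),
    ("m4", "core_B", "Cl"),
]
series = analogue_series(records)
print(series)
print(matched_pairs(series))
```

Indexing by core keeps the work close to linear in the number of fragmentation records, which is the property that lets this family of methods scale to very large libraries; the pair enumeration inside one series is still quadratic in the series size.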

19.
The search for a global minimum related to molecular electronic structure and chemical bonding has received wide attention based on theoretical calculations at various levels of theory. The particle swarm optimization (PSO) algorithm and modified PSO have been used to predict the energetically stable/metastable states associated with a given chemical composition. Among a variety of techniques such as genetic algorithm, basin hopping, simulated annealing, PSO, and so on, PSO is considered to be one of the most suitable methods due to its various advantages over the others. We use a swarm‐intelligence based parallel code to improve a PSO algorithm in a multidimensional search space, augmented by quantum chemical calculations on gas phase structures at 0 K without any symmetry constraint, to obtain an optimal solution. Our currently employed code is interfaced with Gaussian software for single point energy calculations. The code developed here is shown to be efficient. A small population size (small cluster) in the multidimensional space is sufficient to obtain better results at lower computational cost than the typical larger population. The analysis is also possible for larger systems, and a large number of particles can be used as well. We have also analyzed how arbitrary and random structures and the local minimum energy structures gravitate toward the target global minimum structure. At the same time, we compare our results with those obtained from other evolutionary techniques.
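
For orientation, the canonical PSO update that such approaches build on can be sketched in a few lines. This is a generic textbook variant on a toy objective, not the parallel code described in the abstract: the inertia weight, acceleration coefficients, swarm size, and the sphere function standing in for an energy evaluation are all assumptions.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its global minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_val)
```

In a structure-search setting, `f` would be replaced by an energy evaluation of the geometry encoded in the particle position, which is typically the dominant cost and the reason small swarms are attractive.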

20.
Multivariate spectral analysis has been widely applied in chemistry and other fields. Spectral data consisting of measurements at hundreds or even thousands of analytical channels can now be obtained in a few seconds. It is widely accepted that, before a multivariate regression model is built, well-performed variable selection can help improve the predictive ability of the model. In this paper, the concept of traditional wavelength variable selection has been extended and the idea of variable weighting is incorporated into the least-squares support vector machine (LS-SVM). A recently proposed global optimization method, the particle swarm optimization (PSO) algorithm, is used to search for the weights of variables and the hyper-parameters involved in LS-SVM, optimizing the training of a calibration set and the prediction of an independent validation set. The entire computation of this method is automatic. Two real data sets are investigated and the results are compared with those of PLS, uninformative variable elimination-PLS (UVE-PLS) and LS-SVM models to demonstrate the advantages of the proposed method.
