Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The development of robust QSAR models to predict the activity of β-secretase inhibitors is an area of interest because of the rising number of Alzheimer’s disease patients in the global population. In this paper, we present a proposal based on the use of relative distance matrices as input data to the QSAR algorithms. These matrices store measurements of distances between the structural characteristics of pairs of molecules and between the molecules and a structural pattern extracted from the whole data set, thus efficiently representing a correlation between structural changes and activity. To build the classification and regression models, support vector machine, tree complex and Gaussian process algorithms have been used; to validate the models, cross-validation, bootstrapping and y-randomizing techniques have been applied. The results obtained are close to 100% in accuracy and area under the receiver operating characteristic curve in classification, and close to 1.0 for r² and 0.1 for root mean square error in regression, in training and in external validation, proving the ‘goodness’ of the proposal.

2.
Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature‐function relationship assumption: the macroscopic features of a molecule, including solvation free energy, are functionals of microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson–Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of the obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave‐one‐out test gives an optimal root‐mean‐square error (RMSE) of 1.05 kcal/mol. FFT predictions of SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94 molecules and its associated training set, the present approach was carefully compared with a classic solvation model based on weighted solvent accessible surface area. © 2017 Wiley Periodicals, Inc.

3.
4.
5.
Datasets of molecular compounds often contain outliers, that is, compounds which are different from the rest of the dataset. Outliers, while often interesting, may affect data interpretation, model generation, and decision making, and therefore should be removed from the dataset prior to modeling efforts. Here, we describe a new method for the iterative identification and removal of outliers based on a k‐nearest neighbors optimization algorithm. We demonstrate for three different datasets that the removal of outliers using the new algorithm provides filtered datasets which are better than those provided by four alternative outlier removal procedures as well as by random compound removal in two important aspects: (1) they better maintain the diversity of the parent datasets; (2) they give rise to quantitative structure activity relationship (QSAR) models with much better prediction statistics. The new algorithm is, therefore, suitable for the pretreatment of datasets prior to QSAR modeling. © 2014 Wiley Periodicals, Inc.
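
The abstract does not give the details of the optimization algorithm, so the following is only a minimal sketch of the general idea of iterative kNN-based outlier removal. It assumes that compounds are points in a descriptor space, that the outlier score is the mean Euclidean distance to the k nearest neighbors, and that a fixed number of compounds is removed one at a time. The names `knn_score` and `remove_outliers` are hypothetical and are not taken from the paper.

```python
import math


def knn_score(points, i, k):
    """Mean Euclidean distance from points[i] to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return sum(dists[:k]) / k


def remove_outliers(points, k, n_remove):
    """Iteratively drop the point with the largest kNN score (assumed criterion).

    Scores are recomputed after every removal, so removing one outlier
    can change which point looks most isolated next.
    """
    kept = list(points)
    removed = []
    for _ in range(n_remove):
        scores = [knn_score(kept, i, k) for i in range(len(kept))]
        worst = max(range(len(kept)), key=scores.__getitem__)
        removed.append(kept.pop(worst))
    return kept, removed


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
    kept, removed = remove_outliers(data, k=2, n_remove=1)
    print("kept:", kept)
    print("removed:", removed)
```

Recomputing the scores in every round is what makes the procedure iterative; a single-pass filter would rank all compounds once and could be misled by clusters of outliers.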

6.
Although the UK cervical screening programme has reduced mortality associated with invasive disease, a cost-effective and robust high-throughput predictive methodology could greatly support the current system. We combined analysis by attenuated total reflection Fourier-transform infrared spectroscopy of cervical cytology with the self-learning classifier eClass. This predictive algorithm can cope with vast amounts of multidimensional data with variable characteristics. Using a characterised dataset [set A: consisting of UK cervical specimens designated as normal (n = 60), low-grade (n = 60) or high-grade (n = 60)] and one further dataset (set B) consisting of n = 30 low-grade samples, we set out to determine whether this approach could be robustly predictive. Variously extending the training set consisting of set A with set B data produced good classification rates with three two-class cascade classifiers. However, a single three-class classifier was equally efficient, producing a user-friendly, applicable methodology with improved interpretability (i.e., better classification with only one set of fuzzy rules). As data from set B were added incrementally to the training set, the model learned and evolved. Additionally, monitoring of results of the set B low-grade specimens (known to be low-grade cervical cytology specimens) provided the opportunity to explore the possibility of distinguishing patients likely to progress towards invasive disease. eClass exhibited a remarkably robust predictive power in a user-friendly fashion (i.e., high throughput, ease of use) compared to other classifiers (k-nearest neighbours, support vector machines, artificial neural networks). Development of eClass to classify such datasets for applications such as screening exhibits robustness in identifying a dichotomous marker of invasive disease progression.

7.
8.
A novel method (in the context of quantitative structure–activity relationship (QSAR)) based on the k nearest neighbour (kNN) principle has recently been introduced for the derivation of predictive structure–activity relationships. Its performance has been tested for estimating the estrogen binding affinity of a diverse set of 142 organic molecules. Highly predictive models have been obtained. Moreover, it has been demonstrated that consensus-type kNN QSAR models, derived from the arithmetic mean of individual QSAR models, were statistically robust and provided more accurate predictions than the great majority of the individual QSAR models. Finally, the consensus QSAR method was tested with 3D QSAR and log P data from a widely used steroid benchmark data set.
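
The consensus step described above can be sketched as follows. This is an illustration under stated assumptions, not the published method: each individual model is taken to be an unweighted kNN regressor restricted to one subset of descriptor columns, distances are squared Euclidean, and the consensus is the arithmetic mean of the individual predictions. The function names and the toy data are hypothetical.

```python
def knn_predict(train_x, train_y, query, k, features):
    """Predict activity as the mean activity of the k nearest training
    compounds, measuring distance only on the given descriptor indices."""
    dists = sorted(
        (sum((x[f] - query[f]) ** 2 for f in features), i)
        for i, x in enumerate(train_x)
    )
    nearest = [i for _, i in dists[:k]]
    return sum(train_y[i] for i in nearest) / k


def consensus_predict(train_x, train_y, query, k, feature_subsets):
    """Arithmetic mean of the predictions of the individual kNN models,
    one model per descriptor subset."""
    preds = [knn_predict(train_x, train_y, query, k, fs) for fs in feature_subsets]
    return sum(preds) / len(preds)


if __name__ == "__main__":
    # Toy descriptors (two columns) and activities; values are made up.
    train_x = [(0, 0), (1, 10), (2, 0), (10, 9)]
    train_y = [1.0, 2.0, 3.0, 4.0]
    print(consensus_predict(train_x, train_y, (0, 10), k=1,
                            feature_subsets=[(0,), (1,)]))
```

Averaging over models built on different descriptor subsets tends to damp the error of any single subset choice, which is consistent with the abstract's observation that the consensus outperformed most individual models.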

9.
10.
This article describes the use of the ICL Distributed Array Processor (DAP) for the automatic classification of chemical structure databases using the Jarvis-Patrick clustering method. This method is based upon the calculation of a table containing the nearest neighbors for each of the molecules in the database which is to be clustered. These nearest neighbors can be identified very efficiently using the DAP since it allows up to 4096 molecules to be compared with a specified molecule in parallel. Experiments with files of 4096 and 8192 structures from the Fine Chemicals Database show that clustering with the DAP is up to 6.7 times as fast as using a highly efficient, inverted file algorithm on an IBM 3083 mainframe.
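
To make the nearest-neighbor table concrete, here is a small serial sketch of Jarvis-Patrick clustering. It assumes the common formulation in which two items join the same cluster when each appears in the other's k-nearest-neighbor list and the two lists share at least `k_min` members. Toy 2D points with squared Euclidean distance stand in for molecules and a structural similarity measure; nothing here reflects the DAP implementation, and the function names are hypothetical.

```python
from itertools import combinations


def nearest_neighbors(points, k):
    """Build the nearest-neighbor table: for each point, the set of
    indices of its k nearest other points (squared Euclidean distance)."""
    table = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        table.append({j for _, j in dists[:k]})
    return table


def jarvis_patrick(points, k, k_min):
    """Cluster points: i and j are linked when they are mutual neighbors
    and share at least k_min neighbors; clusters are the linked groups."""
    table = nearest_neighbors(points, k)
    parent = list(range(len(points)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(points)), 2):
        mutual = i in table[j] and j in table[i]
        if mutual and len(table[i] & table[j]) >= k_min:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(jarvis_patrick(pts, k=2, k_min=1))
```

Building the table is the expensive step, since every item is compared with every other; that all-pairs comparison is the part the abstract reports running in parallel.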

11.
12.
We present a simple geometrical model in which the molecular shape is approximated by a small number of parameters for the dumbbell-like middle group and cylinder-like alkyl end chains. The pair potentials of nearest neighbours are approximated by the sum of anisotropic repulsive terms due to the contact of the different parts of the molecules and attraction due to dispersion forces between different parts of the molecules. Since the number of nearest neighbours at the smectic C/A phase transition is unchanged, the resulting pair potentials are able to describe well the cooperative behaviour of the molecules in the non-ordered layers of the smectic C and A phases. The dependence of the tilt angle on the alkyl chain length and on the temperature and other thermodynamic and structural properties can be interpreted qualitatively very well.

13.
14.
15.
16.
17.
18.
19.
Triplet ESR spectra of irradiated single crystals of β-TKN have been investigated as a function of their orientation with respect to the magnetic field. From the directions of the principal axes of the dipolar coupling tensor and the values of the zero field parameters D = ?96.2 gauss and E = ?2.7 gauss, it follows that the four triplet sites generated in the crystals consist of radical pairs. Each radical pair is formed from two β-TKN molecules, which are nearest neighbours in the lattice. Measurements at the temperatures 4.2 K and 1.2 K show that the triplet state is the ground state of the radical pair.

20.
Gel permeation chromatographic (GPC) separations have been performed with several commercially available column packing materials. The results have been analyzed in the conventional manner to obtain the ratio of weight-average to number-average molecular weight, Mw/Mn, for solutes with narrow molecular weight distribution. Various other parameters proposed to measure the efficiency of GPC columns have been evaluated and compared. It is proposed that the experimentally determined value of Mw/Mn for a series of different molecular weight samples with similar, narrow distribution for a given set of columns is a convenient parameter for comparing column efficiency in GPC. This parameter may be calculated from a single chromatogram, unlike resolution, R, resolution index, RI, or specific resolution, RS, which require a pair of chromatograms. Results from the Mw/Mn method are usually in agreement with those from the R, RI, and RS calculations, but one exception has been found. The number of theoretical plates calculated from the elution of a small molecule or from the polymer peak bears little relation to efficiencies predicted from the proposed Mw/Mn method or from R, RI, or RS.
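
As a worked illustration of how Mw/Mn is obtained from a single chromatogram, the sketch below applies the standard slice-based averages, Mn = Σh / Σ(h/M) and Mw = Σ(h·M) / Σh. It assumes that each chromatogram slice has a detector height h proportional to weight concentration and a molecular weight M assigned from a calibration curve; the function name and the sample numbers are made up for the example and are not from the paper.

```python
def polydispersity(heights, molar_masses):
    """Return (Mw, Mn, Mw/Mn) from chromatogram slices.

    heights      -- detector response per slice, assumed proportional
                    to the weight concentration of polymer in that slice
    molar_masses -- molecular weight assigned to each slice by calibration
    """
    total = sum(heights)
    mn = total / sum(h / m for h, m in zip(heights, molar_masses))
    mw = sum(h * m for h, m in zip(heights, molar_masses)) / total
    return mw, mn, mw / mn


if __name__ == "__main__":
    # Two equal-height slices at M = 100 and M = 300 (illustrative values).
    mw, mn, ratio = polydispersity([1.0, 1.0], [100.0, 300.0])
    print(f"Mw = {mw:.1f}, Mn = {mn:.1f}, Mw/Mn = {ratio:.3f}")
```

For a truly monodisperse solute the ratio is exactly 1, so any excess measured on a narrow standard reflects band broadening in the columns, which is why the abstract proposes it as an efficiency parameter.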
