Found 20 similar documents; search took 15 ms
1.
Knowledge of the polyprotein cleavage sites of HIV protease will refine our understanding of its specificity, and this information is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods. The search for suitable HIV protease inhibitors would be greatly expedited by an accurate, robust, and rapid method for predicting the sites at which HIV protease cleaves proteins. In this article, we selected HIV-1 protease as the subject of the study. 299 oligopeptides were chosen for the training set, while the other 63 oligopeptides were taken as a test set. The peptides are represented by features constructed from AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), is applied to find the two important cleavage sites and to select 364 important biochemical features by jackknife test. Using KNN (K-nearest neighbors) to combine the selected features, the prediction model obtains a high accuracy of 91.3% in the jackknife cross-validation test and 87.3% in the independent-set test. It is expected that our feature selection scheme can serve as a useful assistant technique for finding effective inhibitors of HIV protease, especially for the scientists in this field.
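The jackknife (leave-one-out) evaluation of a KNN classifier mentioned in this abstract can be illustrated with a minimal sketch. The function names and the toy two-dimensional data below are hypothetical and are not taken from the paper; the real study uses AAIndex-derived features selected by mRMR, which are not reproduced here.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def jackknife_accuracy(xs, ys, k=3):
    """Leave each sample out in turn, predict it from the rest, return accuracy."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


# Hypothetical toy data: two well-separated clusters standing in for
# "cleaved" (1) and "not cleaved" (0) peptides.
toy_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.0), (1.2, 0.8)]
toy_y = [0, 0, 0, 1, 1, 1]
print(jackknife_accuracy(toy_x, toy_y, k=1))
```

In a feature-selection loop such as IFS, a function like `jackknife_accuracy` would be re-evaluated for each candidate feature subset, and the subset with the best score kept.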
2.
We present a generalized formulation of the trajectory surface hopping method applicable to a general multidimensional system. The method is based on the Zhu-Nakamura theory of a nonadiabatic transition and therefore includes the treatment of classically forbidden hops. The method uses a generalized recipe for the conservation of angular momentum after forbidden hops and an approximation for determining a nonadiabatic transition direction which is crucial when the coupling vector is unavailable. This method also eliminates the need for a rigorous location of the seam surface, thereby ensuring its applicability to a wide class of chemical systems. In a test calculation, we implement the method for the DH₂⁺ system, and it shows a remarkable agreement with the previous results of C. Zhu, H. Kamisaka, and H. Nakamura [J. Chem. Phys. 116, 3234 (2002)]. We then apply it to a diatomic-in-molecule model system with a conical intersection, and the results compare well with exact quantum calculations. The successful application to the conical intersection system confirms the possibility of directly extending the present method to an arbitrary potential of general topology.
3.
Most of the current expressions used to calculate figures of merit in multivariate calibration have been derived assuming independent and identically distributed (iid) measurement errors. However, it is well known that this condition is not always valid for real data sets, where the existence of many external factors can lead to correlated and/or heteroscedastic noise structures. In this report, the influence of the deviations from the classical iid paradigm is analyzed in the context of error propagation theory. New expressions have been derived to calculate sample dependent prediction standard errors under different scenarios. These expressions allow for a quantitative study of the influence of the different sources of instrumental error affecting the system under analysis. Significant differences are observed when the prediction error is estimated in each of the studied scenarios using the most popular first-order multivariate algorithms, under both simulated and experimental conditions.
4.
Tielker Nicolas, Güssregen Stefan, Kast Stefan M. Journal of Computer-Aided Molecular Design, 2021, 35(8): 933-941
Journal of Computer-Aided Molecular Design - Inspired by the successful application of the embedded cluster reference interaction site model (EC-RISM), a combination of quantum–mechanical...
5.
6.
COSMOfrag: a novel tool for high-throughput ADME property prediction and similarity screening based on quantum chemistry (total citations: 1; self-citations: 0; citations by others: 1)
The COSMO-RS (Continuum Solvation Model for Real Solvents) method has proven its broad applicability for the accurate prediction of thermodynamic, environmental, or physiological properties. On the basis of quantum chemical calculations with COSMO, COSMO-RS calculations were unavoidably restricted to small- to medium-sized compound sets, because of the time demand of the COSMO calculations. The COSMOfrag method, presented here, overcomes this restriction by replacing the costly quantum chemistry step with a selection of suitable fragments from a database of, presently, 40,000 DFT/COSMO precalculated molecules. Since, in the COSMO-RS picture, any molecular information is gathered in the so-called sigma profiles, COSMOfrag replaces the single sigma profile with a composition of partial sigma profiles, selected by the use of extensive similarity searching algorithms. On five representative datasets, the accuracy loss of COSMOfrag versus full COSMO-RS calculations has been shown to be only in the range of 0.05 log units. From the performance point of view, it is now possible to carry out COSMO-RS property calculations for more than 100,000 compounds a day per standard PC CPU.
7.
Hepatocellular carcinoma (HCC) is the sixth most common cancer in the world, and it is also one of the causes of cancer death. Moreover, the poor prognosis after postoperative recurrence and metastasis of HCC is a major problem for human health. If the disease can be diagnosed earlier, the survival rate of patients will improve significantly. In the early stage of HCC, the expression of miRNAs is likely to become abnormal. In our work, the miRNA expression profiles of human HCC cancer tissue are compared with those of adjacent tissue samples collected from The Cancer Genome Atlas (TCGA) platform, and the genes with significant differences are then selected by the Limma test. The selected genes are used to predict miRNAs related to the prognosis of HCC patients. Finally, miRNAs regulated by target genes are selected by our method, and the experimental results demonstrate that our method is more efficient and less costly than biological wet-lab experiments.
8.
9.
Liu SS, Liu HL, Yin CS, Wang LS. Journal of Chemical Information and Computer Sciences, 2003, 43(3): 964-969
10.
The criterion for the orienting group in electrophilic aromatic nitration was discussed by means of a pattern recognition method with quantum-chemical parameters as features, and the product ratios of the reactions were quantitatively calculated using an artificial neural network (ANN) method with the same parameters as inputs, based on ab initio quantum chemistry calculations. The quantum-chemical parameters involved orbital energy, orbital electron population, atomic total electron density and atomic net charge. The predicted values are in agreement with experimental results, and the prediction error of the ANN with quantum-chemical parameters for the reaction is the smallest among all the methods.
11.
Prediction of protein accessibility from sequence, like prediction of protein secondary structure, is an intermediate step in predicting the structures and consequently the functions of proteins. Most of the currently used methods are based on single-residue prediction, either by statistical means or by evolutionary information, in which the accessibility state of the central residue in a window is predicted. Taking advantage of the expansion of databases of proteins with known 3D structures, we extracted information on pairwise residue types and the conformational states of pairs simultaneously. To resolve the ambiguity in state prediction that arises when the window slides by one residue, we used a dynamic programming algorithm to find the path with maximum score. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test on a dataset of known proteins is more than 65%, which is an improvement on the results of methods based on evolutionary information.
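The idea of using dynamic programming to pick the maximum-score path through per-residue states can be sketched as follows. This is a generic Viterbi-style illustration under assumed inputs, not the authors' scoring scheme: the state labels (`B` for buried, `E` for exposed), the per-residue scores, and the pair scores are all hypothetical.

```python
def best_state_path(site_scores, pair_scores):
    """Return the state sequence with the maximum total score.

    site_scores: list of dicts, one per residue, mapping state -> score.
    pair_scores: dict mapping (previous_state, next_state) -> score;
                 missing pairs are treated as 0.0 (an assumption).
    """
    states = list(site_scores[0])
    best = {s: site_scores[0][s] for s in states}  # best score ending in s
    back = []  # back-pointers, one dict per transition
    for scores in site_scores[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Choose the predecessor state that maximises the running score.
            prev = max(states, key=lambda p: best[p] + pair_scores.get((p, s), 0.0))
            new_best[s] = best[prev] + pair_scores.get((prev, s), 0.0) + scores[s]
            pointers[s] = prev
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical scores for a three-residue stretch.
demo_sites = [{"B": 1.0, "E": 0.0}, {"B": 0.0, "E": 0.4}, {"B": 0.0, "E": 1.0}]
demo_pairs = {("B", "B"): 0.5, ("E", "E"): 0.5}
demo_path, demo_score = best_state_path(demo_sites, demo_pairs)
print(demo_path, demo_score)
```

Because the best score for each state at position i depends only on the best scores at position i-1, the search is linear in sequence length rather than exponential in the number of possible state assignments.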
12.
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles. A graph representation that captures critical features of polymeric materials and an associated graph neural network achieve superior accuracy to off-the-shelf cheminformatics methodologies.
13.
14.
The operon is a specific functional organization of genes found in bacterial genomes. Most genes within operons share common features. The support vector machine (SVM) approach is here used to predict operons at the genomic level. Four features were chosen as SVM input vectors: the intergenic distances, the number of common pathways, the number of conserved gene pairs and the mutual information of phylogenetic profiles. The analysis reveals that these common properties are indeed characteristic of the genes within operons and are different from those of non-operonic genes. Jackknife testing indicates that these input feature vectors, employed with an RBF kernel SVM, achieve high accuracy. To validate the method, Escherichia coli K12 and Bacillus subtilis were taken as benchmark genomes of known operon structure, and the prediction results in both show that the SVM can detect operon genes in target genomes efficiently and offers a satisfactory balance between sensitivity and specificity.
15.
The Generalized Brillouin Theorem Multiconfiguration Method (GBT-MC) of Grein and Chang is extended and applied to the calculation of excited states. Orthogonality constraints to lower states as well as second-order interaction effects of states lying close together have been taken into account. In this way quadratic convergence can be guaranteed. Difficulties with coupling coefficients and Lagrangian multipliers of SCF methods can be circumvented. Test calculations have been performed on valence electron excited states of C, H2O, and CH2O, and on core excited states of Li.
16.
17.
Here we report a method to calculate Born radii, an important parameter used in a Generalized Born model. Traditional methods to derive Born radii are mostly based on a complicated formula, while our method is easier and more direct. Atoms are classified according to their atom type, and the Born radii of each type are obtained by fitting to experimental solvation free energy. The SMARTS language is used for the exact definition of atom types, and Ullmann's subgraph isomorphism algorithm is used to deduce the environment. A genetic algorithm is used for the parameter fitting because of its efficiency in searching a huge phase space, and its results are then optimized by using the conjugate gradient method. The final parameter set is fitted from a training set containing 357 molecules and is tested using a test set of 44 small organic molecules; the average error is 0.58 kcal/mol for 36 neutral molecules and 1.67 kcal/mol for 8 ions. The model is further tested on organic molecules, biopolymers, and a protein-inhibitor complex and yields reliable results in all these cases. This method can be used to accelerate molecular docking calculations.
18.
In this work, we have evaluated how well the general assisted model building with energy refinement (AMBER) force field performs in studying the dynamic properties of liquids. Diffusion coefficients (D) have been predicted for 17 solvents, five organic compounds in aqueous solutions, four proteins in aqueous solutions, and nine organic compounds in nonaqueous solutions. An efficient sampling strategy has been proposed and tested in the calculation of the diffusion coefficients of solutes in solutions. There are two major findings of this study. First, the diffusion coefficients of organic solutes in aqueous solution can be well predicted: the average unsigned errors and the root mean square errors are 0.137 and 0.171 × 10⁻⁵ cm² s⁻¹, respectively. Second, although the absolute values of D cannot be predicted, good correlations have been achieved for eight organic solvents with experimental data (R² = 0.784), four proteins in aqueous solutions (R² = 0.996), and nine organic compounds in nonaqueous solutions (R² = 0.834). The temperature-dependent behaviors of three solvents, namely TIP3P water, dimethyl sulfoxide, and cyclohexane, have been studied. The major molecular dynamics (MD) settings, such as the sizes of simulation boxes and whether or not the coordinates of MD snapshots are wrapped into the primary simulation boxes, have been explored. We have concluded that our sampling strategy of averaging the mean square displacement collected in multiple short MD simulations is efficient in predicting diffusion coefficients of solutes at infinite dilution.
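The sampling strategy described here, averaging mean square displacement (MSD) curves from several short runs and then extracting a diffusion coefficient, can be sketched with the Einstein relation MSD(t) = 2·d·D·t (d = 3 in three dimensions). The function names and the toy MSD values below are hypothetical, and units are left abstract; this is an illustration of the general technique, not the authors' workflow.

```python
def average_msd(msd_runs):
    """Average MSD curves point by point across several short simulations."""
    n_runs = len(msd_runs)
    n_points = len(msd_runs[0])
    return [sum(run[i] for run in msd_runs) / n_runs for i in range(n_points)]


def diffusion_coefficient(times, msd, dims=3):
    """Estimate D from the least-squares slope of MSD versus time.

    Uses the Einstein relation MSD(t) = 2 * dims * D * t.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_m = sum(msd) / n
    # Ordinary least-squares slope of msd against time.
    slope = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, msd)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    return slope / (2 * dims)


# Hypothetical MSD curves from two short runs, sampled at the same times.
demo_times = [0.0, 1.0, 2.0, 3.0]
demo_runs = [[0.0, 5.0, 13.0, 17.0], [0.0, 7.0, 11.0, 19.0]]
demo_avg = average_msd(demo_runs)
print(diffusion_coefficient(demo_times, demo_avg))
```

Averaging before fitting smooths out the noise of individual short trajectories, which is why the fitted slope of the averaged curve can be more stable than the slope from any single run.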
19.
The generalized inverse method is applied to the force constants calculation problem having a nearly singular Jacobian matrix. By replacing this matrix with a nearby, well-conditioned matrix of lower rank, it is possible to eliminate the convergence difficulties of the least squares iteration process. A further significant advantage of the method is that certain matrix theoretical considerations give the possibility of deciding at the end of the iteration which force constants are determined by the available observations. Numerical results for the dimethylmercury molecule are given.
20.
A. V. Grigoryan, I. Kufareva, M. Totrov, R. A. Abagyan. Journal of Computer-Aided Molecular Design, 2010, 24(3): 173-182
Similarity of compound chemical structures often leads to close pharmacological profiles, including binding to the same protein
targets. The opposite, however, is not always true, as distinct chemical scaffolds can exhibit similar pharmacology as well.
Therefore, relying on chemical similarity to known binders in search for novel chemicals targeting the same protein artificially
narrows down the results and makes lead hopping impossible. In this study we attempt to design a compound similarity/distance
measure that better captures structural aspects of their pharmacology and molecular interactions. The measure is based on
our recently published method for compound spatial alignment with atomic property fields as a generalized 3D pharmacophoric
potential. We optimized contributions of different atomic properties for better discrimination of compound pairs with the
same pharmacology from those with different pharmacology using Partial Least Squares regression. Our proposed similarity measure
was then tested for its ability to discriminate pharmacologically similar pairs from decoys on a large diverse dataset of
115 protein–ligand complexes. Compared to 2D Tanimoto and Shape Tanimoto approaches, our new approach led to improvement in
the area under the receiver operating characteristic curve values in 66 and 58% of domains respectively. The improvement was
particularly high for the previously problematic cases (weak performance of the 2D Tanimoto and Shape Tanimoto measures) with
original AUC values below 0.8. In fact, for these cases we obtained improvement in 86% of domains compared to the 2D Tanimoto measure
and 85% compared to the Shape Tanimoto measure. The proposed spatial chemical distance measure can be used in virtual ligand screening.
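For context, the 2D Tanimoto baseline referred to in this abstract is commonly computed on fingerprint bit sets as the size of the intersection divided by the size of the union. The sketch below assumes fingerprints are given as Python sets of "on" bit indices; the example fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not something stated in the source.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        # Assumed convention: two empty fingerprints have similarity 0.0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints sharing two of four distinct bits.
demo_fp_a = {1, 2, 3}
demo_fp_b = {2, 3, 4}
print(tanimoto(demo_fp_a, demo_fp_b))  # 0.5
```

A distance can be obtained as `1 - tanimoto(a, b)`. A measure of this kind only sees shared substructure bits, which is consistent with the abstract's point that compounds with distinct scaffolds can score as dissimilar even when their pharmacology is alike.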