Similar literature
1.
In recent years, three‐dimensional density maps reconstructed from single particle images obtained by electron cryo‐microscopy (cryo‐EM) have reached unprecedented resolution. However, map interpretation can be challenging, in particular if the constituting structures require de‐novo model building or are very mobile. Herein, we demonstrate the potential of convolutional neural networks for the annotation of cryo‐EM maps: our network Haruspex has been trained on a carefully curated set of 293 experimentally derived reconstruction maps to automatically annotate RNA/DNA as well as protein secondary structure elements. It can be straightforwardly applied to newly reconstructed maps in order to support domain placement or as a starting point for main‐chain placement. Due to its high recall and precision rates of 95.1 % and 80.3 %, respectively, on an independent test set of 122 maps, it can also be used for validation during model building. The trained network will be available as part of the CCP‐EM suite.
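
The recall and precision figures quoted above can be illustrated with a minimal sketch. It assumes per-element class labels (for example one label per voxel or residue); how Haruspex itself aggregates its scores is not stated in the abstract, and the labels below are invented for illustration.

```python
def precision_recall(true_labels, predicted_labels, positive):
    """Return (precision, recall) for one class treated as 'positive'."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1          # correctly annotated
        elif pred == positive:
            fp += 1          # annotated as positive, but is something else
        elif truth == positive:
            fn += 1          # a positive element that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical annotations, not data from the paper.
truth = ["helix", "helix", "sheet", "rna", "helix", "sheet"]
pred = ["helix", "sheet", "sheet", "rna", "helix", "helix"]
print(precision_recall(truth, pred, "helix"))
```

High recall with lower precision, as reported above, would mean that few true elements are missed at the cost of some spurious annotations.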

2.
Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of traces of secondary structures in three-dimensional space. In order to enhance accuracy in ranking secondary structure topologies, we explored a method that combines three sources of information: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D at the secondary structure level. A test of fourteen cases shows that the accuracy of predicted secondary structures is critical for deriving topologies. The use of significant long-range contact pairs is most effective at enriching the rank of the maximum-match topology for proteins with a large number of secondary structures, if the secondary structure prediction is fairly accurate. It was observed that the enrichment depends on the quality of initial topology candidates in this approach. We provide detailed analysis in various cases to show the potential and challenges of combining three sources of information.

3.
4.
5.
6.
The hexameric p97 enzyme plays an integral role in cellular homeostasis. Large changes to the orientation of its N-terminal domains (NTDs), corresponding to NTD-down (p97-ADP) or NTD-up (p97-ATP), accompany ATP hydrolysis. The NTDs in a series of p97 disease mutants interconvert rapidly between up and down conformations when p97 is in the ADP-bound state. While the populations of up and down NTDs can be determined from bulk measurements, information about the cooperativity of the transition between conformations is lacking. Here we use cryo-EM to determine populations of the 14 unique up/down NTD states of the homo-hexameric R95G disease-causing p97 ring, showing that NTD orientations do not depend on those of neighboring subunits. In contrast, NMR studies establish that inter-protomer cooperativity is important for regulating the orientation of NTDs in p97 particles comprising mixtures of different subunits, such as wild-type and R95G, emphasizing the synergy between cryo-EM and NMR in establishing how the components of p97 function.
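
The count of 14 unique up/down states can be reproduced with a short sketch. It assumes that two ring states are equivalent when one is a rotation of the other (mirror images are counted separately); this convention is an assumption that happens to give 14 and is not spelled out in the abstract. The second function gives the populations expected if every NTD is up independently with probability `p_up`, a hypothetical parameter.

```python
from collections import Counter
from itertools import product


def canonical_rotation(state):
    """Pick the lexicographically smallest rotation as the representative."""
    return min(state[i:] + state[:i] for i in range(len(state)))


def unique_ring_states(n_subunits=6):
    """All up(1)/down(0) patterns of a ring, counted up to rotation."""
    patterns = product((0, 1), repeat=n_subunits)
    return sorted({canonical_rotation(p) for p in patterns})


def independent_populations(p_up, n_subunits=6):
    """Expected population of each unique state if subunits act independently."""
    populations = Counter()
    for pattern in product((0, 1), repeat=n_subunits):
        n_up = sum(pattern)
        weight = p_up ** n_up * (1 - p_up) ** (n_subunits - n_up)
        populations[canonical_rotation(pattern)] += weight
    return dict(populations)


print(len(unique_ring_states(6)))  # 14
```

Comparing measured populations against `independent_populations` is one simple way to check for the absence of cooperativity described above.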

7.
As a conjugated and unsymmetric building block composed of an electron-poor seven-membered sp2 carbon ring and an electron-rich five-membered carbon ring, azulene and its derivatives have been recognized as some of the most promising building blocks for novel electronic devices due to their intrinsic redox activity. By using 1,3,5-tris(4-aminophenyl)-benzene and azulene-1,3-dicarbaldehyde as the starting materials, an azulene (Azu)-based 2D conjugated covalent organic framework, COF-Azu, is prepared through a liquid-liquid interface polymerization strategy for the first time. The as-fabricated Al/COF-Azu/indium tin oxide (ITO) memristor shows typical non-volatile resistive switching performance due to the electric-field-induced intramolecular charge transfer effect. Associated with the unique memristive performance, a simple convolutional neural network is built for image recognition. After 8 epochs of training, image recognition accuracy of 80 % for a neural network trained on a larger data set is achieved.

8.
Journal of the Indian Chemical Society, 2021, 98(9): 100114
We demonstrate how a back-propagation artificial neural network can be trained to represent a potential energy surface (PES) in a formless manner with limited data points and exploited to predict interaction energies for configurations not included in the training set. A similar exercise is undertaken for predicting the eigenvalues and eigenvectors of a model Hamiltonian matrix that delicately depends on parameters and exhibits crossing of eigenvalues.

9.
Supersecondary structures (SSSs) are the building blocks of protein 3D structures. Accurate prediction of SSSs can be one important step toward building a tertiary structure from the specified secondary structure. How to improve the accuracy of prediction of SSSs by effectively incorporating the sequence order effects is an important and challenging problem. Based on a different form of Chou's pseudo amino acid composition, a novel approach for feature representation of SSSs is proposed. Amino acid basic compositions, dipeptide components, and amino acid composition distribution are incorporated to represent the compositional features of proteins. Each supersecondary structural motif is characterized as a vector of 36 dimensions. In addition, we propose a novel prediction system using SVM and the IDQD algorithm as classifiers. Our method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The highest overall accuracies for the training dataset and the independent testing dataset are 77.7% and 69.4%, respectively. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011
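
Two of the compositional features named above, amino acid composition and dipeptide composition, can be sketched as follows. This is a generic illustration that yields 20 and 400 values; it is not the 36-dimensional representation of the paper, whose exact construction the abstract does not give. The function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDE_INDEX = {
    a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))
}


def amino_acid_composition(sequence):
    """Fraction of each of the 20 residue types in the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return [sequence.count(a) / len(sequence) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Fraction of each of the 400 ordered residue pairs (adjacent positions)."""
    if len(sequence) < 2:
        raise ValueError("sequence must contain at least two residues")
    total = len(sequence) - 1
    vector = [0.0] * len(DIPEPTIDE_INDEX)
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in DIPEPTIDE_INDEX:      # skip pairs with non-standard letters
            vector[DIPEPTIDE_INDEX[pair]] += 1 / total
    return vector


features = amino_acid_composition("ACAC") + dipeptide_composition("ACAC")
print(len(features))  # 420
```

A feature vector of this kind could then be passed to a classifier such as an SVM, as the abstract describes.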

10.
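Quantitative analysis of nucleosides by capillary electrophoresis with radial basis function neural network calibration   Total citations: 1, self-citations: 0, citations by others: 1
Mao Lifeng, Shen Peng, Cheng Yiyu. Acta Chimica Sinica, 2004, 62(19): 1917-1921
A radial basis function (RBF) neural network algorithm was used to regress capillary electrophoresis peak-area data against the concentrations of nucleosides and an internal standard for a set of known samples. The resulting model relates peak area to nucleoside concentration and predicts the concentration of the target nucleoside in unknown samples, forming a new method for quantitative analysis by capillary electrophoresis. Applied to the determination of guanosine content, the model gave a mean relative prediction error of 0.86%, clearly lower than the 2.60% and 1.07% obtained with linear regression and a BP neural network model, respectively. The results show that the method is simple to use, effectively improves the accuracy of quantitative capillary electrophoresis, and outperforms both linear regression and the BP neural network method.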

11.
Partial Least Squares (PLS) is by far the most popular regression method for building multivariate calibration models for spectroscopic data. However, the success of the conventional PLS approach depends on the availability of a ‘representative data set’ as the model needs to be trained for all expected variation at the prediction stage. When the concentration of the known interferents and their correlation with the analyte of interest change in a fashion which is not covered in the calibration set, the predictive performance of inverse calibration approaches such as conventional PLS can deteriorate. This underscores the need for calibration methods that are capable of building multivariate calibration models which can be robustified against the unexpected variation in the concentrations and the correlations of the known interferents in the test set. Several methods incorporating ‘a priori’ information such as pure component spectra of the analyte of interest and/or the known interferents have been proposed to build more robust calibration models. In the present study, four such calibration techniques have been benchmarked on two data sets with respect to their predictive ability and robustness: Net Analyte Preprocessing (NAP), Improved Direct Calibration (IDC), Science Based Calibration (SBC) and Augmented Classical Least Squares (ACLS) Calibration. For both data sets, the alternative calibration techniques were found to give good prediction performance even when the interferent structure in the test set was different from the one in the calibration set. The best results were obtained by the ACLS model incorporating both the pure component spectra of the analyte of interest and the interferents, resulting in a reduction of the RMSEP by a factor of 3 compared to conventional PLS for the situation when the test set had a different interferent structure than the one in the calibration set.

12.
Aromaticity is a fundamental concept in chemistry, with many theoretical and practical implications. Although most organic compounds can be categorized as aromatic, non-aromatic, or antiaromatic, it is often difficult to classify borderline compounds as well as to quantify this property. Many aromaticity criteria have been proposed, although none of them gives an entirely satisfactory solution. The inability to fully arrange organic compounds according to a single criterion arises from the fact that aromaticity is a multidimensional phenomenon. Neural networks are computational techniques that allow one to treat a large amount of data, thereby reducing the dimensionality of the input set to a bidimensional output. We present the successful applications of Kohonen's self-organizing maps to classify organic compounds according to aromaticity criteria, showing a good correlation between the aromaticity of a compound and its placement in a particular neuron. Although the input data for the training of the network were different aromaticity criteria (stabilization energy, diamagnetic susceptibility, NICS, NICS(1), and HOMA) for five-membered heterocycles, the method can be extended to other organic compounds. Some useful features of this method are: 1) it is very fast, requiring less than one minute of computational time to place a new compound in the map; 2) the placement of the different compounds in the map is conveniently visualized; 3) the position of a compound in the map depends on its aromatic character, thus allowing us to establish a quantitative scale of aromaticity, based on Euclidean distances between neurons; and 4) it has predictive power. Overall, the results reported herein constitute a significant contribution to the longstanding debate on the quantitative treatment of aromaticity.
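
The idea of placing a compound on a two-dimensional Kohonen map can be sketched as below. This is a minimal self-organizing map written for illustration only: the grid size, the linear decay schedules, the Gaussian neighbourhood and the descriptor values are all assumptions and do not come from the paper.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                   # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 1e-3      # neighbourhood radius shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull neuron toward x
    return weights


# Hypothetical, already normalized descriptor vectors (not values from the paper).
descriptors = {
    "compound_a": [0.9, 0.8, 0.95],
    "compound_b": [0.85, 0.9, 0.9],
    "compound_c": [0.1, 0.2, 0.05],
}
som = train_som(list(descriptors.values()), rows=3, cols=3)
for name, vector in descriptors.items():
    print(name, best_matching_unit(som, vector))
```

Once the map is trained, placing a new compound only requires one call to `best_matching_unit`, which is consistent with the speed noted in point 1) above.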

13.
Modeling toxicity by using supervised Kohonen neural networks   Total citations: 2, self-citations: 0, citations by others: 2
The counterpropagation neural network is shown to be a powerful and suitable tool for the investigation of toxicity. This study mined a data set of 568 chemicals. Two hundred eighty-two objects were used as the training set and 286 as the test set. The final model achieves R² = 0.83 on the full data set (R² = 0.97 on the training set, R² = 0.59 on the test set). This technique also distinguishes itself by giving the expert two-dimensional maps suitable for studying the distribution and clustering of the data and for identifying outliers.

14.
The aim of this work is the development of an artificial neural network model, which can be generalized and used in a variety of applications for retention modelling in ion chromatography. Influences of eluent flow-rate and concentration of eluent anion (OH⁻) on separation of seven inorganic anions (fluoride, chloride, nitrite, sulfate, bromide, nitrate, and phosphate) were investigated. Parallel prediction of retention times of seven inorganic anions by using one artificial neural network was applied. MATLAB Neural Networks ToolBox was not adequate for application to retention modelling in this particular case. Therefore the authors adapted it for retention modelling by programming in MATLAB metalanguage. The following routines were written: the division of the experimental data set into training and test sets; selection of data for the training and test sets; Dixon's outlier test; a retraining procedure routine; and calculations of relative error. A three-layer feed-forward neural network trained with a Levenberg-Marquardt batch error back-propagation algorithm has been used to model ion chromatographic retention mechanisms. The advantage of the applied batch training methodology is the significant increase in speed of calculation in comparison with the delta rule training methodology. The technique of experimental data selection for the training set was used, allowing improvement of the artificial neural network's predictive power. Experimental design space was divided into 8-32 subspaces depending on the number of experimental data points used for the training set. The number of hidden layer nodes, the number of iteration steps and the number of experimental data points used for the training set were optimized. This study presents a very fast (300 iteration steps) and very accurate (relative error of 0.88%) retention model, obtained by using a small amount of experimental data (16 experimental data points in the training set). This indicates that the method of choice for retention modelling in ion chromatography is the artificial neural network.
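
Two of the routines listed above, the relative error calculation and Dixon's outlier test, can be sketched in Python rather than MATLAB. This is an illustration under stated assumptions: the Q statistic is the common gap-over-range form, the critical value must be supplied by the user from a published table, and the retention times are invented.

```python
def mean_relative_error(observed, predicted):
    """Mean absolute relative error in percent."""
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100 * sum(errors) / len(errors)


def dixon_q(values):
    """Return (Q, suspect) for the most extreme value: Q = gap / range."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    if spread == 0:
        return 0.0, ordered[0]          # all values identical, nothing to test
    q_low = (ordered[1] - ordered[0]) / spread
    q_high = (ordered[-1] - ordered[-2]) / spread
    if q_low > q_high:
        return q_low, ordered[0]
    return q_high, ordered[-1]


def is_outlier(values, q_critical):
    """Flag the suspect value if Q exceeds a user-supplied critical value."""
    q, suspect = dixon_q(values)
    return q > q_critical, suspect


# Hypothetical retention times in minutes, not data from the paper.
measured = [4.10, 4.12, 4.08, 4.11, 5.30]
print(dixon_q(measured))
print(mean_relative_error([10.0, 20.0], [10.1, 19.8]))
```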

15.
16.
A forward-selection-improved radial basis function (RBF) network was applied to bacterial classification based on data obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). The classification of each bacterium cultured for different times was discussed and the effect of the parameters of the RBF network was investigated. The new method uses forward selection to prevent overfitting, with generalized cross-validation (GCV) as the model selection criterion (MSC). The original data were compressed by wavelet transformation to speed up network training and reduce the number of variables in the original MS data. The data were normalized prior to training and testing in order to define the region in which the network is trained, accelerate training, and narrow the range over which the parameters are selected. The one-out-of-n method was used to split the data set of p samples into a training set of size p−1 and a test set of size 1. With the improved method, the classification accuracies for the five bacteria discussed in the present paper are 87.5, 69.2, 80, 92.3, and 92.8%, respectively.
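
The split into a training set of size p−1 and a test set of size 1 can be sketched as a leave-one-out loop. A 1-nearest-neighbour rule stands in for the classifier here purely to keep the example short; it is not the forward-selection RBF network of the paper, and the sample vectors are invented.

```python
import math


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier: label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], x))
    return train_y[best]


def leave_one_out_accuracy(samples, labels, classify):
    """Hold out each sample once, train on the other p-1, and score the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical two-variable spectra summaries for two classes.
samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["a", "a", "b", "b"]
print(leave_one_out_accuracy(samples, labels, nearest_neighbour))  # 1.0
```

Any function with the same `(train_x, train_y, x)` signature could be passed in place of `nearest_neighbour`.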

17.
18.
19.
Summary: It is shown how a self-organizing neural network such as the one introduced by Kohonen can be used to analyze features of molecular surfaces, such as shape and the molecular electrostatic potential. On the one hand, two-dimensional maps of molecular surface properties can be generated and used for the comparison of a set of molecules. On the other hand, the surface geometry of one molecule can be stored in a network and this network can be used as a template for the analysis of the shape of various other molecules. The application of these techniques to a series of steroids exhibiting a range of binding activities to the corticosteroid-binding globulin receptor allows one to pinpoint the essential features necessary for biological activity.

20.
A first step toward predicting the structure of a protein is to determine its secondary structure. The secondary structure information is generally used as starting point to solve protein crystal structures. In the present study, a machine learning approach based on a complete set of two-class scoring functions was used. Such functions discriminate between two specific structural classes or between a single specific class and the rest. The approach uses a hierarchical scheme of scoring functions and a neural network. The parameters are determined by optimizing the recall of learning data. Quality control is performed by predicting separate independent test data. A first set of scoring functions is trained to correlate the secondary structures of residues with profiles of sequence windows of width 15, centered at these residues. The sequence profiles are obtained by multiple sequence alignment with PSI-BLAST. A second set of scoring functions is trained to correlate the secondary structures of the center residues with the secondary structures of all other residues in the sequence windows used in the first step. Finally, a neural network is trained using the results from the second set of scoring functions as input to make a decision on the secondary structure class of the residue in the center of the sequence window. Here, we consider the three-class problem of helix, strand, and other secondary structures. The corresponding prediction scheme "SPARROW" was trained with the ASTRAL40 database, which contains protein domain structures with less than 40% sequence identity. The secondary structures were determined with DSSP. In a loose assignment, the helix class contains all DSSP helix types (α, 3-10, π), the strand class contains β-strand and β-bridge, and the third class contains the other structures. In a tight assignment, the helix and strand classes contain only α-helix and β-strand classes, respectively. A 10-fold cross validation showed less than 0.8% deviation in the fraction of correct structure assignments between true prediction and recall of data used for training. Using sequences of 140,000 residues as a test data set, 80.46% ± 0.35% of secondary structures are predicted correctly in the loose assignment, a prediction performance that is very close to the best results in the field. Most applications are done with the loose assignment. However, the tight assignment yields 2.25% better prediction performance. With each individual prediction, we also provide a confidence measure giving the probability that the prediction is correct. The SPARROW software can be used and downloaded on the Web page http://agknapp.chemie.fu-berlin.de/sparrow/ .
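
The sequence windows of width 15 centered at each residue can be sketched as follows. The sketch works on raw one-letter sequences, whereas SPARROW uses PSI-BLAST profiles; the padding character used at the chain termini is an assumption, since the abstract does not say how ends are handled.

```python
def sequence_windows(sequence, width=15, pad="-"):
    """Return one window per residue, with that residue at the window center."""
    if width % 2 == 0:
        raise ValueError("width must be odd so that a center residue exists")
    half = width // 2
    padded = pad * half + sequence + pad * half   # pad both termini
    return [padded[i:i + width] for i in range(len(sequence))]


windows = sequence_windows("ACDEFGHIK", width=5)
print(windows[0])   # --ACD
print(windows[4])   # DEFGH
```

Each window would then be converted to a profile and scored, with the prediction applying to the residue at index `width // 2`.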
