Similar literature
Found 20 similar documents (search time: 15 ms)
1.
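We developed a protein classification method based on molecular recognition. Applying a data-mining strategy and statistical clustering to the binding-mode feature data of coenzyme A (CoA)-binding proteins, we compared the classification accuracy of several classification methods on this system, carried out a methodological study of how to classify this biologically important group of proteins, and selected two-step clustering as the best method. We designed and established a classification parameter that evaluates the significance and importance of each binding feature concisely and effectively, and used it to select the decisive feature variables from all features. The three classes of CoA-binding proteins obtained all show pronounced hydrogen-bonding and hydrophobic binding features; CoA can form hydrogen bonds with several amino acid residues that are key to biological activity. The commonalities of these interactions and the differences between classes reveal subtle differences in binding mode when the ligand interacts with different receptors, and offer useful reference and guidance for designing selective modulators that target CoA-binding proteins.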

2.
Crosslinked polyethylene (PEX-a) pipes are emerging as promising replacements for traditional metal or concrete pipes used for water, gas, and sewage transport. Understanding the relationship between pipe formulation and performance is critical to their proper design and implementation. We have developed a methodology using principal component analysis (PCA) and the machine learning techniques of k-means clustering and support vector machines (SVM) to compare and classify different PEX-a pipe formulations based on characteristic infrared (IR) spectroscopy absorbance peaks. The application of PCA revealed that a large percentage (89%) of the total variance could be explained by the first three principal components (PC1-PC3), with distinct clustering of the data for each formulation. By examining the contribution of the individual IR bands to the PCs, we determined that PC1 could be attributed to different peroxide crosslinkers, whereas PC2 and PC3 could be attributed to differences in the additives. Using the PCA results as input to k-means clustering and SVM resulted in very high accuracy in classifying the different pipe formulations. Our approach highlights the advantages of using PCA and machine learning techniques to characterize different formulations of PEX-a pipes, which is important for achieving a detailed understanding of the pipe formulation and manufacturing process. © 2019 Wiley Periodicals, Inc. J. Polym. Sci., Part B: Polym. Phys. 2019, 57, 1255–1262

3.
Diagnosing breast cancer based on support vector machines   Total citations: 8, self-citations: 0, citations by others: 8
The Support Vector Machine (SVM) classification algorithm, recently developed in the machine learning community, was used to diagnose breast cancer. The SVM was also compared with several machine learning techniques currently used in this field. The classification task involves predicting the state of disease, using data obtained from the UCI machine learning repository. On the whole, SVM outperformed k-means clustering and two artificial neural networks. The comparison of the machine learning techniques suggests that nine samples may be mislabeled.

4.
5.
6.
Artificial Neural Networks (ANNs) have seen an explosion of interest over the last two decades and have been successfully applied in all fields of chemistry, particularly in analytical chemistry. Inspired by biological systems and originating from the perceptron, i.e. a program unit that learns concepts, ANNs are capable of gradual learning over time and of modelling extremely complex functions. In addition to the traditional multivariate chemometric techniques, ANNs are often applied for prediction, clustering, classification, modelling of a property, process control, procedural optimisation and/or regression of the obtained data. This paper aims at presenting the most common network architectures, such as Multi-layer Perceptrons (MLPs), Radial Basis Function (RBF) networks and Kohonen's self-organising maps (SOM). Moreover, back-propagation (BP), the most widespread algorithm used today, and its modifications, such as quick-propagation (QP) and Delta-bar-Delta, are also discussed. All architectures correlate input variables to output variables through non-linear, weighted, parameterised functions, called neurons. In addition, various training algorithms have been developed in order to minimise the prediction error made by the network. The applications of ANNs in water analysis and water quality assessment are also reviewed. Most ANN work focuses on modelling and parameter prediction. In the case of water quality assessment, extended predictive models are constructed and optimised, while variable correlation and significance are usually estimated in the framework of the predictive or classifier models. By contrast, ANN models are not frequently used for clustering/classification purposes, although they seem to be an effective tool. ANNs proved to be a powerful, yet often complementary, tool for water quality assessment, prediction and classification.
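
The description of neurons as non-linear, weighted, parameterised functions can be illustrated with a minimal sketch. The logistic activation, the names `neuron` and `mlp_forward`, and the layer layout are assumptions chosen for illustration; the abstract does not specify them.

```python
import math


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, passed through
    a non-linear activation (the logistic function is an assumed choice)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(inputs, hidden_layer, output_neuron):
    """Forward pass of a one-hidden-layer perceptron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron;
    output_neuron is a single (weights, bias) pair (hypothetical layout).
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out)
```

Training algorithms such as back-propagation would adjust the weights and biases to reduce the prediction error; that step is not shown here.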

7.
In this paper, we study the classification of unbalanced data sets of drugs. As an example we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of this data, we have built classifiers based on machine learning methods. With data sets that have unequal class distributions, conventional machine learning methods are biased toward the larger class. To overcome this problem and to obtain sensitive but also accurate classifiers, we combine machine learning and feature selection methods with techniques addressing the problem of unbalanced classification, such as oversampling and threshold moving. We have used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results from our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable the effective high-throughput in silico classification of potential drug candidates.
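
The two imbalance techniques named above, oversampling and threshold moving, can be sketched as follows. This is a generic illustration, not the authors' implementation: random duplication of minority samples and a fixed score cut-off are assumed variants, and the function names are hypothetical.

```python
import random


def random_oversample(samples, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until the minority class
    is as large as the rest of the data (assumed oversampling variant)."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")
    majority_count = sum(1 for y in labels if y != minority_label)
    extra = max(0, majority_count - len(minority))
    new_samples = list(samples) + [rng.choice(minority) for _ in range(extra)]
    new_labels = list(labels) + [minority_label] * extra
    return new_samples, new_labels


def classify_with_threshold(scores, threshold=0.5):
    """Threshold moving: lowering the cut-off below 0.5 assigns more
    samples to the positive (minority) class without retraining."""
    return [1 if s >= threshold else 0 for s in scores]
```

Oversampling changes the training data, whereas threshold moving changes only the decision rule applied to the classifier scores, so the two can be combined.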

8.
As a result of the recent developments of high-throughput screening in drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. A fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds is described here. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implemented with this method, called cd-hit-fp, is available from http://chemspace.org.
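
The general idea of incremental clustering on fingerprints can be sketched in a few lines. This is not the cd-hit-fp code: the Tanimoto similarity, the set-of-on-bits representation, the greedy first-match rule, and the 0.7 default threshold are all assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def incremental_cluster(fingerprints, threshold=0.7):
    """Single greedy pass: each compound joins the first cluster whose
    representative is at least `threshold` similar, else starts a new cluster.
    Returns a list of clusters, each a list of compound indices."""
    clusters = []  # pairs of (representative fingerprint, member indices)
    for idx, fp in enumerate(fingerprints):
        for representative, members in clusters:
            if tanimoto(fp, representative) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((fp, [idx]))
    return [members for _, members in clusters]
```

Because each compound is compared only with cluster representatives rather than with every other compound, the number of comparisons grows with the number of clusters instead of quadratically with the library size.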

9.
Voltammetry is a powerful tool for providing quantitative mechanistic information associated with a broad range of chemically or biologically important electron transfer processes. An important step in voltammetric data analysis is to compare experimental data with those derived by simulations based on a mechanism chosen by the experimenter to determine the ‘best fit’, which can be achieved either heuristically or by a computationally supported automated method. In recent years, machine learning methods have emerged as a powerful tool in mechanism classification and parametrisation, owing to the rapid increase in computing power and widespread accessibility of machine learning platforms. This opinion article gives an overview of the historical development and current status of machine learning in this field, highlights the opportunities and challenges, and predicts possible future directions.

10.
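Metal oxide semiconductor (MOS) gas sensors, which offer small size, low power consumption, high sensitivity, and good compatibility with silicon processing, are now widely used in military applications, scientific research, and many sectors of the national economy. However, the low selectivity of MOS sensors hinders their application prospects in the Internet of Things (IoT) era. This paper therefore reviews research progress on the selectivity of MOS sensors. It focuses on three technical approaches to improving selectivity (enhancing the performance of sensing materials, electronic noses, and thermal modulation) and discusses the current problems and future trends of each. The paper also compares the mainstream pattern recognition/machine learning algorithms in machine olfaction: principal component analysis (PCA), linear discriminant analysis (LDA), and neural networks (NN). Finally, the review looks ahead to the application prospects in gas identification of convolutional neural network (CNN) deep learning algorithms, which offer dimensionality reduction, feature extraction, and robust recognition and classification. Improvements in sensing-material performance, the combination of multiple modulation techniques with array technology, and the latest advances in deep learning algorithms in artificial intelligence (AI) will greatly enhance the ability of non-selective MOS sensors to identify volatile organic compound (VOC) molecules.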

11.
In the present work, the emission and the absorption spectra of numerous Greek olive oil samples and mixtures of them, obtained by two spectroscopic techniques, namely Laser-Induced Breakdown Spectroscopy (LIBS) and Absorption Spectroscopy, and aided by machine learning algorithms, were employed for the discrimination/classification of olive oils regarding their geographical origin. Both emission and absorption spectra were initially preprocessed by means of Principal Component Analysis (PCA) and were subsequently used for the construction of predictive models, employing Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). All data analysis methodologies were validated by both “k-fold” cross-validation and external validation methods. In all cases, very high classification accuracies were found, up to 100%. The present results demonstrate the advantages of machine learning implementation for improving the capabilities of these spectroscopic techniques as tools for efficient olive oil quality monitoring and control.
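
The k-fold cross-validation mentioned above can be sketched as an index splitter. This is a minimal illustration with consecutive, unshuffled folds; the function name is hypothetical, and the abstract does not state the value of k or whether the folds were shuffled or stratified.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds.

    Every sample appears in exactly one test fold; fold sizes differ by at
    most one. Shuffling and stratification are deliberately left out."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < remainder else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the k scores averaged; external validation additionally uses samples that never enter any fold.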

12.
Feng Pan, 《结构化学》 (Chinese Journal of Structural Chemistry), 2020, 39(1): 7-10
Machine learning is an emerging method to discover new materials with specific characteristics. An unsupervised machine learning study is highlighted that discovers new potential lithium ionic conductors by screening and clustering lithium compounds, providing inspiration for the development of solid-state electrolytes and practical batteries.

13.
There is growing interest in the application of machine learning techniques in bioinformatics. The supervised machine learning approach has been widely applied to bioinformatics and has achieved considerable success in this research area. With this learning approach researchers first develop a large training set, which is a time-consuming and costly process. Moreover, the proportion of positive and negative examples in the training set may not represent the real-world data distribution, which causes concept drift. Active learning avoids these problems. Unlike most conventional learning methods, where the training set used to derive the model remains static, the classifier can actively choose the training data, and the size of the training set increases. We introduced an algorithm for performing active learning with a support vector machine and applied the algorithm to gene expression profiles of colon cancer, lung cancer, and prostate cancer samples. We compared the classification performance of active learning with that of passive learning. The results showed that employing the active learning method can achieve high accuracy and significantly reduce the need for labeled training instances. For lung cancer classification, to achieve 96% of the total positives, only 31 labeled examples were needed in active learning, whereas in passive learning 174 labeled examples were required. This is a reduction of over 82%. In active learning the areas under the receiver operating characteristic (ROC) curves were over 0.81, while in passive learning the areas under the ROC curves were below 0.50.
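
One common way for a classifier to choose its own training data is uncertainty sampling, sketched below together with the arithmetic behind the reported reduction. The query rule (pick the unlabeled sample closest to the SVM decision boundary) is an assumption; the abstract does not say which selection criterion the authors used, and both function names are hypothetical.

```python
def most_uncertain(decision_values, labeled):
    """Return the index of the unlabeled sample whose SVM decision value is
    closest to zero, i.e. closest to the separating hyperplane.
    Raises ValueError if every sample is already labeled."""
    candidates = [i for i in range(len(decision_values)) if i not in labeled]
    return min(candidates, key=lambda i: abs(decision_values[i]))


def label_reduction(active_count, passive_count):
    """Fraction of labeling effort saved by active learning."""
    return 1.0 - active_count / passive_count


# Figures quoted in the abstract: 31 labeled examples versus 174.
saving = label_reduction(31, 174)  # roughly 0.82, i.e. "over 82%"
```

In a full loop, the selected sample would be labeled, added to the training set, and the SVM retrained before the next query.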

14.
15.
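Within the framework of the generalized energy-based fragmentation (GEBF) method, the local excitation (LE) energy of a large system can be approximated as a linear combination of the excitation energies of a series of active subsystems, which effectively lowers the computational time scaling. However, when the local excitation of the system involves multiple excited states, effectively identifying the excitation character of all active subsystems and combining them is a challenge. An algorithm based on clustering of local excited states is proposed. The scheme relies on hole-electron analysis and the density-based clustering (DBSCAN) machine learning algorithm, and can automatically group the most similar excited states in different subsystems and combine them to obtain the corresponding local excited-state energies or excitation energies. The LE-GEBF method improved with this algorithm gave satisfactory results in calculations on fluorescent molecule derivatives, fluorescent dye-water clusters, and green fluorescent protein model systems. The algorithm is expected to greatly improve the stability and accuracy of the LE-GEBF method for computing local excitations, and can effectively handle large systems whose absorption spectra have multiple peaks.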
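
The DBSCAN step can be illustrated with a minimal pure-Python sketch. How the authors turn hole-electron analysis into feature vectors is not stated in the abstract, so the points here are hypothetical coordinates, and `eps` and `min_pts` are assumed parameters.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps; clusters grow outward from core points."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels
```

Unlike k-means, this approach does not need the number of clusters in advance, which fits a setting where the number of distinct excited states is not known beforehand.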

16.
Machine learning (ML) methods have been present in the field of NMR for decades, but their use has grown tremendously in the last few years, especially thanks to the emergence of deep learning (DL) techniques that take advantage of the increased amounts of data and available computer power. These algorithms are successfully employed for classification, regression, clustering, or dimensionality reduction tasks on large data sets and have been intensively applied in different areas of NMR, including metabonomics, clinical diagnosis, and relaxometry. In this article, we concentrate on the various applications of ML/DL in the areas of NMR signal processing and analysis of small molecules, including automatic structure verification and prediction of NMR observables in solution.

17.
Several machine learning algorithms have recently been applied to modeling the specificity of HIV-1 protease. The problem is challenging because of three issues: (1) datasets with high dimensionality and a small number of samples could misguide classification modeling and its interpretation; (2) symbolic interpretation is desirable because it gives insight into the specificity in the form of human-understandable rules, and thus helps in designing effective HIV inhibitors; (3) the interpretation should take into account complexity or dependency between positions in sequences. Therefore, it is necessary to investigate multivariate and feature-selective methods to model the specificity and to extract rules from the model. We have extensively tested various machine learning methods, and we have found that the combination of neural networks and a decompositional approach can generate a set of effective rules. When validated against experimental results for the HIV-1 protease, the specificity rules outperform those generated by frequency-based, univariate or black-box methods.

18.
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step for biological analysis such as putative cell type identification. However, scRNA-seq data has high dimension and high sparsity, which makes traditional clustering methods less effective at reflecting the similarity between cells. Since the genetic network fundamentally defines the functions of a cell and deep learning shows strong advantages in network representation learning, we propose a novel scRNA-seq clustering framework, ScGSLC, based on graph similarity learning. ScGSLC effectively integrates scRNA-seq data and the protein-protein interaction network into a graph. ScGSLC then employs a graph convolution network to embed the graphs and clusters the cells by the calculated similarity between graphs. Unsupervised clustering results on nine public data sets demonstrate that ScGSLC performs better than the state-of-the-art methods.

19.
The goal of this paper is to present and describe a novel 2D- and 3D-QSAR (quantitative structure-activity relationship) binary classification data set for the inhibition of c-Jun N-terminal kinase-3 (JNK3) with previously unpublished activities for a diverse set of compounds. JNK3 is an important pharmaceutical target because it is involved in many neurological disorders. Accordingly, the development of JNK3 inhibitors has gained increasing interest. 2D and 3D versions of the data set were used, consisting of 313 (70 actives) and 249 (60 actives) compounds, respectively. All compounds for which activity was only determined for the racemate were removed from the 3D data set. We investigated the diversity of the data sets by agglomerative clustering with feature trees and show that the data set contains several different scaffolds. Furthermore, we show that the benchmarks can be tackled with standard supervised learning algorithms with a convincing performance. For the 2D problem, a random decision forest classifier achieves a Matthews correlation coefficient of 0.744; the 3D problem could be modeled with a Matthews correlation coefficient of 0.524 using 3D pharmacophores and a support vector machine. Performance on both data sets was evaluated within a nested 10-fold cross-validation. We therefore suggest that the data set is a reasonable basis for generating QSAR models for JNK3 because of its diverse composition and the performance of the classifiers presented in this study.
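
For readers unfamiliar with the Matthews correlation coefficient quoted above, the following sketch computes it from the four confusion-matrix counts of a binary classifier. The function name and the convention of returning 0 when the denominator vanishes are choices made here, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal sum is zero,
    a common convention for the otherwise undefined case."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Because it uses all four counts, the coefficient stays informative on unbalanced sets such as the 2D data set here (70 actives among 313 compounds), where plain accuracy can look high for a classifier that mostly predicts the majority class.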

20.
A new method of polymer classification is described involving dynamic mechanical analysis of polymer properties as temperature is changed. The method is based on the chemometric analysis of the damping factor (tan delta) as a function of temperature. In this study four polymer groups, namely polypropylene, low density polyethylene, polystyrene and acrylonitrile-butadiene-styrene, each characterised by different grades, were studied. The aim is to distinguish polymer groups from each other. The polymers were studied over a temperature range from -50 degrees C until the minimum stiffness was reached; tan delta values were recorded approximately every 1.5 degrees. Principal components analysis was performed to visualise groupings and also for feature reduction prior to classification and clustering. Several clustering and classification methods were compared, including k-means clustering, hierarchical cluster analysis, linear discriminant analysis, k-nearest neighbours, and class distances using both Euclidean and Mahalanobis measures. It is demonstrated that thermal analysis together with chemometrics provides excellent discrimination, representing a new approach for characterisation of polymers.
