Similar Articles
20 similar articles found
2.
In this paper, we study the classification of unbalanced drug data sets. As an example, we chose a data set of cytochrome P450 2D6 inhibitors. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs and is therefore important in the preclinical drug discovery process. We collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On this data, we built classifiers using machine learning methods. When the classes differ greatly in size, conventional machine learning methods are biased toward the larger class. To overcome this problem and obtain classifiers that are sensitive as well as accurate, we combine machine learning and feature selection methods with techniques for unbalanced classification, such as oversampling and threshold moving. We used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results on our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable effective high-throughput in silico classification of potential drug candidates.
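Oversampling and threshold moving can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names `oversample_minority` and `predict_with_threshold`, the use of random duplication as the oversampling scheme, binary 0/1 labels, and a classifier that outputs a score in [0, 1] are all assumptions.

```python
import random

def oversample_minority(samples, labels, minority=1, seed=0):
    """Randomly duplicate minority-class samples until the two classes are equal in size.

    Assumes binary labels: every label other than `minority` counts as the majority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_missing = (len(labels) - len(minority_idx)) - len(minority_idx)
    if not minority_idx or n_missing <= 0:
        return list(samples), list(labels)  # nothing to duplicate, or already balanced
    extra = [rng.choice(minority_idx) for _ in range(n_missing)]
    return (list(samples) + [samples[i] for i in extra],
            list(labels) + [labels[i] for i in extra])

def predict_with_threshold(scores, threshold=0.5):
    """Threshold moving: call a compound positive (inhibitor) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

# Toy data (hypothetical): four non-inhibitors (0) and one inhibitor (1)
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = oversample_minority(X, y)
print(y_bal.count(0), y_bal.count(1))       # 4 4

# Hypothetical classifier scores; lowering the threshold trades specificity for sensitivity
scores = [0.10, 0.35, 0.45, 0.80]
print(predict_with_threshold(scores))       # [0, 0, 0, 1]
print(predict_with_threshold(scores, 0.3))  # [0, 1, 1, 1]
```

Oversampling changes what the classifier sees during training, while threshold moving leaves training untouched and shifts only the decision rule; the two can be combined.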

3.
In the early diagnosis of lung cancer, polarization microscopy is a powerful tool for obtaining optical information from biological tissues. In this paper, a new microfluidic polarization imaging and analysis method is proposed for detecting and classifying cancer-associated fibroblasts and two kinds of non-small cell lung cancer cells, A549 and H322. A polarization microscopy system was built on a commercial microscope to obtain the 3×3 Mueller matrix of cells. Based on a Mueller matrix decomposition algorithm and on analysis in the spatial and frequency domains, suitable classification parameters were selected to characterize the different polarization properties of the cells. Finally, logistic regression models were applied to determine the optimal feature parameters and classify the cells. The method integrates the morphological information of the cells with their polarization characteristics in different polarization states. This is the first time that polarization microscopic image analysis has been applied to the detection and classification of non-small cell lung cancer cells. The results show that the presented microfluidic polarization microscopic image analysis method can classify cells effectively. Compared with existing Mueller matrix measurement and calculation methods, the proposed method greatly simplifies both the acquisition and the processing and analysis of polarized images.

4.
Dimension reduction is a crucial technique in machine learning and data mining and is widely used in medicine, bioinformatics, and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification of microarray data. In the first stage, a new L1-regularized feature selection method removes irrelevant and redundant features and selects the important features (biomarkers). In the second stage, PLS-based feature extraction is applied to the selected features to extract composite features that best capture the discriminating characteristics for classification. The suitability of the approach is demonstrated in an empirical study on ten widely used microarray datasets, and the results show that it is effective and competitive compared with four state-of-the-art methods. Experimental results on the St. Jude dataset show that the method can be applied effectively to microarray data analysis for subtype prediction and the discovery of gene coexpression.

6.
Skin cancer is the most prevalent cancer, and its assessment remains a challenge for physicians. This study reports the application of an optical sensing method, elastic scattering spectroscopy (ESS), coupled with a classifier that was developed with machine learning, to assist in the discrimination of skin lesions that are concerning for malignancy. The method requires no special skin preparation, is non‐invasive, easy to administer with minimal training, and allows rapid lesion classification. This novel approach was tested for all common forms of skin cancer. ESS spectra from a total of 1307 lesions were analyzed in a multi‐center, non‐randomized clinical trial. The classification algorithm was developed on a 950‐lesion training dataset, and its diagnostic performance was evaluated against a 357‐lesion testing dataset that was independent of the training dataset. The observed sensitivity was 100% (14/14) for melanoma and 94% (105/112) for non‐melanoma skin cancer. The overall observed specificity was 36% (84/231). ESS has potential, as an adjunctive assessment tool, to assist physicians to differentiate between common benign and malignant skin lesions.
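The reported percentages follow directly from the counts in parentheses. A short check using the standard definitions is shown below; the false-negative and false-positive counts (0, 7, and 147) are derived here from the stated ratios, not quoted from the study.

```python
def sensitivity(tp, fn):
    """Fraction of malignant lesions that are correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of benign lesions that are correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)

# Counts implied by the reported ratios on the 357-lesion test set
print(f"melanoma sensitivity: {sensitivity(14, 0):.0%}")     # 100%
print(f"NMSC sensitivity:     {sensitivity(105, 7):.0%}")    # 94%
print(f"specificity:          {specificity(84, 147):.0%}")   # 36%
print(14 + 112 + 231)  # 357, matching the size of the test set
```

The low specificity means roughly two of every three benign lesions would still be flagged, which is consistent with positioning ESS as an adjunctive rather than a standalone tool.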

7.
Preliminary results of a machine learning application to computer-aided molecular design for drug discovery are presented. The machine learning techniques, drawn from artificial intelligence, use a sample of active and inactive compounds, viewed as a set of positive and negative examples, to induce a molecular model characterizing the interaction between the compounds and a target molecule. The algorithm proceeds in two phases. In the first phase (the specialization step), the program identifies the active/inactive pairs of compounds that appear most useful for making the learning process effective and generates a dictionary of molecular fragments deemed responsible for the activity of the compounds. In the second phase (the generalization step), the generated fragments are combined and generalized to select the most plausible hypothesis with respect to the sample of compounds. A knowledge base of physical and chemical properties is used during the inductive process.

8.
Diagnosing breast cancer based on support vector machines (cited 8 times: 0 self-citations, 8 by others)
The support vector machine (SVM) classification algorithm, recently developed in the machine learning community, was used to diagnose breast cancer and was compared with several machine learning techniques currently used in this field. The classification task involves predicting disease state using data obtained from the UCI Machine Learning Repository. Overall, the SVM outperformed k-means clustering and two artificial neural networks. The comparison of several machine learning techniques also suggests that nine samples may be mislabeled.

10.
The combination of 3D pharmacophore fingerprints and the support vector machine classification algorithm has been used to generate robust models that are able to classify compounds as active or inactive in a number of G-protein-coupled receptor assays. The models have been tested against progressively more challenging validation sets where steps are taken to ensure that compounds in the validation set are chemically and structurally distinct from the training set. In the most challenging example, we simulate a lead-hopping experiment by excluding an entire class of compounds (defined by a core substructure) from the training set. The left-out active compounds comprised approximately 40% of the actives. The model trained on the remaining compounds is able to recall 75% of the actives from the "new" lead series while correctly classifying >99% of the 5000 inactives included in the validation set.

11.
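In this paper, a support vector machine combined with a genetic algorithm and the conjugate gradient method (GA-CG-SVM) is used to build a classification model for predicting drug-induced phospholipidosis. The descriptors were first optimized, and 19 descriptors were selected to build the model. The model's prediction accuracy is 81.6% on the training set and 87.5% on the test set, showing that the SVM classification model not only correctly predicts drug-induced phospholipidosis for the training set but also, for other compounds, has...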

12.
A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.

13.
Protein kinases are enzymes that transfer phosphate groups from ATP to specific amino acid residues, thereby regulating the biological activity of proteins. Inhibiting protein kinases with active small molecules therefore plays a significant role in cancer treatment. Computational drug design, especially QSAR modeling, is one of the most economical ways to reduce the time and cost of this search. Here, a hybrid QSAR model is used to distinguish active inhibitors from inactive ones: a genetic algorithm performs dimensionality reduction and the k-nearest neighbor (KNN) method performs classification. To evaluate the proposed model's performance, it was compared with support vector machine and naïve Bayes classifiers. The proposed model performed significantly better than the other QSAR models.
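The abstract gives no implementation details, so the following is only an illustration of the KNN classification step, with descriptor selection represented as a 0/1 mask over descriptors; in the hybrid scheme described, a genetic algorithm would search over such masks. The function `knn_predict`, the mask encoding, Euclidean distance, k = 3, and the toy data are all assumptions.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3, mask=None):
    """Label `query` by majority vote among its k nearest training compounds.

    `mask` is a 0/1 flag per descriptor (hypothetical encoding); in a GA-KNN
    hybrid, a genetic algorithm would search over such masks.
    """
    if mask is None:
        mask = [1] * len(query)  # use every descriptor
    def distance(a, b):
        # Euclidean distance restricted to the selected descriptors
        return math.sqrt(sum((u - v) ** 2 for u, v, m in zip(a, b, mask) if m))
    nearest = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy descriptors (hypothetical): only the first descriptor separates the classes
train_X = [[1.0, 9.0], [0.9, 9.5], [1.2, 8.5], [5.0, 0.0], [5.2, 0.5], [4.8, 0.2]]
train_y = ["active"] * 3 + ["inactive"] * 3
query = [1.1, 0.0]
print(knn_predict(train_X, train_y, query))               # inactive (misled by descriptor 2)
print(knn_predict(train_X, train_y, query, mask=[1, 0]))  # active
```

The toy example shows why descriptor selection matters for KNN: an uninformative descriptor with a large spread can dominate the distance and flip the vote.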

14.
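Traditional methods for detecting citrus Huanglongbing (HLB) suffer from low accuracy and poor stability. This paper proposes a near-infrared (NIR) method for identifying citrus Huanglongbing based on least angle regression combined with a kernel extreme learning machine (LAR-KELM(RBF)). The spectral data are first preprocessed with a wavelet transform; spectral wavelengths are then selected with the least angle regression (LAR) algorithm; finally, the samples are classified with the kernel extreme learning machine (KELM(RBF)). The performance of the LAR-KELM(RBF) algorithm was verified on NIR spectra of citrus leaves: its highest classification accuracy was 99.91%, with a standard deviation of 0.11. Experiments with training sets of different sizes show that the LAR-KELM(RBF) model achieves higher classification accuracy and greater stability than the extreme learning machine (ELM), the waveform-superposition extreme learning machine (SWELM), a two-layer back-propagation neural network (BP(2-layer)), KELM(RBF), and support vector machine (SVM) models, and that it can be widely applied to detecting and identifying citrus Huanglongbing.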

15.
Rotational-resonance magic-angle spinning NMR experiments are frequently used to measure dipolar couplings and to determine internuclear distances. So far, most measurements have been performed on samples containing isolated spin pairs. Thus, extensive structure elucidation, for example in biomolecules, requires the preparation of a whole set of doubly labeled samples. Here, we describe the analysis of the rotational-resonance polarization-exchange curves obtained from a single, uniformly labeled sample. It is shown experimentally that, at a magnetic field of 14.09 T, the rotational-resonance conditions in uniformly ¹³C-labeled threonine are sufficiently narrow to permit the measurement of five distances between the four carbon spins with an accuracy of better than 10%. The polarization-exchange curves are analyzed using a modified two-spin model consisting of the two active spins. The modified model includes an additional offset in the final polarization, which comes from the coupling to the additional, passive, spins. The validity of this approach is experimentally verified for uniformly ¹³C-labeled threonine. The broader applicability of such a model is demonstrated by numerical simulations which quantify the errors as a function of the most relevant parameters in the spin system.

17.
In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show model performance as a function of the training sample size and can help determine the sample size needed to train good classifiers. However, building a good model is not enough: its performance must also be proven. We discuss learning curves for typical small sample size situations with 5–25 independent samples per class. Although the classification models achieve acceptable performance, the learning curve can be completely masked by the random testing uncertainty caused by the equally limited test sample size. Consequently, we determine the test sample sizes necessary to achieve reasonable precision in the validation and find that 75–100 samples will usually be needed to test a good but not perfect classifier. Such a data set then allows refined sample size planning on the basis of the achieved performance. We also demonstrate how to calculate the sample sizes necessary to show the superiority of one classifier over another: this often requires hundreds of statistically independent test samples or is even theoretically impossible. We demonstrate our findings with a data set of ca. 2550 Raman spectra of single cells (five classes: erythrocytes, leukocytes and three tumour cell lines BT-20, MCF-7 and OCI-AML3) as well as by an extensive simulation that allows precise determination of the actual performance of the models in question.
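The effect of test sample size on validation precision can be illustrated with a binomial confidence interval for observed accuracy. The sketch below uses the Wilson score interval, which is chosen here for illustration; the abstract does not state which interval or sample size procedure the authors used. It assumes statistically independent test samples, as the text stresses.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Same observed accuracy (90%), different numbers of independent test samples
for correct, n in ((9, 10), (45, 50), (90, 100)):
    low, high = wilson_interval(correct, n)
    print(f"n={n:3d}: accuracy 95% CI {low:.2f} to {high:.2f}")
```

With 90% observed accuracy, the interval spans roughly 0.60 to 0.98 for 10 test samples but roughly 0.83 to 0.94 for 100, which illustrates why small test sets can mask differences along a learning curve.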

18.
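Support vector machines, artificial neural networks, regularized logistic regression, and k-nearest neighbors were each used to build activity classification models for 761 dihydrofolate reductase (DHFR) inhibitors. Constitutional and topological descriptors were used to characterize the molecular structure and physicochemical properties of the inhibitors, the training set was designed with the Kennard-Stone method, and variable selection was performed with Metropolis Monte Carlo simulated annealing. The results show that the support vector machine outperforms the other machine learning methods; the best model gives good predictions, with a prediction accuracy of 91.62%. This indicates that, with appropriate training set design and variable selection, the support vector machine can be used effectively to predict the activity class of DHFR inhibitors.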
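Kennard-Stone training set design picks samples that cover descriptor space evenly: it starts from the two most distant samples and then repeatedly adds the sample farthest from its nearest already-selected neighbour. A minimal sketch follows, assuming Euclidean distance in descriptor space; the function name `kennard_stone` and the toy points are hypothetical, and the abstract does not say how the authors configured the method.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of `n_select` points chosen by the Kennard-Stone algorithm."""
    if n_select < 2 or n_select > len(points):
        raise ValueError("n_select must be between 2 and len(points)")
    # Step 1: start with the two most distant points.
    first, second = max(
        ((i, j) for i in range(len(points)) for j in range(i + 1, len(points))),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(len(points)) if i not in selected]
    # Step 2: repeatedly add the point whose nearest selected point is farthest away.
    while len(selected) < n_select:
        best = max(remaining, key=lambda r: min(math.dist(points[r], points[s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy one-dimensional descriptor values (hypothetical)
points = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
print(kennard_stone(points, 3))  # [0, 2, 3]
```

The selected indices form the training set and the rest the test set; because selection is driven only by descriptors, the training set spans the chemical space regardless of activity class.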


Copyright © Beijing Qinyun Technology Development Co., Ltd. | ICP registration: 京ICP备09084417号