Similar Documents
20 similar documents found (search time: 78 ms)
1.
2.
3.
With the use of atomic and nuclear methods to analyze samples for a multitude of elements, very large data sets have been generated. Because computerized systems make these results easy to obtain, the elemental data acquired are not always checked as thoroughly as they should be, leading to some, if not many, bad data points. It is advantageous to have some feeling for the trouble spots in a data set before it is used for further studies. A technique with the ability to identify bad data points after the data have been generated is classical factor analysis. Its ability to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors.
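A minimal sketch of this kind of data-set scan, assuming a factor model underlies the measurements: fit classical factor analysis, reconstruct each measurement from the retained factors, and flag entries with unusually large residuals. The synthetic data, the number of factors, and the 3-sigma cutoff are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Simulated data set: 200 samples x 12 elements driven by 3 latent sources.
scores = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 12))
X = scores @ loadings + 0.05 * rng.normal(size=(200, 12))
X[17, 4] += 8.0  # inject one bad data point (e.g., a transcription error)

# Fit the factor model and reconstruct the data from the retained factors.
fa = FactorAnalysis(n_components=3).fit(X)
X_hat = fa.transform(X) @ fa.components_ + fa.mean_
residuals = X - X_hat

# Flag entries whose residual is far outside the bulk of the data.
z = np.abs(residuals) / residuals.std(axis=0)
for r, c in zip(*np.where(z > 3)):
    print(f"sample {r}, element {c}: residual z-score {z[r, c]:.1f}")
```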

4.
5.
6.
7.
8.
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose-prediction accuracy, while measurement of virtual screening performance has focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein-ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by the symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate-ligand re-docking for pose-prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, gave substantially better pose-prediction performance. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
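The combination of docking with 2D and 3D similarity invites a short illustration. Below is a minimal rank-fusion sketch, assuming three per-compound score arrays and a max-rank fusion rule; the paper does not specify this particular rule, so treat it as one common way to combine methods, not as the authors' protocol.

```python
import numpy as np

def to_ranks(scores: np.ndarray) -> np.ndarray:
    """Convert raw scores (higher = better) to fractional ranks in [0, 1]."""
    positions = np.argsort(np.argsort(-scores))  # 0 = best
    return 1.0 - positions / (len(scores) - 1)

rng = np.random.default_rng(1)
n = 1000                     # screening library size (illustrative)
dock = rng.normal(size=n)    # docking scores
sim2d = rng.normal(size=n)   # 2D molecular similarity to a known active
sim3d = rng.normal(size=n)   # 3D molecular similarity to a known active

# Max-rank fusion: a compound is retained if ANY method ranks it highly.
fused = np.max([to_ranks(dock), to_ranks(sim2d), to_ranks(sim3d)], axis=0)
top_hits = np.argsort(-fused)[:50]
print(top_hits[:10])
```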

9.
《Analytical letters》2012, 45(7): 713-724

Two different sets of data were subjected to distortion by induced systematic errors of types that are common in analytical chemistry. By means of eigenvector projections and a disjoint principal components analysis, it is demonstrated that even gross systematic errors do not significantly influence the classification of the samples.
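A toy version of that experiment, assuming the induced systematic error is a constant offset on one variable: because PCA mean-centers each variable, such an offset is absorbed entirely, which is one reason gross errors of this type leave the classification intact. Data and offset size are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
X[:50] += 2.0  # two classes of samples, separated along all variables

X_distorted = X.copy()
X_distorted[:, 3] += 5.0  # gross systematic error on variable 3

for name, data in [("clean", X), ("distorted", X_distorted)]:
    proj = PCA(n_components=2).fit_transform(data)  # PCA centers internally
    print(name, proj[:3].round(2))
```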

10.
Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. Therefore, in silico modeling of solubility has attracted extensive interest in recent years. Most previous studies have been limited to relatively small data sets with limited diversity, which in turn limits the predictive power of the derived models. In this work, we present a support vector machine model for the binary classification of solubility, taking advantage of the largest known public data set, which contains over 46,000 compounds with experimental solubility. Our model was optimized in combination with a reduction-and-recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating it could be a practical tool to select soluble compounds for screening, purchasing, and synthesizing. Moreover, because it uses completely public resources, our work may serve as a comparative benchmark for solubility classification studies.
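A minimal sketch of such a classifier, assuming a descriptor matrix and binary solubility labels; the univariate filter below is a stand-in for the paper's reduction-and-recombination strategy, and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 100))             # stand-in molecular descriptors
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # soluble (1) / insoluble (0)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),  # keep the 20 most informative descriptors
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```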

11.
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented were performed on a real-world data set comprising biological activity data and some whole-molecule physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one must deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks do. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool lets domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software.
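GTM and HGTM are not available in mainstream Python libraries, so the sketch below uses PCA, one of the paper's own benchmarks, purely to illustrate the projection-and-color-by-activity workflow; the descriptors and activity labels are synthetic assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 15))    # whole-molecule physicochemical properties
active = X[:, 0] + X[:, 1] > 1.0  # activity against one target

proj = PCA(n_components=2).fit_transform(X)
plt.scatter(proj[:, 0], proj[:, 1], c=active, cmap="coolwarm", s=12)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Compounds in a 2-D projection, colored by activity")
plt.show()
```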

12.
13.
¹⁹F NMR-based methods have found utility in activity-based screening assays. However, because enzymes catalyze a diverse set of reactions, a large variety of fluorinated substrates would need to be identified to target each one separately. We have developed a more streamlined approach that is applicable to many enzymes that use ATP as a substrate. In this method, a fluorine-containing ATP analogue, 2-fluoro-ATP, is used to monitor the reaction. Applications are described for nicotinamide adenine dinucleotide synthetase and 3-phosphoinositide-dependent kinase-1. Fragment screening results for the latter indicate that this technique can identify compounds that inhibit as well as activate reactions. The present results, together with previous biochemical studies from other laboratories, show that 2-fluoro-ATP can serve as a substrate for nine enzymes representative of three of the six enzyme classes, namely the transferases, hydrolases, and ligases. This suggests that 2-fluoro-ATP is suitable as a universal tool for screening ATP-requiring enzymes. Importantly, 2-fluoro-ATP has been determined to be a valid substrate for a variety of kinases, including both small-molecule and protein kinases, suggesting that it may be useful for investigating the large number of pharmaceutically relevant kinases.

14.
15.
As the use of high-throughput screening systems becomes more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of resulting data. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to a manageable size, while clustering enables the detection of natural groups in the reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. This process, designed primarily to handle data sets much smaller than the ones currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it ignores all screening information related to compounds with activity below the chosen threshold and thus, in the best case, can only hope to discover a subset of the knowledge available in the screening data sets. To address the deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe this new approach in detail and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.
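The contrast between the two analysis styles can be made concrete. The sketch below, under illustrative assumptions about fingerprints and activity values, first clusters only the above-threshold actives (the conventional route) and then clusters the full set so that sub-threshold information is retained, which is the deficiency the authors' method addresses. It is not the authors' algorithm, only a schematic of the difference.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
fp = rng.integers(0, 2, size=(400, 64)).astype(bool)  # binary fingerprints
activity = rng.random(400)                            # normalized readout

# Conventional route: threshold first, cluster the surviving actives only.
actives = fp[activity > 0.8]
Z = linkage(pdist(actives, metric="jaccard"), method="average")
labels_actives = fcluster(Z, t=8, criterion="maxclust")

# Whole-set route: cluster everything, then inspect per-cluster activity,
# so information below the threshold is not discarded.
Z_all = linkage(pdist(fp, metric="jaccard"), method="average")
labels_all = fcluster(Z_all, t=20, criterion="maxclust")
for k in range(1, 21):
    members = activity[labels_all == k]
    print(f"cluster {k:2d}: n={members.size:3d}  mean activity={members.mean():.2f}")
```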

16.
17.
高旭红, 文孟良, 曹槐, 李一青, 李铭刚 《化学学报》2006, 64(11): 1163-1168
The diagnosis and treatment of tumors are of great importance to human health today. In the search for antitumor antibiotics among the secondary metabolites of actinomycetes, screening is the key step for identifying target strains from the large number and wide variety of soil actinomycete strains. Chemometric methods were applied to the analysis of large-scale screening data for actinomycetes and their secondary metabolites. First, principal component analysis revealed relationships and specificities in the inhibitory effects of antitumor-active strains on six tumor cell lines. Second, in the analysis of molecular-model screening data for the strains' secondary metabolites, comparative analysis identified metabolites with good tumor inhibition, largely in agreement with the cell-based screening results. Combining the two approaches established a simple and practical analysis scheme that can rapidly identify research-worthy target actinomycete strains with antitumor activity from large-scale screening.
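A minimal sketch of the first chemometric step described above, principal component analysis of a strain-by-cell-line inhibition matrix; the matrix dimensions and values are illustrative assumptions, not the paper's measurements.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
# Rows: actinomycete strains; columns: inhibition rates on 6 tumor cell lines.
inhibition = rng.random(size=(120, 6))

pca = PCA(n_components=2)
scores = pca.fit_transform(inhibition)
print("explained variance:", pca.explained_variance_ratio_.round(2))

# Strains far from the origin in score space have unusual (possibly
# cell-line-specific) inhibition profiles and are candidates for follow-up.
candidates = np.argsort(-np.linalg.norm(scores, axis=1))[:5]
print("candidate strains:", candidates)
```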

18.
A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs, and it aims to evolve a reduced-graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced-graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation, with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and from GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.
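A heavily simplified sketch of the evolutionary search, as a schematic only: real reduced-graph queries are subgraphs matched by subgraph isomorphism, whereas here each "query" is just a set of required binary features, which preserves the fitness idea (present in actives, absent from inactives) while staying self-contained. All data and GA settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_feat = 32
actives = rng.integers(0, 2, size=(50, n_feat))
inactives = rng.integers(0, 2, size=(200, n_feat))

def fitness(query: np.ndarray) -> float:
    """Fraction of actives matched minus fraction of inactives matched."""
    if not query.any():
        return -1.0  # empty query matches everything: worst case
    hit_a = np.all(actives[:, query] == 1, axis=1).mean()
    hit_i = np.all(inactives[:, query] == 1, axis=1).mean()
    return hit_a - hit_i

pop = rng.integers(0, 2, size=(40, n_feat)).astype(bool)
for generation in range(100):
    ranked = np.argsort([-fitness(q) for q in pop])
    parents = pop[ranked[:10]]                     # truncation selection
    children = parents[rng.integers(0, 10, size=30)].copy()
    children ^= rng.random(children.shape) < 0.02  # point mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("features required by the best query:", np.where(best)[0])
```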

19.
The criteria and methods for choosing the number and position of analytical wavelengths (AWL) for the quantitative spectrophotometric analysis of multi-component mixtures by the least squares method were studied by means of five three- and four-component model systems. Comparison of the results of various criteria and selection methods with the data of an exhaustive search of all possible AWL combinations suggests that the sums of squares of the computation-matrix elements, P and P_j, can be recommended as the most efficient criteria. The most efficient selection method is the successive screening of wavelengths which, after being eliminated from the initial AWL set, cause the least changes in the P or P_j values. From the obtained dependences of P and P_j on the number of AWL (n), the value of n can be chosen that provides the greatest reproducibility of the method for a given labour input, taking into account a possible increase in the number of parallel determinations for each AWL. The volume of computation can be reduced by ranking the AWL in the initial set and applying the method of successive screening or exhaustive search so as to reduce the number of AWL combinations. The above criteria and selection methods minimize the effect of experimental errors in the optical densities of mixtures on the analytical results.
[Criteria and algorithms for the selection of analytical wavelengths for the spectrophotometric analysis of multi-component mixtures]
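A minimal sketch of the successive-screening selection under stated assumptions: P is taken as the sum of squared elements of the least-squares computation matrix (A^T A)^(-1) A^T, and at each step the wavelength whose removal degrades P least is discarded. The absorptivity matrix and the stopping size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
A = np.abs(rng.normal(size=(15, 3)))  # 15 wavelengths x 3 components

def P(mat: np.ndarray) -> float:
    """Sum of squares of the elements of the LS computation matrix."""
    return float(np.sum(np.linalg.pinv(mat) ** 2))

kept = list(range(A.shape[0]))
while len(kept) > 6:  # stop at a chosen number of AWL
    # Try removing each remaining wavelength; keep the removal that
    # yields the smallest P (i.e., the least loss of reproducibility).
    trials = [P(A[[i for i in kept if i != j]]) for j in kept]
    kept.pop(int(np.argmin(trials)))

print("selected wavelengths:", kept)
print("P for the selected set:", round(P(A[kept]), 4))
```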

20.
Tong Xia, Zhang Zhimin, Zeng Fanjuan, Fu Chunyan, Ma Pan, Peng Ying, Lu Hongmei, Liang Yizeng 《Chromatographia》2016, 79(19): 1247-1255

A novel algorithm, termed recursive wavelet peak detection (RWPD), is proposed to detect both normal and overlapped peaks in analytical signals. Recursive peak detection is based on continuous wavelet transforms (CWTs), which can be used to obtain initial peak positions even for overlapped peaks. A genetic algorithm (GA) and Gaussian fitting are then used to refine the peak parameters (positions, widths, and heights). Finally, peak areas are calculated by numerical integration. Simulated and ultrahigh-performance liquid chromatography ion trap time-of-flight mass spectrometry (UPLC-IT-TOF-MS) data sets were analyzed with RWPD, MassSpecWavelet, and the peakfit package by Tom O'Haver. The results show that RWPD obtains more accurate positions and smaller relative fitting errors than MassSpecWavelet and peakfit, especially for overlapped peaks. RWPD is a convenient tool for peak detection and deconvolution of overlapped peaks; it has been developed in the R programming language and is available at https://github.com/zmzhang/RWPD.

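A minimal Python sketch of the same three-stage pipeline, with scipy standing in for the authors' R implementation: initial positions from a continuous wavelet transform, per-peak Gaussian refinement (the paper uses a GA plus Gaussian fitting; plain least-squares fitting is used here), and areas by numerical integration. The signal and all parameters are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.optimize import curve_fit
from scipy.signal import find_peaks_cwt

def gauss(x, height, mu, width):
    return height * np.exp(-((x - mu) ** 2) / (2 * width ** 2))

x = np.linspace(0, 100, 1000)
rng = np.random.default_rng(9)
# Two overlapped peaks plus noise.
y = gauss(x, 1.0, 40, 3) + gauss(x, 0.7, 48, 3) + 0.01 * rng.normal(size=x.size)

# Stage 1: initial peak positions from a continuous wavelet transform.
idx = find_peaks_cwt(y, widths=np.arange(10, 80))
print("initial positions:", x[idx].round(1))

# Stage 2: refine position/width/height by Gaussian fitting near each guess.
areas = []
for i in idx:
    lo, hi = max(i - 100, 0), min(i + 100, x.size)
    params, _ = curve_fit(gauss, x[lo:hi], y[lo:hi], p0=[y[i], x[i], 3.0])
    # Stage 3: peak area by numerical integration of the fitted Gaussian.
    areas.append(trapezoid(gauss(x, *params), x))
print("areas:", np.round(areas, 3))
```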
