Similar literature
Found 20 similar documents; search took 62 ms
1.
The procedures necessary to find the appropriate data banks in seeking particular information or data are much less systematic than the way in which the information is stored at some data banks. Based on the information taken from several hundred direct-mailed questionnaires, a conceptual design is proposed for a data base of toxicological data banks relating to other areas such as medicine, pharmacology, biology, chemistry and environmental science. The system (not yet implemented) contains nearly 150 data banks (both computerized and manual) all over the world with data on the type of information, the way to obtain it, its cost, etc.

2.
The tremendous increase of chemical data sets, both in size and number, and the simultaneous desire to speed up the drug discovery process have resulted in an increasing need for a new generation of computational tools that assist in the extraction of information from data and allow for rapid and in-depth data mining. In recent years, visual data mining has become an important tool within the life sciences and drug discovery, with the potential to help prevent data analysis from turning into a bottleneck. In this paper, we present InfVis, a platform-independent visual data mining tool for the visualization, exploration, and analysis of multivariate data sets, aimed at chemists, who usually have little experience with classical data mining tools. InfVis represents multidimensional data sets by using intuitive 3D glyph information visualization techniques. Interactive and dynamic tools such as dynamic query devices allow real-time data set manipulations and support the user in the identification of relationships and patterns. InfVis has been implemented in Java and Java3D and can be run on a broad range of platforms and operating systems. It can also be embedded as an applet in Web-based interfaces. We present examples detailing the analysis of a reaction database that demonstrate how InfVis assists chemists in identifying and extracting hidden information.

3.
The scientific literature is an important source of experimental and chemical structure data. Very often these data have been harvested into smaller or larger data collections, leaving the data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for the purposes of high-quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) the level of a single data point and (2) the level of a collection of data points. The assembly of the database employs a novel “timeline” approach. The workflow is implemented as a software solution and its applicability is demonstrated using the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.

4.
5.
Clustering of gene expression data collected across time is receiving growing attention in the biological literature, since time-course experiments allow one to understand dynamic biological processes and identify genes governed by the same processes. It is believed that genes demonstrating similar expression profiles over time might give informative insight into how the underlying biological mechanisms work. In this paper, we propose a method based on functional data analysis (FNDA) to cluster time-dependent gene expression profiles. Treating the clustering problem in the FNDA setting provides a way to take time dependency into account by using a basis function expansion to describe the partially observed curves. We also discuss how to choose the number of bases in the basis function expansion in FNDA. A synthetic cycle data set and a real data set are used to demonstrate the proposed method, and the proposed and existing approaches are compared using adjusted Rand indices.
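The abstract above compares clusterings by the adjusted Rand index. As a minimal sketch of that index only (the standard Hubert-Arabie form, not the paper's FNDA method; the function name and the handling of the degenerate case are assumptions made here for illustration):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Pairs of objects placed together in both clusterings.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = pairs_a * pairs_b / total if total else 0.0
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        # Degenerate case (e.g. a single cluster in both labelings);
        # returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (together - expected) / (maximum - expected)
```

The index is 1 for identical partitions (regardless of how the clusters are named) and close to 0, or negative, for partitions that agree no better than chance.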

6.
Different strategies of multivariate data analysis are used to interpret a data base from geological samples. Cluster and correspondence analysis are applied to properly classify 34 chemical elements from 10 representative rock samples (volcanic series from Borovitsa, Rhodopa mountains, Bulgaria). Principal components analysis is also used as a display method to visualize the relation between the variables and objects of interest. The multivariate data analysis applied makes it possible to interpret the origin and orogenesis of the samples.

7.
The results of an attempt to construct images from multivariate analytical data are described. The method is based on addition of the features of a pattern in a Fourier polynomial and plotting the resulting summation in polar coordinates. The images formed in this way resemble flies, dragonflies, moths, etc. and can be constructed on a personal computer with graphics and printed on a simple matrix printer.
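The idea of summing features in a Fourier polynomial and plotting the sum in polar coordinates can be sketched as follows. The abstract does not give the exact polynomial, so this sketch assumes an Andrews-type form, f(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ..., and takes the radius as |f(t)|; both choices, and the function names, are assumptions rather than the authors' definition.

```python
import math


def fourier_polynomial(features, t):
    """Andrews-type sum: x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..."""
    total = features[0] / math.sqrt(2)
    for i, x in enumerate(features[1:], start=1):
        k = (i + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        total += x * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return total


def polar_outline(features, n_points=360):
    """Cartesian (x, y) points of the polar curve r = |f(t)|, t in [-pi, pi)."""
    points = []
    for j in range(n_points):
        t = -math.pi + 2 * math.pi * j / n_points
        r = abs(fourier_polynomial(features, t))
        points.append((r * math.cos(t), r * math.sin(t)))
    return points
```

Each sample's feature vector yields one closed outline; samples with similar features give similar shapes, which is what makes such images usable for visual comparison.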

8.
ChemCam is a remote laser-induced breakdown spectroscopy (LIBS) instrument that will arrive on Mars in 2012, on board the Mars Science Laboratory Rover. The LIBS technique is crucial to accurately identify samples and quantify elemental abundances at various distances from the rover. In this study, we compare different linear and nonlinear multivariate techniques to visualize and discriminate clusters in two dimensions (2D) from the data obtained with ChemCam. We have used principal components analysis (PCA) and independent components analysis (ICA) for the linear tools and compared them with the nonlinear Sammon’s map projection technique. We demonstrate that the Sammon’s map gives the best 2D representation of the data set, with optimization values from 2.8% to 4.3% (0% is a perfect representation), together with an entropy value of 0.81 for the purity of the clustering analysis. The linear 2D projections result in three (ICA) and five (PCA) times more stress, and their clustering purity is markedly worse, with entropy values of about 1.8, more than twice that of the Sammon’s map. We show that the Sammon’s map algorithm is faster and gives a slightly better representation of the data set if the initial conditions are taken from the ICA projection rather than the PCA projection. We conclude that the nonlinear Sammon’s map projection is the best technique for combining data visualization and clustering assessment of the ChemCam LIBS data in 2D. PCA and ICA projections onto more dimensions would improve on these numbers at the cost of the intuitive interpretation of the 2D projection by a human operator.
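The "stress" that the abstract reports for each 2D projection can be illustrated with the standard Sammon stress, which is 0 when all pairwise distances are preserved. This is a minimal sketch of the stress measure only, not of the iterative mapping; that the paper's percentages correspond exactly to this formula is an assumption, and the function name is hypothetical.

```python
import math
from itertools import combinations


def sammon_stress(high_dim, low_dim):
    """Sammon stress: sum((d* - d)^2 / d*) / sum(d*) over all point pairs,
    where d* is the original distance and d the projected distance."""
    if len(high_dim) != len(low_dim):
        raise ValueError("both point sets must contain the same number of points")
    weighted_error = 0.0
    scale = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_star = math.dist(high_dim[i], high_dim[j])
        if d_star == 0.0:
            continue  # coincident originals carry no weight (assumed convention)
        d = math.dist(low_dim[i], low_dim[j])
        weighted_error += (d_star - d) ** 2 / d_star
        scale += d_star
    if scale == 0.0:
        raise ValueError("all original points coincide")
    return weighted_error / scale
```

Because each squared error is divided by the original distance, small distances weigh more heavily than in a plain least-squares criterion, so local neighbourhoods are favoured over global layout.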

9.
10.
11.
Tetrahedron Letters, 2003, 44(26): 4813

12.
13.
The effects of normalization and weighting on principal component analysis (p.c.a.) of gas chromatographic data are investigated. The weighting procedure called autoscaling masks the patterns inherent in the data if it is not applied separately to the different groups (classes) of samples (objects) in the data set. When p.c.a. is used unsupervised on objects characterized by variables differing greatly in relative size, logarithmic transformation of raw data seems preferable. This transformation has the ability both to unmask systematic variation in “small” variables, and to retain the data structure, avoiding the problem of closure. The logarithmic transformation also makes the distribution of each variable more normal.
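The preprocessing options contrasted in this abstract can be sketched for a single variable (one column of the data matrix). This is an illustrative sketch under assumptions: autoscaling is taken as mean-centering followed by division by the sample standard deviation, the logarithm is base 10, and all function names are hypothetical.

```python
import math
import statistics


def autoscale(column):
    """Mean-center a variable and scale it to unit sample standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]


def autoscale_by_class(column, labels):
    """Autoscale separately within each class, as the abstract recommends."""
    scaled = [0.0] * len(column)
    for label in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        for i, value in zip(idx, autoscale([column[i] for i in idx])):
            scaled[i] = value
    return scaled


def log_center(column):
    """Log-transform a strictly positive variable, then mean-center it."""
    logs = [math.log10(x) for x in column]
    mean = statistics.fmean(logs)
    return [v - mean for v in logs]
```

Unlike autoscaling, the log transform needs no class labels, which is why it suits the unsupervised case the abstract describes; it does require strictly positive raw values.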

14.
The accuracies of determinations of purity and freezing point based upon cryometric freezes suffer from the scatter of data, from the failure of systems ever to recover from the effects of supercooling, and from complex phenomena that elevate temperature during the later parts of runs. Methods are here proposed for decreasing the errors caused by scatter of data and failure to recover from supercooling. These methods utilize the optical projection of calculated curves upon the actual data and involve a new interpretation of the nature of recovery from supercooling.

15.
16.
17.
18.
19.
20.