Similar literature
 20 similar documents found
1.
Window factor analysis (WFA) is a powerful tool for analyzing evolutionary processes. However, window factor analysis was found to be highly sensitive to the noise in the original data matrix. An error analysis showed that the concentration profiles resolved by conventional window factor analysis are easily distorted by the noise retained by abstract factor analysis (AFA), and a modified algorithm for window factor analysis was proposed. Both simulated and experimental HPLC-DAD data were investigated by the conventional and the improved methods. The results show that the improved method yields less noise-distorted concentration profiles than the conventional method, and that the ability to resolve noisy data sets is greatly enhanced.

2.
This article presents a data analysis method for biomarker discovery in proteomics. In factor analysis-based discriminant models, the latent variables (LVs) are calculated from the response data measured at all employed instrument channels. Since some channels are irrelevant and their responses carry no useful information, the extracted LVs contain mixed information from both useful and irrelevant channels. In this work, clustering of variables (CLoVA) based on unsupervised pattern recognition is suggested as an efficient method to identify the most informative spectral region, which is then used to construct a more predictive multivariate classification model. In the suggested method, the instrument channels (m/z values) are grouped into clusters via a self-organizing map. Subsequently, the spectral data of each cluster are used separately as the input variables of classification methods such as partial least squares-discriminant analysis (PLS-DA) and extended canonical variate analysis (ECVA). The proposed method is evaluated on two experimental data sets (ovarian and prostate cancer). The proposed method is found to distinguish cancerous from healthy samples with much higher sensitivity and selectivity than conventional PLS-DA and ECVA.

3.
Cluster significance analysis is a tool that allows the identification of ‘embedded clusters’ in QSAR datasets. It is successfully applied to an eye irritation data set to show that these data are indeed asymmetric. The method identifies five parameters that form an embedded cluster of eye irritants amongst non-irritants, although full separation is not achieved. This method has considerable potential for identifying possible non-linearity in toxicology data sets and for parameter reduction. It is also shown that this can be obtained relatively quickly: an analysis performed on 100,000 subsets contains the same information as an analysis on 1,000,000 subsets.
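The abstract above does not spell out the computation, so the following is only a minimal sketch of the general idea of testing cluster tightness against random subsets. It assumes tightness is measured by the mean squared pairwise distance; the function names, the toy data and the subset count are all hypothetical and are not taken from the paper.

```python
import random


def mean_squared_distance(points):
    """Mean squared Euclidean distance over all pairs of points."""
    total = 0.0
    pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            pairs += 1
    return total / pairs


def cluster_significance(points, active_idx, n_subsets=2000, seed=0):
    """Fraction of random subsets at least as tight as the active group."""
    rng = random.Random(seed)
    observed = mean_squared_distance([points[i] for i in active_idx])
    size = len(active_idx)
    hits = 0
    for _ in range(n_subsets):
        subset = rng.sample(range(len(points)), size)
        if mean_squared_distance([points[i] for i in subset]) <= observed:
            hits += 1
    return hits / n_subsets


# Invented toy data: four "actives" embedded among scattered "inactives".
points = [(0, 0), (10, 10), (-10, 10), (10, -10), (-10, -10),
          (0.1, 0), (0, 0.1), (0.1, 0.1), (8, -9), (-9, 8)]
actives = [0, 5, 6, 7]
p_value = cluster_significance(points, actives)
```

A small `p_value` suggests that the actives are tighter than chance would explain. Comparing estimates from different values of `n_subsets` is one way to see how many subsets are needed before the estimate stabilises.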

4.
Analytica Chimica Acta, 2004, 515(1): 87-100
The goal of the present work is to analyse the effect of non-informative variables (NIV) in a data set when applying cluster analysis, and to propose a computationally feasible method for detecting and removing these variables. The proposed method uses a genetic algorithm to select the variables that make the presence of groups in the data clear. The procedure has been implemented for use with k-means, with the cluster silhouettes as the fitness function for the genetic algorithm. The main problem that can appear when applying the method to real data is that, in general, the real cluster structure (number and composition of the groups) is not known a priori. The work explores the evolution of the silhouette values computed from the clusters built by k-means when non-informative variables are added to the original data set, both for a literature data set and for some simulated data in higher dimension. The procedure has also been applied to real data sets.
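To illustrate why a silhouette-based fitness penalises non-informative variables, here is a minimal sketch that computes the mean silhouette width for a fixed labelling, with and without one added noise variable. It is a simplification: in the abstract above the labels come from k-means and a genetic algorithm searches over variable subsets, whereas here the labels are supplied directly and the data are simulated. All names and values are hypothetical.

```python
import math
import random


def mean_silhouette(points, labels):
    """Mean silhouette width for points with the given cluster labels."""
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            members = [q for j, q in enumerate(points)
                       if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        a = mean_dist.get(labels[i])          # cohesion: own cluster
        others = [d for c, d in mean_dist.items() if c != labels[i]]
        if a is None or not others:
            widths.append(0.0)                # singleton-cluster convention
            continue
        b = min(others)                       # separation: nearest other cluster
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


rng = random.Random(1)
labels = [0] * 10 + [1] * 10
# One informative variable that separates the two groups.
informative = ([[rng.gauss(0, 0.3)] for _ in range(10)]
               + [[rng.gauss(5, 0.3)] for _ in range(10)])
# The same data with one non-informative (pure noise) variable appended.
with_noise = [row + [rng.uniform(-20, 20)] for row in informative]

fitness_clean = mean_silhouette(informative, labels)
fitness_noisy = mean_silhouette(with_noise, labels)
```

In this setup `fitness_noisy` comes out lower than `fitness_clean`, which is the signal a variable-selection search could exploit to discard the noise variable.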

5.
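Correspondence analysis applied to the comprehensive analysis of chemical data on Chinese teas   Cited 4 times (self-citations: 0; by others: 4)
Wu Hailong, Zeng Beiwei. Chinese Journal of Analytical Chemistry (《分析化学》), 1991, 19(4): 456-459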

6.
Spectroscopic imaging techniques provide spatial and spectral information about a sample simultaneously and are finding ever-increasing application in the pharmaceutical industry. Effective extraction of chemical information from imaging data sets is a crucial step during the application of imaging techniques. Multivariate imaging data analysis methods have been reported but few applications of these methods for pharmaceutical samples have been demonstrated. In this study, a bilayer model tablet consisting of avicel, lactose, sodium benzoate, magnesium stearate and red dye was prepared using custom press tooling, and Raman mapping data were collected from a 400 μm × 400 μm area of the tablet surface. Several representative multivariate methods were selected and used in the analysis of the data. Multivariate data analysis methods investigated include principal component analysis (PCA), cluster analysis, direct classical least squares (DCLS) and multivariate curve resolution (MCR). The relative merits and drawbacks of each technique for this application were evaluated. In addition, some practical issues associated with the use of these methods were addressed including data preprocessing, determination of the optimal number of clusters in cluster analysis and the optimization of window size in second derivative calculation.

7.
8.
9.
To bring out the nature of the factors influencing lake water composition, multivariate statistical analysis and trend analysis were performed on the hydrochemical data of the study area, South Chennai. Changes in land-use patterns and settlements along the banks of the lakes alter the quality and quantity of the surface water. In the present study, R‐mode factor analysis and cluster analysis were applied to the geochemical parameters of the water to identify the factors affecting the chemical composition of the lake water. Dendrograms for both seasons give three major clusters, reflecting the groups of unpolluted to moderately polluted, polluted, and heavily polluted stations. The movement of stations from one cluster to another clearly brings out the seasonal variation in the chemical composition of the lake water. The complex hydrochemical data of the surface water were interpreted by condensing them into three major factors. Factor score analysis was used successfully to delineate the stations under study and the role of the contributing factors, and the nature of the factors responsible for the variation in chemical composition of the water has been clearly brought out. Results of trend analysis using ArcGIS clearly indicate that water quality is deteriorating at a faster rate in the eastern part of the study area. Although natural shifts can probably account for some of the variation, it is most likely that human activities play a major role in affecting the water quality on a regional scale. Copyright © 2014 John Wiley & Sons, Ltd.

10.
Parallel factor analysis was used to quantify the relative concentrations of peaks within four-way comprehensive two dimensional liquid chromatography–diode array detector data sets. Since parallel factor analysis requires that the retention times of peaks between each injection are reproducible, a semi-automated alignment method was developed that utilizes the spectra of the compounds to independently align the peaks without the need for a reference injection. Peak alignment is achieved by shifting the optimized chromatographic component profiles from a three-way parallel factor analysis model applied to each injection. To ensure accurate shifting, components are matched up based on their spectral signature and the position of the peak in both chromatographic dimensions. The degree of shift, for each peak, is determined by calculating the distance between the median data point of the respective dimension (in either the second or first chromatographic dimension) and the maximum data point of the peak furthest from the median. All peaks that were matched to this peak are then aligned to this common retention data point. Target analyte recoveries for four simulated data sets were within 2% of 100% recovery in all cases. Two different experimental data sets were also evaluated. Precision of quantification of two spectrally similar and partially coeluting peaks present in urine was as good as or better than 4%. Good results were also obtained for a challenging analysis of phenytoin in waste water effluent, where the results of the semi-automated alignment method agreed with the reference LC–LC MS/MS method within the precision of the methods.
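The shifting step described in the abstract above can be sketched for one chromatographic dimension as follows. This is one possible reading of the text, not the authors' implementation: it assumes the profiles have already been matched across injections, takes the "median data point" to be the middle index of the retention axis, and pads with zeros after shifting. The function name and the toy profiles are hypothetical.

```python
def align_to_common_point(profiles):
    """Shift matched peak profiles so that their maxima share one index.

    The common index is the apex that lies furthest from the middle of
    the retention axis, following one reading of the abstract.
    """
    length = len(profiles[0])
    median_idx = (length - 1) / 2
    # Index of the maximum (apex) of each matched profile.
    apexes = [max(range(length), key=p.__getitem__) for p in profiles]
    # The apex furthest from the median defines the common retention point.
    target = max(apexes, key=lambda idx: abs(idx - median_idx))
    aligned = []
    for profile, apex in zip(profiles, apexes):
        shift = target - apex
        shifted = [0.0] * length          # zero padding is an assumption
        for i, value in enumerate(profile):
            j = i + shift
            if 0 <= j < length:
                shifted[j] = value
        aligned.append(shifted)
    return aligned, target


# Invented profiles of the same component in three injections.
profiles = [
    [0, 1, 5, 1, 0, 0, 0],
    [0, 0, 1, 5, 1, 0, 0],
    [0, 0, 0, 0, 1, 5, 1],
]
aligned, target = align_to_common_point(profiles)
```

Integer shifts with zero padding keep the sketch short. Real data would likely need care at the edges of the retention window.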

11.
The congener profile of samples contaminated with dioxin and dioxin-like compounds allows sources of contamination to be identified. This article studies the statistical methods of congener profile analysis reported in the literature with respect to the reliability of the results obtained. The performance of customary analysis methods with regard to raw data transformation and applied TEF (toxic equivalency factor) values is discussed. In particular, the combination of principal component analysis and k-means clustering is taken as an example and examined in detail. Reasons for the inconsistencies that occur, such as the dependence of results on raw data transformation and the disregard of measurement uncertainty, are described, and it is shown that they also explain inconsistencies in other methods of cluster analysis such as hierarchical cluster analysis and neural networks. It is concluded that these methods cannot be employed to reach court-proof decisions, i.e. decisions which meet court evidentiary standards. An alternative approach to analyzing congener profiles based on mathematical statistics is briefly presented, allowing reliable, court-proof decisions.
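The dependence of distance-based results on raw data transformation, mentioned in the abstract above, can be seen in a toy example. The sketch below uses invented three-congener concentrations and compares nearest neighbours under raw values and under normalisation to relative profiles. The sample names, values and helper functions are hypothetical and do not come from the article.

```python
import math


def normalize(profile):
    """Convert absolute concentrations to relative contributions (sum = 1)."""
    total = sum(profile)
    return [x / total for x in profile]


def nearest(name, samples):
    """Name of the sample closest to `name` by Euclidean distance."""
    return min((other for other in samples if other != name),
               key=lambda other: math.dist(samples[name], samples[other]))


# Invented concentrations: B has the same pattern as A at 10x the level,
# C has a similar level to A but a different pattern.
raw = {
    "A": [10.0, 20.0, 70.0],
    "B": [100.0, 200.0, 700.0],
    "C": [12.0, 25.0, 60.0],
}
relative = {name: normalize(values) for name, values in raw.items()}

neighbour_raw = nearest("A", raw)
neighbour_relative = nearest("A", relative)
```

On the raw values A is closest to C, while on the relative profiles it is closest to B. Any clustering built on these distances can therefore group the samples differently depending on the transformation chosen.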

12.
13.
In environmental chemistry studies, it may be necessary to analyze data sets constituted by different blocks of variables, possibly of different types, measured on the same samples. Multiple factor analysis (MFA) is presented as a tool for exploring such data. The most important features of MFA are shown on a real environmental data set, consisting of two blocks of data, namely heavy metals and polycyclic aromatic hydrocarbons, measured for sediment samples. They are discussed and compared to principal component analysis (PCA). The usefulness of the weighting scheme used in MFA as a preprocessing step for other chemometric methods, such as clustering, is also highlighted.

14.
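A novel factor analysis algorithm based on the immune algorithm   Cited 3 times (self-citations: 3; by others: 0)
Based on the basic idea of the immune algorithm, a new immune principal component analysis (IPCA) method is proposed. The method applies the elimination of antigens by antibodies, as used in the immune algorithm, to the orthogonal decomposition of a two-dimensional data matrix, which yields the eigenvalues and eigenvectors of the matrix. The results show that IPCA and conventional principal component analysis give essentially the same results for simulated HPLC-DAD signals. Resolution of experimental HPLC-DAD signals shows that IPCA combined with window factor analysis has stronger resolving power than conventional WFA.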

15.
A new method, using a combination of 4D-molecular similarity measures and cluster analysis to construct optimum QSAR models, is applied to a data set of 150 chemically diverse compounds to build optimum blood-brain barrier (BBB) penetration models. The complete data set is divided into subsets based on 4D-molecular similarity measures using cluster analysis. The compounds in each cluster subset are further divided into a training set and a test set. Predictive QSAR models are constructed for each cluster subset using the corresponding training sets. These QSAR models best predict test set compounds that are assigned, based on the 4D-molecular similarity measures, to the same cluster subset from which the models are derived. The results suggest that the specific properties governing blood-brain barrier permeability may vary across chemically diverse compounds. Partitioning compounds into chemically similar classes is essential to constructing predictive blood-brain barrier penetration models that embed the key physicochemical properties of a given chemical class.

16.
A new method of imputation for left‐censored datasets is reported. This method is evaluated by examining datasets in which the true values of the censored data are known so that the quality of the imputation can be assessed both visually and by means of cluster analysis. Its performance in retaining certain data structures on imputation is compared with that of three other imputation algorithms by using cluster analysis on the imputed data. It is found that the new imputation method benefits a subsequent model‐based cluster analysis performed on the left‐censored data. The stochastic nature of the imputations performed in the new method can provide multiple imputed sets from the same incomplete data. The analysis of these provides an estimate of the uncertainty of the cluster analysis. Results from clustering suggest that the imputation is robust, with smaller uncertainty than that obtained from other multiple imputation methods applied to the same data. In addition, the use of the new method avoids problems with ill‐conditioning of group covariances during imputation as well as in the subsequent clustering based on expectation–maximization. The strong imputation performance of the proposed method on simulated datasets becomes more apparent as the groups in the mixture models are increasingly overlapped. Results from real datasets suggest that the best performance occurs when the requirement of normality of each group is fulfilled, which is the main assumption of the new method. Copyright © 2013 John Wiley & Sons, Ltd.

17.
With the use of atomic and nuclear methods to analyze samples for a multitude of elements, very large data sets have been generated. Because these results are so easy to obtain with computerized systems, the elemental data acquired are not always checked as thoroughly as they should be, leading to some, if not many, bad data points. It is advantageous to have some feeling for the trouble spots in a data set before it is used for further studies. A technique that can identify bad data points after the data have been generated is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it in order to isolate errors confidently.

18.
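Study of the authenticity of cosmetics by cluster analysis   Cited 7 times (self-citations: 1; by others: 6)
Feng Jianyue, Zhu Jinyan. Chinese Journal of Analytical Chemistry (《分析化学》), 1996, 24(1): 104-108
In this work, the volatile components of the cosmetic cream "永芳F真珠膏" were collected with activated-carbon adsorption wires and determined on a GC-9A gas chromatograph. R-mode cluster analysis and factor analysis were carried out on 24 genuine products, and 14 characteristic indices were extracted. A mathematical model for distinguishing genuine from counterfeit "永芳F真珠膏" was then built by Q-mode clustering. The model was validated with 3 counterfeit products and 2 genuine products, with satisfactory results.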

19.
Factor, cluster and self-organizing map analyses were performed for the stability constants of complexes of 24 metal ions and hydrogen with 3960 ligands (15606 values of log K1). Five factors reproduce 89% of the data variability. Both direct clustering and clustering on the basis of factor analysis established the existence of six different classes of similar cations. The similarity series for metal ions and the relative similarity of several ions are discussed, and the Kohonen two-dimensional map, which visually represents the similarity, is presented.

20.
We present a new strategy for analyzing imaging time‐of‐flight SIMS data sets affected by detector saturation. Rather than attempt to correct the measured data to remove saturation, we incorporate the detector behavior into the statistical basis of the analysis. This is performed within the framework of maximum a posteriori reconstruction. The proposed approach has several advantages over previous techniques. No approximations are involved other than the assumed model of the detector. The method performs well even when applied to highly saturated and/or single‐scan data sets. It is statistically rigorous, correctly treating the underlying statistical distribution of the data. It is also compatible with Bayesian methods for incorporating prior knowledge about sample properties. An efficient iterative scheme for solving the proposed equations is presented for the case of the bilinear model commonly used in analyses of SIMS data. The correctness of the approach and its efficacy are demonstrated on synthetic data sets. The method is found to perform better than a widely‐used data‐correction method used in combination with alternating‐least‐squares Multivariate Curve Resolution analysis. Copyright © 2015 John Wiley & Sons, Ltd.
