Similar literature
 Found 10 similar documents; search took 140 ms
1.
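In the preprocessing of NMR-based metabolomics data, the main purpose of scaling is to increase the weight of information from characteristic metabolites and to reduce the influence of noise and irrelevant metabolites, thereby easing subsequent pattern recognition analysis. This paper proposes a new scaling method that does not emphasize bringing all variables to a common scale; instead, working from the raw data, it increases the weights of variables that are highly stable and differ significantly between sample classes, thereby enhancing the information related to characteristic metabolites. The performance of the new method was evaluated on both simulated data and real metabolomics data and compared with commonly used scaling methods such as unit variance scaling (Unit Variance), variable stability scaling (Variable Stability), and level scaling (Level Scaling). The results show that the new method improves the predictive ability of multivariate statistical models, better preserves the molecular information in the NMR spectra, aids the identification of characteristic metabolites, and makes the results of subsequent data analysis more interpretable.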
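
For orientation, two of the comparison methods named in this abstract can be sketched in a few lines. This is a minimal sketch using the textbook definitions of unit variance scaling and level scaling, not the new method the abstract proposes; the function names and the sample intensities are hypothetical.

```python
from statistics import mean, stdev

def unit_variance_scale(column):
    """Unit variance scaling: center on the mean, divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def level_scale(column):
    """Level scaling: center on the mean, divide by the mean."""
    m = mean(column)
    return [(x - m) / m for x in column]

# Hypothetical intensities of one spectral variable across four samples.
intensities = [2.0, 4.0, 6.0, 8.0]
uv = unit_variance_scale(intensities)  # every variable ends up with unit variance
lv = level_scale(intensities)          # values become relative deviations from the mean
```

Unit variance scaling gives every variable the same spread, which can inflate noise-only variables; level scaling expresses each value as a relative change from the mean level.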

2.
In this paper, multivariate calibration of complicated process fluorescence data is presented. Two data sets related to the production of white sugar are investigated. The first data set comprises 106 observations and 571 spectral variables, and the second data set 268 observations and 3997 spectral variables. In both applications, a single response, ash content, is modelled and predicted as a function of the spectral variables. Both data sets contain certain features making multivariate calibration efforts non-trivial. The objective is to show how principal component analysis (PCA) and partial least squares (PLS) regression can be used to overview the data sets and to establish predictively sound regression models. It is shown how a recently developed technique for signal filtering, orthogonal signal correction (OSC), can be applied in multivariate calibration to enhance predictive power. In addition, signal compression is tested on the larger data set using wavelet analysis. It is demonstrated that compression down to 4% of the original matrix size, in the variable direction, is possible without loss of predictive power. It is concluded that the combination of OSC for pre-processing and wavelet analysis for compression of spectral data is promising for future use.

3.
A robust method was developed to cluster similar NMR spectra from partially purified extracts obtained from a range of marine sponges and a plant biota. The NMR data were acquired using microtiter plate NMR (VAST) in protonated solvents. A sample data set which contained several clusters was used to optimize the protocol. The evaluation of the robustness was performed using three different clustering methods: tree clustering analysis, K-means clustering and multidimensional scaling. These methods were compared for consistency using the sample data set and the optimized methodology was applied to clustering of a set of spectra from partially purified biota extracts.

4.
This work describes a quantitative spectroscopic method for the analysis of ternary mixtures of creatine (CER), creatinine (CRE), and uric acid (UA) using multivariate data models based upon ultraviolet spectroscopy. By multivariate calibration methods, such as partial least squares regression, it is possible to obtain a model adjusted to the concentration values of the mixtures used in the calibration range. In this study, the calibration model is based on absorption spectra in the 200–260 nm range for 36 different mixtures of CER, CRE, and UA. The unrelated information was removed by the orthogonal signal correction (OSC) method and the results were validated. Evaluation of the prediction errors for the prediction set reveals that the OSC-treated data give substantially lower root mean square error of prediction (RMSEP) values than the original data. The RMSEP values for CER, CRE, and UA were 1.1686, 0.2195, and 0.3726 with OSC, and 1.9057, 0.3482, and 0.6164 without OSC, respectively. This procedure allows the simultaneous determination of CER, CRE, and UA in synthetic and real samples.
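
The RMSEP figure of merit quoted in this abstract is the square root of the mean squared difference between predicted and reference values over the prediction set. A minimal sketch follows; the function name and the sample values are hypothetical and are not data from the study.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a prediction set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical reference concentrations and model predictions.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
error = rmsep(reference, predicted)
```

RMSEP is expressed in the units of the response, so values for different analytes are comparable only when the concentration ranges are similar.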

5.
Analytica Chimica Acta, 2004, 509(2): 217-227
In near-infrared (NIR) measurements, some physical features of the sample can be responsible for effects like light scattering, which lead to systematic variations unrelated to the studied responses. These errors can disturb the robustness and reliability of multivariate calibration models. Several mathematical treatments are usually applied to remove systematic noise in data, the most common being derivation, standard normal variate (SNV) and multiplicative scatter correction (MSC). New mathematical treatments, such as orthogonal signal correction (OSC) and direct orthogonal signal correction (DOSC), have been developed to minimize the variability unrelated to the response in spectral data. In this work, these two new pre-processing methods were applied to a set of roasted coffee NIR spectra. A separate calibration model was developed by PLS regression for each of two responses, the ash content and the lipid content of roasted coffee samples. The results provided by these correction methods were compared to those obtained with the original data and with the data corrected by derivation, SNV and MSC. For both responses, OSC and DOSC treatments gave PLS calibration models with improved prediction abilities (4.9 and 3.3% RMSEP with corrected data versus 7.1 and 8.3% RMSEP with original data, respectively).
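
Two of the conventional scatter corrections mentioned in this abstract, SNV and MSC, can be sketched as follows. This is a minimal sketch based on their usual definitions (SNV standardizes each spectrum by its own mean and standard deviation; MSC regresses each spectrum on the mean spectrum and removes the fitted offset and slope). It is not the OSC or DOSC procedure, and the function names and toy spectra are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_vars = len(spectra[0])
    ref = [mean(s[j] for s in spectra) for j in range(n_vars)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit: spectrum ~ offset + slope * reference
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected

# Hypothetical spectra: one underlying shape with different offsets and gains.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [base, [2.0 * x + 1.0 for x in base], [0.5 * x - 0.2 for x in base]]
msc_spectra = msc(spectra)
snv_spectra = [snv(s) for s in spectra]
```

In this toy case the spectra differ only by an additive offset and a positive gain, so both corrections map them onto a single common curve.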

6.
Multivariate spectral analysis has been widely applied in chemistry and other fields. Spectral data consisting of measurements at hundreds and even thousands of analytical channels can now be obtained in a few seconds. It is widely accepted that, before a multivariate regression model is built, well-performed variable selection can help to improve the predictive ability of the model. In this paper, the concept of traditional wavelength variable selection is extended and the idea of variable weighting is incorporated into the least-squares support vector machine (LS-SVM). A recently proposed global optimization method, the particle swarm optimization (PSO) algorithm, is used to search for the variable weights and the LS-SVM hyper-parameters, optimizing both the training on a calibration set and the prediction of an independent validation set. The entire computation is automatic. Two real data sets are investigated and the results are compared with those of PLS, uninformative variable elimination-PLS (UVE-PLS) and LS-SVM models to demonstrate the advantages of the proposed method.

7.
1H nuclear magnetic resonance (NMR)-based metabonomics is a well-established technique used to analyse and interpret complex multiparametric metabolic data, and has a wide range of applications in the development of pharmaceuticals. However, interpretation of biological data can be confounded by extraneous variation in the data such as fluctuations in either experimental conditions or in physiological status. Here we have shown the novel application of a data filtering method, orthogonal signal correction (OSC), to biofluid NMR data to minimise the influence of inter- and intra-spectrometer variation during data acquisition, and also to minimise innate physiological variation. The removal of orthogonal variation exposed features of interest in the NMR data and facilitated interpretation of the derived multivariate models. Furthermore, analysis of the orthogonal variation provided an explanation of the systematic analytical/biological changes responsible for confounding the original NMR data.

8.
Biomarker discovery is an important goal in metabolomics. It is typically modeled as selecting the most discriminating metabolites for classification and is often referred to as variable importance analysis or variable selection. A number of variable importance analysis methods for discovering biomarkers in metabolomics studies have been proposed. However, different methods are most likely to generate different variable rankings because they rest on different principles. Each method generates a variable ranking list, much as an expert presents an opinion. The problem of inconsistency between different variable ranking methods is often ignored. To address this problem, a simple and ideal solution is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It merges individual ranking lists into a single “super”-list that reflects the overall preference or importance within the population. This “super”-list is regarded as the final ranking and was used for biomarker discovery and for selecting the variable subset with the highest predictive classification accuracy. Nine methods were used, including three univariate filtering methods and six multivariate methods. When applied to two metabolic datasets (a childhood overweight dataset and a tubulointerstitial lesions dataset), rank aggregation gave considerably higher prediction accuracy than using all variables. Moreover, it also outperformed a penalized method, the least absolute shrinkage and selection operator (LASSO), with higher prediction accuracy or fewer selected variables, which are more interpretable.
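
The idea of merging several ranking lists into one can be illustrated with a Borda count. The abstract does not say which aggregation algorithm the authors used, so the Borda count here is an assumed stand-in, and the metabolite names and rankings are hypothetical.

```python
def borda_aggregate(rankings):
    """Merge best-first rankings of the same items into one list via Borda count."""
    items = rankings[0]
    n = len(items)
    scores = {item: 0 for item in items}
    for ranking in rankings:
        if len(ranking) != n or set(ranking) != set(items):
            raise ValueError("every ranking must order the same items")
        for position, item in enumerate(ranking):
            # The top position earns n - 1 points, the last position earns 0.
            scores[item] += n - 1 - position
    # Highest total score first; ties are broken alphabetically for determinism.
    return sorted(items, key=lambda item: (-scores[item], item))

# Hypothetical rankings of four metabolites from three importance methods.
rankings = [
    ["m1", "m2", "m3", "m4"],
    ["m2", "m1", "m3", "m4"],
    ["m1", "m3", "m2", "m4"],
]
super_list = borda_aggregate(rankings)
```

A positional scheme like this treats every method as an equally trusted expert; weighted or probabilistic aggregation schemes relax that assumption.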

9.
Multivariate methods, such as principal component analysis (PCA) and multivariate curve resolution (MCR), are often employed to aid the analysis of large complex data sets such as time-of-flight secondary ion mass spectrometry (ToF-SIMS) images. There is, however, much confusion over the most appropriate choice of method for any given application and the effects of data preprocessing, which is exacerbated by the confusing terminologies and the use of jargon in this field. In the present study, a simple model system consisting of a ToF-SIMS image of an immiscible polymer blend is used to evaluate PCA and MCR in the accurate identification, localisation and quantification of the phase-separated polymer domains, using four data preprocessing methods (no scaling, normalisation, variance scaling and Poisson scaling). This highlights significant issues and challenges in the quantitative multivariate analysis of mixed organic systems, including the discrimination of chemically significant features from experimental noise, the resolution of weak chemical contributions and potential bias introduced by data preprocessing. Multivariate analysis using Poisson scaling, identified as the most suitable data preprocessing method for both PCA and MCR, demonstrates a marked improvement upon traditional (manual) analysis and provides valuable additional information that is difficult to detect using traditional analysis. Using these results, we present recommendations for the optimum use of multivariate analysis by analysts and provide guidance on selecting the most appropriate methods. Confusing terminology is also clarified. © Crown copyright 2008. Reproduced with the permission of Her Majesty's Stationery Office. Published by John Wiley & Sons, Ltd.
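
Poisson scaling, as it is commonly formulated for count data, divides each matrix element by the square root of the product of its row mean (mean image) and column mean (mean spectrum). The sketch below assumes that formulation; the abstract itself does not spell out the formula, and the function name and toy counts are hypothetical.

```python
import math

def poisson_scale(counts):
    """Scale a pixels-by-masses count matrix by 1 / sqrt(row mean * column mean)."""
    n_rows, n_cols = len(counts), len(counts[0])
    row_means = [sum(row) / n_cols for row in counts]
    col_means = [sum(row[j] for row in counts) / n_rows for j in range(n_cols)]
    scaled = []
    for i, row in enumerate(counts):
        scaled_row = []
        for j, x in enumerate(row):
            weight = row_means[i] * col_means[j]
            # All-zero rows or columns carry no signal; leave them at zero.
            scaled_row.append(x / math.sqrt(weight) if weight > 0 else 0.0)
        scaled.append(scaled_row)
    return scaled

# Hypothetical ion counts: two pixels (rows) by two mass channels (columns).
counts = [[4.0, 16.0], [1.0, 4.0]]
scaled = poisson_scale(counts)
```

The weighting aims to make the noise variance roughly uniform across the matrix, since Poisson noise grows with the expected count.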

10.
Data fusion in multivariate calibration transfer   Cited by: 1 (self-citations: 0, citations by others: 1)
We report the use of stacked partial least-squares regression and stacked dual-domain regression analysis with four commonly used techniques for calibration transfer to improve predictive performance from transferred multivariate calibration models. The predictive performance from three conventional calibration transfer methods that require standards measured on both instruments, piecewise direct standardization (PDS), orthogonal signal correction (OSC) and model updating (MUP), was significantly improved by data fusion, either by stacking of wavelet scales or by stacking of spectral intervals, as demonstrated by transfer of calibrations developed on near-infrared spectra of synthetic gasoline. Stacking did not produce as significant an improvement for calibration transfer using a finite impulse response (FIR) filter, but application of SPLS regression to FIR-transferred spectra improved the predictive performance of the transferred model.
