Similar Articles
 20 similar articles retrieved.
1.
A new method of imputation for left‐censored datasets is reported. This method is evaluated by examining datasets in which the true values of the censored data are known so that the quality of the imputation can be assessed both visually and by means of cluster analysis. Its performance in retaining certain data structures on imputation is compared with that of three other imputation algorithms by using cluster analysis on the imputed data. It is found that the new imputation method benefits a subsequent model‐based cluster analysis performed on the left‐censored data. The stochastic nature of the imputations performed in the new method can provide multiple imputed sets from the same incomplete data. The analysis of these provides an estimate of the uncertainty of the cluster analysis. Results from clustering suggest that the imputation is robust, with smaller uncertainty than that obtained from other multiple imputation methods applied to the same data. In addition, the use of the new method avoids problems with ill‐conditioning of group covariances during imputation as well as in the subsequent clustering based on expectation–maximization. The strong imputation performance of the proposed method on simulated datasets becomes more apparent as the groups in the mixture models are increasingly overlapped. Results from real datasets suggest that the best performance occurs when the requirement of normality of each group is fulfilled, which is the main assumption of the new method. Copyright © 2013 John Wiley & Sons, Ltd.
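The abstract does not detail the imputation algorithm beyond its stochastic character and its normality assumption. As an illustration only, not the authors' method, the sketch below draws imputed values for left-censored entries from a normal distribution truncated above at the detection limit and repeats the draw to obtain multiple imputed sets; the function name, detection limit and distribution parameters are all hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def impute_left_censored(x, lod, mu, sigma, random_state=rng):
    """Replace left-censored entries (NaN, known only to lie below `lod`) with
    random draws from a normal distribution truncated above at the detection limit."""
    x = x.copy()
    censored = np.isnan(x)
    a, b = -np.inf, (lod - mu) / sigma          # truncation bounds in standard-normal units
    x[censored] = truncnorm.rvs(a, b, loc=mu, scale=sigma,
                                size=int(censored.sum()), random_state=random_state)
    return x

# Hypothetical variable with a detection limit of 0.5; NaN marks censored values.
x = np.array([0.8, 1.2, np.nan, 0.9, np.nan, 1.5])
# Repeating the stochastic draw yields multiple imputed sets for uncertainty assessment.
imputed_sets = [impute_left_censored(x, lod=0.5, mu=1.0, sigma=0.4) for _ in range(5)]
```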

2.
Single imputation methods have been widely discussed among researchers in the field of bioinformatics. One major shortcoming of the methods proposed until now is the lack of robustness considerations. Like all data, gene expression data can possess outlying values. The presence of these outliers could have negative effects on the imputed values for the missing values, and the outcome of any statistical analysis on the completed data could then lead to incorrect conclusions. Therefore it is important to consider the possibility of outliers in the data set, and to evaluate how imputation techniques handle these values. In this paper, a simulation study is performed to test existing techniques for data imputation when outlying values are present in the data. To overcome some shortcomings of the existing imputation techniques, a new robust imputation method that can deal with the presence of outliers in the data is introduced. In addition, the robust imputation procedure cleans the data for further statistical analysis. Moreover, this method can easily be extended towards a multiple imputation approach by which the uncertainty of the imputed values is emphasised. Finally, a classification example illustrates the lack of robustness of some existing imputation methods and shows the advantage of the multiple imputation approach of the new robust imputation technique.

3.
In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic, as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to show the best possible statistical analysis. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the outcome of the log-ratio analysis provided better separation between the pure and adulterated data, allowed for easier interpretability of the results and gave a higher accuracy of classification. Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
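The study does not state which log-ratio transformation was used; the centred log-ratio (clr) shown below is a common CoDa choice and is assumed purely for illustration, with a hypothetical compositional matrix. The sketch contrasts PCA on clr-transformed data with PCA on the raw closed data.

```python
import numpy as np
from sklearn.decomposition import PCA

def clr(X):
    """Centred log-ratio transform: log of each part relative to the row geometric mean."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

# Hypothetical compositional table (samples x chemical elements), strictly positive
# and closed to a constant sum; non-detects are assumed to have been replaced already.
rng = np.random.default_rng(4)
X = rng.dirichlet(alpha=np.ones(6) * 5, size=40)

scores_coda = PCA(n_components=2).fit_transform(clr(X))   # CoDa-style PCA on log-ratios
scores_raw  = PCA(n_components=2).fit_transform(X)        # classical PCA on the raw closed data
```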

4.
Missing values can arise for different reasons, and depending on their origin they should be considered and dealt with in different ways. In this research, four methods of imputation have been compared with respect to their effects on the normality and variance of the data, on statistical significance and on the approximation of a suitable threshold for accepting missing data as truly missing. Additionally, the effects of different strategies for controlling the familywise error rate or false discovery rate, and how they interact with the different strategies for missing value imputation, have been evaluated. Missing values were found to affect the normality and variance of the data, and k‐means nearest neighbour imputation was the best method tested for restoring these. Bonferroni correction was the best method for maximizing true positives and minimizing false positives, and it was observed that as little as 40% missing data could be truly missing. The range between 40 and 70% missing values was defined as a "gray area", and a strategy has therefore been proposed that balances the optimal imputation strategy, k‐means nearest neighbour, against the best approximation for positioning real zeros.
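As a rough sketch of the workflow described, nearest-neighbour imputation followed by familywise error control, the snippet below uses scikit-learn's KNNImputer as a stand-in for the k-means nearest neighbour imputation discussed and applies a Bonferroni adjustment; the data matrix and p-values are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical data matrix (samples x variables) with NaN marking missing values.
rng = np.random.default_rng(3)
X = rng.normal(loc=10.0, size=(50, 8))
X[rng.random(X.shape) < 0.2] = np.nan

# Nearest-neighbour imputation (scikit-learn's KNNImputer, used here only as a
# stand-in for the nearest-neighbour imputation discussed in the abstract).
X_imp = KNNImputer(n_neighbors=5).fit_transform(X)

# Bonferroni correction: test each of m hypotheses at alpha / m
# (equivalently, multiply each p-value by m and cap at 1).
p_values = np.array([0.001, 0.02, 0.04, 0.30])
m, alpha = p_values.size, 0.05
significant = p_values < alpha / m
p_adjusted = np.minimum(p_values * m, 1.0)
```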

5.
Information retrieved from UV–visible spectroscopic data by application of a self-modelling factor analysis algorithm showed apparently systematically shifted thermodynamic properties for the same chemical system as a function of spectral slit width. This empirical observation triggered a systematic investigation into the likely effects of residual and spectral correlation on the numerical results of quantitative spectroscopic investigations. If slit width were a nuisance factor, it would reduce the comparability of information evaluated from spectroscopic data. The influence of spectral slit width was investigated by simulation, i.e. by generating and evaluating synthetic spectra with known properties. The simulations showed that increasing spectral correlation may introduce bias into factor analysis evaluations. By evaluation of the complete measurement uncertainty budget using threshold bootstrap target factor (TB CAT) analysis, the apparent shifts are insignificant relative to the total width of the quantity's measurement uncertainty. Increasing the slit width causes some systematic effects, for example broadening of the registered spectral bands and reduction of spectral noise because of the higher light intensity passing to the detector. Hence, the observed systematic shifts in mean values might be caused by some latent correlation. As a general conclusion, slit width does not affect bias. However, the simulations show that spectral correlation and residual correlation may cause bias. Residual correlation can be taken into account by computer-intensive statistical methods, for example moving block or threshold bootstrap analysis. Spectral correlation is a property of the chemical system under study and cannot be manipulated. As a major result, evidence is given showing that stronger spectral correlation (r < –0.7) causes non-negligible bias in the evaluated thermodynamic information from such a system.

6.
On the Statistical Calibration of Physical Models
We introduce a novel statistical calibration framework for physical models, relying on probabilistic embedding of model discrepancy error within the model. For clarity of illustration, we take the measurement errors out of consideration, calibrating a chemical model of interest with respect to a more detailed model, considered as "truth" for the present purpose. We employ Bayesian statistical methods for such model‐to‐model calibration and demonstrate their capabilities on simple synthetic models, leading to a well‐defined parameter estimation problem that employs approximate Bayesian computation. The method is then demonstrated on two case studies for calibration of kinetic rate parameters for methane air chemistry, where ignition time information from a detailed elementary‐step kinetic model is used to estimate rate coefficients of a simple chemical mechanism. We show that the calibrated model predictions fit the data and that uncertainty in these predictions is consistent in a mean‐square sense with the discrepancy from the detailed model data.
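The abstract names approximate Bayesian computation for the model-to-model calibration. A minimal rejection-ABC sketch under heavily simplified assumptions is given below: a hypothetical one-parameter reduced model is calibrated so that its ignition-delay summary matches a value taken to come from the detailed model. The prior, tolerance and model form are all illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "detailed model" summary statistic: an observed ignition delay (s).
tau_obs = 1.2e-3

def simple_model(k):
    # Hypothetical reduced model: ignition delay inversely proportional to a rate coefficient k.
    return 1.0 / k

# Rejection ABC: draw k from the prior, keep draws whose simulated summary
# falls within a tolerance of the detailed-model summary.
prior_draws = rng.uniform(1e2, 1e4, size=100_000)
tol = 5e-5
accepted = prior_draws[np.abs(simple_model(prior_draws) - tau_obs) < tol]

print(accepted.mean(), accepted.std())   # approximate posterior mean and spread of k
```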

7.
By means of clustering, one is able to manage large databases easily. Clustering according to structural similarity distinguished the several chemical classes that were present in our training set. All the clusters showed correlation of log WS with log K(OW) and melting point, except EINECS-cluster 1. This cluster contains only chemicals with melting points below room temperature, resulting in a log WS–log K(OW) relationship. The observed weak correlation for this cluster is probably due to the insufficient number of available screens. Such a limited number of screens allows relatively very different chemicals to share the same cluster. Using statistical criteria, our approach resulted in three QSARs with reasonably good predictive capabilities, originating from clusters 1639, 3472, and 5830. The models resulting from the smaller clusters 6873, 8154, and 16424 are characterised by high correlation coefficients which describe the clusters themselves very well but, owing to our stringent bootstrap criterion, are close to randomness. Clusters 6815 and 18083 showed rather low correlations. The models originating from clusters 1639, 3472, and 5830 proved their usefulness by external validation. The log WS values calculated with our QSARs agreed within 1 log unit with those reported in the literature.
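As an illustration of the QSAR form implied by the abstract, log WS regressed on log K(OW) and melting point, the sketch below fits a multiple linear regression by ordinary least squares; the data and coefficients are hypothetical, not those of the reported models.

```python
import numpy as np

# Hypothetical cluster data: log K(OW), melting point (deg C) and log WS for a set of chemicals.
logKow = np.array([1.5, 2.3, 3.1, 4.0, 4.8, 5.5])
mp     = np.array([25., 60., 90., 120., 150., 180.])
logWS  = np.array([-1.2, -2.4, -3.5, -4.8, -5.9, -7.0])

# Fit the QSAR form  log WS = a + b*log K(OW) + c*MP  by ordinary least squares.
A = np.column_stack([np.ones_like(logKow), logKow, mp])
coef, *_ = np.linalg.lstsq(A, logWS, rcond=None)
a, b, c = coef

residuals = logWS - A @ coef
print(a, b, c, np.max(np.abs(residuals)))   # fitted coefficients and largest residual (log units)
```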

8.
Recent years have seen the introduction of many surface characterization instruments and other spectral imaging systems that are capable of generating data in truly prodigious quantities. The challenge faced by the analyst, then, is to extract the essential chemical information from this overwhelming volume of spectral data. Multivariate statistical techniques such as principal component analysis (PCA) and other forms of factor analysis promise to be among the most important and powerful tools for accomplishing this task. In order to benefit fully from multivariate methods, the nature of the noise specific to each measurement technique must be taken into account. For spectroscopic techniques that rely upon counting particles (photons, electrons, etc.), the observed noise is typically dominated by 'counting statistics' and is Poisson in nature. This implies that the absolute uncertainty in any given data point is not constant, rather, it increases with the number of counts represented by that point. Performing PCA, for instance, directly on the raw data leads to less than satisfactory results in such cases. This paper will present a simple method for weighting the data to account for Poisson noise. Using a simple time‐of‐flight secondary ion mass spectrometry spectrum image as an example, it will be demonstrated that PCA, when applied to the weighted data, leads to results that are more interpretable, provide greater noise rejection and are more robust than standard PCA. The weighting presented here is also shown to be an optimal approach to scaling data as a pretreatment prior to multivariate statistical analysis. Published in 2004 by John Wiley & Sons, Ltd.
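The paper's exact weighting is not reproduced here. A commonly used Poisson scaling, assumed below for illustration, divides each matrix element by the square root of the product of its row and column means, which approximately equalises the noise variance across the count matrix before PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical counts matrix D (pixels x mass channels); entries are Poisson counts.
rng = np.random.default_rng(1)
D = rng.poisson(lam=5.0, size=(500, 100)).astype(float)

# Assumed Poisson weighting: divide each element by the square root of the product
# of its row mean and column mean, so the noise variance becomes roughly uniform.
row_mean = D.mean(axis=1, keepdims=True)
col_mean = D.mean(axis=0, keepdims=True)
D_weighted = D / np.sqrt(row_mean * col_mean + 1e-12)

# PCA on the weighted data; compare with PCA on the raw counts.
scores_weighted = PCA(n_components=5).fit_transform(D_weighted)
scores_raw      = PCA(n_components=5).fit_transform(D)
```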

9.
Thermodynamic data are a suitable subject for investigating strategies and concepts for the evaluation of complete measurement uncertainty budgets in situations where the measurand cannot be expressed by a mathematical formula. Suitable approaches include the various forms of Monte Carlo simulation in combination with computer-intensive statistical methods directed at the evaluation of empirical distribution curves for the uncertainty budget. The basis of the analysis is a cause-and-effect diagram. Some experience is available with cause-and-effect analysis of thermodynamic data derived from spectrophotometric data. Another important technique for the evaluation of thermodynamic data is glass-electrode potentiometry. On the basis of a newly derived cause-and-effect diagram, a complete measurement uncertainty budget for the determination of the acidity constants of phosphoric acid by glass-electrode potentiometry is derived. A combination of Monte Carlo and bootstrap methods is applied in conjunction with the commercially available code SUPERQUAD. The results suggest that glass-electrode potentiometry may achieve a high within-laboratory precision, whereas major uncertainty contributions become evident only via interlaboratory comparisons. This finding is further underscored by analysing available literature data.

10.
Observed data often belong to some specific intervals of values (for instance in case of percentages or proportions) or are higher (lower) than pre‐specified values (for instance, chemical concentrations are higher than zero). The use of classical principal component analysis (PCA) may lead to extract components such that the reconstructed data take unfeasible values. In order to cope with this problem, a constrained generalization of PCA is proposed. The new technique, called bounded principal component analysis (B‐PCA), detects components such that the reconstructed data are constrained to belong to some pre‐specified bounds. This is done by implementing a row‐wise alternating least squares (ALS) algorithm, which exploits the potentialities of the least squares with inequality (LSI) algorithm. The results of a simulation study and two applications to bounded data are discussed for evaluating how the method and the algorithm for solving it work in practice. Copyright © 2007 John Wiley & Sons, Ltd.

11.
Medieval glasses, including feet and rims of chalices and fragments of lamps and globular bottles, from the archaeological site of Siponto (Foggia, Italy), were analyzed by Inductively Coupled Plasma Emission Spectroscopy and Graphite Furnace Atomic Absorption Spectroscopy in order to investigate and define glass production technology in Apulia (Italy) in the Middle Ages, given the poor understanding currently available of both the compositional and the technological features of medieval glass items. The examined finds, whether colourless or coloured blue, yellow-green, yellow, pink or red, revealed a typical silica-soda-lime composition. The chemical analysis and the statistical treatment of the data made it possible to trace the former, flux, modifier and, where present, the element responsible for colour, clarifying production technology issues. It was also possible to identify objects obtained by recycling cullet or finished items. Finally, this work evaluates the effectiveness of multivariate statistical treatment by Principal Component Analysis (PCA), Cluster Analysis (CA) and Factor Analysis (FA) of compositional data for obtaining technological information, in contrast to the conventional binary oxide diagrams, which represent the most common and widely assessed archaeometric practice for obtaining technological information from compositional data.

12.
Calibration of forcefields for molecular simulation should account for the measurement uncertainty of the reference dataset and for the model inadequacy, i.e., the inability of the force-field/simulation pair to reproduce experimental data within their uncertainty range. In all rigour, the resulting uncertainty of calibrated force-field parameters is a source of uncertainty for simulation predictions. Various calibration strategies and calibration models within the Bayesian calibration/prediction framework are explored in the present article. In the case of the Lennard-Jones potential for Argon, we show that prediction uncertainty for thermodynamical and transport properties, albeit very small, is larger than statistical simulation uncertainty.

13.
It is argued that the results of uncertainty calculations in chemical analysis should be treated with some caution owing to their limited generality. The issue of the uncertainty in uncertainty estimation is discussed in two aspects. The first concerns the differences between procedure-oriented and result-oriented uncertainty assessments, and the second the differences between the theoretical calculation of uncertainty and its quantification using validation (experimental) data. It is shown that the uncertainty calculation for instrumental analytical methods using a regression calibration curve is result-oriented and meaningful only until the next calibration. A scheme for evaluating the uncertainty in uncertainty calculation by statistical analysis of experimental data is given and illustrated with examples from the author's practice. Some recommendations for the design of the corresponding experiments are formulated.
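As a concrete example of why a calibration-curve-based uncertainty is tied to a particular calibration, the sketch below evaluates the standard textbook uncertainty of a concentration obtained by inverse prediction from one straight-line calibration; the data are hypothetical and the expression is the generic one, not necessarily the author's scheme.

```python
import numpy as np

# Hypothetical calibration data: concentrations x and instrument responses y.
x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
y = np.array([0.02, 1.05, 1.98, 4.10, 7.95])

n = x.size
b, a = np.polyfit(x, y, 1)                      # slope b, intercept a
resid = y - (a + b * x)
s_y = np.sqrt(np.sum(resid**2) / (n - 2))       # residual standard deviation of the fit
Sxx = np.sum((x - x.mean())**2)

# Predict a concentration x0 from m replicate responses y0 and attach the standard
# textbook uncertainty of inverse prediction from a straight-line calibration.
y0 = np.array([3.0, 3.1, 2.9])
m = y0.size
x0 = (y0.mean() - a) / b
s_x0 = (s_y / abs(b)) * np.sqrt(1/m + 1/n + (y0.mean() - y.mean())**2 / (b**2 * Sxx))

print(x0, s_x0)   # this uncertainty applies only to predictions made with this calibration
```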

14.
This article addresses mathematical modeling of the inhibited oxidation of organic compounds. It is proposed to solve the inverse problems of chemical kinetics by using the index method of constrained global minimization of the deviation between calculated and experimental data. This approach allows more complete information about the chemical process to be used which, in turn, reduces the range of possible solutions, narrows the domain of uncertainty, and improves the quality of the mathematical model of the chemical reactions. The index method was applied to solving the inverse kinetic problem for the reaction of n‐decane oxidation in the presence of a p‐oxydiphenylamine/n‐decyl alcohol inhibitory composition.

15.
We present an effective approach for modelling compositional data with large concentrations of zeros and several levels of variation, applied to a database of elemental compositions of forensic glass of various use types. The procedure consists of the following: (i) partitioning the data set in subsets characterised by the same pattern of presence/absence of chemical elements and (ii) fitting a Bayesian hierarchical model to the transformed compositions in each data subset. We derive expressions for the posterior predictive probability that newly observed fragments of glass are of a certain use type and for computing the evidential value of glass fragments relating to two competing propositions about their source. The model is assessed using cross‐validation, and it performs well in both the classification and evidence evaluation tasks. Copyright © 2014 John Wiley & Sons, Ltd.

16.
A model is presented that correlates historical proficiency test data as the log of interlaboratory standard deviations versus the log of analyte concentrations, independent of analyte (measurand) or matrix. Analytical chemistry laboratories can use this model to set their internal measurement quality objectives and to apply the uncertainty budget process to assign the maximum allowable variation in each major step in their bias-free measurement systems. Laboratories that are compliant with this model are able to pass future proficiency tests and demonstrate competence to laboratory clients and ISO 17025 accreditation bodies. Electronic supplementary material to this paper can be obtained by using the Springer LINK server located at http://dx.doi.org/10.1007/s007690100398-y. Received: 31 March 2001 Accepted: 11 September 2001
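A minimal sketch of the kind of log-log relationship described, fitting the log of the interlaboratory standard deviation against the log of analyte concentration, is given below on hypothetical proficiency-test summaries; the fitted coefficients are not those of the published model.

```python
import numpy as np

# Hypothetical proficiency-test summaries: analyte concentrations (mass fraction)
# and the corresponding interlaboratory standard deviations.
conc = np.array([1e-9, 1e-7, 1e-5, 1e-3, 1e-1])
s_R  = np.array([4e-10, 2.5e-8, 1.5e-6, 9e-5, 5e-3])

# Fit the model log10(s_R) = a + b * log10(conc), i.e. a straight line in log-log space.
b, a = np.polyfit(np.log10(conc), np.log10(s_R), 1)

def predicted_sd(c):
    """Maximum-allowable-style interlaboratory SD predicted from the fitted log-log model."""
    return 10 ** (a + b * np.log10(c))

print(b, a, predicted_sd(1e-4))   # slope, intercept, and a predicted SD at 1e-4 mass fraction
```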

17.
Nowadays, a lot of time and resources are used to determine the quality of goods and services. As a consequence, the quality of measurements themselves, e.g., the metrological traceability of the measured quantity values is essential to allow a proper evaluation of the results with regard to specifications and regulatory limits. This requires knowledge of the measurement uncertainties of all quantity values involved in the measurement procedure, including measurement standards. This study shows how the uncertainties due to the preparation, as well as the chemical and compositional stability of a chemical measurement standard, or calibrator, can be estimated. The results show that the relative standard uncertainty of the concentration value of a typical analytical measurement standard runs up to 2.8% after 1 year. Of this, 1.9% originates from the preparation of the measurement standard, while 2.0 and 0.53% originate from the chemical and compositional stability during storage at −20 °C. The monthly preparation of working calibrators stored at 4 °C and used on a weekly basis, results in an additional standard uncertainty of the analyte concentration value of 0.35% per month due to compositional stability. While the preparation procedure is the major contributor to the total measurement uncertainty, the uncertainties introduced by the stability measurements are another important contributor, and therefore, the measurement procedure to evaluate stability is important to minimize the total measurement uncertainty.
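The quoted contributions combine in quadrature, assuming they are uncorrelated as in standard GUM practice, to the stated total of about 2.8%; the short check below simply reproduces that arithmetic.

```python
import math

# Relative standard uncertainty contributions quoted in the abstract (%):
u_prep = 1.9    # preparation of the measurement standard
u_chem = 2.0    # chemical stability over one year at -20 deg C
u_comp = 0.53   # compositional stability over one year at -20 deg C

# Root-sum-of-squares combination (assumes uncorrelated contributions).
u_total = math.sqrt(u_prep**2 + u_chem**2 + u_comp**2)
print(round(u_total, 2))   # -> 2.81, consistent with the quoted ~2.8 %
```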

18.
The duplicate method for estimating uncertainty from measurement including sampling is presented in the Eurachem/CITAC guide. The applicability of this method as a tool for verifying sampling plans for mycotoxins was assessed in three case studies with aflatoxin B1 in animal feedingstuffs. Aspects considered included strategies for obtaining samples from contaminated lots, assumptions about distributions, approaches for statistical analysis, log10-transformation of test data and applicability of uncertainty estimates. The results showed that when duplicate aggregate samples are formed by interpenetrating sampling, repeated measurements from a lot can be assumed to approximately follow a normal or lognormal distribution. Due to the large variation in toxin concentration between sampling targets and sometimes very large uncertainty arising from sampling and sample preparation (Urel ≥ 50%), estimation of uncertainty from log10-transformed data was found to be a more generally applicable approach than application of robust ANOVA.
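The paper compares robust ANOVA and log10-transformed approaches; as a simplified, non-robust illustration of the duplicate method's variance split, the sketch below estimates analytical and sampling standard deviations from a balanced design of duplicate samples and duplicate analyses, using hypothetical log10-transformed data.

```python
import numpy as np

# Hypothetical log10-transformed results from the duplicate design:
# for each sampling target, two samples (S1, S2), each analysed twice (A1, A2).
# Shape: (targets, 2 samples, 2 analyses).
x = np.array([
    [[1.10, 1.14], [1.30, 1.26]],
    [[0.85, 0.80], [0.95, 1.01]],
    [[1.60, 1.55], [1.40, 1.47]],
    [[1.05, 1.00], [0.98, 1.03]],
])

# Analytical variance from duplicate analyses within each sample:
# Var(A1 - A2) = 2 * s_ana^2.
d_ana = x[:, :, 0] - x[:, :, 1]
s2_ana = np.mean(d_ana**2) / 2.0

# Sampling variance from the difference of the two sample means per target:
# Var(mean(S1) - mean(S2)) = 2 * s_samp^2 + s_ana^2.
sample_means = x.mean(axis=2)
d_samp = sample_means[:, 0] - sample_means[:, 1]
s2_samp = max(np.mean(d_samp**2) / 2.0 - s2_ana / 2.0, 0.0)

s2_meas = s2_samp + s2_ana   # measurement variance including sampling
print(np.sqrt(s2_ana), np.sqrt(s2_samp), np.sqrt(s2_meas))
```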

19.
The characteristic features and the constituents of an identification procedure for chemical substances are discussed. This procedure is a screening of identification hypotheses followed by experimental testing of each one. The testing operation consists of comparing the values of the quantities measured with other measurement results or reference data, resulting in the Student's ratio, the significance level, the matching of spectra, etc. The performance and the correctness of identification are expressed as "identification uncertainty", i.e. the probability of incorrect identification. The statistical significance level and other similarity values for spectra, chromatographic retention parameters, etc. are particular measures of this uncertainty. Searching prior data and estimating the prior probability of the presence of particular compounds in the sample (matrix) to be analysed simplify the setting up and cancelling of hypotheses during screening. Usually, identification is made by the analyst taking into account measurement results, prior information and personal considerations. The estimation of uncertainty and rules for the incorporation of prior data make the result of identification less subjective.

20.
We have studied rapid calibration models to predict the composition of a variety of biomass feedstocks by correlating near-infrared (NIR) spectroscopic data to compositional data produced using traditional wet chemical analysis techniques. The rapid calibration models are developed using multivariate statistical analysis of the spectroscopic and wet chemical data. This work discusses the latest versions of the NIR calibration models for corn stover feedstock and dilute-acid pretreated corn stover. Measures of the calibration precision and uncertainty are presented. No statistically significant differences (p = 0.05) are seen between NIR calibration models built using different mathematical pretreatments. Finally, two common algorithms for building NIR calibration models are compared; no statistically significant differences (p = 0.05) are seen for the major constituents glucan, xylan, and lignin, but the algorithms did produce different predictions for total extractives. A single calibration model combining the corn stover feedstock and dilute-acid pretreated corn stover samples gave less satisfactory predictions than the separate models.
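The abstract does not name the algorithms compared; partial least squares (PLS) regression is a common choice for NIR calibration and is assumed below purely as an illustration, with stand-in spectra, stand-in reference values and a cross-validated error as the precision measure.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical data: NIR spectra (samples x wavelengths) and wet-chemistry
# reference values for one constituent (e.g. glucan, % dry mass).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 200))                               # stand-in spectra
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.2, size=60)    # stand-in reference values

# PLS calibration model with cross-validated predictions.
pls = PLSRegression(n_components=5)
y_cv = cross_val_predict(pls, X, y, cv=10).ravel()

rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))   # cross-validated calibration error
print(rmsecv)
```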
