Similar articles
Found 20 similar articles; search took 234 ms
1.
The presence of multicollinearity in regression data is no exception in real-life examples. Instead of applying ordinary regression methods, biased regression techniques such as principal component regression and ridge regression have been developed to cope with such datasets. In this paper, we consider partial least squares (PLS) regression by means of the SIMPLS algorithm. Because the SIMPLS algorithm is based on the empirical variance-covariance matrix of the data and on least squares regression, outliers have a damaging effect on the estimates. To reduce this pernicious effect of outliers, we propose to replace the empirical variance-covariance matrix in SIMPLS by a robust covariance estimator. We derive the influence function of the resulting PLS weight vectors and the regression estimates, and conclude that they will be bounded if the robust covariance estimator has a bounded influence function. The breakdown value is also inherited from the robust estimator. We illustrate the results using the MCD estimator and the reweighted MCD estimator (RMCD) for low-dimensional datasets. Some empirical properties are also provided for a high-dimensional dataset.

2.
Recently discovered identities in statistical mechanics have enabled the calculation of equilibrium ensemble averages from realizations of driven nonequilibrium processes, including single-molecule pulling experiments and analogous computer simulations. Challenges in collecting large data sets motivate the pursuit of efficient statistical estimators that maximize use of available information. Along these lines, Hummer and Szabo developed an estimator that combines data from multiple time slices along a driven nonequilibrium process to compute the potential of mean force. Here, we generalize their approach, pooling information from multiple time slices to estimate arbitrary equilibrium expectations. Our expression may be combined with estimators of path-ensemble averages, including existing optimal estimators that use data collected by unidirectional and bidirectional protocols. We demonstrate the estimator by calculating free energies, moments of the polymer extension, the thermodynamic metric tensor, and the thermodynamic length in a model single-molecule pulling experiment. Compared to estimators that only use individual time slices, our multiple time-slice estimators yield substantially smoother estimates and achieve lower variance for higher-order moments.

3.
The spatial sign is a multivariate extension of the concept of sign. Recently, multivariate estimators of covariance structures based on spatial signs have been examined by various authors. These new estimators are found to be robust to outlying observations. From a computational point of view, estimators based on spatial signs are very easy to implement, as they boil down to a transformation of the data to their spatial signs, from which the classical estimator is then computed. Hence, one can also consider the transformation to spatial signs to be a preprocessing technique, which ensures that the calibration procedure as a whole is robust. In this paper, we examine the special case of spatial sign preprocessing in combination with partial least squares regression, as the latter technique is frequently applied in the context of chemical data analysis. In a simulation study, we compare the performance of the spatial sign transformation to nontransformed data as well as to two robust counterparts of partial least squares regression. It turns out that the spatial sign transform is fairly efficient but has some undesirable bias properties. The method is applied to a recently published data set in the field of quantitative structure-activity relationships, where it is seen to perform as well as the previously described best linear model for these data.
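
The spatial sign transformation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the coordinate-wise median is used as the center purely as a simplifying assumption (the abstract does not say which robust center is used).

```python
import math
import statistics


def coordinatewise_median(rows):
    """Robust center: the median of each column (an assumed, simple choice)."""
    return [statistics.median(col) for col in zip(*rows)]


def spatial_signs(rows, center=None):
    """Map each observation to its spatial sign (x - center) / ||x - center||."""
    if center is None:
        center = coordinatewise_median(rows)
    signs = []
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        norm = math.sqrt(sum(d * d for d in diff))
        # A point that coincides with the center has no direction; map it to zero.
        signs.append([d / norm for d in diff] if norm > 0 else [0.0] * len(diff))
    return signs
```

After the transformation every nonzero observation lies on the unit sphere, so a distant outlier keeps its direction but loses its leverage. A classical estimator (for example, ordinary PLS) would then be fitted to the transformed data.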

4.
A set of laboratory practices is proposed in which evaluation of the quality of the analytical measurements is incorporated explicitly by systematically applying suitable methodology for extracting the useful information contained in chemical data. Non-parametric and robust techniques useful for detecting outliers have been used to evaluate different figures of merit in the validation and optimization of analytical methods. In particular, they are used for determination of the capability of detection according to ISO 11843 and IUPAC and for determination of linear range, for assessment of the response surface fitted using an experimental design to optimize an instrumental technique, and for analysis of a proficiency test carried out by different groups of students. The tools used are robust regression, least median of squares (LMS) regression, and robust estimators such as the median absolute deviation (MAD) and the Huber estimator, which are very useful alternatives to the usual estimators of central tendency and dispersion.
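
The two robust estimators named in this abstract can be illustrated as follows. This is a hedged sketch rather than the procedure used in the paper: the consistency factor 1.4826, the tuning constant `k = 1.5`, and the iteratively reweighted scheme with the scale held fixed at the MAD are common textbook choices assumed here, and the function names are hypothetical.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation, scaled to match the standard deviation
    for normally distributed data (assumed consistency factor)."""
    med = statistics.median(values)
    return scale * statistics.median([abs(v - med) for v in values])


def huber_location(values, k=1.5, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iterative reweighting,
    with the scale held fixed at the MAD (assumed scheme)."""
    mu = statistics.median(values)
    s = mad(values)
    if s == 0:
        return mu  # more than half of the data are identical
    for _ in range(max_iter):
        # Full weight inside k*s of the current estimate, down-weighted outside.
        weights = [1.0 if abs(v - mu) <= k * s else k * s / abs(v - mu)
                   for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

With replicate measurements near 10 and one gross error, the arithmetic mean is dragged toward the outlier while the Huber estimate stays close to the bulk of the data.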

5.
The dispersion of microbiological counting measurements, when repeating the analysis on the same material both within a laboratory (repeatability) and between laboratories (reproducibility) can be characterized by the organization of interlaboratory studies, where several sets of identical test materials are sent to several laboratories. Using the example of data generated by an interlaboratory study on enumeration of Listeria monocytogenes in foods by the standardized reference method (colony-count technique), 2 types of robust estimators of reproducibility standard deviations, based on the median, were examined, in comparison with the classical estimators, based on the mean. Experimental evaluation indicated that the 3 approaches gave consistent results for most of the combinations. The usual log10 transformation of the enumeration results was also questioned before these calculations were conducted.

6.
7.
A curve fitting technique for optical spectra based on a robust estimator, least median of squares (LMedS), is introduced in this study. For the effective calculation of LMedS, particle swarm optimization (PSO) is also introduced. Unlike a standard curve fitting method using the least squares (LS) estimator, the method based on the LMedS estimator is less influenced by outliers in experimental data. Two kinds of data sets, simulated data with outliers and temperature-dependent near-infrared (NIR) spectra of oleic acid (OA), are used to demonstrate the proposed method. The results clearly reveal that, compared with the LS estimator, the proposed method can effectively reduce the undesirable effects of a low signal-to-noise (S/N) ratio and can yield more accurate fitting results.
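
To make the LMedS criterion concrete, the sketch below fits a straight line by minimizing the median of the squared residuals. Note the substitution: the abstract uses particle swarm optimization to search the parameter space, whereas this illustration swaps in an exhaustive search over lines through pairs of data points, which is only practical for small data sets. The function name and the straight-line model are assumptions made for the example.

```python
import itertools
import statistics


def lmeds_line(xs, ys):
    """Fit y = intercept + slope * x by least median of squares,
    searching over all lines defined by two data points (not PSO)."""
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, cannot be written as y = a + b*x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        # LMedS criterion: the median (not the sum) of squared residuals.
        criterion = statistics.median(
            [(y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)]
        )
        if best is None or criterion < best[0]:
            best = (criterion, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]
```

Because the median ignores the largest residuals, the fit is unaffected as long as fewer than half of the points are outliers, which is the property the abstract contrasts with the LS estimator.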

8.
Scanning electron microscopy (SEM) is widely used in surface studies, and continuing efforts are devoted to the search for estimators of different surface characteristics. By using the variogram, we developed two such estimators, which were used to characterize the surface roughness from the SEM image texture. One of the estimators is related to the crossover between the fractal region at low scale and the periodic region at high scale, whereas the other estimator characterizes the periodic region. In this work, a full study of these estimators and the fractal dimension in two dimensions (2D) and three dimensions (3D) was carried out for emery papers. We show that the fractal dimension obtained with only one image is good enough to characterize the surface roughness because its behavior is similar to that obtained with 3D height data. We also show that the estimator that indicates the crossover is related to the minimum cell size in 2D and to the average particle size in 3D. The other estimator has different values for the three studied emery papers in 2D, but it does not have a clear meaning, and these values are similar across the studied samples in 3D. Nevertheless, it indicates the formation tendency of compound cells. The fractal dimension values from the variogram and from an area versus step log-log graph were studied with 3D data. The two methods yield different values corresponding to different information from the samples.

9.
The kinetics of the title reaction have been modeled with a system of ordinary differential equations for the concentrations of the compounds. In these equations, the rate constants are unknown. In this work, the four constants were evaluated by minimizing a mean-squares expression comparing the experimental measurements of the concentration of hexacyanoferrate(III) with the solution of the system of ordinary differential equations. This problem does not have a unique solution: there is an infinite set of constants that minimize the expression. Several sets of possible constants have been analyzed. One of them was obtained by estimating two of the constants with the stationary-state approach. For the model to be well posed, the constants must fulfill a condition. Information about the order of magnitude of the constants has been obtained.

10.
This paper proposes a novel approach for the estimation of spectroscopic data by combining the predictions of an ensemble of estimators using the induced ordered weighted averaging (IOWA) fusion operators. For ensemble generation, we use Gaussian process regression (GPR) and extreme learning machine (ELM) estimators associated with different kernels. To make model selection for ELM as efficient as in the GPR Bayesian estimation method, we develop an automatic solution based on the powerful differential evolution (DE) algorithm. During the fusion process, the IOWA operator needs two things: (1) an order‐inducing value; and (2) a way to determine its weights. For the order‐inducing value, we propose to use the residual of each estimated output value. Because we cannot compute the true residual, we explore the idea of estimating the residuals themselves by associating with each estimator of the ensemble a second estimator of the same kind, called a residual estimator. To learn the weights associated with these nonlinear operators, the proposed method relies on the concept of prioritized aggregation, where we generate the weights directly from the estimated residuals. Experimental results obtained on three real spectroscopic datasets confirm the interesting capabilities of the proposed IOWA fusion method. Copyright © 2013 John Wiley & Sons, Ltd.

11.
In this paper, we deal with multivariate measurement error models for replicated data under heavy‐tailed distributions, providing appealing robust and adaptable alternatives to the usual Gaussian assumptions. The models contain both error‐prone covariates and predictors measured without errors. The surrogates of the response and of the multiple error‐prone covariates are replicated, and unpaired and/or unequal cases are allowed. Under the class of scale mixtures of normal distributions, we provide an explicit iterative formula for maximum likelihood estimation via an expectation‐maximization‐type algorithm. Closed forms of the asymptotic variances of the estimators are also given. The effectiveness and robustness are confirmed by simulation studies. Two real data sets are analyzed with the proposed models. Copyright © 2015 John Wiley & Sons, Ltd.

12.
This paper is the third part of the work on robust partial least squares (RPLS) regression. The paper focuses on implementation issues for outlier detection and diagnosis. Furthermore, the paper introduces a numerically more efficient algorithm for determining the Stahel–Donoho estimator (SDE), the computation of which has been identified as a potential drawback of the newly proposed RPLS algorithm detailed in Part II of this work. Finally, a total of three application studies are presented, which involve data recorded from (i) a calibration experiment (similar number of variables/observations), (ii) a distillation process for purifying benzene (considerably more observations than variables) and (iii) an experiment of a multi‐component concentration determination using Raman spectroscopy (considerably more variables than observations). Copyright © 2008 John Wiley & Sons, Ltd.

13.
The estimator proposed recently by Delmas and Jourdain for waste-recycling Monte Carlo achieves variance reduction optimally with respect to a control variate that is evaluated directly using the simulation data. Here, the performance of this estimator is assessed numerically for free energy calculations in generic binary alloys and is compared to those of other estimators taken from the literature. A systematic investigation with varying simulation parameters of a simplified system, the anti-ferromagnetic Ising model, is first carried out in the transmutation ensemble using path-sampling. We observe numerically that (i) the variance of the Delmas-Jourdain estimator is indeed reduced compared to that of other estimators; and that (ii) the resulting reduction is close to the maximal possible one, despite the inaccuracy in the estimated control variate. More extensive path-sampling simulations involving an FeCr alloy system described by a many-body potential additionally show that (iii) gradual transmutations accommodate the atomic frustrations, thus alleviating the numerical ergodicity issue present in numerous alloy systems and eventually enabling the determination of phase coexistence conditions.

14.
The quantum instanton approximation is a type of quantum transition-state theory that calculates the chemical reaction rate using the reactive flux correlation function and its low-order derivatives at time zero. Here we present several path-integral estimators for the latter quantities, which characterize the initial decay profile of the flux correlation function. As with the internal energy or heat-capacity calculation, different estimators yield different variances (and therefore different convergence properties) in a Monte Carlo calculation. Here we obtain a virial (-type) estimator by using a coordinate scaling procedure rather than integration by parts, which offers greater computational benefits. We also consider two different methods for treating the flux operator, i.e., local-path and global-path approaches, of which the latter achieves a smaller variance at the cost of using second-order potential derivatives. Numerical tests are performed for a one-dimensional Eckart barrier and a model proton transfer reaction in a polar solvent, which illustrate the reduced variance of the virial estimator compared with the corresponding thermodynamic estimator.

15.
The performance of a number of robust estimators in the presence of distinct secondary subsets of data is assessed. Estimators examined include the kernel mode recommended by IUPAC, the MM-estimator described by Yohai and, for comparison, the mean, median, and Huber estimate. The performance of the estimators was compared by application to simulated data with one major and one minor mode, and with known minor mode location and proportion of data in the minor mode. The MM-estimator generally performed better than classical and Huber estimates and also provided better precision than the kernel mode at lower minor mode proportions (20% or less). At high minor mode proportion (30%), the kernel density mode provided smaller mean bias and better precision at modest minor mode offsets.

16.
17.
While estimation of measurement uncertainty (MU) is increasingly acknowledged as an essential component of the chemical measurement process, there is little agreement on how best to use even nominally well-estimated MU. There are philosophical and practical issues involved in defining what is “best” for a given data set; however, there is remarkably little guidance on how well different MU-using estimators perform with imperfect data. This report characterizes the bias, efficiency, and robustness properties for several commonly used or recently proposed estimators of true location, μ, using “Monte Carlo” (MC) evaluation of “measurement” data sets drawn from well-defined distributions. These synthetic models address a number of issues pertinent to interlaboratory comparison studies. While the MC results do not provide specific guidance on “which estimator is best” for any given set of real data, they do provide broad insight into the expected relative performance within broadly defined scenarios. Perhaps the broadest and most emphatic guidance from the present study is that (1) well-estimated measurement uncertainties can be used to improve the reliability of location determination and (2) some approaches to using measurement uncertainties are better than others. The traditional inverse squared uncertainty-weighted estimators perform well only in the absence of unrepresentative values (value outliers) or underestimated uncertainties (uncertainty outliers); even modest contamination by such outliers may result in relatively inaccurate estimates. In contrast, some inverse total variance-weighted estimators and probability density function area-based estimators perform well for all scenarios evaluated, including underestimated uncertainties, extreme value outliers, and asymmetric contamination.
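
The traditional inverse squared uncertainty-weighted estimator mentioned in this abstract is the standard weighted mean with weights 1/u². The sketch below is a minimal illustration under that textbook definition; the function name is hypothetical and the code is not taken from the report.

```python
import math


def inverse_variance_weighted_mean(values, uncertainties):
    """Weighted mean with weights w_i = 1 / u_i**2, together with its
    standard uncertainty sqrt(1 / sum(w_i))."""
    weights = [1.0 / (u * u) for u in uncertainties]
    total = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total
    return estimate, math.sqrt(1.0 / total)
```

The sensitivity the abstract describes follows directly from the weights: a laboratory that reports 20 with an underestimated uncertainty of 0.1 alongside a laboratory that reports 10 with an uncertainty of 1 receives about 100 times the weight, so the combined estimate lands almost on top of the value with the underestimated uncertainty.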

18.
This paper introduces a robust algorithm to determine the interfacial tension (gamma) from pendant drop profiles using the Galerkin finite element method (gamma-PD-FEM) to solve the axisymmetric form of the Young-Laplace (YL) equation. In this algorithm, the theoretical profiles are generated by solving the spherical coordinate form of the YL equation. gamma-PD-FEM also solves for the parameter estimates by minimizing the difference between the theoretical and experimental surface functions, f(theta). This technique is compared to the widely used method of converting the YL equation to the three arc length-based (ALB) first-order ODEs developed by Bashforth and Adams (BA) in 1883, or as denoted in this paper, the gamma-PD-BA method. The drop apex is the initial condition for the gamma-PD-BA algorithm and the integration is terminated at a specified location along the drop profile. In contrast to techniques based on the BA approach, computation of the theoretical drop profile in gamma-PD-FEM is obtained from a second-order ordinary differential equation and requires boundary conditions at the drop apex and at the contact line of the drop to the nozzle. By incorporating both boundary conditions into the problem formulation, the algorithm can also determine if the drop shape is at static equilibrium. Results to be presented include an outline of the computer algorithm, and comparison of gamma values obtained from the gamma-PD-FEM and the traditional gamma-PD-BA method using simulated and experimental drop profile data sets.

19.
The nonequilibrium fluctuation theorems have paved the way for estimating equilibrium thermodynamic properties, such as free energy differences, using trajectories from driven nonequilibrium processes. While many statistical estimators may be derived from these identities, some are more efficient than others. It has recently been suggested that trajectories sampled using a particular time-dependent protocol for perturbing the Hamiltonian may be analyzed with another one. Choosing an analysis protocol based on the nonequilibrium density was empirically demonstrated to reduce the variance and bias of free energy estimates. Here, we present an alternate mathematical formalism for protocol postprocessing based on the Feynman-Kac theorem. The estimator that results from this formalism is demonstrated on a few low-dimensional model systems. It is found to have reduced bias compared to both the standard form of Jarzynski's equality and the previous protocol postprocessing formalism.
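
For reference, the standard form of Jarzynski's equality that this abstract uses as a baseline is exp(-ΔF/kT) = ⟨exp(-W/kT)⟩, where W is the work performed along each nonequilibrium trajectory. The sketch below shows only this baseline estimator, not the protocol-postprocessing estimator of the paper; the function name and the `kT=1.0` default (reduced units) are assumptions made for the example.

```python
import math


def jarzynski_free_energy(work_values, kT=1.0):
    """Estimate dF = -kT * ln( mean( exp(-W / kT) ) ) from sampled work values."""
    exponents = [-w / kT for w in work_values]
    # Subtract the largest exponent before exponentiating (log-sum-exp trick)
    # so that large work values do not underflow or overflow.
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kT * (shift + math.log(mean_exp))
```

The exponential average is dominated by rare low-work trajectories, which is why this estimator is biased for finite samples and why reduced-bias alternatives such as the one in the abstract are of interest. By Jensen's inequality, the estimate never exceeds the mean work of the sample.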

20.
The combination of 3D pharmacophore fingerprints and the support vector machine classification algorithm has been used to generate robust models that are able to classify compounds as active or inactive in a number of G-protein-coupled receptor assays. The models have been tested against progressively more challenging validation sets where steps are taken to ensure that compounds in the validation set are chemically and structurally distinct from the training set. In the most challenging example, we simulate a lead-hopping experiment by excluding an entire class of compounds (defined by a core substructure) from the training set. The left-out active compounds comprised approximately 40% of the actives. The model trained on the remaining compounds is able to recall 75% of the actives from the "new" lead series while correctly classifying >99% of the 5000 inactives included in the validation set.
