This article discusses the problems of validating classification models, especially for datasets in which the sample size is small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator of the success of a classification model. For small datasets, %CC should not be used uncritically, because its interpretation depends on sample size. The article illustrates a common classification method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables.
One aim of a classifier is to determine whether the null hypothesis (that there is no distinction between two classes) can be rejected. On these random data, autoprediction gives a %CC of 84.5%. It is shown that any variable selection must be performed on the training set alone, independently of the test set, to obtain a %CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.
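The effect of where variable selection happens can be sketched on null data. The following is a minimal illustration, not the article's D-PLS procedure: it assumes a nearest-centroid classifier in place of D-PLS, ranks variables by the absolute difference of their class means, keeps a hypothetical 20 of them, and averages over several random datasets of 200 samples by 200 variables. All function names are illustrative.

```python
import random


def make_null_data(n_samples, n_vars, rng):
    """Random data with arbitrary class labels, so the null hypothesis is true."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows):
    """Per-variable mean of each class, computed from the given rows only."""
    n_vars = len(X[0])
    sums = {0: [0.0] * n_vars, 1: [0.0] * n_vars}
    counts = {0: 0, 1: 0}
    for i in rows:
        counts[y[i]] += 1
        for j, value in enumerate(X[i]):
            sums[y[i]][j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in (0, 1)}


def select_variables(X, y, rows, n_keep):
    """Keep the variables whose class means differ most within `rows`."""
    means = class_means(X, y, rows)
    gap = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(range(len(gap)), key=gap.__getitem__, reverse=True)[:n_keep]


def percent_cc(X, y, train, test, variables):
    """%CC on `test` for a nearest-centroid model fitted on `train`."""
    means = class_means(X, y, train)
    correct = 0
    for i in test:
        dist = [sum((X[i][j] - means[c][j]) ** 2 for j in variables) for c in (0, 1)]
        predicted = 0 if dist[0] <= dist[1] else 1
        correct += predicted == y[i]
    return 100.0 * correct / len(test)


def compare_selection(n_repeats=10, n_samples=200, n_vars=200, n_keep=20, seed=0):
    """Mean test-set %CC with selection on all samples vs. training samples only."""
    rng = random.Random(seed)
    leaky, independent = 0.0, 0.0
    for _ in range(n_repeats):
        X, y = make_null_data(n_samples, n_vars, rng)
        rows = list(range(n_samples))
        rng.shuffle(rows)
        train, test = rows[: n_samples // 2], rows[n_samples // 2 :]
        # Selection that has seen the test samples leaks information.
        leaky += percent_cc(X, y, train, test, select_variables(X, y, rows, n_keep))
        # Selection restricted to the training samples does not.
        independent += percent_cc(X, y, train, test, select_variables(X, y, train, n_keep))
    return leaky / n_repeats, independent / n_repeats


if __name__ == "__main__":
    leaky, independent = compare_selection()
    print(f"selection on all samples:      {leaky:.1f} %CC")
    print(f"selection on training samples: {independent:.1f} %CC")
```

Because the data contain no real class structure, the second figure is expected to stay near 50%, while the first is expected to be noticeably higher purely as an artefact of selecting variables with the test samples in view.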
Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.
Model building often runs into difficulties when validation and optimisation are carried out on different groups of samples, especially with iterative methods, because each group may be modelled with different properties, such as a different number of components or different variables.
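The three-group split can be sketched as follows. This is a minimal illustration under assumed names and proportions (the abstract does not state the group sizes): one group fits the model, a second is used for optimisation, such as choosing the number of components, and a third is held back for independent validation.

```python
import random


def three_way_split(n_samples, fractions=(0.5, 0.25, 0.25), seed=0):
    """Shuffle sample indices and cut them into three disjoint groups.

    The default fractions are an assumption made for illustration only.
    """
    if abs(sum(fractions) - 1.0) > 1e-9 or min(fractions) <= 0:
        raise ValueError("fractions must be positive and sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(fractions[0] * n_samples))
    n_opt = int(round(fractions[1] * n_samples))
    return {
        "training": indices[:n_train],
        "optimisation": indices[n_train : n_train + n_opt],
        "validation": indices[n_train + n_opt :],
    }


if __name__ == "__main__":
    groups = three_way_split(200)
    for name, members in groups.items():
        print(name, len(members))
```

The point of the design is that the validation group is never consulted while the number of components or the variable subset is being chosen, so its %CC remains an independent estimate.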
The thermal effect accompanying the transition of Cu2−xSe into a superionic conduction state was studied by non-isothermal measurements at different heating and cooling rates (β = 1, 2.5, 5, 10 and 20 °C min⁻¹). During heating, the peak temperature (Tp) remains almost stable for all values of β (136.8 ± 0.4 °C for Cu2Se and 133.0 ± 0.3 °C for Cu1.99Se). A gradual shift of the initiation of the transformation towards lower temperatures is observed as the heating rate increases. During cooling, there is a significant shift in the position of the peak maximum (Tp) towards lower temperatures with increasing cooling rate. A small hysteresis is observed, which increases with the cooling rate β. The mean value of the transformation enthalpy was found to be 30.3 ± 0.8 J g⁻¹ for Cu2Se and 28.9 ± 0.9 J g⁻¹ for Cu1.99Se. The transformation can be described kinetically by the model $f(\alpha) = (1-\alpha)^{n}(1 + k_{\mathrm{cat}}X)$, with activation energy E = 175 kJ mol⁻¹, exponent n = 0.2, log A = 20 and log(kcat) = 0.5.
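To make the kinetic description concrete, the reported parameters can be plugged into a rate expression of the usual form dα/dt = A·exp(−E/RT)·f(α). The sketch below is an illustration only and rests on several assumptions not stated in the abstract: α denotes the degree of conversion, X is taken equal to α, the logarithms are base 10, and A is in s⁻¹. The function names are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

E_ACT = 175e3      # activation energy from the abstract, J mol^-1
LOG10_A = 20.0     # log A from the abstract (base 10 and s^-1 are assumptions)
N_EXP = 0.2        # exponent n from the abstract
LOG10_KCAT = 0.5   # log(kcat) from the abstract (base 10 is an assumption)


def arrhenius_k(temp_celsius):
    """Arrhenius rate constant k(T) = A * exp(-E / (R * T))."""
    temp_kelvin = temp_celsius + 273.15
    return 10.0 ** LOG10_A * math.exp(-E_ACT / (R * temp_kelvin))


def f_model(alpha):
    """f(alpha) = (1 - alpha)^n * (1 + kcat * X), with X assumed equal to alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    k_cat = 10.0 ** LOG10_KCAT
    return (1.0 - alpha) ** N_EXP * (1.0 + k_cat * alpha)


def conversion_rate(alpha, temp_celsius):
    """Rate d(alpha)/dt at a given conversion and temperature."""
    return arrhenius_k(temp_celsius) * f_model(alpha)


if __name__ == "__main__":
    # Evaluate near the heating peak temperature reported for Cu2Se.
    print(arrhenius_k(136.8), conversion_rate(0.5, 136.8))
```

Under these assumptions the rate constant at 136.8 °C is on the order of 5 × 10⁻³ s⁻¹; with a different unit convention for A the number would change accordingly.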
Reaction of the transient phosphinidene complexes R-P=W(CO)5 with N-substituted diphenylketenimines leads unexpectedly to novel 2-aminophosphindoles, as confirmed by an X-ray crystal structure determined for one of the derivatives. Experimental evidence for a methylene-azaphosphirane intermediate was found by using the iron-complexed phosphinidene iPr2N-P=Fe(CO)4, which affords the 2-aminophosphindole together with the novel methylene-2,3-dihydro-1H-benzo[1,3]azaphosphole. Analysis of the reaction pathways with DFT indicates that the initially formed methylene-azaphosphirane yields both phosphorus heterocycles by way of a [1,5]- or [1,3]-sigmatropic shift, respectively, followed by an H-shift. Both rearrangements are driven by strain, which accounts for these remarkably selective conversions; the selectivity can be tuned by changing the substituents.
We present a unified approach to linear and nonlinear sensitivity analysis for models of reaction kinetics that are stated as systems of ordinary differential equations (ODEs). The approach is based on reformulating the ODE problem as a density transport problem described by a Fokker-Planck equation. The resulting multidimensional partial differential equation is solved by extending the TRAIL algorithm, originally introduced by Horenko and Weiser in the context of molecular dynamics (J. Comp. Chem. 2003, 24, 1921), and the approach is discussed in comparison with Monte Carlo techniques. The extended TRAIL approach is fully adaptive and makes it easy to study the influence of nonlinear dynamical effects. We illustrate the scheme on an enzyme-substrate model problem, with sensitivity analysis with respect to initial concentrations and parameter values.
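For orientation, the Monte Carlo alternative mentioned in the abstract can be sketched on an enzyme-substrate model. This is not the TRAIL algorithm; it is a plain sampling illustration that assumes the mechanism E + S ⇌ ES → E + P with mass-action kinetics, hypothetical rate constants and initial concentrations, and a fixed-step RK4 integrator. It contrasts a linear (finite-difference) sensitivity estimate with the spread obtained by sampling the initial substrate concentration.

```python
import random
import statistics

RATES = (1.0, 0.5, 0.3)  # k1, k_minus1, k2: hypothetical values


def rhs(state, rates=RATES):
    """Mass-action right-hand side for E + S <-> ES -> E + P."""
    e, s, c, p = state
    k1, k_minus1, k2 = rates
    bind, unbind, cat = k1 * e * s, k_minus1 * c, k2 * c
    return (unbind + cat - bind, unbind - bind, bind - unbind - cat, cat)


def integrate(state, t_end=5.0, dt=0.05):
    """Classical fixed-step RK4 from t = 0 to t_end."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    for _ in range(int(round(t_end / dt))):
        k_a = rhs(state)
        k_b = rhs(shifted(state, k_a, dt / 2))
        k_c = rhs(shifted(state, k_b, dt / 2))
        k_d = rhs(shifted(state, k_c, dt))
        state = tuple(
            x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k_a, k_b, k_c, k_d)
        )
    return state


def product_at_end(s0, e0=0.5):
    """Product concentration at t_end for initial substrate s0 and enzyme e0."""
    return integrate((e0, s0, 0.0, 0.0))[3]


def linear_sensitivity(s0=1.0, h=1e-4):
    """Central finite-difference estimate of dP(t_end)/dS0."""
    return (product_at_end(s0 + h) - product_at_end(s0 - h)) / (2 * h)


def monte_carlo_spread(s0=1.0, sigma=0.05, n_samples=200, seed=0):
    """Standard deviation of P(t_end) when S0 ~ Normal(s0, sigma)."""
    rng = random.Random(seed)
    values = [product_at_end(rng.gauss(s0, sigma)) for _ in range(n_samples)]
    return statistics.stdev(values)


if __name__ == "__main__":
    sigma = 0.05
    print("linear estimate of spread:", sigma * abs(linear_sensitivity()))
    print("Monte Carlo spread:       ", monte_carlo_spread(sigma=sigma))
```

For a small input spread the two numbers should roughly agree; they drift apart as nonlinear effects grow, which is the regime a nonlinear sensitivity analysis is meant to capture.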
Acid-catalyzed models of the reaction mechanisms for the pinacol rearrangement of propylene glycol to propanal and propanone have been investigated using the density functional method at 298.15 K. Thermodynamic quantities for the activation steps of four water-addition models were obtained. The number of added water molecules interacting with the transition states of the three concerted pathways clearly affects the product ratio. The relative energy profiles of the conversion reactions are compared across all solvation models. The percent product composition was estimated from the activation free energies of each acid-catalyzed reaction model. The percent ratios of propanal and propanone decreased as the number of added water molecules increased.
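One common way to turn activation free energies into a product distribution is to assume kinetic control with rate constants that share the same prefactor, so that each pathway's share is proportional to exp(−ΔG‡/RT). Whether the authors used exactly this scheme is not stated in the abstract; the sketch below is an assumption, and the barrier values in the usage example are hypothetical placeholders, not results from the paper.

```python
import math

R = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def product_percentages(barriers_kj_mol, temperature=298.15):
    """Percent share of each pathway from its activation free energy.

    Assumes competing, kinetically controlled pathways whose rates scale as
    exp(-dG / (R * T)) with a common prefactor.
    """
    lowest = min(barriers_kj_mol.values())  # shift for numerical stability
    weights = {
        name: math.exp(-(dg - lowest) / (R * temperature))
        for name, dg in barriers_kj_mol.items()
    }
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical barriers in kJ/mol, for illustration only.
    print(product_percentages({"propanal": 100.0, "propanone": 104.0}))
```

Only barrier differences matter in this scheme: a gap of RT·ln 3 (about 2.7 kJ mol⁻¹ at 298.15 K) between two pathways corresponds to a 75:25 split.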