This article discusses problems of validating classification models, especially for datasets in which sample sizes are small and the number of variables is large. It describes the use of percentage correctly classified (%CC) as an indicator of the success of a classification model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. The article illustrates the use of a common classification method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables.
An aim of the classifier is to determine whether the null hypothesis (that there is no distinction between the two classes) can be rejected. Autoprediction gives a %CC of 84.5%, even though the data are random. It is shown that, if there is variable selection, it must be performed independently on the training set to obtain a %CC close to 50% on the test set; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.
Finally, two aims of determining the quality of a model are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.
There are often difficulties with model building if validation and optimisation have been done on different groups of samples, especially when iterative methods are used, because each group may be modelled with different properties, such as a different number of components or different variables.
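The effect of where variable selection is performed can be illustrated with a minimal sketch. It does not reproduce the article's D-PLS calculation: a simple nearest-centroid classifier is swapped in so that the example needs only the Python standard library, and the function names, the number of retained variables (20), the 50/50 split, and the number of repeats are all assumptions made for illustration. The data are pure noise, so an honest test-set %CC should be close to 50%.

```python
import random
import statistics


def make_random_data(rng, n_samples=200, n_vars=200):
    """Gaussian noise with arbitrary labels, so the null hypothesis is true."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [0] * (n_samples // 2) + [1] * (n_samples // 2)
    return X, y


def class_means(X, y, rows, cols):
    """Mean of each variable in `cols` per class, using only samples in `rows`."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members) for j in cols]
    return means


def select_variables(X, y, rows, n_keep):
    """Keep the n_keep variables with the largest absolute class-mean difference."""
    cols = list(range(len(X[0])))
    m = class_means(X, y, rows, cols)
    ranked = sorted(cols, key=lambda j: abs(m[0][j] - m[1][j]), reverse=True)
    return ranked[:n_keep]


def percent_cc(X, y, train, test, cols):
    """%CC on `test` for a nearest-centroid classifier fitted on `train`."""
    m = class_means(X, y, train, cols)
    correct = 0
    for i in test:
        d0 = sum((X[i][j] - m[0][k]) ** 2 for k, j in enumerate(cols))
        d1 = sum((X[i][j] - m[1][k]) ** 2 for k, j in enumerate(cols))
        predicted = 0 if d0 <= d1 else 1
        correct += predicted == y[i]
    return 100.0 * correct / len(test)


def run_once(rng, n_keep=20):
    """One random 200 x 200 dataset; returns (leaky %CC, clean %CC) on the test set."""
    X, y = make_random_data(rng)
    # Samples are i.i.d. noise, so a fixed half of each class is a fair split.
    train = list(range(0, 50)) + list(range(100, 150))
    test = list(range(50, 100)) + list(range(150, 200))
    leaky_cols = select_variables(X, y, train + test, n_keep)  # sees the test set
    clean_cols = select_variables(X, y, train, n_keep)         # training set only
    return (percent_cc(X, y, train, test, leaky_cols),
            percent_cc(X, y, train, test, clean_cols))


def compare(n_repeats=10, seed=0):
    """Average both test-set %CC values over several random datasets."""
    rng = random.Random(seed)
    runs = [run_once(rng) for _ in range(n_repeats)]
    return (statistics.mean(r[0] for r in runs),
            statistics.mean(r[1] for r in runs))


if __name__ == "__main__":
    leaky, clean = compare()
    print(f"selection on all samples:   mean test %CC = {leaky:.1f}")
    print(f"selection on training only: mean test %CC = {clean:.1f}")
```

In this sketch the classifier is always fitted on the training samples only; the two results differ solely in whether the test samples were visible during variable selection, which is the distinction the abstract draws.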
One of the major techniques used for the method development of ternary and quaternary high performance liquid chromatography (HPLC) systems has been to use mixture designs, often referred to as "Glajch's Triangle". This technique does not allow for the systematic and simultaneous optimization of other factors such as gradient time, pH and temperature that affect the quality of separations. An alternative approach is to use experimental designs. The condition, however, that the composition of all components of the mobile phase must total 100% presents a problem when trying to mathematically represent ranges of each mobile phase constituent of a ternary or quaternary system. A method is described here, based on spherical coordinate representations, that adheres to the constraints of the mobile phase composition and allows experimental designs, such as central composite and factorial designs, to be applied to the simultaneous optimization of the mobile phase composition. Other factors, in particular temperature and gradient time, can then be included in the design. As a result of applying these designs to the HPLC separation of phenols and corticosteroids, it was found necessary to include three-way interactions between experimental factors in the model. The significance of these interactions shows that they need to be considered in HPLC method development.
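A minimal sketch of how a spherical coordinate representation can satisfy the 100% constraint for a ternary mobile phase is given below. The abstract does not state the exact transformation, so the squared spherical coordinates used here, the function names, and the angle and temperature levels are assumptions chosen for illustration. Because cos² + sin² = 1, any pair of angles maps to three non-negative fractions that sum to 1, so the angles can be varied independently in a factorial design alongside other factors.

```python
import itertools
import math


def angles_to_fractions(theta, phi):
    """Map two angles (radians, each in [0, pi/2]) to three volume fractions."""
    x1 = math.cos(theta) ** 2
    x2 = (math.sin(theta) * math.cos(phi)) ** 2
    x3 = (math.sin(theta) * math.sin(phi)) ** 2
    return x1, x2, x3  # cos^2 + sin^2 = 1, so the fractions always sum to 1


def fractions_to_angles(x1, x2, x3):
    """Inverse mapping, for fractions that are non-negative and sum to 1."""
    theta = math.acos(math.sqrt(x1))
    phi = math.atan2(math.sqrt(x3), math.sqrt(x2))
    return theta, phi


def factorial_design(levels_theta, levels_phi, levels_temp):
    """Full factorial design over two composition angles and temperature."""
    design = []
    for theta, phi, temp in itertools.product(levels_theta, levels_phi, levels_temp):
        design.append((angles_to_fractions(theta, phi), temp))
    return design


if __name__ == "__main__":
    # Hypothetical levels: two angles around a centre composition, two temperatures.
    centre_theta, centre_phi = fractions_to_angles(0.5, 0.3, 0.2)
    thetas = (centre_theta - 0.1, centre_theta + 0.1)
    phis = (centre_phi - 0.1, centre_phi + 0.1)
    for fractions, temp in factorial_design(thetas, phis, (30.0, 50.0)):
        print([round(100 * f, 1) for f in fractions], temp)
```

Each design point printed by the usage example is a valid composition in percent, whatever angle levels are chosen within the allowed range.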
A novel approach is proposed for the simultaneous optimization of mobile phase pH and gradient steepness in RP‐HPLC using artificial neural networks. By presetting the initial and final concentrations of the organic solvent, a limited number of experiments with different gradient times and mobile phase pH values are arranged in the two‐dimensional space of mobile phase parameters. The retention behavior of each solute is modeled using an individual artificial neural network. An “early stopping” strategy is adopted to ensure the predictive capability of the neural networks. The trained neural networks can be used to predict the retention times of solutes under arbitrary mobile phase conditions in the optimization region. Finally, the optimal separation conditions can be found according to a global resolution function. The effectiveness of this method is validated by optimizing the separation conditions for amino acids derivatised with a new fluorescent reagent.
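The "early stopping" idea can be sketched as follows, assuming a small one-hidden-layer network trained by stochastic gradient descent on scaled gradient time and pH. Nothing here is taken from the paper beyond the general strategy: the network size, learning rate, patience, the synthetic retention surface in `make_data`, and all function names are hypothetical. Training stops once the error on held-out validation experiments has not improved for `patience` epochs, and the weights from the best validation epoch are returned.

```python
import copy
import math
import random


def make_data(rng, n):
    """Hypothetical scaled retention times as a function of (gradient time, pH)."""
    data = []
    for _ in range(n):
        tg, ph = rng.random(), rng.random()  # both inputs scaled to [0, 1]
        t_r = 0.3 + 0.5 * tg + 0.2 * math.tanh(4.0 * (ph - 0.5)) + rng.gauss(0.0, 0.02)
        data.append(((tg, ph), t_r))
    return data


def init_net(rng, n_in=2, n_hidden=4):
    """One hidden layer of tanh units and a linear output."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations and the network output for one input."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    return h, sum(w * hi for w, hi in zip(net["W2"], h)) + net["b2"]


def mse(net, data):
    """Mean squared prediction error over a dataset."""
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


def sgd_epoch(net, data, lr):
    """One pass of stochastic gradient descent on the loss 0.5 * (y - t)^2."""
    for x, t in data:
        h, y = forward(net, x)
        err = y - t
        for k in range(len(h)):
            grad_a = err * net["W2"][k] * (1.0 - h[k] ** 2)  # uses W2 before its update
            net["W2"][k] -= lr * err * h[k]
            for j in range(len(x)):
                net["W1"][k][j] -= lr * grad_a * x[j]
            net["b1"][k] -= lr * grad_a
        net["b2"] -= lr * err


def train_early_stopping(train, val, max_epochs=500, patience=20, lr=0.05, seed=0):
    """Train until validation error stops improving; return the best network seen."""
    net = init_net(random.Random(seed))
    best_net, best_epoch, best_loss = copy.deepcopy(net), 0, mse(net, val)
    for epoch in range(1, max_epochs + 1):
        sgd_epoch(net, train, lr)
        val_loss = mse(net, val)
        if val_loss < best_loss:
            best_net, best_epoch, best_loss = copy.deepcopy(net), epoch, val_loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_net, best_epoch, best_loss


if __name__ == "__main__":
    rng = random.Random(1)
    train, val = make_data(rng, 12), make_data(rng, 6)
    net, epoch, loss = train_early_stopping(train, val)
    print(f"stopped with best epoch {epoch}, validation MSE {loss:.5f}")
```

Holding out a few experiments for validation costs training data, which matters when only a limited number of runs is available; the split sizes above are arbitrary.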
A detailed model for nonisothermal sorption of multicomponent mixtures in a single sorbent particle (monodisperse or bidisperse with negligible intracrystalline mass transport limitations) under pressure swing conditions is developed in this study. The dusty-gas model is used to describe the coupling of the molar fluxes, the temperature, the partial pressures and the partial pressure gradients of the components in the pore space of the particle. The variations of the temperature are described by an energy equation in which both convective and conductive modes of heat transport are accounted for. No limitations are imposed on the number of the components in the mixture and on the type of the adsorption isotherm. The model is applied in the investigation of the industrially important air-zeolite 5A system. Two cases with respect to the surrounding gas phase are examined: infinite environment, which is representative for single particle experiments, and finite environment, which is representative for the situation in packed bed adsorbers. It is found that in an infinite environment the external and internal temperature gradients are equally important while in a finite environment the external heat transport limitations are negligible. It is concluded that in modeling the nonisothermal operation of adsorption processes occurring in packed beds it is not necessary to allow for the temperature differences between the gas phase and the surface of the adsorbing particles. Furthermore, if the temperature gradients within the particles can be neglected, only a single temperature equation is needed to describe the energy transport in the bed.
This study aims to clarify the effects of carbon activation type and physical form on the adsorption and desorption capacities for a bi-solute mixture of phenol and 2-chlorophenol (2-CP). For this purpose, two powdered activated carbons (PACs), thermally activated Norit SA4 and chemically activated Norit CA1, and their granular counterparts with similar physical characteristics, thermally activated Norit PKDA and chemically activated Norit CAgran, were used. The thermally activated carbons were better adsorbers of phenol and 2-CP than the chemically activated carbons, but adsorption was more reversible in the latter case. 2-CP was adsorbed preferentially by each type of activated carbon, and adsorption of phenol was strongly suppressed in the presence of 2-CP. The simplified ideal adsorbed solution (SIAS) model underestimated the 2-CP loadings and overestimated the phenol loadings. However, the improved and modified forms of the SIAS model could better predict the competitive adsorption. The type of carbon activation was decisive in the application of these models. For each activated carbon type, phenol was desorbed more readily in the bi-solute case, whereas desorption of 2-CP was lower than in the single-solute case. This was attributed to higher energies of 2-CP adsorption.
Summary. The standard method of Pitzer for predicting the solubility isotherms of systems in which solid phases of constant composition crystallize is applied to cases in which mixed crystals are formed. The four-component carnallite type systems RbCl-CsCl-MgCl2-H2O, RbCl-KCl-MgCl2-H2O, and RbCl-RbBr-MgCl2-MgBr2-H2O and the corresponding subsystems are thermodynamically simulated at 25°C. It is established that the solubility diagrams consist of crystallization regions of the simple salts MX, M'X, MgX2·6H2O, and MgX'2·6H2O and of the corresponding carnallite type double salts with the composition 1:1:6. A method for calculating the integral Gibbs energy of mixing, Gmix(s), of crystals formed in water-salt systems is proposed. The results for the systems RbCl-KCl-H2O, RbCl-RbBr-H2O, and MgCl2-MgBr2-H2O are compared with experimental data from the literature and with values calculated using various models.
Thermodynamic Simulation of Four-Component Systems of the Carnallite Type
Summary (translated from the German). The Pitzer method for predicting solubility isotherms in multicomponent systems in which solid phases of constant composition crystallize was also applied to cases in which mixed crystals form. The four-component systems RbCl-CsCl-MgCl2-H2O, RbCl-KCl-MgCl2-H2O, and RbCl-RbBr-MgCl2-MgBr2-H2O, from which carnallite type mixed crystals crystallize, and the associated three-component boundary systems were simulated at 25°C. The solubility diagrams are found to comprise crystallization regions of the simple salts MX, M'X, MgX2·6H2O, and MgX'2·6H2O as well as of the corresponding carnallite type double salts with the composition 1:1:6. A method for calculating the Gibbs energy Gmix(s) of the mixed crystals formed in water-salt systems is proposed. The results obtained for the systems RbCl-KCl-H2O, RbCl-RbBr-H2O, and MgCl2-MgBr2-H2O are compared with experimental data from the literature and with results of calculations based on various models.