Found 19 similar articles; search took 15 ms
1.
2.
In this paper, we propose a wavelength selection method based on random decision particle swarm optimization with attractor for quantitative analysis of near‐infrared (NIR) spectra. The proposed method is combined with partial least squares (PLS) to construct a prediction model. It chooses either the particle's own current optimum or the current global optimum to calculate the attractor. The particle then updates its flight velocity using the attractor, and the particle state is updated by a random decision based on the new velocity. The root‐mean‐square error of cross‐validation is adopted as the fitness function. To demonstrate the usefulness of the proposed method, PLS with all wavelengths, uninformative variable elimination by PLS, elastic net, genetic algorithm combined with PLS, discrete particle swarm optimization combined with PLS, modified particle swarm optimization combined with PLS, neighboring particle swarm optimization combined with PLS, and the proposed method are used to build quantitative component analysis models of NIR spectral datasets, and the effectiveness of these models is compared. Two application studies are presented, involving NIR data obtained from a meat content determination experiment and from a combustion procedure. The results verify that the proposed method has higher predictive ability for NIR spectral data and selects fewer wavelengths. The proposed method converges faster and can overcome the premature convergence problem. Furthermore, although improving the prediction precision may sacrifice model complexity to a certain extent, the proposed method overfits only slightly. Copyright © 2015 John Wiley & Sons, Ltd.
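The update cycle described in this abstract (pick an attractor, update the velocity, then decide each bit at random) can be pictured with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the 50/50 choice between personal and global best, the inertia and acceleration constants, the sigmoid decision rule, and the hypothetical `toy_fitness`, which stands in for the RMSECV of a PLS model.

```python
import math
import random

INFORMATIVE = {2, 5, 7}  # hypothetical indices of "useful" wavelengths


def toy_fitness(mask):
    # Stand-in for RMSECV (assumption): number of bits that disagree
    # with the hypothetical informative set. Lower is better.
    return sum((d in INFORMATIVE) != bool(b) for d, b in enumerate(mask))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def attractor_binary_pso(fitness, n_bits, n_particles=20, n_iter=50,
                         inertia=0.7, accel=1.5, v_max=4.0, seed=0):
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    v = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            # Attractor: either the particle's own best or the global best.
            attractor = pbest[i] if rng.random() < 0.5 else gbest
            for d in range(n_bits):
                # Velocity is pulled toward the attractor, then clamped.
                v[i][d] = inertia * v[i][d] + accel * rng.random() * (attractor[d] - x[i][d])
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Random decision: the bit is set with probability sigmoid(v).
                x[i][d] = 1 if rng.random() < sigmoid(v[i][d]) else 0
            f = fitness(x[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    return gbest, gbest_fit


best_mask, best_fit = attractor_binary_pso(toy_fitness, n_bits=10)
```

In a real wavelength-selection setting, `fitness` would fit a PLS model on the columns where the mask is 1 and return its cross-validated error.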
3.
4.
Yi‐Fei Liu Shan Xu Hong Gong Yan‐Fang Cui Dan‐Dan Song Yan‐Ping Zhou 《Journal of Chemometrics》2015,29(10):537-546
The complexity of metabolic profiles makes chemometric tools indispensable for extracting the most significant information. Partial least‐squares discriminant analysis (PLS‐DA) is one of the most effective strategies for data analysis in metabonomics. However, its actual efficacy in metabonomics is often weakened by the high similarity of metabolic profiles, which contain excessive variables. To rectify this situation, particle swarm optimization (PSO) was introduced to improve PLS‐DA by simultaneously selecting the optimal sample and variable subsets, the appropriate variable weights, and the best number of latent variables (SVWL) in PLS‐DA, forming a new algorithm named PSO‐SVWL‐PLSDA. Combined with 1H nuclear magnetic resonance‐based metabonomics, PSO‐SVWL‐PLSDA was applied to distinguish patients with lung cancer from healthy controls. PLS‐DA was also investigated for comparison. Whereas PLS‐DA yielded recognition rates of 86% and 65% for the training and test sets, respectively, PSO‐SVWL‐PLSDA achieved 98.3% and 90%. Moreover, several of the most discriminative metabolites were identified by PSO‐SVWL‐PLSDA to aid the diagnosis of lung cancer, including lactate, glucose (α‐glucose and β‐glucose), threonine, valine, taurine, trimethylamine, glutamine, glycoprotein, proline, and lipid. Copyright © 2015 John Wiley & Sons, Ltd.
5.
6.
In the analysis of gene expression profiles, the number of tissue samples with gene expression levels available is usually small compared with the number of genes. This can lead either to possible overfitting or even to a complete failure in the analysis of microarray data. The selection of genes that are truly indicative of the tissue classification concerned is becoming one of the key steps in microarray studies. In the present paper, we combine modified discrete particle swarm optimization (PSO) and support vector machines (SVM) for tumor classification. The modified discrete PSO is applied to select genes, while the SVM is used as the classifier or evaluator. The proposed approach was applied to microarray data from 22 normal and 40 colon tumor tissues and showed good prediction performance. It has been demonstrated that the modified PSO is a useful tool for gene selection and for mining high-dimensional data.
7.
8.
Santiago A. Bortolato Juan A. Arancibia Graciela M. Escandar Alejandro C. Olivieri 《Journal of Chemometrics》2007,21(12):557-566
The combination of unfolded partial least‐squares (U‐PLS) with residual bilinearization (RBL) provides a second‐order multivariate calibration method capable of achieving the second‐order advantage. RBL is performed by varying the test sample scores in order to minimize the residuals of a combined U‐PLS model for the calibrated components and a principal component model for the potential interferents. The sample scores are then employed to predict the analyte concentration, with regression coefficients taken from the calibration step. When the contribution of multiple potential interferents is severe, particle swarm optimization (PSO) helps to prevent RBL from being trapped in false minima, restoring its predictive ability and making it comparable to standard parallel factor (PARAFAC) analysis. Both simulated and experimental systems are analyzed in order to show the potential of the new technique. Copyright © 2007 John Wiley & Sons, Ltd.
9.
Alexey N. Skvortsov 《Journal of Chemometrics》2014,28(10):727-739
Rotation ambiguity (RA) in multivariate curve resolution (MCR) is an undesirable situation in which the physicochemical constraints are not sufficiently strong to provide a unique resolution of the data matrix of the mixtures into spectra and concentration profiles of individual chemical components. RA is often met in MCR of overlapped chromatographic peaks, kinetic and equilibrium data, and fluorescence two‐dimensional spectra. In the case of RA, a single candidate solution has little practical value, so the whole set of feasible solutions should be characterized. This is quite an intricate task in the general case. In the present paper, a method is proposed to estimate RA with charged particle swarm optimization (cPSO), a population‐based algorithm. The criteria for updating the particles were modified so that the swarm converged to a steady state that spanned the set of feasible solutions. The performance of cPSO‐MCR was demonstrated on test functions, simulated datasets, and real‐world data. Good agreement of the cPSO‐MCR results with the analytical solutions (Borgen plots) was observed. cPSO‐MCR was also shown to be capable of estimating the strength of the constraints and of revealing RA in noisy data. Compared with analytical methods, cPSO‐MCR is simpler to implement, extends to more than three chemical compounds, is immune to noise, and can be easily adapted to virtually all types of constraints and objective functions (constraint based or residue based). cPSO‐MCR also provides natural visual information about the level of RA in spectra and concentration profiles, similar to the methods of two extreme solutions (e.g., MCR‐BANDS). Copyright © 2014 John Wiley & Sons, Ltd.
10.
Shen Q Jiang JH Jiao CX Lin WQ Shen GL Yu RQ 《Journal of computational chemistry》2004,25(14):1726-1735
The multilayer feed-forward ANN is an important modeling technique used in QSAR studies. The training of an ANN is usually carried out only to optimize the weights of the neural network, without paying attention to the network topology. Other strategies used to train ANNs first discover an optimum structure of the network and then find weights for the already defined structure. These methods tend to converge to local optima and may also lead to overfitting. In this article, a hybridized particle swarm optimization (PSO) approach was applied to neural network structure training (HPSONN). The continuous version of PSO was used for the weight training of the ANN, and the modified discrete PSO was applied to find the appropriate network architecture. The network structure and connectivity are trained simultaneously. The two versions of PSO can jointly search for the globally optimal ANN architecture and weights. A new objective function is formulated to determine the appropriate network architecture and the optimum values of the weights. The proposed HPSONN algorithm was used to predict the carcinogenic potency of aromatic amines and the biological activity of a series of distamycin and distamycin-like derivatives. The results were compared to those obtained by PSO and GA training in which the network architecture was kept fixed. The comparison demonstrated that HPSONN is a useful tool for training ANNs, which converges quickly towards the optimal position and can avoid overfitting to some extent.
11.
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples. This can lead either to possible overfitting and the curse of dimensionality or even to a complete failure in the analysis of microarray data. Gene selection is an important component of gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swarm optimization (PSO) and tabu search (HPSOTS) approach for gene selection for tumor classification. The incorporation of tabu search (TS) as a local improvement procedure enables the HPSOTS algorithm to escape local optima and show satisfactory performance. The proposed approach is applied to three different microarray data sets. Moreover, we compare the performance of HPSOTS on these data sets to that of stepwise selection and of the pure TS and PSO algorithms. It has been demonstrated that HPSOTS is a useful tool for gene selection and for mining high-dimensional data.
12.
Using particle swarms for the development of QSAR models based on K-nearest neighbor and kernel regression
We describe the application of particle swarms for the development of quantitative structure-activity relationship (QSAR) models based on k-nearest neighbor and kernel regression. The particle swarm method is a population-based stochastic search method based on the principles of social interaction. Each individual explores the feature space guided by its previous success and that of its neighbors. Success is measured using leave-one-out (LOO) cross validation on the resulting model as determined by k-nearest neighbor kernel regression. The technique is shown to compare favorably to simulated annealing using three classical data sets from the QSAR literature.
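The success measure mentioned here, leave-one-out cross validation of a nearest-neighbor model on a candidate feature subset, can be sketched as follows. This is a simplification under stated assumptions: it uses a plain unweighted k-nearest-neighbor average rather than the kernel regression of the paper, and the function name `knn_loo_error` and the tiny data set are hypothetical.

```python
import math


def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out RMSE of k-NN regression using only features where mask is 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return float("inf")  # an empty feature subset cannot be scored
    n = len(X)
    sq_err = 0.0
    for i in range(n):
        # Distances from the held-out sample i to every other sample.
        dists = []
        for j in range(n):
            if j == i:
                continue
            d = math.sqrt(sum((X[i][c] - X[j][c]) ** 2 for c in cols))
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        neighbours = dists[:k]
        pred = sum(t[1] for t in neighbours) / len(neighbours)
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / n)


# Hypothetical data: feature 0 tracks the response, feature 1 is noise.
X = [[0, 5], [1, 0], [2, 9], [3, 1], [4, 7], [5, 2]]
y = [0, 1, 2, 3, 4, 5]
err_informative = knn_loo_error(X, y, [1, 0], k=1)
err_noise = knn_loo_error(X, y, [0, 1], k=1)
```

A swarm would call a function like this once per particle, treating the mask as the particle's position, so that subsets built from informative descriptors receive a lower error.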
13.
14.
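Using ordinary maize kernels as the test material, characteristic wavelengths were selected from near-infrared spectral data by a genetic algorithm combined with partial least squares (PLS) regression, and PLS regression was then used to build a calibration model for determining the starch content of maize kernels from the characteristic wavelengths. The results show that, for the calibration model built on 11 characteristic wavelengths, the root mean square error of calibration (RMSEC), of cross-validation (RMSECV), and of prediction (RMSEP) were 0.30%, 0.35%, and 0.27%, respectively, and the correlation coefficients between predicted and measured values for the calibration set and the independent test set reached 0.9279 and 0.9390, respectively. Compared with the prediction model built on the full-spectrum data, prediction accuracy improved in every case, indicating that spectral feature selection with a genetic algorithm and PLS yields simpler and better models and provides a new method and approach for the near-infrared determination of starch content in maize kernels and for the processing of infrared spectral data.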
15.
Bai-Chuan Deng Yong-Huan Yun Dong-Sheng Cao Yu-Long Yin Wei-Ting Wang Hong-Mei Lu Qian-Yi Luo Yi-Zeng Liang 《Analytica chimica acta》2016
In this study, a new variable selection method called the bootstrapping soft shrinkage (BOSS) method is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined based on the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models to update the weights of the variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
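The two ingredients named in this abstract, weighted bootstrap sampling and a weight update from regression coefficients, can be illustrated with a rough sketch. It is not the authors' Matlab code: the function names, the normalisation, and the representation of a sub-model as a dictionary of coefficients are all assumptions, and the PLS fitting step that would produce those coefficients is left out.

```python
import random


def weighted_bootstrap(weights, rng):
    """Draw len(weights) variable indices with replacement, keep the unique ones."""
    p = len(weights)
    draws = rng.choices(range(p), weights=weights, k=p)
    return sorted(set(draws))


def update_weights(sub_models, n_vars):
    """Sum |coefficient| per variable over sub-models, then normalise to sum to 1."""
    totals = [0.0] * n_vars
    for coefs in sub_models:  # each sub-model: {variable index: coefficient}
        for j, b in coefs.items():
            totals[j] += abs(b)
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else [1.0 / n_vars] * n_vars


rng = random.Random(1)
weights = [0.9, 0.05, 0.05]
counts = [0, 0, 0]
for _ in range(500):
    for j in weighted_bootstrap(weights, rng):
        counts[j] += 1
```

The counts show the soft-shrinkage idea: variables with small weights appear in fewer sub-models but are not removed outright, so they can regain weight in a later iteration.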
16.
ZhiLiang Li FeiFei Tian ShiRong Wu ShanBin Yang ShengXi Yang Yuan Zhou QiaoXia Zhang RenHui Qin Hu Mei Gang Chen GenRong Li 《中国科学B辑(英文版)》2008,51(5):487-496
Both the concept and the model of snug quantitative structure-activity relationship (QSAR) were proposed and developed for molecular design by constructing QSAR based on some known mode of receptor/ligand interactions. Many disadvantages of traditional models can be avoided by using the proposed method, because traditional models are determined only by the molecular structural features of the sample sets themselves. A genetic virtual screening of peptide/protein combinations (GVSPPC) is proposed for the first time by utilizing this idea to examine peptide/protein affinity activities. A genetic algorithm (GA) was developed for screening combinative targets with an interaction mode for virtual receptors. GVSPPC succeeds in overcoming difficulties in rational QSAR when searching for ligand/receptor interactions under conditions of unknown structures. Some bioactive oligo-/poly-peptide systems covering 58 angiotensin converting enzyme (ACE) inhibitors and 18 double site mutation residues in camel antibody protein cAb-Lys3 were investigated by GVSPPC with satisfactory results (R²cu > 0.91, Q²cv > 0.86, ERMS = 0.19-0.95), which demonstrates that GVSPPC is more interpretable in terms of the ligand-receptor interaction than the traditional QSAR method.
17.
Lizhi Cui Zhihao Ling Josiah Poon Simon K. Poon Hao Chen Junbin Gao Paul Kwan Kei Fan 《Journal of Chemometrics》2015,29(3):146-153
In order to separate a high‐performance liquid chromatography with diode array detector (HPLC‐DAD) data set into chromatogram peaks and spectra for all compounds, a separation method based on the model of generalized Gaussian reference curve measurement (GGRCM) and the algorithm of multi‐target intermittent particle swarm optimization (MIPSO) is proposed in this paper. A parameter θ is constructed to generate a reference curve r(θ) for a chromatogram peak based on its physical principle. The GGRCM model is proposed to calculate the fitness ε(θ) for every θ, which indicates the possibility that the HPLC‐DAD data set contains a chromatogram peak similar to r(θ). The smaller the fitness, the higher the possibility. The MIPSO algorithm is then introduced to calculate the optimal parameters by minimizing this fitness. Finally, chromatogram peaks are constructed based on these optimal parameters, and the spectra are calculated by an estimator. From the simulations and experiments, the following conclusions are drawn: (i) the GGRCM‐MIPSO method can extract chromatogram peaks from a simulated data set without knowing the number of compounds in advance, even when severe overlap and white noise exist, and (ii) the GGRCM‐MIPSO method can be applied to real HPLC‐DAD data sets. Copyright © 2014 John Wiley & Sons, Ltd.
18.
Particle swarm optimization is a novel evolutionary stochastic global optimization method that has gained popularity in the chemical engineering community. This optimization strategy has been successfully used for several applications including thermodynamic calculations. To the best of our knowledge, the performance of PSO in phase stability and equilibrium calculations for both multicomponent reactive and non-reactive mixtures has not yet been reported. This study introduces the application of particle swarm optimization and several of its variants for solving phase stability and equilibrium problems in multicomponent systems with or without chemical equilibrium. The reliability and efficiency of a number of particle swarm optimization algorithms are tested and compared using multicomponent systems with vapor–liquid and liquid–liquid equilibrium. Our results indicate that the classical particle swarm optimization with constant cognitive and social parameters is a reliable method and offers the best performance for global minimization of the tangent plane distance function and the Gibbs energy function in both reactive and non-reactive systems.
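For readers unfamiliar with the classical variant singled out above, a minimal sketch of particle swarm optimization with constant cognitive and social parameters is given below. The parameter values (`w`, `c1`, `c2`), the bound handling, and the `sphere` test function are assumptions chosen for illustration; the study itself minimizes the tangent plane distance and Gibbs energy functions, which are not reproduced here.

```python
import random


def pso_minimize(f, bounds, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Classical PSO: constant inertia w, cognitive c1 and social c2 parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # cognitive pull
                           + c2 * r2 * (gbest[d] - x[i][d]))     # social pull
                lo, hi = bounds[d]
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))    # clamp to bounds
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


def sphere(p):
    # Hypothetical stand-in objective with its minimum at the origin.
    return sum(t * t for t in p)


best_pos, best_val = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

Swapping `sphere` for a thermodynamic objective would only require a function that maps a composition vector to a scalar and suitable bounds on each variable.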
19.
Pure component selectivity analysis (PCSA) was successfully utilized to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component relative to other components. The samples used in this study were composed of NH4OH, H2O2 and H2O, a popular etchant solution in the electronics industry. The corresponding near-infrared (NIR) spectra (9000-7500 cm−1) were used to build PLS models. The selective determination of H2O2 without influence from NH4OH and H2O was a key issue, since its molecular structure is similar to that of H2O and NH4OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH4OH and H2O2 were found using moving window PLS (MW-PLS), and the corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH4OH was free from interference from the other components owing to the presence of its unique NH absorption bands. Since the spectral variation from H2O2 was broadly overlapping and much less distinct than that from NH4OH, the selectivity and prediction performance of the H2O2 calibration varied sensitively with the spectral ranges and number of factors used. PCSA, based on the comparison between regression vectors from PLS and the net analyte signal (NAS), was an effective method to prevent over-fitting of the H2O2 calibration. A robust H2O2 calibration model with minimal interference from other components was developed. PCSA should be included as a standard method in PLS calibrations, where prediction error alone is the usual measure of performance.
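The net analyte signal mentioned here is commonly defined as the part of an analyte's pure spectrum that is orthogonal to the spectra of all other components. The sketch below computes it by Gram-Schmidt projection and compares it with a regression vector by cosine similarity. It is an illustration under assumptions, not the PCSA procedure of the paper: the function names, the three-channel "spectra", and the use of cosine similarity as the comparison are all hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def net_analyte_signal(analyte, interferents, tol=1e-12):
    """Part of the analyte spectrum orthogonal to the span of the interferents."""
    basis = []  # orthonormal basis of the interferent space (Gram-Schmidt)
    for s in interferents:
        r = list(s)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        n = norm(r)
        if n > tol:  # skip linearly dependent spectra
            basis.append([ri / n for ri in r])
    nas = list(analyte)
    for q in basis:
        c = dot(nas, q)
        nas = [ni - c * qi for ni, qi in zip(nas, q)]
    return nas


def selectivity(analyte, interferents):
    """Fraction of the analyte signal that is unique to the analyte (0 to 1)."""
    return norm(net_analyte_signal(analyte, interferents)) / norm(analyte)


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


# Hypothetical three-channel pure spectra.
analyte = [1.0, 1.0, 0.0]
interferents = [[0.0, 1.0, 0.0]]
nas = net_analyte_signal(analyte, interferents)
```

A regression vector that points in nearly the same direction as the net analyte signal (cosine close to 1) responds mainly to the analyte itself, whereas a low value suggests the model is leaning on interferent features or noise.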