Similar Literature
1.
Li S  Yao X  Liu H  Li J  Fan B 《Analytica chimica acta》2007,584(1):37-42
T-lymphocytes (T-cells) are a very important component of the human immune system. A T-cell possesses a receptor (TCR) that is specific for foreign epitopes in the form of short peptides bound to the major histocompatibility complex (MHC). When a T-cell receives the signal from peptides bound to MHC, it activates the immune system, resulting in the disposal of the immunogen. The antigenic determinants recognized and bound by the T-cell receptor are known as T-cell epitopes. Accurate prediction of T-cell epitopes is crucial for vaccine development and clinical immunology. For the first time, we developed new models using the least squares support vector machine (LSSVM) and amino acid properties for T-cell epitope prediction. A dataset of 203 short peptides (167 non-epitopes and 36 epitopes) was used as input and randomly divided into a training set and a test set. The models based on LSSVM and amino acid properties were evaluated by leave-one-out cross-validation and by their predictive ability on the test set, giving areas under the ROC curve of 0.9875 and 0.9734, respectively. These results are more satisfactory than those previously reported; in particular, the true-positive accuracy is markedly improved.
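For readers unfamiliar with LSSVM, a minimal sketch of a least squares SVM classifier (Suykens-style dual formulation) follows. It assumes an RBF kernel, hypothetical values for the regularization constant `gamma` and kernel width `sigma`, and plain numeric feature vectors; the amino-acid property encoding of the peptides used in the study is not reproduced. Training reduces to solving one linear system rather than a quadratic program.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM classifier KKT system for bias b and multipliers alpha.

    [0  y^T              ] [b    ]   [0]
    [y  Omega + I / gamma] [alpha] = [1],  Omega_ij = y_i y_j K(x_i, x_j)
    gamma and sigma defaults are illustrative assumptions, not tuned values.
    """
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    # Decision rule: sign(sum_i alpha_i y_i K(x, x_i) + b), labels in {-1, +1}
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)


# Toy 1-D example (hypothetical data, not peptide descriptors)
X_train = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])
b, alpha = lssvm_train(X_train, y_train)
print(lssvm_predict(X_train, y_train, b, alpha, np.array([[-1.5], [1.5]])))
```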

2.
3.
This work describes multi-classification based on binary probabilistic discriminant partial least squares (p-DPLS) models, developed with the one-against-one strategy and the winner-takes-all principle. The multi-classification problem is split into binary classification problems solved with p-DPLS models, whose results are combined to obtain the final classification. The classification criterion uses specific characteristics of an object (its position in the multivariate space and its prediction uncertainty) to estimate the reliability of the classification, and the object is assigned to the class with the highest reliability. The new methodology is tested on the well-known Iris data set and on a data set of Italian olive oils. Compared with CART and SIMCA, the proposed method achieves better average classification performance and additionally provides a statistic that evaluates the reliability of the classification. For the olive oil set, the average percentage of correct classification on the training set was close to 84% with p-DPLS, against 75% with CART and 100% with SIMCA, while on the test set it was close to 94% with p-DPLS, against 50% with CART and 62% with SIMCA.
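The one-against-one / winner-takes-all combination step can be sketched as below. This is an illustration under assumptions, not the p-DPLS procedure itself: each pairwise model is a hypothetical callable returning the winning class and a reliability score, and we assume a class's overall reliability is the sum of the reliabilities of the pairwise contests it wins.

```python
from collections import defaultdict
from itertools import combinations


def ovo_winner_takes_all(classes, pairwise_models, x):
    """Combine binary one-against-one models into a multiclass decision.

    pairwise_models[(a, b)] is a hypothetical binary classifier for classes
    a and b that returns (winning_class, reliability) for the object x.
    Assumption: reliabilities of won contests are summed per class, and the
    class with the largest total wins.
    """
    support = defaultdict(float)
    for a, b in combinations(classes, 2):
        winner, reliability = pairwise_models[(a, b)](x)
        support[winner] += reliability
    return max(classes, key=lambda c: support[c])


# Hypothetical pairwise outputs for one object
models = {
    ("A", "B"): lambda x: ("A", 0.9),
    ("A", "C"): lambda x: ("C", 0.6),
    ("B", "C"): lambda x: ("C", 0.7),
}
print(ovo_winner_takes_all(["A", "B", "C"], models, x=None))
```

Summing rather than counting wins lets a confident pairwise decision outweigh a marginal one, which is one way to make the final assignment reflect reliability.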

4.
Yankun Li 《Talanta》2007,72(1):217-222
Consensus modeling, which combines the results of multiple independent models into a single prediction, avoids the instability of a single model. Based on this principle, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed with the discrete wavelet transform (DWT) to filter out spectral background and noise, and consensus LS-SVR was then used to build the calibration model. After optimizing the modeling parameters, a satisfactory model was obtained for predicting the reducing-sugar content of plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than conventional partial least squares (PLS) and LS-SVR.

5.
With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to obtain a more reliable and powerful analysis of a given dataset. We do not assume, as meta-analysis does, that subjects in the current study and in other relevant studies are drawn from the same population; in particular, the parameters of the current study may differ from those of the other studies. In this context we consider sample classification based on gene expression profiles. We propose two new methods, weighted partial least squares (WPLS) and weighted penalized partial least squares (WPPLS), which build a classifier from a combined use of multiple datasets and weight the individual datasets according to their relevance to the current study. A more standard approach is to first build a classifier on each individual dataset and then combine the outputs of the multiple classifiers by weighted voting. Using two quite different datasets on human heart failure, we show, first, that WPLS/WPPLS, by borrowing information from the other dataset, can improve on PLS/PPLS built on a single dataset; second, that WPLS/WPPLS performs better than the standard approach of combining multiple classifiers; and third, that WPPLS can improve over WPLS, just as PPLS does over PLS for a single dataset.

6.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) on three real data sets. These methods compute the (weighted) average of PLS models of different complexity. Their prediction ability is comparable to that of classical PLS, but they do not require determining how many components to include in the model. They are also more robust, in the sense that prediction quality depends less on a good choice of the number of components. In addition, weighted average PLS is compared with the weighted-average part of LOCAL, a published method that also applies weighted average PLS, but with an entirely different weighting scheme.
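A minimal sketch of averaged PLS, assuming a single response and a NIPALS PLS1 implementation (our choice; the abstract does not specify the algorithm). Predictions from models with 1 to `max_components` latent variables are averaged; passing `weights` gives a weighted average, but the weighting scheme used in the paper is not reproduced here. An RMSEP helper is included for comparing models.

```python
import numpy as np


def pls1_coefficients(X, y, max_components):
    """NIPALS PLS1: regression vectors for models with 1..max_components LVs.

    X (n x p) and y (n,) must already be mean-centred.
    """
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    W, P, q, coefs = [], [], [], []
    for _ in range(max_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        qa = (y @ t) / tt               # y loading
        X -= np.outer(t, p)             # deflate X and y
        y -= qa * t
        W.append(w); P.append(p); q.append(qa)
        Wm, Pm = np.column_stack(W), np.column_stack(P)
        # b = W (P^T W)^{-1} q expresses the model in the original X space
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return coefs


def averaged_pls_predict(X, y, X_new, max_components, weights=None):
    """(Weighted) average of predictions from PLS models of increasing size."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    coefs = pls1_coefficients(X - x_mean, y - y_mean, max_components)
    preds = np.array([(X_new - x_mean) @ b + y_mean for b in coefs])
    if weights is None:
        weights = np.ones(len(coefs))   # plain averaged PLS
    weights = np.asarray(weights, dtype=float)
    return weights @ preds / weights.sum()


def rmsep(y_true, y_pred):
    # Root mean square error of prediction
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))
```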

7.
The work summarized in this paper is the first part of a three-paper series on robust partial least squares (RPLS) regression. Motivated by recent research in this area, this part provides a detailed algorithmic analysis of the associated techniques, showing that existing work (i) may not represent a truly robust formulation of partial least squares (PLS), (ii) may lead to convergence problems, or (iii) may be insensitive to a certain type of outlier. On the basis of this analysis, Part I introduces a new conceptual RPLS algorithm that overcomes the deficiencies of existing work. The second part details this new RPLS technique, compares its performance with existing RPLS methods and analyzes the computational efficiency and sensitivity of these algorithms. Whilst the first two parts discuss algorithmic developments of RPLS, the final part concentrates on practical issues of RPLS implementation. It is aimed at practitioners in chemistry and chemical engineering and covers a wide range of applications: a calibration experiment, recorded data from an industrial debutanizer process, and data from a number of Raman spectroscopy experiments. Copyright © 2007 John Wiley & Sons, Ltd.

8.
Two novel algorithms that employ the idea of stacked generalization, or stacked regression, are reported in the present paper: stacked partial least squares (SPLS) and stacked moving-window partial least squares (SMWPLS). The new algorithms build parallel, conventional PLS models on all intervals of a set of spectra, exploiting information from the whole spectrum by combining the parallel models in a way that emphasizes intervals highly related to the target property. It is shown theoretically and experimentally that the predictive ability of these two stacked methods, which combine all subsets or intervals of the whole spectrum, is never poorer than that of a PLS model based only on the best interval. The two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated in a single small region nor spread uniformly over the response. A simulated data set is used not only to demonstrate this improvement but also to show that stacked regressions can potentially predict property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples were successfully predicted by the new algorithms, as was the hydroxyl number of terpolymer samples measured on different instruments, both including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.

9.
Study on the mechanistic model and control-oriented modeling of fuel cells
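Based on the structure and working principle of the direct methanol fuel cell (DMFC), and drawing on electrochemistry, fluid dynamics, thermodynamics and related theory, a mathematical model of DMFC performance was established and simulated against DMFC experimental data; the results show that this modeling approach is reasonable and effective. Because the complexity of the mathematical model makes it difficult to meet engineering requirements for DMFC control-system design, particularly real-time control, this paper proposes a modeling algorithm based on the least squares support vector machine (LS-SVM): an LS-SVM with an RBF kernel is used to build an offline nonlinear model of the DMFC stack. Simulation and experimental results show that the method is simple to build and yields high model accuracy, demonstrating the effectiveness and superiority of the algorithm. The results have practical value for the modeling and control of DMFC control systems.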

10.
In this paper, a new algorithm for baseline correction of multiple spectra, based on asymmetric least squares smoothing, is proposed. Exploiting the similarity among the spectra, the algorithm estimates the baselines by penalizing differences between the baseline-corrected signals, which makes it possible for the algorithm to eliminate scatter effects on the spectra. In addition, a relaxation factor that measures the similarity of the baseline-corrected spectra is incorporated into the optimization model, and an alternating iteration strategy is used to solve the optimization problem. The proposed algorithm is fast and can output multiple baselines simultaneously. Experimental results on both simulated and real data demonstrate its effectiveness and efficiency.
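The building block of this approach, classic asymmetric least squares (AsLS) baseline estimation for a single spectrum, can be sketched as follows. The multi-spectra similarity penalty, relaxation factor and alternating iterations described above are not included, and the default `lam` and `p` values are assumptions. A dense solver is used for brevity; long spectra would call for a sparse banded solver.

```python
import numpy as np


def asls_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline for ONE spectrum (single-spectrum AsLS).

    Minimises sum_i w_i (y_i - z_i)^2 + lam * sum_i (second difference of z)_i^2,
    where points above the current baseline get weight p and points on or
    below it get weight 1 - p. lam and p defaults are illustrative assumptions.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), 2, axis=0)    # (n-2) x n second-difference operator
    H = lam * D.T @ D                    # roughness penalty matrix
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + H, w * y)
        w = np.where(y > z, p, 1 - p)    # asymmetric reweighting
    return z


# Synthetic spectrum: linear baseline plus one Gaussian peak (hypothetical data)
x = np.arange(100)
spectrum = 0.01 * x + 5 * np.exp(-((x - 50) / 3.0) ** 2)
corrected = spectrum - asls_baseline(spectrum)
```

Because points above the fitted curve receive the small weight `p`, peaks barely pull the baseline upward, while the second-difference penalty keeps it smooth.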

11.
12.
The present study demonstrated the possibility of using a ytterbium (Yb)-based internal standard near-infrared (NIR) spectroscopic measurement technique, coupled with multivariate calibration, for the quantitative analysis of tea, including total free amino acids and total polyphenols. Yb, a rare earth element, is used to compensate for the spectral variation induced by changes in sample quantity during spectral measurement of the powdered samples. Boosting was combined with least-squares support vector regression (LS-SVR) to form boosting least-squares support vector regression (BLS-SVR) for the multivariate calibration task. The results showed that tea quality could be determined accurately and rapidly by Yb-based internal standard NIR spectroscopy combined with BLS-SVR. Moreover, the introduction of boosting drastically enhanced the performance of individual LS-SVR, and BLS-SVR compared favorably with partial least-squares regression. Copyright © 2013 John Wiley & Sons, Ltd.

13.
Using a series of thirteen organic materials, including novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser-induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented, and the influence of the variables on these results is discussed. We found that using the whole spectrum as the input to the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to discrimination when the whole spectrum is used, indicating that this may not be the most robust model. Further iterative testing with additional validation data sets is necessary to determine the most robust model.

14.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions among the informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. Informative regions are used in order to build better PLS models than those based on all spectral points; CSMWPLS and SCMWPLS further optimize these regions and their combination to improve the prediction ability of the PLS models. The results of applying them to an open-path (OP)/FT-IR spectral data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which the performance of the corresponding PLS model can be improved, often significantly, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using the whole spectra or a direct combination of the informative regions for a compound. The regions in the combinations obtained can easily be explained by the presence of IR absorption bands in those spectral regions.

15.
In this study we compared ordinary least squares and weighted least squares for calibrating a method for analyzing essential and toxic metals in human milk by ICP-OES, in order to avoid systematic errors in the measurements. Human milk samples were provided by the Odete Valadares maternity clinic and digested in a high-performance microwave (MW) oven. Short- and long-term plasma stability was evaluated using a solution of digested milk (1:50) containing 2.0 mg L−1 Mg in 2% (v/v) HNO3. The detection power was at or below the μg L−1 level, while the precision, expressed as relative standard deviation (R.S.D.), was almost always equal to or better than 3.3%. The certified reference material Infant Formula (NIST SRM 1846) was used to assess the accuracy of the proposed method, which proved to be accurate and precise, with recoveries in the range 83–117%. Aqueous calibration was carried out for each element under study.
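To make the OLS/WLS distinction concrete, here is a minimal straight-line calibration sketch in plain Python. The weights are supplied by the caller; using w_i = 1/s_i² (inverse variance of replicate responses at each standard) is a common choice but is an assumption here, as the abstract does not state the weighting used.

```python
def weighted_linear_fit(x, y, weights=None):
    """Straight-line calibration y = a + b*x by (weighted) least squares.

    weights=None gives ordinary least squares. For weighted least squares,
    w_i = 1 / s_i**2 (s_i = SD of replicate responses) is an assumed choice.
    Returns (intercept, slope).
    """
    if weights is None:
        weights = [1.0] * len(x)
    sw = sum(weights)
    # Weighted centroids of the calibration points
    xw = sum(w * xi for w, xi in zip(weights, x)) / sw
    yw = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xw) * (yi - yw) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xw) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return yw - slope * xw, slope


# Hypothetical standards where the highest one is much noisier
conc = [0, 1, 2, 3]
signal = [1, 3, 5, 9]
print(weighted_linear_fit(conc, signal))                       # OLS
print(weighted_linear_fit(conc, signal, [1, 1, 1, 1e-6]))      # WLS
```

With heteroscedastic responses, OLS lets the noisiest (usually highest) standards dominate the slope, whereas WLS down-weights them.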

16.
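Partial least squares (PLS) was applied to resolve the severely overlapping synchronous fluorescence spectra of a two-component mixture of doxorubicin (DOX) and daunorubicin (DNR), establishing a new method for their simultaneous determination. In pH 3.45 Britton–Robinson (B-R) buffer, with a wavelength difference Δλ = 55 nm, models were built from the original synchronous fluorescence spectra and the first-derivative spectra of 25 mixed standards. DOX and DNR showed good linearity over the mass concentration range 0.05–3.0 μg/mL. For the two analytes, the correlation coefficients of the models were 0.9897 and 0.9909, the average recoveries 101.0% and 101.4%, the root mean square errors of prediction (RMSEP) 0.1400 and 0.1395, and the relative standard errors of prediction (SEP) 0.1541 and 0.1525, respectively. The method can be applied to the analysis of urine samples.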

17.
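A new method was developed for the simultaneous determination of zinc, copper and cobalt by damped least squares spectrophotometry with flow injection analysis. The method improves the CPA method with damped least squares: introducing a damping factor and a non-zero intercept into CPA reduces the ill-conditioning of the P coefficient matrix and improves the predictive ability of the calibration model, making the results more accurate. The method has been successfully applied to the analysis of zinc–copper–cobalt mixtures.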
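A generic damped least squares estimate for multicomponent spectrophotometry, assuming Beer's law (mixture absorbances a ≈ Kc, with K the absorptivity matrix of the pure components), might look like this. It uses Tikhonov/Levenberg-style damping; the CPA-specific P-matrix formulation of the original work is not reproduced, and the optional undamped intercept column is our stand-in for the non-zero intercept.

```python
import numpy as np


def damped_least_squares(K, a, damping=0.0, intercept=False):
    """Estimate component concentrations c from mixture absorbances a ≈ K c.

    K: (n_wavelengths, n_components) absorptivity matrix from pure standards.
    Solves (K^T K + damping * I) c = K^T a. With intercept=True a column of
    ones is appended and left undamped (an assumed stand-in for a non-zero
    intercept); the last returned value is then the intercept.
    """
    K = np.asarray(K, dtype=float)
    a = np.asarray(a, dtype=float)
    if intercept:
        K = np.column_stack([K, np.ones(len(K))])
    penalty = damping * np.eye(K.shape[1])
    if intercept:
        penalty[-1, -1] = 0.0            # do not shrink the intercept
    return np.linalg.solve(K.T @ K + penalty, K.T @ a)


# Hypothetical 5-wavelength, 3-component system (e.g. Zn, Cu, Co)
K = np.array([[1.0, 0.2, 0.1], [0.5, 0.9, 0.2], [0.1, 0.4, 1.0],
              [0.3, 0.3, 0.3], [0.8, 0.1, 0.6]])
c_true = np.array([0.5, 1.0, 2.0])
print(damped_least_squares(K, K @ c_true, damping=1e-3))
```

The damping term trades a small bias for a better-conditioned system, which is what makes it useful when the pure-component spectra overlap strongly.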

18.
Multi-way partial least squares modeling of water quality data
A 10-year surface water quality data set for a polluted river was analyzed using partial least squares (PLS) regression models. Both unfold-PLS and N-PLS (tri-PLS and quadri-PLS) models were calibrated using leave-one-out cross-validation. They were applied to the multivariate, multi-way data array in order to assess and compare their predictive capability for the biochemical oxygen demand (BOD) of the river water, in terms of relative mean square error of cross-validation and prediction and of variance captured. Residual sums of squares and leverages were computed and analyzed to identify the sites, variables, years and months that may influence the constructed model. Both the tri- and quadri-PLS models yielded relatively low validation error compared with unfold-PLS and captured a high proportion of the variance. Moreover, both methods produced acceptable model precision and accuracy. For tri-PLS the root mean square errors were 1.65 and 2.17 for calibration and prediction, respectively, whereas for quadri-PLS they were 2.58 and 1.09. At a preliminary level, it seems that BOD can be predicted, but a different data arrangement is needed. Moreover, analysis of the score and loading plots of the N-PLS models could provide information on the time evolution of river water quality.
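Unfold-PLS starts by matricizing the multi-way array so that an ordinary two-way PLS can be applied. A minimal sketch for a three-way array, assuming the layout samples × variables × time (the study's actual arrangement of sites, variables, years and months may differ, and the function name is hypothetical):

```python
import numpy as np


def unfold_samples(X3):
    """Unfold a three-way array (samples x variables x time) into 2-way form.

    Each sample's variables-by-time slab becomes one row of length J*K
    (row-major order: all time points of variable 0, then variable 1, ...).
    Assumed layout; other unfoldings are possible.
    """
    I, J, K = X3.shape
    return X3.reshape(I, J * K)


X3 = np.arange(24).reshape(2, 3, 4)   # 2 samples, 3 variables, 4 time points
X2 = unfold_samples(X3)
print(X2.shape)
```

N-PLS avoids this step and instead fits loadings for each mode separately, which is why it usually needs far fewer parameters than unfold-PLS.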

19.
From the fundamental parts of PLS-DA, Fisher's canonical discriminant analysis (FCDA) and powered PLS (PPLS), we develop the concept of powered PLS for classification problems (PPLS-DA). Using a sequence of data-reducing linear transformations (consistent with the computation of ordinary PLS-DA components), PPLS-DA computes each component from the transformed data by maximizing a parameterized Rayleigh quotient associated with FCDA. Models found by the powered PLS methodology can help reveal the relevance of particular predictors and often require fewer and simpler components than their ordinary PLS counterparts. The possibility of imposing restrictions on the powers available for optimization yields an exploratory approach to predictive modeling that is not available with traditional PLS methods. Copyright © 2008 John Wiley & Sons, Ltd.
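The Rayleigh quotient at the heart of FCDA can be maximized through an eigenvalue problem. The sketch below computes only the leading ordinary Fisher discriminant direction; the powered transformations and power restrictions of PPLS-DA are not implemented, and the function name is hypothetical.

```python
import numpy as np


def fisher_direction(X, labels):
    """Leading Fisher canonical discriminant direction (plain FCDA, not PPLS-DA).

    Maximises the Rayleigh quotient (w^T B w) / (w^T W w), with B the
    between-class and W the within-class scatter matrix; the maximiser is the
    leading eigenvector of W^{-1} B. Assumes W is invertible.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        d = (mc - mean)[:, None]
        B += len(Xc) * (d @ d.T)         # between-class scatter
        W += (Xc - mc).T @ (Xc - mc)     # within-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)


# Two hypothetical classes separated along the first axis
X = [[0, 1], [0, -1], [1, 0], [-1, 0], [5, 1], [5, -1], [6, 0], [4, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(fisher_direction(X, labels))
```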

20.
The problem of fitting a helix to data arises in the analysis of protein structure, in nuclear physics, and in engineering. A continuous helix is described by five parameters: helix axis, helix radius, and helix pitch. One of these helix parameters is frequently predefined in helix fitting, and other algorithms find only the helix axis or determine the helix axis, helix radius, or helix pitch separately. Here we describe a total least squares method, HELFIT, for helix fitting. HELFIT calculates all five helix parameters simultaneously with high accuracy, and the minimum number of data points required for the analysis is only four. HELFIT is highly insensitive to noise, even in short helices. HELFIT also calculates a parameter $p = \mathrm{rmsd}/(N-1)^{1/2}$, which estimates the regularity of helical structures independently of the number of data points, where rmsd is the root mean square distance from the best-fit helix to the data points and N is the number of data points. It should become a basic tool of structural bioinformatics.
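The regularity parameter p is straightforward to compute once the point-to-helix distances are known. The sketch below assumes those distances come from a separate helix fit (the HELFIT fitting procedure itself is not reproduced) and that rmsd averages the squared distances over all N points; the function name is hypothetical.

```python
import math


def helix_regularity(distances):
    """p = rmsd / sqrt(N - 1), from point-to-helix distances.

    `distances` holds the distances from each of the N data points to the
    best-fit helix, assumed to be obtained elsewhere by a helix-fitting routine.
    """
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two distances to compute p")
    rmsd = math.sqrt(sum(d * d for d in distances) / n)
    return rmsd / math.sqrt(n - 1)


print(helix_regularity([1.0] * 5))   # rmsd = 1, N = 5, so p = 1 / 2
```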

