Similar Articles
 20 similar articles found (search time: 46 ms)
1.
Kernel partial least squares (KPLS), a nonlinear extension of linear PLS, has become a popular technique for regression and classification of complex data sets. Training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in that feature space. In the present study, we develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel from decision tree ensembles. The constructed tree kernel effectively captures the similarities between samples and selects informative features through variable importance ranking while the kernel is being built. At the same time, this kernel allows TKPLS to handle nonlinear relationships in structure–activity relationship data. Finally, three data sets covering different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that TKPLS is a promising alternative classification technique. Copyright © 2013 John Wiley & Sons, Ltd.
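One common way to turn a tree ensemble into a kernel is a proximity measure: two samples are similar in proportion to how often they fall into the same leaf. The sketch below assumes that definition and uses random one-split stumps as hypothetical stand-ins for full decision trees; the paper's own tree-building procedure and its variable importance ranking are not reproduced here.

```python
import random

def random_stump(n_features, X, rng):
    """Hypothetical 'tree': one split on a random feature, with the threshold
    taken from a randomly chosen training sample. Returns a leaf-id function."""
    j = rng.randrange(n_features)
    t = X[rng.randrange(len(X))][j]
    return lambda x: 0 if x[j] <= t else 1

def tree_kernel(X, trees):
    """K[i][k] = fraction of trees in which samples i and k share a leaf."""
    leaves = [[tree(x) for tree in trees] for x in X]
    n, m = len(X), len(trees)
    return [[sum(a == b for a, b in zip(leaves[i], leaves[k])) / m
             for k in range(n)] for i in range(n)]

rng = random.Random(0)
X = [[0.1, 1.0], [0.2, 0.9], [0.9, 0.1], [1.0, 0.0]]
trees = [random_stump(2, X, rng) for _ in range(50)]
K = tree_kernel(X, trees)  # symmetric, ones on the diagonal, values in [0, 1]
```

The resulting matrix can be passed to any kernel method (KPLS included) in place of a Gaussian or linear kernel matrix.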

2.
The feasibility of using an Adaboost algorithm in conjunction with near-infrared (NIR) spectroscopy to automatically distinguish cigarettes of different brands was explored. Simple linear discriminant analysis (LDA) was used as the base algorithm to train all weak classifiers in Adaboost. Both principal component analysis (PCA) and its kernel version (kernel principal component analysis, KPCA) were used for feature extraction and were compared with each other. The influence of the training set size on the final classification model was also investigated. A case study demonstrated that Adaboost coupled with PCA or KPCA clearly improves the ability to discriminate between samples that cannot be separated by a single linear classifier. However, in terms of overall performance, KPCA appears preferable to PCA for feature extraction, especially when the training set is relatively small. The results also indicate that more training samples should be used, if possible, to fully demonstrate the superiority of Adaboost. Adaboost combined with NIR spectroscopy and KPCA-based feature extraction appears to be a promising tool for distinguishing cigarettes of different brands, especially when the NIR spectra of cigarettes of different brands overlap substantially.
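A minimal sketch of discrete AdaBoost with an LDA-type weak learner, assuming a single extracted feature (for example, one PCA or KPCA score) and labels in {-1, +1}. In one dimension, LDA with equal class priors puts its boundary at the midpoint of the class means; here the means are weighted by the current boosting weights. The paper's multivariate LDA, number of rounds, and feature extraction are not reproduced.

```python
import math

def fit_lda_1d(x, y, w):
    """Weighted 1-D LDA with equal class priors: threshold at the midpoint of
    the weighted class means, class +1 on the side of the positive mean."""
    def wmean(label):
        num = sum(wi * xi for xi, yi, wi in zip(x, y, w) if yi == label)
        den = sum(wi for yi, wi in zip(y, w) if yi == label)
        return num / den
    mu_pos, mu_neg = wmean(1), wmean(-1)
    t = 0.5 * (mu_pos + mu_neg)
    s = 1 if mu_pos >= mu_neg else -1   # which side of t is class +1
    return lambda v: s if v > t else -s

def adaboost(x, y, rounds=10, eps=1e-10):
    """Discrete AdaBoost; labels y must be -1 or +1."""
    n = len(x)
    w = [1.0 / n] * n
    ensemble = []                       # list of (alpha, weak classifier)
    for _ in range(rounds):
        h = fit_lda_1d(x, y, w)
        err = sum(wi for xi, yi, wi in zip(x, y, w) if h(xi) != yi)
        err = min(max(err, eps), 1.0 - eps)   # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, h))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda v: 1 if sum(a * g(v) for a, g in ensemble) >= 0 else -1
```

The reweighting step is what lets later weak classifiers concentrate on the samples that earlier ones got wrong.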

3.
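To address the overfitting that easily arises when models are built from high-dimensional, small-sample mass spectrometry data, the severe collinearity among variables, and the nonlinear relationship between structure and properties, a new technique combining kernel sliced inverse regression (KSIR) feature extraction with linear discriminant analysis (LDA) was adopted. First, the KSIR algorithm performs nonlinear feature extraction on the mass spectral data; then, in the low-dimensional space spanned by the new feature vectors, linear discriminant functions of the sample classes are constructed to assign each sample to its class. The KSIR-LDA method was applied to the classification of mass spectral data of soft drinks. The results show that the method not only accommodates the nonlinear relationship between the mass spectral data and the properties, but also achieves higher classification accuracy with fewer, more interpretable feature variables, and enables interpretation and visualization of the data in a low-dimensional feature space.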
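Kernel feature extractors such as KSIR (and KPCA) work on the kernel matrix rather than on explicitly mapped samples, and they typically require the mapped samples to be centered in feature space. The sketch below shows only that preprocessing step, assuming a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`; the slicing and eigen-decomposition steps of KSIR are not shown.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def center_kernel(K):
    """Double-center K, which centers the mapped samples in feature space:
    Kc = K - 1K - K1 + 1K1, where 1 is the n x n matrix with entries 1/n."""
    n = len(K)
    row = [sum(r) / n for r in K]                                  # row means
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]   # column means
    tot = sum(row) / n                                             # grand mean
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

# Usage: Kc = center_kernel(rbf_kernel(spectra, gamma=0.5)); every row of Kc sums to ~0.
```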

4.
The nonlinear, nonnegative single-mixture blind source separation problem consists of decomposing an observed, nonlinearly mixed multicomponent signal into nonnegative, dependent component (source) signals. The problem is difficult and is a special case of the underdetermined blind source separation problem. However, it is practically relevant for contemporary metabolic profiling of biological samples when only one sample is available for acquiring mass spectra, from which the pure components are then extracted. Herein, we present a method for the blind separation of nonnegative dependent sources from a single nonlinear mixture. First, an explicit feature map is used to map the single mixture into a pseudo multi-mixture. Second, an empirical kernel map is used to implicitly map the pseudo multi-mixture into a high-dimensional reproducing kernel Hilbert space (RKHS). Under sparse probabilistic conditions previously imposed on the sources, the single-mixture nonlinear problem is converted into an equivalent linear, multiple-mixture problem that consists of the original sources and their higher-order monomials. These monomials are suppressed by robust principal component analysis and by hard, soft, and trimmed thresholding. Sparseness-constrained nonnegative matrix factorizations in the RKHS yield sets of separated components, which are then annotated with pure components from a library using the maximal correlation criterion. The method is illustrated with a numerical example involving the extraction of eight dependent components from one nonlinear mixture. It is further demonstrated on three nonlinear chemical reactions of peptide synthesis, in which 25, 19, and 28 dependent analytes, respectively, are extracted from the mass spectrum of a single nonlinear mixture.
The target application of the proposed method, in combination with other separation techniques, is mass spectrometry-based non-targeted metabolic profiling, such as biomarker identification studies. Copyright © 2015 John Wiley & Sons, Ltd.
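The hard and soft thresholding operators mentioned above have standard element-wise definitions, sketched below. The threshold value `lam` is a hypothetical parameter, trimmed thresholding is omitted, and the abstract does not specify how the threshold is chosen or exactly which matrix it is applied to.

```python
import math

def hard_threshold(x, lam):
    """Keep entries whose magnitude exceeds lam; set the rest to zero."""
    return [v if abs(v) > lam else 0.0 for v in x]

def soft_threshold(x, lam):
    """Shrink every magnitude toward zero by lam: sign(v) * max(|v| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]
```

Hard thresholding leaves surviving entries untouched, while soft thresholding also shrinks them, which tends to suppress small error terms more smoothly at the cost of biasing large values downward.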

5.
Data from high-throughput metabolomics experiments are becoming increasingly large and complex, which poses substantial challenges to existing statistical modeling. Statistically efficient approaches are therefore needed to mine the metabolite information underlying the metabolomics data under investigation. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel effectively encodes the similarities between metabolomics samples in terms of informative metabolites/biomarkers in specific parts of the measurement space. At the same time, informative metabolites or potential biomarkers can be discovered by variable importance ranking while the kernel is being built. Moreover, with such a kernel, KFDA can also handle, to some extent, nonlinear relationships in metabolomics data. Finally, two real metabolomics datasets and a simulated dataset were used to demonstrate the performance of the proposed approach in comparison with other approaches.
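The two-class KFDA step can be sketched in its usual dual form: the discriminant coefficients are alpha = (N + reg*I)^(-1) (M1 - M0), where M_c is the mean of the kernel columns of class c and N is the within-class scatter in dual form. In this sketch a Gaussian kernel stands in for the paper's tree-ensemble kernel, and the regularization value `reg` is an assumption.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kfda(K, y, reg=1e-3):
    """Two-class kernel Fisher discriminant. K: n x n training kernel matrix;
    y: labels in {0, 1}. A sample x projects to sum_i alpha_i * k(x_i, x)."""
    n = len(K)
    M, N = [], [[0.0] * n for _ in range(n)]
    for c in (0, 1):
        idx = [j for j in range(n) if y[j] == c]
        nc = len(idx)
        # M_c: mean of the kernel columns that belong to class c
        M.append([sum(K[i][j] for j in idx) / nc for i in range(n)])
        # N += K_c (I - 1/n_c) K_c^T, the within-class scatter in dual form
        for a in range(n):
            for b in range(n):
                s = sum(K[a][j] * K[b][j] for j in idx)
                N[a][b] += s - nc * M[c][a] * M[c][b]
    for a in range(n):
        N[a][a] += reg  # keeps N positive definite and invertible
    return solve(N, [m1 - m0 for m1, m0 in zip(M[1], M[0])])

X = [[0.0], [0.3], [0.5], [2.0], [2.4], [2.7]]
y = [0, 0, 0, 1, 1, 1]
K = [[math.exp(-0.5 * (a[0] - b[0]) ** 2) for b in X] for a in X]
alpha = kfda(K, y)
proj = [sum(alpha[i] * K[i][j] for i in range(len(X))) for j in range(len(X))]
```

Because N + reg*I is positive definite, class 1 always projects to a larger mean score than class 0 on the training data.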

6.
Kernel partial least squares (KPLS), a nonlinear extension of linear PLS, has become a popular technique for chemical and biological modeling. Training samples are transformed into a feature space via a nonlinear mapping, and the PLS algorithm is then carried out in the feature space. However, one of the main limitations of KPLS is that every feature is given the same importance in the kernel matrix, which explains the poor performance of KPLS on data with many irrelevant features. In this study, we propose a new strategy that incorporates variable importance into KPLS, termed the WKPLS approach. By modifying the kernel matrix, WKPLS provides a feasible way to differentiate between true and noise variables. Because the regression coefficients of a PLS model reflect the importance of the variables, we first obtain normalized regression coefficients by building a PLS model with all the variables; variable importance is then incorporated into the primary kernel. The performance of WKPLS is investigated on one simulated dataset and two structure–activity relationship (SAR) datasets. Compared with standard linear-kernel PLS and Gaussian-kernel PLS, WKPLS yields superior prediction performance. By introducing extra information, WKPLS could be considered a good mechanism for improving the performance of KPLS in SAR modeling.
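One way variable importance can enter a kernel is as per-variable weights inside the kernel function. The sketch below assumes the PLS regression coefficients are already available and normalizes their absolute values so the largest weight is 1; both the normalization and the exact placement of the weights are assumptions, not the paper's stated formulas.

```python
import math

def importance_weights(coefs):
    """Normalize absolute PLS regression coefficients to [0, 1] (max = 1).
    This normalization scheme is an assumption for illustration."""
    m = max(abs(b) for b in coefs)
    return [abs(b) / m for b in coefs]

def weighted_rbf(x, z, w, gamma=1.0):
    """Gaussian kernel whose squared differences are scaled per variable,
    so low-weight (noise) variables barely affect the similarity."""
    return math.exp(-gamma * sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, z)))

def weighted_linear(x, z, w):
    """Linear kernel with per-variable weights: sum_j w_j * x_j * z_j."""
    return sum(wj * a * b for wj, a, b in zip(w, x, z))
```

A variable whose coefficient is zero drops out of both kernels entirely, which is the mechanism by which noise variables stop diluting the similarity between samples.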

7.
8.
The molecular aggregation state of bioactive compounds plays a key role in bio-interactive processes. Diverse aggregation states of bioactive compounds give rise to different biological or chemical properties. Water bridges, a simple form of hetero-molecular aggregation, have been found to mediate the binding between many bioactive compounds and their targets through hydrogen-bonding networks, e.g. in the recognition of neonicotinoids by insect nAChRs. To better understand the roles of water bridges in the bioactivities of compounds, an approach based on hetero-dimeric aggregation with water was proposed. Quantitative structure-activity relationship (QSAR) and pharmacophore modeling investigations were applied to 19 neonicotinoids as well as their aggregates with water. The aggregate-based CoMSIA, PHASE, and linear QSAR models showed better statistical significance and predictive ability than the monomer-based ones, indicating that the bioactivities correlate with the aggregate properties and with the water-bridged hydrogen bond at the active site. All results reveal the essential role of water bridges in ligand recognition, which should be considered in future ligand design and optimization.
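Predictive ability of a QSAR model is commonly reported as a leave-one-out cross-validated q² = 1 - PRESS / SS_total. The sketch below computes it for a one-descriptor linear model; the single descriptor and the data layout are hypothetical simplifications, and the CoMSIA and PHASE models in the study are not reproduced.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def q2_loo(x, y):
    """Leave-one-out q^2: each compound is predicted by a model fitted
    without it, and the squared prediction errors are summed (PRESS)."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return 1.0 - press / sum((v - my) ** 2 for v in y)
```

Unlike the fitted r², q² penalizes models that merely memorize the training compounds, which is why it is used to compare predictivity between model variants.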

9.
Nonlinear underdetermined blind separation of nonnegative dependent sources consists of decomposing a set of observed, nonlinearly mixed signals into a greater number of original nonnegative and dependent component (source) signals. This hard problem is practically relevant for contemporary metabolic profiling of biological samples, where sources (a.k.a. pure components or analytes) are to be extracted from mass spectra of nonlinear multicomponent mixtures. This paper presents a method for nonlinear underdetermined blind separation of nonnegative dependent sources that comply with a sparse probabilistic model, that is, sources constrained to be sparse in support and amplitude. This model is validated on experimental pure-component mass spectra. Under a sparse prior, the nonlinear problem is converted into an equivalent linear one comprising the original sources and their higher-order (mostly second-order) monomials. The influence of these monomials, which act as error terms, is reduced by preprocessing the matrix of mixtures by means of robust principal component analysis and hard, soft, and trimmed thresholding. The preprocessed data matrices are mapped into a high-dimensional reproducing kernel Hilbert space (RKHS) of functions by means of an empirical kernel map. Sparseness-constrained nonnegative matrix factorizations in the RKHS yield sets of separated components, which are assigned to pure components from a library using a maximal correlation criterion. The methodology is illustrated on demanding numerical and experimental examples, related respectively to the extraction of eight dependent components from three nonlinear mixtures and to the extraction of 25 dependent analytes from nine nonlinear mixture mass spectra recorded during a nonlinear chemical reaction of peptide synthesis. Copyright © 2014 John Wiley & Sons, Ltd.
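The conversion of the nonlinear problem into a linear one can be sketched as follows, assuming (the abstract does not say so) that each mixture is a smooth function of the source vector with no constant term, expanded to second order. The symbols M, a_ij, b_ijk, A, B and s^(2) are introduced here for illustration only.

```latex
x_i(t) = f_i\big(\mathbf{s}(t)\big)
       = \sum_{j=1}^{M} a_{ij}\, s_j(t)
       + \sum_{j=1}^{M}\sum_{k=j}^{M} b_{ijk}\, s_j(t)\, s_k(t)
       + O\big(\lVert \mathbf{s}(t)\rVert^{3}\big)

\mathbf{x}(t) \approx
\begin{bmatrix} \mathbf{A} & \mathbf{B} \end{bmatrix}
\begin{bmatrix} \mathbf{s}(t) \\ \mathbf{s}^{(2)}(t) \end{bmatrix},
\qquad
\mathbf{s}^{(2)}(t) = \big[s_1^{2}(t),\, s_1(t)s_2(t),\, \dots,\, s_M^{2}(t)\big]^{\top}
```

The model is linear in the augmented source vector. When sources are sparse in support, most cross products s_j s_k with j ≠ k are zero wherever the supports do not overlap, so the surviving monomials behave like a limited set of additional error "sources", which is what the robust PCA and thresholding steps are meant to suppress.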

10.
Four 1,8-naphthalimide hydrazone molecules bearing different electron-donating groups were studied for their linear and nonlinear optical (NLO) properties. These compounds show strong green emission in solution. Their NLO properties, namely two-photon absorption (TPA) behavior with femtosecond laser pulses at ca. 800 nm and excited-state absorption (ESA) behavior with nanosecond laser pulses at 532 nm, were investigated. Compound 4 exhibited the largest two-photon cross section of the series (550 GM) for two reasons: it has the longest conjugation length and the strongest electron-donating ability. In contrast to the TPA behavior, compound 2 showed the best nonlinear absorption at 532 nm, with a nonlinear absorption coefficient of up to 1.41 × 10⁻¹⁰ MKS and a third-order nonlinear optical susceptibility χ⁽³⁾ of up to 4.65 × 10⁻¹² esu. Through structural modification, the NLO properties of these compounds at different wavelengths (532 and 800 nm) were effectively tuned. The strong broadband NLO properties indicate that hydrazones are good candidates for organic nonlinear absorption materials.
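For scale, using the standard definition of the Göppert-Mayer unit (1 GM = 10⁻⁵⁰ cm⁴ s photon⁻¹) and writing σ₂ for the two-photon cross section, the value reported for compound 4 converts as:

```latex
\sigma_2 = 550\ \text{GM}
         = 550 \times 10^{-50}\ \text{cm}^{4}\,\text{s}\,\text{photon}^{-1}
         = 5.5 \times 10^{-48}\ \text{cm}^{4}\,\text{s}\,\text{photon}^{-1}
```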

11.
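Compared with statistical analysis and neural networks, support vector machines (SVMs), which are based on structural risk minimization, offer better classification performance. When an SVM is used for nonlinear classification, the samples are first mapped into a higher-dimensional feature space; this often increases multicollinearity and redundant information, which distorts the sample distribution and degrades the predictive performance of the linear support vector classifier (LSVC). This study proposes a nonlinear classification correlation analysis algorithm (NLCCA) that uses the kernel-function technique, without requiring an explicit expression for the nonlinear mapping, to extract classification-correlated components from the images of the samples in the feature space, thereby removing redundant information and improving the sample distribution. The resulting combined NLCCA-LSVC classifier has excellent predictive performance. It was tested on simulated data and applied to two complex chemical pattern recognition problems, achieving satisfactory results in both cases, which confirms the effectiveness of the algorithm.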
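The point that kernel methods need no explicit formula for the nonlinear mapping can be checked on a case where the mapping happens to be known. The sketch assumes a homogeneous degree-2 polynomial kernel on 2-D inputs (an illustrative choice, not necessarily the kernel used for NLCCA) and computes feature-space distances from kernel values alone.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x):
    """Explicit feature map for 2-D inputs that reproduces poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def feature_distance_sq(x, z, k):
    """||phi(x) - phi(z)||^2 computed from kernel evaluations only."""
    return k(x, x) + k(z, z) - 2.0 * k(x, z)
```

For kernels such as the Gaussian, phi is infinite-dimensional and cannot be written out, but `feature_distance_sq` and any other inner-product computation still work unchanged.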

12.
13.
Underdetermined blind separation of nonnegative dependent sources consists of decomposing a set of observed mixed signals into a greater number of original nonnegative and dependent component (source) signals. This is an important problem for which very few algorithms exist. It is also practically relevant for contemporary metabolic profiling of biological samples, such as biomarker identification studies, where sources (a.k.a. pure components or analytes) are to be extracted from mass spectra of complex multicomponent mixtures. This paper presents a method for underdetermined blind separation of nonnegative dependent sources. The method performs a nonlinear, mixture-wise mapping of the observed data into a high-dimensional reproducing kernel Hilbert space (RKHS) of functions and sparseness-constrained nonnegative matrix factorization (NMF) therein. The original problem is thus converted into a new one with an increased number of mixtures, an increased number of dependent sources, and higher-order (error) terms generated by the nonlinear mapping. Provided that the amplitudes of the original components are sparsely distributed, which is the case for mass spectra of analytes, sparseness-constrained NMF in RKHS yields, with significant probability, improved accuracy relative to performing the same NMF algorithm on the original problem. The method is illustrated on numerical and experimental examples, related respectively to the extraction of 10 dependent components from five mixtures and to the extraction of 10 dependent analytes from mass spectra of two to five mixtures. The analytes mimic the complexity of the components expected in biological samples. Copyright © 2013 John Wiley & Sons, Ltd.
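A minimal sketch of sparseness-constrained NMF, assuming an L1 penalty on the source matrix and Lee-Seung-style multiplicative updates; the paper's specific sparseness constraint and its RKHS mapping are not reproduced, so this factorizes the data in the original space. Rows of V are mixtures, columns are channels (for example m/z bins), W plays the role of the mixing matrix and H the sources.

```python
import random

def matmul(A, B):
    """Plain-Python product of matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def objective(V, W, H, lam):
    """0.5 * ||V - W H||_F^2 + lam * sum(H)."""
    WH = matmul(W, H)
    fit = sum((v - p) ** 2 for rv, rp in zip(V, WH) for v, p in zip(rv, rp))
    return 0.5 * fit + lam * sum(sum(row) for row in H)

def sparse_nmf(V, r, lam=0.1, iters=200, seed=0, eps=1e-12):
    """Factor V ~ W H with W, H >= 0 and an L1 penalty lam on H.
    Returns W, H and the objective before and after the updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    start = objective(V, W, H, lam)
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        # H <- H * (W^T V) / (W^T W H + lam): the penalty pushes H toward sparsity
        H = [[h * a / (d + lam + eps) for h, a, d in zip(hr, ar, dr)]
             for hr, ar, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        # W <- W * (V H^T) / (W H H^T)
        W = [[w * a / (d + eps) for w, a, d in zip(wr, ar, dr)]
             for wr, ar, dr in zip(W, num, den)]
    return W, H, start, objective(V, W, H, lam)

V = [[1.0, 0.0, 2.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 1.0, 0.0],
     [2.0, 0.0, 4.0, 0.0, 2.0],
     [0.0, 1.0, 0.0, 2.0, 1.0]]
W, H, f_start, f_end = sparse_nmf(V, r=2)
```

Multiplicative updates keep every entry nonnegative automatically, since each step only multiplies by a ratio of nonnegative quantities.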

14.
Compared with routinely recorded process variables, which can easily be obtained through the distributed control system, key quality variables are much more difficult to acquire. As a result, for soft sensor development we may have only a small number of output data samples but many more input data samples. In this case, it is important to incorporate the additional input data samples to improve the modeling performance of the soft sensor. Building on the semisupervised modeling method, this paper extends the linear semisupervised soft sensor to a nonlinear one by incorporating a kernel learning algorithm. Under the probabilistic modeling framework, a mixture form of the nonlinear semisupervised soft sensor is developed. To evaluate the performance of the developed nonlinear semisupervised soft sensor, an industrial case study is provided. Copyright © 2014 John Wiley & Sons, Ltd.

15.
An existing approximation to the softness kernel, successfully explored in earlier work, has been extended by using the normal (Gaussian) distribution function instead of the Dirac delta. The softness kernel thus becomes a continuous function in space and may be used to calculate the linear response function of the electron density. Three-dimensional visualizations of the softness kernel and of the linear response function are presented for the nitrogen atom as a working example. By means of a single parameter of the spatial Gaussian distribution, the new softness kernel has been adjusted to be consistent with the standard form of the hardness kernel, which represents the leading part of the electronic interactions in the system.
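A sketch of the extension, assuming (the abstract does not state it) that the earlier approximation is the common local form s(r, r') ≈ s(r) δ(r − r'), and writing σ for the single width parameter of the Gaussian:

```latex
s(\mathbf{r},\mathbf{r}') \approx s(\mathbf{r})\,\delta(\mathbf{r}-\mathbf{r}')
\;\longrightarrow\;
s(\mathbf{r},\mathbf{r}') \approx s(\mathbf{r})\,
\frac{1}{(2\pi\sigma^{2})^{3/2}}
\exp\!\left(-\frac{\lvert\mathbf{r}-\mathbf{r}'\rvert^{2}}{2\sigma^{2}}\right)
```

Because the normalized Gaussian integrates to one over all space, integrating the kernel over r' still returns the local softness s(r), and the delta form is recovered in the limit σ → 0.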

16.
The nonlinear Schrödinger equation with the Gaussian convolution kernel K2 induces the group SU(3) with respect to the classification of the multiplet structure of the eigenstates. Such a field can be used to describe some atoms (those whose outermost electrons occupy s-orbitals) as a self-interacting, extended particle with an internal structure. For atoms whose valence electrons are described by p-orbitals, and for almost all molecules, the Gaussian kernel K2 has to be generalized by Hermite polynomials. In this way, we can formulate a nonlinear field theory that establishes the spatial symmetry of a system via basis structure functions. This symmetry thus represents the essential starting point for treating molecules as quasi-particles with an internal structure. It will be shown that there is a connection with the concept of chirality functions and with the Ginzburg–Landau theory of superconductivity. The latter indicates that the nonlinear Schrödinger equation and its generalizations can be regarded as a classical field theory associated with phase transitions.

17.
18.
19.
Qin Jingui. Chinese Journal of Organic Chemistry (有机化学), 2001, 21(11): 1081-1089
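This article reports the progress of our group's work in four areas supported by the National Natural Science Foundation of China. The relationship between the structure and the nonlinear optical (NLO) properties of organometallic compounds was studied, and empirical rules were summarized for the molecular design of organometallic NLO materials for different applications, starting from molecular geometry. A new strategy of using a "combined conjugated bridge" to design organic NLO chromophores was proposed; several organic compounds synthesized in this way show both large optical nonlinearity and a blue-shifted absorption maximum. Organic chromophores were attached through chemical bonds to the side chains of various polymers, and potential electro-optic and photorefractive polymer materials were synthesized and characterized. Molecular materials combining electrical conductivity with strong magnetism were explored using an inorganic-organic intercalation approach: small organic molecules and conducting polymers were separately intercalated between the layers of the layered inorganic compounds MPS3, yielding eight new molecular magnets, while another intercalation compound showed relatively high electrical conductivity.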

20.
This paper focuses on the application of principal component analysis (PCA) to facilitate the optimization of the derivatization of the oestrogenic steroids estrone, 17β-estradiol, estriol, and 17α-ethinylestradiol, and of diethylstilbestrol, in order to achieve (1) complete derivatization of all the hydroxyl groups in the structures of the compounds and (2) the greatest effectiveness of the reaction. Six different derivatization reagents were tested, and 2-methylanthracene was used as the internal standard to evaluate the effectiveness of the reactions. The experimental data were subjected to PCA, which reduced the dimensionality of the original multivariable data set and facilitated the selection of optimum derivatization conditions. A mixture of 99% N,O-bis(trimethylsilyl)trifluoroacetamide + 1% trimethylchlorosilane and pyridine (1:1, v/v) at 60 °C for 30 min was established as the most convenient and efficient means of derivatizing these oestrogenic steroids and diethylstilbestrol; the N-methyl-N-(trimethylsilyl)trifluoroacetamide + pyridine (1:1, v/v) mixture appears to be a promising alternative. The use of PCA to optimize the derivatization procedure, proposed here for the first time, is particularly useful in developing multicomponent methods covering several chemical classes of compounds. Copyright © 2011 John Wiley & Sons, Ltd.
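How PCA condenses a multiresponse experiment can be sketched with the first principal component alone. The data layout below (rows = derivatization conditions, columns = responses such as internal-standard-normalized peak areas) is a hypothetical illustration, and power iteration is used only to keep the sketch dependency-free; it assumes the all-ones start vector is not orthogonal to the leading eigenvector.

```python
def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance(X):
    """Sample covariance of the columns of X (rows = experiments)."""
    n, mu = len(X), column_means(X)
    C = [[v - m for v, m in zip(row, mu)] for row in X]
    p = len(mu)
    return [[sum(c[a] * c[b] for c in C) / (n - 1) for b in range(p)] for a in range(p)]

def first_pc(X, iters=100):
    """Leading PCA loading vector (unit length) by power iteration."""
    S = covariance(X)
    v = [1.0] * len(S)
    for _ in range(iters):
        w = [sum(s * x for s, x in zip(row, v)) for row in S]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def pc_scores(X, v):
    """Score of each mean-centered experiment on the component v."""
    mu = column_means(X)
    return [sum((x - m) * l for x, m, l in zip(row, mu, v)) for row in X]

# Hypothetical responses of three analytes under four derivatization conditions
X = [[0.8, 0.7, 0.9], [0.4, 0.5, 0.3], [0.9, 0.8, 1.0], [0.2, 0.3, 0.2]]
print(pc_scores(X, first_pc(X)))
```

Conditions with high scores along a component on which all analytes load positively are those that derivatize every compound well, which is how a single score can replace several correlated responses when ranking conditions.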
