Found 20 similar articles (search time: 15 ms)
1.
Charles Bouveyron 《Journal of Chemometrics》2013,27(12):433-446
In chemometrics, the supervised and unsupervised classification of high‐dimensional data has become a recurrent problem. Model‐based techniques for discriminant analysis and clustering are popular tools, which are renowned for their probabilistic foundations and their flexibility. However, classical model‐based techniques show disappointing behaviour in high‐dimensional spaces, which has up to now limited their use within chemometrics. Recent developments in model‐based classification have overcome these drawbacks and enabled the efficient classification of high‐dimensional data, even in the ‘small n / large p’ condition. This work presents a comprehensive review of these recent approaches, including regularization‐based techniques, parsimonious modelling, subspace classification methods and classification methods based on variable selection. The use of these model‐based methods is also illustrated on real‐world classification problems in chemometrics using R packages. Copyright © 2013 John Wiley & Sons, Ltd.
2.
3.
Hai‐Yan Fu Hai‐Long Wu Yong‐Jie Yu Li‐Li Yu Shu‐Rong Zhang Jin‐Fang Nie Shu‐Fang Li Ru‐Qin Yu 《Journal of Chemometrics》2011,25(8):408-429
A novel third‐order calibration algorithm, alternating weighted residue constraint quadrilinear decomposition (AWRCQLD), based on pseudo‐fully stretched matrix forms of the quadrilinear model, was developed for the quantitative analysis of four‐way data arrays. The AWRCQLD algorithm is based on a new scheme that introduces four unique constraint parts to improve the quality of the four‐way PARAFAC algorithm. The test results demonstrated that, compared with four‐way PARAFAC, the AWRCQLD algorithm converges faster and is insensitive to an excess number of components in the model. Moreover, simulated data and real experimental data were analyzed to explore the third‐order advantage over the second‐order counterpart. The results showed that third‐order calibration methods possess third‐order advantages, which allow more inherent information to be obtained from four‐way data and thus improve the resolving and quantitative capability relative to second‐order calibration, especially in highly collinear systems. Copyright © 2011 John Wiley & Sons, Ltd.
4.
Pierpaolo D'Urso Livia De Giovanni Elizabeth Ann Maharaj Riccardo Massari 《Journal of Chemometrics》2014,28(1):28-51
Following a nonparametric approach, we suggest a time‐series clustering method. Our clustering approach combines the interpretative power of the nonparametric representation of the time series with the clustering and vector quantization informational gain produced by the adopted unsupervised neural network technique, enhanced by the ordering and topology‐preserving abilities of self‐organizing maps. The proposed clustering method takes into account composite wavelet‐based information on the multivariate time series. To the information carried by the wavelet variance, namely the influence of the variability of individual univariate components of the multivariate time series across scales, it adds the information associated with the wavelet correlation, represented by the interaction between pairs of univariate components at each scale, and then suitably tunes the combination of these pieces of information. To assess the effectiveness of the proposed clustering approach, a simulation study and an empirical application are shown. Copyright © 2013 John Wiley & Sons, Ltd.
5.
Matrix‐covariant representation of high‐order configuration interaction and coupled cluster theories
Anatoliy V. Luzanov 《International journal of quantum chemistry》2008,108(4):671-695
We present the closed form of the reduced density matrices (RDMs) of arbitrary order for configuration interaction (CI) wave functions at any excitation level, up to the full CI. A special operator technique due to Bogoliubov is applied and extended. It focuses on constructions of matrix‐covariant expressions independent of the basis set used. The corresponding variational CI equations are given in an explicit form containing the matrices related to conventional excitation operators. A subsequent transformation of the latter to an irreducible form makes it possible to generate the matrix‐covariant representation for coupled cluster (CC) models. Here this transformation is performed for a simplified high‐order CC scheme somewhat reminiscent of the quadratic CI model. A generalized spin‐flip approximation closely related to high‐order CI and CC models is presented, stressing a possible inclusion of nondynamical and dynamical correlation effects for multiple bond breaking. A derivation of the full CI and simple CC models for systems involving effective three‐electron interactions is also given, thereby demonstrating the capability of the proposed method to deal with complicated many‐body problems. © 2007 Wiley Periodicals, Inc. Int J Quantum Chem, 2008
6.
In several scientific applications, data are generated from two or more diverse sources (views) with the goal of predicting an outcome of interest. Often it is the case that the outcome is not associated with any single view. However, the synergy of all measurements from each view may yield a more predictive classifier. For example, consider a drug discovery application in which individual molecules are described partially by several assay screens based on diverse profiles and partially by their chemical structural fingerprints. A common classification problem is to determine whether the molecule is associated with a particular disease. In this paper, a co‐training algorithm is developed to utilize data from diverse sources to predict the common class variable. Novel enhancements for variable importance, robustness to a mislabeled class variable, and a technique to handle unbalanced classes are applied to the motivating data set, highlighting that the approach attains strong performance and provides useful diagnostics for data analytic purposes. In addition, comparisons to a framework with data fusion using partial least squares (PLS) are also assessed on real data. An R package for performing the proposed approach is provided as Supporting information. Copyright © 2003 John Wiley & Sons, Ltd.
7.
Jingjing Xu Yuanshan Wang Xiangnan Xu Kian-Kai Cheng Daniel Raftery Jiyang Dong 《Molecules (Basel, Switzerland)》2021,26(19)
In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodologies have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data, which makes use of both global and local information of the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. These methods were evaluated from the perspectives of accuracy of imputation, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method is well-adapted to various cases of data missingness and the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and showed results comparable with the RF method. Furthermore, the NMF method is more robust and less susceptible to outliers as compared with the RF method. The proposed NMF-based scheme may serve as an alternative NA imputation method which may facilitate biological interpretations of metabolomics data.
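The general idea of NMF-based imputation can be illustrated with a minimal sketch. It is not the authors' method: it uses plain masked multiplicative updates (only observed cells contribute to the fit) and ignores the global/local weighting the abstract mentions. The function name `nmf_impute`, the rank, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill NaN cells of a non-negative matrix X using a masked NMF fit.

    Hypothetical helper: a generic weighted-NMF sketch, not the published algorithm.
    """
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)              # True where a value was observed
    X0 = np.where(mask, X, 0.0)      # missing cells set to 0; the mask excludes them from the fit
    n, p = X.shape
    W = rng.random((n, rank)) + eps  # non-negative starting factors
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        # Multiplicative updates restricted to observed cells
        W *= (X0 @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ X0) / (W.T @ (mask * (W @ H)) + eps)
    # Keep observed values, take the low-rank reconstruction elsewhere
    return np.where(mask, X, W @ H)
```

Observed values are returned unchanged; only the NaN cells are replaced by the low-rank reconstruction, which is non-negative by construction.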
8.
《Electrophoresis》2017,38(3-4):494-500
An easy‐to‐do paper‐based solid‐phase microextraction (p‐SPME) was developed for determination of 8‐hydroxy‐2’‐deoxyguanosine (8‐OHdG) in urine samples by CE‐LIF. A small piece of filter paper was used as a solid phase to extract 8‐OHdG from the urine sample. Its primary mechanism is based on the hydrogen‐bonding interaction between 8‐OHdG and cellulose molecules. The effects of the pH of the sample solution, extraction time, and temperature on the peak area of the analyte were investigated in order to obtain the optimal p‐SPME conditions. Compared with the untreated sample, p‐SPME significantly reduces the interference with the separation of 8‐OHdG by CE‐LIF. Meanwhile, p‐SPME provides a more than threefold concentration effect. The developed method was evaluated according to an FDA guideline for biological analysis. The precisions (RSD%, n = 5) of the peak area and migration time of the analyte at three different concentrations were within 3.02–5.82% and 0.92–1.58%, respectively. The limit of identification of the method is about 5 nM according to the significant difference between two sets of samples with and without spiking of the standard (Student's t‐test, p < 0.05). Good linearity was obtained in the range of 10–1000 nM (R² > 0.99) based on standard addition. The recoveries at three different concentrations were within 99.8–103.5%. The results of the real sample analysis are consistent with those reported in our previous paper (Electrophoresis 2014, 35, 1873–1879).
9.
Javier Palarea‐Albaladejo Josep Antoni Martín‐Fernández Ricardo Antonio Olea 《Journal of Chemometrics》2014,28(7):585-599
The bootstrap method is commonly used to estimate the distribution of estimators and their associated uncertainty when explicit analytic expressions are not available or are difficult to obtain. It has been widely applied in environmental and geochemical studies, where the data generated often represent parts of a whole, typically chemical concentrations. This kind of constrained data is generically called compositional data, and it requires specialised statistical methods to properly account for its particular covariance structure. On the other hand, it is not unusual in practice that those data contain labels denoting nondetects, that is, concentrations falling below detection limits. Nondetects impede the implementation of the bootstrap and represent an additional source of uncertainty that must be taken into account. In this work, a bootstrap scheme is devised that handles nondetects by adding an imputation step within the resampling process and conveniently propagates their associated uncertainty. In doing so, it considers the constrained relationships between chemical concentrations that originate from their compositional nature. Bootstrap estimates using a range of imputation methods, including new stochastic proposals, are compared across scenarios of increasing difficulty. They are formulated to meet compositional principles following the log‐ratio approach, and an adjustment is introduced in the multivariate case to deal with nonclosed samples. Results suggest that nondetect bootstrap based on model‐based imputation is generally preferable. A robust approach based on isometric log‐ratio transformations appears to be particularly suited in this context. Computer routines in the R statistical programming language are provided.
Copyright © 2014 John Wiley & Sons, Ltd.
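The core idea of imputing inside the resampling loop can be sketched as follows. This is only an illustration under stated assumptions, not the paper's routines (which are in R): nondetects are marked as NaN, each is replaced by a uniform random draw between 10% and 100% of its detection limit (a crude stand-in for the stochastic and model‐based imputations the abstract compares), and the statistic is the closed geometric mean of the composition. The names `closure` and `bootstrap_center` are hypothetical.

```python
import numpy as np

def closure(X):
    """Rescale each row so that its parts sum to 1."""
    return X / X.sum(axis=1, keepdims=True)

def bootstrap_center(X, dl, n_boot=200, seed=0):
    """Bootstrap the compositional centre with imputation inside each resample.

    X  : (n, D) concentrations, NaN marks a nondetect
    dl : (D,) detection limits, one per part
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    centers = np.empty((n_boot, D))
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)].copy()          # resample rows with replacement
        nd = np.isnan(Xb)
        draws = rng.uniform(0.1, 1.0, size=Xb.shape) * dl  # assumed stochastic value below the limit
        Xb[nd] = draws[nd]                                 # fresh imputation in every replicate
        g = np.exp(np.log(closure(Xb)).mean(axis=0))       # geometric mean of each part
        centers[b] = g / g.sum()                           # close the centre to sum 1
    return centers
```

Because the imputation is redrawn in every replicate, the spread of the returned centres reflects both sampling variability and the uncertainty of the nondetects; percentile intervals can then be read off with `np.percentile(centers, [2.5, 97.5], axis=0)`.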
10.
A new strategy was explored to generate pure gold cluster ions, Aun+/−, from gold films deposited on solid substrates via a matrix‐assisted laser ablation technique. The gold films deposited on SiO2‐particle‐assembled photonic crystals were shown to perform best compared with the films deposited on various glass slides. Dropped with a matrix of 2‐(4‐hydroxyphenylazo) benzoic acid and bombarded by a nitrogen pulse laser (355 nm), they could release a series of Aun+ with n more than 110 or Aun− with n more than 60 according to the data obtained by inline time‐of‐flight mass spectrometry. The gold‐deposited photonic crystal substrates could be stored at room temperature for at least 6 months. The method is hence steady and convenient in use. Copyright © 2012 John Wiley & Sons, Ltd.
11.
Muhammad Rafiq Muhammad Khalid Muhammad Nawaz Tahir Muhammad Umair Ahmad Muhammad Usman Khan Muhammad Moazzam Naseer Ataualpa Albert Carmo Braga Shabbir Muhammad Zahid Shafiq 《Applied Organometallic Chemistry》2019,33(11)
The present study reports joint experimental and computational investigations of two potent benzoimidazole‐based hydrazones with chemical formulae C23H18F2N4O (5a) and C25H22FN5O3 (5b). Both 5a and 5b were synthesized, and their crystal structures were resolved using SC‐XRD to assess bond lengths, bond angles, unit cells and space groups. The structures of 5a and 5b were chemically characterized using infrared (FT‐IR), UV–Visible, nuclear magnetic resonance (1H‐NMR and 13C‐NMR), EIMS and elemental analysis. DFT calculations at the M06‐2X/6‐31G(d,p) level of theory were performed to obtain optimized structures and to cross‐check the experimental findings. Overall, the DFT findings show excellent agreement with the experimental data, which confirms the purity of both compounds. FMO and NBO analyses, MEP surfaces and nonlinear optical (NLO) properties were explored at the same level of theory. UV–Vis analysis at the TDDFT/M06‐2X/6‐31G(d,p) level of theory showed that 5b is red shifted, with λmax 331.69 nm, compared with 5a, with λmax 240.25 nm. Global reactivity parameters estimated from the FMO energies indicated hardness values greater than the softness values for 5a and 5b. NBO analysis confirmed that non‐covalent interactions, hydrogen bonding and hyperconjugative interactions are the pivotal cause of the existence of 5a and 5b in the solid state. The NLO results for 5a and 5b were better than those of the standard molecule, which recommends these molecules for optoelectronic applications.
12.
The rapid development of new technologies for large‐scale analysis of genetic variation in the genomes of individuals and populations has presented statistical geneticists with a grand challenge to develop efficient methods for identifying the small proportion of all identified genetic polymorphisms that have effects on traits of interest. To address such a “large p small n” problem, we have developed a heteroscedastic effects model (HEM) that has been shown to be powerful in high‐throughput genetic analyses. Here, we describe how this whole‐genome model can also be utilized in chemometric analysis. As a proof of concept, we use HEM to predict analyte concentrations in silage using Fourier transform infrared spectroscopy signals. The results show that HEM often outperforms the classic methods and in addition to this presents a substantial computational advantage in the analyses of such high‐dimensional data. The results thus show the value of taking an interdisciplinary approach to chemometric analysis and indicate that large‐scale genomic models can be a promising new approach for chemometric analysis that deserve to be evaluated more by experts in the field. The software used for our analyses is freely available as an R package at http://cran.r-project.org/web/packages/bigRR/ . Copyright © 2014 John Wiley & Sons, Ltd.
13.
Timothy R. Croley Kevin D. White Jon Wong John H. Callahan Steven M. Musser Margaret Antler Vitaly Lashin Graham A. McGibbon 《Journal of separation science》2013,36(5):971-979
Increasing importation of food and the diversity of potential contaminants have necessitated more analytical testing of these foods. Historically, mass spectrometric methods for testing foods were confined to monitoring selected ions (SIM or MRM), achieving sensitivity by focusing on targeted ion signals. A limiting factor in this approach is that any contaminants not included on the target list are not typically identified and retrospective data mining is limited. A potential solution is to utilize high‐resolution MS to acquire accurate mass full‐scan data. Based on the instrumental resolution, these data can be correlated to the actual mass of a contaminant, which would allow for identification of both target compounds and compounds that are not on a target list (nontargets). The focus of this research was to develop software algorithms to provide rapid and accurate data processing of LC/MS data to identify both targeted and nontargeted analytes. Software from a commercial vendor was developed to process LC/MS data and the results were compared to an alternate, vendor‐supplied solution. The commercial software performed well and demonstrated the potential for a fully automated processing solution.
14.
Artem B. Mamonov Xin Zhang Daniel M. Zuckerman 《Journal of computational chemistry》2011,32(3):396-405
We adapted existing polymer growth strategies for equilibrium sampling of peptides described by modern atomistic forcefields with a simple uniform dielectric solvent. The main novel feature of our approach is the use of precalculated statistical libraries of molecular fragments. A molecule is sampled by combining fragment configurations (of single residues in this study), which are stored in the libraries. Ensembles generated from the independent libraries are reweighted to conform with the Boltzmann‐factor distribution of the forcefield describing the full molecule. In this way, high‐quality equilibrium sampling of small peptides (4–8 residues) typically requires less than one hour of single‐processor wallclock time and can be significantly faster than Langevin simulations. Furthermore, approximate, clash‐free ensembles can be generated for larger peptides (up to 32 residues in this study) in less than a minute of single‐processor computing. We discuss possible applications of our growth procedure to free energy calculation, fragment assembly protein‐structure prediction protocols, and to “multi‐resolution” sampling. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011
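The reweighting step described above can be illustrated with a toy sketch. It is not the authors' polymer‐growth code: it simply pairs configurations drawn from two independent fragment libraries and weights each pair by the Boltzmann factor of the energy terms that the libraries did not include. The function name `reweighted_average` and the callables `u_interaction` and `observable` are hypothetical.

```python
import numpy as np

def reweighted_average(lib_a, lib_b, u_interaction, observable, kT=1.0):
    """Estimate a full-system average from two independently sampled fragment libraries.

    lib_a, lib_b  : equal-length arrays of fragment configurations, each already
                    Boltzmann-distributed under its own fragment energy
    u_interaction : energy terms of the full system missing from the libraries
    observable    : quantity to average over the full-system ensemble
    """
    du = u_interaction(lib_a, lib_b)
    w = np.exp(-(du - du.min()) / kT)   # shift by the minimum for numerical stability
    w /= w.sum()                        # self-normalised importance weights
    ess = 1.0 / np.sum(w ** 2)          # effective sample size after reweighting
    return np.sum(w * observable(lib_a, lib_b)), ess
```

The effective sample size gives a rough sense of how much of the combined library survives the reweighting; a small value signals that the fragment distributions overlap poorly with the full‐molecule distribution.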
15.
Metabolic profiling for the identification of Huntington biomarkers by on‐line solid‐phase extraction capillary electrophoresis mass spectrometry combined with advanced data analysis tools
Laura Pont Fernando Benavente Joaquim Jaumot Romà Tauler Jordi Alberch Silvia Ginés José Barbosa Victoria Sanz‐Nebot 《Electrophoresis》2016,37(5-6):795-808
In this work, an untargeted metabolomic approach based on sensitive analysis by on‐line solid‐phase extraction capillary electrophoresis mass spectrometry (SPE‐CE‐MS) in combination with multivariate data analysis is proposed as an efficient method for the identification of biomarkers of Huntington's disease (HD) progression in plasma. For this purpose, plasma samples from wild‐type (wt) and HD (R6/1) mice of different ages (8, 12, and 30 weeks) were analyzed by C18‐SPE‐CE‐MS in order to obtain the characteristic electrophoretic profiles of low molecular mass compounds. Then, multivariate curve resolution alternating least squares (MCR‐ALS) was applied to the multiple full scan MS datasets. This strategy permitted the resolution of a large number of metabolites, characterized by their electrophoretic peaks and their corresponding mass spectra. A total of 29 compounds were relevant for discriminating between wt and HD plasma samples, as well as for following the HD progression. Intracellular signaling was found to be the most affected metabolic pathway in HD mice from 12 weeks after birth, when mice already showed motor coordination deficiencies and cognitive decline. This finding agreed with the atrophy and dysfunction of specific neurons, loss of several types of receptors, and changed expression of neurotransmitters.
16.
《Journal of separation science》2017,40(3):663-670
Radix Polygalae, the dried roots of Polygala tenuifolia and P. sibirica, is one of the most well‐known traditional Chinese medicinal plants. It is an important medicinal plant that has been used as a sedative and to improve memory for a number of years in most of Asia. However, which of the multiple constituents of Radix Polygalae are present in vivo remains unknown. In the current study, ultra high performance liquid chromatography coupled to quadrupole time‐of‐flight mass spectrometry and the MarkerLynx™ software combined with a multiple data processing approach were used to study the constituents in vitro and in vivo. A rapid and efficient method for the characterization of multiple constituents in the herbal medicine Radix Polygalae by ultra high performance liquid chromatography coupled to quadrupole time‐of‐flight mass spectrometry is described. In total, 35 compounds in Radix Polygalae and 13 compounds absorbed into blood were characterized. Of the 35 compounds in vitro, ten were reported for the first time. Of the 13 compounds in vivo, six were prototype components and seven were metabolites, which were also elucidated for the first time. This work narrowed the range of screening for potentially bioactive components and provided a basis for quality control and the study of the mechanism of action.
17.
Jiri Brabec Chao Yang Evgeny Epifanovsky Anna I. Krylov Esmond Ng 《Journal of computational chemistry》2016,37(12):1059-1067
We present an algorithm for reducing the computational work involved in coupled‐cluster (CC) calculations by sparsifying the amplitude correction within a CC amplitude update procedure. We provide a theoretical justification for this approach, which is based on the convergence theory of inexact Newton iterations. We demonstrate by numerical examples that, in the simplest case of the CCD equations, we can sparsify the amplitude correction by setting, on average, roughly 90% of the nonzero elements to zero without a major effect on the convergence of the inexact Newton iterations.
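The effect of sparsifying a correction inside an inexact Newton iteration can be tried on a toy problem. The sketch below is not the authors' CC implementation: it assumes a diagonal approximation to the Jacobian, keeps only the largest‐magnitude 10% of each correction, and solves a small, weakly coupled quadratic system invented for illustration. The names `sparsify`, `solve_sparsified`, and `toy_problem` are hypothetical.

```python
import numpy as np

def sparsify(step, keep_fraction):
    """Zero all but the largest-magnitude entries of a correction vector."""
    k = max(1, int(round(keep_fraction * step.size)))
    keep = np.argpartition(np.abs(step), -k)[-k:]   # indices of the k largest |step|
    out = np.zeros_like(step)
    out[keep] = step[keep]
    return out

def solve_sparsified(residual, diag, x0, keep_fraction=0.1, tol=1e-10, max_iter=5000):
    """Inexact Newton iteration with a diagonal Jacobian approximation (an assumption)."""
    x = x0.copy()
    for it in range(max_iter):
        r = residual(x)
        if np.linalg.norm(r) < tol:
            return x, it
        step = -r / diag                    # approximate Newton correction
        x += sparsify(step, keep_fraction)  # apply only the dominant entries
    return x, max_iter

def toy_problem(n=50, seed=0):
    """Weakly coupled quadratic system standing in for amplitude equations."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(1.0, 2.0, n)                 # dominant diagonal
    E = 0.002 * rng.standard_normal((n, n))      # weak off-diagonal coupling
    np.fill_diagonal(E, 0.0)
    b = rng.uniform(0.0, 1.0, n)
    residual = lambda x: d * x + E @ x + 0.05 * x ** 2 - b
    return residual, d, np.zeros(n)
```

Calling `solve_sparsified(*toy_problem())` returns the solution and the iteration count; rerunning with `keep_fraction=1.0` gives the unsparsified iteration for comparison. How well this carries over to real CCD equations depends on the structure of the amplitudes and is not something this toy can show.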
18.
Daphnane diterpenoids are mainly distributed in Thymelaeaceae and Euphorbiaceae and have various bioactivities. About 100 daphnane diterpenoids have been isolated from natural plants. In this review, we systematically summarize the 13C‐NMR data of daphnane diterpenoids isolated from natural plants over the past several decades and briefly discuss their biological activities and basic structure–activity relationships. Copyright © 2013 John Wiley & Sons, Ltd.
19.
Lucie Loukotková Eva Tesařová Zuzana Bosáková Pavel Repko Daniel W. Armstrong 《Journal of separation science》2010,33(9):1244-1254
Retention and enantioseparation behavior of ten 2,2′‐disubstituted or 2,3,2′‐trisubstituted 1,1′‐binaphthyls and 8,3′‐disubstituted 1,2′‐binaphthyls, which are used as catalysts in asymmetric synthesis, was investigated on eight chiral stationary phases (CSPs) based on β‐CD, polysaccharides (tris(3,5‐dimethylphenylcarbamate) cellulose or amylose CSPs) and new synthetic polymers (trans‐1,2‐diamino‐cyclohexane, trans‐1,2‐diphenylethylenediamine and trans‐9,10‐dihydro‐9,10‐ethanoanthracene‐(11S,12S)‐11,12‐dicarboxylic acid CSPs). Normal‐phase, reversed‐phase and polar‐organic separation modes were employed. The effect of the mobile phase composition was examined. The enantiomeric separation of binaphthyl derivatives, which possess quite similar structures, was possible in different enantioselective environments. The substituents and their positions on the binaphthyl skeleton affect their properties and, as a consequence, the separation system suitable for their enantioseparation. In general, the presence of ionizable groups on the binaphthyl skeleton, substitution with non‐identical groups and a chiral axis in the 1,2′ position had the greatest impact on the enantiomeric discrimination. The 8,3′‐disubstituted 1,2′‐binaphthyl derivatives were the most easily separated compounds in several separation systems. Of all the chiral stationary phases tested, cellulose‐based columns were shown to be the most convenient for enantioseparation of the studied analytes. However, the polymeric CSPs with their complementary behavior provided good enantioselective environments for some derivatives that could hardly be separated in any other chromatographic system.
20.
Karine Vuignier Szabolcs Fekete Pierre‐Alain Carrupt Jean‐Luc Veuthey Davy Guillarme 《Journal of separation science》2013,36(14):2231-2243
In the present study, three types of silica‐based monoliths, i.e. the first and second generations of commercial silica monolithic columns and a wide‐pore prototype monolith were compared for the analysis of large biomolecules. These molecules possess molecular weights between 1 and 66 kDa. The gradient kinetic performance of the first‐generation monolith was lower than that of the second generation, for large biomolecules (>14 kDa) but very close with smaller ones (1.3–5.8 kDa). In contrast, the wide‐pore prototype column was particularly attractive with proteins larger than 19 kDa (higher peak capacity). Among these three columns, the selectivity and retention remained quite similar but a possible larger number of accessible and charged residual silanols was noticed on the wide‐pore prototype material, which led to unpredicted small changes in selectivity and slightly broader peaks than expected. The peak shapes attained with the addition of 0.1% formic acid in the mobile phase remained acceptable for MS coupling, particularly for biomolecules of less than 6 kDa. It was found that one of the major issues with all of these silica‐based monoliths is the possible poor recovery of large biomolecules (principally with monoclonal antibody fragments of more than 25 kDa).