期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A selective overview of feature screening for ultrahigh-dimensional data

《中国科学数学(英文版)》2015,(10)

High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of highdimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that only a small number of predictors contribute to the response, is frequently adopted and deemed useful in the analysis of high-dimensional data.Following this general principle, a large number of variable selection approaches via penalized least squares or likelihood have been developed in the recent literature to estimate a sparse model and select significant variables simultaneously. While the penalized variable selection methods have been successfully applied in many highdimensional analyses, modern applications in areas such as genomics and proteomics push the dimensionality of data to an even larger scale, where the dimension of data may grow exponentially with the sample size. This has been called ultrahigh-dimensional data in the literature. This work aims to present a selective overview of feature screening procedures for ultrahigh-dimensional data. We focus on insights into how to construct marginal utilities for feature screening on specific models and motivation for the need of model-free feature screening procedures. 相似文献

2.

天然气消费量的偏最小二乘支持向量机预测

谭水莲钟忠社马村尹勋刚胡军浩《数学建模及其应用》2014,3(1):35-40

结合偏最小二乘法和支持向量机的优缺点,提出基于偏最小二乘支持向量机的天然气消费量预测模型。首先,利用偏最小二乘法确定影响天然气消费量的新综合变量,建立以新综合变量为输入,天然气消费量为输出的支持向量机模型,对天然气消费量进行了预测;然后,与多元回归、偏最小二乘回归、普通支持向量机做误差检验比较,验证该方法的可行性与正确性。结果表明,此天然气消费量预测模型具有较高的精确度和应用价值。相似文献

3.

Rapidly computing robust minimum capacity s-t cuts: a case study in solving a sequence of maximum flow problems 总被引：1，自引：0，他引：1

Douglas S. Altner Özlem Ergun 《Annals of Operations Research》2011,190(1):3-15

Near infrared (NIR) spectroscopy is a rapid, non-destructive technology to predict a variety of wood properties and provides great opportunities to optimize manufacturing processes through the realization of in-line assessment of forest products. In this paper, a novel multivariate regression procedure, the hybrid model of principal component regression (PCR) and partial least squares (PLS), is proposed to develop more accurate prediction models for high-dimensional NIR spectral data. To integrate the merits of PCR and PLS, both principal components defined in PCR and latent variables in PLS are utilized in hybrid models by a common iterative procedure under the constraint that they should keep orthogonal to each other. In addition, we propose the modified sequential forward floating search method, originated in feature selection for classification problems, in order to overcome difficulties of searching the vast number of possible hybrid models. The effectiveness and efficiency of hybrid models are substantiated by experiments with three real-life datasets of forest products. The proposed hybrid approach can be applied in a wide range of applications with high-dimensional spectral data. 相似文献

4.

Local linear regression for data with AR errors

Runze Li Yan Li 《应用数学学报(英文版)》2009,25(3):427-444

In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set. 相似文献

5.

High-dimensional variable screening and bias in subsequent inference,with an empirical comparison

Peter Bühlmann Jacopo Mandozzi 《Computational Statistics》2014,29(3-4):407-430

We review variable selection and variable screening in high-dimensional linear models. Thereby, a major focus is an empirical comparison of various estimation methods with respect to true and false positive selection rates based on 128 different sparse scenarios from semi-real data (real data covariables but synthetic regression coefficients and noise). Furthermore, we present some theoretical bounds for the bias in subsequent least squares estimation, using the selected variables from the first stage, which have direct implications for construction of p-values for regression coefficients. 相似文献

6.

Bayesian Variable Selection for Median Regression

HU Danqing GU Yongquan ZHAO Weihua 《应用概率统计》2019,35(6):594-610

When the data has heavy tail feature or contains outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Based on Bayesian inference method, we study the Bayesian variable selection problem for median linear models. The Bayesian estimation method is proposed by using Bayesian model selection theory and Bayesian estimation method through selecting the Spike and Slab prior for regression coefficients, and the effective posterior Gibbs sampling procedure is also given. Extensive numerical simulations and Boston house price data analysis are used to illustrate the effectiveness of the proposed method. 相似文献

7.

Biomarker discovery: classification using pooled samples

Anna Telaar Dirk Repsilber Gerd Nürnberg 《Computational Statistics》2013,28(1):67-106

RNA-sample pooling is sometimes inevitable, but should be avoided in classification tasks like biomarker studies. Our simulation framework investigates a two-class classification study based on gene expression profiles to point out how strong the outcomes of single sample designs differ to those of pooling designs. The results show how the effects of pooling depend on pool size, discriminating pattern, number of informative features and the statistical learning method used (support vector machines with linear and radial kernel, random forest (RF), linear discriminant analysis, powered partial least squares discriminant analysis (PPLS-DA) and partial least squares discriminant analysis (PLS-DA)). As a measure for the pooling effect, we consider prediction error (PE) and the coincidence of important feature sets for classification based on PLS-DA, PPLS-DA and RF. In general, PPLS-DA and PLS-DA show constant PE with increasing pool size and low PE for patterns for which the convex hull of one class is not a cover of the other class. The coincidence of important feature sets is larger for PLS-DA and PPLS-DA as it is for RF. RF shows the best results for patterns in which the convex hull of one class is a cover of the other class, but these depend strongly on the pool size. We complete the PE results with experimental data which we pool artificially. The PE of PPLS-DA and PLS-DA are again least influenced by pooling and are low. Additionally, we show under which assumption the PLS-DA loading weights, as a measure for importance of features regarding classification, are equal for the different designs. 相似文献

8.

A coordinate gradient descent method for <Emphasis Type="Italic">ℓ</Emphasis><Subscript>1</Subscript>-regularized convex minimization

Sangwoon Yun Kim-Chuan Toh 《Computational Optimization and Applications》2011,48(2):273-307

In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as a structured nonsmooth optimization problems, i.e., the problem of minimizing ℓ ₁-regularized linear least squares problems. In this paper, we propose a block coordinate gradient descent method (abbreviated as CGD) to solve the more general ℓ ₁-regularized convex minimization problems, i.e., the problem of minimizing an ℓ ₁-regularized convex smooth function. We establish a Q-linear convergence rate for our method when the coordinate block is chosen by a Gauss-Southwell-type rule to ensure sufficient descent. We propose efficient implementations of the CGD method and report numerical results for solving large-scale ℓ ₁-regularized linear least squares problems arising in compressed sensing and image deconvolution as well as large-scale ℓ ₁-regularized logistic regression problems for feature selection in data classification. Comparison with several state-of-the-art algorithms specifically designed for solving large-scale ℓ ₁-regularized linear least squares or logistic regression problems suggests that an efficiently implemented CGD method may outperform these algorithms despite the fact that the CGD method is not specifically designed just to solve these special classes of problems. 相似文献

9.

��λ��ع�ı�Ҷ˹��ѡ�񷽷�

�� Ȫ ��Ϊ�� 《应用概率统计》2006,35(6):594-610

??When the data has heavy tail feature or contains outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Based on Bayesian inference method, we study the Bayesian variable selection problem for median linear models. The Bayesian estimation method is proposed by using Bayesian model selection theory and Bayesian estimation method through selecting the Spike and Slab prior for regression coefficients, and the effective posterior Gibbs sampling procedure is also given. Extensive numerical simulations and Boston house price data analysis are used to illustrate the effectiveness of the proposed method. 相似文献

10.

Semiparametric Estimation and Selection for Nonstationary Spatial Covariance Functions

《Journal of computational and graphical statistics》2013,22(1):117-139

We propose a method for estimating nonstationary spatial covariance functions by representing a spatial process as a linear combination of some local basis functions with uncorrelated random coefficients and some stationary processes, based on spatial data sampled in space with repeated measurements. By incorporating a large collection of local basis functions with various scales at various locations and stationary processes with various degrees of smoothness, the model is flexible enough to represent a wide variety of nonstationary spatial features. The covariance estimation and model selection are formulated as a regression problem with the sample covariances as the response and the covariances corresponding to the local basis functions and the stationary processes as the predictors. A constrained least squares approach is applied to select appropriate basis functions and stationary processes as well as estimate parameters simultaneously. In addition, a constrained generalized least squares approach is proposed to further account for the dependencies among the response variables. A simulation experiment shows that our method performs well in both covariance function estimation and spatial prediction. The methodology is applied to a U.S. precipitation dataset for illustration. Supplemental materials relating to the application are available online. 相似文献

11.

Bayesian l0‐regularized least squares

Nicholas G. Polson Lei Sun 《商业与工业应用随机模型》2019,35(3):717-731

Bayesian l₀‐regularized least squares is a variable selection technique for high‐dimensional predictors. The challenge is optimizing a nonconvex objective function via search over model space consisting of all possible predictor combinations. Spike‐and‐slab (aka Bernoulli‐Gaussian) priors are the gold standard for Bayesian variable selection, with a caveat of computational speed and scalability. Single best replacement (SBR) provides a fast scalable alternative. We provide a link between Bayesian regularization and proximal updating, which provides an equivalence between finding a posterior mode and a posterior mean with a different regularization prior. This allows us to use SBR to find the spike‐and‐slab estimator. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike‐and‐slab priors. Finally, we conclude with directions for future research. 相似文献

12.

Data-driven recursive least squares methods for non-affined nonlinear discrete-time systems

《Applied Mathematical Modelling》2020

Aiming at identifying nonlinear systems, one of the most challenging problems in system identification, a class of data-driven recursive least squares algorithms are presented in this work. First, a full form dynamic linearization based linear data model for nonlinear systems is derived. Consequently, a full form dynamic linearization-based data-driven recursive least squares identification method for estimating the unknown parameter of the obtained linear data model is proposed along with convergence analysis and prediction of the outputs subject to stochastic noises. Furthermore, a partial form dynamic linearization-based data-driven recursive least squares identification algorithm is also developed as a special case of the full form dynamic linearization based algorithm. The proposed two identification algorithms for the nonlinear nonaffine discrete-time systems are flexible in applications without relying on any explicit mechanism model information of the systems. Additionally, the number of the parameters in the obtained linear data model can be tuned flexibly to reduce computation complexity. The validity of the two identification algorithms is verified by rigorous theoretical analysis and simulation studies. 相似文献

13.

用Prony方法计算信号数据参数的误差分析 总被引：2，自引：0，他引：2

魏木生《计算数学》1995,17(4):349-359

对于用Prony方法计算信号数据参数的误差分析,指出了提高计算精度的方法和影响精度的因素.我们分析了秩亏LS和TLS方法和误差界,指出用秩亏的Prony方法可大大提高计算精度.同时也指出了T的变化和η的变化对结果精度的影响,以及精确计算重奇点的方法, 相似文献

14.

剔除相关性的最小二乘研究

郑彦玲《数理统计与管理》2009,28(5)

本文提出了一种新的回归模型,剔除相关性的最小二乘,它有效的克服了变量间的相关性,兼顾到变量的筛选。并与最小二乘、向后删除变量法、偏最小二乘比较分析。发现剔除相关性的最小二乘能很好的处理自变量间多重相关性,对变量进行有效的筛选,克服了回归系数反常的现象。相似文献

15.

生长曲线模型的变量选择

下载免费PDF全文

高采文朱晓琳曾林蕊《应用概率统计》2014,30(2):213-222

生长曲线模型是一个典型的多元线性模型, 在现代统计学上占有重要地位. 文章首先基于Potthoff-Roy变换后的生长曲线模型, 采用自适应LASSO为惩罚函数给出了参数矩阵的惩罚最小二乘估计, 实现了变量的选择. 其次, 基于局部渐近二次估计, 对生长曲线模型的惩罚最小二乘估计给出了统一的近似估计表达式. 接着, 讨论了经过Potthoff-Roy变换后模型的惩罚最小二乘估计, 证明了自适应LASSO具有Oracle性质. 最后对几种变量选择方法进行了数据模拟. 结果表明自适应LASSO效果比较好. 另外, 综合考虑, Potthoff-Roy变换优于拉直变换. 相似文献

16.

Iterative weighted partial spline least squares estimation in semiparametric modeling of longitudinal data

孙孝前尤进红《中国科学A辑(英文版)》2003,46(5):724-735

In this paper we consider the estimating problem of a semiparametric regression modelling whenthe data are longitudinal.An iterative weighted partial spline least squares estimator(IWPSLSE)for the para-metric component is proposed which is more efficient than the weighted partial spline least squares estimator(WPSLSE)with weights constructed by using the within-group partial spline least squares residuals in the sense 相似文献

17.

Practical alternating least squares for tensor ring decomposition

Yajie Yu Hanyu Li 《Numerical Linear Algebra with Applications》2024,31(3):e2542

Tensor ring (TR) decomposition has been widely applied as an effective approach in a variety of applications to discover the hidden low-rank patterns in multidimensional and higher-order data. A well-known method for TR decomposition is the alternating least squares (ALS). However, solving the ALS subproblems often suffers from high cost issue, especially for large-scale tensors. In this paper, we provide two strategies to tackle this issue and design three ALS-based algorithms. Specifically, the first strategy is used to simplify the calculation of the coefficient matrices of the normal equations for the ALS subproblems, which takes full advantage of the structure of the coefficient matrices of the subproblems and hence makes the corresponding algorithm perform much better than the regular ALS method in terms of computing time. The second strategy is to stabilize the ALS subproblems by QR factorizations on TR-cores, and hence the corresponding algorithms are more numerically stable compared with our first algorithm. Extensive numerical experiments on synthetic and real data are given to illustrate and confirm the above results. In addition, we also present the complexity analyses of the proposed algorithms. 相似文献

18.

基于灰色关联分析与PLS-LSSVM的航材备件需求预测模型

李文强刘颖《数学的实践与认识》2021,(7):54-60

航材备件是保障航空装备日常训练和作战正常使用的重要影响因素,针对部分航材备件样本数据量少,影响因素多且复杂多变,预测结果与装备系统完好性要求偏差较大等问题.建立基于灰色关联分析(GRA)与偏最小二乘(PLS)及最小二乘向量机(LSSVM)相结合的航材备件预测模型,采集某无人机航材备件数据,通过对统计数据进行灰色关联分析... 相似文献

19.

Regularization tools and robust optimization for ill-conditioned least squares problem: A computational comparison

Maziar Salahi 《Applied mathematics and computation》2011,217(20):7985-7990

Least squares problems arise frequently in many disciplines such as image restorations. In these areas, for the given least squares problem, usually the coefficient matrix is ill-conditioned. Thus if the problem data are available with certain error, then after solving least squares problem with classical approaches we might end up with a meaningless solution. Tikhonov regularization, is one of the most widely used approaches to deal with such situations. In this paper, first we briefly describe these approaches, then the robust optimization framework which includes the errors in problem data is presented. Finally, our computational experiments on several ill-conditioned standard test problems using the regularization tools, a Matlab package for least squares problem, and the robust optimization framework, show that the latter approach may be the right choice. 相似文献

20.

PLS regression: A directional signal-to-noise ratio approach

Pierre Druilhet Alain Mom 《Journal of multivariate analysis》2006,97(6):1313-1329

We present a new approach to univariate partial least squares regression (PLSR) based on directional signal-to-noise ratios (SNRs). We show how PLSR, unlike principal components regression, takes into account the actual value and not only the variance of the ordinary least squares (OLS) estimator. We find an orthogonal sequence of directions associated with decreasing SNR. Then, we state partial least squares estimators as least squares estimators constrained to be null on the last directions. We also give another procedure that shows how PLSR rebuilds the OLS estimator iteratively by seeking at each step the direction with the largest difference of signals over the noise. The latter approach does not involve any arbitrary scale or orthogonality constraints. 相似文献