Similar literature
 20 similar documents found (search time: 78 ms)
1.
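Weighing the strengths and weaknesses of partial least squares and support vector machines, a natural gas consumption forecasting model based on a partial least squares support vector machine is proposed. First, partial least squares is used to determine new composite variables that influence natural gas consumption, and a support vector machine model is built with these composite variables as inputs and natural gas consumption as output in order to forecast consumption. The method is then compared, in terms of error, with multiple regression, partial least squares regression, and an ordinary support vector machine to verify its feasibility and correctness. The results show that this natural gas consumption forecasting model has high accuracy and practical value.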

2.
This paper presents an extension of the standard regression tree method to clustered data. Previous works extending tree methods to accommodate correlated data are mainly based on the multivariate repeated-measures approach. We propose a “mixed effects regression tree” method where the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.

3.
In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird [N.J. Horton, N.M. Laird, Maximum likelihood analysis of logistic regression models with incomplete covariate data and auxiliary information, Biometrics 57 (2001) 34–42] describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, and hence the efficiency of the regression estimators can be improved. We consider extending the method of [9] to the case of a multivariate logistic regression model for multiple correlated outcomes, and with missing covariates and completely observed auxiliary information. We demonstrate that in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.

4.
In this paper, we consider the ultra-high dimensional partially linear model, where the dimensionality p of the linear component is much larger than the sample size n, and p can be as large as an exponential of the sample size n. First, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model based on the profile technique used in semiparametric regression. Second, in order to carry out variable screening for the high-dimensional linear component, we propose a variable screening method called profile greedy forward regression (PGFR), which combines the greedy algorithm with the forward regression (FR) method. The proposed PGFR method not only considers the correlation between the covariates, but also identifies all relevant predictors consistently and possesses the screening consistency property under some regularity conditions. We further propose the BIC criterion to determine whether the selected model contains the true model with probability tending to one. Finally, some simulation studies and a real application are conducted to examine the finite sample performance of the proposed PGFR procedure.

5.
Several papers have already stressed the interest of latent root regression and its similarities to partial least squares regression. A new formulation of this method which makes it even simpler than the original method to set up a prediction model is discussed. Furthermore, it is shown how this method can be extended not only to the case where it is desired to predict several response variables from a set of predictors but also to the multiblock setting where the aim is to predict one or several data sets from several other data sets. The interest of the method is illustrated on the basis of a data set pertaining to epidemiology.

6.
To perform multiple regression, the least squares estimator is commonly used. However, this estimator is not robust to outliers. Therefore, robust methods such as S-estimation have been proposed. These estimators flag any observation with a large residual as an outlier and downweight it in the further procedure. However, a large residual may be caused by an outlier in only one single predictor variable, and downweighting the complete observation results in a loss of information. Therefore, we propose the shooting S-estimator, a regression estimator that is especially designed for situations where a large number of observations suffer from contamination in a small number of predictor variables. The shooting S-estimator combines the ideas of the coordinate descent algorithm with simple S-regression, which makes it robust against componentwise contamination, at the cost of failing the regression equivariance property.

7.
We present a method for graphically displaying regression data with Bernoulli responses. The method, which is based on the use of grayscale graphics to visualize contributions to a likelihood function, provides an analog of a scatterplot for logistic regression, as well as probit analysis. Furthermore, the method may be used in place of a traditional scatterplot in situations where such plots are often used.

8.
We consider regression models with multiple correlated responses for each design point. Under the null hypothesis, a linear regression is assumed. For the least-squares residuals of this linear regression, we establish the limit of the partial sums. This limit is a projection on a certain subspace of the reproducing kernel Hilbert space of a multivariate Brownian motion. Based on this limit, we propose a significance test of Kolmogorov-Smirnov type to test the null hypothesis and show that this result can be used to study a change-point problem in the case of linear profile data (panel data). We compare our proposed method, which does not rely on any distributional assumptions, with the likelihood ratio test in a simulation study.

9.
Recent sufficient dimension reduction methodologies in multivariate regression do not have direct application to a categorical predictor. For this, we define the multivariate central partial mean subspace and propose two methodologies to estimate it. The first method uses the ordinary least squares. Chi-squared distributed statistics for dimension tests are constructed, and an estimate of the target subspace is consistent and efficient. Moreover, the effects of continuous predictors can be tested without assuming any model. The second method extends Iterative Hessian Transformation to this context. For dimension estimation, permutation tests are used. Simulated and real data examples for illustrating various properties of the proposed methods are presented.

10.
Multivariate analysis of variance (MANOVA) extends the ideas and methods of univariate ANOVA in simple and straightforward ways. But the familiar graphical methods typically used for univariate ANOVA are inadequate for showing how measures in a multivariate response vary with each other, and how their means vary with explanatory factors. Similarly, the graphical methods commonly used in multiple regression are not widely available or used in multivariate multiple regression (MMRA). We describe a variety of graphical methods for multiple-response (MANOVA and MMRA) data aimed at understanding what is being tested in a multivariate test, and how factor/predictor effects are expressed across multiple response measures.

In particular, we describe and illustrate: (a) Data ellipses and biplots for multivariate data; (b) HE plots, showing the hypothesis and error covariance matrices for a given pair of responses, and a given effect; (c) HE plot matrices, showing all pairwise HE plots; and (d) reduced-rank analogs of HE plots, showing all observations, group means, and their relations to the response variables. All of these methods are implemented in a collection of easily used SAS macro programs.

11.
This paper is concerned with feature screening for ultrahigh-dimensional covariates under general varying-coefficient models. With the sparsity principle and based on the conditional distance correlation, we develop a new marginal feature screening procedure called CDC-SIS to select significant covariates and show that it possesses the sure screening property and ranking consistency property under some regularity conditions. The proposed procedure enjoys two appealing merits. First, the model we consider is more flexible than traditional varying-coefficient regression models, so the method can be used in a wider range of applications. Second, CDC-SIS can be used directly to deal with grouped predictor variables and multivariate responses. We assess the finite sample properties of the proposed procedure by Monte Carlo studies, and illustrate our method by an empirical analysis of a real data set. Compared with other similar works, our procedure yields better performance.

12.
In this paper, an attempt has been made to develop a pre-harvest forecast of sugarcane yield. The forecast is based on plant biometrical characteristics such as plant height, girth of cane, number of canes per plot and width of third leaf from the top. Some of these characteristics are correlated and hence the conventional practice of fitting a regression model using the least squares technique for estimating the parameters does not lead to satisfactory pre-harvest forecasts. Keeping in view the difficulties relating to the violation of the assumptions of normality, independence and homoscedasticity of classical multivariate regression analysis, the present study proposes an alternative approach, which is free from assumptions. It employs a goal programming formulation to estimate the pre-harvest yield of sugarcane on the basis of measurements on biometrical characters of the plant. In order to assess the quality of forecasts, the variance of residuals obtained from the proposed method has been compared with that obtained from the conventional regression analysis. The study reveals that there is no significant difference (P-value=0.43461) in the variances of the two residual series. Thus, without compromising the quality of forecast, the proposed alternative methodology can be adopted to estimate the sugarcane yield 3 months before harvest in situations where the assumptions of conventional regression analysis are violated.

13.
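Huang Chao (黄超), College Mathematics (《大学数学》), 2012, (1): 79-83
Stepwise regression is an important approach to selecting independent variables in multiple regression analysis. Using matrix elimination (sweep) transformations, it is proved that if one variable is entered at the first step and another at the second step, then no variable can be removed at the third step. Finally, stepwise regression analysis is programmed and carried out on the classic Hald data using the SAS statistical software.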
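As an illustration of the matrix elimination transformation (消去变换) that the abstract above relies on, here is a minimal sketch. It assumes the term refers to the standard sweep operator applied to the augmented cross-product matrix [X'X, X'y; y'X, y'y]; the function name `sweep` and the toy data are hypothetical and are not taken from the paper. Sweeping on a predictor's pivot corresponds to entering that predictor into the model: afterwards the last column holds the least squares coefficients and the bottom-right entry holds the residual sum of squares.

```python
def sweep(a, k):
    """Return a copy of the square matrix `a` swept on pivot k.

    Sign convention assumed here: the pivot becomes 1/d, the rest of the
    pivot row is divided by d, and the rest of the pivot column becomes -a[i][k]/d.
    """
    n = len(a)
    d = a[k][k]
    if abs(d) < 1e-12:
        raise ValueError("pivot is numerically zero; variable cannot be entered")
    b = [row[:] for row in a]
    b[k] = [v / d for v in a[k]]          # scale the pivot row
    for i in range(n):
        if i == k:
            continue
        f = a[i][k]
        # eliminate column k from row i
        b[i] = [a[i][j] - f * b[k][j] for j in range(n)]
        b[i][k] = -f / d
    b[k][k] = 1.0 / d
    return b


# Toy data generated as y = 2 + 3x, with an explicit intercept column.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]
rows = [[1.0, x, y] for x, y in zip(xs, ys)]   # columns: intercept, x, y

# Augmented cross-product matrix Z'Z with Z = [1, x, y].
cross = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

# Enter the intercept (pivot 0), then x (pivot 1).
swept = sweep(sweep(cross, 0), 1)
intercept, slope = swept[0][2], swept[1][2]    # coefficients in the last column
rss = swept[2][2]                              # residual sum of squares
print(intercept, slope, rss)
```

In a stepwise procedure the same matrix is updated in place one pivot at a time, so the drop in the residual sum of squares after each sweep can be used to decide which variable to enter next.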

14.
Definitive screening designs (DSDs) are a class of experimental designs that allow the estimation of linear, quadratic, and interaction effects with little experimental effort if there is effect sparsity. The number of experimental runs is twice the number of factors of interest plus one. Many industrial experiments involve nonnormal responses. Generalized linear models (GLMs) are a useful alternative for analyzing this kind of data. The analysis of GLMs is based on asymptotic theory, something very debatable, for example, in the case of the DSD with only 13 experimental runs. So far, analysis of DSDs considers a normal response. In this work, we show a five-step strategy that makes use of tools coming from the Bayesian approach to analyze this kind of experiment when the response is nonnormal. We consider the case of binomial, gamma, and Poisson responses without having to resort to asymptotic approximations. We use posterior odds that effects are active and posterior probability intervals for the effects and use them to evaluate the significance of the effects. We also combine the results of the Bayesian procedure with the lasso estimation procedure to enhance the scope of the method. Copyright © 2016 John Wiley & Sons, Ltd.
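The run-count rule stated in the abstract above (runs = 2 × factors + 1) can be checked with a short sketch. The function names are hypothetical; the rule is taken from the abstract as written, and the sketch does not construct the design itself.

```python
def dsd_runs(num_factors: int) -> int:
    """Number of runs for a DSD, using the rule quoted in the abstract."""
    if num_factors < 1:
        raise ValueError("need at least one factor")
    return 2 * num_factors + 1


def dsd_factors(num_runs: int) -> int:
    """Invert the rule: number of factors implied by a given run count."""
    if num_runs < 3 or num_runs % 2 == 0:
        raise ValueError("run count must be an odd number of at least 3")
    return (num_runs - 1) // 2


# The 13-run design mentioned in the abstract corresponds to 6 factors.
print(dsd_factors(13), dsd_runs(6))
```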

15.
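A prediction model based on errors-in-variables and its application   Total citations: 1 (self-citations: 0, citations by others: 1)
Prediction is one of the main practical applications of statistics. Multiple linear regression prediction is a good method that is widely used in many practical fields, but its limitations and shortcomings are also evident. This paper views data from a new perspective: the observations of all variables are taken to contain errors, and explanatory variables that were carefully chosen for inclusion should not be deleted. To this end, the paper proposes a new multivariate prediction method, multivariate linear EIV prediction. An application of the new prediction model to a real example is also considered and compared with multiple regression prediction in terms of relative deviation, which demonstrates the advantages and better prediction accuracy of multivariate linear EIV prediction.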

16.
This article proposes a Bayesian approach for the sparse group selection problem in the regression model. In this problem, the variables are partitioned into different groups. It is assumed that only a small number of groups are active for explaining the response variable, and it is further assumed that within each active group only a small number of variables are active. We adopt a Bayesian hierarchical formulation, where each candidate group is associated with a binary variable indicating whether the group is active or not. Within each group, each candidate variable is also associated with a binary indicator. Thus, the sparse group selection problem can be solved by sampling from the posterior distribution of the two layers of indicator variables. We adopt a group-wise Gibbs sampler for posterior sampling. We demonstrate the proposed method by simulation studies as well as real examples. The simulation results show that the proposed method performs better than the sparse group Lasso in terms of selecting the active groups as well as identifying the active variables within the selected groups. Supplementary materials for this article are available online.

17.
The major purpose of this paper is to evaluate the practical use of statistical techniques in both the generalization or analysis of simulation results, and the design of simulation experiments. This problem is investigated with the help of a real-life system, namely the container terminus of ECT in Rotterdam. This system is modeled by a simulation program. The relationship between the simulation response and its input variables is modeled by a linear regression model: metamodel or auxiliary model. The paper summarizes regression analysis including generalized least squares which might be used for simulation responses with non-constant variances. The validity of the postulated regression metamodel is tested statistically: F- and t-statistics. The selection of the situations to be simulated is done through experimental design methodology, permitting both quantitative and qualitative factors. The statistical techniques apply not only to simulation but also to real-life experiments.

18.
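Panel data frequently arise in many research fields, such as longitudinal follow-up studies. In many cases, the longitudinal response variable is related to both the observation times and the censoring time. Under biased sampling and in the presence of such dependence, this paper proposes a joint modeling approach that uses an unobservable latent variable to characterize the dependence of the longitudinal response on the observation times and the censoring time. Estimating equations for the regression parameters in the model and the asymptotic properties of the estimators are obtained. Numerical simulations confirm that the estimators also perform well in small samples, and the estimation method is applied to the analysis of a real bladder cancer data set.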

19.
The confidence prediction of the mean value of multiple responses in a linear multivariate normal regression model is considered. In order to solve it, confidence intervals of the mean value of multiple responses and its predicted value are obtained. They are numerically modeled and analyzed in comparison with known analogues for regression and individual response.

20.
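Program design for ridge regression analysis in SAS 6.11 and an example analysis   Total citations: 9 (self-citations: 0, citations by others: 9)
Ridge regression analysis can be used to solve regression problems in which multicollinearity exists among the independent variables. This paper gives a program for carrying out ridge regression analysis in SAS 6.11 and later versions, and uses a concrete example to illustrate how ridge regression is performed.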
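To make concrete why ridge regression helps under multicollinearity, here is a minimal sketch in Python rather than SAS; it is not the program from the paper. It assumes the usual ridge estimator, which solves (X'X + λI)β = X'y on centered data without an intercept; the names `solve` and `ridge`, the penalty value, and the toy data are hypothetical. With two perfectly collinear predictors the ordinary least squares system (λ = 0) is singular, while any λ > 0 yields a unique solution.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular system")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Ridge coefficients: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Two identical (perfectly collinear) centered predictors.
X = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
y = [-2.0, 0.0, 2.0]

print(ridge(X, y, 1.0))      # the penalty makes the system solvable
try:
    ridge(X, y, 0.0)         # ordinary least squares: X'X is singular
except ValueError as err:
    print("OLS failed:", err)
```

In practice the penalty λ is usually chosen by inspecting a ridge trace or by cross-validation; the value 1.0 above is arbitrary.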
