Similar Documents
20 similar documents found (search time: 78 ms)
1.
Multiblock component methods are applied to data sets in which several blocks of variables are measured on the same set of observations, with the goal of analyzing the relationships between these blocks of variables. In this article, we focus on multiblock component methods that integrate the information found in several blocks of explanatory variables in order to describe and explain one set of dependent variables. In what follows, multiblock PLS and multiblock redundancy analysis are chosen as particular cases of multiblock component methods in which one set of variables is explained by a set of predictor variables organized into blocks. Because these multiblock techniques assume that the observations come from a homogeneous population, they will provide suboptimal results when the observations actually come from different populations. A strategy to mitigate this problem, presented in this article, is to use a technique such as clusterwise regression to identify homogeneous clusters of observations. This approach creates two new methods that provide clusters with their own sets of regression coefficients. The combination of clustering and regression improves the overall quality of the prediction and facilitates interpretation. In addition, the minimization of a well-defined criterion, by means of a sequential algorithm, ensures that the algorithm converges monotonically. Finally, the proposed method is distribution-free and can be used when the explanatory variables outnumber the observations within clusters. The proposed clusterwise multiblock methods are illustrated with a simulation study and a (simulated) example from marketing.
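A minimal sketch of the clusterwise regression alternation described above (plain least squares stands in for the multiblock components; the cluster count K, the synthetic data, and the stopping rule are illustrative assumptions, and empty clusters are not handled):

```python
# Clusterwise regression sketch: alternate between (1) refitting one
# least-squares model per cluster and (2) reassigning each observation to
# the cluster whose model predicts it best. The total squared error can
# only decrease at each step, so the loop converges monotonically.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([[2.0, 0.0, -1.0], [-1.0, 3.0, 0.5]])
z_true = rng.integers(0, 2, size=200)
y = np.einsum("ij,ij->i", X, beta_true[z_true]) + 0.1 * rng.normal(size=200)

K = 2
z = rng.integers(0, K, size=200)              # random initial partition
for _ in range(50):
    betas = [np.linalg.lstsq(X[z == k], y[z == k], rcond=None)[0]
             for k in range(K)]               # per-cluster regressions
    resid = np.stack([(y - X @ b) ** 2 for b in betas], axis=1)
    z_new = resid.argmin(axis=1)              # reassign to best-fitting cluster
    if np.array_equal(z, z_new):
        break
    z = z_new
print("within-cluster SSE:", resid.min(axis=1).sum())
```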

2.
This paper presents an extension of the standard regression tree method to clustered data. Previous works extending tree methods to accommodate correlated data are mainly based on the multivariate repeated-measures approach. We propose a “mixed effects regression tree” method in which the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.
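A sketch of the EM-style alternation this method rests on, with one random intercept per cluster (sklearn's DecisionTreeRegressor stands in for the tree learner; the variance components are held fixed rather than re-estimated, so this illustrates the loop, not the full algorithm):

```python
# Mixed-effects regression tree sketch: alternate an E-step-like BLUP
# update of the cluster intercepts b_g with an M-step refit of the tree
# on y minus the current random effects.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
g = np.repeat(np.arange(20), 15)              # 20 clusters, 15 obs each
X = rng.normal(size=(300, 4))
b_true = rng.normal(scale=2.0, size=20)
y = np.where(X[:, 0] > 0, 3.0, -3.0) + b_true[g] + rng.normal(size=300)

b = np.zeros(20)
sigma2, tau2 = 1.0, 4.0                       # residual / random-effect variances (fixed)
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
for _ in range(20):
    tree.fit(X, y - b[g])                     # tree on data minus random part
    r = y - tree.predict(X)
    for j in range(20):                       # BLUP-style shrinkage per cluster
        rj = r[g == j]
        b[j] = rj.sum() / (len(rj) + sigma2 / tau2)
print("first five estimated random intercepts:", np.round(b[:5], 2))
```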

3.
A numerical comparison is made between three integration methods for semi-discrete parabolic partial differential equations in two space variables with a mixed derivative. Linear as well as non-linear equations are considered. The integration methods are the well-known one-step line hopscotch method, a four-step line hopscotch method, and a stabilized, explicit Runge-Kutta method.

4.
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as CART, have problems capturing linear main effects of continuous predictors. To overcome these drawbacks, the regression trunk model has been proposed: a multiple regression model with main effects and a parsimonious amount of higher order interaction effects. The interaction effects can be represented by a small tree: a regression trunk. This article proposes a new algorithm—Simultaneous Threshold Interaction Modeling Algorithm (STIMA)—to estimate a regression trunk model that is more general and more efficient than the initial one (RTA) and is implemented in the R-package stima. Results from a simulation study show that the performance of STIMA is satisfactory for sample sizes of 200 or higher. For sample sizes of 300 or higher, the 0.50 SE rule is the best pruning rule for a regression trunk in terms of power and Type I error. For sample sizes of 200, the 0.80 SE rule is recommended. Results from a comparative study of eight regression methods applied to ten benchmark datasets suggest that STIMA and GUIDE are the best performers in terms of cross-validated prediction error. STIMA appeared to be the best method for datasets containing many categorical variables. The characteristics of a regression trunk model are illustrated using the Boston house price dataset.

Supplemental materials for this article, including the R-package stima, are available online.
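The decomposition behind a regression trunk — linear main effects plus a small tree for higher order interactions — can be sketched as follows (a stagewise residual fit for illustration only; STIMA estimates both components simultaneously, which this sketch does not):

```python
# Regression-trunk flavoured sketch: fit linear main effects first, then a
# shallow tree ("trunk") on the residuals to pick up interaction effects.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = (1.5 * X[:, 0] - 2.0 * X[:, 1]                  # linear main effects
     + 4.0 * (X[:, 0] > 0) * (X[:, 2] > 0)          # interaction effect
     + rng.normal(size=500))

lin = LinearRegression().fit(X, y)
trunk = DecisionTreeRegressor(max_depth=2).fit(X, y - lin.predict(X))
pred = lin.predict(X) + trunk.predict(X)
print("R^2 linear only: %.2f, with trunk: %.2f"
      % (lin.score(X, y), 1 - np.var(y - pred) / np.var(y)))
```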

5.
In this paper it is argued that all multivariate estimation methods, such as OLS regression, simultaneous linear equations systems and, more widely, what are known as LISREL methods, have merit as geometric approximation methods, even if the observations are not drawn from a multivariate normal parent distribution and consequently cannot be viewed as ML estimators. It is shown that for large samples the asymptotic distribution of any estimator that is a totally differentiable function of the covariances may be assessed by the δ method. Finally, we stress that the design of the sample and a priori knowledge about the parent distribution may be incorporated to obtain more specific results. It turns out that some fairly traditional assumptions, such as assuming some variables to be non-random and fixed over repeated samples, or the existence of a parent normal distribution, may have dramatic effects on the assessment of standard deviations and confidence bounds if such assumptions are not realistic. The method elaborated by us does not make use of such assumptions.
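The δ method invoked here approximates the variance of a smooth function of an estimator from the estimator's own covariance; a minimal numeric sketch (the function g, the estimate, and its covariance are illustrative assumptions):

```python
# Delta method sketch: Var(g(theta_hat)) ~= grad' * Cov(theta_hat) * grad,
# with the gradient of g taken by central finite differences at the estimate.
import numpy as np

def delta_method_var(g, theta_hat, cov, eps=1e-6):
    k = len(theta_hat)
    grad = np.zeros(k)
    for i in range(k):
        e = np.zeros(k)
        e[i] = eps
        grad[i] = (g(theta_hat + e) - g(theta_hat - e)) / (2 * eps)
    return grad @ cov @ grad

theta_hat = np.array([2.0, 0.5])              # hypothetical parameter estimates
cov = np.array([[0.04, 0.01],                 # hypothetical covariance of theta_hat
                [0.01, 0.02]])
g = lambda t: t[0] / t[1]                     # a smooth function of the parameters
print("delta-method variance of g:", delta_method_var(g, theta_hat, cov))
```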

6.
Multivariate cubic polynomial optimization problems, as a special case of general polynomial optimization, have many practical applications in the real world. In this paper, some necessary local optimality conditions and some necessary global optimality conditions for cubic polynomial optimization problems with mixed variables are established. Some local optimization methods, including weakly local optimization methods for general problems with mixed variables and strongly local optimization methods for cubic polynomial optimization problems with mixed variables, are then proposed by exploiting these necessary local and global optimality conditions. A global optimization method is proposed for cubic polynomial optimization problems by combining these local optimization methods with some auxiliary functions. Numerical examples are also given to illustrate that these approaches are very efficient.

7.
It is increasingly common to be faced with longitudinal or multi-level data sets that have large numbers of predictors and/or a large sample size. Current methods of fitting and inference for mixed effects models tend to perform poorly in such settings. When there are many variables, it is appealing to allow uncertainty in subset selection and to obtain a sparse characterization of the data. Bayesian methods are available to address these goals using Markov chain Monte Carlo (MCMC), but MCMC is very computationally expensive and can be infeasible in large-p and/or large-n problems. As a fast approximate Bayes solution, we recommend a novel approximation to the posterior relying on variational methods. Variational methods are used to approximate the posterior of the parameters in a decomposition of the variance components, with priors chosen to obtain a sparse solution that allows selection of random effects. The method is evaluated through a simulation study and applied to an epidemiological study.

8.
Hybrid methodology involving differential equations modeling and statistical regression is developed in order to test basic ideas in asset price dynamics. In particular, the method provides a mechanism for testing the relative importance of the price trend compared with valuation. The significance of yearly highs in prices can also be understood through this procedure. A large data set of 52 closed-end funds comprising about 61,500 data points is used with the mixed effects model in SPlus. The model suggests that the role of the trend is as significant as that of valuation. Upon determination of the coefficients, one has a model that can be used for short term forecasts of asset prices. The model incorporates the finiteness of assets and the importance of “liquidity”, or “excess cash”. The statistics utilize data on the number of shares and the national money supply. The methodology can easily be extended to other behavioral effects.

9.
This paper provides a new methodology to solve bilinear, non-convex mathematical programming problems by a suitable transformation of variables. Schur's decomposition and special ordered sets (SOS) type 2 constraints are used, resulting in a mixed integer linear or quadratic program in the two applications shown. While Beale, Tomlin and others developed the use of SOS type 2 variables to handle non-convexities, our approach is novel in two respects. First, the use of Schur's decomposition as an integral part of the approximation step is new and leads to a numerically viable method to separate the variables. Second, the combination of our approach for handling bilinear side constraints in a complementarity or equilibrium problem setting is also new and opens the way to many interesting and realistic modifications to such models. We contrast our approach with other methods for solving bilinear problems, also known as indefinite quadratic programs. From a practical point of view our methodology is helpful since no specialized procedures need to be created, so that existing solvers can be used. The approach is illustrated with two engineering examples, and the mathematical analysis appears in the Appendices.
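The separation step can be illustrated in its simplest form: a single bilinear term is rewritten as a difference of univariate squares, each of which admits an SOS type 2 piecewise-linear approximation (a sketch of the principle only, not the paper's full Schur-based construction):

```latex
% A bilinear term separated into univariate squares:
xy \;=\; \tfrac{1}{4}\left[(x+y)^2 - (x-y)^2\right],
% and each square q(u) = u^2 approximated on breakpoints u_1 < \dots < u_m by
u = \sum_{j=1}^{m} \lambda_j u_j, \qquad
q \approx \sum_{j=1}^{m} \lambda_j u_j^2, \qquad
\sum_{j=1}^{m} \lambda_j = 1, \quad \{\lambda_j\}\ \text{an SOS type 2 set.}
```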

10.
A data analysis method is proposed to derive a latent structure matrix from a sample covariance matrix. The matrix can be used to explore the linear latent effect between two sets of observed variables. Procedures with which to estimate a set of dependent variables from a set of explanatory variables by using the latent structure matrix are also proposed. The proposed method can assist researchers in improving the effectiveness of SEM models by exploring the latent structure between two sets of variables. In addition, a structure residual matrix can be derived as a by-product of the proposed method, with which researchers can conduct experimental procedures for variable combinations and selections to build various models for hypothesis testing. These capabilities improve the effectiveness of traditional SEM methods in characterizing data properties and testing model hypotheses. Case studies are provided to demonstrate, step by step, the procedure for deriving the latent structure matrix; the latent structure estimation results are quite close to the results of PLS regression. A structure coefficient index is suggested to explore the relationships among various combinations of variables and their effects on the variance of the latent structure.

11.
The kernel-based regression (KBR) method, such as support vector machine for regression (SVR), is a well-established methodology for estimating the nonlinear functional relationship between the response variable and predictor variables. KBR methods can be very sensitive to influential observations, which in turn have a noticeable impact on the model coefficients. The robustness of KBR methods has recently been the subject of wide-scale investigations with the aim of obtaining a regression estimator insensitive to outlying observations. However, existing robust KBR (RKBR) methods only consider Y-space outliers and, consequently, are sensitive to X-space outliers. As a result, even a single anomalous outlying observation in X-space may greatly affect the estimator. In order to resolve this issue, we propose a new RKBR method that gives reliable results even if the training data set is contaminated with both Y-space and X-space outliers. The proposed method utilizes a weighting scheme based on the hat matrix that resembles the generalized M-estimator (GM-estimator) of conventional robust linear analysis. The diagonal elements of the hat matrix in the kernel-induced feature space are used as leverage measures to downweight the effects of potential X-space outliers. We show that the kernelized hat diagonal elements can be obtained via eigendecomposition of the kernel matrix. A regularized version of the kernelized hat diagonal elements is also proposed to deal with the case where the kernel matrix has full rank and the plain kernelized hat diagonal elements are not suitable as leverage measures. We show that the two kernelized leverage measures, the kernel hat diagonal element and its regularized version, are related to statistical distance measures in the feature space. We also develop an efficient kernelized training algorithm for parameter estimation based on the iteratively reweighted least squares (IRLS) method. The experimental results from simulated examples and real data sets demonstrate the robustness of our proposed method compared with conventional approaches.
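A sketch of the leverage idea for kernel ridge regression (the RBF kernel, the ridge form of the hat matrix, and the weighting rule are illustrative choices, not the exact RKBR estimator of the paper):

```python
# Kernelized leverage sketch: for kernel ridge regression the fitted values
# are H y with hat matrix H = K (K + n*lambda*I)^{-1}; the diagonal of H
# measures leverage in feature space, so high-leverage (X-space outlying)
# points can be downweighted, as in a GM-estimator.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
X[0] = [8.0, 8.0]                                 # an X-space outlier

lam, n = 0.1, len(X)
K = rbf_kernel(X, X, gamma=0.5)
H = K @ np.linalg.inv(K + n * lam * np.eye(n))    # hat matrix in feature space
h = np.diag(H)
w = np.minimum(1.0, np.median(h) / h)             # simple leverage-based weights
print("leverage of outlier vs median: %.3f vs %.3f" % (h[0], np.median(h)))
print("weight given to outlier: %.2f" % w[0])
```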

12.
Error estimates of an H¹-Galerkin mixed finite element method for hyperbolic integro-differential equations
王瑞文, 《计算数学》, 2006, 28(1): 19-30
This paper uses the H¹-Galerkin mixed finite element method to analyze a mathematical model of convection in porous media with a memory term, namely a hyperbolic integro-differential equation. We obtain optimal-order error estimates for both the function and its gradient in the one-dimensional case, and extend the results to two and three dimensions, obtaining the same order of convergence as the traditional mixed element method, without having to verify the LBB consistency condition.
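A representative one-dimensional model problem and the splitting used by H¹-Galerkin mixed methods (the paper's exact coefficients and memory kernel are not reproduced; this is a sketch of the problem class):

```latex
% Hyperbolic integro-differential model problem with a memory term:
u_{tt} - \frac{\partial}{\partial x}\Big( a(x)\,u_x + \int_0^t b(x,t,s)\,u_x(x,s)\,ds \Big) = f,
% split into a first-order system by introducing the gradient variable
q = u_x, \qquad
u_{tt} - \frac{\partial}{\partial x}\Big( a\,q + \int_0^t b\,q\,ds \Big) = f,
% with u and q approximated in separate finite element spaces; no LBB
% compatibility condition between the two spaces is required.
```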

13.
Mutual fund investors are concerned with selecting the best-performing fund from a set of alternative funds. This paper proposes an innovative mutual fund performance evaluation measure in the context of multicriteria decision making. We implement a multicriteria methodology based on stochastic multicriteria acceptability analysis for Greek domestic equity funds over the period 2000-2009. Combining a unique dataset of risk-adjusted returns, such as Carhart's alpha, with funds' cost variables, we obtain a multicriteria performance evaluation and ranking of the mutual funds by means of an additive value function model. The main conclusion is that, among the employed variables, the sophisticated Carhart's alpha plays the most important role in determining fund rankings. On the other hand, fund rankings are affected only marginally by operational attributes. We believe that our results could have important implications either for a fund rating system or for constructing optimal combinations of portfolios.

14.
In this article, for Lasso penalized linear regression models in high-dimensional settings, we propose a modified cross-validation (CV) method for selecting the penalty parameter. The methodology is extended to other penalties, such as Elastic Net. We conduct extensive simulation studies and real data analysis to compare the performance of the modified CV method with other methods. It is shown that the popular K-fold CV method includes many noise variables in the selected model, while the modified CV works well in a wide range of coefficient and correlation settings. Supplementary materials containing the computer code are available online.
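The K-fold CV baseline the article criticizes can be reproduced as below (sklearn's LassoCV; the article's modified CV rule itself is not implemented here):

```python
# Baseline: standard K-fold cross-validation for the Lasso penalty.
# The article's point is that this choice tends to retain noise variables;
# its modified CV (not reproduced here) selects a sparser model.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 100, 200                               # high-dimensional: p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                # only five true signals
y = X @ beta + rng.normal(size=n)

fit = LassoCV(cv=10, random_state=0).fit(X, y)
print("penalty chosen by 10-fold CV:", fit.alpha_)
print("variables selected:", int(np.sum(fit.coef_ != 0)))   # typically more than 5
```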

15.
One of the methods for solving mixed problems is the classical separation of variables (the Fourier method). If the boundary conditions of the mixed problem are irregular, this method is, generally speaking, not applicable. In the present paper, a generalized separation of variables and a way of applying this method to some mixed problems with irregular boundary conditions are proposed. An analytical representation of the solution to such an irregular mixed problem is obtained.

16.
In econometrics it is common for variables to be related together in a set of linear, multilateral and causal interdependencies. This type of system generally has properties which are unsatisfactory for the application of classical regression techniques. Consequently, alternative estimation methods have been developed. This paper explores the relations between several such methods in terms of symmetric idempotents of predetermined variables and their orthogonal complements. Generalizations of two- and three-stage least squares and instrumental variables are considered, including Wickens' estimator. The relative efficiencies of the estimators are also discussed.
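The projection language used here is the textbook form of two-stage least squares, written with the symmetric idempotent of the predetermined variables Z:

```latex
% 2SLS via the projection (symmetric idempotent) matrix of Z:
P_Z = Z (Z'Z)^{-1} Z', \qquad
\hat{\beta}_{2SLS} = (X' P_Z X)^{-1} X' P_Z\, y,
% with M_Z = I - P_Z the orthogonal complement of P_Z.
```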

17.
In software defect prediction with a regression model, many metrics extracted from static code and aggregated (sum, avg, max, min) from methods into classes are candidate features, and classical feature selection methods such as AIC and BIC must be applied under a given model. As a result, the selected feature sets differ significantly across models, without a reasonable interpretation. The maximal information coefficient (MIC), presented by Reshef et al., is a novel measure of the degree of interdependence between two continuous variables, and a practical computing method based on the observations is also available. This paper first uses the MIC between defect counts and each feature to select features, then applies a power transformation to the selected features, and finally builds principal component Poisson and negative binomial regression models. All experiments are conducted on the KC1 data set in the NASA repository at the class level. The block-regularized m×2 cross-validated sequential t-test is employed to test differences in performance between two models. The performance measures used in this paper are FPA, AAE, and ARE. The experimental results show that: 1) the aggregated features sum, avg, and max are selected by MIC, but min is not, which differs significantly from AIC and BIC; 2) the power transformation of the features improves performance for the majority of models; 3) after PCA and factor analysis, two clear factors are obtained in the model: one corresponds to the features aggregated via avg and max, and the other to the features aggregated via sum. The model therefore has a reasonable interpretation. In conclusion, the aggregated features sum, avg, and max are significantly effective for software defect prediction, and the regression model based on the features selected by MIC has some advantages.
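A sketch of MIC-based screening followed by a Poisson regression (the minepy package is assumed as the MIC estimator and statsmodels for the GLM; the paper's power transformation, PCA, and factor analysis steps are omitted):

```python
# Screen features by their MIC with the (count-valued) response, then fit
# a Poisson regression on the retained features.
import numpy as np
import statsmodels.api as sm
from minepy import MINE                        # assumed MIC implementation

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10))
y = rng.poisson(np.exp(0.8 * X[:, 0] + 0.5 * X[:, 3]))   # defect-count-like response

mine = MINE(alpha=0.6, c=15)
mic = []
for j in range(X.shape[1]):
    mine.compute_score(X[:, j], y.astype(float))
    mic.append(mine.mic())
keep = np.argsort(mic)[-3:]                    # retain the three highest-MIC features
model = sm.GLM(y, sm.add_constant(X[:, keep]),
               family=sm.families.Poisson()).fit()
print("retained columns:", sorted(keep.tolist()))
print(model.params.round(2))
```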

18.
The aim of this paper is to compare different fuzzy regression methods in assessing the information content of option-based volatility forecasts about future realised volatility. These methods offer a suitable tool to handle both imprecision of measurements and fuzziness of the relationship among variables. They are therefore particularly useful for volatility forecasting, since the variable of interest (realised volatility) is unobservable and a proxy for it is used. Moreover, measurement errors in both realised volatility and volatility forecasts may affect the regression results. We compare the possibilistic regression method of Tanaka et al. (IEEE Trans Syst Man Cybern 12:903–907, 1982) and the least squares fuzzy regression method of Savic and Pedrycz (Fuzzy Sets Syst 39:51–63, 1991). In our case study, based on intra-daily data from the DAX-index options market, both methods proved to have advantages and disadvantages. Overall, we prefer the Savic and Pedrycz method, since it contains ordinary least squares regression as a special case (the central line), is robust to analysing the variables in logarithmic terms or in levels, and provides sharper results than the Tanaka et al. method.
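Tanaka-style possibilistic regression reduces to a linear program; a minimal sketch with scipy (symmetric triangular coefficients and possibility level h = 0 are simplifying assumptions, not the exact variants compared in the paper):

```python
# Possibilistic regression as an LP: coefficients are symmetric triangular
# fuzzy numbers (center a_j, spread c_j >= 0); minimise the total spread
# subject to every observed y_i lying inside the fuzzy output interval.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(scale=0.4, size=n)

h = 0.0                                        # possibility level
absX = np.abs(X)
obj = np.concatenate([np.zeros(p), absX.sum(axis=0)])   # cost on spreads only
A_ub = np.vstack([np.hstack([-X, -(1 - h) * absX]),     # upper edge above y_i
                  np.hstack([ X, -(1 - h) * absX])])    # lower edge below y_i
b_ub = np.concatenate([-y, y])
bounds = [(None, None)] * p + [(0, None)] * p           # centers free, spreads >= 0
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
a, c = res.x[:p], res.x[p:]
print("centers:", a.round(2), "spreads:", c.round(2))
```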

19.
Exact global optimization of the clusterwise regression problem is challenging and there are currently no published feasible methods for performing this clustering optimally, even though it has been over thirty years since its original proposal. This work explores global optimization of the clusterwise regression problem using mathematical programming and related issues. A mixed logical-quadratic programming formulation with implication of constraints is presented and contrasted against a quadratic formulation based on the traditional big-M, which cannot guarantee optimality because the regression line coefficients, and thus errors, may be arbitrarily large. Clusterwise regression optimization times and solution optimality for two clusters are empirically tested on twenty real datasets and three series of synthetic datasets ranging from twenty to one hundred observations and from two to ten independent variables. Additionally, a few small real datasets are clustered into three lines.
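The big-M formulation contrasted here can be written down directly; its weakness, that M must dominate every possible residual, is visible in the constraints (a standard form, not the paper's logical-quadratic alternative):

```latex
% Big-M clusterwise regression: z_{ik} = 1 assigns observation i to cluster k.
\min_{\beta, e, z} \ \sum_i e_i^2
\quad \text{s.t.} \quad
-e_i - M(1 - z_{ik}) \;\le\; y_i - x_i'\beta_k \;\le\; e_i + M(1 - z_{ik}) \quad \forall i,k,
\qquad \sum_k z_{ik} = 1, \quad z_{ik} \in \{0,1\};
% optimality is lost whenever some |y_i - x_i' \beta_k| can exceed M.
```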

20.
A new mixed finite element method for pseudo-hyperbolic equations
An H¹-Galerkin expanded mixed finite element method is constructed and analyzed for a class of second-order pseudo-hyperbolic equations; it combines the techniques of the expanded mixed finite element method and the H¹-Galerkin mixed finite element method. The new scheme retains the advantages of both. Compared with the standard mixed scheme, it simultaneously approximates three variables: the unknown function, its gradient, and the flux (the coefficient times the gradient), and it need not satisfy the LBB consistency condition.
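The three approximated quantities can be written out for a model pseudo-hyperbolic equation (a representative form; the paper's precise equation and spaces are not reproduced):

```latex
% A model pseudo-hyperbolic equation and its expanded mixed splitting:
u_{tt} - \nabla \cdot \big( a(x)\,\nabla u_t + b(x)\,\nabla u \big) = f,
% with the three simultaneously approximated variables
\sigma = \nabla u \ (\text{gradient}), \qquad
q = a\,\sigma_t + b\,\sigma \ (\text{flux}), \qquad
u_{tt} - \nabla \cdot q = f.
```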

