Similar literature
 Found 20 similar articles; search took 15 ms
1.
Multivariate adaptive regression splines (MARS) has become a popular data mining (DM) tool owing to its flexible model-building strategy for high-dimensional data. Compared with other well-known methods, it performs better in many areas such as finance, informatics, technology and science. Many studies have sought to improve its performance. For this purpose, an alternative to the backward stepwise algorithm was proposed in the Conic-MARS (CMARS) method, which treats a penalized residual sum of squares for MARS as a Tikhonov regularization problem. Additionally, S-FMARS introduced a time-efficient procedure by modifying the forward step of MARS via a mapping approach. Inspired by the advantages of MARS, CMARS and S-FMARS, two hybrid methods are proposed in this study, aiming to produce time-efficient DM tools without degrading performance, especially for large datasets. The resulting methods, called SMARS and SCMARS, are tested on several performance criteria, such as accuracy, complexity, stability and robustness, using simulated and real-life datasets. As a DM application, the hybrid methods are also applied to an important field of finance: predicting the interest rates offered by a Turkish bank to its customers. The results show that the proposed hybrid methods, being the most time-efficient with competitive performance, can be considered powerful choices, particularly for large datasets.

2.
In high-dimensional data modeling, Multivariate Adaptive Regression Splines (MARS) is a popular nonparametric regression technique used to define the nonlinear relationship between a response variable and the predictors with the help of splines. MARS uses piecewise linear functions for local fits and applies an adaptive procedure to select the number and location of breakpoints (called knots). The function estimate is generated via a two-step procedure: forward selection and backward elimination. In the first step, a large number of local fits is obtained by selecting a large number of knots via a lack-of-fit criterion; in the second, the least contributing local fits or knots are removed. In the conventional adaptive spline procedure, knots are selected from the set of all distinct data points, which makes the forward selection procedure computationally expensive and leads to high local variance. To avoid this drawback, the knot points can be restricted to a subset of the data points. In this context, a new knot selection method is proposed that is based on a mapping approach similar to self-organizing maps. With this method, fewer but more representative data points become eligible for use as knots for function estimation in the forward step of MARS. The proposed method is applied to many simulated and real datasets, and the results show that it provides a time-efficient forward step for knot selection and model estimation without degrading model accuracy or prediction performance.
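The idea in the abstract above, piecewise linear basis functions with knots drawn from a reduced candidate set, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the model has only one hinge term plus an intercept, and the candidate set is thinned by taking every `step`-th distinct value, which is a simple stand-in for the mapping-based selection and not the method of the paper.

```python
def hinge(x, knot):
    """Piecewise linear basis function max(0, x - knot)."""
    return max(0.0, x - knot)


def candidate_knots(xs, step):
    """Keep every `step`-th distinct sorted value as a knot candidate.

    Assumption: plain thinning stands in for the mapping-based subset
    selection described in the abstract.
    """
    return sorted(set(xs))[::step]


def fit_single_hinge(xs, ys, knot):
    """Least-squares fit of y ~ intercept + slope * hinge(x, knot).

    Returns (intercept, slope, residual sum of squares).
    """
    hs = [hinge(x, knot) for x in xs]
    n = len(xs)
    mean_h = sum(hs) / n
    mean_y = sum(ys) / n
    var_h = sum((h - mean_h) ** 2 for h in hs)
    if var_h == 0.0:
        # The hinge is constant on the data, so only the intercept is fitted.
        slope = 0.0
    else:
        cov = sum((h - mean_h) * (y - mean_y) for h, y in zip(hs, ys))
        slope = cov / var_h
    intercept = mean_y - slope * mean_h
    rss = sum((y - intercept - slope * h) ** 2 for h, y in zip(hs, ys))
    return intercept, slope, rss


def best_knot(xs, ys, knots):
    """Return the candidate knot with the smallest residual sum of squares."""
    return min(knots, key=lambda t: fit_single_hinge(xs, ys, t)[2])
```

Searching only the thinned candidate list cuts the number of least-squares fits in the forward step roughly by the factor `step`, which is the kind of saving the abstract refers to.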

3.
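Combining the strengths and weaknesses of partial least squares and support vector machines, a natural gas consumption forecasting model based on a partial least squares support vector machine is proposed. First, partial least squares is used to determine new composite variables that influence natural gas consumption, and a support vector machine model is built with these new composite variables as inputs and natural gas consumption as the output to forecast consumption. The model is then compared, in terms of error, with multiple regression, partial least squares regression and an ordinary support vector machine to verify the feasibility and correctness of the method. The results show that this natural gas consumption forecasting model has high accuracy and practical value.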

4.
Multivariate adaptive regression spline (MARS) is a statistical modeling method used to represent a complex system. More recently, a version of MARS was modified to be piecewise linear. This paper presents a mixed integer linear program, called MARSOPT, that optimizes a non-convex piecewise linear MARS model subject to constraints that include both linear regression models and piecewise linear MARS models. MARSOPT is customized for an automotive crash safety system design problem for a major US automaker and solved using branch and bound. The solutions from MARSOPT are compared with those from customized genetic algorithms.

5.
Multivariate adaptive regression splines (MARS) is a popular nonparametric regression tool often used for prediction and for uncovering important data patterns between the response and predictor variables. The standard MARS algorithm assumes responses are normally distributed and independent, but in this article we relax both of these assumptions by extending MARS to generalized estimating equations. We refer to this MARS-for-GEEs algorithm as “MARGE.” Our algorithm makes use of fast forward selection techniques, such that in the univariate case, MARGE has similar computation speed to a standard MARS implementation. Through simulation we show that the proposed algorithm has better predictive performance than the original MARS algorithm when using correlated and/or nonnormal response data. MARGE is also competitive with alternatives in the literature, especially for problems with multiple interacting predictors. We apply MARGE to various ecological examples with different data types. Supplementary material for this article is available online.

6.

Knowledge management is widely considered a strategic tool for increasing firm performance by enabling the reuse of organizational knowledge. Although many have studied knowledge management in a variety of business settings, the concept of tacit knowledge, especially individual tacit knowledge, has not been explored in due detail. The objective of this study is to identify and prioritize individual tacit knowledge criteria and to explain their effects on firm performance. In the proposed methodology, the most prevalent individual tacit knowledge variables are first identified by means of knowledge elicitation and feature selection methods. The extracted variables are then prioritized using machine learning methods and the fuzzy Analytic Hierarchy Process (AHP). Support vector machine (SVM), logistic regression, and artificial neural networks are used as the first approach, followed by fuzzy AHP as the second approach. Based on the comparative analysis results, SVM (as the best-performing machine learning technique) and fuzzy AHP were selected for the subsequent analysis. The results showed that both SVM and fuzzy AHP identified time efficiency of employees, communication between employees and supervisors, and innovative capability of employees as the most important tacit knowledge criteria. These findings are mostly supported by the extant literature, and collectively show the synergistic nature of the analytics approaches used in determining individual tacit knowledge criteria.


7.
Although support vector regression models are used successfully in various applications, the size of business datasets, with millions of observations and thousands of variables, makes training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA) to select a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The first step involves parallel processing of data chunks and selects a fraction of the original observations that either are representative of the pattern identified in the chunk or do not fit the identified pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves the predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. Comparison with the ε-SSVR method using the reduced kernel technique shows a similar performance improvement. Training a standard SVM tool with the ROCSA-selected observations improves the predictive accuracy on average by 21% compared with the practical approach of random sampling.

8.
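To predict premature rupture of membranes quickly and accurately, a new data mining technique, the support vector machine prediction model, is applied for the first time. The model was built on a dataset of 100 cases of premature and normal rupture of membranes, and its performance was compared with that of neural network and logistic regression models. The results show that the support vector machine has advantages such as few tunable parameters and fast learning, and that its results are superior to those of commonly used methods such as neural networks in both accuracy and the interpretability of the acquired knowledge. The prediction model for premature rupture of membranes built with the support vector machine method is reasonable and feasible.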

9.
Using advanced machine learning techniques as an alternative to conventional double-entry volume equations, a regression model of the inside-bark volume (dependent variable) for standing Eucalyptus globulus trunks (or main stems) has been built as a function of the following three independent variables: age, height and outside-bark diameter at breast height (DBH). The experimentally observed data (age, height, outside-bark DBH and inside-bark volume) for 142 trees (E. globulus) were measured, and a nonlinear model was built using a data-mining methodology based on support vector machines (SVM) and multilayer perceptron networks (MLP) for regression problems. Coefficients of determination and Furnival’s indices indicate the superiority of the SVM with a radial kernel over the allometric regression models and the MLP.

10.
In Korea, many forms of credit guarantees have been issued to fund small and medium enterprises (SMEs) with a high degree of growth potential in technology. However, a high default rate among funded SMEs has been reported. To manage such governmental funds effectively, it is important to develop an accurate scoring model for selecting promising SMEs. This paper provides a support vector machine (SVM) model to predict the default of funded SMEs, considering various input variables such as financial ratios, economic indicators, and technology evaluation factors. The results show that the accuracy of the SVM model is better than that of back-propagation neural networks (BPNs) and logistic regression. It is expected that the proposed model can be applied to a wide range of technology evaluation and loan or investment decisions for technology-based SMEs.

11.
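In predictive modeling with support vector machines, the kernel function maps a nonlinear problem in a low-dimensional feature space to a linear problem in a high-dimensional feature space. The characteristics of the kernel function strongly affect both the learning and the prediction of the support vector machine. Considering the fitting and generalization properties of two typical kernels, the global kernel (polynomial kernel) and the local kernel (RBF kernel), a support vector machine method based on a mixed kernel function is adopted for predictive modeling. To evaluate the modeling performance of different kernels and obtain better predictive performance, a genetic algorithm is used to adaptively evolve the parameters of the support vector machine model, and the approach is applied to the practical problem of equipment cost prediction. The computations show that the support vector machine with a mixed kernel predicts better than one with a single kernel, and that it can be promoted as an effective predictive modeling method in equipment management.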
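One common way to build a mixed kernel of the kind the abstract above describes is a convex combination of a polynomial (global) kernel and an RBF (local) kernel. The sketch below assumes that form; the weight `rho`, the default parameter values and all function names are hypothetical, and the abstract does not state the exact combination used in the paper. The genetic-algorithm tuning of the parameters is not shown.

```python
import math


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial (global) kernel: (<x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """RBF (local) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def mixed_kernel(x, z, rho=0.5, degree=2, coef0=1.0, gamma=0.5):
    """Convex combination rho * polynomial + (1 - rho) * RBF.

    Assumption: rho in [0, 1]; a convex combination of two valid kernels
    is again a valid (positive semidefinite) kernel.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return (rho * poly_kernel(x, z, degree, coef0)
            + (1.0 - rho) * rbf_kernel(x, z, gamma))
```

Setting `rho` to 1 or 0 recovers the single polynomial or RBF kernel, so the single-kernel models that the abstract compares against are special cases of the same function.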

12.
13.
In this paper, we consider the ultra-high dimensional partially linear model, where the dimensionality p of the linear component is much larger than the sample size n, and p can be as large as an exponential of the sample size n. Firstly, we transform the ultra-high dimensional partially linear model into an ultra-high dimensional linear model based on the profile technique used in semiparametric regression. Secondly, to carry out variable screening for the high-dimensional linear component, we propose a variable screening method called the profile greedy forward regression (PGFR), which combines the greedy algorithm with the forward regression (FR) method. The proposed PGFR method not only considers the correlation between the covariates, but also identifies all relevant predictors consistently and possesses the screening consistency property under some regularity conditions. We further propose a BIC criterion to determine whether the selected model contains the true model with probability tending to one. Finally, some simulation studies and a real application are conducted to examine the finite-sample performance of the proposed PGFR procedure.

14.
An empirical Bayes method to select basis functions and knots in multivariate adaptive regression spline (MARS) is proposed, which takes advantage of both frequentist model selection approaches and Bayesian approaches. A penalized likelihood is maximized to estimate regression coefficients for selected basis functions, and an approximated marginal likelihood is maximized to select knots and variables involved in basis functions. Moreover, the Akaike Bayes information criterion (ABIC) is used to determine the number of basis functions. It is shown that the proposed method gives estimates of the regression structure that are relatively parsimonious and more stable for some example data sets.

15.
Applications of regression models for binary response are very common and models specific to these problems are widely used. Quantile regression for binary response data has recently attracted attention and regularized quantile regression methods have been proposed for high-dimensional problems. When the predictors have a natural group structure, such as in the case of categorical predictors converted into dummy variables, a group lasso penalty is used in regularized methods. In this paper, we present a Bayesian Gibbs sampling procedure to estimate the parameters of a quantile regression model under a group lasso penalty for classification problems with a binary response. Simulated and real data show good performance of the proposed method in comparison to mean-based approaches and to quantile-based approaches which do not exploit the group structure of the predictors.

16.
The response surface method (RSM), a simple and effective approximation technique, is widely used for reliability analysis in civil engineering. However, the traditional RSM needs a considerable number of samples and is computationally intensive and time-consuming for practical engineering problems with many variables. To overcome these problems, this study proposes a new approach that samples experimental points based on the difference between the last two trial design points. This new method constructs the response surface using a support vector machine (SVM); the SVM can build complex, nonlinear relations between random variables and approximate the performance function using fewer experimental points. This approach can reduce the number of experimental points and improve the efficiency and accuracy of reliability analysis. The advantages of the proposed method were verified using four examples involving random variables with different distributions and correlation structures. The results show that this approach can obtain the design point and reliability index with fewer experimental points and better accuracy. The proposed method was also employed to assess the reliability of a numerically modeled tunnel. The results indicate that this new method is applicable to practical, complex engineering problems such as rock engineering problems.

17.
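Nearest-neighbor estimation for the semiparametric EV regression model with martingale difference errors   Cited by: 1 (self-citations: 1, citations by others: 0)
Tan Xing (谭星), Journal of Mathematics (《数学杂志》), 2008, 28(2): 203-208
This paper studies the one-dimensional semiparametric EV (errors-in-variables) regression model with martingale difference errors. Using a two-step estimation method, nearest-neighbor estimators of the parametric and nonparametric components are constructed, and the L2 consistency and strong consistency of the estimators are proved, thereby extending existing results for the ordinary semiparametric regression model.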

18.
A Collapsing Knapsack is a container whose capacity diminishes as the number of items it must hold is increased. This paper focuses on those cases in which the decision variables are continuous, i.e., can take any non-negative value. It is demonstrated that the problem can be reduced to a set of two-dimensional subproblems. Strategies for elimination of subproblems and conditions permitting reduction to a set of one-dimensional problems are also considered. Computational results indicate that the procedure is quite efficient. Even for large problems only a small number of subproblems have to be solved.

19.
This paper introduces a model-based approach to the important data mining tool Multivariate adaptive regression splines (MARS), which was originally organized in a more model-free way. MARS is a modern methodology from statistical learning that is important in both classification and regression, with an increasing number of applications in many areas of science, economy and technology. It is very useful for high-dimensional problems and shows great promise for fitting nonlinear multivariate functions. The MARS algorithm for estimating the model function consists of two algorithms: the forward and the backward stepwise algorithms. In our paper, we propose not to use the backward stepwise algorithm. Instead, we construct a penalized residual sum of squares for MARS as a Tikhonov regularization problem, which is also known as ridge regression. We treat this problem using continuous optimization techniques, which we consider an important complementary technology and a model-based alternative to the backward stepwise algorithm. In particular, we apply the elegant framework of conic quadratic programming. This is an area of convex optimization that is very well structured, thereby resembling linear programming and, hence, permitting the use of powerful interior point methods. Based on these theoretical and algorithmic studies, this paper also contains an application to diabetes data. We evaluate and compare the performance of the established MARS and our new CMARS in classifying diabetic persons, where CMARS turns out to be very competitive and promising.
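As a rough illustration of the regularization step in the abstract above, a penalized residual sum of squares in Tikhonov form, and a conic quadratic program derived from it, can be written generically as follows. The symbols are assumptions introduced here and are not taken from the paper: $y$ is the response vector, $B$ the matrix of basis function values, $\theta$ the coefficient vector, $L$ a penalty matrix, $\lambda > 0$ a regularization parameter and $M$ a chosen bound.

```latex
% Tikhonov-regularized residual sum of squares (generic form, assumed notation)
\mathrm{PRSS}(\theta) \;=\; \lVert y - B\theta \rVert_2^2 \;+\; \lambda \,\lVert L\theta \rVert_2^2

% Conic quadratic reformulation: bound the penalty term instead of weighting it
\min_{t,\,\theta} \; t
\quad \text{subject to} \quad
\lVert B\theta - y \rVert_2 \le t,
\qquad
\lVert L\theta \rVert_2 \le \sqrt{M}
```

With $L$ equal to the identity matrix, the first expression is the ordinary ridge regression objective. Both constraints in the second formulation are second-order cone constraints, which is why interior point methods apply.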

20.
A new logistic regression algorithm based on evolutionary product-unit (PU) neural networks is used in this paper to determine the assets that influence the decision of poor households with respect to the cultivation of non-traditional crops (NTC) in the Guatemalan Highlands. In order to evaluate high-order covariate interactions, PUs were considered to be independent variables in product-unit neural networks (PUNN) analysing two different models either including the initial covariates (logistic regression by the product-unit and initial covariate model) or not (logistic regression by the product-unit model). Our results were compared with those obtained using a standard logistic regression model and allow us to interpret the most relevant household assets and their complex interactions when adopting NTC, in order to aid in the design of rural policies.
