Similar literature
 20 similar articles found (search time: 15 ms)
1.
This paper describes the relationship between support vector regression (SVR) and rough (or interval) patterns. SVR is the prediction component of the support vector techniques. Rough patterns are based on the notion of rough values, which consist of upper and lower bounds, and are used to effectively represent a range of variable values. Predictions of rough values in a variety of different forms within the context of interval algebra and fuzzy theory are attracting research interest. An extension of SVR, called rough support vector regression (RSVR), is proposed to improve the modeling of rough patterns. In particular, it is argued that the upper and lower bounds should be modeled separately. The proposal is shown to be a more flexible version of the lower possibilistic regression model using ε-insensitivity. Experimental results on the Dow Jones Industrial Average demonstrate the suggested RSVR modeling technique.

2.
The need to minimize the potential impact of air pollutants on humans has made the accurate prediction of concentrations of air pollutants a crucial subject in environmental research. Support vector regression (SVR) models have been successfully employed to solve time series problems in many fields. The use of SVR models for forecasting concentrations of air pollutants has not been widely investigated. Data preprocessing procedures and the parameter selection of SVR models can radically influence forecasting performance. This study proposes a support vector regression with logarithm preprocessing procedure and immune algorithms (SVRLIA) model, which takes advantage of the structural risk minimization of SVR models, the data smoothing of preprocessing procedures, and the optimization of immune algorithms in order to forecast concentrations of air pollutants more accurately. Data for three pollutants, namely particulate matter (PM10), nitrogen oxides (NOx), and nitrogen dioxide (NO2), are collected and examined to determine the feasibility of the developed SVRLIA model. Experimental results reveal that the SVRLIA model can accurately forecast concentrations of air pollutants.

3.
Supervised classification is an important part of corporate data mining to support decision making in customer-centric planning tasks. The paper proposes a hierarchical reference model for support vector machine based classification within this discipline. The approach balances the conflicting goals of transparent yet accurate models and compares favourably to alternative classifiers in a large-scale empirical evaluation in real-world customer relationship management applications. Recent advances in support vector machine oriented research are incorporated to approach feature, instance and model selection in a unified framework.

4.
Method: In this paper, we introduce a bi-level optimization formulation for the model and feature selection problems of support vector machines (SVMs). A bi-level optimization model is proposed to select the best model, where the standard convex quadratic optimization problem of the SVM training is cast as a subproblem. Feasibility: The optimal objective value of the quadratic problem of SVMs is minimized over a feasible range of the kernel parameters at the master level of the bi-level model. Since the optimal objective value of the subproblem is a continuous function of the kernel parameters, though implicitly defined over a certain region, the solution of this bi-level problem always exists. The problem of feature selection can be handled in a similar manner. Experiments and results: Two approaches for solving the bi-level problem of model and feature selection are considered as well. Experimental results show that the bi-level formulation provides a plausible tool for model selection.

5.
Discrete support vector machines (DSVM), originally proposed for binary classification problems, have been shown to outperform other competing approaches on well-known benchmark datasets. Here we address their extension to multicategory classification, by developing three different methods. Two of them are based respectively on one-against-all and round-robin classification schemes, in which a number of binary discrimination problems are solved by means of a variant of DSVM. The third method directly addresses the multicategory classification task, by building a decision tree in which an optimal split to separate classes is derived at each node by a new extended formulation of DSVM. Computational tests on publicly available datasets are then conducted to compare the three multicategory classifiers based on DSVM with other methods, indicating that the proposed techniques achieve significantly higher accuracies. This research was partially supported by PRIN grant 2004132117.

6.
Uniform boundedness of output variables is a standard assumption in most theoretical analyses of regression algorithms. This standard assumption has recently been weakened to a moment hypothesis in the least squares regression (LSR) setting. Although there is a large literature on error analysis for LSR under the moment hypothesis, very little is known about the statistical properties of support vector machine regression with unbounded sampling. In this paper, we fill this gap in the literature. Without any restriction on the boundedness of the output sampling, we establish an ad hoc convergence analysis for support vector machine regression under very mild conditions.

7.
Transductive learning involves the construction and application of prediction models to classify a fixed set of decision objects into discrete groups. It is a special case of classification analysis with important applications in web-mining, corporate planning and other areas. This paper proposes a novel transductive classifier that is based on the philosophy of discrete support vector machines. We formalize the task of estimating the class labels of decision objects as a mixed integer program. A memetic algorithm is developed to solve the mathematical program and to construct a transductive support vector machine classifier. Empirical experiments on synthetic and real-world data provide evidence of the effectiveness of the new approach and demonstrate that it identifies high quality solutions in a short time. Furthermore, the results suggest that the class predictions following from the memetic algorithm are significantly more accurate than the predictions of a CPLEX-based reference classifier. Comparisons to other transductive and inductive classifiers provide further support for our approach and suggest that it performs competitively with respect to several benchmarks.

8.
Loss given default modelling has become crucially important for banks due to the requirement that they comply with the Basel Accords and to their internal computations of economic capital. In this paper, support vector regression (SVR) techniques are applied to predict loss given default of corporate bonds, where improvements are proposed to increase prediction accuracy by modifying the SVR algorithm to account for heterogeneity of bond seniorities. We compare the predictions from SVR techniques with thirteen other algorithms. Our paper has three important results. First, at an aggregated level, the proposed improved versions of support vector regression techniques outperform other methods significantly. Second, at a segmented level, by bond seniority, least squares support vector regression demonstrates significantly better predictive abilities compared with the other statistical models. Third, standard transformations of loss given default do not improve prediction accuracy. Overall, our empirical results show that support vector regression is a promising technique for banks to use to predict loss given default.

9.
Forecasting the number of warranty claims is vitally important for manufacturers/warranty providers in preparing fiscal plans. In the existing literature, a number of techniques such as log-linear Poisson models, Kalman filters, time series models, and artificial neural network models have been developed. Nevertheless, these approaches have two weaknesses: (1) they do not consider the fact that warranty claims reported in recent months might be more important in forecasting future warranty claims than those reported in earlier months, and (2) they are developed based on repair rates (i.e., the total number of claims divided by the total number of products in service), which can cause information loss through such an arithmetic-mean operation. To overcome these two weaknesses, this paper introduces two different approaches to forecasting warranty claims: the first is a weighted support vector regression (SVR) model and the second is a weighted SVR-based time series model. These two approaches can be applied to two scenarios: when only claim rate data are available and when original claim data are available. Two case studies are conducted to validate the two modelling approaches. On the basis of model evaluation over six-month-ahead forecasting, the results show that the proposed models exhibit superior performance compared to that of multilayer perceptrons, radial basis function networks and ordinary support vector regression models.

10.
The availability of abundant data poses a challenge to integrate static customer data and longitudinal behavioral data to improve performance in customer churn prediction. Usually, longitudinal behavioral data are transformed into static data before being included in a prediction model. In this study, a framework with ensemble techniques is presented for customer churn prediction directly using longitudinal behavioral data. A novel approach called the hierarchical multiple kernel support vector machine (H-MK-SVM) is formulated. A three-phase training algorithm for the H-MK-SVM is developed, implemented and tested. The H-MK-SVM constructs a classification function by estimating the coefficients of both static and longitudinal behavioral variables in the training process without transformation of the longitudinal behavioral data. The training process of the H-MK-SVM is also a feature selection and time subsequence selection process because the sparse non-zero coefficients correspond to the variables selected. Computational experiments using three real-world databases were conducted. Computational results using multiple criteria measuring performance show that the H-MK-SVM directly using longitudinal behavioral data performs better than currently available classifiers.

11.
Support Vector Machines (SVMs) are known to be powerful nonparametric classification techniques, even for high-dimensional data. Although predictive ability is important, obtaining an easy-to-interpret classifier is also crucial in many applications. A linear SVM provides a classifier based on a linear score. In the case of functional data, the coefficient function that defines such a linear score usually has many irregular oscillations, making it difficult to interpret.

12.
Emilio Carrizosa 《TOP》2006,14(2):399-424
A key problem in Multiple-Criteria Decision Making is how to measure the importance of the different criteria when just a partial preference relation among actions is given. In this note we address the problem of constructing a linear score function (and thus how to associate weights of importance to the criteria) when a binary relation comparing actions and partial information (relative importance) on the criteria are given. It is shown that these tasks can be done via Support Vector Machines, an increasingly popular Data Mining technique, which reduces the search for the weights to the solution of (a series of) nonlinear convex optimization problems with linear constraints. An interactive method is then presented and illustrated by solving a multiple-objective 0–1 knapsack problem. Extensions to the case in which data are imprecise (given by intervals) or intransitivities in strict preferences exist are outlined.

13.
LAD estimation for nonlinear regression models with randomly censored data   Total citations: 3 (self-citations: 0, citations by others: 3)
The least absolute deviations (LAD) estimation for nonlinear regression models with randomly censored data is studied and the asymptotic properties of LAD estimators such as consistency, boundedness in probability and asymptotic normality are established. Simulation results show that for the problems with censored data, LAD estimation performs much more robustly than the least squares estimation.

14.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are included in the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable only for small sample data. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of the ε-insensitive loss function from support vector machine regression. Small errors are ignored, and the mathematical formula gains the sparseness property. The good performance of the ε-Insensitive PTS (IPTS) estimator allows identification of multiple outliers, avoiding masking or swamping effects. The computational effectiveness and successful outlier detection of the proposed method are demonstrated via simulated experiments. This research has been partially funded by the Greek Ministry of Education under the program Pythagoras II.
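
The ε-insensitive loss mentioned in this abstract can be illustrated with a minimal sketch. It shows only the textbook definition max(0, |r| − ε) from support vector regression, not the IPTS estimator itself; the function name `eps_insensitive_loss` and the sample residuals are hypothetical and chosen purely for illustration.

```python
def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Textbook epsilon-insensitive loss: max(0, |residual| - eps).

    Residuals whose magnitude is at most eps cost nothing, which is
    what "small errors are ignored" refers to.
    """
    return max(0.0, abs(residual) - eps)


# Hypothetical residuals, for illustration only.
residuals = [0.05, -0.2, 1.5]
losses = [eps_insensitive_loss(r, eps=0.1) for r in residuals]
```

Only the first residual lies inside the ε-tube here, so it is the only one with zero loss; the other two are penalized linearly in their distance from the tube.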

15.
This paper is concerned with cross-validation (CV) criteria for choice of models, which can be regarded as approximately unbiased estimators for two types of risk functions. One is the AIC type of risk or, equivalently, the expected Kullback-Leibler distance between the distributions of observations under a candidate model and the true model. The other is based on the expected mean squared error of prediction. In this paper we study asymptotic properties of CV criteria for selecting multivariate regression models and growth curve models under the assumption that a candidate model includes the true model. Based on the results, we propose corrected versions which are more nearly unbiased for their risks. Through numerical experiments, some tendencies of the CV criteria are also pointed out.

16.
Composite quantile regression with randomly censored data is studied. Moreover, adaptive LASSO methods for composite quantile regression with randomly censored data are proposed. The consistency, asymptotic normality and oracle property of the proposed estimators are established. The proposals are illustrated via simulation studies and the Australian AIDS dataset.

17.
We consider inverse regression models with convolution-type operators which mediate convolution on ℝ^d (d≥1) and prove a pointwise central limit theorem for spectral regularisation estimators which can be applied to construct pointwise confidence regions. Here, we cope with the unknown bias of such estimators by undersmoothing. Moreover, we prove consistency of the residual bootstrap in this setting and demonstrate the feasibility of the bootstrap confidence bands at moderate sample sizes in a simulation study.

18.
Support vector machines (SVMs) that utilize a mixture of the L1-norm and the L2-norm penalties are capable of performing simultaneous classification and selection of highly correlated features. These SVMs, typically set up as convex programming problems, are re-formulated here as simple convex quadratic minimization problems over non-negativity constraints, giving rise to a new formulation, the pq-SVM method. Solutions to our re-formulation are obtained efficiently by an extremely simple algorithm. Computational results on a range of publicly available datasets indicate that these methods allow greater classification accuracy in addition to selecting groups of highly correlated features. These methods were also compared on a new dataset assessing HIV-associated neurocognitive disorder in a group of 97 HIV-infected individuals.

19.
The authors consider various procedures for testing the hypotheses that two sets of variables are independent and that certain regression coefficients are zero under a multivariate regression model. Various properties of these procedures and the asymptotic distributions associated with them are also considered.

20.
This work deals with log-symmetric regression models, which are particularly useful when the response variable is continuous, strictly positive, and follows an asymmetric distribution, with the possibility of modeling atypical observations by means of robust estimation. In these regression models, the distribution of the random errors is a member of the log-symmetric family, which is composed of the log-contaminated-normal, log-hyperbolic, log-normal, log-power-exponential, log-slash and log-Student-t distributions, among others. One way to select the best family member in log-symmetric regression models is to use information criteria. In this paper, we formulate log-symmetric regression models and conduct a Monte Carlo simulation study to investigate the accuracy of popular information criteria, such as Akaike, Bayesian, and Hannan-Quinn, and their respective corrected versions in choosing adequate log-symmetric regression models. As a business application, a movie data set assembled by the authors is analyzed to compare and obtain the best possible log-symmetric regression model for box office revenues. The results provide relevant information for model selection criteria in log-symmetric regressions and for the movie industry. Economic implications of our study are discussed after the numerical illustrations.
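
As a rough illustration of how the three information criteria named in this abstract are computed, the sketch below uses their standard textbook definitions for a fitted model with maximized log-likelihood ℓ, k estimated parameters and n observations. The function name `information_criteria` and the numeric inputs are hypothetical; the corrected versions studied in the paper are not reproduced here.

```python
import math


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> dict:
    """Standard (uncorrected) Akaike, Bayesian and Hannan-Quinn criteria.

    Lower values indicate a preferred model under each criterion.
    """
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "HQC": deviance + 2.0 * n_params * math.log(math.log(n_obs)),
    }


# Hypothetical fit: log-likelihood -100 with 3 parameters and 50 observations.
crit = information_criteria(log_likelihood=-100.0, n_params=3, n_obs=50)
```

Comparing candidate error distributions would then amount to fitting each one and keeping the model with the smallest value of the chosen criterion.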
