Similar Literature
20 similar documents found.
1.
This paper studies errors introduced into learning algorithms by the variance of the Gaussian kernel and analyzes them using special properties of the reproducing kernel. These errors play an important role in analyzing the convergence rates of the algorithms.

2.
Recently, there has been considerable work on analyzing learning algorithms with pairwise loss functions in the batch setting. There is relatively little theoretical work on analyzing their online algorithms, despite their popularity in practice due to their scalability to big data. In this paper, we consider online learning algorithms with pairwise loss functions based on regularization schemes in reproducing kernel Hilbert spaces. In particular, we establish the convergence of the last iterate of the online algorithm under a very weak assumption on the step sizes and derive satisfactory convergence rates for polynomially decaying step sizes. Our technique uses Rademacher complexities which handle function classes associated with pairwise loss functions. Since pairwise learning involves pairs of examples, which are no longer i.i.d., standard techniques do not directly apply to such pairwise learning algorithms. Hence, our results are a non-trivial extension of those in the setting of univariate loss functions to the pairwise setting.
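A minimal sketch of the kind of regularized online pairwise update discussed above, keeping the iterate as a kernel expansion. The Gaussian kernel, the pairwise hinge (ranking) loss, and the step-size exponent theta are illustrative choices, not the paper's exact setting:

```python
import numpy as np

def gauss_k(x, xp, sigma=1.0):
    """Gaussian kernel K(x, x')."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def online_pairwise_hinge(stream, lam=0.01, sigma=1.0, theta=0.75):
    """Regularized online pairwise learning in an RKHS (sketch).

    Update:  f_{t+1} = (1 - eta_t * lam) f_t - eta_t * avg_{j<t} grad_j(f_t),
    with the pairwise hinge loss  l(f; z, z') = max(0, 1 - sgn(y - y') (f(x) - f(x')))
    and polynomially decaying step sizes eta_t = t^{-theta}.
    The iterate is stored as a kernel expansion f_t = sum_i alpha_i K(x_i, .).
    """
    pts, ys, alphas = [], [], []

    def f_val(x):
        return sum(a * gauss_k(x, p, sigma) for a, p in zip(alphas, pts))

    for t, (x_t, y_t) in enumerate(stream, start=1):
        coef_t = 0.0                                    # weight attached to K(x_t, .)
        if t > 1:
            eta = t ** (-theta)
            fx_t = f_val(x_t)
            f_prev = [f_val(p) for p in pts]            # values of f_t at earlier points
            alphas = [(1.0 - eta * lam) * a for a in alphas]   # shrinkage from -eta*lam*f_t
            for j in range(len(pts)):
                s = np.sign(y_t - ys[j])
                if s != 0 and 1.0 - s * (fx_t - f_prev[j]) > 0:   # hinge is active for this pair
                    coef_t += eta * s / (t - 1)         # pushes f(x_t) in the right direction
                    alphas[j] -= eta * s / (t - 1)      # and f(x_j) the opposite way
        pts.append(x_t); ys.append(y_t); alphas.append(coef_t)
    return pts, alphas
```

Each update touches all earlier examples, so a practical implementation would buffer or subsample the past points; the sketch only illustrates the form of the iteration.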

3.
In this paper, we are interested in the analysis of regularized online algorithms associated with reproducing kernel Hilbert spaces. General conditions on the loss function and step sizes are given to ensure convergence. Explicit learning rates are also given for particular step sizes. The author’s current address: Department of Computer Sciences, University College London, Gower Street, London WC1E, England, UK.
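In symbols, the regularized online iteration studied in this line of work takes the generic form below. This is a schematic statement only; the precise conditions on the loss and the step sizes are those referred to in the abstract above.

```latex
% Regularized online update in the RKHS \mathcal{H}_K:
% \partial_1\ell is a (sub)gradient of the loss in its first argument,
% \lambda \ge 0 the regularization parameter, \eta_t the step size.
f_{t+1} \;=\; f_t \;-\; \eta_t\,\Bigl(\partial_1\ell\bigl(f_t(x_t),\,y_t\bigr)\,K_{x_t} \;+\; \lambda\, f_t\Bigr),
\qquad K_{x_t} \,:=\, K(x_t,\cdot)\in\mathcal{H}_K .
```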

4.
In this paper, we consider unregularized online learning algorithms in a Reproducing Kernel Hilbert Space (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general α-activating loss (see Definition 1 below). Our results extend and refine the results in [30] for the least square loss and the recent result [3] for the loss function with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Secondly, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate our main results. The main techniques are tools from convex analysis, refined inequalities of Gaussian averages [5], and an induction approach.
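A sketch of the unregularized online (kernel stochastic subgradient) scheme for classification. The abstract's α-activating loss class is defined in the paper itself, so the hinge loss and the step-size choice below are only familiar stand-ins:

```python
import numpy as np

def unregularized_online_hinge(stream, sigma=1.0, eta0=1.0, theta=0.5):
    """Unregularized online classification in an RKHS (kernel stochastic subgradient descent).

    Concrete loss: hinge phi(v) = max(0, 1 - v); step sizes eta_t = eta0 * t^{-theta}.
    Update: f_{t+1} = f_t - eta_t * phi'(y_t f_t(x_t)) * y_t * K(x_t, .)
    """
    pts, alphas = [], []          # f_t = sum_i alphas[i] * K(pts[i], .)

    def k(a, b):
        return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

    def f_val(x):
        return sum(c * k(x, p) for c, p in zip(alphas, pts))

    for t, (x_t, y_t) in enumerate(stream, start=1):
        eta = eta0 * t ** (-theta)
        if y_t * f_val(x_t) < 1.0:       # hinge subgradient is -y_t * K(x_t, .)
            pts.append(x_t)
            alphas.append(eta * y_t)     # f_{t+1} = f_t + eta * y_t * K(x_t, .)
        # otherwise the subgradient is 0 and the iterate is unchanged
    return pts, alphas
```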

5.
Learning Rates of Least-Square Regularized Regression
This paper considers the regularized learning algorithm associated with the least-square loss and reproducing kernel Hilbert spaces. The target is the error analysis for the regression problem in learning theory. A novel regularization approach is presented, which yields satisfactory learning rates. The rates depend on the approximation property and on the capacity of the reproducing kernel Hilbert space measured by covering numbers. When the kernel is C^∞ and the regression function lies in the corresponding reproducing kernel Hilbert space, the rate is m^{-ζ} with ζ arbitrarily close to 1, regardless of the variance of the bounded probability distribution.
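A minimal sketch of the least-square regularized scheme the abstract analyzes (kernel ridge regression), written via the representer theorem. The Gaussian kernel and the parameter values are placeholders:

```python
import numpy as np

def gaussian_gram(X, Xp, sigma=1.0):
    """Gram matrix of the Gaussian kernel between the rows of X and Xp."""
    d2 = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam=1e-2, sigma=1.0):
    """Least-square regularized regression in an RKHS:
        minimize (1/m) sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2.
    By the representer theorem f = sum_i alpha_i K(x_i, .) with
        alpha = (K + m * lam * I)^{-1} y."""
    m = len(y)
    K = gaussian_gram(X, X, sigma)
    alpha = np.linalg.solve(K + m * lam * np.eye(m), y)
    return lambda Xnew: gaussian_gram(Xnew, X, sigma) @ alpha
```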

6.
We continue our study on classification learning algorithms generated by Tikhonov regularization schemes associated with Gaussian kernels and general convex loss functions. The main purpose of this paper is to improve error bounds by presenting a new comparison theorem associated with general convex loss functions and Tsybakov noise conditions. Some concrete examples are provided to illustrate the improved learning rates, which demonstrate the effect of various loss functions on learning algorithms. In our ...

7.
A family of classification algorithms generated from Tikhonov regularization schemes are considered. They involve multi-kernel spaces and general convex loss functions. Our main purpose is to provide satisfactory estimates for the excess misclassification error of these multi-kernel regularized classifiers when the loss functions achieve the zero value. The error analysis consists of two parts: regularization error and sample error. Allowing multi-kernels in the algorithm improves the regularization error and approximation error, which is one advantage of the multi-kernel setting. For a general loss function, we show how to bound the regularization error by the approximation in some weighted L^q spaces. For the sample error, we use a projection operator. The projection in connection with the decay of the regularization error enables us to improve convergence rates in the literature even for the one-kernel schemes and special loss functions: least-square loss and hinge loss for support vector machine soft margin classifiers. Existence of the optimization problem for the regularization scheme associated with multi-kernels is verified when the kernel functions are continuous with respect to the index set. Concrete examples, including Gaussian kernels with flexible variances and probability distributions with some noise conditions, are used to illustrate the general theory.
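A minimal instance of a regularization scheme over a family of kernels, here a few Gaussian kernels with different variances: for each candidate kernel, solve the regularized problem and keep the pair with the smallest regularized empirical objective. The least-square loss is used only because it gives a closed form; the abstract treats general convex losses, including the hinge loss:

```python
import numpy as np

def multikernel_regularized_fit(X, y, lam=1e-2, sigmas=(0.25, 0.5, 1.0, 2.0)):
    """Tikhonov regularization over Gaussian kernels with flexible variances (sketch):
    minimize over sigma and over f in H_{K_sigma} the regularized empirical objective
        (1/m) sum_i (f(x_i) - y_i)^2 + lam * ||f||_{K_sigma}^2."""
    m = len(y)
    best = None
    for sigma in sigmas:
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * sigma ** 2))
        alpha = np.linalg.solve(K + m * lam * np.eye(m), y)      # representer coefficients
        obj = np.mean((K @ alpha - y) ** 2) + lam * alpha @ K @ alpha
        if best is None or obj < best[0]:
            best = (obj, sigma, alpha)
    return best   # (objective value, chosen variance, expansion coefficients)
```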

8.
On Early Stopping in Gradient Descent Learning
In this paper we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces (RKHSs), the family being characterized by a polynomial decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. We also discuss the implication of these results in the context of classification where some fast convergence rates can be achieved for plug-in classifiers. Some connections are addressed with Boosting, Landweber iterations, and the online learning algorithms as stochastic approximations of the gradient descent method.
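A compact sketch of gradient descent in an RKHS with a polynomially decaying learning rate. The theoretical bias-variance stopping rule of the abstract is replaced here by a hold-out surrogate, and all parameter values are illustrative:

```python
import numpy as np

def gd_early_stopping(K, y, K_val, y_val, eta0=1.0, zeta=0.25, max_iter=500):
    """Batch gradient descent for least squares in an RKHS with early stopping.

    K      : training Gram matrix (m x m);  K_val : cross Gram matrix between
             validation points and training points (n_val x m).
    The iterate f_t = sum_i alpha_i K(x_i, .) is updated by
        alpha_{t+1} = alpha_t - (eta_t / m) * (K alpha_t - y),
    with step sizes eta_t = eta0 * (t+1)^{-zeta}.
    Stopping rule: quit once the held-out error stops improving (a simple
    practical surrogate for the bias-variance rule in the paper).
    """
    m = len(y)
    alpha = np.zeros(m)
    best_alpha, best_err = alpha.copy(), np.inf
    for t in range(max_iter):
        eta = eta0 * (t + 1) ** (-zeta)
        alpha -= (eta / m) * (K @ alpha - y)
        val_err = np.mean((K_val @ alpha - y_val) ** 2)
        if val_err < best_err:
            best_err, best_alpha = val_err, alpha.copy()
        elif t > 10:                      # held-out error went up: stop early
            break
    return best_alpha
```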

9.
Learning gradients is one approach for variable selection and feature covariation estimation when dealing with large data of many variables or coordinates. In a classification setting involving a convex loss function, a possible algorithm for gradient learning is implemented by solving convex quadratic programming optimization problems induced by regularization schemes in reproducing kernel Hilbert spaces. The complexity for such an algorithm might be very high when the number of variables or samples is huge. We introduce a gradient descent algorithm for gradient learning in classification. The implementation of this algorithm is simple and its convergence is elegantly studied. Explicit learning rates are presented in terms of the regularization parameter and the step size. Deep analysis for approximation by reproducing kernel Hilbert spaces under some mild conditions on the probability measure for sampling allows us to deal with a general class of convex loss functions.

10.
A standard assumption in theoretical study of learning algorithms for regression is uniform boundedness of output sample values. This excludes the common case with Gaussian noise. In this paper we investigate the learning algorithm for regression generated by the least squares regularization scheme in reproducing kernel Hilbert spaces without the assumption of uniform boundedness for sampling. By imposing some incremental conditions on moments of the output variable, we derive learning rates in terms of regularity of the regression function and capacity of the hypothesis space. The novelty of our analysis is a new covering number argument for bounding the sample error.

11.
In this paper we study conditional quantile regression by learning algorithms generated from Tikhonov regularization schemes associated with pinball loss and varying Gaussian kernels. Our main goal is to provide convergence rates for the algorithm and illustrate differences between the conditional quantile regression and the least square regression. Applying varying Gaussian kernels improves the approximation ability of the algorithm. Bounds for the sample error are achieved by using a projection operator, a variance-expectation bound derived from a condition on conditional distributions and a tight bound for the covering numbers involving the Gaussian kernels.
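A subgradient-descent sketch of the Tikhonov-regularized quantile regression scheme with the pinball loss and a single, fixed Gaussian kernel. The abstract works with varying variances, so the fixed sigma, learning rate, and iteration count here are simplifications:

```python
import numpy as np

def pinball(r, tau):
    """Pinball (quantile) loss rho_tau(r) = tau*r if r >= 0 else (tau-1)*r."""
    return np.where(r >= 0, tau * r, (tau - 1) * r)

def kernel_quantile_regression(X, y, tau=0.5, lam=1e-3, sigma=1.0, lr=0.1, n_iter=2000):
    """Regularized conditional quantile regression (sketch):
        min_f (1/m) sum_i rho_tau(y_i - f(x_i)) + lam * ||f||_K^2,
    with f = sum_i alpha_i K(x_i, .) and a Gaussian kernel K."""
    m = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    alpha = np.zeros(m)
    for _ in range(n_iter):
        r = y - K @ alpha
        s = np.where(r > 0, tau, tau - 1.0)              # subgradient of rho_tau in r
        grad = -(K @ s) / m + 2 * lam * (K @ alpha)      # gradient in the coefficients
        alpha -= lr * grad
    def predict(Xnew):
        dn = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-dn / (2 * sigma ** 2)) @ alpha
    return predict
```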

12.
Learning function relations or understanding structures of data lying in manifolds embedded in huge dimensional Euclidean spaces is an important topic in learning theory. In this paper we study the approximation and learning by Gaussians of functions defined on a d-dimensional connected compact C^∞ Riemannian submanifold of ℝ^n which is isometrically embedded. We show that the convolution with the Gaussian kernel with variance σ provides the uniform approximation order of O(σ^s) when the approximated function is Lipschitz of order s ∈ (0, 1]. The uniform normal neighborhoods of a compact Riemannian manifold play a central role in deriving the approximation order. This approximation result is used to investigate the regression learning algorithm generated by the multi-kernel least square regularization scheme associated with Gaussian kernels with flexible variances. When the regression function is Lipschitz of order s, our learning rate is ((log² m)/m)^{s/(8s+4d)} where m is the sample size. When the manifold dimension d is smaller than the dimension n of the underlying Euclidean space, this rate is much faster compared with those in the literature. By comparing approximation orders, we also show the essential difference between approximation schemes with flexible variances and those with a single variance. Supported partially by the Research Grants Council of Hong Kong [Project No. CityU 103405], City University of Hong Kong [Project No. 7001983], National Science Fund for Distinguished Young Scholars of China [Project No. 10529101], and National Basic Research Program of China [Project No. 973-2006CB303102].

13.
We introduce a class of analytic positive definite multivariate kernels which includes infinite dot product kernels as sometimes used in machine learning, certain new nonlinearly factorizable kernels, and a kernel which is closely related to the Gaussian. Each such kernel reproduces in a certain “native” Hilbert space of multivariate analytic functions. If functions from this space are interpolated in scattered locations by translates of the kernel, we prove spectral convergence rates of the interpolants and all derivatives. By truncation of the power series of the kernel-based interpolants, we constructively generalize the classical Bernstein theorem concerning polynomial approximation of analytic functions to the multivariate case. An application to machine learning algorithms is presented.

14.
The regularization parameter choice is a fundamental problem in Learning Theory since the performance of most supervised algorithms crucially depends on the choice of one or more such parameters. In particular, a main theoretical issue regards the amount of prior knowledge needed to choose the regularization parameter in order to obtain good learning rates. In this paper we present a parameter choice strategy, called the balancing principle, to choose the regularization parameter without knowledge of the regularity of the target function. Such a choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in reproducing kernel Hilbert space with the square loss, though we also study how a similar principle can be used in other situations. As a straightforward corollary we can immediately derive adaptive parameter choices for various kernel methods recently studied. Numerical experiments with the proposed parameter choice rules are also presented.
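A schematic Lepskii-type implementation of the balancing idea. The constant C, the discrepancy norm, and the assumed shape of the sample-error bound are all placeholders rather than the paper's precise rule:

```python
import numpy as np

def balancing_principle(estimators, lam_grid, noise_bound, C=4.0):
    """Choose the regularization parameter without knowing the target's regularity (sketch).

    estimators[i]   : predictions of f_{lam_i} on a fixed evaluation sample,
    lam_grid        : decreasing grid lam_0 > lam_1 > ... ,
    noise_bound(l)  : an assumed bound on the sample error at parameter l,
                      e.g. lambda l: kappa / (np.sqrt(m) * l).
    Rule: pick the largest lam_i whose estimator stays within C * noise_bound(lam_j)
    of every estimator computed with a smaller parameter lam_j.
    """
    n = len(lam_grid)
    chosen = n - 1
    for i in range(n):
        ok = all(
            np.linalg.norm(estimators[i] - estimators[j]) <= C * noise_bound(lam_grid[j])
            for j in range(i + 1, n)
        )
        if ok:
            chosen = i
            break
    return lam_grid[chosen], estimators[chosen]
```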

15.
《Applied Mathematical Modelling》2014,38(11-12):2800-2818
Electrical discharge machining (EDM) is inherently a stochastic process. Predicting the output of such a process with reasonable accuracy is rather difficult. Modern learning-based methodologies, being capable of capturing the underlying, unseen effects of control factors on responses, appear to be effective in this regard. In the present work, the support vector machine (SVM), one of the supervised learning methods, is applied to develop a model of the EDM process. A Gaussian radial basis function and the ε-insensitive loss function are used as the kernel function and loss function, respectively. Separate models of material removal rate (MRR) and average surface roughness parameter (Ra) are developed by minimizing the mean absolute percentage error (MAPE) of training data obtained for different sets of SVM parameter combinations. Particle swarm optimization (PSO) is employed for the purpose of optimizing the SVM parameter combinations. The models thus developed are then tested with disjoint testing data sets. Optimum parameter settings for maximum MRR and minimum Ra are further investigated by applying PSO to the developed models.
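A condensed sketch of the modelling pipeline described above: an RBF-kernel, ε-insensitive SVR whose (C, gamma, epsilon) are tuned by a small particle swarm against training MAPE. The search ranges, swarm constants, and the use of scikit-learn's SVR are assumptions for illustration, not the paper's actual settings:

```python
import numpy as np
from sklearn.svm import SVR

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes strictly positive targets (e.g. MRR, Ra)."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def pso_tune_svr(X_train, y_train, n_particles=10, n_iter=30, seed=0):
    """Tune (C, gamma, epsilon) of an RBF-kernel SVR with a minimal particle swarm,
    minimizing training MAPE, as a stand-in for the MRR / Ra models in the abstract."""
    rng = np.random.default_rng(seed)
    lo = np.array([1e-1, 1e-3, 1e-3])        # illustrative bounds for C, gamma, epsilon
    hi = np.array([1e3, 1e1, 1e0])
    pos = rng.uniform(lo, hi, size=(n_particles, 3))
    vel = np.zeros_like(pos)

    def cost(p):
        model = SVR(kernel="rbf", C=p[0], gamma=p[1], epsilon=p[2])
        model.fit(X_train, y_train)
        return mape(y_train, model.predict(X_train))

    pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_cost)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 3))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], costs[improved]
        gbest = pbest[np.argmin(pbest_cost)].copy()
    return SVR(kernel="rbf", C=gbest[0], gamma=gbest[1], epsilon=gbest[2]).fit(X_train, y_train)
```

In the paper the tuned models are further evaluated on disjoint test sets and then themselves optimized by PSO; the sketch stops at the parameter-tuning step.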

16.
In many application domains, including image recognition, it is natural to represent a single sample as a set of vectors. With a suitable kernel function these vectors can be mapped into a higher-dimensional Hilbert space, where Kernel PCA is used to find a family of Gaussian distributions for the samples; a kernel on the samples can then be defined as the Bhattacharyya affinity between the Gaussian density functions they follow. The resulting kernel has good properties, such as stability under various transformations, which also shows that, even though other ways of representing samples exist, representing a single sample as a set of vectors is reasonable.
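A sketch of the kernel between vector-set samples. The abstract first maps the vectors into a feature space with Kernel PCA before fitting Gaussians; for brevity this sketch fits the Gaussians directly in the input space and the covariance regularizer is an added assumption:

```python
import numpy as np

def bhattacharyya_affinity(mu1, S1, mu2, S2):
    """Bhattacharyya affinity  K(p, q) = ∫ sqrt(p(x) q(x)) dx
    between Gaussian densities p = N(mu1, S1) and q = N(mu2, S2)."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    dist = (diff @ np.linalg.solve(S, diff)) / 8.0 \
        + 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return np.exp(-dist)

def set_kernel(A, B, reg=1e-3):
    """Kernel between two samples given as sets of vectors (rows of A and B):
    fit a Gaussian to each set and return the Bhattacharyya affinity of the fits."""
    def fit(X):
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])   # regularized covariance
        return mu, S
    return bhattacharyya_affinity(*fit(A), *fit(B))
```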

17.
The regularity of functions from reproducing kernel Hilbert spaces (RKHSs) is studied in the setting of learning theory. We provide a reproducing property for partial derivatives up to order s when the Mercer kernel is C^{2s}. For such a kernel on a general domain we show that the RKHS can be embedded into the function space C^s. These observations yield a representer theorem for regularized learning algorithms involving data for function values and gradients. Examples of Hermite learning and semi-supervised learning penalized by gradients on data are considered.

18.
This paper discusses learning rates for a broad class of regularized regression algorithms in reproducing kernel Hilbert spaces. In analyzing the sample error of the algorithms, we employ a reweighted empirical process, which guarantees that the variance and the penalty functional are controlled by a threshold simultaneously, thereby avoiding a tedious iteration procedure. We obtain faster learning rates than previous results in the literature.

19.
In regularized kernel methods, the solution of a learning problem is found by minimizing a functional consisting of an empirical risk term and a regularization term. In this paper, we study the existence of optimal solutions of multi-kernel regularization learning. First, we improve a previous result on this problem due to Micchelli and Pontil, and prove that an optimal solution exists whenever the kernel set is compact. Second, we consider this problem for Gaussian kernels with variance σ ∈ (0, ∞), and give some conditions under which an optimal solution exists.

20.
This paper considers online classification learning algorithms for regularized classification schemes with generalized gradient. A novel capacity independent approach is presented. It verifies the strong convergence of sizes and yields satisfactory convergence rates for polynomially decaying step sizes. Compared with the gradient schemes, this algorithm needs only less additional assumptions on the loss function and derives a stronger result with respect to the choice of step sizes and the regularization pa...
