1.
This paper studies errors introduced into the algorithm by the variance of the Gaussian kernel, and analyzes these errors using some special properties of the reproducing kernel. These errors play an important role in analyzing the convergence rate of the algorithm.
2.
Recently, there has been considerable work on analyzing learning algorithms with pairwise loss functions in the batch setting. There is relatively little theoretical work on analyzing their online algorithms, despite their popularity in practice due to their scalability to big data. In this paper, we consider online learning algorithms with pairwise loss functions based on regularization schemes in reproducing kernel Hilbert spaces. In particular, we establish the convergence of the last iterate of the online algorithm under a very weak assumption on the step sizes and derive satisfactory convergence rates for polynomially decaying step sizes. Our technique uses Rademacher complexities which handle function classes associated with pairwise loss functions. Since pairwise learning involves pairs of examples, which are no longer i.i.d., standard techniques do not directly apply to such pairwise learning algorithms. Hence, our results are a non-trivial extension of those in the setting of univariate loss functions to the pairwise setting.
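To make the setup concrete, here is a minimal sketch of a regularized online pairwise update in an RKHS, assuming a Gaussian kernel, the pairwise hinge loss, pairing each new example with the previous one, and step sizes η_t = t^(−θ); all function names and parameter choices are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """Gaussian (RBF) kernel on R^d."""
    return np.exp(-np.linalg.norm(x - xp) ** 2 / (2 * sigma ** 2))

def online_pairwise(stream, lam=0.01, theta=0.75, sigma=1.0):
    """Regularized online pairwise learning in an RKHS.

    Each new example (x_t, y_t) is paired with the previous one; a
    subgradient step on the pairwise hinge loss
        max(0, 1 - sign(y_t - y_s) * (f(x_t) - f(x_s)))
    is taken with polynomially decaying step size eta_t = t^(-theta),
    and the regularization term lam * ||f||_K^2 shrinks f each round.
    The iterate f is stored as kernel-expansion coefficients.
    """
    xs, coefs = [], []

    def f(x):
        return sum(c * gaussian_kernel(x, xi, sigma) for c, xi in zip(coefs, xs))

    prev = None
    for t, (x, y) in enumerate(stream, start=1):
        eta = t ** (-theta)
        coefs = [(1 - eta * lam) * c for c in coefs]  # regularization shrinkage
        if prev is not None:
            xp, yp = prev
            s = np.sign(y - yp)
            if s != 0 and s * (f(x) - f(xp)) < 1:     # pairwise margin violated
                coefs += [eta * s, -eta * s]
                xs += [x, xp]
        prev = (x, y)
    return f
```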
3.
Yiming Ying, Advances in Computational Mathematics 2007, 27(3):273-291
In this paper, we are interested in the analysis of regularized online algorithms associated with reproducing kernel Hilbert spaces. General conditions on the loss function and step sizes are given to ensure convergence. Explicit learning rates are also given for particular step sizes.
★The author's current address: Department of Computer Sciences, University College London, Gower Street, London WC1E, England, UK.
4.
In this paper, we consider unregularized online learning algorithms in a Reproducing Kernel Hilbert Space (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general α-activating loss (see Definition 1 below). Our results extend and refine the results in [30] for the least square loss and the recent result [3] for the loss function with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Secondly, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate our main results. The main techniques are tools from convex analysis, refined inequalities of Gaussian averages [5], and an induction approach.
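As an illustration of the unregularized pointwise case, here is a minimal sketch of the online kernel classification update with the hinge loss, taken as a stand-in for a general convex loss, and step sizes η_t = t^(−θ); the concrete kernel and parameter choices are assumptions for the sketch.

```python
import numpy as np

def unregularized_online(stream, theta=0.5, sigma=1.0):
    """Unregularized online classification in an RKHS:
        f_{t+1} = f_t - eta_t * loss'(y_t * f_t(x_t)) * y_t * K(x_t, .)
    specialized to the hinge loss, whose subgradient at a margin
    violation contributes +eta_t * y_t * K(x_t, .)."""
    xs, coefs = [], []

    def k(a, b):
        return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

    def f(x):
        return sum(c * k(x, xi) for c, xi in zip(coefs, xs))

    for t, (x, y) in enumerate(stream, start=1):
        if y * f(x) < 1:              # hinge loss is active
            xs.append(x)
            coefs.append(t ** (-theta) * y)
    return f
```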
5.
Learning Rates of Least-Square Regularized Regression
This paper considers the regularized learning algorithm associated with the least-square loss and reproducing kernel Hilbert spaces. The target is the error analysis for the regression problem in learning theory. A novel regularization approach is presented, which yields satisfactory learning rates. The rates depend on the approximation property and on the capacity of the reproducing kernel Hilbert space measured by covering numbers. When the kernel is C^∞ and the regression function lies in the corresponding reproducing kernel Hilbert space, the rate is m^{-ζ} with ζ arbitrarily close to 1, regardless of the variance of the bounded probability distribution.
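The regularized least-square algorithm this entry analyzes has a well-known finite-sample form via the representer theorem; a minimal sketch with a Gaussian kernel (the kernel choice and parameters are illustrative):

```python
import numpy as np

def regularized_least_squares(X, y, lam=0.1, sigma=1.0):
    """Least-square regularization scheme in an RKHS:
        min_f (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2.
    By the representer theorem f = sum_i c_i K(x_i, .), and the
    coefficients solve the linear system (K + m * lam * I) c = y."""
    m = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    c = np.linalg.solve(K + m * lam * np.eye(m), y)

    def f(x):
        kx = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * sigma ** 2))
        return kx @ c
    return f
```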
6.
XIANG DaoHong, Science China Mathematics 2011, (1)
We continue our study of classification learning algorithms generated by Tikhonov regularization schemes associated with Gaussian kernels and general convex loss functions. The main purpose of this paper is to improve error bounds by presenting a new comparison theorem associated with general convex loss functions and Tsybakov noise conditions. Some concrete examples are provided to illustrate the improved learning rates, which demonstrate the effect of various loss functions for learning algorithms.
7.
A family of classification algorithms generated from Tikhonov regularization schemes is considered. They involve multi-kernel spaces and general convex loss functions. Our main purpose is to provide satisfactory estimates for the excess misclassification error of these multi-kernel regularized classifiers when the loss function achieves the value zero. The error analysis consists of two parts: regularization error and sample error. Allowing multi-kernels in the algorithm improves both the regularization error and the approximation error, which is one advantage of the multi-kernel setting. For a general loss function, we show how to bound the regularization error by approximation in some weighted L^q spaces. For the sample error, we use a projection operator. The projection, in connection with the decay of the regularization error, enables us to improve convergence rates in the literature even for one-kernel schemes and special loss functions: the least-square loss and the hinge loss for support vector machine soft margin classifiers. Existence of a solution to the optimization problem for the regularization scheme associated with multi-kernels is verified when the kernel functions are continuous with respect to the index set. Concrete examples, including Gaussian kernels with flexible variances and probability distributions with some noise conditions, are used to illustrate the general theory.
8.
On Early Stopping in Gradient Descent Learning
In this paper we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces (RKHSs), the family being characterized by a polynomially decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. We also discuss the implication of these results in the context of classification, where some fast convergence rates can be achieved for plug-in classifiers. Some connections are addressed with Boosting, Landweber iterations, and the online learning algorithms as stochastic approximations of the gradient descent method.
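In coefficient form the gradient descent family this entry studies is easy to state; a minimal sketch with the least-square loss, where the stopping time T plays the role of the regularization parameter (the paper derives a data-dependent rule for T from the bias-variance trade-off; here T is simply an input):

```python
import numpy as np

def gradient_descent_early_stopping(X, y, T, theta=0.5, sigma=1.0):
    """Gradient descent for least-square regression in an RKHS with
    polynomially decaying step sizes eta_t = t^(-theta), stopped after
    T iterations.  Writing f_t = sum_i c_i K(x_i, .), the RKHS update
        f_{t+1} = f_t - (eta_t / m) * sum_i (f_t(x_i) - y_i) * K(x_i, .)
    becomes the coefficient update  c <- c - (eta_t / m) * (K c - y)."""
    m = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    c = np.zeros(m)
    for t in range(1, T + 1):
        c -= (t ** (-theta) / m) * (K @ c - y)
    return c  # kernel-expansion coefficients of the stopped iterate f_T
```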
9.
Learning gradients is one approach for variable selection and feature covariation estimation when dealing with large data of many variables or coordinates. In a classification setting involving a convex loss function, a possible algorithm for gradient learning is implemented by solving convex quadratic programming optimization problems induced by regularization schemes in reproducing kernel Hilbert spaces. The complexity of such an algorithm might be very high when the number of variables or samples is huge. We introduce a gradient descent algorithm for gradient learning in classification. The implementation of this algorithm is simple and its convergence is studied. Explicit learning rates are presented in terms of the regularization parameter and the step size. A deep analysis of approximation by reproducing kernel Hilbert spaces under some mild conditions on the probability measure for sampling allows us to deal with a general class of convex loss functions.
10.
Cheng Wang, Journal of Complexity 2011, 27(1):55-67
A standard assumption in the theoretical study of learning algorithms for regression is uniform boundedness of the output sample values. This excludes the common case of Gaussian noise. In this paper we investigate the learning algorithm for regression generated by the least squares regularization scheme in reproducing kernel Hilbert spaces, without the assumption of uniform boundedness for sampling. By imposing some conditions on the moments of the output variable, we derive learning rates in terms of the regularity of the regression function and the capacity of the hypothesis space. The novelty of our analysis is a new covering number argument for bounding the sample error.
11.
Dao-Hong Xiang, Advances in Computational Mathematics 2013, 38(4):723-735
In this paper we study conditional quantile regression by learning algorithms generated from Tikhonov regularization schemes associated with the pinball loss and varying Gaussian kernels. Our main goal is to provide convergence rates for the algorithm and to illustrate differences between conditional quantile regression and least squares regression. Applying varying Gaussian kernels improves the approximation ability of the algorithm. Bounds for the sample error are achieved by using a projection operator, a variance-expectation bound derived from a condition on conditional distributions, and a tight bound for the covering numbers involving the Gaussian kernels.
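The pinball loss underlying this algorithm is worth stating explicitly, since it is what targets the τ-th conditional quantile rather than the conditional mean; a short definition:

```python
import numpy as np

def pinball_loss(y, fx, tau):
    """Pinball loss for the tau-th quantile, with u = y - f(x):
        rho_tau(u) = tau * u        if u >= 0
                   = (tau - 1) * u  if u <  0.
    For tau = 0.5 this is half the absolute loss (median regression)."""
    u = y - fx
    return np.where(u >= 0, tau * u, (tau - 1) * u)
```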
12.
Learning function relations or understanding structures of data lying in manifolds embedded in huge dimensional Euclidean spaces is an important topic in learning theory. In this paper we study the approximation and learning by Gaussians of functions defined on a d-dimensional connected compact C^∞ Riemannian submanifold of ℝ^n which is isometrically embedded. We show that the convolution with the Gaussian kernel with variance σ provides the uniform approximation order of O(σ^s) when the approximated function is Lipschitz s, with s ∈ (0, 1]. The uniform normal neighborhoods of a compact Riemannian manifold play a central role in deriving the approximation order. This approximation result is used to investigate the regression learning algorithm generated by the multi-kernel least square regularization scheme associated with Gaussian kernels with flexible variances. When the regression function is Lipschitz s, our learning rate is ((log² m)/m)^{s/(8s+4d)}, where m is the sample size. When the manifold dimension d is smaller than the dimension n of the underlying Euclidean space, this rate is much faster compared with those in the literature. By comparing approximation orders, we also show the essential difference between approximation schemes with flexible variances and those with a single variance.
Supported partially by the Research Grants Council of Hong Kong [Project No. CityU 103405], City University of Hong Kong [Project No. 7001983], National Science Fund for Distinguished Young Scholars of China [Project No. 10529101], and National Basic Research Program of China [Project No. 973-2006CB303102].
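In display form, the two quantitative statements of this entry read as follows (with G_σ * f denoting convolution with the Gaussian kernel of variance σ and m the sample size):

```latex
\|G_\sigma * f - f\|_\infty = O(\sigma^s), \qquad
\text{learning rate} = \left(\frac{\log^2 m}{m}\right)^{s/(8s+4d)} .
```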
13.
Barbara Zwicknagl, Constructive Approximation 2009, 29(1):61-84
We introduce a class of analytic positive definite multivariate kernels which includes infinite dot product kernels as sometimes used in machine learning, certain new nonlinearly factorizable kernels, and a kernel which is closely related to the Gaussian. Each such kernel reproduces in a certain “native” Hilbert space of multivariate analytic functions. If functions from this space are interpolated in scattered locations by translates of the kernel, we prove spectral convergence rates of the interpolants and all derivatives. By truncation of the power series of the kernel-based interpolants, we constructively generalize the classical Bernstein theorem concerning polynomial approximation of analytic functions to the multivariate case. An application to machine learning algorithms is presented.
14.
The regularization parameter choice is a fundamental problem in learning theory, since the performance of most supervised algorithms crucially depends on the choice of one or more such parameters. In particular, a main theoretical issue concerns the amount of prior knowledge needed to choose the regularization parameter in order to obtain good learning rates. In this paper we present a parameter choice strategy, called the balancing principle, to choose the regularization parameter without knowledge of the regularity of the target function. Such a choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in reproducing kernel Hilbert spaces with the square loss, though we also study how a similar principle can be used in other situations. As a straightforward corollary we can immediately derive adaptive parameter choices for various kernel methods recently studied. Numerical experiments with the proposed parameter choice rules are also presented.
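A schematic sketch of a balancing (Lepskii-type) parameter choice, under the assumptions that estimators can be computed on a grid of λ values and that a computable surrogate sample_err(λ) for the sample-error bound is available; the factor 4 and all names are illustrative, not the paper's exact rule.

```python
import numpy as np

def balancing_principle(f_hat, sample_err, lams):
    """Balancing-principle-style parameter choice, sketched.

    f_hat(lam)      -> estimator for parameter lam, as a coefficient vector
    sample_err(lam) -> computable bound on the sample error, decreasing in lam
    Chooses the largest lam whose estimator stays within 4 * sample_err(mu)
    of every estimator with smaller parameter mu; near this point the
    (unknown) approximation error and the sample error are balanced.
    """
    lams = sorted(lams)                      # increasing order
    fs = [f_hat(lam) for lam in lams]
    chosen = lams[0]
    for i in range(len(lams)):
        if all(np.linalg.norm(fs[i] - fs[j]) <= 4 * sample_err(lams[j])
               for j in range(i)):
            chosen = lams[i]
    return chosen
```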
15.
Applied Mathematical Modelling 2014, 38(11-12):2800-2818
Electrical discharge machining (EDM) is inherently a stochastic process, and predicting the output of such a process with reasonable accuracy is rather difficult. Modern learning-based methodologies, capable of capturing the underlying, unseen effects of control factors on responses, appear effective in this regard. In the present work, support vector machine (SVM), one of the supervised learning methods, is applied to develop a model of the EDM process. A Gaussian radial basis function and the ε-insensitive loss function are used as the kernel function and loss function, respectively. Separate models of material removal rate (MRR) and average surface roughness parameter (Ra) are developed by minimizing the mean absolute percentage error (MAPE) of training data obtained for different sets of SVM parameter combinations. Particle swarm optimization (PSO) is employed to optimize the SVM parameter combinations. The models thus developed are then tested with disjoint testing data sets. Optimum parameter settings for maximum MRR and minimum Ra are further investigated by applying PSO to the developed models.
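A minimal sketch of the modeling step using scikit-learn's SVR with a Gaussian RBF kernel and ε-insensitive loss, scored by MAPE as in the entry; a plain random search over (C, γ, ε) stands in here for the particle swarm optimization actually used, and the search ranges are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def mape(y_true, y_pred):
    """Mean absolute percentage error used to score a parameter set."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def fit_svr_random_search(X, y, n_trials=50, seed=0):
    """SVR with a Gaussian RBF kernel and epsilon-insensitive loss;
    random search over (C, gamma, epsilon) minimizing training MAPE."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_trials):
        params = dict(C=10 ** rng.uniform(-1, 3),
                      gamma=10 ** rng.uniform(-3, 1),
                      epsilon=10 ** rng.uniform(-3, 0))
        model = SVR(kernel="rbf", **params).fit(X, y)
        err = mape(y, model.predict(X))
        if err < best_err:
            best, best_err = model, err
    return best, best_err
```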
16.
In many application areas, including image recognition, it is natural to represent a single sample as a set of vectors. With a suitable kernel function these vectors can be mapped into a higher-dimensional Hilbert space, where Kernel PCA is used to fit a family of Gaussian distributions to the samples; a kernel between samples can then be defined as the Bhattacharyya affinity between the Gaussian densities they follow. The resulting kernel has desirable properties, such as stability under various transformations, which also justifies representing a single sample as a set of vectors even when other representations are available.
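The kernel described here is computable because the Bhattacharyya affinity between two Gaussian densities has a closed form; a sketch of that affinity (the means and covariances would come from the Kernel PCA fit the entry describes):

```python
import numpy as np

def bhattacharyya_affinity(mu1, cov1, mu2, cov2):
    """Bhattacharyya affinity between two Gaussian densities,
    integral of sqrt(p(x) * q(x)) dx, via the closed form exp(-D_B):
        D_B = (1/8) * (mu1-mu2)^T S^{-1} (mu1-mu2)
            + (1/2) * ln( det S / sqrt(det cov1 * det cov2) ),
    where S = (cov1 + cov2) / 2."""
    S = (cov1 + cov2) / 2
    d = mu1 - mu2
    db = (d @ np.linalg.solve(S, d)) / 8 + 0.5 * np.log(
        np.linalg.det(S) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return np.exp(-db)
```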
17.
The regularity of functions from reproducing kernel Hilbert spaces (RKHSs) is studied in the setting of learning theory. We provide a reproducing property for partial derivatives up to order s when the Mercer kernel is C^{2s}. For such a kernel on a general domain we show that the RKHS can be embedded into the function space C^s. These observations yield a representer theorem for regularized learning algorithms involving data for function values and gradients. Examples of Hermite learning and semi-supervised learning penalized by gradients on data are considered.
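In display form, the derivative reproducing property stated in this entry reads, for a C^{2s} Mercer kernel K, a multi-index α with |α| ≤ s, and K_x = K(x, ·):

```latex
(\partial^{\alpha} f)(x) = \big\langle f,\; \partial^{\alpha}_{x} K_{x} \big\rangle_{K},
\qquad f \in \mathcal{H}_{K}.
```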
18.
19.
In regularized kernel methods, the solution of a learning problem is found by minimizing a functional consisting of an empirical risk and a regularization term. In this paper, we study the existence of an optimal solution of multi-kernel regularization learning. First, we improve a previous result of Micchelli and Pontil on this problem, and prove that the optimal solution exists whenever the kernel set is a compact set. Second, we consider this problem for Gaussian kernels with variance σ ∈ (0, ∞), and give some conditions under which the optimal solution exists.
20.
This paper considers online classification learning algorithms for regularized classification schemes with generalized gradient. A novel capacity-independent approach is presented. It verifies the strong convergence of the algorithm and yields satisfactory convergence rates for polynomially decaying step sizes. Compared with gradient schemes, this algorithm needs fewer additional assumptions on the loss function and derives a stronger result with respect to the choice of step sizes and the regularization parameter.