Similar Documents
 20 similar documents found (search time: 31 ms)
1.
The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the feature selectors' reliability. It aims to provide a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by ensemble members, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves both classification performance and feature selection stability for high-dimensional data sets.
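The weighting idea in this abstract can be illustrated with a small sketch: each selector casts Borda-style votes for its ranked features, scaled by the validation accuracy of a classifier trained on its subset. The weighting scheme and the toy feature names are illustrative assumptions, not the paper's exact procedure.

```python
def ensemble_select(rankings, accuracies, k):
    """Borda-style rank aggregation: each selector's votes are scaled by
    the validation accuracy of a classifier trained on its subset
    (hypothetical weighting scheme, for illustration only)."""
    scores = {}
    for rank, acc in zip(rankings, accuracies):
        for pos, feat in enumerate(rank):
            # earlier positions receive larger scores
            scores[feat] = scores.get(feat, 0.0) + acc * (len(rank) - pos)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Three selectors ranked four toy features; accuracies weight their votes.
rankings = [["f1", "f3", "f2", "f4"], ["f3", "f1", "f4", "f2"], ["f1", "f2", "f3", "f4"]]
accuracies = [0.9, 0.8, 0.7]
print(ensemble_select(rankings, accuracies, 2))  # → ['f1', 'f3']
```

Selectors that perform well on validation data thus pull the aggregate ranking toward their subsets, which is the intuition behind confidence-weighted ensembles.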

2.
Nonlinear dimensionality reduction (NLDR) algorithms such as Isomap, LLE, and Laplacian Eigenmaps address the problem of representing high-dimensional nonlinear data in terms of low-dimensional coordinates which represent the intrinsic structure of the data. This paradigm incorporates the assumption that real-valued coordinates provide a rich enough class of functions to represent the data faithfully and efficiently. On the other hand, there are simple structures which challenge this assumption: the circle, for example, is one-dimensional, but its faithful representation requires two real coordinates. In this work, we present a strategy for constructing circle-valued functions on a statistical data set. We develop a machinery of persistent cohomology to identify candidates for significant circle-structures in the data, and we use harmonic smoothing and integration to obtain the circle-valued coordinate functions themselves. We suggest that this enriched class of coordinate functions permits a precise NLDR analysis of a broader range of realistic data sets.

3.
Finding the set of nearest neighbors for a query point of interest appears in a variety of algorithms for machine learning and pattern recognition. Examples include k nearest neighbor classification, information retrieval, case-based reasoning, manifold learning, and nonlinear dimensionality reduction. In this work, we propose a new approach for determining a distance metric from the data for finding such neighboring points. For a query point of interest, our approach learns a generalized quadratic distance (GQD) metric based on the statistical properties in a “small” neighborhood for the point of interest. The locally learned GQD metric captures information such as the density, curvature, and the intrinsic dimensionality for the points falling in this particular neighborhood. Unfortunately, learning the GQD parameters under such a local learning mechanism is a challenging problem with a high computational overhead. To address these challenges, we estimate the GQD parameters using the minimum volume covering ellipsoid (MVCE) for a set of points. The advantage of the MVCE is twofold. First, the MVCE together with the local learning approach approximates the functionality of a well-known robust estimator for covariance matrices. Second, computing the MVCE is a convex optimization problem which, in addition to having a unique global solution, can be efficiently solved using a first order optimization algorithm. We validate our metric learning approach on a large variety of datasets and show that the proposed metric has promising results when compared with five algorithms from the literature for supervised metric learning.
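As a minimal illustration of a generalized quadratic distance, the sketch below evaluates d(x, y) = sqrt((x-y)^T A (x-y)) for a fixed 2x2 positive-definite A; in the paper, A would be estimated locally from the MVCE of the query point's neighborhood, which is not reproduced here.

```python
def local_quadratic_distance(x, y, A):
    """GQD-style distance d(x, y) = sqrt((x - y)^T A (x - y)) for a 2x2
    positive-definite A; in the paper A comes from the MVCE of a local
    neighborhood, here it is a fixed hypothetical matrix."""
    dx = (x[0] - y[0], x[1] - y[1])
    q = (A[0][0] * dx[0] * dx[0]
         + (A[0][1] + A[1][0]) * dx[0] * dx[1]
         + A[1][1] * dx[1] * dx[1])
    return q ** 0.5

# With A = I the GQD reduces to the ordinary Euclidean distance.
print(local_quadratic_distance((0.0, 0.0), (3.0, 4.0), [[1.0, 0.0], [0.0, 1.0]]))  # → 5.0
```

A non-identity A stretches or shrinks directions, which is how a locally learned metric can reflect density and curvature around the query point.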

4.
5.
We consider weakly coupled map lattices with a decaying interaction. That is, we consider systems which consist of a phase space at every site such that the dynamics at a site is little affected by the dynamics at far away sites. We develop a functional analysis framework which formulates quantitatively the decay of the interaction and is able to deal with lattices such that the sites are manifolds. This framework is very well suited to study invariant objects systematically. One obtains that the invariant objects are essentially local. We use this framework to prove a stable manifold theorem and show that the manifolds are as smooth as the maps and have decay properties (i.e. the derivatives of one of the coordinates of the manifold with respect to the coordinates at far away sites are small). Other applications of the framework are the study of the structural stability of maps with decay close to uncoupled maps possessing hyperbolic sets, and the decay properties of the invariant manifolds of their hyperbolic sets, in the companion paper by Fontich et al. (2011) [10].

6.
We introduce vector diffusion maps (VDM), a new mathematical framework for organizing and analyzing massive high-dimensional data sets, images, and shapes. VDM is a mathematical and algorithmic generalization of diffusion maps and other nonlinear dimensionality reduction methods, such as LLE, ISOMAP, and Laplacian eigenmaps. While existing methods are either directly or indirectly related to the heat kernel for functions over the data, VDM is based on the heat kernel for vector fields. VDM provides tools for organizing complex data sets, embedding them in a low-dimensional space, and interpolating and regressing vector fields over the data. In particular, it equips the data with a metric, which we refer to as the vector diffusion distance. In the manifold learning setup, where the data set is distributed on a low-dimensional manifold $\mathcal{M}^d$ embedded in $\mathbb{R}^p$, we prove the relation between VDM and the connection Laplacian operator for vector fields over the manifold. © 2012 Wiley Periodicals, Inc.

7.
This paper presents an effective and efficient kernel approach to recognizing image sets, each represented as a point on an extended Grassmannian manifold. Several recent studies focus on the applicability of discriminant analysis on the Grassmannian manifold but fail to capture the inherent nonlinear structure of the data itself. We therefore propose an extension of the Grassmannian manifold to address this issue. Instead of a linear data embedding with PCA, we develop a nonlinear data embedding of such a manifold using kernel PCA. The main contributions are threefold: 1) a nonlinear data embedding of the extended Grassmannian manifold, 2) a distance metric on the Grassmannian manifold, and 3) an effective and efficient Grassmannian kernel for SVM classification. The extended Grassmannian manifold arises naturally in image-set-based recognition applications such as face and object recognition. Experiments on several standard databases show better classification accuracy. Furthermore, experimental results indicate that our proposed approach significantly reduces time complexity in comparison to graph embedding discriminant analysis.

8.
A Gaussian kernel approximation algorithm for a feedforward neural network is presented. The algorithm builds on a constructive learning approach that creates hidden units directly, so that the network architecture can be designed automatically. The algorithm is defined using the linear summation of input patterns and their randomized input weights. Hidden-layer nodes are defined so as to partition the input space into homogeneous regions, where each region contains patterns belonging to the same class. The largest region is used to define the center of the corresponding Gaussian hidden node. The algorithm is tested on three benchmark data sets of different dimensionality and sample sizes to compare the approach presented here with other algorithms. Real medical diagnoses and a biological classification of mushrooms are used to illustrate the performance of the algorithm. These results confirm the effectiveness of the proposed algorithm.
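A Gaussian hidden node of the kind described responds according to distance from its region center; a minimal sketch of that activation follows (the center and width here are arbitrary choices for illustration, not values produced by the constructive algorithm):

```python
import math

def gaussian_activation(x, center, sigma):
    """Response of one Gaussian hidden node: decays with the squared
    distance from the node's region center (center/width arbitrary here)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

print(gaussian_activation([0.0, 0.0], [0.0, 0.0], 1.0))  # → 1.0 at the center
```

Patterns in the node's homogeneous region thus produce responses near 1, while distant patterns are effectively ignored.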

9.
A popular approach for analyzing high-dimensional datasets is to perform dimensionality reduction by applying non-parametric affinity kernels. Usually, it is assumed that the represented affinities are related to an underlying low-dimensional manifold from which the data is sampled. This approach works under the assumption that, due to the low dimensionality of the underlying manifold, the kernel has a low numerical rank. Essentially, this means that the kernel can be represented by a small set of numerically significant eigenvalues and their corresponding eigenvectors. We present an upper bound for the numerical rank of Gaussian convolution operators, which are commonly used as kernels by spectral manifold-learning methods. The achieved bound is based on the underlying geometry that is provided by the manifold from which the dataset is assumed to be sampled. The bound can be used to determine the number of significant eigenvalues/eigenvectors that are needed for spectral analysis purposes. Furthermore, the results in this paper provide a relation between the underlying geometry of the manifold (or dataset) and the numerical rank of its Gaussian affinities. The term cover-based bound is used because the computations of this bound are done by using a finite set of small constant-volume boxes that cover the underlying manifold (or the dataset). We present bounds for finite Gaussian kernel matrices as well as for the continuous Gaussian convolution operator. We explore and demonstrate the relations between the bounds that are achieved for the finite and continuous cases. The cover-oriented methodology is also used to relate the geodesic length of a curve to the numerical rank of the Gaussian kernel of datasets sampled from it.

10.
We present our recent work on both linear and nonlinear data reduction methods and algorithms: for the linear case we discuss results on structure analysis of the SVD of column-partitioned matrices and sparse low-rank approximation; for the nonlinear case we investigate methods for nonlinear dimensionality reduction and manifold learning. The problems we address have attracted a great deal of interest in data mining and machine learning.

11.

We consider Lagrangian coherent structures (LCSs) as the boundaries of material subsets whose advective evolution is metastable under weak diffusion. For their detection, we first transform the Eulerian advection–diffusion equation to Lagrangian coordinates, in which it takes the form of a time-dependent diffusion or heat equation. By this coordinate transformation, the reversible effects of advection are separated from the irreversible joint effects of advection and diffusion. In this framework, LCSs express themselves as (boundaries of) metastable sets under the Lagrangian diffusion process. In the case of spatially homogeneous isotropic diffusion, averaging the time-dependent family of Lagrangian diffusion operators yields Froyland’s dynamic Laplacian. In the associated geometric heat equation, the distribution of heat is governed by the dynamically induced intrinsic geometry on the material manifold, to which we refer as the geometry of mixing. We study and visualize this geometry in detail, and discuss connections between geometric features and LCSs viewed as diffusion barriers in two numerical examples. Our approach facilitates the discovery of connections between some prominent methods for coherent structure detection: the dynamic isoperimetry methodology, the variational geometric approaches to elliptic LCSs, a class of graph Laplacian-based methods and the effective diffusivity framework used in physical oceanography.


12.
This paper discusses the mathematical framework for designing methods of Large Deformation Diffeomorphic Matching (LDM) for image registration in computational anatomy. After reviewing the geometrical framework of LDM image registration methods, we prove a theorem showing that these methods may be designed by using the actions of diffeomorphisms on the image data structure to define their associated momentum representations as (cotangent-lift) momentum maps. To illustrate its use, the momentum map theorem is shown to recover the known algorithms for matching landmarks, scalar images, and vector fields. After briefly discussing the use of this approach for diffusion tensor (DT) images, we explain how to use momentum maps in the design of registration algorithms for more general data structures. For example, we extend our methods to determine the corresponding momentum map for registration using semidirect product groups, for the purpose of matching images at two different length scales. Finally, we discuss the use of momentum maps in the design of image registration algorithms when the image data is defined on manifolds instead of vector spaces.

13.
This paper deals with locally linear embedding (LLE), a nonlinear dimensionality reduction technique that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data and attempts to discover nonlinear structure (including manifolds) in high-dimensional data. In practice, nonlinear manifold learning methods are applied in image processing, text mining, etc. The implementation of the LLE algorithm is fairly straightforward, because the algorithm has only two control parameters: the number of neighbors of each data point and the regularization parameter. The mapping quality is quite sensitive to these parameters. In this paper, we propose a new way of selecting the regularization parameter of the local Gram matrix.
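The regularization in question conditions the local k x k Gram matrix before LLE's reconstruction weights are solved for. A common form, which the paper's selection question concerns, adds a multiple of the trace to the diagonal; the sketch below shows that step only, and the 1e-3 default is an illustrative assumption rather than the paper's proposed choice:

```python
def regularize_gram(G, reg=1e-3):
    """Condition the local Gram matrix by adding reg * trace(G) / k to its
    diagonal, the step whose parameter `reg` a selection rule must choose
    (the 1e-3 default here is an illustrative assumption)."""
    k = len(G)
    t = sum(G[i][i] for i in range(k))
    return [[G[i][j] + (reg * t / k if i == j else 0.0) for j in range(k)]
            for i in range(k)]

print(regularize_gram([[1.0, 0.0], [0.0, 3.0]], reg=0.5))  # → [[2.0, 0.0], [0.0, 4.0]]
```

Too small a `reg` leaves a nearly singular Gram matrix (unstable weights); too large a value washes out the local geometry, which is why the mapping quality is sensitive to this parameter.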

14.
To address the problem that the classical manifold learning algorithm Isomap degrades or even fails when the nonlinear data are sparse, an improved cut-neighbors isometric feature mapping algorithm (CN-Isomap) is proposed. When the data are sparse, the algorithm first removes "short-circuit" edges from the neighborhood graph by reliably identifying each sample point's "manifold neighbors", and then approximates geodesic distances with a shortest-path algorithm, so that the fitted geodesics do not stray off the manifold region. The low-dimensional embedding thus correctly reflects the intrinsic topology of the sample points in the high-dimensional input space, uncovers the low-dimensional manifold hidden in the high-dimensional space, and effectively reduces the dimensionality of sparse nonlinear data. Experiments on benchmark data sets demonstrate the effectiveness of the algorithm. CN-Isomap generalizes Isomap: it not only handles sparse nonlinear data effectively but also applies when the data are not sparse.
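After the "short-circuit" edges are pruned, geodesic distances are approximated by shortest paths on the remaining neighborhood graph. A minimal Dijkstra sketch of that step follows; the adjacency-list encoding and the toy chain graph are assumptions for illustration, not CN-Isomap's full pipeline:

```python
import heapq

def geodesic_distances(graph, src):
    """Dijkstra shortest paths on a pruned neighborhood graph; CN-Isomap
    approximates geodesic distances by such path lengths after removing
    'short-circuit' edges (toy adjacency-list encoding assumed here)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# A 3-point chain: the geodesic from 0 to 2 must pass through 1.
chain = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: []}
print(geodesic_distances(chain, 0))  # → {0: 0.0, 1: 1.0, 2: 2.0}
```

If a spurious "short-circuit" edge connected 0 directly to 2, the path length would collapse to its weight; pruning such edges is what keeps the fitted geodesics on the manifold.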

15.
We construct Otto-Villani's coupling for general reversible diffusion processes on a Riemannian manifold. As an application, some new estimates are obtained for Wasserstein distances by using a Sobolev-Poincaré type inequality introduced by Latała and Oleszkiewicz. The corresponding concentration estimates of the measure are presented. Finally, our main result is applied to obtain transportation cost inequalities on the path space with respect to both the L2-distance and the intrinsic distance. In particular, Talagrand's inequality holds on the path space over a compact manifold.

16.
A new approach to deriving Pareto front approximations with evolutionary computations is proposed here. At present, evolutionary multiobjective optimization algorithms derive a discrete approximation of the Pareto front (the set of objective maps of efficient solutions) by selecting feasible solutions whose objective maps are close to the Pareto front. However, the accuracy of such approximations is known only if the Pareto front is known, which makes their usefulness questionable. Here we propose also to exploit elements outside the feasible sets, deriving pairs of Pareto front approximations such that for each pair the corresponding Pareto front lies, in a certain sense, in between. The accuracy of a Pareto front approximation by such a pair can be measured and controlled via the distance between the elements of the pair. A rudimentary algorithm to derive pairs of Pareto front approximations is presented, and the viability of the idea is verified on a limited number of test problems.
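A discrete Pareto front approximation of the kind discussed starts from nondominated filtering of candidate objective maps; a minimal sketch under minimization of all objectives (the toy points are illustrative, and the pair-of-approximations construction itself is not reproduced):

```python
def pareto_front(points):
    """Nondominated filtering under minimization of all objectives: keep a
    point unless some other point is at least as good in every objective
    and strictly better in at least one."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

objective_maps = [(1, 2), (2, 1), (2, 2), (3, 3)]
print(pareto_front(objective_maps))  # → [(1, 2), (2, 1)]
```

Here (2, 2) and (3, 3) are dominated and dropped; the survivors form the discrete approximation that the paper proposes to bracket between an inner and an outer estimate.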

17.
Data sets in high-dimensional spaces are often concentrated near low-dimensional sets. Geometric Multi-Resolution Analysis (Allard, Chen, Maggioni, 2012) was introduced as a method for approximating, in a robust multiscale fashion, a low-dimensional set around which data may be concentrated, and also for providing a dictionary for sparse representation of the data. Moreover, the procedure is very computationally efficient. We introduce an estimator for low-dimensional sets supporting the data, constructed from the GMRA approximations. We exhibit (near optimal) finite sample bounds on its performance, and demonstrate the robustness of this estimator with respect to noise and model error. In particular, our results imply that, if the data is supported on a low-dimensional manifold, the proposed sparse representations result in an error which depends only on the intrinsic dimension of the manifold. (© 2014 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

18.
High-dimensionality reduction techniques are very important tools in machine learning and data mining. The method of generalized low rank approximations of matrices (GLRAM) is a popular technique for dimensionality reduction and image compression. However, it suffers from heavy computational overhead in practice, especially for high-dimensional data. In order to reduce the cost of this algorithm, we propose a randomized GLRAM algorithm based on randomized singular value decomposition (RSVD). The theoretical contribution of our work is threefold. First, we discuss the decaying property of the singular values of the matrices during iterations of the GLRAM algorithm, and provide, from a theoretical point of view, a target rank required in the RSVD process. Second, we establish the relationship between the reconstruction errors generated by the standard GLRAM algorithm and the randomized GLRAM algorithm, and show that the two are comparable even if the solutions are computed inaccurately during iterations. Third, the convergence of the randomized GLRAM algorithm is investigated. Numerical experiments on some real-world data sets illustrate the superiority of our proposed algorithm over its original counterpart and some state-of-the-art GLRAM-type algorithms.
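RSVD rests on probing a matrix with random vectors to capture its dominant subspace. The rudimentary power-iteration sketch below recovers only the leading singular value, far short of the full randomized GLRAM algorithm, but it shows the random-probing idea:

```python
import math
import random

def top_singular_value(A, iters=100, seed=0):
    """Power iteration on A^T A from a random start vector: the same
    random-probing idea RSVD builds on, reduced to the leading singular
    value (a rudimentary sketch, not the randomized GLRAM algorithm)."""
    random.seed(seed)
    m, n = len(A), len(A[0])
    v = [random.random() for _ in range(n)]
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T u
        nrm = math.sqrt(sum(t * t for t in w))
        v = [t / nrm for t in w]  # renormalize the probe vector
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    return math.sqrt(sum(t * t for t in u))

print(top_singular_value([[3.0, 0.0], [0.0, 1.0]]))  # ≈ 3.0
```

The faster the singular values decay (the property the paper analyzes during GLRAM iterations), the fewer random probes such methods need to capture the dominant subspace accurately.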

19.
We study a class of complex product-manifold constrained matrix least-squares problems arising from the minimal perturbation problem for generalized eigenvalues of non-square matrix pencils with parameter $l$. Unlike previous work, we tackle the complex problem model directly: combining the geometric properties of the complex product manifold with a modified Fletcher-Reeves conjugate gradient method from Euclidean space, we design a Riemannian nonlinear conjugate gradient algorithm suited to the problem model and give a global convergence analysis. Numerical experiments and comparisons show that the algorithm converges faster than an existing algorithm for parameter $l=1$, and attains solutions of the same accuracy as an existing algorithm for parameter $l=n$. Compared with other manifold optimization methods, it matches the iteration efficiency of the existing Riemannian Dai nonlinear conjugate gradient method; compared with Riemannian second-order algorithms, it has lower per-iteration cost and less total iteration time; and compared with some non-manifold optimization algorithms, it has a clear advantage in iteration efficiency.

20.
The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton’s method such as the interior point method are viable for small to medium-scale problems. However, modern applications in statistics, engineering, and machine learning are posing problems with potentially tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.
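The first two of the three ideas named above can be combined in a one-dimensional toy: majorize the squared distances to each set via their projections, fold them into a penalty on the least-squares objective, and inflate the penalty parameter. The sets, schedule, and iteration count here are illustrative assumptions, not the authors' algorithm (and the quasi-Newton acceleration is omitted):

```python
def project_intersection(y, projections, rho=1.0, iters=200):
    """Proximal-distance sketch: minimize |x - y|^2 plus rho times the sum
    of squared distances to each set (majorized via the sets' projections),
    inflating rho so the iterates approach the intersection (1-D toy)."""
    x = y
    for _ in range(iters):
        projs = [proj(x) for proj in projections]
        # MM update: closed-form minimizer of the quadratic surrogate
        x = (y + rho * sum(projs)) / (1.0 + rho * len(projs))
        rho *= 1.05  # classical penalty schedule: gradually tighten
    return x

proj_interval = lambda x: max(min(x, 1.0), -1.0)  # project onto [-1, 1]
proj_halfline = lambda x: max(x, 0.5)             # project onto [0.5, inf)
print(round(project_intersection(2.0, [proj_interval, proj_halfline]), 2))  # → 1.0
```

The intersection here is [0.5, 1], so the projection of y = 2 is 1; each iterate only ever projects onto the easy individual sets, which is the point of the distance-majorization idea.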


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号