Similar Articles
20 similar articles found
1.
To address the problem that the classical manifold learning algorithm Isomap degrades or even fails when nonlinear data are sparse, an improved cut-neighbors isometric feature mapping algorithm (Cut-Neighbors Isometric feature mapping, CN-Isomap) is proposed. When the data are sparse, the algorithm first removes "short-circuit" edges from the neighborhood graph by reliably identifying each sample point's "manifold neighbors", and then approximates geodesic distances with a shortest-path algorithm, so that the fitted geodesics do not stray off the manifold. The low-dimensional embedding therefore correctly reflects the intrinsic topology of the sample points in the high-dimensional input space, recovers the low-dimensional manifold hidden in the high-dimensional space, and effectively reduces the dimensionality of sparse nonlinear data. Experiments on benchmark data sets demonstrate the effectiveness of the algorithm. CN-Isomap is a generalization of Isomap: it reduces the dimensionality of sparse nonlinear data effectively and applies equally well when the data are not sparse.
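To make the baseline concrete, here is a minimal sketch of standard Isomap in Python (not the CN-Isomap variant above): a k-nearest-neighbour graph, shortest-path geodesic distances, then classical MDS. The function name, the neighbour count, and the assumption that the neighbourhood graph stays connected are all illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, n_components=2):
    D = squareform(pdist(X))                       # pairwise Euclidean distances
    # keep each point's k nearest neighbours (assumes the graph stays connected)
    G = np.full_like(D, np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(len(X)), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    # approximate geodesic distances by shortest paths on the graph
    DG = shortest_path(G, method="D", directed=False)
    # classical MDS on the geodesic distance matrix
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (DG ** 2) @ J                   # double-centred squared distances
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0))

# usage: embed a toy 3-D point cloud into the plane
Y = isomap(np.random.default_rng(0).normal(size=(200, 3)))
```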

2.
In this paper we develop a numerical method for computing higher-order local approximations of center manifolds near steady states in Hamiltonian systems. The underlying system is assumed to be large in the sense that a large sparse Jacobian occurs at the equilibrium, for which only a linear solver and a low-dimensional invariant subspace are available. Our method combines this restriction from linear algebra with the requirements that the center manifold be parametrized by a symplectic mapping and that the reduced equation preserve the Hamiltonian form. Our approach can be considered a special adaptation of a general method from Numer. Math. 80 (1998) 1-38 to the Hamiltonian case, such that approximations of the reduced Hamiltonian are obtained simultaneously. As an application we treat a finite difference system for an elliptic problem on an infinite strip.

3.
In this paper, we extend the class of kernel methods, the so-called diffusion maps (DM) and its local kernel variants, to approximate second-order differential operators defined on smooth manifolds with boundaries that naturally arise in elliptic PDE models. To achieve this goal, we introduce the ghost point diffusion maps (GPDM) estimator on an extended manifold, identified by the set of point clouds on the unknown original manifold together with a set of ghost points, specified along the estimated tangential direction at the sampled points on the boundary. The resulting GPDM estimator restricts the standard DM matrix to a set of extrapolation equations that estimates the function values at the ghost points. This adjustment is analogous to the classical ghost point method in a finite-difference scheme for solving PDEs on flat domains. As opposed to the classical DM, which diverges near the boundary, the proposed GPDM estimator converges pointwise even near the boundary. Applying the consistent GPDM estimator to solve well-posed elliptic PDEs with classical boundary conditions (Dirichlet, Neumann, and Robin), we establish the convergence of the approximate solution under appropriate smoothness assumptions. We numerically validate the proposed mesh-free PDE solver on various problems defined on simple submanifolds embedded in Euclidean spaces as well as on an unknown manifold. Numerically, we also found that the GPDM is more accurate compared to DM in solving elliptic eigenvalue problems on bounded smooth manifolds. © 2021 Wiley Periodicals LLC.
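For reference, a minimal sketch of the standard diffusion-maps construction that GPDM extends; the ghost-point boundary correction itself is not reproduced. The bandwidth epsilon and the density-bias normalization choice (alpha = 1) are illustrative assumptions.

```python
import numpy as np

def diffusion_maps(X, epsilon=0.5, n_coords=4):
    # Gaussian kernel on pairwise squared distances
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / epsilon)
    # alpha = 1 normalization removes the sampling-density bias
    q = K.sum(1)
    K = K / np.outer(q, q)
    # row-normalize to a Markov transition matrix
    P = K / K.sum(1, keepdims=True)
    # its leading eigenvectors approximate eigenfunctions of the generator
    w, V = np.linalg.eig(P)
    order = np.argsort(-w.real)[:n_coords]
    return w.real[order], V.real[:, order]
```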

4.
Regularization of covariance matrices in high dimensions usually either is based on a known ordering of variables or ignores the ordering entirely. This article proposes a method for discovering meaningful orderings of variables based on their correlations using Isomap, a nonlinear dimension-reduction technique designed for manifold embeddings. These orderings are then used to construct a sparse covariance estimator, which is block-diagonal and/or banded. Finding an ordering to which banding can be applied is desirable because banded estimators have been shown to be consistent in high dimensions. We show that in situations where the variables do have such a structure, Isomap does very well at discovering it, and the resulting regularized estimator performs better for covariance estimation than other regularization methods that ignore variable order, such as thresholding. We also propose a bootstrap approach to constructing the neighborhood graph used by Isomap, and show that it leads to better estimation. We illustrate our method on data on protein consumption, where the variables (food types) have a structure that cannot be easily described a priori, and on a gene expression dataset. Supplementary materials are available online.
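A hedged sketch of the overall recipe described above, under assumed details: embed the variables (not the samples) with Isomap on a correlation-based distance, order them along the first embedding coordinate, then band the reordered sample covariance. The distance formula, neighbour count, and bandwidth k are illustrative, and the bootstrap graph construction is not shown.

```python
import numpy as np
from sklearn.manifold import Isomap

def banded_cov_by_isomap_order(X, n_neighbors=5, k=2):
    S = np.cov(X, rowvar=False)
    R = np.corrcoef(X, rowvar=False)
    dist = np.sqrt(np.clip(1 - R ** 2, 0, None))   # correlation-based distance
    emb = Isomap(n_neighbors=n_neighbors, n_components=1,
                 metric="precomputed").fit_transform(dist)
    order = np.argsort(emb[:, 0])                  # variable ordering from the embedding
    S_ord = S[np.ix_(order, order)]
    p = S.shape[0]
    band = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) <= k
    return S_ord * band, order                     # banded estimator, discovered order
```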

5.
We discuss the problem of sparse representation of domains in ℝ^d. We demonstrate how the recently developed general theory of greedy approximation in Banach spaces can be used in this problem. The use of greedy approximation has two important advantages: (1) it works for an arbitrary dictionary of sets used for sparse representation, and (2) the method of approximation does not depend on smoothness properties of the domains and automatically provides a near optimal rate of approximation for domains with different smoothness properties. We also give some lower estimates of the approximation error and discuss a specific greedy algorithm for approximation of convex domains in ℝ².

6.

In this paper, we study how to estimate the error density in the ultrahigh-dimensional sparse additive model, where the number of variables is larger than the sample size. First, a smoothing method based on B-splines is applied to estimate the regression functions. Second, an improved two-stage refitted cross-validation (RCV) procedure based on random splitting is used to obtain the residuals of the model, and a residual-based kernel method is then applied to estimate the error density function. Under suitable sparsity conditions, large-sample properties of the estimator are obtained, including weak and strong consistency, asymptotic normality, and the law of the iterated logarithm. In particular, the relationship between the sparsity and the convergence rate of the kernel density estimator is given. The methodology is illustrated by simulations and a real data example, which suggest that the proposed method performs well.

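A minimal sketch of the residual pipeline described above, under assumed details: refitted cross-validation with a lasso selector on one half of the data, a refit on the other half, and a kernel density estimate of the pooled residuals. The lasso selector, the plain two-fold split, and the default KDE bandwidth are illustrative simplifications, and the B-spline additive fit is replaced here by a linear fit.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LassoCV, LinearRegression

def rcv_error_density(X, y, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    parts = (idx[:half], idx[half:])
    residuals = []
    for sel, fit in (parts, parts[::-1]):
        # stage 1: variable selection on one half of the sample
        support = np.flatnonzero(LassoCV(cv=5).fit(X[sel], y[sel]).coef_)
        if support.size == 0:
            residuals.append(y[fit] - y[fit].mean())
            continue
        # stage 2: refit with only the selected variables on the other half
        ols = LinearRegression().fit(X[fit][:, support], y[fit])
        residuals.append(y[fit] - ols.predict(X[fit][:, support]))
    return gaussian_kde(np.concatenate(residuals))  # estimated error density
```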

7.
We discuss adaptive sparse grid algorithms for stochastic differential equations, with a particular focus on applications to electromagnetic scattering by structures with holes of uncertain size, location, and quantity. Stochastic collocation (SC) methods are used in combination with an adaptive sparse grid approach based on nested Gauss-Patterson grids. We demonstrate how the nested structure enables effective error estimation through Richardson extrapolation, which also provides an efficient means of predicting the solution at the next refinement level. We introduce an adaptive approach for the computation of problems with discrete random variables and demonstrate its efficiency for scattering problems with a random number of holes. The results are compared with results based on Monte Carlo methods and with Stroud-based integration, confirming the accuracy and efficiency of the proposed techniques.
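The Richardson-extrapolation error estimate is easy to illustrate: for nested levels Q_l of an order-p rule with mesh halving, I - Q_l ≈ (Q_{l+1} - Q_l)/(1 - 2^{-p}). The sketch below uses plain nested trapezoid levels as a stand-in for Gauss-Patterson grids; the rule, the order p = 2, and the test integrand are illustrative assumptions.

```python
import numpy as np

def trap(f, a, b, m):
    # composite trapezoid rule with m + 1 equally spaced points
    x = np.linspace(a, b, m + 1)
    y = f(x)
    return (y[1:] + y[:-1]).sum() * (b - a) / (2 * m)

# nested levels: each refinement halves the mesh width
levels = [trap(np.sin, 0.0, np.pi, 2 ** l) for l in range(1, 7)]

# for an order-p rule with mesh halving, I - Q_l ~ (Q_{l+1} - Q_l) / (1 - 2**-p)
p = 2
est_err = [(q1 - q0) / (1 - 2.0 ** (-p)) for q0, q1 in zip(levels, levels[1:])]
true_err = [2.0 - q for q in levels[:-1]]   # exact integral of sin on [0, pi] is 2
print(np.allclose(est_err, true_err, rtol=0.1))   # the estimate tracks the true error
```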

8.
The computation of Gaussian orthant probabilities has been extensively studied for low-dimensional vectors. Here we focus on the high-dimensional case and present a two-step procedure relying on both deterministic and stochastic techniques. The proposed estimator splits the probability into a low-dimensional term and a remainder: the low-dimensional probability can be estimated by fast and accurate quadrature, while the remainder requires Monte Carlo sampling. We further refine the estimation by using a novel asymmetric nested Monte Carlo (anMC) algorithm for the remainder, and we highlight cases where this approximation brings substantial efficiency gains. The proposed methods are compared against state-of-the-art techniques in a numerical study, which also calls attention to the advantages and drawbacks of the procedure. Finally, the proposed method is applied to derive conservative estimates of excursion sets of expensive-to-evaluate deterministic functions under a Gaussian random field prior, without requiring a Markov assumption. Supplementary material for this article is available online.
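The basic split behind the estimator can be sketched as P(X > 0) = P(X_q > 0) · P(X_rest > 0 | X_q > 0), with the low-dimensional factor computed by quadrature and the conditional remainder by Monte Carlo. This is a plain-MC sketch, not the asymmetric nested MC (anMC) refinement; the split dimension q and the sample size are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def split_orthant_prob(cov, q=3, n=200_000, seed=0):
    d = cov.shape[0]
    # low-dimensional factor by quadrature: P(X_q > 0) = P(X_q <= 0) by symmetry
    p_low = multivariate_normal(mean=np.zeros(q), cov=cov[:q, :q]).cdf(np.zeros(q))
    # Monte Carlo for the conditional remainder P(X_rest > 0 | X_q > 0)
    X = np.random.default_rng(seed).multivariate_normal(np.zeros(d), cov, size=n)
    in_low = (X[:, :q] > 0).all(axis=1)
    p_rest = (X[in_low, q:] > 0).all(axis=1).mean()
    return p_low * p_rest

# usage: an equicorrelated 8-dimensional example
d = 8
cov = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))
print(split_orthant_prob(cov))
```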

9.

The existence of outliers in a data set, and how to deal with them, is an important problem in statistics. The minimum volume ellipsoid (MVE) estimator is a robust estimator of location and covariance structure; however, its use has been limited because few computationally attractive methods exist. Determining the MVE consists of two parts: finding the subset of points to be used in the estimate, and finding the ellipsoid that covers this set. This article addresses the first problem. Our method also allows us to compute the minimum covariance determinant (MCD) estimator. The proposed method of subset selection, called the effective independence distribution (EID) method, chooses the subset by minimizing determinants of matrices containing the data. This method is deterministic, yielding reproducible estimates of location and scatter for a given data set. The EID method of finding the MVE is applied to several regression data sets where the true estimate is known. Results show that the EID method, when applied to these data sets, produces the subset of data more quickly than conventional procedures, with less than 6% relative error in the estimates. We also give timing results illustrating the feasibility of our method for larger data sets: for 10,000 points in 10 dimensions, the compute time is under 25 minutes.
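The EID method itself is not publicly packaged, but the related MCD estimator the article mentions is; here is a hedged sketch using scikit-learn's FastMCD implementation, with a chi-square cutoff on robust Mahalanobis distances to flag outliers. The contamination setup and the 0.975 cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:25] += 8.0                                   # plant a cluster of outliers

mcd = MinCovDet(random_state=0).fit(X)          # robust location and scatter
d2 = mcd.mahalanobis(X)                         # squared robust distances
outliers = d2 > chi2.ppf(0.975, df=X.shape[1])  # chi-square cutoff
print(outliers[:25].mean())                     # fraction of planted outliers flagged
```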

10.
We consider a new method for sparse covariance matrix estimation which is motivated by previous results for the so-called Stein-type estimators. Stein proposed a method for regularizing the sample covariance matrix by shrinking together the eigenvalues; the amount of shrinkage is chosen to minimize an unbiased estimate of the risk (UBEOR) under the entropy loss function. The resulting estimator has been shown in simulations to yield significant risk reductions over the maximum likelihood estimator. Our method extends the UBEOR minimization problem by adding an ℓ1 penalty on the entries of the estimated covariance matrix, which encourages a sparse estimate. For a multivariate Gaussian distribution, zeros in the covariance matrix correspond to marginal independences between variables. Unlike the ℓ1-penalized Gaussian likelihood function, our penalized UBEOR objective is convex and can be minimized via a simple block coordinate descent procedure. We demonstrate via numerical simulations and an analysis of microarray data from breast cancer patients that our proposed method generally outperforms other methods for sparse covariance matrix estimation and can be computed efficiently even in high dimensions.
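Not the paper's penalized-UBEOR solver, but a minimal sketch of the two ingredients it combines: Stein-style eigenvalue shrinkage followed by soft-thresholding of off-diagonal entries to induce sparsity. The shrinkage weight and threshold are illustrative assumptions.

```python
import numpy as np

def shrink_and_threshold(X, shrink=0.2, thr=0.1):
    S = np.cov(X, rowvar=False)
    # Stein-style shrinkage of the eigenvalues toward their mean
    w, V = np.linalg.eigh(S)
    w = (1 - shrink) * w + shrink * w.mean()
    S = (V * w) @ V.T
    # soft-threshold the off-diagonal entries to induce zeros
    T = np.sign(S) * np.maximum(np.abs(S) - thr, 0.0)
    np.fill_diagonal(T, np.diag(S))             # leave variances untouched
    return T
```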

11.
This paper continues the work in [11] and [2] on the problem of estimating, by a linear estimator, N unobservable input vectors undergoing the same linear transformation, from noise-corrupted observable output vectors. Whereas in the aforementioned papers only the matrix representing the linear transformation was assumed uncertain, here we are concerned with the case in which the second-order statistics of the noise vectors (i.e., their covariance matrices) are also subject to uncertainty. We seek a robust mean-squared error estimator immune to both sources of uncertainty. We show that the optimal robust mean-squared error estimator has a special form represented by an elementary block circulant matrix, and moreover, when the uncertainty sets are ellipsoidal-like, the problem of finding the optimal estimator matrix can be reduced to solving an explicit semidefinite programming problem whose size is independent of N. The research was partially supported by BSF grant #2002038.

12.

This paper is concerned with error density estimation in the high-dimensional sparse linear model, where the number of variables may be larger than the sample size. An improved two-stage refitted cross-validation (RCV) procedure based on random splitting is used to obtain the residuals of the model, and the traditional kernel density method is then applied to estimate the error density. Under suitable sparsity conditions, large-sample properties of the estimator are obtained, including consistency, asymptotic normality, and the law of the iterated logarithm. In particular, we give the relationship between the sparsity and the convergence rate of the kernel density estimator. Simulation results show that the error density estimator performs well, and a real data example is presented to illustrate the methods.


13.
High-dimensional data sets often suffer from redundancy and the curse of dimensionality, and a covering model built directly on such data has difficulty reflecting the data distribution adequately. To address this problem, an approximate convex-hull covering model based on sparse dimensionality reduction is proposed. First, a homotopy algorithm is used to solve the ℓ1 optimization problem of sparse representation, and the sparsity constraint automatically yields a reasonable number of neighbors for constructing the graph; LPP (Locality Preserving Projections) then performs a locality-preserving projection, achieving fast and effective dimensionality reduction of the high-dimensional space; finally, one-class classification is performed in the low-dimensional space by constructing an approximate convex-hull cover. Experimental results on the UCI repository, the MNIST handwritten digit database, and the MIT-CBCL face recognition database confirm the effectiveness of the method; compared with existing one-class classification algorithms, the proposed covering model achieves higher classification accuracy.

14.
The presence of groups containing high-leverage outliers makes linear regression a difficult problem due to the masking effect. The available high-breakdown estimators based on least trimmed squares (LTS) often do not succeed in detecting masked high-leverage outliers in finite samples. An alternative to the LTS estimator, called the penalised trimmed squares (PTS) estimator, was introduced by the authors in Zioutas and Avramidis (2005) Acta Math Appl Sin 21:323-334 and Zioutas et al. (2007) REVSTAT 5:115-136, and it appears to be less sensitive to the masking problem. This estimator is defined by a quadratic mixed integer programming (QMIP) problem in which the objective function includes a penalty cost for each observation, serving as an upper bound on the residual error for any feasible regression line. Since the PTS does not require presetting the number of outliers to delete from the data set, it has better efficiency than other estimators. However, due to the high computational complexity of the resulting QMIP problem, computing exact solutions for moderately large regression problems is infeasible. In this paper we further establish the theoretical properties of the PTS estimator, such as high breakdown and efficiency, and propose an approximate algorithm called Fast-PTS to compute the PTS estimator for large data sets efficiently. Extensive computational experiments on sets of benchmark instances with varying degrees of outlier contamination indicate that the proposed algorithm performs well in identifying groups of high-leverage outliers in reasonable computational time.
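For background on the LTS baseline that PTS is compared against, a minimal sketch of LTS via random elemental starts and concentration steps (the FAST-LTS idea), not the QMIP-based PTS itself. The coverage h, the number of starts, and the number of C-steps are illustrative assumptions.

```python
import numpy as np

def lts(X, y, h=None, n_starts=50, n_csteps=10, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = h or (n + p + 1) // 2                   # standard LTS coverage
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)   # elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            r2 = (y - X @ beta) ** 2
            subset = np.argsort(r2)[:h]         # concentration (C-) step
        obj = np.sort(r2)[:h].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta
```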

15.
We introduce the concept of a Clifford–Weyl structure on a conformal manifold, which consists of an even Clifford structure parallel with respect to the tensor product of a metric connection on the Clifford bundle and a Weyl structure on the manifold. We show that the Weyl structure is necessarily closed except for some “generic” low-dimensional instances, where explicit examples of non-closed Clifford–Weyl structures can be constructed.

16.
In practical applications related to, for instance, machine learning, data mining and pattern recognition, one is commonly dealing with noisy data lying near some low-dimensional manifold. A well-established tool for extracting the intrinsically low-dimensional structure from such data is principal component analysis (PCA). Due to the inherent limitations of this linear method, its extensions to extraction of nonlinear structures have attracted increasing research interest in recent years. Assuming a generative model for noisy data, we develop a probabilistic approach for separating the data-generating nonlinear functions from noise. We demonstrate that ridges of the marginal density induced by the model are viable estimators for the generating functions. For projecting a given point onto a ridge of its estimated marginal density, we develop a generalized trust region Newton method and prove its convergence to a ridge point. Accuracy of the model and computational efficiency of the projection method are assessed via numerical experiments where we utilize Gaussian kernels for nonparametric estimation of the underlying densities of the test datasets.
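Ridge projection can also be sketched with subspace-constrained mean shift (SCMS), a standard alternative to the paper's trust-region Newton method: repeatedly move a point by the mean-shift vector projected onto the span of the KDE Hessian's smallest-eigenvalue eigenvectors. The Gaussian-kernel bandwidth and the toy data are illustrative assumptions.

```python
import numpy as np

def scms_project(x, data, h=0.3, ridge_dim=1, n_iter=200, tol=1e-8):
    for _ in range(n_iter):
        d = data - x
        w = np.exp(-(d ** 2).sum(1) / (2 * h ** 2))  # Gaussian kernel weights
        u = d / h ** 2
        # Hessian of the Gaussian KDE at x (up to a positive scale factor)
        H = (w[:, None, None] * (u[:, :, None] * u[:, None, :])).sum(0)
        H -= w.sum() * np.eye(x.size) / h ** 2
        _, V = np.linalg.eigh(H)                     # ascending eigenvalues
        U = V[:, :x.size - ridge_dim]                # the ridge's normal directions
        m = (w[:, None] * data).sum(0) / w.sum()     # mean-shift target
        step = U @ (U.T @ (m - x))                   # shift restricted to normal space
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# usage: project a point onto the density ridge of a noisy circle in the plane
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 400)
data = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.normal(size=(400, 2))
print(scms_project(np.array([0.7, 0.0]), data))   # lands near the unit circle
```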

17.
We treat the convergence of adaptive lowest-order FEM for an elliptic obstacle problem with an affine obstacle. For error estimation, we use a residual error estimator which is an extended version of the estimator from [2] and additionally controls the data oscillations. The main result states that an appropriately weighted sum of energy error, edge residuals, and data oscillations satisfies a contraction property that leads to convergence. In addition, we discuss the generalization to the case of inhomogeneous Dirichlet data and non-affine obstacles χ ∈ H²(Ω), for which similar results are obtained. (© 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

18.
We consider a panel data semiparametric partially linear regression model with an unknown parameter vector for the linear parametric component, an unknown nonparametric function for the nonlinear component, and a one-way error component structure which allows unequal error variances (heteroscedasticity). We develop procedures to detect heteroscedasticity and the one-way error component structure, and propose a weighted semiparametric least squares estimator (WSLSE) of the parametric component in the presence of heteroscedasticity and/or a one-way error component structure. This WSLSE is asymptotically more efficient than the usual semiparametric least squares estimator considered in the literature, and its asymptotic properties are derived. The nonparametric component of the model is estimated by the local polynomial method. Simulations are conducted to demonstrate the finite-sample performance of the proposed testing and estimation procedures, and an application to a set of panel data on medical expenditures in Australia is also presented.

19.
Maximum likelihood (ML) estimation of spatial autoregressive models for large spatial data sets is well established, exploiting the commonly sparse nature of the contiguity matrix on which spatial dependence is built. Models that add a measurement error, naturally separating the spatial process from the measurement error process, are not well established in the literature, however, and their ML estimation for large data sets is challenging. Recently a reduced-rank approach was suggested which re-expresses and approximates such a model as a spatial random effects model (SRE) in order to fit large data sets quickly by fitting the corresponding SRE. In this paper we propose a fast and exact method that accomplishes ML and restricted ML estimation in \(O(n^{3/2})\) operations when the contiguity matrix is based on a local neighbourhood. The methods are illustrated using the well-known data set on house prices in Lucas County, Ohio.
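The sparse-matrix device that makes such likelihoods tractable is the log-determinant log|I - ρW|, the costly term in the SAR profile log-likelihood, computed here from a sparse LU factorization. A hedged sketch: the toy contiguity matrix and ρ are illustrative, and the paper's exact O(n^{3/2}) scheme is not reproduced.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def sar_logdet(W, rho):
    # log|I - rho*W| via sparse LU; L has a unit diagonal, so log|det| is the
    # sum of log|diag(U)| (permutations only flip the sign, and det(I - rho*W)
    # is positive for rho inside the stationarity region)
    n = W.shape[0]
    lu = splu(sp.eye(n, format="csc") - rho * W.tocsc())
    return np.log(np.abs(lu.U.diagonal())).sum()

# usage: a toy 1-D nearest-neighbour contiguity matrix with row sums <= 1
n = 2000
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1]) / 2.0
print(sar_logdet(W, rho=0.5))
```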

20.
Dimensionality reduction is used to preserve significant properties of data in a low-dimensional space. In particular, data representation in a lower dimension is needed in applications where information comes from multiple high-dimensional sources. Data integration, however, is a challenge in itself. In this contribution, we consider a general framework for performing dimensionality reduction that takes into account the heterogeneity of the data. We propose a novel approach, called Deep Kernel Dimensionality Reduction, designed to learn layers of new compact data representations simultaneously. The method can also be used to learn shared representations between modalities. We show by experiments on standard and real large-scale biomedical data sets that the proposed method embeds data in a new compact meaningful representation and leads to a lower classification error than state-of-the-art methods.
