2.
Stone’s dimensionality reduction principle has been confirmed on several occasions for independent observations. When dependence is expressed through ϕ-mixing, a minimum distance estimate is proposed for a smooth projection pursuit regression-type function θ that is either additive or multiplicative, with or without interactions. Upper bounds on the L1-risk and the L1-error of the estimate are obtained under restrictions on the rate of decay of the mixing coefficient. The bounds show explicitly the additive effect of ϕ-mixing on the error, and confirm the dimensionality reduction principle.
3.
Dimensionality reduction techniques are very important tools in machine learning and data mining. The method of generalized low rank approximations of matrices (GLRAM) is a popular technique for dimensionality reduction and image compression. However, it suffers from heavy computational overhead in practice, especially for high-dimensional data. In order to reduce the cost of this algorithm, we propose a randomized GLRAM algorithm based on the randomized singular value decomposition (RSVD). The theoretical contribution of our work is threefold. First, we discuss the decaying property of the singular values of the matrices during iterations of the GLRAM algorithm, and provide a target rank required in the RSVD process from a theoretical point of view. Second, we establish the relationship between the reconstruction errors generated by the standard GLRAM algorithm and the randomized GLRAM algorithm. It is shown that the reconstruction errors generated by the former and the latter are comparable, even if the solutions are computed inaccurately during iterations. Third, the convergence of the randomized GLRAM algorithm is investigated. Numerical experiments on some real-world data sets illustrate the superiority of the proposed algorithm over its original counterpart and some state-of-the-art GLRAM-type algorithms.
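As a rough illustration of the randomized-SVD building block referred to above (a standard Gaussian range finder, not the authors' full randomized GLRAM iteration), here is a minimal Python sketch; the matrix sizes, target rank, and oversampling parameter are all hypothetical.

```python
import numpy as np

def rsvd(A, rank, oversample=10, seed=0):
    """Basic randomized SVD: sketch the range of A with a Gaussian test
    matrix, project, then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)   # orthonormal basis for the sketched range
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ Ub[:, :rank], s[:rank], Vt[:rank]

# Relative error of the randomized rank-20 approximation.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 200)) @ rng.standard_normal((200, 300))
U, s, Vt = rsvd(A, rank=20)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```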
4.
We present our recent work on both linear and nonlinear data reduction methods and algorithms: for the linear case we discuss results on structure analysis of the SVD of column-partitioned matrices and sparse low-rank approximation; for the nonlinear case we investigate methods for nonlinear dimensionality reduction and manifold learning. The problems we address have attracted a great deal of interest in data mining and machine learning.
5.
We consider estimating a random vector from its measurements in a fusion frame, in the presence of noise and subspace erasures. A fusion frame is a collection of subspaces, for which the sum of the projection operators onto the subspaces is bounded below and above by constant multiples of the identity operator. We first consider the linear minimum mean-squared error (LMMSE) estimation of the random vector of interest from its fusion frame measurements in the presence of additive white noise. Each fusion frame measurement is a vector whose elements are inner products of an orthogonal basis for a fusion frame subspace and the random vector of interest. We derive bounds on the mean-squared error (MSE) and show that the MSE will achieve its lower bound if the fusion frame is tight. We then analyze the robustness of the constructed LMMSE estimator to erasures of the fusion frame subspaces. We limit our erasure analysis to the class of tight fusion frames and assume that all erasures are equally important. Under these assumptions, we prove that tight fusion frames consisting of equi-dimensional subspaces have maximum robustness (in the MSE sense) with respect to erasures of one subspace among all tight fusion frames, and that the optimal subspace dimension depends on the signal-to-noise ratio (SNR). We also prove that tight fusion frames consisting of equi-dimensional subspaces with equal pairwise chordal distances are most robust with respect to two and more subspace erasures, among the class of equi-dimensional tight fusion frames. We call such fusion frames equi-distance tight fusion frames. We prove that the squared chordal distance between the subspaces in such fusion frames meets the so-called simplex bound, and thereby establish connections between equi-distance tight fusion frames and optimal Grassmannian packings. Finally, we present several examples for the construction of equi-distance tight fusion frames.
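To make the definition concrete, here is a minimal numerical sketch (our illustration, not from the paper) that forms the fusion frame operator, i.e. the sum of the orthogonal projections onto the subspaces, and reads the frame bounds off its eigenvalues; the dimensions are hypothetical, and randomly drawn subspaces will generally not be tight.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 6, 4, 3   # ambient dimension, number of subspaces, subspace dimension

# Orthonormal bases for k random d-dimensional subspaces of R^n.
bases = [np.linalg.qr(rng.standard_normal((n, d)))[0] for _ in range(k)]

# Fusion frame operator: the sum of orthogonal projections onto the subspaces.
S = sum(U @ U.T for U in bases)
evals = np.linalg.eigvalsh(S)
A_low, B_up = evals.min(), evals.max()   # the frame bounds
print(f"frame bounds: A = {A_low:.3f}, B = {B_up:.3f}")
print("tight fusion frame" if np.isclose(A_low, B_up) else "not tight")
```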
6.
Dimensionality reduction is used to preserve significant properties of data in a low-dimensional space. In particular, data representation in a lower dimension is needed in applications where information comes from multiple high-dimensional sources. Data integration, however, is a challenge in itself. In this contribution, we consider a general framework for performing dimensionality reduction that takes into account that the data are heterogeneous. We propose a novel approach, called Deep Kernel Dimensionality Reduction, which is designed to learn layers of new compact data representations simultaneously. The method can also be used to learn shared representations between modalities. We show by experiments on standard and on real large-scale biomedical data sets that the proposed method embeds data in a new compact meaningful representation, and leads to a lower classification error compared to state-of-the-art methods.
7.
The goal of dimensionality reduction or manifold learning for a given set of high-dimensional data points is to find a low-dimensional parametrization for them. It is usually easy to carry out this parametrization within a small region, producing a collection of local coordinate systems. Alignment is the process of stitching those local systems together into a global coordinate system, and it is carried out through the computation of a partial eigendecomposition of a so-called alignment matrix. In this paper, we present an analysis of the alignment process, giving conditions under which the null space of the alignment matrix recovers the global coordinate system up to an affine transformation. We also propose a post-processing step that can determine the global coordinate system up to a rigid motion. This in turn shows that the Local Tangent Space Alignment (LTSA) method can recover a locally isometric embedding up to a rigid motion.
AMS subject classification (2000): 65F15, 62H30, 15A18
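For readers who want to experiment, an off-the-shelf LTSA implementation is available in scikit-learn; the following sketch (our illustration, not the paper's code) embeds a swiss roll, with the neighborhood size and sample count chosen arbitrarily.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

# LTSA: build local tangent-space coordinates, then align them globally
# through a partial eigendecomposition of the alignment matrix.
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                              method="ltsa", random_state=0)
Y = ltsa.fit_transform(X)
print(Y.shape, ltsa.reconstruction_error_)
```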
8.
In this paper, motivated by the results in compressive phase retrieval, we study the robustness properties of dimensionality reduction with Gaussian random matrices having arbitrarily erased rows. We first study the robustness property against erasure for the almost norm preservation property of Gaussian random matrices by obtaining the optimal estimate of the erasure ratio for a small given norm distortion rate. As a consequence, we establish the robustness property of the Johnson-Lindenstrauss lemma and the robustness property of the restricted isometry property with corruption for Gaussian random matrices. Second, we obtain a sharp estimate for the optimal lower and upper bounds of norm distortion rates of Gaussian random matrices under a given erasure ratio. This allows us to establish the strong restricted isometry property with almost optimal restricted isometry property (RIP) constants, which plays a central role in the study of phaseless compressed sensing. As a byproduct of our results, we also establish the robustness property of Gaussian random finite frames under erasure.
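A small empirical illustration of the norm-preservation-under-erasure phenomenon (it does not reproduce the paper's sharp estimates): erase a random fraction of the rows of a normalized Gaussian matrix, renormalize by the surviving row count, and observe the norm distortion. All sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 200                        # ambient dimension, number of rows
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                  # unit vector, so ||x||^2 = 1

A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaled so E||Ax||^2 = ||x||^2

for ratio in (0.0, 0.1, 0.3):
    keep = rng.random(m) > ratio               # erase rows at random
    y = A[keep] @ x * np.sqrt(m / keep.sum())  # renormalize by surviving rows
    print(f"erasure {ratio:.0%}: norm distortion {abs(np.linalg.norm(y)**2 - 1):.4f}")
```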
9.
We consider the inversion problem for linear systems, which involves estimation of the unknown input vector. The inversion problem is considered for a system with a vector output and a vector input, assuming that the observed output is of higher dimension than the unknown input. The problem is solved by using a controlled model in which the control stabilizes the deviations of the model output from the system output. The stabilizing model control, or its averaged form, may be used as the estimate of the unknown system input.
Translated from Nelineinaya Dinamika i Upravlenie, No. 4, pp. 17–22, 2004.
10.
We study ensembles of simple threshold classifiers for the categorization of high-dimensional data of low cardinality and give a compression bound on their prediction risk. Two approaches are utilized to produce such classifiers. One is based on univariate feature selection employing the area under the ROC curve as ranking criterion. The other approach uses a greedy selection strategy. The methods are applied to artificial data, published microarray expression profiles, and highly imbalanced data.
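A minimal sketch of the univariate-AUC variant (the median thresholds and the majority-vote combination are our assumptions, not necessarily the authors' exact rules):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Low cardinality, high dimension: 80 samples, 500 features.
X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# Rank features by univariate AUC; |AUC - 0.5| measures discriminative power.
auc = np.array([roc_auc_score(ytr, Xtr[:, j]) for j in range(Xtr.shape[1])])
top = np.argsort(-np.abs(auc - 0.5))[:25]

# Ensemble of single-feature threshold classifiers (threshold at the training
# median, oriented by the sign of AUC - 0.5), combined by majority vote.
votes = np.zeros(len(yte))
for j in top:
    pred = (Xte[:, j] > np.median(Xtr[:, j])).astype(int)
    votes += pred if auc[j] >= 0.5 else 1 - pred
print("holdout accuracy:", np.mean((votes > len(top) / 2) == yte))
```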
11.
In this paper, high-order Parzen windows defined by means of basic window functions are studied in order to understand some algorithms in learning theory and randomized sampling in multivariate approximation. Learning rates are derived for least-squares regression and density estimation on bounded domains under some decay conditions on the marginal distributions near the boundary. These rates can be almost optimal when the marginal distributions decay fast and the order of the Parzen windows is large enough. For randomized sampling in shift-invariant spaces, we consider the situation when the sampling points are neither i.i.d. nor regular, but are perturbed from regular grids according to probability density functions. The approximation orders are estimated in terms of the regularity of the approximated function, the density function, and the order of the Parzen windows.
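For orientation, the classical (order-2) Parzen window density estimator with a Gaussian window looks as follows; the paper's high-order windows generalize this construction, and the bandwidth and sample size below are arbitrary.

```python
import numpy as np

def parzen_density(x_eval, samples, h):
    """Classical Parzen window density estimate with a Gaussian window."""
    u = (x_eval[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=2000)
xs = np.linspace(-3, 3, 7)
print(np.round(parzen_density(xs, samples, h=0.3), 3))         # estimated density
print(np.round(np.exp(-0.5 * xs**2) / np.sqrt(2 * np.pi), 3))  # true N(0,1) density
```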
13.
Summary. We present a simple and extremely accurate procedure for approximating the initial temperature for the heat equation on the line using a discrete time and spatial sampling. The procedure is based on the sinc expansion, which for functions in a particular class yields a uniform exponential error bound with exponent depending on the number of spatial sample locations chosen. Further, the temperature need only be sampled at one and the same temporal value for each of the spatial sampling points. For N spatial sample points, the approximation is reduced to solving a linear system with a (2N+1)×(2N+1) coefficient matrix. This matrix is a symmetric centrosymmetric Toeplitz matrix and hence can be determined by computing only 2N+1 values using quadratures. Supported in part by a grant from the Texas State Advanced Research Program; by NSF MONTS grant #ISP8011449; and in part by grants from NSA, NASA, and TATRP.
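The final reduction to a symmetric Toeplitz system can be exploited numerically: such a system is determined by its first column alone and can be solved with the Levinson recursion, e.g. via scipy.linalg.solve_toeplitz. In the sketch below the entries are hypothetical placeholders, not the quadrature values from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

# A symmetric Toeplitz matrix is determined by its first column alone,
# which is why only 2N+1 values need to be computed by quadrature.
N = 10
col = 1.0 / (1.0 + np.arange(2 * N + 1) ** 2)   # hypothetical placeholder entries
b = np.ones(2 * N + 1)

x = solve_toeplitz(col, b)                                 # Levinson, O(N^2)
print(np.allclose(x, np.linalg.solve(toeplitz(col), b)))   # dense O(N^3) check
```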
14.
This study shows how data envelopment analysis (DEA) can be used to reduce the vertical dimensionality of certain data mining databases. The study illustrates the basic concepts using a real-world graduate admissions decision task. It is well known that cost-sensitive mixed integer programming (MIP) problems are NP-complete. This study shows that heuristic solutions for cost-sensitive classification problems can be obtained by solving a simple goal programming problem that reduces the vertical dimension of the original learning dataset. Using simulated datasets and a misclassification cost performance metric, the performance of the proposed goal programming heuristic is compared with the extended DEA-discriminant analysis MIP approach. The holdout sample results of our experiments show that the proposed heuristic approach outperforms the extended DEA-discriminant analysis MIP approach.
15.
Path loss prediction is a crucial task in the planning of networks in modern mobile communication systems. Learning-machine-based models seem to be a valid alternative to empirical and deterministic methods for predicting the propagation path loss. As learning machine performance depends on the number of input features, a good way to obtain a more reliable model is to use techniques that reduce the dimensionality of the data. In this paper we propose a new approach combining learning machines and dimensionality reduction techniques. We report results on a real dataset showing the efficiency of the learning-machine-based methodology and the usefulness of dimensionality reduction techniques in improving the prediction accuracy.
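A generic sketch of the kind of pipeline the abstract describes, combining a dimensionality reduction step with a learning machine; the specific choices of PCA, support vector regression, and all parameters are our assumptions, and the data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))          # synthetic stand-in for radio features
path_loss = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(300)

# Standardize -> reduce dimensionality -> regress.
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVR(C=10.0))
scores = cross_val_score(model, X, path_loss, cv=5, scoring="r2")
print(f"CV R^2: {scores.mean():.3f} +/- {scores.std():.3f}")
```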
16.
Dimensionality reduction is an important technique in surrogate modeling and machine learning. In this article, we propose a supervised dimensionality reduction method, "least squares regression principal component analysis" (LSR-PCA), applicable to both classification and regression problems. To show the efficacy of this method, we present different examples in visualization, classification, and regression problems, comparing it with several state-of-the-art dimensionality reduction methods. Finally, we present a kernel version of LSR-PCA for problems where the inputs are correlated nonlinearly. The examples demonstrate that LSR-PCA can be a competitive dimensionality reduction method.
17.
Classical multidimensional scaling only works well when the noisy distances observed in a high-dimensional space can be faithfully represented by Euclidean distances in a low-dimensional space. Advanced models such as Maximum Variance Unfolding (MVU) and Minimum Volume Embedding (MVE) use Semi-Definite Programming (SDP) to reconstruct such faithful representations. While those SDP models are capable of producing high-quality configurations numerically, they suffer from two major drawbacks. One is that there exist no theoretically guaranteed bounds on the quality of the configuration. The other is that they are slow in computation when the number of data points is beyond moderate size. In this paper, we propose a convex optimization model of Euclidean distance matrices. We establish a non-asymptotic error bound for the random graph model with sub-Gaussian noise, and prove that our model produces a matrix estimator of high accuracy when the order of the uniform sample size is roughly the degree of freedom of a low-rank matrix up to a logarithmic factor. Our results partially explain why MVU and MVE often work well. Moreover, the convex optimization model can be efficiently solved by a recently proposed 3-block alternating direction method of multipliers. Numerical experiments show that the model can produce configurations of high quality on large data sets that the SDP approach would struggle to cope with.
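For reference, the classical multidimensional scaling baseline that the abstract starts from can be written in a few lines; when the input distances are exactly Euclidean, it recovers the configuration up to a rigid motion, which is the idealized setting the convex model relaxes.

```python
import numpy as np

def classical_mds(D, dim):
    """Classical MDS: double-center the squared distances and embed with the
    top eigenvectors of the resulting Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J             # Gram matrix when D is Euclidean
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:dim]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, 2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_hat))                # exact recovery of the distances
```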
18.
The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.
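The robust weighting ingredient can be sketched with scikit-learn's MCD estimator; the specific downweighting rule below is a hypothetical placeholder, not the weighting scheme of the WCANCOR paper.

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:10] += 8.0                               # inject gross outliers

# Robust location/scatter via MCD, then squared Mahalanobis distances to the
# robust center; far-out observations get small weights.
mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)
w = np.minimum(1.0, np.quantile(d2, 0.9) / d2)   # hypothetical weighting rule
print("mean weight, outliers:", w[:10].mean())
print("mean weight, clean points:", w[10:].mean())
```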
19.
Existing theories on shape digitization impose strong constraints on admissible shapes, and require error-free data. Consequently, these theories are not applicable to most real-world situations. In this paper, we propose a new approach that overcomes many of these limitations. It assumes that segmentation algorithms represent the detected boundary by a set of points whose deviation from the true contours is bounded. Given these error bounds, we reconstruct boundary connectivity by means of Delaunay triangulation and α-shapes. We prove that this procedure is guaranteed to result in topologically correct image segmentations under certain realistic conditions. Experiments on real and synthetic images demonstrate the good performance of the new method and confirm the predictions of our theory.
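A small sketch of the Delaunay/α-shape step (our illustration with an arbitrary circumradius threshold, not the paper's certified procedure): keep the Delaunay triangles whose circumradius is at most a bound r_max and collect their edges.

```python
import numpy as np
from scipy.spatial import Delaunay

def alpha_edges(points, r_max):
    """Edges of a 2-D alpha-complex: keep Delaunay triangles whose
    circumradius is at most r_max."""
    tri = Delaunay(points)
    keep = set()
    for ia, ib, ic in tri.simplices:
        a, b, c = points[ia], points[ib], points[ic]
        la, lb, lc = (np.linalg.norm(b - c), np.linalg.norm(a - c),
                      np.linalg.norm(a - b))
        area = 0.5 * abs((b - a)[0] * (c - a)[1] - (b - a)[1] * (c - a)[0])
        # Circumradius R = |ab||bc||ca| / (4 * area).
        if area > 0 and la * lb * lc / (4 * area) <= r_max:
            keep.update(tuple(sorted(e)) for e in [(ia, ib), (ib, ic), (ia, ic)])
    return keep

# Noisy samples of a circle: the alpha-complex recovers the boundary edges.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 120, endpoint=False)
pts = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.standard_normal((120, 2))
print(len(alpha_edges(pts, r_max=0.2)))
```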
20.
In this paper, we study an open and nested tandem queueing network, where the population constraint within each subnetwork is controlled by a semaphore queue. The total number of customers that may be present in the subnetwork cannot exceed a given value. Each node has a constant service time, and the arrival process to the queueing network follows an arbitrary distribution. A major characteristic of this queueing network is that the low-layer flow is halted by the state of the high layer. We develop a simple and equivalent queueing network that has the same performance characteristics as the original queueing network. Using this model, the waiting time in the queueing network can be easily derived. It is interesting to see how the simplification process can be applied to multi-layered queueing networks.