Similar Documents
20 similar documents found (search time: 31 ms)
1.
This article investigates likelihood inferences for high-dimensional factor analysis of time series data. We develop a matrix decomposition technique to obtain expressions for the likelihood function and its derivatives. With such expressions, the traditional delta method, which relies heavily on the score function and Hessian matrix, can be extended to high-dimensional cases. We establish asymptotic theories, including consistency and asymptotic normality. Moreover, fast computational algorithms are developed for estimation. Applications to high-dimensional stock price data and portfolio analysis are discussed. The technical proofs of the asymptotic results and the computer codes are available online.

2.
High-dimensional multivariate time series are challenging due to the dependent and high-dimensional nature of the data, but in many applications there is additional structure that can be exploited to reduce computing time along with statistical error. We consider high-dimensional vector autoregressive processes with spatial structure, a simple and common form of additional structure. We propose novel high-dimensional methods that take advantage of such structure without making model assumptions about how distance affects dependence. We provide nonasymptotic bounds on the statistical error of parameter estimators in high-dimensional settings and show that the proposed approach reduces the statistical error. An application to air pollution in the USA demonstrates that the estimation approach reduces both computing time and prediction error and gives rise to results that are meaningful from a scientific point of view, in contrast to high-dimensional methods that ignore spatial structure. In practice, these high-dimensional methods can be used to decompose high-dimensional multivariate time series into lower-dimensional multivariate time series that can be studied by other methods in more depth. Supplementary materials for this article are available online.

3.
In this paper, we consider a scale-adjusted, distance-based classifier for high-dimensional data. We first give a classifier of this type that ensures high accuracy in misclassification rates for two-class classification. We show that the classifier is not only consistent but also asymptotically normal for high-dimensional data. We provide a sample size determination so that misclassification rates are no more than a prespecified value. We propose a classification procedure called the misclassification rate adjusted classifier. We further extend the classifier to multiclass classification. We show that the classifier can still enjoy asymptotic properties and ensure high accuracy in misclassification rates for multiclass classification. Finally, we demonstrate the proposed classifier in actual data analyses by using a microarray data set.

4.
We focus on inference about high-dimensional mean vectors when the sample size is much smaller than the dimension. Such data situations occur in many areas of modern science, such as genetic microarrays, medical imaging, text recognition, finance, and chemometrics. First, we give a given-radius confidence region for mean vectors. This inference can be utilized for variable selection in high-dimensional data. Next, we give a given-width confidence interval for the squared norm of mean vectors. This inference can be utilized in a classification procedure for high-dimensional data. In order to assure a prespecified coverage probability, we propose a two-stage estimation methodology and determine the required sample size for each inference. Finally, we demonstrate how the new methodologies perform by using a microarray data set.

5.
To extract information from high-dimensional data efficiently, visualization tools based on data projection methods have been developed and shown useful. However, a single two-dimensional visualization is often insufficient for capturing all or most interesting structures in complex high-dimensional datasets. For this reason, Tipping and Bishop developed mixture probabilistic principal component analysis (MPPCA) that separates data into multiple groups and enables a unique projection per group; that is, one probabilistic principal component analysis (PPCA) data visualization per group. Because the group labels are assigned to observations based on their high-dimensional coordinates, MPPCA works well to reveal homoscedastic structures in data that differ spatially. In the presence of heteroscedasticity, however, MPPCA may still mask noteworthy data structures. We propose a new method called covariance-guided MPPCA (C-MPPCA) that groups subsets of observations based on covariance, not locality, and, similar to MPPCA, displays them using PPCA. PPCA projects data in the dimensions with the highest variances, thus grouping by covariance makes sense and enables some data structures to be visible that were masked originally by MPPCA. We demonstrate the performance of C-MPPCA in an extensive simulation study. We also apply C-MPPCA to a real world dataset. Supplementary materials for this article are available online.
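As a rough illustration of the projection step this abstract builds on — plain PCA rather than the authors' C-MPPCA, with invented data — one can project a group of observations onto its two directions of highest variance:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto the k directions of highest variance."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :k]               # k leading eigenvectors
    return Xc @ top                          # n x k low-dimensional view

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))               # toy high-dimensional data
Y = pca_project(X)
print(Y.shape)  # (100, 2)
```

MPPCA and C-MPPCA differ in how observations are grouped before such a projection is fitted per group; the projection itself is the part sketched here.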

6.
We present a short selective review of causal inference from observational data, with a particular emphasis on the high-dimensional scenario where the number of measured variables may be much larger than sample size. Despite major identifiability problems, making causal inference from observational data very ill-posed, we outline a methodology providing useful bounds for causal effects. Furthermore, we discuss open problems in optimization, non-linear estimation and for assigning statistical measures of uncertainty, and we illustrate the benefits and limitations of high-dimensional causal inference for biological applications.

7.
We propose a model selection algorithm for high-dimensional clustered data. Our algorithm combines a classical penalized likelihood method with a composite likelihood approach in the framework of colored graphical Gaussian models. Our method is designed to identify high-dimensional dense networks with a large number of edges but sparse edge classes. Its empirical performance is demonstrated through simulation studies and a network analysis of a gene expression dataset.

8.
We propose a rudimentary taxonomy of interactive data visualization based on a triad of data analytic tasks: finding Gestalt, posing queries, and making comparisons. These tasks are supported by three classes of interactive view manipulations: focusing, linking, and arranging views. This discussion extends earlier work on the principles of focusing and linking and sets them on a firmer base. Next, we give a high-level introduction to a particular system for multivariate data visualization—XGobi. This introduction is not comprehensive but emphasizes XGobi tools that are examples of focusing, linking, and arranging views; namely, high-dimensional projections, linked scatterplot brushing, and matrices of conditional plots. Finally, in a series of case studies in data visualization, we show the powers and limitations of particular focusing, linking, and arranging tools. The discussion is dominated by high-dimensional projections that form an extremely well-developed part of XGobi. Of particular interest are the illustration of asymptotic normality of high-dimensional projections (a theorem of Diaconis and Freedman), the use of high-dimensional cubes for visualizing factorial experiments, and a method for interactively generating matrices of conditional plots with high-dimensional projections. Although there is a unifying theme to this article, each section—in particular the case studies—can be read separately.

9.
We consider asymptotic distributions of maximum deviations of sample covariance matrices, a fundamental problem in high-dimensional inference of covariances. Under mild dependence conditions on the entries of the data matrices, we establish the Gumbel convergence of the maximum deviations. Our result substantially generalizes earlier ones where the entries are assumed to be independent and identically distributed, and it provides a theoretical foundation for high-dimensional simultaneous inference of covariances.
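A minimal sketch of the statistic in question — the maximum entrywise deviation of a sample covariance matrix from a reference covariance; the paper's exact normalization may differ, and the data here are invented:

```python
import numpy as np

def max_cov_deviation(X, sigma):
    """Maximum absolute entrywise deviation of the sample
    covariance of X from a reference covariance matrix sigma."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                    # sample covariance
    return np.abs(S - sigma).max()

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))           # i.i.d. N(0, I) data
dev = max_cov_deviation(X, np.eye(20))
print(dev < 0.5)                         # deviations shrink as n grows
```

The paper's contribution is the Gumbel limit law for (a suitably normalized version of) this maximum as both n and the dimension grow.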

10.
We present a technique for clustering categorical data by generating many dissimilarity matrices and combining them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. We show through simulations and examples that our method is both more accurate and more stable. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context, we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give an example to show that our method continues to provide useful results, in particular, providing a comparison with phylogenetic trees. Supplementary material for this article is available online.
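The ensembling idea can be sketched as follows — averaging Hamming dissimilarities over random subsets of the categorical variables. This is not the authors' exact generation scheme, just an illustration of combining many dissimilarity matrices:

```python
import numpy as np

def ensemble_dissimilarity(X, n_draws=50, subset_size=3, seed=0):
    """Average Hamming dissimilarity over random subsets of
    the categorical variables (columns of X)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    D = np.zeros((n, n))
    for _ in range(n_draws):
        cols = rng.choice(p, size=subset_size, replace=False)
        sub = X[:, cols]
        # pairwise mismatch rate on the chosen columns
        D += (sub[:, None, :] != sub[None, :, :]).mean(axis=2)
    return D / n_draws

X = np.array([["a", "x"], ["a", "x"], ["b", "y"]])
D = ensemble_dissimilarity(X, subset_size=2)
print(D[0, 1], D[0, 2])  # 0.0 1.0
```

The combined matrix D can then be fed to any clustering method that accepts a dissimilarity matrix.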

11.
We present a very fast algorithm for general matrix factorization of a data matrix for use in the statistical analysis of high-dimensional data via latent factors. Such data are prevalent across many application areas and generate an ever-increasing demand for methods of dimension reduction in order to undertake the statistical analysis of interest. Our algorithm uses a gradient-based approach which can be used with an arbitrary loss function provided the latter is differentiable. The speed and effectiveness of our algorithm for dimension reduction is demonstrated in the context of supervised classification of some real high-dimensional data sets from the bioinformatics literature.

12.
Distance weighted discrimination (DWD) was originally proposed to handle the data piling issue in the support vector machine. In this article, we consider the sparse penalized DWD for high-dimensional classification. The state-of-the-art algorithm for solving the standard DWD is based on second-order cone programming; however, such an algorithm does not work well for the sparse penalized DWD with high-dimensional data. To overcome this computational difficulty, we develop a very efficient algorithm to compute the solution path of the sparse DWD at a given fine grid of regularization parameters. We implement the algorithm in a publicly available R package sdwd. We conduct extensive numerical experiments to demonstrate the computational efficiency and classification performance of our method.

13.
Similarity Measurement for Data in High-Dimensional Spaces   Cited by: 5 (self-citations: 0, citations by others: 5)
Measuring the similarity between data points in high-dimensional spaces is an important problem currently facing data mining, information processing, and information retrieval. Building on a summary and analysis of the characteristics of high-dimensional data and of existing similarity measures, this article proposes a new measure that first partitions the original data space into a grid before measuring the similarity of the high-dimensional data. The article concludes with a quantitative analysis of the measure's effectiveness; experiments show that the measure is effective.
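The abstract does not specify the measure itself, but the grid-partition idea can be sketched as follows — a hypothetical similarity that counts how many coordinates of two points fall into the same grid cell:

```python
import numpy as np

def grid_similarity(x, y, cell=0.5):
    """Fraction of dimensions in which x and y land in the same
    grid cell after partitioning each axis with width `cell`.
    (A sketch only; the paper's actual measure is not given here.)"""
    gx = np.floor(np.asarray(x) / cell)
    gy = np.floor(np.asarray(y) / cell)
    return float((gx == gy).mean())

print(grid_similarity([0.1, 0.9, 2.3], [0.2, 0.6, 2.9]))
# 2 of 3 coordinates share a cell
```

Discretizing onto a grid before comparing coordinates is one common way to blunt the tendency of pointwise distances to concentrate in high dimensions.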

14.
This paper gives a review of concentration inequalities, which are widely employed in non-asymptotic analyses in mathematical statistics across a wide range of settings: from distribution-free to distribution-dependent, from sub-Gaussian to sub-exponential, sub-Gamma, and sub-Weibull random variables, and from the mean to the maximum concentration. This review provides results in these settings with some fresh new results. Given the increasing popularity of high-dimensional data and inference, results in the context of high-dimensional linear and Poisson regressions are also provided. We aim to illustrate the concentration inequalities with known constants and to improve existing bounds with sharper constants.
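For instance, Hoeffding's inequality — one of the distribution-free bounds such reviews cover — states that for n independent samples bounded in [0, 1], P(|X̄ − μ| ≥ t) ≤ 2·exp(−2nt²). A quick Monte Carlo check (simulation settings invented):

```python
import numpy as np

def hoeffding_bound(n, t):
    """Hoeffding's upper bound on P(|mean - mu| >= t)
    for n independent samples bounded in [0, 1]."""
    return 2 * np.exp(-2 * n * t**2)

rng = np.random.default_rng(2)
n, t, reps = 200, 0.1, 2000
samples = rng.uniform(size=(reps, n))            # Uniform(0, 1), mean 0.5
deviated = np.abs(samples.mean(axis=1) - 0.5) >= t
print(deviated.mean() <= hoeffding_bound(n, t))  # empirical freq under bound
```

The empirical deviation frequency sits far below the bound here, which is exactly the "known constants may be loose" point the review aims at.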

15.
Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The computational challenge is due to the high-dimensional random effects and because Markov chain Monte Carlo (MCMC) algorithms for these models tend to be slow mixing. Moreover, spatial confounding inflates the variance of fixed effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well and the reduced-dimensional setting speeds up computations per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well both in terms of inference as well as prediction; our methods also compare favorably to existing “reduced-rank” approaches. We also apply our methods to two real world data examples, one on bird count data and the other classifying rock types. Supplementary material for this article is available online.
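The core random-projection idea — not the authors' full SGLMM machinery — can be sketched with a Gaussian projection matrix that maps a high-dimensional vector to a much shorter one while approximately preserving its norm (all names and sizes here are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 2000, 100                     # original and reduced dimensions
w = rng.normal(size=n)               # stand-in high-dimensional random effect

# Gaussian random projection, scaled so squared norms
# are preserved in expectation
P = rng.normal(size=(m, n)) / np.sqrt(m)
w_low = P @ w                        # reduced-dimensional representation

ratio = np.linalg.norm(w_low) / np.linalg.norm(w)
print(abs(ratio - 1.0) < 0.3)        # norm approximately preserved
```

Replacing the n-dimensional random effect with the m-dimensional projection is what makes each MCMC iteration cheaper in approaches of this kind.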

16.
We numerically investigate the ability of a statistic to detect determinism in time series generated by high-dimensional continuous chaotic systems. This recently introduced statistic (denoted VE2) is derived from the averaged false nearest neighbors method for analyzing data. Using surrogate data tests, we show that the proposed statistic is able to discriminate high-dimensional chaotic data from their stochastic counterparts. By analyzing the effect of the length of the available data, we show that the proposed criterion is efficient for relatively short time series. Finally, we apply the method to real-world data from biomechanics, namely postural sway time series. In this case, the results led us to exclude the hypothesis of nonlinear deterministic underlying dynamics for the observed phenomena.

18.
To address the traditional DBSCAN algorithm's poor clustering performance on high-dimensional datasets and its sensitivity to parameter selection, we propose an improved DBSCAN algorithm based on a new similarity measure. The algorithm constructs a similarity matrix between data points from geodesic distance and shared nearest neighbors, overcoming the limitations of Euclidean distance for high-dimensional data and better capturing the true structure of the dataset. The Eps and MinPts parameters are determined adaptively by analyzing the distribution characteristics of the data. Experimental results show that the proposed GS-DBSCAN algorithm can effectively cluster data with complex distributions and achieves higher clustering accuracy on high-dimensional data than the compared algorithms, verifying the algorithm's accuracy and feasibility.
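The shared-nearest-neighbor component can be sketched as follows — the geodesic-distance part and the adaptive parameter selection are omitted, and the toy data are invented:

```python
import numpy as np

def snn_similarity(X, k=2):
    """Shared-nearest-neighbor similarity: the number of
    k-nearest neighbors two points have in common."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude each point itself
    knn = np.argsort(d, axis=1)[:, :k]     # indices of k nearest neighbors
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            S[i, j] = len(set(knn[i]) & set(knn[j]))
    return S

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = snn_similarity(X, k=2)
print(S[0, 1] > S[0, 3])  # nearby points share more neighbors
```

A matrix like S (combined with geodesic rather than Euclidean distance, as the abstract describes) can then replace the raw distance in the DBSCAN neighborhood queries.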

19.
Risk management technology applied to high-dimensional portfolios needs simple and fast methods for the calculation of value at risk (VaR). The multivariate normal framework provides a simple off-the-shelf methodology but lacks the heavy-tailed distributional properties that are observed in data. A principal-component-based method (tied closely to the elliptical structure of the distribution) is therefore expected to be unsatisfactory. Here, we propose and analyze a technology that is based on independent component analysis (ICA). We study the proposed ICVaR methodology in an extensive simulation study and apply it to a high-dimensional portfolio situation. Our analysis yields very accurate VaRs.
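The heavy-tail point can be illustrated with a much simpler estimator than the paper's ICVaR — plain historical-simulation VaR on simulated Student-t returns, compared against the Gaussian approximation (all numbers invented):

```python
import numpy as np

def empirical_var(returns, level=0.99):
    """Value at risk as the empirical loss quantile
    (historical simulation, not the paper's ICA approach)."""
    losses = -np.asarray(returns)
    return np.quantile(losses, level)

rng = np.random.default_rng(4)
# heavy-tailed portfolio returns: Student-t with 3 degrees of freedom
returns = rng.standard_t(df=3, size=100_000) * 0.01

var99 = empirical_var(returns, 0.99)
normal_var99 = 2.326 * np.sqrt(3) * 0.01   # Gaussian VaR at the true t(3) sd
print(var99 > normal_var99)                # heavy tails raise the true VaR
```

Underestimation of this kind is exactly why the multivariate normal framework is criticized in the abstract.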

20.
We introduce the projection pursuit dynamic clustering model into the evaluation of real estate investment environments. For the multi-factor, high-dimensional complexity of such evaluations, the model projects high-dimensional data onto low-dimensional data via projection vectors determined entirely by the characteristics of the sample data, while simultaneously ranking and automatically clustering the low-dimensional data, so that the high-dimensional data can be studied through its low-dimensional representation. Finally, a case study evaluating the industrial real estate investment environment of Liaoning Province verifies the model's applicability to real estate investment environment evaluation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号