Similar Literature (20 results)
1.
In this article, we propose a new estimation methodology to deal with PCA for high-dimension, low-sample-size (HDLSS) data. We first show that HDLSS datasets have different geometric representations depending on whether a ρ-mixing-type dependency appears in the variables or not. When the ρ-mixing-type dependency appears, the HDLSS data converge to an n-dimensional surface of the unit sphere as the dimension increases. We pay special attention to this phenomenon. We propose a method called the noise-reduction methodology to estimate the eigenvalues of a HDLSS dataset. We show that the eigenvalue estimator enjoys consistency properties and derive its limiting distribution in the HDLSS context. We consider consistency properties of PC directions and apply the noise-reduction methodology to estimating PC scores. We also give an application to discriminant analysis for HDLSS datasets, using the inverse covariance matrix estimator induced by the noise-reduction methodology.
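As a rough illustration of the noise-reduction idea, the sketch below (Python) corrects each eigenvalue of the n x n dual sample covariance matrix by the average of the remaining noise eigenvalues; the exact estimator form is reproduced from memory of Yata and Aoshima's work and should be checked against the article.

```python
import numpy as np

def noise_reduction_eigenvalues(X):
    """Noise-reduction eigenvalue estimates for a d x n HDLSS data matrix X.

    Sketch of lambda_tilde_j = lambda_hat_j
      - (tr(S_D) - sum_{i<=j} lambda_hat_i) / (n - 1 - j),
    where the lambda_hat_j are eigenvalues of the n x n dual sample covariance.
    """
    d, n = X.shape
    Y = X - X.mean(axis=1, keepdims=True)        # center each variable
    S_D = Y.T @ Y / (n - 1)                      # n x n dual covariance matrix
    lam = np.linalg.eigvalsh(S_D)[::-1]          # eigenvalues, descending
    trace = lam.sum()
    out = np.empty(n - 2)
    for j in range(n - 2):                       # 0-based j corresponds to 1-based j+1
        out[j] = lam[j] - (trace - lam[:j + 1].sum()) / (n - 2 - j)
    return out
```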

2.
In this paper, we propose a new methodology to deal with PCA in high-dimension, low-sample-size (HDLSS) data situations. The idea is to estimate eigenvalues via the singular values of a cross data matrix. We provide consistency properties of the eigenvalue estimation as well as its limiting distribution when the dimension d and the sample size n both grow to infinity in such a way that n is much smaller than d. We apply the new methodology to estimating PC directions and PC scores in HDLSS data situations, and we apply the findings to a mixture model for classifying a dataset into two clusters. We demonstrate how the new methodology performs using HDLSS data from a microarray study of prostate cancer.
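A minimal sketch of the cross-data-matrix idea (Python); the even split of the sample and the normalization below are our assumptions about the construction, not necessarily the paper's exact recipe.

```python
import numpy as np

def cdm_eigenvalues(X):
    """Eigenvalue estimates via singular values of a cross data matrix.

    X is d x n. Split the sample into two halves, center each half,
    and use the singular values of the n1 x n2 cross product matrix.
    """
    d, n = X.shape
    n1 = n // 2
    n2 = n - n1
    Y1 = X[:, :n1] - X[:, :n1].mean(axis=1, keepdims=True)
    Y2 = X[:, n1:] - X[:, n1:].mean(axis=1, keepdims=True)
    C = Y1.T @ Y2 / np.sqrt((n1 - 1) * (n2 - 1))   # n1 x n2 cross data matrix
    return np.linalg.svd(C, compute_uv=False)      # estimated eigenvalues
```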

3.
Principal component analysis (PCA) is a widely used tool for data analysis and dimension reduction in applications throughout science and engineering. However, the principal components (PCs) can sometimes be difficult to interpret, because they are linear combinations of all the original variables. To facilitate interpretation, sparse PCA produces modified PCs with sparse loadings, i.e. loadings with very few non-zero elements. In this paper, we propose a new sparse PCA method, namely sparse PCA via regularized SVD (sPCA-rSVD). We use the connection of PCA with the singular value decomposition (SVD) of the data matrix and extract the PCs by solving a low-rank matrix approximation problem. Regularization penalties are introduced into the corresponding minimization problem to promote sparsity in the PC loadings. An efficient iterative algorithm is proposed for computation, and two tuning-parameter selection methods are discussed. Some theoretical results are established to justify the use of sPCA-rSVD when only the data covariance matrix is available. In addition, we give a modified definition of the variance explained by the sparse PCs. sPCA-rSVD provides a uniform treatment of both classical multivariate data and high-dimension, low-sample-size (HDLSS) data. Further understanding of sPCA-rSVD and some existing alternatives is gained through simulation studies and real data examples, which suggest that sPCA-rSVD provides competitive results.
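The core of the method can be sketched as a rank-one alternating algorithm with soft thresholding; the sketch below omits tuning-parameter selection and the deflation scheme used to extract further components.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft thresholding, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def spca_rsvd_rank1(X, lam, n_iter=100):
    """Rank-one sparse PCA via regularized SVD: approximately minimize
    ||X - u v^T||_F^2 + penalty(v) by alternating over u and v."""
    u = np.linalg.svd(X)[0][:, 0]          # start from leading left singular vector
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam)   # sparse loading update
        Xv = X @ v
        norm = np.linalg.norm(Xv)
        if norm == 0:                      # penalty killed every loading
            break
        u = Xv / norm                      # unit-norm update for u
    nv = np.linalg.norm(v)
    return v / nv if nv > 0 else v         # normalized sparse loading vector
```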

4.
Generalised varying-coefficient models (GVC) are an important class of models, and a considerable body of literature addresses them. However, most of the existing work is devoted to estimation procedures. In this paper, we systematically investigate statistical inference for GVC, including confidence bands as well as hypothesis tests. We establish the asymptotic distribution of the maximum discrepancy between the estimated functional coefficient and the true functional coefficient, and we compare different approaches for constructing confidence bands and hypothesis tests. Finally, the proposed inference methods are used to analyse contraceptive-use data from China, which leads to some interesting findings.

5.
High-dimension, low-sample-size (HDLSS) data are becoming increasingly common in statistical applications. When the data can be partitioned into two classes, a basic task is to construct a classifier that can assign objects to the correct class. Binary linear classifiers have been shown to be especially useful in HDLSS settings and preferable to more complicated classifiers because of their ease of interpretability. We propose a computational tool called direction-projection-permutation (DiProPerm), which rigorously assesses whether a binary linear classifier is detecting statistically significant differences between two high-dimensional distributions. The basic idea behind DiProPerm involves working directly with the one-dimensional projections of the data induced by the binary linear classifier. Theoretical properties of DiProPerm are studied under the HDLSS asymptotic regime, in which the dimension diverges to infinity while the sample size remains fixed. We show that certain variations of DiProPerm are consistent and that consistency is a nontrivial property of tests in this regime. The practical utility of DiProPerm is demonstrated on HDLSS gene expression microarray datasets. Finally, an empirical power study compares DiProPerm with several alternative two-sample HDLSS tests to understand the advantages and disadvantages of each method.
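A bare-bones sketch of the direction-projection-permutation idea, using the mean-difference direction as the binary linear classifier and the difference of projected class means as the univariate statistic; the paper studies several choices of direction and statistic.

```python
import numpy as np

def diproperm_pvalue(X1, X2, n_perm=1000, rng=None):
    """DiProPerm-style test: project onto a linear classifier direction,
    compute a univariate two-sample statistic, and calibrate by permutation.
    The direction is refit on every permuted relabelling."""
    rng = np.random.default_rng(rng)

    def stat(A, B):
        w = A.mean(axis=0) - B.mean(axis=0)      # mean-difference direction
        w /= np.linalg.norm(w)
        pa, pb = A @ w, B @ w                    # one-dimensional projections
        return abs(pa.mean() - pb.mean())        # mean-difference statistic

    obs = stat(X1, X2)
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)
```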

6.
Rates of convergence for minimum contrast estimators
Summary. We present a general study of minimum contrast estimators in a nonparametric setting (although our results are also valid in the classical parametric case) for independent observations. These estimators include many of the most popular estimators in various situations, such as maximum likelihood estimators, least squares and other estimators of the regression function, and estimators for mixture models or deconvolution. The main theorem relates the rate of convergence of these estimators to the entropy structure of the parameter space. Optimal rates depending on entropy conditions are already known, at least for some of the models involved, and they agree with what we obtain for minimum contrast estimators as long as the entropy counts are not too large. But under some circumstances (large entropies or changes in the entropy structure due to local perturbations), the resulting rates are only suboptimal. Counterexamples are constructed which show that this phenomenon is real for nonparametric maximum likelihood or regression. This proves that, under purely metric assumptions, our theorem is optimal and that minimum contrast estimators can indeed be suboptimal.

7.
The censored single-index model provides a flexible way to model the association between a response and a set of predictor variables when the response variable is randomly censored and the link function is unknown. It offers a technique for “dimension reduction” in semiparametric censored regression models and generalizes the existing accelerated failure time models for survival analysis. This paper proposes two methods for the estimation of single-index models with randomly censored samples. We first transform the censored data into unbiased synthetic data, or pseudo-responses, and then obtain estimates of the index coefficients by the rOPG or rMAVE procedures of Xia (2006) [1]. Finally, we estimate the unknown nonparametric link function using techniques for univariate censored nonparametric regression. The estimators for the index coefficients are shown to be root-n consistent and asymptotically normal. In addition, the estimator for the unknown regression function is a local linear kernel regression estimator that attains the same efficiency as if the index coefficients were known. Monte Carlo simulations are conducted to illustrate the proposed methodologies.
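For concreteness, one standard unbiased transformation of this kind is the Koul-Susarla-Van Ryzin construction, which inflates uncensored responses by the Kaplan-Meier estimate of the censoring survival function. Whether this is the exact transformation used in the paper is our assumption, and ties between observation times are ignored for simplicity.

```python
import numpy as np

def synthetic_responses(y, delta):
    """Koul-Susarla-Van Ryzin pseudo-responses Y* = delta * Y / G_hat(Y-),
    where G_hat is the Kaplan-Meier estimate of the censoring survival
    function P(C > t). Assumes no tied observation times."""
    order = np.argsort(y)
    y_s, d_s = y[order], delta[order]
    n = len(y)
    at_risk = n - np.arange(n)                 # at risk just before each ordered time
    # censoring "events" are delta == 0; uncensored points leave G_hat unchanged
    factors = np.where(d_s == 0, 1.0 - 1.0 / at_risk, 1.0)
    G = np.cumprod(factors)
    # at an uncensored time G_hat has no jump, so G_hat(Y-) = G_hat(Y) there
    y_star = np.zeros(n)
    y_star[order] = np.where(d_s == 1, y_s / G, 0.0)
    return y_star
```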

8.
Semiparametric models describing the functional relationship between k groups of observations are broadly applied in statistical analysis, ranging from nonparametric ANOVA to proportional hazard (ph) rate models in survival analysis. In this paper we deal with the empirical assessment of the validity of such a model, which we denote a “structural relationship model”. To this end, Hadamard differentiability of a suitable goodness-of-fit measure in the k-sample case is proved. This yields asymptotic limit laws, which are applied to construct tests for various semiparametric models, including the Cox ph model. Two types of asymptotics are obtained: first when the hypothesis of the semiparametric model under investigation holds true, and second when a fixed alternative is present. The latter result can be used to validate the presence of a semiparametric model instead of simply checking the null hypothesis “the model holds true”. Finally, various bootstrap approximations are numerically investigated and a data example is analyzed.

9.
Given the information entropy of a discrete, finite random variable, recovering a corresponding probability distribution requires solving a multivariate nonlinear equation. This paper proposes an algorithm that reduces the n-variable entropy equation to at most (n-1) univariate nonlinear equations, proves the correctness of the algorithm, and gives an error estimate for it. Using this entropy-equation solver, a text digital watermarking scheme based on information entropy is designed.
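The univariate base case of such a reduction is easy to illustrate: for n = 2 the entropy equation is already a single nonlinear equation in one unknown and can be solved by bracketing (a sketch, not the paper's algorithm itself).

```python
import numpy as np
from scipy.optimize import brentq

def binary_entropy(p):
    """Shannon entropy (bits) of the distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def solve_binary_entropy(h):
    """Find p in (0, 1/2] with H(p, 1-p) = h, for 0 < h <= 1.
    H is strictly increasing on (0, 1/2], so the root is unique."""
    return brentq(lambda p: binary_entropy(p) - h, 1e-12, 0.5)

# example: entropy 0.5 bits corresponds to p close to 0.11
print(solve_binary_entropy(0.5))
```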

10.
Summary. We are concerned with a well-known characterization of the Shannon entropy by Faddeev, suitably re-examined in the framework of Ulam-Hyers “stability” of functional equations. By using some results on number-theoretic functions, we give a sufficient condition under which the solutions of a suitable system of countably many functional inequalities approximate the Shannon entropy uniformly.
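For reference, Faddeev's characterization (as we recall it) states that symmetry, continuity in the probabilities, the normalization H_2(1/2, 1/2) = 1, and the branching property below together force H_n to be the Shannon entropy:

```latex
H_n(p_1,\dots,p_n) \;=\; H_{n-1}(p_1+p_2,\,p_3,\dots,p_n)
  \;+\; (p_1+p_2)\, H_2\!\left(\frac{p_1}{p_1+p_2},\,\frac{p_2}{p_1+p_2}\right),
\qquad p_1+p_2>0 .
```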

11.
This paper presents a fuzzy algorithm for controlling chaos in nonlinear systems via a minimum entropy approach. The proposed fuzzy logic algorithm is used to minimize the Shannon entropy of a chaotic dynamics. The fuzzy laws are determined in such a way that the entropy function descends until the chaotic trajectory of the system is replaced by a regular one. The logistic and Hénon maps, as two discrete chaotic systems, and the Duffing equation, as a continuous one, are used to validate the proposed scheme and show the effectiveness of the control method for chaotic dynamical systems.
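The objective such a controller drives downward can be illustrated as the Shannon entropy of the occupancy histogram of a trajectory; the sketch below computes it for the logistic map (the fuzzy control laws themselves are beyond a short sketch).

```python
import numpy as np

def trajectory_entropy(r, x0=0.3, n_steps=5000, n_bins=50, burn=500):
    """Shannon entropy of the histogram of a logistic-map trajectory
    x_{k+1} = r * x_k * (1 - x_k). Regular motion concentrates the
    histogram on a few bins and therefore yields low entropy."""
    x = x0
    samples = []
    for k in range(n_steps):
        x = r * x * (1 - x)
        if k >= burn:                       # discard transient
            samples.append(x)
    counts, _ = np.histogram(samples, bins=n_bins, range=(0, 1))
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

print(trajectory_entropy(3.2))   # periodic regime: low entropy
print(trajectory_entropy(4.0))   # chaotic regime: high entropy
```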

12.
Semiparametric models with both nonparametric and parametric components have become increasingly useful in many scientific fields, due to their appropriate representation of the trade-off between flexibility and efficiency of statistical models. In this paper we focus on semi-varying coefficient models (a.k.a. varying-coefficient partially linear models) in a “large n, diverging p” situation, where the numbers of parametric and nonparametric components both diverge at appropriate rates; we only consider the case p = o(n). Consistency of the estimator based on B-splines and asymptotic normality of the linear components are established under suitable assumptions. Interestingly (although not surprisingly), our analysis shows that the number of parametric components can diverge at a faster rate than the number of nonparametric components, and that the divergence rate of the nonparametric components constrains the allowable divergence rate of the parametric components; as far as we know, this phenomenon has not been established in the existing literature. Finally, the finite-sample behavior of the estimator is evaluated by some Monte Carlo studies.

13.
The fitting of finite mixture models is an ill-defined estimation problem, as completely different parameterizations can induce similar mixture distributions. This leads to multiple modes in the likelihood, which is a problem for frequentist maximum likelihood estimation and complicates statistical inference from Markov chain Monte Carlo draws in Bayesian estimation. For the analysis of the posterior density of these draws, a suitable separation into different modes is desirable. In addition, a unique labelling of the component-specific estimates is necessary to solve the label switching problem. This paper presents and compares two approaches to achieve these goals: relabelling under multimodality and constrained clustering. The algorithmic details are discussed, and their application is demonstrated on artificial and real-world data.
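One relabelling step can be sketched as an assignment problem: permute the component labels of each MCMC draw so that the component means best match a reference draw. This is a simplified version of the relabelling idea; handling multimodality and the constrained-clustering alternative are beyond this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_draws(draws, reference):
    """Relabel each MCMC draw of K scalar component means (draws: M x K)
    by the label permutation minimizing squared distance to the reference."""
    relabelled = np.empty_like(draws)
    for m, draw in enumerate(draws):
        # cost[i, j] = squared distance from reference component i to draw component j
        cost = (reference[:, None] - draw[None, :]) ** 2
        _, cols = linear_sum_assignment(cost)   # optimal label permutation
        relabelled[m] = draw[cols]
    return relabelled
```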

14.
This paper extends the results in Li and Loken [A unified theory of statistical analysis and inference for variance component models for dyadic data, Statist. Sinica 12 (2002) 519-535] on the statistical analysis of measurements taken on dyads to situations in which more than one attribute is measured on each dyad. Starting from the covariance structure for the univariate case obtained in Li and Loken (2002), the covariance structure for the multivariate case is derived based on the group symmetry induced by the assumed exchangeability of the units. Our primary objective is to document the Gaussian likelihood and the sufficient statistics for multivariate dyadic data in closed form, so that they can be referenced by researchers as they analyze such data. The derivation can also serve as an example of the multivariate extension of univariate models based on exchangeability.

15.
Data are often affected by uncertainty, which is usually identified with randomness. Nonetheless, other sources of uncertainty may occur; in particular, the empirical information may also be affected by imprecision. In these cases too, it can be fruitful to analyze the underlying structure of the data. In this paper we address the problem of summarizing a sample of three-way imprecise data. In order to manage the different sources of uncertainty, a twofold strategy is adopted. On the one hand, imprecise data are transformed into fuzzy sets by means of the so-called fuzzification process. The resulting fuzzy data are then analyzed by suitable generalizations of the Tucker3 and CANDECOMP/PARAFAC models, which are the two most popular three-way extensions of principal component analysis. On the other hand, the statistical validity of the obtained underlying structure is evaluated by (nonparametric) bootstrapping. A simulation experiment is performed to assess whether the use of fuzzy data is helpful for summarizing three-way uncertain data. Finally, to show how our models work in practice, an application to real data is discussed.

16.
Measures of uncertainty in past and residual lifetime distributions have been proposed in the information-theoretic literature. Recently, Di Crescenzo and Longobardi (2006) introduced weighted differential entropy and its dynamic versions; these information-theoretic uncertainty measures are shift-dependent. In this paper, we study the weighted differential information measure for two-sided (doubly) truncated random variables. This new measure is a generalization of recent dynamic weighted entropy measures. We study various properties of this measure, including its connection with weighted residual and past entropies, and we obtain its upper and lower bounds.
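In the notation of the abstract, one plausible form of the weighted differential entropy of a random variable X doubly truncated to (t_1, t_2) is the following; this is our reconstruction, consistent with Di Crescenzo and Longobardi's shift-dependent measure, and the paper should be consulted for the exact definition:

```latex
H^{w}(t_1,t_2) \;=\; -\int_{t_1}^{t_2} x\,
  \frac{f(x)}{F(t_2)-F(t_1)}\,
  \log\frac{f(x)}{F(t_2)-F(t_1)}\,dx ,
```

where f and F are the density and distribution function of X; the weight x is what makes the measure shift-dependent.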

17.
Quantifying Dynamical Predictability: the Pseudo-Ensemble Approach
The ensemble technique has been widely used in numerical weather prediction and extended-range forecasting. Current approaches to evaluating predictability with the ensemble technique can be divided into two major groups. One is dynamical: generating Lyapunov vectors, bred vectors, and singular vectors, sampling the fastest error-growing directions of the phase space, and examining the dependence of prediction efficiency on ensemble size. The other is statistical: distributional analysis and quantifying prediction utility by the Shannon entropy and the relative entropy. Currently, with simple models, one can choose as many ensembles as desired, each containing a large number of members. As forecast models become increasingly complicated, however, one can only afford a small number of ensembles, each with a limited number of members, thus sacrificing estimation accuracy of the forecast errors. To uncover connections between different information-theoretic approaches and between dynamical and statistical approaches, we propose a general theoretical framework, based on the (ε, T)-entropy and the scale-dependent Lyapunov exponent, to quantify information loss in ensemble forecasting. More importantly, to greatly expedite computations, reduce data storage, and improve forecasting accuracy, we propose a technique for constructing a large number of "pseudo" ensembles from a single solution or scalar dataset. This pseudo-ensemble technique appears to be applicable under rather general conditions, one important situation being that observational data are available but the exact dynamical model is unknown.
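The paper's construction is not reproduced here, but the generic flavor of building "pseudo" ensembles from a single scalar dataset can be sketched as slicing one long record into many overlapping segments; this is an illustration of the idea only, and the segment length and stride are arbitrary choices.

```python
import numpy as np

def pseudo_ensemble(series, member_len, stride):
    """Cut one scalar record into overlapping segments, each segment
    serving as one 'pseudo' ensemble member."""
    starts = range(0, len(series) - member_len + 1, stride)
    return np.array([series[s:s + member_len] for s in starts])

# example: ~100 members of length 200 from a single record of length 2200
x = np.random.default_rng(0).standard_normal(2200)
members = pseudo_ensemble(x, member_len=200, stride=20)
print(members.shape)   # (101, 200)
```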

18.
We propose an index to measure cooperation among different time series, based on the Rényi entropy of the eigenvalues of the signal correlation matrix and an optimization step. The index can be considered a generalization of a previously known index based on the Shannon entropy. The extension to Rényi entropy and the optimization step allow better use of the information conveyed by the correlation matrix, especially when dealing with a small number of signals.
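Stripped of the optimization step, the core computation is straightforward: take the eigenvalues of the signal correlation matrix, normalize them to a probability vector, and evaluate the Rényi entropy (a sketch; the optimization step described above is omitted).

```python
import numpy as np

def renyi_cooperation_index(signals, alpha=2.0):
    """signals: n_signals x n_samples array. Low entropy of the normalized
    correlation-matrix spectrum means variance is concentrated in few
    directions, i.e. strong cooperation among the signals."""
    R = np.corrcoef(signals)               # signal correlation matrix
    lam = np.linalg.eigvalsh(R)
    p = np.clip(lam, 0, None)
    p = p / p.sum()                        # spectrum as a probability vector
    p = p[p > 0]
    if np.isclose(alpha, 1.0):             # Shannon limit as alpha -> 1
        return -(p * np.log(p)).sum()
    return np.log((p ** alpha).sum()) / (1.0 - alpha)
```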

19.
Linear mixed models and penalized least squares
Linear mixed-effects models are an important class of statistical models that are used directly in many fields of application and also serve as iterative steps in fitting other types of mixed-effects models, such as generalized linear mixed models. The parameters in these models are typically estimated by maximum likelihood or restricted maximum likelihood. In general, there is no closed-form solution for these estimates, and they must be determined by iterative algorithms such as EM iterations or general nonlinear optimization. Many of the intermediate calculations for such iterations have been expressed as generalized least squares problems. We show that an alternative representation as a penalized least squares problem has many advantageous computational properties, including the ability to explicitly evaluate a profiled log-likelihood or log-restricted-likelihood, the gradient and Hessian of this profiled objective, and an ECME update to refine this objective.
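The penalized least squares representation can be sketched as a single augmented ordinary least squares problem in which pseudo-observations shrink the random effects; this is a simplified fixed-variance version, while the paper's contribution lies in profiling the likelihood over this structure.

```python
import numpy as np

def pls_solve(y, X, Z, Delta):
    """Solve min_{beta, b} ||y - X beta - Z b||^2 + ||Delta b||^2
    as one augmented least squares problem, where Delta^T Delta is the
    relative precision matrix of the random effects b."""
    n, p = X.shape
    q = Z.shape[1]
    A = np.block([[Z, X],
                  [Delta, np.zeros((q, p))]])   # append q pseudo-observations
    rhs = np.concatenate([y, np.zeros(q)])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    b, beta = sol[:q], sol[q:]
    return beta, b
```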

20.
This paper shows that multivariate distributions can be characterized as maximum entropy (ME) models based on the well-known general representation of the density function of the ME distribution subject to moment constraints. In this approach, the problem of ME characterization simplifies to the problem of representing the multivariate density in the ME form, so there is no need for case-by-case proofs by the calculus of variations or other methods. The main vehicle for this ME characterization approach is the information distinguishability relationship, which extends to the multivariate case. Results are also formulated that encapsulate implications of the multiplication rule of probability and the entropy transformation formula for ME characterization. The dependence structure of the multivariate ME distribution in terms of the moments and the support of the distribution is studied. The relationships of ME distributions with the exponential family, and with bivariate distributions having exponential-family conditionals, are explored. Applications include new ME characterizations of many bivariate distributions, including some singular distributions.
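The general representation referred to here is the standard one: subject to moment constraints E[T_j(X)] = θ_j, j = 1, ..., m, on a support S, the maximum entropy density takes the exponential form

```latex
f^{*}(\mathbf{x}) \;=\; \exp\!\Big(\lambda_0 + \sum_{j=1}^{m} \lambda_j\, T_j(\mathbf{x})\Big),
\qquad \mathbf{x} \in S ,
```

where the multipliers λ_j are determined by the constraints; a multivariate density is therefore an ME model precisely when it can be written in this form.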
