Similar literature
 20 similar articles found (search time: 15 ms)
1.
High-dimensional data routinely arise in image analysis, genetic experiments, network analysis, and various other research areas. Many such datasets do not correspond to well-studied probability distributions, and in several applications the data cloud prominently displays non-symmetric and non-convex shape features. We propose using spatial quantiles and their generalizations, in particular the projection quantile, for describing, analyzing and conducting inference with multivariate data. Minimal assumptions are made about the nature and shape characteristics of the underlying probability distribution, and we do not require the sample size to be as large as the data dimension. We present theoretical properties of the generalized spatial quantiles, and an algorithm to compute them quickly. Our quantiles may be used to obtain multidimensional confidence or credible regions that are not required to conform to a pre-determined shape. We also propose a new notion of multidimensional order statistics, which may be used to obtain multidimensional outliers. Many of the features revealed using a generalized spatial quantile-based analysis would be missed if the data were shoehorned into a well-known probabilistic configuration.
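
For orientation, the classical spatial (geometric) quantile that such generalizations start from can be written as a minimization problem. The display below is the standard textbook definition, not the projection quantile proposed in the abstract; the symbols $Q_n$, $u$, $q$ and $d$ are notation assumed here for illustration.

```latex
% Sample spatial quantile indexed by a vector u in the open unit ball of R^d.
% Taking u = 0 gives the spatial median.
Q_n(u) \;=\; \operatorname*{arg\,min}_{q \in \mathbb{R}^d}
  \sum_{i=1}^{n} \Bigl\{ \lVert X_i - q \rVert + \langle u,\, X_i - q \rangle \Bigr\},
\qquad \lVert u \rVert < 1 .
```

The direction of $u$ selects the direction in which the quantile lies relative to the center of the data cloud, and $\lVert u \rVert$ controls how extreme it is.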

2.
In this paper, a fixed design regression model in which the errors follow a strictly stationary process is considered. In this model the conditional mean function and the conditional variance function are unknown curves. The errors are assumed to be correlated, and observations of the response variable may be missing. Four nonparametric estimators of the conditional variance function based on local polynomial fitting are proposed. Expressions for the asymptotic bias and variance of these estimators are obtained. A simulation study illustrates the behavior of the proposed estimators.

3.
A class of discriminant rules which includes Fisher’s linear discriminant function and the likelihood ratio criterion is defined. Using asymptotic expansions of the distributions of the discriminant functions in this class, we derive a formula for cut-off points which satisfy some conditions on misclassification probabilities, and derive the optimal rules for some criteria. Some numerical experiments are carried out to examine the performance of the optimal rules for finite numbers of samples.

4.
PRIM analysis     
This paper analyzes a data mining/bump hunting technique known as PRIM [1]. PRIM finds regions in high-dimensional input space with large values of a real output variable. This paper provides the first thorough study of the statistical properties of PRIM. Among other things, we characterize the output regions PRIM produces, and derive rates of convergence for these regions. Since the dimension of the input variables is allowed to grow with the sample size, the presented results provide some insight into the qualitative behavior of PRIM in very high dimensions. Our investigations also reveal some shortcomings of PRIM, resulting in some proposals for modifications.
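
To make the idea of "finding regions with large output values" concrete, here is a minimal sketch of the top-down peeling step usually associated with PRIM. It is a simplification under stated assumptions: the function name `prim_peel`, the parameters `alpha` and `min_support`, and the toy data are all hypothetical, ties are ignored, and the pasting step of the full procedure is omitted.

```python
import random


def prim_peel(X, y, alpha=0.1, min_support=0.1):
    """Greedy peeling: shrink an axis-aligned box to raise the mean of y inside it.

    X is a list of points (lists of floats), y a list of outputs.
    alpha is the fraction of remaining points peeled per step (assumed name),
    min_support the smallest allowed fraction of points left in the box.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))
    box = [[min(x[j] for x in X), max(x[j] for x in X)] for j in range(d)]
    while len(inside) > min_support * n:
        k = max(1, int(alpha * len(inside)))  # points removed by one peel
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i: X[i][j])
            # Candidate peels: drop the k lowest or the k highest points along axis j.
            for keep in (order[k:], order[:-k]):
                if not keep:
                    continue
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, keep)
        if best is None or len(best[2]) < min_support * n:
            break
        _, j, inside = best
        # Tighten the box along the peeled axis to the surviving points.
        box[j] = [min(X[i][j] for i in inside), max(X[i][j] for i in inside)]
    return box, inside


# Toy data (made up for illustration): y is large only where the first input exceeds 0.7.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(400)]
y = [1.0 if x[0] > 0.7 else 0.0 for x in X]
box, inside = prim_peel(X, y)
box_mean = sum(y[i] for i in inside) / len(inside)
overall_mean = sum(y) / len(y)
```

On this toy example the box mean ends up well above the overall mean, because the greedy peels keep trimming the low end of the first coordinate.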

5.
The tail dependence indexes of a multivariate distribution describe the amount of dependence in the upper right tail or lower left tail of the distribution and can be used to analyse the dependence among extremal random events. This paper examines the tail dependence of multivariate t-distributions, whose copulas are not explicitly accessible. Tractable formulas for the tail dependence indexes of a multivariate t-distribution are derived in terms of the joint moments of its underlying multivariate normal distribution, and the monotonicity properties of these indexes with respect to the distribution parameters are established. Simulation results are presented to illustrate the results.
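
As a point of reference, the bivariate special case has a well-known closed form. The display below is the standard textbook result for a bivariate t-distribution with $\nu$ degrees of freedom and correlation $\rho$, not the general multivariate formula derived in the paper; the symbols $\lambda$, $\nu$, $\rho$ and $t_{\nu+1}$ are notation assumed here.

```latex
% Upper and lower tail dependence coincide by symmetry of the t-distribution.
% t_{\nu+1} denotes the CDF of a univariate Student t with \nu + 1 degrees of freedom.
\lambda_U \;=\; \lambda_L \;=\;
  2\, t_{\nu+1}\!\left( -\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}} \right),
\qquad \rho > -1 .
```

The coefficient decreases as $\nu$ grows and increases with $\rho$, which is the kind of monotonicity in the distribution parameters that the abstract refers to.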

6.
Motivated by the likelihood functions of several incomplete categorical data sets, this article introduces a new family of distributions, grouped Dirichlet distributions (GDD), which includes the classical Dirichlet distribution (DD) as a special case. First, we develop distribution theory for the GDD in its own right. Second, we use this expanded family as a new tool for statistical analysis of incomplete categorical data. Starting with a GDD with two partitions, we derive its stochastic representation, which provides a simple procedure for simulation. Other properties such as mixed moments, mode, marginal and conditional distributions are also derived. The general GDD with more than two partitions is considered in a parallel manner. Three data sets from a case-control study, a leprosy survey, and a neurological study are used to illustrate how the GDD can be used as a new tool for analyzing incomplete categorical data. Our approach based on the GDD has at least two advantages over the commonly used approach based on the DD in both frequentist and conjugate Bayesian inference: (a) in some cases, both the maximum likelihood and Bayes estimates have closed-form expressions in the new approach, but not when they are based on the commonly used approach; and (b) even if a closed-form solution is not available, the EM and data augmentation algorithms in the new approach converge much faster than in the commonly used approach.
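
To illustrate how a stochastic representation turns into a simulation procedure, the sketch below uses the classical Dirichlet distribution, whose representation as normalized independent gamma variables is standard. It does not implement the GDD representation from the paper; the function name `sample_dirichlet` and the parameter values are chosen here for illustration only.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


# Illustrative parameters (assumed, not from the paper).
alpha = [2.0, 3.0, 5.0]
rng = random.Random(0)
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]

# The component means should be close to alpha_i / sum(alpha).
means = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
```

Each draw lies on the probability simplex, and the empirical means approach 0.2, 0.3 and 0.5 for the parameters used here.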

7.
The celebrated U-conjecture states that under the $N_n(0, I_n)$ distribution of the random vector $X = (X_1, \ldots, X_n)$ in $\mathbb{R}^n$, two polynomials $P(X)$ and $Q(X)$ are unlinkable if they are independent [see Kagan et al., Characterization Problems in Mathematical Statistics, Wiley, New York, 1973]. Some results have been established in this direction, although the original conjecture is yet to be proved in generality. Here, we demonstrate that the conjecture is true in an important special case of the above, where $P$ and $Q$ are convex nonnegative polynomials with $P(0) = 0$.

8.
In this paper, a new measure of dependence is proposed. Our approach is based on transforming univariate data to the space where the marginal distributions are normally distributed and then using the inverse transformation to obtain the distribution function in the original space. The pseudo-maximum likelihood method and the two-stage maximum likelihood approach are used to estimate the unknown parameters. It is shown that the estimated parameters are asymptotically normally distributed in both cases. Inference procedures for testing independence are also studied.

9.
Orthant tail dependence of multivariate extreme value distributions   Cited by: 2 (self-citations: 0, citations by others: 2)
The orthant tail dependence describes the relative deviation of upper- (or lower-) orthant tail probabilities of a random vector from similar orthant tail probabilities of a subset of its components, and can be used in the study of dependence among extreme values. Using the conditional approach, this paper examines the extremal dependence properties of multivariate extreme value distributions and their scale mixtures, and derives the explicit expressions of orthant tail dependence parameters for these distributions. Properties of the tail dependence parameters, including their relations with other extremal dependence measures used in the literature, are discussed. Various examples involving multivariate exponential, multivariate logistic distributions and copulas of Archimedean type are presented to illustrate the results.

10.
Wiener processes with random effects for degradation data   Cited by: 12 (self-citations: 0, citations by others: 12)
This article studies maximum likelihood inference for a class of Wiener processes with random effects for degradation data. Degradation data are a special case of functional data with a monotone trend. The setting for degradation data is one in which n independent subjects, each with a Wiener process with random drift and diffusion parameters, are observed at possibly different times. Unit-to-unit variability is incorporated into the model by these random effects. The EM algorithm is used to obtain the maximum likelihood estimators of the unknown parameters. Asymptotic properties such as consistency and convergence rate are established. A bootstrap method is used for assessing the uncertainties of the estimators. Simulations are used to validate the method. The model is fitted to bridge beam data and corresponding goodness-of-fit tests are carried out. Failure time distributions in terms of degradation level passages are calculated and illustrated.
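
The model structure described above can be sketched with a small simulation. This is a simplified illustration, not the paper's estimation procedure: only the drift is random here (drawn from a normal distribution), the diffusion parameter is held fixed, and the function names `simulate_path` and `first_passage` as well as all numeric values are assumptions made for the example.

```python
import math
import random


def simulate_path(times, drift, sigma, rng):
    """Simulate X(t) = drift * t + sigma * B(t) at increasing observation times, with X(0) = 0."""
    path, x, prev = [], 0.0, 0.0
    for t in times:
        dt = t - prev
        # Independent Gaussian increment with mean drift*dt and variance sigma^2 * dt.
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
        prev = t
    return path


def first_passage(times, path, threshold):
    """Return the first observation time at which the path reaches the threshold, else None."""
    for t, x in zip(times, path):
        if x >= threshold:
            return t
    return None


# Three hypothetical units, each with its own random drift (unit-to-unit variability).
rng = random.Random(1)
times = [1.0, 2.0, 3.0, 4.0, 5.0]
unit_drifts = [rng.gauss(2.0, 0.5) for _ in range(3)]
paths = [simulate_path(times, d, 0.3, rng) for d in unit_drifts]
failure_times = [first_passage(times, p, 5.0) for p in paths]
```

The failure time of a unit is read off as the first observed time at which its degradation path crosses the chosen level, which mirrors the "degradation level passage" idea in the abstract on a discrete time grid.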

11.
12.
In this paper, we propose auto-associative (AA) models to generalize principal component analysis (PCA). AA models have been introduced in data analysis from a geometrical point of view. They are based on the approximation of the scatter plot of the observations by a differentiable manifold. In this paper, they are interpreted as projection pursuit models adapted to the auto-associative case. Their theoretical properties are established and are shown to extend those of PCA. An iterative construction algorithm is proposed and its principle is illustrated on both simulated data and real data from image analysis.

13.
In this paper we show how, based on a decomposition of the likelihood ratio test for sphericity into two independent tests and a suitably developed decomposition of the characteristic function of the logarithm of the likelihood ratio test statistic to test independence in a set of variates, we may obtain extremely well-fitting near-exact distributions for both test statistics. Since both test statistics have the distribution of the product of independent Beta random variables, it is possible to obtain near-exact distributions for both statistics in the form of Generalized Near-Integer Gamma distributions or mixtures of these distributions. For the independence test statistic, numerical studies and comparisons with asymptotic distributions proposed by other authors show the extremely high accuracy of the near-exact distributions developed as approximations to the exact distribution. Concerning the sphericity test statistic, comparisons with formerly developed near-exact distributions show the advantages of these new near-exact distributions.

14.
Outcome-dependent sampling designs are commonly used in economics, market research and epidemiological studies. Case-control sampling design is a classic example of outcome-dependent sampling, where exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration may have multiple categories instead of a simple dichotomization. For example, in a case-control study, there may be disease sub-classification among the “cases” based on progression of the disease, or in terms of other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to such multiple-category outcome data, ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions for the link functions that will allow for equivalence of prospective and retrospective inference for the parameters of interest. We show that for categorical outcomes, prospective-retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. Most popular models for ordinal response fall outside the multiplicative intercept class and one should be cautious while performing a naive prospective analysis of such data as the bias could be substantial. We illustrate the extent of bias through a real data example, based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial by the National Cancer Institute. The simulations based on the real study illustrate that the bias approximations work well in practice.

15.
A new empirical likelihood approach is developed to analyze data from two-stage sampling designs, in which a primary sample of rough or proxy measures for the variables of interest and a validation subsample of exact information are available. The validation sample is assumed to be a simple random subsample from the primary one. The proposed empirical likelihood approach is capable of utilizing all the information from both the specific models and the two available samples flexibly. It maintains some nice features of the empirical likelihood method and improves the asymptotic efficiency of the existing inferential procedures. The asymptotic properties are derived for the new approach. Some numerical studies are carried out to assess the finite sample performance.

16.
A new way of choosing a suitable copula to model dependence is introduced. Instead of relying on a given parametric family of copulas or applying the other extreme of modelling dependence in a nonparametric way, an intermediate approach is proposed, based on a sequence of parametric models containing more and more dependency aspects. In contrast to a similar way of thinking in testing theory, the method here, intended for estimating the copula, often requires a somewhat larger number of steps. One approach is based on exponential families, another on contamination families. An extensive numerical investigation is supplied on a large number of well-known copulas. The method based on contamination families is recommended. A Gaussian start in this approximation looks very promising.

17.
It is well known that full knowledge of all conditional distributions will typically serve to completely characterize a bivariate distribution. Partial knowledge will often suffice. For example, knowledge of the conditional distribution of X given Y and the conditional mean of Y given X is often adequate to determine the joint distribution of X and Y. In this paper, we investigate the extent to which a conditional percentile function or a conditional mode function (of Y given X), together with knowledge of the conditional distribution of X given Y will determine the joint distribution. Finally, using this methodology a new characterization of the classical bivariate normal distribution is given.

18.
Asymptotic expansions of the distributions of typical estimators in canonical correlation analysis under nonnormality are obtained. The expansions include the Edgeworth expansions up to order O(1/n) for the parameter estimators standardized by the population standard errors, and the corresponding expansion by Hall's method with variable transformation. The expansions for the Studentized estimators are also given using the Cornish-Fisher expansion and Hall's method. The parameter estimators are dealt with in the context of estimation for the covariance structure in canonical correlation analysis. The distributions of the associated statistics (the structure of the canonical variables, the scaled log likelihood ratio and Rozeboom's between-set correlation) are also expanded. The robustness of the normal-theory asymptotic variances of the sample canonical correlations and associated statistics is shown when a latent variable model holds. Simulations are performed to assess the accuracy of the asymptotic results in finite samples.

19.
Double-sampling designs are commonly used in real applications when it is infeasible to collect exact measurements on all variables of interest. Two samples, a primary sample on proxy measures and a validation subsample on exact measures, are available in these designs. We assume that the validation sample is drawn from the primary sample by Bernoulli sampling with equal selection probability. An empirical likelihood based approach is proposed to estimate the parameters of interest. By allowing the number of constraints to grow as the sample size goes to infinity, the resulting maximum empirical likelihood estimator is asymptotically normal and its limiting variance-covariance matrix reaches the semiparametric efficiency bound. Moreover, a Wilks-type result of convergence to the chi-squared distribution is established for the empirical likelihood ratio based test. Some simulation studies are carried out to assess the finite sample performance of the new approach.

20.
We consider normal ≡ Gaussian seemingly unrelated regressions (SUR) with incomplete data (ID). Imposing a natural minimal set of conditional independence constraints, we find a restricted SUR/ID model whose likelihood function and parameter space factor into the product of the likelihood functions and the parameter spaces of standard complete data multivariate analysis of variance models. Hence, the restricted model has a unimodal likelihood and permits explicit likelihood inference. In the development of our methodology, we review and extend existing results for complete data SUR models and the multivariate ID problem.
