Similar Articles
20 similar articles were found.
1.
A clustering method is presented for analysing multivariate binary data with missing values. When not all values are observed, Govaert [3] has studied the relations between clustering methods and statistical models. The author has shown how the identification of a mixture of Bernoulli distributions with the same parameter for all clusters and for all variables corresponds to a clustering criterion which uses the L1 distance characterizing the MNDBIN method (Marchetti [8]). He first generalized this model by allowing the parameters to depend on the variables, and then by allowing them to depend on both the variables and the clusters. We use these models to derive a clustering method adapted to missing data. This method optimizes a criterion by a standard iterative partitioning algorithm which removes the need either to ignore objects or to substitute values for the missing data. We study several versions of this algorithm and, finally, give a brief account of the application of the method to some simulated data.
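The criterion described above lends itself to a k-means-style alternation between assigning objects by L1 distance over their observed entries and updating binary cluster centres. The Python sketch below is only an illustration of that general idea under simplifying assumptions (random initialisation, centres rounded to {0, 1}); it is not the MNDBIN criterion or the authors' algorithm, and the function name and defaults are invented for the example.

```python
import numpy as np

def l1_partition_missing(X, k, n_iter=20, rng=None):
    """Illustrative iterative partitioning of binary data with missing values.

    X is an (n, d) array with entries in {0, 1} and np.nan where a value is
    missing.  Distances and centre updates skip the missing cells, so no
    object is dropped and nothing is imputed.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    centres = np.nan_to_num(X[rng.choice(n, k, replace=False)], nan=0.0)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment: L1 distance computed over the observed entries only.
        dist = np.nansum(np.abs(X[:, None, :] - centres[None, :, :]), axis=2)
        labels = dist.argmin(axis=1)
        # Update: per-cluster majority value of the observed entries.
        for j in range(k):
            members = X[labels == j]
            if members.size == 0:
                continue
            obs = ~np.isnan(members)
            counts = obs.sum(axis=0)
            sums = np.where(obs, members, 0.0).sum(axis=0)
            upd = counts > 0
            centres[j, upd] = np.round(sums[upd] / counts[upd])
    return labels, centres
```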

2.
In this paper, we introduce a robust extension of the three-factor model of Diebold and Li (J. Econometrics, 130: 337–364, 2006) using the class of symmetric scale mixtures of normal distributions. Specific distributions examined include the multivariate normal, Student-t, slash, and variance gamma distributions. In the presence of non-normality in the data, these distributions provide an appealing robust alternative to the routine use of the normal distribution. Using a Bayesian paradigm, we develop an efficient MCMC algorithm for parameter estimation. Moreover, the mixing parameters obtained as a by-product of the scale mixture representation can be used to identify outliers. Our results reveal that the Diebold–Li models based on the Student-t and slash distributions provide a significant improvement in in-sample fit and out-of-sample forecasting for the US yield data over the usual normal-based model. Copyright © 2011 John Wiley & Sons, Ltd.
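For context, the Diebold–Li model writes the yield at maturity τ as a linear combination of level, slope and curvature factors with Nelson–Siegel loadings. The sketch below only evaluates those loadings (with the commonly used decay value λ = 0.0609 for maturities in months as a default); the robust scale-mixture extension and the MCMC sampler of the abstract are not reproduced here, and the function names are invented for the example.

```python
import numpy as np

def diebold_li_loadings(tau, lam=0.0609):
    """Nelson-Siegel loadings used by the Diebold-Li three-factor model.

    tau : array of maturities (in months); lam is the decay parameter.
    Returns a (len(tau), 3) design matrix for (level, slope, curvature).
    """
    tau = np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    curvature = slope - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def fitted_yields(beta, tau, lam=0.0609):
    """Yield curve implied by the factors beta = (level, slope, curvature)."""
    return diebold_li_loadings(tau, lam) @ np.asarray(beta, dtype=float)
```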

3.
We establish computationally flexible tools for the analysis of multivariate skew normal mixtures when missing values occur in the data. To facilitate the computation and simplify the theoretical derivation, two auxiliary permutation matrices are incorporated into the model to identify the observed and missing components of each observation, and they are manifestly effective in reducing the computational complexity. We present an analytically feasible EM algorithm for the supervised learning of the parameters as well as the missing observations. The proposed mixture analyzer, which includes the most commonly used Gaussian mixtures as a special case, allows practitioners to handle incomplete multivariate data sets in a wide range of settings. The methodology is illustrated through a real data set with varying proportions of synthetic missing values generated by MCAR and MAR mechanisms, and is shown to perform well on classification tasks.
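One simple way to realise the observed/missing bookkeeping described above is to build, for each observation, 0/1 selection matrices from its missing-value mask; the exact construction used in the paper may differ, and the helper below is only illustrative.

```python
import numpy as np

def selection_matrices(missing_mask):
    """Build 0/1 matrices O and M that pick out the observed and missing
    components of a d-dimensional observation x, i.e. O @ x = x_obs and
    M @ x = x_mis when x is the complete-data vector.

    missing_mask : boolean array of length d, True where the entry is missing.
    """
    missing_mask = np.asarray(missing_mask, dtype=bool)
    I = np.eye(missing_mask.size)
    O = I[~missing_mask]   # rows of the identity at observed positions
    M = I[missing_mask]    # rows of the identity at missing positions
    return O, M

# Example: third entry missing in a 4-dimensional observation
O, M = selection_matrices([False, False, True, False])
x = np.array([1.2, -0.7, np.nan, 3.4])
x_obs = O @ np.nan_to_num(x)   # observed part: [1.2, -0.7, 3.4]
```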

4.
In this paper, we introduce a new family of multivariate distributions as the scale mixture of the multivariate power exponential distribution introduced by Gómez et al. (Comm. Statist. Theory Methods 27(3) (1998) 589) and the inverse generalized gamma distribution. Since the resulting family includes the multivariate t distribution and the multivariate generalization of the univariate GT distribution introduced by McDonald and Newey (Econometric Theory 18 (11) (1988) 4039), we call this family the "multivariate generalized t-distributions family", or MGT for short. We show that this family of distributions belongs to the family of elliptically contoured distributions and investigate its properties. We give the stochastic representation of a random variable distributed as a multivariate generalized t distribution. We give the marginal distribution, the conditional distribution and the distribution of quadratic forms. We also investigate other properties, such as asymmetry, kurtosis and the characteristic function.

5.
Maximum likelihood estimation of the multivariate t distribution, especially with unknown degrees of freedom, has been an interesting topic in the development of the EM algorithm. After a brief review of the EM algorithm and its application to finding the maximum likelihood estimates of the parameters of the t distribution, this paper provides new versions of the ECME algorithm for maximum likelihood estimation of the multivariate t distribution from data with possibly missing values. The results show that the new versions of the ECME algorithm converge faster than the previous procedures. Most importantly, the idea behind this new implementation is quite general and useful for the development of the EM algorithm. Comparisons of different methods based on two datasets are presented.
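As a point of reference, the textbook EM iteration for the multivariate t with complete data and fixed degrees of freedom alternates between computing precision weights and weighted location/scatter updates; this is the baseline that the ECME versions described above accelerate and extend to missing data and unknown degrees of freedom. A minimal sketch, with invented function names:

```python
import numpy as np

def em_multivariate_t(X, nu, n_iter=50):
    """Basic EM for multivariate t location/scatter with known, fixed
    degrees of freedom nu and complete data (no missing values)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        # E-step: expected precision weights given the current parameters.
        diff = X - mu
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (nu + p) / (nu + maha)
        # M-step: weighted location and scatter updates.
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma
```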

6.
The problem of inferring a finite binary sequence w* ∈ {−1, 1}^n is considered. It is supposed that at epochs t = 1, 2, …, the learner is provided with random half-space data in the form of finite binary sequences u(t) ∈ {−1, 1}^n which have positive inner product with w*. The goal of the learner is to determine the underlying sequence w* in an efficient, on-line fashion from the data {u(t), t ≥ 1}. In this context, it is shown that the randomized, on-line directed drift algorithm produces a sequence of hypotheses {w(t) ∈ {−1, 1}^n, t ≥ 1} which converges to w* in finite time with probability 1. It is shown that while the algorithm has a minimal space complexity of 2n bits of scratch memory, it has exponential time complexity with an expected mistake bound of order Ω(e^{0.139n}). Batch incarnations of the algorithm are introduced which allow for massive improvements in running time with a relatively small cost in space (batch size). In particular, using a batch of 𝒪(n log n) examples at each update epoch reduces the expected mistake bound of the (batch) algorithm to 𝒪(n) (in an asynchronous bit update mode) and 𝒪(1) (in a synchronous bit update mode). The problem considered here is related to binary integer programming and to learning in a mathematical model of a neuron. ©1999 John Wiley & Sons, Inc. Random Struct. Alg., 14, 345–381 (1999)
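A minimal sketch of one plausible reading of the directed drift update is given below: whenever the current hypothesis misclassifies an example, a single disagreeing coordinate is chosen at random and flipped toward the example. The exact randomization and the batch variants analysed in the paper may differ; all names here are invented for the illustration.

```python
import numpy as np

def directed_drift(stream, n, rng=None):
    """One plausible reading of the randomized on-line directed drift update.

    stream yields examples u in {-1, 1}^n with positive inner product with a
    hidden target w*.  On a mistake (<w, u> <= 0), one coordinate on which w
    and u disagree is chosen uniformly at random and flipped.
    """
    rng = np.random.default_rng(rng)
    w = rng.choice([-1, 1], size=n)            # arbitrary initial hypothesis
    mistakes = 0
    for u in stream:
        if w @ u <= 0:                         # mistake: u should be positive
            mistakes += 1
            disagree = np.flatnonzero(w != u)
            if disagree.size:
                i = rng.choice(disagree)
                w[i] = u[i]                    # drift one bit toward u
    return w, mistakes

# Toy usage: random examples filtered to satisfy <w*, u> > 0.
n = 12
rng = np.random.default_rng(0)
w_star = rng.choice([-1, 1], size=n)

def positive_examples(T):
    for _ in range(T):
        u = rng.choice([-1, 1], size=n)
        if w_star @ u > 0:
            yield u

w_hat, m = directed_drift(positive_examples(20000), n, rng=1)
```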

7.
Copulas are popular as models for multivariate dependence because they allow the marginal densities and the joint dependence to be modeled separately. However, they usually require that the transformation from uniform marginals to the marginals of the joint dependence structure is known. This can only be done for a restricted set of copulas, for example, a normal copula. Our article introduces copula-type estimators for flexible multivariate density estimation which also allow the marginal densities to be modeled separately from the joint dependence, as in copula modeling, but which overcome the lack of flexibility of most popular copula estimators. An iterative scheme is proposed for fitting copula-type estimators, and its usefulness is demonstrated through simulated and real examples. The joint dependence is modeled by mixture of normals and mixture of normal factor analyzer models, and by mixture of t and mixture of t-factor analyzer models. We develop efficient variational Bayes algorithms for fitting these models in which model selection is performed automatically. Based on these mixture models, we construct four classes of copula-type densities which are far more flexible than current popular copula densities, and which outperform them on a simulated dataset and several real datasets. Supplementary material for this article is available online.

8.
M. Falk, R. Michel, Extremes, 2009, 12(1): 33–51
It has recently been shown by Rootzén and Tajvidi (Bernoulli, 12:917–930, 2006) that modelling exceedances of a random variable over a high threshold (the peaks-over-threshold [POT] approach) can, also in the multivariate setup, be done rationally only by a multivariate generalized Pareto distribution (GPD). The selection of a proper threshold is, however, a crucial problem. The contribution of this paper is twofold: we first develop a non-asymptotic and exact level-α test based on the single-sample t-test, which checks whether multivariate data are actually generated by a multivariate GPD. Secondly, this procedure is utilized for the derivation of a t-test based threshold selection rule in multivariate peaks-over-threshold models. The application to a hydrological data set illustrates this approach.

9.
The GGH family of multivariate distributions is obtained by scale mixing on the Exponential Power distribution using the Extended Generalised Inverse Gaussian distribution. The resulting GGH family encompasses the multivariate generalised hyperbolic (GH) distribution, which itself contains the multivariate t and multivariate Variance-Gamma (VG) distributions as special cases. It also contains the generalised multivariate t distribution [O. Arslan, Family of multivariate generalised t distribution, Journal of Multivariate Analysis 89 (2004) 329–337] and a new generalisation of the VG as special cases. Our approach unifies into a single GH-type family the hitherto separately treated t-type [O. Arslan, A new class of multivariate distribution: Scale mixture of Kotz-type distributions, Statistics and Probability Letters 75 (2005) 18–28; O. Arslan, Variance–mean mixture of Kotz-type distributions, Communications in Statistics-Theory and Methods 38 (2009) 272–284] and VG-type cases. The GGH distribution is dual to the distribution obtained by analogous mixing on the scale parameter of a spherically symmetric stable distribution. Duality between the multivariate t and multivariate VG [S.W. Harrar, E. Seneta, A.K. Gupta, Duality between matrix variate t and matrix variate V.G. distributions, Journal of Multivariate Analysis 97 (2006) 1467–1475] does, however, extend in one sense to their generalisations.

10.
The known estimation and simulation methods for multivariate t distributions are reviewed. A review of selected applications is also provided. We believe that this review will serve as an important reference and encourage further research activities in the area.
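For reference, the most common simulation method in this literature uses the representation of the multivariate t as a normal vector divided by an independent scaled chi-square variable; a short sketch with invented names:

```python
import numpy as np

def rmvt(n, mu, Sigma, nu, rng=None):
    """Simulate n draws from a multivariate t distribution with location mu,
    scale matrix Sigma and nu degrees of freedom, using the standard
    normal / chi-square representation  X = mu + Z / sqrt(W / nu)."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # N(0, Sigma)
    W = rng.chisquare(nu, size=n)                             # chi-square_nu
    return mu + Z / np.sqrt(W / nu)[:, None]
```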

11.
We propose a minimum mean absolute error linear interpolator (MMAELI), based on the L1 approach. A linear functional of the observed time series due to non-normal innovations is derived. The solution equation for the coefficients of this linear functional is established in terms of the innovation series. It is found that information implied in the innovation series is useful for the interpolation of missing values. The MMAELIs of the AR(1) model with innovations following mixed normal and t distributions are studied in detail. The MMAELI also approximates the minimum mean squared error linear interpolator (MMSELI) well in mean squared error but outperforms the MMSELI in mean absolute error. An application to a real series is presented. Extensions to the general ARMA model and other time series models are discussed. This research was supported by a CityU Research Grant and the Natural Science Foundation of China.

12.
An approximate algorithm to efficiently solve the k-Closest-Pairs problem on large high-dimensional data sets is presented. For a suitable choice of the input parameters, the algorithm runs in time depending on the dimensionality d and the number n of points of the input data set, and it requires space linear in the input size. It performs at most d+1 iterations. At each iteration a shifted version of the data set is sequentially scanned according to the order induced on it by the Hilbert space-filling curve, and points whose contribution to the solution has already been analyzed are detected and eliminated. The pruning is lossless; in fact, the remaining points along with the approximate solution found can be used for the computation of the exact solution. If the data set is entirely pruned, then the algorithm returns the exact solution. We prove that the pruning ability of the algorithm is related to the nearest-neighbor distance distribution of the data set and show that there exists a class of data sets for which the method, augmented with a final step that applies an exact method to the reduced data set, calculates the exact solution with the same time requirements. Although we are only able to guarantee an approximation to the solution, where t ∈ {1, 2, …, ∞} identifies the Minkowski (Lt) metric of interest, experimental results give the exact k closest pairs for all the large high-dimensional synthetic and real data sets considered and show that the pruning of the search space is effective. We present a thorough scaling analysis of the algorithm for in-memory and disk-resident data sets, showing that the algorithm scales well in both cases. Mathematics Subject Classification (2000): 68W25.
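The sketch below conveys the flavour of the scan-based approach with a drastic simplification: points are ordered along a random one-dimensional projection (a stand-in for the Hilbert curve order used by the algorithm) and each point is compared only with a small window of successors, so the pairs returned are approximate. None of this reproduces the shifting, pruning, or guarantees of the paper, and the names are invented for the illustration.

```python
import heapq
import numpy as np

def approx_k_closest_pairs(X, k, window=8, rng=None):
    """Return k approximately closest pairs (distance, i, j) of the rows of X
    by scanning a 1-D ordering and comparing each point with its next
    `window` successors in that ordering."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    proj = X @ rng.normal(size=d)          # 1-D surrogate for the curve order
    order = np.argsort(proj)
    heap = []                              # max-heap via negated distances
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + 1 + window]:
            dist = float(np.linalg.norm(X[i] - X[j]))
            item = (-dist, int(i), int(j))
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return sorted((-negd, i, j) for negd, i, j in heap)
```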

13.
In this paper we present a class of regime switching diffusion models described by a pair (X(t), Y(t)) ∈ ℝ^n × S, with S = {1, 2, …, N} and Y(t) a Markov chain, for which the marginal probability of the diffusive component X(t) is a given mixture. Our main motivation is to extend to a multivariate setting the class of mixture models proposed by Brigo and Mercurio in a series of papers. Furthermore, a simple algorithm is available for simulating paths through a thinning mechanism. The application to option pricing is considered by proposing a mixture version of the Margrabe option formula and of the Heston stochastic volatility formula for a plain vanilla option.

14.
This article compares methods for the numerical computation of multivariate t probabilities for hyper-rectangular integration regions. Methods based on acceptance-rejection, spherical-radial transformations, and separation-of-variables transformations are considered. Tests using randomly chosen problems show that the most efficient numerical methods use a transformation developed by Genz for multivariate normal probabilities. These methods allow moderately accurate multivariate t probabilities to be quickly computed for problems with as many as 20 variables. Methods for the noncentral multivariate t distribution are also described.
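As a crude baseline for the methods compared in the article, the rectangle probability of a central multivariate t can be estimated by plain Monte Carlo via the normal/chi-square representation; the separation-of-variables methods discussed above are far more accurate for the same effort. A sketch with invented parameter names:

```python
import numpy as np

def mvt_rect_prob(lower, upper, Sigma, nu, n_sim=200_000, rng=None):
    """Monte Carlo estimate of P(lower <= T <= upper) for a central
    multivariate t with scale matrix Sigma and nu degrees of freedom."""
    rng = np.random.default_rng(rng)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    p = lower.size
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n_sim)
    W = rng.chisquare(nu, size=n_sim)
    T = Z / np.sqrt(W / nu)[:, None]                 # multivariate t draws
    inside = np.all((T >= lower) & (T <= upper), axis=1)
    return inside.mean()
```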

15.
We consider minimax problems of the type min_{t ∈ [a,b]} max_i p_i(t), where the p_i(t) are real polynomials which are convex on an interval [a, b]. Here, the function max_i p_i(t) is a convex piecewise-polynomial function. The main result is an algorithm yielding the minimum and the minimizing point, which is efficient with respect to the necessary number of computed polynomial zeros. The algorithms presented here are applicable to decision problems with convex loss. While primarily of theoretical interest, they are implementable for quadratic loss functions with finitely many states of nature. An example of this case is given. The algorithms are finite when polynomial zeros need not be approximated. The author is indebted to T. E. S. Raghavan for many stimulating discussions.
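Because the upper envelope of finitely many convex polynomials is itself convex on [a, b], its minimum can also be located with a generic bounded one-dimensional optimizer. The sketch below does exactly that; it is a numerical stand-in, not the zero-counting algorithm of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def minimax_convex_polys(coeff_list, a, b):
    """Minimize max_i p_i(t) over [a, b] for polynomials given as coefficient
    arrays in numpy.polyval convention (highest degree first).  Each p_i is
    assumed convex on [a, b], so the envelope is convex and unimodal."""
    envelope = lambda t: max(np.polyval(c, t) for c in coeff_list)
    res = minimize_scalar(envelope, bounds=(a, b), method='bounded')
    return res.x, res.fun

# Example: two convex quadratics p1(t) = t^2 and p2(t) = (t - 1)^2 on [-2, 2];
# the envelope is minimized at t = 0.5.
t_star, val = minimax_convex_polys([[1.0, 0.0, 0.0], [1.0, -2.0, 1.0]], -2.0, 2.0)
```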

16.
The t-solutions introduced in R. W. Rosenthal (1989, Int J Game Theory 18:273–292) are quantal response equilibria based on the linear probability model. Choice probabilities in t-solutions are related to the determination of leveling taxes in taxation problems. The set of t-solutions coincides with the set of Nash equilibria of a game with quadratic control costs. Evaluating the set of t-solutions for increasing values of t shows that players become increasingly capable of iteratively eliminating never-best replies and eventually play only rationalizable actions with positive probability. These features are not shared by logit quantal response equilibria. Moreover, there exists a path of t-solutions linking uniform randomization to Nash equilibrium.

17.

The multiple linear regression model based on normally distributed and uncorrelated errors is a popular statistical tool with applications in various fields. But these assumptions of normality and no serial correlation are hardly met in real life. Hence, this study considers a linear regression time series model for series with outliers and autocorrelated errors. These autocorrelated errors are represented by a covariance-stationary autoregressive process whose independent innovations are driven by a shape mixture of the skew-t normal distribution. The shape mixture of the skew-t normal distribution is a flexible extension of the skew-t normal with an additional shape parameter that controls skewness and kurtosis. With this error model, stochastic modeling of multiple outliers is possible, together with an adaptive robust maximum likelihood estimation of all the parameters. An Expectation Conditional Maximization Either (ECME) algorithm is developed to carry out the maximum likelihood estimation. We derive asymptotic standard errors of the estimators through an information-based approximation. The performance of the estimation procedure is evaluated through Monte Carlo simulations and a real-life data analysis.


18.
Sampling from a truncated multivariate normal distribution (TMVND) constitutes the core computational module in fitting many statistical and econometric models. We propose two efficient methods, an iterative data augmentation (DA) algorithm and a non-iterative inverse Bayes formulae (IBF) sampler, to simulate the TMVND and generalize them to multivariate normal distributions with linear inequality constraints. By creating a Bayesian incomplete-data structure, the posterior step of the DA algorithm directly generates random vector draws, as opposed to single-element draws, resulting in an obvious computational advantage and easy coding with common statistical software packages such as S-PLUS, MATLAB and GAUSS. Furthermore, the DA provides a ready structure for implementing a fast EM algorithm to identify the mode of the TMVND, which has many potential applications in statistical inference for constrained-parameter problems. In addition, utilizing this mode as an intermediate result, the IBF sampler provides a novel alternative to Gibbs sampling and eliminates problems with convergence and possibly slow convergence due to the high correlation between components of a TMVND. The DA algorithm is applied to a linear regression model with constrained parameters and is illustrated with a published data set. Numerical comparisons show that the proposed DA algorithm and IBF sampler are more efficient than the Gibbs sampler and the accept-reject algorithm.
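For comparison, the single-site Gibbs sampler that the abstract benchmarks against draws each coordinate in turn from its univariate truncated normal full conditional. A minimal sketch for box constraints only (this is the baseline, not the proposed DA or IBF algorithms):

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_tmvn(mu, Sigma, lower, upper, n_draws=1000, burn=200, rng=None):
    """Single-site Gibbs sampler for a multivariate normal N(mu, Sigma)
    truncated to the box lower <= x <= upper."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = mu.size
    x = np.clip(mu, lower, upper)          # a feasible starting point
    P = np.linalg.inv(Sigma)               # precision matrix
    draws = []
    for it in range(n_draws + burn):
        for i in range(d):
            others = np.delete(np.arange(d), i)
            # Conditional N(m_i, s2_i) of x_i given the other components.
            s2_i = 1.0 / P[i, i]
            m_i = mu[i] - s2_i * P[i, others] @ (x[others] - mu[others])
            a_i = (lower[i] - m_i) / np.sqrt(s2_i)
            b_i = (upper[i] - m_i) / np.sqrt(s2_i)
            x[i] = truncnorm.rvs(a_i, b_i, loc=m_i, scale=np.sqrt(s2_i),
                                 random_state=rng)
        if it >= burn:
            draws.append(x.copy())
    return np.array(draws)
```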

19.

We present an efficient algorithm for generating exact permutational distributions for linear rank statistics defined on stratified 2 × c contingency tables. The algorithm can compute exact p values and confidence intervals for a rich class of nonparametric problems. These include exact p values for stratified two-population Wilcoxon, Logrank, and Van der Waerden tests, exact p values for stratified tests of trend across several binomial populations, exact p values for stratified permutation tests with arbitrary scores, and exact confidence intervals for odds ratios embedded in stratified 2 × c tables. The algorithm uses network-based recursions to generate stratum-specific distributions and then combines them into an overall permutation distribution by convolution. Where only the tail area of a permutation distribution is desired, additional efficiency gains are achieved by backward induction and branch-and-bound processing of the network. The algorithm is especially efficient for highly imbalanced categorical data, a situation where the asymptotic theory is unreliable. The backward induction component of the algorithm can also be used to evaluate the conditional maximum likelihood, and its higher-order derivatives, for the logistic regression model with grouped data. We illustrate the techniques with an analysis of two data sets: the leukemia data on survivors of the Hiroshima atomic bomb and data from an animal toxicology experiment provided by the U.S. Food and Drug Administration.
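The convolution step mentioned above is straightforward once each stratum-specific distribution is available on a common integer grid of statistic values; the network recursions that generate the stratum distributions are not reproduced here, and the integer-valued-score setting is an assumption of this sketch.

```python
import numpy as np

def convolve_strata(stratum_dists):
    """Combine stratum-specific distributions of an integer-valued linear rank
    statistic into the overall permutation distribution by convolution.

    Each element of stratum_dists is a 1-D array of probabilities for the
    statistic values 0, 1, 2, ... within that stratum; the stratum statistics
    are independent, so their sum has the convolved distribution."""
    total = np.array([1.0])
    for dist in stratum_dists:
        total = np.convolve(total, np.asarray(dist, dtype=float))
    return total

# Toy example with two strata:
overall = convolve_strata([[0.2, 0.5, 0.3], [0.4, 0.6]])
# overall[k] = P(total statistic = k), k = 0, ..., 3
```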

20.
A Gaussian measurement error assumption, that is, an assumption that the data are observed up to Gaussian noise, can bias parameter estimation in the presence of outliers. A heavy-tailed error assumption based on Student's t distribution helps reduce the bias. However, it may be less efficient in estimating parameters if the heavy-tailed assumption is uniformly applied to all of the data when most of them are normally observed. We propose a mixture error assumption that selectively converts Gaussian errors into Student's t errors according to latent outlier indicators, leveraging the best of the Gaussian and Student's t errors; parameter estimation can then be not only robust but also accurate. Using simulated hospital profiling data and astronomical time series of brightness data, we demonstrate the potential of the proposed mixture error assumption to estimate parameters accurately in the presence of outliers. Supplemental materials for this article are available online.
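A minimal sketch of the kind of two-component error density described above, with the latent outlier indicator summed out, is given below; the parameter names and the shared-scale choice are assumptions made for illustration, not the authors' exact specification.

```python
import numpy as np
from scipy.stats import norm, t as student_t

def mixture_error_logpdf(resid, sigma, nu, p_outlier):
    """Log-density of residuals under a Gaussian / Student-t mixture error:
    with probability 1 - p_outlier the error is N(0, sigma^2), and with
    probability p_outlier it is a heavier-tailed Student-t with nu degrees of
    freedom and the same scale (the latent indicator is marginalized out)."""
    resid = np.asarray(resid, dtype=float)
    log_gauss = norm.logpdf(resid, scale=sigma) + np.log1p(-p_outlier)
    log_heavy = student_t.logpdf(resid, df=nu, scale=sigma) + np.log(p_outlier)
    return np.logaddexp(log_gauss, log_heavy)
```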
