Similar articles
Found 12 similar articles.
1.
Advantages of statistical model-based unsupervised classification over heuristic alternatives have been widely demonstrated in the scientific literature. However, existing model-based approaches are often both conceptually and numerically unstable for large and complex data sets. Here we consider a Bayesian model-based method for unsupervised classification of discrete-valued vectors that has certain advantages over standard solutions based on latent class models. Our theoretical formulation defines a posterior probability measure on the space of classification solutions corresponding to stochastic partitions of the observed data. To explore the classification space efficiently, we use a parallel search strategy based on non-reversible stochastic processes. A decision-theoretic approach is used to formalize the inferential process in the context of unsupervised classification. Both real and simulated data sets are used to illustrate the methods discussed.

2.
The ridge estimator of the usual linear model is generalized by the introduction of an a priori vector r and an associated positive semidefinite matrix S. It is then shown that the generalized ridge estimator can be justified in two ways: (a) by minimizing the residual sum of squares subject to a constraint on the length, in the metric S, of the vector of differences between r and the estimated linear model coefficients; (b) by incorporating prior knowledge, with r playing the role of the vector of means and S proportional to the precision matrix. Both a Bayesian framework and an Aitken generalized least squares framework are used for the latter. The properties of the new estimator are derived and compared with those of the ordinary least squares estimator. The new method is illustrated under different assumptions on the form of the matrix S.
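To illustrate justification (a) in the abstract above, here is a minimal sketch. It assumes the constrained problem is written in its penalized (Lagrangian) form, minimizing ||y - Xb||^2 + k (b - r)' S (b - r), whose normal equations are (X'X + kS) b = X'y + kS r. The multiplier k and the function names `solve` and `generalized_ridge` are assumptions made for illustration and are not taken from the paper.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        # Move the row with the largest pivot candidate into position c.
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def generalized_ridge(X, y, r, S, k):
    """Return b solving (X'X + k S) b = X'y + k S r (assumed penalized form)."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    Sr = [sum(S[a][c] * r[c] for c in range(p)) for a in range(p)]
    A = [[XtX[a][c] + k * S[a][c] for c in range(p)] for a in range(p)]
    rhs = [Xty[a] + k * Sr[a] for a in range(p)]
    return solve(A, rhs)


# Toy data with an exact fit at b = (1, 2).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]

b_ols = generalized_ridge(X, y, [0.0, 0.0], I2, 0.0)      # k = 0: ordinary least squares
b_shrunk = generalized_ridge(X, y, [0.0, 0.0], I2, 1e6)   # large k: pulled toward r
```

With k = 0 the sketch reduces to ordinary least squares; with S equal to the identity and r equal to zero it reduces to the ordinary ridge estimator; as k grows the estimate is pulled toward the prior vector r.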

4.
Bayesian networks are graphical tools used to represent a high-dimensional probability distribution. They are used frequently in machine learning and in applications such as medical science. This paper studies whether the concept classes induced by a Bayesian network can be embedded into a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. For full Bayesian networks and almost full Bayesian networks with n variables, we show that VC dimension and the minimum dimension of the inner product space induced by them are 2n-1. Also, for each Bayesian network we show that if the network constructed from by removing Xn satisfies either (i) is a full Bayesian network with n-1 variables, i is the number of parents of Xn, and i<n-1 or (ii) is an almost full Bayesian network, the set of all parents of Xn PAn={X1,X2,Xn3,…,Xni} and 2i<n-1. The results are useful in evaluating the VC dimension and the minimum dimension of the inner product space of concept classes induced by other Bayesian networks.

5.
This paper considers simulation-based approaches for the gamma stochastic frontier model. Efficient Markov chain Monte Carlo methods are proposed for sampling from the posterior distribution of the parameters. Maximum likelihood estimation based on the stochastic approximation algorithm is also discussed. The methods are applied to a data set from the U.S. electric utility industry. The authors are grateful to two anonymous referees for their useful comments, which improved an earlier version of the paper. The first author also acknowledges financial support from the Japanese Ministry of Education, Culture, Sports, Science and Technology under Grant-in-Aid for Scientific Research No.14730022.

6.
Independent measurements are taken from distinct populations which may differ in mean, variance and shape, for instance in the number of modes and the heaviness of the tails. Our goal is to characterize the differences between these populations. To avoid pre-judging the nature of the heterogeneity, for instance by assuming a parametric form, and to reduce the loss of information incurred by calculating summary statistics, the observations are transformed to the empirical characteristic function (ECF). An eigendecomposition is applied to the ECFs to represent the populations as points in a low-dimensional space, and the optimal dimension is chosen by minimising a mean square error. Interpretation of these plots is naturally provided by the corresponding density estimate obtained by inverting the ECF projected onto the reduced-dimension space. Some simulated examples indicate the promise of the technique, and an application to the growth of Mirabilis plants is given.
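The first step described in the abstract above, transforming observations to the empirical characteristic function, can be sketched as follows. It uses the standard definition, the sample average of exp(i t x) at frequency t. The function name `ecf`, the toy samples and the frequency grid are assumptions for illustration; the eigendecomposition and dimension-selection steps of the paper are not reproduced here.

```python
import cmath


def ecf(sample, t):
    """Empirical characteristic function: the sample average of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


# Two hypothetical samples with the same mean but different shape.
unimodal = [-0.5, -0.2, 0.0, 0.2, 0.5]
bimodal = [-2.0, -1.9, 0.0, 1.9, 2.0]

# Evaluate each ECF on a small, arbitrarily chosen grid of frequencies.
grid = [0.5, 1.0, 1.5, 2.0]
profile_unimodal = [ecf(unimodal, t) for t in grid]
profile_bimodal = [ecf(bimodal, t) for t in grid]
```

Each population is thereby represented by a vector of complex values on a common grid, a form to which a decomposition such as the one mentioned in the abstract could then be applied.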

7.
This paper deals with two criteria for selecting variables in discriminant analysis in the case of two multivariate normal populations with different means and a common covariance matrix. One is based on the estimated error rate of misclassification. The other uses Akaike's information criterion. The asymptotic distributions and error rate risks of the criteria are obtained. The results show that the two criteria are asymptotically equivalent, in the sense that their asymptotic distributions and error rate risks are identical.
