Similar literature
1.
Mixture-based clustering is an important technique in cluster analysis. A mixture of multivariate multinomial distributions is commonly used to analyze categorical data under the latent class model, and parameter estimation is a key step in fitting such a mixture. This paper describes four approaches to estimating the parameters of a mixture of multivariate multinomial distributions. The first is an extended maximum likelihood (ML) method. The second is based on the well-known expectation maximization (EM) algorithm. The third is the classification maximum likelihood (CML) algorithm. The fourth is a new approach: we introduce the so-called fuzzy class model and derive from it the fuzzy classification maximum likelihood (FCML) approach for categorical data. The accuracy, robustness and effectiveness of the four algorithms for estimating the parameters of multivariate binomial mixtures are compared using real empirical data and samples drawn from two-class multivariate binomial mixtures. The results show that the proposed FCML algorithm offers better accuracy, robustness and effectiveness than the ML, EM and CML algorithms. We therefore recommend FCML as another good tool for estimating the parameters of mixtures of multivariate multinomial models.
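
As a rough illustration of the EM approach mentioned in this abstract, the sketch below fits a latent class model to binary data, that is, a mixture of independent Bernoulli variables, which is the binary special case of the multivariate multinomial mixture. It is not the paper's implementation: the function name `em_bernoulli_mixture`, the random initialisation, the fixed iteration count and the clamping constant are all assumptions made for this example, and the ML, CML and FCML variants are not shown.

```python
import math
import random


def em_bernoulli_mixture(data, n_classes=2, n_iter=50, seed=0):
    """EM for a mixture of independent Bernoulli variables (binary latent class model).

    data: list of equal-length lists of 0/1 values.
    Returns (weights, theta, history), where theta[k][j] = P(item j = 1 | class k)
    and history holds the log-likelihood evaluated at the start of each iteration.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Hypothetical initialisation: random interior probabilities.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    history = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities (responsibilities) per observation.
        resp = []
        loglik = 0.0
        for x in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, xj in enumerate(x):
                    p *= theta[k][j] if xj else 1.0 - theta[k][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(loglik)
        # M-step: responsibility-weighted class proportions and item probabilities.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            for j in range(n_items):
                num = sum(r[k] * x[j] for r, x in zip(resp, data))
                # Clamp away from 0 and 1 so later log-likelihoods stay finite.
                theta[k][j] = min(max(num / nk, 1e-6), 1.0 - 1e-6)
    return weights, theta, history
```

Replacing the soft responsibilities in the E-step with a hard assignment to the most probable class would give a classification-style variant; how the paper's CML and FCML algorithms differ in detail is not stated in the abstract.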

2.
3.
In this paper we investigate some algebraic and geometric properties of fuzzy partition spaces (convex hulls of hard or conventional partition spaces). In particular, we obtain their dimensions, and describe a number of algorithms for effecting convex decompositions. Two of these are easily programmable, and each affords a different insight about data structures suggested by the fuzzy partition decomposed. We also show how the sequence of partitions in any convex decomposition leads to a matrix for which the norm of the corresponding coefficient vector equals a scalar measure of partition fuzziness used with certain fuzzy clustering algorithms.
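
To make the idea of a convex decomposition concrete, here is a minimal greedy sketch that writes a fuzzy partition matrix as a convex combination of hard partitions. It is not one of the algorithms from the paper, whose details the abstract does not give. It assumes memberships are supplied as exact `Fraction` values with every column summing to one, and it allows degenerate hard partitions in which some cluster is empty. The name `convex_decomposition` is hypothetical.

```python
from fractions import Fraction


def convex_decomposition(u):
    """Greedy convex decomposition of a fuzzy partition matrix.

    u: c x n matrix (rows = clusters, columns = objects) of Fractions,
       each column summing to exactly 1.
    Returns a list of (coefficient, assignment) pairs, where assignment[k]
    is the cluster of object k in one hard partition. The coefficients are
    positive and sum to 1.
    """
    c, n = len(u), len(u[0])
    for k in range(n):
        if sum(u[i][k] for i in range(c)) != 1:
            raise ValueError("each column must sum to exactly 1")
    rest = [[Fraction(v) for v in row] for row in u]
    terms = []
    remaining = Fraction(1)  # common column sum of the residual matrix
    while remaining > 0:
        # Hard partition: each object goes to its largest residual membership.
        assign = [max(range(c), key=lambda i: rest[i][k]) for k in range(n)]
        # Largest weight that can be removed without making an entry negative.
        coef = min(rest[assign[k]][k] for k in range(n))
        for k in range(n):
            rest[assign[k]][k] -= coef
        terms.append((coef, assign))
        remaining -= coef
    return terms
```

Each pass zeroes at least one residual entry, so the loop stops after at most c × n passes. Exact rational arithmetic is used so that the residual reaches zero exactly instead of drifting through floating-point rounding.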

4.
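Reliability theory for systems with fuzzy parameters has a broad range of practical applications, but the difficulty of expressing the membership functions that result from fuzzy-number arithmetic has hindered both theoretical and applied research on the fuzzy reliability of such systems. Using the structured element representation of fuzzy numbers, this paper gives two methods for determining the membership function of a fuzzy expression, and from them derives expressions for the membership functions of the fuzzy reliability of non-repairable series and parallel systems with fuzzy parameters.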
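
The sketch below illustrates fuzzy reliability for series and parallel systems using alpha-cut interval arithmetic. This is a commonly used alternative, swapped in for illustration; it is not the structured element method of the paper. Component reliabilities are assumed to be triangular fuzzy numbers (a, m, b) with 0 ≤ a ≤ m ≤ b ≤ 1, and all function names are hypothetical.

```python
from math import prod


def alpha_cut(tri, alpha):
    """Interval (lo, hi) of a triangular fuzzy number (a, m, b) at level alpha."""
    a, m, b = tri
    return (a + alpha * (m - a), b - alpha * (b - m))


def series_reliability(cuts):
    """Series system: R = product of component reliabilities.

    The product is increasing in every component, so the interval
    endpoints map directly to the endpoints of the result.
    """
    return (prod(lo for lo, _ in cuts), prod(hi for _, hi in cuts))


def parallel_reliability(cuts):
    """Parallel system: R = 1 - product of component unreliabilities.

    This is also increasing in every component reliability.
    """
    return (1 - prod(1 - lo for lo, _ in cuts),
            1 - prod(1 - hi for _, hi in cuts))


# Example with two assumed components.
components = [(0.80, 0.90, 0.95), (0.70, 0.80, 0.90)]
for alpha in (0.0, 0.5, 1.0):
    cuts = [alpha_cut(t, alpha) for t in components]
    print(alpha, series_reliability(cuts), parallel_reliability(cuts))
```

Sweeping alpha from 0 to 1 traces out the membership function of the system reliability level by level: the interval at level alpha contains exactly the reliability values whose membership is at least alpha.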

5.
Large data sets, whether arising from a large number of independent replications or from hierarchies in the data with large numbers of within-unit replications, may pose challenges to the data analyst up to the point of making conventional inferential methods, such as maximum likelihood, prohibitive. Based on general pseudo-likelihood concepts, we propose a method to partition such a set of data, analyze each partition member, and properly combine the inferences into a single one. It is shown that the method is fully efficient for independent partitions, while with dependent sub-samples efficiency is sometimes but not always equal to one. It is argued that, for important realistic settings, efficiency is often very high. Illustrative examples enhance insight into the method’s operation, while real-data analysis underscores its power for practice.

6.
Conventionally, sociologists measure the membership of an individual in a group by a “0 or 1” characteristic function. But when the definition of that group is fuzzy and an individual is neither a full member nor a nonmember, this dichotomous characteristic function may distort reality. Instead of the “0 or 1” characteristic function of classical set theory, fuzzy set theory introduces a membership function, a gradation from 0 to 1 that measures the degree to which an object (an individual) belongs to a concept (a group). Based on the rationale of fuzzy set theory, we suggest some new methods of data collection and analysis. Among several noteworthy findings, two points are emphasized: 1) the fuzzy set is an appropriate way of measuring the fuzziness of human thought; and 2) it allows one to relax the conventional assumption that all individuals have identical distributions and deviations around their means.

7.
Gaussian process models have been widely used in spatial statistics but face tremendous modeling and computational challenges for very large nonstationary spatial datasets. To address these challenges, we develop a Bayesian modeling approach using a nonstationary covariance function constructed based on adaptively selected partitions. The partitioned nonstationary class allows one to knit together local covariance parameters into a valid global nonstationary covariance for prediction, where the local covariance parameters are allowed to be estimated within each partition to reduce computational cost. To further facilitate the computations in local covariance estimation and global prediction, we use the full-scale covariance approximation (FSA) approach for the Bayesian inference of our model. One of our contributions is to model the partitions stochastically by embedding a modified treed partitioning process into the hierarchical models, which leads to automated partitioning and substantial computational benefits. We illustrate the utility of our method with simulation studies and the global Total Ozone Mapping Spectrometer (TOMS) data. Supplementary materials for this article are available online.

8.
《Fuzzy Sets and Systems》2004,141(2):319-332
Fuzziness of fuzzy sets can be reduced by intensification (sharpening) of large membership grades closer to one and small membership grades closer to zero. The relationship between intensification and partial defuzzification of a special type of fuzzy sets, fuzzy clusters, is studied. Some guidelines for assessment of the degree of possible defuzzification of a probabilistic fuzzy partition are suggested. An operator of linear intensification of fuzzy clusters is proposed and illustrated with examples. It is shown how different goals of partial defuzzification can be achieved by modification of this operator.
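
The abstract does not define its linear intensification operator, so the sketch below instead uses Zadeh's classical contrast intensification operator to show what sharpening does to membership grades. The helper `linear_fuzziness_index`, a simple index that is 0 for a crisp set and 1 when every grade equals 0.5, is an assumption chosen for the illustration.

```python
def intensify(mu):
    """Zadeh's contrast intensification of a single membership grade.

    Grades above 0.5 move toward 1, grades below 0.5 move toward 0,
    and 0, 0.5 and 1 are left unchanged.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership grade must lie in [0, 1]")
    if mu <= 0.5:
        return 2.0 * mu * mu
    return 1.0 - 2.0 * (1.0 - mu) ** 2


def linear_fuzziness_index(grades):
    """Scaled mean distance of each grade from the nearer of 0 and 1."""
    return 2.0 * sum(min(g, 1.0 - g) for g in grades) / len(grades)


grades = [0.2, 0.5, 0.8]
sharpened = [intensify(g) for g in grades]
print(sharpened)
print(linear_fuzziness_index(grades), linear_fuzziness_index(sharpened))
```

This operator acts on one fuzzy set at a time. Applied row by row to a probabilistic fuzzy partition it would generally break the constraint that each object's memberships sum to one, which is one reason an operator designed for fuzzy clusters is needed.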

9.
We consider Bayesian nonparametric regression through random partition models. Our approach involves the construction of a covariate-dependent prior distribution on partitions of individuals. Our goal is to use covariate information to improve predictive inference. To do so, we propose a prior on partitions based on the Potts clustering model associated with the observed covariates. As a result, covariate proximity drives both the formation of clusters and the prior predictive distribution. The resulting prior model is flexible enough to support many different types of likelihood models. We focus the discussion on nonparametric regression. Implementation details are discussed for the specific case of multivariate multiple linear regression. The proposed model performs well in terms of model fitting and prediction when compared to alternative nonparametric regression approaches. We illustrate the methodology with an application to the health status of nations at the turn of the 21st century. Supplementary materials are available online.

10.
We analyse an exponential family of distributions which generalises the exponential distribution for censored failure time data, analogous to the way in which the class of generalised linear models generalises the normal distribution. The parameter of the distribution depends on a linear combination of covariates via a possibly nonlinear link function, and we allow another level of heterogeneity: the data may contain "immune" individuals who are not subject to failure. Thus the data are modelled by a mixture of a distribution from the exponential family and a "mass at infinity" representing individuals who never fail. Our results include large sample distributions for parameter estimators and for hypothesis test statistics obtained by maximising the likelihood of a sample. The asymptotic distribution of the likelihood ratio test statistic for the hypothesis that there are no immunes present in the population is shown to be "non-standard"; it is a 50-50 mixture of a chi-squared distribution on 1 degree of freedom and a point mass at 0. Our analysis clearly shows how "negligibility" of individual covariate values and "sufficient follow-up" conditions are required for the asymptotic properties.
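
The non-standard limit distribution described here changes how a p-value is computed. The following sketch assumes only the stated 50-50 mixture of a point mass at 0 and a chi-squared distribution on 1 degree of freedom; the function name `immune_lrt_pvalue` is hypothetical. It relies on the identity P(χ²₁ > x) = erfc(√(x/2)), which follows because a χ²₁ variable is the square of a standard normal variable.

```python
import math


def immune_lrt_pvalue(lrt_stat):
    """Asymptotic p-value under a 50-50 mixture of a point mass at 0 and chi-squared(1).

    lrt_stat: observed likelihood ratio statistic (twice the log-likelihood gain).
    """
    if lrt_stat <= 0.0:
        # Every value of the statistic is at least 0, so the p-value is 1.
        return 1.0
    # Only the chi-squared half of the mixture can exceed a positive value.
    return 0.5 * math.erfc(math.sqrt(lrt_stat / 2.0))


# 2.705543 is the 90th percentile of chi-squared(1), so the mixture p-value is about 0.05.
print(immune_lrt_pvalue(2.705543))
```

Compared with the usual chi-squared(1) reference distribution, the p-value for any positive statistic is halved, so the 5% critical value drops from about 3.84 to about 2.71.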

11.
Motivated by the likelihood functions for several kinds of incomplete categorical data, this article introduces a new family of distributions, grouped Dirichlet distributions (GDD), which includes the classical Dirichlet distribution (DD) as a special case. First, we develop distribution theory for the GDD in its own right. Second, we use this expanded family as a new tool for statistical analysis of incomplete categorical data. Starting with a GDD with two partitions, we derive its stochastic representation that provides a simple procedure for simulation. Other properties such as mixed moments, mode, marginal and conditional distributions are also derived. The general GDD with more than two partitions is considered in a parallel manner. Three data sets from a case-control study, a leprosy survey, and a neurological study are used to illustrate how the GDD can be used as a new tool for analyzing incomplete categorical data. Our approach based on GDD has at least two advantages over the commonly used approach based on the DD in both frequentist and conjugate Bayesian inference: (a) in some cases, both the maximum likelihood and Bayes estimates have closed-form expressions in the new approach, but not so when they are based on the commonly-used approach; and (b) even if a closed-form solution is not available, the EM and data augmentation algorithms in the new approach converge much faster than in the commonly-used approach.

12.
We propose two methods for tuning membership functions of a kernel fuzzy classifier based on the idea of SVM (support vector machine) training. We assume that in a kernel fuzzy classifier a fuzzy rule is defined for each class in the feature space. In the first method, we tune the slopes of all the membership functions at the same time so that the margin between classes is maximized under the constraint that each data sample's degree of membership in its own class is the maximum among all classes. This method is similar to a linear all-at-once SVM. We call this AAO tuning. In the second method, we tune the membership functions one class at a time. Namely, for a given class the slope of the associated membership function is tuned so that the margin between the class and the remaining classes is maximized under the constraints that the degrees of membership for the data belonging to the class are large and those for the remaining data are small. This method is similar to a linear one-against-all SVM. We call this OAA tuning. According to computer experiments on fuzzy classifiers based on kernel discriminant analysis and on those with ellipsoidal regions, both methods usually improve classification performance, and AAO tuning performs slightly better than OAA tuning.

13.
A method is proposed for estimating the parameters in a parametric statistical model when the observations are fuzzy and are assumed to be related to underlying crisp realizations of a random sample. This method is based on maximizing the observed-data likelihood defined as the probability of the fuzzy data. It is shown that the EM algorithm may be used for that purpose, which makes it possible to solve a wide range of statistical problems involving fuzzy data. This approach, called the fuzzy EM (FEM) method, is illustrated using three classical problems: normal mean and variance estimation from a fuzzy sample, multiple linear regression with crisp inputs and fuzzy outputs, and univariate finite normal mixture estimation from fuzzy data.

14.
Fuzzy linear regression models can provide an estimated fuzzy number that has a fuzzy membership function. If the point with the highest membership value in the estimated fuzzy number does not lie within the support of the observed fuzzy membership function, a decision-maker who relies on the estimate faces a high risk. In this study we propose a modification of fuzzy linear regression analysis based on the criterion of minimizing the difference between the fuzzy membership values of the observed and estimated fuzzy numbers. Two numerical examples are used to evaluate the fuzzy regression models.

15.
The purposes of this paper are to introduce a multivariate non-stationary stochastic time series model that does not require individual detrending and to extract the multiple relationships between variables. To infer the statistical relation between variables, we attempt to estimate the co-movement of multivariate non-stationary time series components. The model is expressed in state-space form, and the time series components are estimated by the maximum likelihood method using a numerical optimization algorithm. The Kalman filter algorithm is used to compute the likelihood of the model. The AIC procedure gives a criterion for selecting the model that best fits the data. The multiple relationships become clear by analysing the estimated AR coefficients. Real economic data are used for a numerical example.
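
The step in which the Kalman filter yields the likelihood can be sketched on a far simpler model than the one in this paper. The example below assumes a univariate local level model, in which the observation is a random-walk level plus noise, and evaluates its Gaussian log-likelihood through the prediction error decomposition. The function names, the initial state `a0` and the initial variance `p0` are assumptions for illustration; the paper's multivariate model and its numerical optimiser are not reproduced.

```python
import math


def local_level_loglik(y, var_obs, var_state, a0=0.0, p0=1e7):
    """Gaussian log-likelihood of a local level model via the Kalman filter.

    Model (assumed for this sketch):
        y[t]     = level[t] + e[t],    e[t] ~ N(0, var_obs)
        level[t] = level[t-1] + w[t],  w[t] ~ N(0, var_state)
    a0, p0: mean and variance of the predicted level at the first time point.
    """
    a, p = a0, p0
    loglik = 0.0
    for obs in y:
        v = obs - a       # one-step-ahead prediction error
        f = p + var_obs   # variance of the prediction error
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f         # Kalman gain
        a = a + k * v     # filtered level, which is also the next prediction
        p = p * (1.0 - k) + var_state  # next prediction variance
    return loglik


def aic(loglik, n_params):
    """Akaike information criterion; smaller values indicate a preferred model."""
    return -2.0 * loglik + 2.0 * n_params


series = [1.0, 1.2, 0.9, 1.4, 1.6, 1.5]
ll = local_level_loglik(series, var_obs=0.1, var_state=0.05)
print(ll, aic(ll, n_params=2))
```

In practice the variances would be chosen by passing the negative log-likelihood to a numerical optimiser, and AIC values from competing model specifications would then be compared.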

16.
This article presents the application of finite-element fuzzy model updating to the DLR AIRMOD structure. The proposed approach is initially demonstrated on a simulated mass-spring system with three degrees of freedom. Considering the effect of the assembly process on variability measurements, modal tests were carried out for the repeatedly disassembled and reassembled DLR AIRMOD structure. The histograms of the measured data attributed to the uncertainty of the structural components in terms of mass and stiffness are utilised to obtain the membership functions of the chosen fuzzy outputs and to determine the updated membership functions of the uncertain input parameters represented by fuzzy variables. In this regard, a fuzzy parameter is introduced to represent a set of interval parameters through the membership function, and a meta model (kriging, in this work) is used to speed up the updating. The use of non-probabilistic models, i.e. interval and fuzzy models, for updating models with uncertainties is often more practical when the large quantities of test data that are necessary for probabilistic model updating are unavailable.

17.
The segmentation of customers on multiple bases is a pervasive problem in marketing research. For example, segmentation service providers partition customers using a variety of demographic and psychographic characteristics, as well as an array of consumption attributes such as brand loyalty, switching behavior, and product/service satisfaction. Unfortunately, the partitions obtained from multiple bases are often not in good agreement with one another, making effective segmentation a difficult managerial task. Therefore, the construction of segments using multiple independent bases often results in a need to establish a partition that represents an amalgamation or consensus of the individual partitions. In this paper, we compare three methods for finding a consensus partition. The first two methods are deterministic, do not use a statistical model in the development of the consensus partition, and are representative of methods used in commercial settings, whereas the third method is based on finite mixture modeling. In a large-scale simulation experiment the finite mixture model yielded better average recovery of holdout (validation) partitions than its non-model-based competitors. This result calls for important changes in the current practice of segmentation service providers that group customers for a variety of managerial goals related to the design and marketing of products and services.

18.
Detection of changes to multivariate patterns is an important topic in a number of different domains. Modern data sets often include categorical and numerical data and potentially complex in-control regions. Given a flexible, robust decision rule for this environment that signals based on an individual observation vector, an important issue is how to extend the rule to incorporate time-based information. A decision rule can be learned to detect shifts through artificial data that transforms the problem to one of supervised learning. Then class probability ratios are derived from a relationship to likelihood ratios to form the basis for time-weighted updates of the monitoring scheme.

19.
We propose a procedure based on a latent variable model for the comparison of two partitions of different units described by the same set of variables. The null hypothesis here is that the two partitions come from the same underlying mixture model. We define a method of “projecting” partitions using a supervised classification method: once one partition is taken as a reference, the individuals of the second data set are allocated to the clusters of the reference partition. This gives two partitions of the same units of the second data set, the original one and the projected one, and we evaluate their difference by the usual measures of association. The empirical distributions of the association measures are derived by simulation.
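
The abstract does not say which measures of association are used to compare the original and projected partitions. As one common choice, the sketch below computes the Rand index, the share of object pairs on which two partitions of the same units agree. The function name `rand_index` is an assumption for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand index between two partitions of the same units.

    labels_a, labels_b: sequences of cluster labels, one entry per unit.
    A pair of units counts as an agreement when both partitions put the
    pair in the same cluster, or both put it in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same units")
    if len(labels_a) < 2:
        raise ValueError("at least two units are required")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


original = [0, 0, 1, 1, 2, 2]
projected = [1, 1, 0, 0, 2, 2]   # same grouping, different label names
print(rand_index(original, projected))
```

The index depends only on which units are grouped together, not on the label names, so no matching of cluster labels between the two partitions is needed.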

20.
This paper presents a two-stage procedure for building an optimal fuzzy model from data for nonlinear dynamical systems. Both stages are embedded in a Genetic Algorithm (GA). In the first stage, emphasis is placed on structural optimization by assigning a suitable fitness to each individual member of the population in a canonical GA. These individuals represent coded information about the structure of the model (number of antecedents and rules). This information is then used by subtractive clustering to partition the input space and construct a compact fuzzy rule base. In the second stage, an Unscented Filter (UF) is employed to optimize the model parameters, that is, the parameters of the input–output Membership Functions (MFs).

