Similar Documents
1.
The two-dimensional representation of documents, which places each document as a point in a Cartesian plane, has proved to be a useful visualization tool in Automated Text Categorization (ATC): it helps users understand the relationships between categories of textual documents, visually audit the classifier, and identify suspicious training data. This paper analyzes a specific use of this visualization approach for the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the classification decision rule is rewritten so that each coordinate of a document is the sum of two addends: a variable component, P(d|ci), and a constant component, P(ci), the prior of the category. When documents are plotted in the Cartesian plane according to this formulation, they can be seen to be shifted by a constant amount along the x-axis and the y-axis. This shifting effect is more or less evident depending on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied to the BIM. The visualization helps explain the decisions taken to rank the documents, in particular in the case of relevance feedback.
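The coordinate construction described above can be sketched for a two-category multinomial NB classifier. This is a minimal sketch under assumptions the abstract does not state: coordinates are computed in log space (so the two addends are log P(d|c) and log P(c)), word probabilities use Laplace smoothing with a hypothetical parameter `alpha`, and all function names are illustrative. A document is assigned to the x-axis category when its point falls below the bisector y = x, and changing a prior shifts every point by the same amount along one axis.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, vocab, alpha=1.0):
    """Return log-priors and Laplace-smoothed log P(word | class)."""
    log_prior, log_cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts[w] for w in vocab)
        log_cond[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                       for w in vocab}
    return log_prior, log_cond

def coordinates(doc, c_x, c_y, log_prior, log_cond):
    """(x, y) = (log P(d|c_x) + log P(c_x), log P(d|c_y) + log P(c_y)).
    The multinomial coefficient is dropped because it is the same for both classes;
    out-of-vocabulary words are ignored."""
    def score(c):
        variable = sum(log_cond[c][w] for w in doc if w in log_cond[c])
        return variable + log_prior[c]          # constant shift by the log-prior
    return score(c_x), score(c_y)
```

Because the prior term does not depend on the document, plotting `score(c) - log_prior[c]` instead moves the whole cloud rigidly, which is the shifting effect the abstract describes.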

2.
《Applied Mathematical Modelling》2014,38(21-22):5092-5112
One of the most complicated decision-making problems for managers is evaluating supply chain (SC) performance, which involves multiple criteria. Although many studies have evaluated supply chain efficiency with the balanced scorecard (BSC) approach, they do not focus on the relationships among the four BSC perspectives. This paper focuses on these relationships, especially the returnable (feedback) ones. First, all relationships among the four BSC perspectives were identified, and the DEMATEL approach was then used to obtain a network structure, which in turn was used to build a network DEA model. Because an efficiency score cannot be calculated with the BSC alone, the data envelopment analysis (DEA) model was used for the evaluation. In addition, after a review of existing tools for evaluating supply chain performance, a new approach combining network DEA with the BSC was developed. Finally, the model was applied to evaluate the efficiency of supply chains in the Iranian food industry, and the results demonstrated the high efficiency of the designed model. The findings could be used in evaluation processes in other industries.
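The DEMATEL step, which turns rated direct influences among the four BSC perspectives into a network structure, can be sketched as follows. This is a minimal sketch assuming the common DEMATEL formulation: the direct-relation matrix is normalized by its largest row or column sum, and the total-relation matrix T = N(I - N)^-1 is approximated by summing powers of N. Any ratings fed to it are hypothetical, and the network DEA stage is not shown.

```python
def dematel(direct, tol=1e-12, max_iter=10_000):
    """Return the total-relation matrix T = N + N^2 + N^3 + ... (= N (I - N)^-1),
    plus prominence (D + R) and relation (D - R) for each factor."""
    n = len(direct)
    s = max(max(sum(row) for row in direct),
            max(sum(direct[i][j] for i in range(n)) for j in range(n)))
    N = [[v / s for v in row] for row in direct]          # normalized direct relations

    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    T = [row[:] for row in N]
    power = N
    for _ in range(max_iter):
        power = matmul(power, N)                          # next power of N
        if max(abs(v) for row in power for v in row) < tol:
            break                                         # remaining terms are negligible
        for i in range(n):
            for j in range(n):
                T[i][j] += power[i][j]
    D = [sum(row) for row in T]                           # influence given
    R = [sum(T[i][j] for i in range(n)) for j in range(n)]  # influence received
    return T, [d + r for d, r in zip(D, R)], [d - r for d, r in zip(D, R)]
```

Factors with a positive relation value (D - R) act as net causes; a threshold on T is often used to decide which links enter the network, and that threshold is an assumption left to the analyst.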

3.
Motivated by the problem of minefield detection, we investigate the classification of mixtures of spatial point processes. In particular, we are interested in testing the hypothesis that a given dataset was generated by a Poisson process against the alternative of a mixture of a Poisson process and a hard-core Strauss process. We propose testing this hypothesis by comparing the evidence for each model using partial Bayes factors. We use the term partial Bayes factor to describe a Bayes factor (a ratio of integrated likelihoods) based on only part of the available information, namely the information contained in a small number of functionals of the data. We applied our method to both real and simulated data; given how difficult these point patterns are to classify by eye, our approach produced good results overall.
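As a toy illustration of the mechanics of a Bayes factor computed from a single functional of the data, the sketch below uses only the point count N in a window of known area. It assumes, purely for illustration, that under each model N | λ ~ Poisson(λ · area) with a Gamma(shape, rate) prior on the intensity λ, so the integrated likelihood has a closed negative-binomial form; the prior values are hypothetical. The paper's functionals and its Poisson versus Poisson-plus-Strauss models are not reproduced here.

```python
import math

def log_marginal_count(n, area, a, b):
    """log p(n) when N | lam ~ Poisson(lam * area) and lam ~ Gamma(shape=a, rate=b)."""
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + n)
            - math.lgamma(n + 1) + n * math.log(area)
            - (a + n) * math.log(b + area))

def partial_bayes_factor(n, area, prior1, prior2):
    """Bayes factor of model 1 vs model 2 that uses only the functional N (the count).
    prior1 and prior2 are (shape, rate) pairs for the intensity prior under each model."""
    return math.exp(log_marginal_count(n, area, *prior1)
                    - log_marginal_count(n, area, *prior2))
```

Replacing the count by richer functionals (for example, nearest-neighbour summaries that are sensitive to hard-core inhibition) keeps the same ratio-of-integrated-likelihoods structure but usually loses the closed form.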

4.
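Structural equation models are widely applied in sociology, education, medicine, marketing, and behavioral science. Missing data are common in these fields, and many researchers have proposed and extensively studied structural equation models with missing data. Model selection is very important in applying this class of models; this paper applies a statistic based on a Bayesian criterion, called the L_v measure, to model selection for such models. Finally, a simulation study and a real-data analysis illustrate the effectiveness and application of the L_v measure, and the real-data analysis also reports model selection results based on Bayes factors to further demonstrate the measure's effectiveness.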
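The L_v measure can be computed directly from posterior-predictive replicates. This sketch assumes the usual form of the L measure, L_v = Σ Var(z_i | y) + v · Σ (E[z_i | y] - y_i)², where z is a replicate of the observed data y and 0 ≤ v < 1 weights the bias term; smaller values indicate a better model. Generating the replicates from a structural equation model with missing data is not shown, and the divide-by-S variance is a choice made here.

```python
def l_nu_measure(y, replicates, nu):
    """L_nu = sum_i Var(z_i | y) + nu * sum_i (E[z_i | y] - y_i)^2.
    replicates: posterior-predictive draws, each a list with the same length as y.
    Moments of the draws use divide-by-S (population) formulas."""
    S = len(replicates)
    total = 0.0
    for i, yi in enumerate(y):
        draws = [z[i] for z in replicates]
        mean = sum(draws) / S
        var = sum((d - mean) ** 2 for d in draws) / S
        total += var + nu * (mean - yi) ** 2     # predictive variance + weighted bias
    return total
```

In practice one would compute this for each candidate model from its own MCMC output and pick the model with the smallest value.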

5.
This study provides operational guidance for building naïve Bayes Bayesian network (BN) models for bankruptcy prediction. First, we suggest a heuristic method to guide the selection of bankruptcy predictors. Based on the correlations and partial correlations among variables, the method aims to eliminate redundant and less relevant variables. A naïve Bayes model developed with the proposed heuristic performs well in a 10-fold cross-validation analysis. The model consists of eight first-order variables, six of which are continuous. We also provide guidance on building a cascaded model by selecting second-order variables to compensate for missing values of first-order variables. Second, we analyze whether the number of states into which the six continuous variables are discretized affects the model's performance. Our results show that performance is best when the variables are discretized into two or three states; from four states onward, performance deteriorates, probably due to over-fitting. Third, we test whether modeling the continuous variables with continuous distributions instead of discretizing them improves performance. Our findings suggest that it does not; one possible reason is that the continuous distributions tested in the study do not represent the underlying distributions of the empirical data well. Finally, the results of this study may also apply to business decision-making contexts other than bankruptcy prediction.
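The discretize-then-classify workflow can be sketched as below. The abstract does not say how the continuous predictors were discretized, so equal-frequency binning is an assumption, as are the Laplace smoothing and all names; varying `k` in `equal_frequency_cuts` is where the two, three, or four-state comparison would happen.

```python
import math
from collections import Counter, defaultdict

def equal_frequency_cuts(values, k):
    """Cut points that split `values` into k states with roughly equal counts."""
    s = sorted(values)
    return [s[int(len(s) * q / k)] for q in range(1, k)]

def discretize(x, cuts):
    """State index = number of cut points that x is greater than or equal to."""
    return sum(x >= c for c in cuts)

class CategoricalNB:
    """Naive Bayes over discrete states with Laplace smoothing."""

    def fit(self, rows, labels, alpha=1.0):
        self.classes = Counter(labels)
        self.n = len(labels)
        self.alpha = alpha
        self.counts = defaultdict(Counter)   # (feature, class) -> state counts
        self.states = defaultdict(set)       # feature -> observed states
        for row, c in zip(rows, labels):
            for f, s in enumerate(row):
                self.counts[(f, c)][s] += 1
                self.states[f].add(s)
        return self

    def log_posterior(self, row):
        """Unnormalized log P(class | row) for each class."""
        out = {}
        for c, nc in self.classes.items():
            lp = math.log(nc / self.n)
            for f, s in enumerate(row):
                k = len(self.states[f])
                lp += math.log((self.counts[(f, c)][s] + self.alpha) / (nc + self.alpha * k))
            out[c] = lp
        return out

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)
```

Cut points should be computed on the training folds only, so that the cross-validation estimate is not biased by the held-out data.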

6.
Non-linear structural equation models are widely used to analyze the relationships among outcomes and latent variables in educational, medical, social and psychological studies. However, existing theories and methods for analyzing non-linear structural equation models assume that outcomes come from an exponential family, and hence cannot be used for outcomes outside the exponential family. In this paper, a Bayesian method is developed to analyze non-linear structural equation models in which the manifest variables follow a reproductive dispersion model (RDM) and/or may be missing under a non-ignorable missingness mechanism, which is specified by a logistic regression model. A hybrid algorithm combining the Gibbs sampler and the Metropolis–Hastings algorithm is used to obtain joint Bayesian estimates of the structural parameters, the latent variables and the parameters of the logistic regression model, and a path-sampling procedure is given for computing the Bayes factor for model comparison. A goodness-of-fit statistic is proposed to assess the plausibility of the posited model. A simulation study and a real example illustrate the newly developed Bayesian methodology.
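Path sampling estimates a log Bayes factor by integrating an expected log-likelihood along a path of distributions indexed by t in [0, 1]. The sketch below shows one common variant (the power-posterior path, log p(y) = ∫ E_t[log p(y | θ)] dt) on a toy conjugate model where every quantity is available in closed form, so the quadrature can be checked against the exact answer. The toy models (y_i ~ N(μ, 1) with μ ~ N(0, 1), versus μ = 0) are assumptions; in the SEM setting the expectation at each t would be estimated from MCMC draws instead.

```python
import math

def expected_loglik(y, t):
    """E_t[log p(y | mu)] under the power posterior p_t(mu) ∝ p(y|mu)^t N(mu; 0, 1)."""
    n = len(y)
    prec = t * n + 1.0                    # precision of p_t
    m = t * sum(y) / prec                 # mean of p_t
    v = 1.0 / prec                        # variance of p_t
    sq = sum((yi - m) ** 2 for yi in y) + n * v   # E_t[sum_i (y_i - mu)^2]
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sq

def log_marginal_ti(y, k=1000):
    """Trapezoid rule over t = 0, 1/k, ..., 1."""
    ts = [i / k for i in range(k + 1)]
    f = [expected_loglik(y, t) for t in ts]
    return sum(0.5 * (f[i] + f[i + 1]) * (ts[i + 1] - ts[i]) for i in range(k))

def log_lik_null(y):
    """Model 0 has no free parameter (mu = 0), so its marginal is its likelihood."""
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * sum(yi * yi for yi in y)

def log_bf_path_sampling(y, k=1000):
    return log_marginal_ti(y, k) - log_lik_null(y)

def log_bf_exact(y):
    """Closed form: under model 1, y ~ N(0, I + 11^T)."""
    n, s = len(y), sum(y)
    ss = sum(yi * yi for yi in y)
    log_m1 = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n)
              - 0.5 * (ss - s * s / (1 + n)))
    return log_m1 - log_lik_null(y)
```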

7.
A Bayesian shrinkage estimate of the mean in the generalized linear empirical Bayes model is proposed. The posterior mean under the empirical Bayes model has a shrinkage pattern. The shrinkage factor is estimated by a Bayesian method, with the regression coefficients fixed at their maximum extended quasi-likelihood estimates. This approach yields a Bayesian shrinkage estimate of the mean that is numerically quite tractable. The method is illustrated with a data set, and the estimate is compared with an earlier one based on an empirical Bayes method. In the special case of the homogeneous model with exchangeable priors, the performance of the Bayesian estimate is illustrated by computer simulations. The simulation results show that the Bayesian estimate improves on the empirical Bayes estimate in some situations.
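The shrinkage pattern of the posterior mean is easiest to see in the normal-normal analogue of the homogeneous, exchangeable case: with y_i ~ N(θ_i, σ²) and θ_i ~ N(μ, τ²), the posterior mean is ȳ + (1 - B)(y_i - ȳ) with shrinkage factor B = σ²/(σ² + τ²). The sketch below is that analogue, not the paper's GLM method: it plugs in the classical empirical Bayes (Efron-Morris) moment estimate of B, which is the kind of estimate the paper's Bayesian approach is compared against.

```python
def eb_shrinkage(y, sigma2):
    """Positive-part empirical Bayes shrinkage of independent y_i ~ N(theta_i, sigma2)
    toward the grand mean, assuming exchangeable theta_i ~ N(mu, tau2)."""
    k = len(y)
    if k < 4:
        raise ValueError("need at least 4 observations")
    ybar = sum(y) / k
    ss = sum((v - ybar) ** 2 for v in y)
    # Estimated B = sigma2 / (sigma2 + tau2), truncated so it never exceeds 1.
    B = 1.0 if ss == 0 else min(1.0, (k - 3) * sigma2 / ss)
    return B, [ybar + (1 - B) * (v - ybar) for v in y]
```

For example, `eb_shrinkage([1.0, 2.0, 3.0, 4.0, 5.0], 1.0)` gives B = 0.2, so each observation moves 20% of the way toward the mean of 3.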

8.
9.
Correspondence analysis, a data analytic technique used to study two‐way cross‐classifications, is applied to social relational data. Such data are frequently termed “sociometric” or “network” data. The method allows one to model forms of relational data and types of empirical relationships not easily analyzed using either standard social network methods or common scaling or clustering techniques. In particular, correspondence analysis allows one to model:

—two‐mode networks (rows and columns of a sociomatrix refer to different objects)

—valued relations (e.g. counts, ratings, or frequencies).

In general, the technique provides scale values for row and column units, a visual presentation of the relationships among rows and columns, and criteria for assessing the “dimensionality” (graphical complexity) of the data and goodness-of-fit to particular models. Correspondence analysis has recently been studied by Goodman, Haberman, and Gilula, who term their approach “canonical analysis” to reflect its similarity to canonical correlation analysis of continuous multivariate data. This generalization links the technique to more standard categorical data analysis models and provides a much-needed statistical justification.

We review both correspondence and canonical analysis and present these ideas by analyzing relational data on the 1980 monetary donations from corporations to nonprofit organizations in the Minneapolis-St. Paul metropolitan area. We also show how these techniques are related to dyadic independence models, first introduced by Holland, Leinhardt, Fienberg, and Wasserman in the early 1980s. The highlight of this paper is the relationship between correspondence and canonical analysis and these dyadic independence models, which are designed specifically for relational data. The paper concludes with a discussion of this relationship and some data analyses that illustrate the fact that correspondence analysis models can be used as approximate dyadic independence models.
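The scale values that correspondence analysis assigns to row and column units come from the singular value decomposition of the standardized residual matrix S = D_r^(-1/2) (P - r c^T) D_c^(-1/2). The sketch below recovers only the first dimension by power iteration so that it needs no linear-algebra library; it assumes every row and column total is positive, and the function name is illustrative. The squared singular value is the principal inertia, and all squared singular values together sum to χ²/n.

```python
import math

def correspondence_first_dim(table, iters=1000):
    """First CA dimension of a two-way table (e.g. donor-by-recipient counts):
    singular value plus row and column principal coordinates."""
    total = sum(sum(row) for row in table)
    P = [[v / total for v in row] for row in table]
    I, J = len(P), len(P[0])
    r = [sum(row) for row in P]                                   # row masses
    c = [sum(P[i][j] for i in range(I)) for j in range(J)]        # column masses
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(J)]
         for i in range(I)]

    v = [1.0 / (j + 1) for j in range(J)]                         # arbitrary start
    for _ in range(iters):                                        # power iteration on S^T S
        u = [sum(S[i][j] * v[j] for j in range(J)) for i in range(I)]
        w = [sum(S[i][j] * u[i] for i in range(I)) for j in range(J)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0, [0.0] * I, [0.0] * J                      # independence table
        v = [x / norm for x in w]
    Sv = [sum(S[i][j] * v[j] for j in range(J)) for i in range(I)]
    sigma = math.sqrt(sum(x * x for x in Sv))
    u = [x / sigma for x in Sv]
    rows = [sigma * u[i] / math.sqrt(r[i]) for i in range(I)]     # F = D_r^(-1/2) U Sigma
    cols = [sigma * v[j] / math.sqrt(c[j]) for j in range(J)]     # G = D_c^(-1/2) V Sigma
    return sigma, rows, cols
```

The sign of a CA axis is arbitrary, so only relative positions (which donors sit near which recipients) should be interpreted.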

10.
Logistic regression is a natural and simple tool for understanding how covariates contribute to explaining the topology of a binary network. Once the model is fitted, the practitioner is interested in the goodness of fit of the regression, to check whether the covariates are sufficient to explain the whole topology of the network and, if they are not, to analyze the residual structure. To address this problem, we introduce a generic model that combines logistic regression with a network-oriented residual term. This residual term takes the form of the graphon function of a W-graph. Using a variational Bayes framework, we infer the residual graphon by averaging over a series of blockwise constant functions. This approach allows us to define a generic goodness-of-fit criterion, which corresponds to the posterior probability that the residual graphon is constant. Experiments on toy data are carried out to assess the accuracy of the procedure. Several networks from the social sciences and ecology are studied to illustrate the proposed methodology. Supplementary material for this article is available online.
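The model's edge probability, logistic regression on dyad covariates plus a residual graphon evaluated at latent node positions, can be written down directly. This sketch assumes a single blockwise-constant residual graphon (one of the functions the variational procedure averages over) and hypothetical argument names; the variational Bayes inference itself is not shown.

```python
import math
from bisect import bisect_right

def step_graphon(u, v, breaks, values):
    """Blockwise-constant residual graphon on [0,1]^2.
    breaks: sorted interior cut points; values[k][l]: residual on block (k, l)."""
    return values[bisect_right(breaks, u)][bisect_right(breaks, v)]

def edge_probability(beta, x_ij, u_i, u_j, breaks, values):
    """P(Y_ij = 1) = logistic(beta . x_ij + alpha(u_i, u_j)),
    where u_i, u_j in [0, 1] are the latent positions of nodes i and j."""
    eta = sum(b * x for b, x in zip(beta, x_ij)) + step_graphon(u_i, u_j, breaks, values)
    return 1.0 / (1.0 + math.exp(-eta))
```

When `values` is constant, the residual term only shifts the intercept, which is why the goodness-of-fit criterion is the posterior probability that the residual graphon is constant.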

11.
This paper describes a method by which a neural network learns to fit a distribution to sample data. The neural network can replace the input distributions required in a simulation or mathematical model, and it allows random variates to be generated for subsequent use in the model. Results for several data sets indicate that the method is robust and can represent different families of continuous distributions. The neural network is a three-layer feed-forward network of size (1-3-3-1). The paper suggests that the method is an alternative approach to selecting suitable continuous distributions and random variate generation techniques for simulation and mathematical models.
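One way a (1-3-3-1) network can both fit a distribution and generate variates is to learn an approximate inverse CDF: the input is a probability u, the target is the empirical u-quantile, and feeding U(0,1) draws through the trained network yields random variates. The abstract does not specify this formulation, the activations, or the training rule, so the inverse-CDF setup, tanh hidden units, plain SGD on squared error, and all names below are assumptions.

```python
import math
import random

def init_net(rng):
    """Random weights for a 1-3-3-1 network (two tanh hidden layers, linear output)."""
    def g():
        return rng.uniform(-0.5, 0.5)
    return {"W1": [g() for _ in range(3)], "b1": [g() for _ in range(3)],
            "W2": [[g() for _ in range(3)] for _ in range(3)],
            "b2": [g() for _ in range(3)],
            "w3": [g() for _ in range(3)], "b3": g()}

def forward(net, u):
    h1 = [math.tanh(net["W1"][k] * u + net["b1"][k]) for k in range(3)]
    h2 = [math.tanh(sum(net["W2"][j][k] * h1[k] for k in range(3)) + net["b2"][j])
          for j in range(3)]
    y = sum(net["w3"][j] * h2[j] for j in range(3)) + net["b3"]
    return h1, h2, y

def sgd_step(net, u, target, lr):
    """One backpropagation step on the loss 0.5 * (y - target)^2."""
    h1, h2, y = forward(net, u)
    dy = y - target
    dz2 = [dy * net["w3"][j] * (1 - h2[j] ** 2) for j in range(3)]
    dz1 = [sum(dz2[j] * net["W2"][j][k] for j in range(3)) * (1 - h1[k] ** 2)
           for k in range(3)]                      # gradients use the pre-update weights
    for j in range(3):
        net["w3"][j] -= lr * dy * h2[j]
        net["b2"][j] -= lr * dz2[j]
        for k in range(3):
            net["W2"][j][k] -= lr * dz2[j] * h1[k]
    net["b3"] -= lr * dy
    for k in range(3):
        net["W1"][k] -= lr * dz1[k] * u
        net["b1"][k] -= lr * dz1[k]

def quantile_pairs(sample):
    """(plotting position, order statistic) pairs: the empirical inverse CDF."""
    xs = sorted(sample)
    return [((i + 0.5) / len(xs), x) for i, x in enumerate(xs)]

def fit_quantile_net(sample, epochs=300, lr=0.01, seed=0):
    rng = random.Random(seed)
    net = init_net(rng)
    pairs = quantile_pairs(sample)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, x in pairs:
            sgd_step(net, u, x, lr)
    return net

def quantile_mse(net, sample):
    pairs = quantile_pairs(sample)
    return sum((forward(net, u)[2] - x) ** 2 for u, x in pairs) / len(pairs)

def sample_variate(net, rng):
    """Generate one random variate by passing a U(0,1) draw through the net."""
    return forward(net, rng.random())[2]
```

Nothing here forces the learned mapping to be monotone in u, so a check that outputs increase with u is a sensible safeguard before using the network as a generator.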

12.
This paper explores time heterogeneity in the stochastic actor-oriented models (SAOMs) proposed by Snijders (Sociological Methodology. Blackwell, Boston, pp 361-395, 2001), which are meant to study the evolution of networks. SAOMs model social networks as directed graphs, with nodes representing people, organizations, etc., and dichotomous relations representing underlying relationships such as friendship or advice. We illustrate several reasons why heterogeneity should be statistically tested and provide a fast, convenient method for assessment and model correction. SAOMs provide a flexible framework for network dynamics that allows a researcher to test selection, influence, behavioral, and structural properties in network data over time. We show how the forward-selecting, score-type test proposed by Schweinberger (Chapter 4: Statistical modeling of network panel data: goodness of fit. PhD thesis, University of Groningen 2007) can be employed to assess heterogeneity quickly at almost no additional computational cost. One-step estimates are used to assess the magnitude of the heterogeneity. Simulation studies are conducted to support the validity of this approach. The ASSIST dataset (Campbell et al. Lancet 371(9624):1595-1602, 2008) is reanalyzed with the score-type test, one-step estimators, and a full estimation for illustration. These tools are implemented in the RSiena package, and a brief walkthrough is provided.
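For a single added parameter fixed at zero under the null (for example, a hypothetical period-specific deviation of one effect), the score-type test and the one-step estimate reduce to the formulas below. This is a scalar sketch under that assumption: in SAOMs the score and information are estimated by simulation, and the test uses the score vector and information matrix adjusted for the estimated nuisance parameters. That machinery, as implemented in RSiena, is not reproduced here.

```python
import math

def score_test_1df(score, information):
    """Score-type statistic U^2 / I for one parameter fixed at 0 under H0,
    with its asymptotic chi-square(1) p-value: P(X > x) = erfc(sqrt(x / 2))."""
    stat = score ** 2 / information
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def one_step_estimate(theta0, score, information):
    """One Newton-Raphson step from the restricted estimate: theta0 + U / I.
    Its size indicates how large the heterogeneity would be if the parameter were freed."""
    return theta0 + score / information
```

Both quantities come from the restricted fit alone, which is why the heterogeneity assessment costs almost nothing beyond the original estimation.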

13.
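In emerging social commerce communities, the high interactivity among people and the massive volume and high dynamism of recommendation information pose new challenges for platforms analyzing consumers' perceived trust. However, clustering the recommendation information directly can hardly reflect consumers' subjectivity or the relationships among them. This paper transforms the problem of clustering perceived recommendation trust into a composite-network partitioning problem, and builds a clustering method for consumers' perceived recommendation trust in social commerce by combining the subjective logic method with a spectral bisection method based on the Normal matrix. First, recommendation information is transformed into perceived recommendation trust; then, a perceived-recommendation-trust similarity network and a relationship-intimacy network are extracted from the social network, and a Normal matrix is constructed and partitioned with the spectral bisection method. Finally, several groups of simulation experiments demonstrate the practicality and effectiveness of the method. The method offers a new perspective for analyzing consumer trust in social commerce and supports platforms in formulating precision marketing strategies.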

14.
In this paper, Bayesian statistical analysis of masked data is considered based on the Pareto distribution. The likelihood function is simplified by introducing auxiliary variables that describe the causes of failure. Three Bayesian approaches (Bayes with subjective priors, hierarchical Bayes and empirical Bayes) are used to estimate the parameters, and these methods are compared by analyzing a real data set. Finally, we discuss a method for avoiding the choice of the hyperparameters in the prior distributions.
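The subjective-prior Bayes approach rests on a conjugate update that is easy to show for fully observed failure times. With known scale x_m, the Pareto likelihood in the shape α is proportional to α^n exp(-α Σ log(x_i / x_m)), so a Gamma(a, b) prior (shape, rate) gives a Gamma(a + n, b + Σ log(x_i / x_m)) posterior. This sketch assumes unmasked data and a known scale; with masking, the auxiliary cause-of-failure variables would be imputed (for example, in a Gibbs sampler) before an update of this kind.

```python
import math

def pareto_shape_posterior(x, x_m, a, b):
    """Gamma(a, b) prior (shape, rate) on the Pareto shape alpha, known scale x_m.
    Returns the Gamma posterior (shape, rate) and its mean."""
    shape = a + len(x)
    rate = b + sum(math.log(xi / x_m) for xi in x)
    return shape, rate, shape / rate
```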

15.
Many studies in the social and behavioral sciences involve multivariate discrete measurements, which are often characterized by an underlying individual trait, clusters such as domains of measurements, and multiple waves of cohort data. Motivated by an application in child development, we propose a class of extended multivariate discrete hidden Markov models for analyzing domain-based measurements of cognition and behavior. A random effects model is used to capture the long-term trait. Additionally, we develop a model selection criterion based on the Bayes factor for the extended hidden Markov model. The National Longitudinal Survey of Youth (NLSY) is used to illustrate the methods. Supplementary technical details and computer code are available online.
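Bayes-factor comparisons of hidden Markov models need the likelihood of each observed sequence, which the forward algorithm computes by summing over hidden-state paths. This sketch is the basic single-sequence, discrete-emission building block only; the paper's random effects, domain clusters, and multivariate emissions are not included, and the parameter layout is an assumption.

```python
import math

def hmm_log_likelihood(obs, init, trans, emit):
    """Scaled forward algorithm for a discrete-emission HMM.
    init[s], trans[r][s], emit[s][o] are probabilities; obs is a list of symbol indices."""
    S = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(S)]
    c = sum(alpha)
    loglik = math.log(c)
    alpha = [a / c for a in alpha]                 # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(S)) * emit[s][o]
                 for s in range(S)]
        c = sum(alpha)
        loglik += math.log(c)                      # log-likelihood accumulates the scales
        alpha = [a / c for a in alpha]
    return loglik
```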

16.
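Stock time series forecasting has important applications in economics and management and underpins the success of many business and financial institutions. First, singular spectrum analysis is used to reconstruct the stock market time series, reducing noise and extracting the trend series. Next, the C-C algorithm is used to determine the embedding dimension and time delay of the series, and phase-space reconstruction is performed to generate the learning matrix for the neural networks. Boosting and different neural network models are then used to generate the individual members of a neural network ensemble. Finally, a semiparametric regression model with a penalty term combines the members, with a genetic algorithm selecting the optimal smoothing parameter; this yields a neural network ensemble stock forecasting model based on a genetic algorithm and semiparametric regression. In an empirical analysis of the opening prices of the Shanghai Composite Index, the method produced more accurate forecasts than traditional time series analysis and other ensemble methods. The results show that the method captures the trend of stock price time series well and provides an effective approach to financial time series forecasting.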
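The phase-space reconstruction step that produces the neural networks' learning matrix can be sketched as a delay embedding. Here the embedding dimension `m` and delay `tau` are passed in directly rather than estimated with the C-C algorithm, and pairing each row with the next value as a one-step-ahead target is an assumption; singular spectrum analysis, boosting, and the semiparametric combination are not shown.

```python
def delay_embedding(series, m, tau):
    """Rows [x_t, x_{t+tau}, ..., x_{t+(m-1)tau}] with one-step-ahead targets
    x_{t+(m-1)tau+1}."""
    span = (m - 1) * tau
    X, y = [], []
    for t in range(len(series) - span - 1):
        X.append([series[t + k * tau] for k in range(m)])
        y.append(series[t + span + 1])
    return X, y
```

For instance, embedding `list(range(10))` with m = 3 and tau = 2 gives five rows, the first being [0, 2, 4] with target 5.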

17.
Recent management research has shown the significance of organizational social networks, and communication is believed to affect interpersonal relationships. However, little is known about how communication affects organizational social networks. This paper studies the dynamics between organizational communication patterns and the growth of organizational social networks. We propose an organizational social network growth model and then collect empirical data to test its validity. The simulation results agree well with the empirical data. The simulation experiments enrich our knowledge of communication: organizational management practices that discourage employees from communicating within and across group boundaries have disparate and significant negative effects on the social network's density, scalar assortativity and discrete assortativity, each of which correlates with the organization's performance. These findings also suggest concrete measures that management can take to build and develop the organizational social network.
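The three network outcomes named above can be computed from an undirected edge list. This sketch assumes Newman's standard definitions: node degree is used as the scalar attribute (the paper may use a different one), and a categorical node label such as a hypothetical department or group plays the role of the discrete attribute. Degree assortativity is undefined for regular graphs, where the denominator is zero.

```python
def density(n_nodes, edges):
    """Fraction of possible undirected ties that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))

def degree_assortativity(edges):
    """Newman's degree (scalar) assortativity for an undirected simple graph."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    M = len(edges)
    s_prod = sum(deg[a] * deg[b] for a, b in edges) / M
    s_mean = sum(0.5 * (deg[a] + deg[b]) for a, b in edges) / M
    s_sq = sum(0.5 * (deg[a] ** 2 + deg[b] ** 2) for a, b in edges) / M
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)

def discrete_assortativity(edges, group):
    """Newman's assortativity by a categorical attribute: (tr e - sum a_i^2) / (1 - sum a_i^2)."""
    cats = sorted(set(group.values()))
    M = len(edges)
    e = {(p, q): 0.0 for p in cats for q in cats}
    for a, b in edges:                  # each undirected edge counted in both directions
        e[(group[a], group[b])] += 0.5 / M
        e[(group[b], group[a])] += 0.5 / M
    a_marg = {p: sum(e[(p, q)] for q in cats) for p in cats}
    trace = sum(e[(p, p)] for p in cats)
    ab = sum(a_marg[p] ** 2 for p in cats)   # e is symmetric, so row and column margins agree
    return (trace - ab) / (1 - ab)
```

A star network, where every tie links the hub to a leaf, has degree assortativity -1; a network whose ties all stay within groups has discrete assortativity +1.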

18.
Empirical Bayes Test for Variance Components in Random Effects Models   Total citations: 4; self-citations: 0; citations by others: 4
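The decision function of the Bayes test for variance components in the two-way classification random effects model is given, and the decision function of the corresponding empirical Bayes (EB) test is constructed using kernel estimation. Under suitable conditions, the EB decision function is proved to be asymptotically optimal, with a rate of convergence. Special cases and extensions of the model are given. Finally, an example satisfying the conditions of the theorems is presented.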
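Empirical Bayes tests of this kind typically replace unknown marginal densities in the Bayes decision function with kernel estimates built from past observations. The abstract does not specify the kernel or bandwidth, so the Gaussian kernel and the bandwidth `h` in this building-block sketch are assumptions; papers proving convergence rates often use specially constructed kernels instead.

```python
import math

def gaussian_kde(sample, h):
    """Return f_n(x) = (1 / (n h)) * sum_i K((x - X_i) / h) with a standard normal K."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
```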

19.
A Bayesian model selection procedure is proposed for comparing models subject to inequality and/or equality constraints. An encompassing prior approach is used, and a general form of the Bayes factor of a constrained model against the encompassing model is derived. A simple estimation method is proposed that estimates the Bayes factors for all candidate models simultaneously using one set of samples from the encompassing model. A simulation study and a real data analysis demonstrate the performance of the method.
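For inequality constraints, the encompassing prior Bayes factor of a constrained model against the encompassing model is commonly estimated as the proportion of encompassing-posterior samples that satisfy the constraint, divided by the corresponding proportion under the encompassing prior. This sketch covers only that inequality case; equality constraints need the more general form the paper derives. Because each candidate model only needs its own constraint check, one set of samples serves all models at once, and two constrained models are compared through the ratio of their Bayes factors.

```python
def bf_constrained_vs_encompassing(post_samples, prior_samples, constraint):
    """Inequality-constrained model vs. encompassing model:
    BF = (posterior proportion satisfying the constraint) / (prior proportion)."""
    fit = sum(map(constraint, post_samples)) / len(post_samples)
    complexity = sum(map(constraint, prior_samples)) / len(prior_samples)
    return fit / complexity
```

With a constraint such as `lambda t: t[0] > t[1]` on two group means and an exchangeable prior, the prior proportion is 0.5, so the Bayes factor can be at most 2.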

20.
Genome-wide association studies (GWAS) aim to assess relationships between single nucleotide polymorphisms (SNPs) and diseases. They are among the most popular problems in genetics and have some peculiarities given the large number of SNPs relative to the number of subjects in the study. Individuals might not be independent, especially in animal breeding studies or in genetic diseases in isolated populations with highly inbred individuals. We propose a family-based GWAS model in a two-stage approach comprising dimension reduction followed by model selection. The first stage, which takes the genetic relatedness between subjects into account, selects the promising SNPs. The second stage uses Bayes factors to compare all candidate models and a random search strategy to explore the space of regression models in a fully Bayesian approach. A simulation study shows that our approach outperforms the Bayesian lasso for model selection in this setting. We also illustrate its performance in a study of Beta-thalassemia in an isolated population from Sardinia. Supplementary material describing the implementation of the proposed method is available online.
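To illustrate comparing a one-SNP regression model against the null model with a Bayes factor, the sketch below swaps in the familiar BIC approximation, log BF ≈ (BIC_0 - BIC_1) / 2, for a simple linear regression with additive 0/1/2 genotype coding. This is not the paper's method: it ignores relatedness between subjects and uses no fully Bayesian priors or random search. It also assumes the fitted model does not reproduce y exactly (RSS > 0).

```python
import math

def log_bf_bic(x, y):
    """BIC approximation to the log Bayes factor of y ~ a + b*x against y ~ a."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    rss0 = sum((yi - ybar) ** 2 for yi in y)     # intercept-only model
    rss1 = rss0 - sxy ** 2 / sxx                 # least-squares fit with the SNP
    bic0 = n * math.log(rss0 / n) + 1 * math.log(n)
    bic1 = n * math.log(rss1 / n) + 2 * math.log(n)
    return (bic0 - bic1) / 2
```

A positive value favors including the SNP; the extra log(n) penalty on the alternative makes a SNP with no association produce a negative value.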
