Similar Documents
20 similar documents found (search time: 31 ms)
1.
Abstract

Statistical software systems include modules for manipulating data sets, model fitting, and graphics. Because plots display data, and models are fit to data, both the model-fitting and graphics modules depend on the data. Today's statistical environments allow the analyst to choose or even build a suitable data structure for storing the data and to implement new kinds of plots. The multiplicity problem caused by many plot varieties and many data representations is avoided by constructing a plot-data interface. The interface is a convention by which plots communicate with data sets, allowing plots to be independent of the actual data representation. This article describes the components of such a plot-data interface. The same strategy may be used to deal with the dependence of model-fitting procedures on data.
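The interface idea above can be sketched in a few lines (class and method names are illustrative, not the article's actual design): two different data representations answer the same two calls, so a plot written against those calls never sees the storage layout.

```python
# A minimal sketch of a plot-data interface (names are illustrative,
# not the article's API): plots ask the data set for variables by
# name, so any storage layout that answers these two calls works.

class ColumnData:
    """One possible data representation: a dict of columns."""
    def __init__(self, columns):
        self._columns = columns

    def variables(self):
        return list(self._columns)

    def values(self, name):
        return list(self._columns[name])


class RowData:
    """A different representation: a list of row dicts, same interface."""
    def __init__(self, rows):
        self._rows = rows

    def variables(self):
        return list(self._rows[0]) if self._rows else []

    def values(self, name):
        return [row[name] for row in self._rows]


def scatter_summary(data, x, y):
    """A 'plot' that only uses the interface, never the representation."""
    xs, ys = data.values(x), data.values(y)
    return len(xs), min(xs), max(ys)
```

Feeding both representations to the same "plot" yields identical results, which is the multiplicity-avoiding property the abstract describes.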

2.
Abstract

We consider visual methods based on mosaic plots for interpreting and modeling categorical data. Categorical data are most often modeled using loglinear models. For certain loglinear models, mosaic plots have unique shapes that do not depend on the actual data being modeled. These shapes reflect the structure of a model, defined by the presence and absence of particular model coefficients. Displaying the expected values of a loglinear model allows one to incorporate the residuals of the model graphically and to visually judge the adequacy of the loglinear fit. This procedure leads to stepwise interactive graphical modeling of loglinear models. We show that it often results in a deeper understanding of the structure of the data. Linking mosaic plots to other interactive displays offers additional power that allows the investigation of more complex dependence models than provided by static displays.
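As a concrete illustration of the quantities a mosaic display of expected values is built from, here is a minimal sketch (pure Python, not the authors' interactive software) computing expected counts and Pearson residuals under the independence loglinear model for a two-way table:

```python
# Hedged sketch: expected counts and Pearson residuals under the
# independence loglinear model for a two-way table -- the model whose
# mosaic shape does not depend on the data, and the residuals a mosaic
# of expected values lets one judge graphically.

def independence_fit(table):
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    # Pearson residual: (observed - expected) / sqrt(expected)
    residuals = [[(table[i][j] - expected[i][j]) / expected[i][j] ** 0.5
                  for j in range(len(col_tot))]
                 for i in range(len(row_tot))]
    return expected, residuals
```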

3.
Correspondence analysis, a data analytic technique used to study two‐way cross‐classifications, is applied to social relational data. Such data are frequently termed “sociometric” or “network” data. The method allows one to model forms of relational data and types of empirical relationships not easily analyzed using either standard social network methods or common scaling or clustering techniques. In particular, correspondence analysis allows one to model:

—two‐mode networks (rows and columns of a sociomatrix refer to different objects)

—valued relations (e.g. counts, ratings, or frequencies).

In general, the technique provides scale values for row and column units, visual presentation of relationships among rows and columns, and criteria for assessing “dimensionality” or graphical complexity of the data and goodness‐of‐fit to particular models. Correspondence analysis has recently been the subject of research by Goodman, Haberman, and Gilula, who have termed their approach to the problem “canonical analysis” to reflect its similarity to canonical correlation analysis of continuous multivariate data. This generalization links the technique to more standard categorical data analysis models, and provides a much‐needed statistical justification.

We review both correspondence and canonical analysis, and present these ideas by analyzing relational data on the 1980 monetary donations from corporations to nonprofit organizations in the Minneapolis-St. Paul metropolitan area. We also show how these techniques are related to dyadic independence models, first introduced by Holland, Leinhardt, Fienberg, and Wasserman in the early 1980s. The highlight of this paper is the relationship between correspondence and canonical analysis and these dyadic independence models, which are designed specifically for relational data. The paper concludes with a discussion of this relationship, and some data analyses that illustrate the fact that correspondence analysis models can be used as approximate dyadic independence models.

4.
Recently, we proposed variants as a statistical model for treating ambiguity. If data are extracted from an object by a machine, the machine may be unable to give a unique, safe answer due to ambiguity about the correct interpretation of the object. On the other hand, the machine is often able to produce a finite number of alternative feature sets (of the same object) that contain the desired one. We call these feature sets variants of the object. Data sets that contain variants may be analyzed by means of statistical methods, and all chapters of multivariate analysis can be seen in the light of variants. In this communication, we focus on point estimation in the presence of variants and outliers. Besides robust parameter estimation, this task also requires selecting the regular objects and their valid feature sets (regular variants). We determine the mixed MAP-ML estimator for a model with spurious variants and outliers, as well as estimators based on the integrated likelihood. We also prove asymptotic results showing that the estimators are nearly consistent. The problem of variant selection turns out to be computationally hard; therefore, we also design algorithms for efficient approximation. We finally demonstrate their efficacy on a simulated data set and a real data set from genetics.

5.
In recent years there has been renewed interest in Markov chains, owing to their attractive properties for analyzing real-life time series and longitudinal data in various fields. Models have been proposed for fitting first- or higher-order Markov chains. However, there is a serious lack of realistic methods for linking covariate dependence with transition probabilities, which is needed to analyze the factors associated with such transitions, especially for higher-order Markov chains. L.R. Muenz and L.V. Rubinstein [Markov models for covariate dependence of binary sequences, Biometrics 41 (1985) 91–101] employed logistic regression models to analyze the transition probabilities of a first-order Markov model, but that methodology is still far from a generalization to higher-order Markov chains. This study aims to provide a comprehensive covariate-dependent Markov model of higher order; the proposed model generalizes the estimation procedure to Markov models of any order. The proposed models and inference procedures are simple, and the covariate dependence of the transition probabilities of any order can be examined without making the underlying model complex. An example using rainfall data illustrates the utility of the proposed model for analyzing complex real-life problems. The application indicates that higher-order covariate-dependent Markov models can be employed conveniently, and the results can give researchers and policymakers in-depth insight into the factors underlying different types of transitions, reverse transitions, and repeated transitions. The estimation and test procedures can be applied to a Markov model of any order without making the theory or interpretation difficult for the common user.
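A rough, count-based sketch of the objects being modeled (the paper itself links transitions to covariates via regression-type estimation, which this sketch does not reproduce): empirical transition probabilities of a second-order binary chain, stratified by a binary covariate.

```python
# Illustrative sketch (not the paper's estimator): empirical
# second-order transition probabilities of a binary Markov chain,
# stratified by a binary covariate. The paper replaces these raw
# counts with a regression formulation so that covariates can be
# continuous and of any dimension.

from collections import Counter

def second_order_transitions(seq, covariate):
    """Return P(next state | covariate, previous two states) by counting."""
    counts = Counter()
    for t in range(2, len(seq)):
        state = (covariate[t], seq[t - 2], seq[t - 1])
        counts[(state, seq[t])] += 1
    probs = {}
    for (state, nxt), c in counts.items():
        total = counts[(state, 0)] + counts[(state, 1)]
        probs[(state, nxt)] = c / total
    return probs
```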

6.
Many models exist for fitting stationary time series, such as the AR, MA, and ARMA models; for non-stationary data, an ARIMA or seasonal ARIMA model can be used. Many statistical software packages, such as SAS and S-PLUS, can build a stationary or non-stationary time series model for a given data set. However, some packages do not work well for small samples, with or without missing data, especially for short time series with a seasonal trend. This paper develops a nonparametric smoothing technique for building a forecasting model from a small seasonal time series. Both the proposed method and the corresponding SAS procedure are then applied to the international airline passengers data and compared. The comparison shows that the method proposed in this paper outperforms the SAS approach.

7.
Abstract

Spatial data in mining, hydrology, and pollution monitoring commonly have a substantial proportion of zeros. One way to model such data is to suppose that some pointwise transformation of the observations follows the law of a truncated Gaussian random field. This article considers Monte Carlo methods for prediction and inference problems based on this model. In particular, a method for computing the conditional distribution of the random field at an unobserved location, given the data, is described. These results are compared to those obtained by simple kriging and indicator cokriging. Simple kriging is shown to give highly misleading results about conditional distributions; indicator cokriging does quite a bit better but still can give answers that are substantially different from the conditional distributions. A slight modification of this basic technique is developed for calculating the likelihood function for such models, which provides a method for computing maximum likelihood estimates of unknown parameters and Bayesian predictive distributions for values of the process at unobserved locations.

8.

We study asymptotic properties of Bayesian multiple testing procedures and provide sufficient conditions for strong consistency under a general dependence structure. We also consider a novel Bayesian multiple testing procedure, with associated error measures, that coherently accounts for the dependence structure present in the model. We advocate posterior versions of FDR and FNR as appropriate error rates and show that their asymptotic convergence rates are directly associated with the Kullback–Leibler divergence from the true model. The theory holds even when the class of postulated models is misspecified. We illustrate our results in a variable selection problem with autoregressive response variables and compare our procedure with some existing methods through simulation studies. The superior performance of the new procedure indicates that proper exploitation of the dependence structure by multiple testing methods is indeed important. Moreover, we obtain encouraging results on a maize dataset, where we select influential marker variables.

9.
For multivariate data from an observational study, inferences of interest can include conditional probabilities or quantiles for one variable given other variables. For statistical modeling, one could fit a parametric multivariate model, such as a vine copula, to the data and then use the model-based conditional distributions for further inference. Some results are derived for properties of conditional distributions under different positive dependence assumptions for some copula-based models. The multivariate version of the stochastically increasing ordering of conditional distributions is introduced for this purpose. Results are explained in the context of multivariate Gaussian distributions, as properties for Gaussian distributions can help to understand the properties of copula extensions based on vines.
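The Gaussian benchmark mentioned at the end can be written down explicitly; a small sketch of the bivariate case shows the stochastically increasing property under positive dependence: the conditional mean rises in the conditioning value while the conditional variance stays fixed.

```python
# Sketch of the bivariate Gaussian case used for intuition: the
# conditional law of X2 given X1 = x is Normal with the mean and
# variance below. With rho > 0 the mean is increasing in x, so the
# conditional distribution is stochastically increasing in x.

def gaussian_conditional(mu1, mu2, s1, s2, rho, x):
    mean = mu2 + rho * (s2 / s1) * (x - mu1)
    var = s2 ** 2 * (1 - rho ** 2)
    return mean, var
```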

10.
In this paper, we present a procedure, based on statistical criteria, that allows one to discriminate experimental data belonging to different sample populations and to fit them with different models.

11.
In this article, we focus on statistical models for binary data on a regular two-dimensional lattice. We study two classes of models, the Markov mesh models (MMMs) based on causal-like, asymmetric spatial dependence, and symmetric Markov random fields (SMFs) based on noncausal-like, symmetric spatial dependence. Building on results of Enting (1977), we give sufficient conditions for the asymmetrically defined binary MMMs (of third order) to be equivalent to a symmetrically defined binary SMF. Although not every binary SMF can be written as a binary MMM, our results show that many can. For such SMFs, their joint distribution can be written in closed form and their realizations can be simulated with just one pass through the lattice. An important consequence of the latter observation is that there are nontrivial spatial processes for which exact probabilities can be used to benchmark the performance of Markov chain Monte Carlo and other algorithms.
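The one-pass simulation property can be illustrated with a first-order analogue of the models studied (the paper's MMMs are third order; this sketch, with hypothetical parameter names, only shows the raster-scan idea): each cell depends only on its already-generated west and north neighbours, so a single scan yields an exact draw.

```python
# Illustrative one-pass simulation of a simple first-order binary
# Markov mesh model. p[(west, north)] gives P(cell = 1 | neighbours);
# cells outside the lattice are treated as 0. Because the dependence
# is causal (west/north only), one raster scan is an exact simulation.

import random

def simulate_mmm(rows, cols, p, seed=0):
    rng = random.Random(seed)
    grid = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            west = grid[i][j - 1] if j > 0 else 0
            north = grid[i - 1][j] if i > 0 else 0
            grid[i][j] = 1 if rng.random() < p[(west, north)] else 0
    return grid
```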

12.

Association or interdependence of two stock prices is analyzed, and criteria for selecting a suitable model are developed in the present paper. The association is generated by stochastic correlation, given by a stochastic differential equation (SDE), creating interdependent Wiener processes. These, in turn, drive the SDEs in the Heston model for stock prices. To choose among possible stochastic correlation models, two goodness-of-fit procedures are proposed based on the copula of Wiener increments. One uses the confidence domain for the centered Kendall function, and the other relies on strong and weak tail dependence. The constant correlation model and two different stochastic correlation models, given by Jacobi and hyperbolic tangent transformations of Ornstein-Uhlenbeck (HtanOU) processes, are compared by analyzing daily close prices of Apple and Microsoft stocks. The constant correlation model, i.e., the Gaussian copula model, is unanimously rejected by the methods, while the two stochastic correlation models are acceptable at the 95% confidence level. The analysis also reveals that even for Wiener processes, stochastic correlation can create tail dependence, unlike constant correlation, which results in multivariate normal distributions and hence zero tail dependence. Models with stochastic correlation are therefore suitable for describing more dangerous situations in terms of correlation risk.

13.

The weak variance-alpha-gamma process is a multivariate Lévy process constructed by weakly subordinating Brownian motion, possibly with correlated components, with an alpha-gamma subordinator. It generalises the variance-alpha-gamma process of Semeraro, which is constructed by traditional subordination. We compare three calibration methods for the weak variance-alpha-gamma process: the method of moments, maximum likelihood estimation (MLE), and digital moment estimation (DME). We derive a condition for the Fourier invertibility needed to apply MLE and show in our simulations that MLE produces a better fit when this condition holds, while DME produces a better fit when it is violated. We also find that the weak variance-alpha-gamma process exhibits a wider range of dependence and produces a significantly better fit than the variance-alpha-gamma process on an S&P 500-FTSE 100 data set, and that DME produces the best fit in this situation.

14.
We consider the analysis of time series data which require models with a heavy-tailed marginal distribution. A natural model to attempt to fit to such data is an autoregression of order p, where p itself is often determined from the data. Several methods of parameter estimation for heavy-tailed autoregressions have been considered, including Yule–Walker estimation, linear programming estimators, and periodogram-based estimators. We investigate the statistical pitfalls of the first two methods when the models are misspecified, either completely or due to the presence of outliers. We illustrate our findings on both simulated and real data sets. A warning is sounded against the assumption that autoregressions will be an applicable class of models for fitting heavy-tailed data.
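For reference, the first estimator mentioned is simple enough to state in full for an AR(1): the lag-1 sample autocorrelation is the Yule–Walker estimate of the autoregressive coefficient. This is a generic sketch of that textbook estimator, not the paper's analysis; the paper's point is that for heavy-tailed or outlier-contaminated data such moment-based estimates can be badly misleading.

```python
# Yule-Walker estimate of the AR(1) coefficient: the ratio of the
# lag-1 sample autocovariance to the sample variance.

def yule_walker_ar1(x):
    n = len(x)
    mean = sum(v for v in x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    c1 = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1)) / n
    return c1 / c0
```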

15.
ABSTRACT

This paper concerns the mathematical analysis of a model for price formation. We consider a large number of rational buyers and vendors in a market who trade the same good, each choosing an optimal strategy to buy or sell. Since markets seldom stabilize, our model mimics real market behavior. We introduce three models, all of them modifications of the original J.-M. Lasry and P.-L. Lions evolution model. In the first modified model, a random term is added to mimic the randomness of trading in a real market; this reflects markets with low volatility, where it may be difficult to buy or sell goods at a specific price. In the second model, we use the cumulative distribution function instead of the density function. We give numerical simulations of these two models in order to obtain a general picture of the solution. In the third model, we add a term associated with a parameter R to destabilize the original Lasry-Lions model, and we study oscillations and wave solutions for different values of R. We also study existence and uniqueness of the solution. Moreover, several plots are given to demonstrate how these results correspond to the theoretical predictions.

16.
Networks are being increasingly used to represent relational data. As the patterns of relations tend to be complex, many probabilistic models have been proposed to capture the structural properties of the process that generated the networks. Two features of network phenomena not captured by the simplest models are the variation in the number of relations individual entities have and the clustering of their relations. In this paper we present a statistical model within the curved exponential family class that can represent both arbitrary degree distributions and an average clustering coefficient. We present two tunable parameterizations of the model and give their interpretation. We also present a Markov chain Monte Carlo (MCMC) algorithm that can be used to generate networks from this model.
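The two target features of the model can be computed directly from a graph; a pure-Python sketch (adjacency given as a dict of neighbour sets; function names are illustrative):

```python
# Degree sequence and average clustering coefficient for an undirected
# graph stored as {vertex: set of neighbours}. The local clustering
# coefficient is the fraction of a vertex's neighbour pairs that are
# themselves connected.

def degrees(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}

def avg_clustering(adj):
    coefs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coefs.append(0.0)
            continue
        links = sum(1 for i in nbrs for j in nbrs
                    if i < j and j in adj[i])
        coefs.append(2 * links / (k * (k - 1)))
    return sum(coefs) / len(coefs)
```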

17.
Conditionally specified statistical models are frequently constructed from one-parameter exponential family conditional distributions. One way to formulate such a model is to specify the dependence structure among random variables through the use of a Markov random field (MRF). A common assumption on the Gibbsian form of the MRF model is that dependence is expressed only through pairs of random variables, which we refer to as the “pairwise-only dependence” assumption. Based on this assumption, J. Besag (1974, J. Roy. Statist. Soc. Ser. B 36, 192–225) formulated exponential family “auto-models” and showed the form that one-parameter exponential family conditional densities must take in such models. We extend these results by relaxing the pairwise-only dependence assumption, and we give a necessary form that one-parameter exponential family conditional densities must take under more general conditions of multiway dependence. Data on the spatial distribution of the European corn borer larvae are fitted using a model with Bernoulli conditional distributions and several dependence structures, including pairwise-only, three-way, and four-way dependencies.

18.

Partially linear models (PLMs) have been widely used in statistical modeling, where prior knowledge is often required on which variables have linear or nonlinear effects in the PLMs. In this paper, we propose a model-free structure selection method for the PLMs, which aims to discover the model structure in the PLMs through automatically identifying variables that have linear or nonlinear effects on the response. The proposed method is formulated in a framework of gradient learning, equipped with a flexible reproducing kernel Hilbert space. The resultant optimization task is solved by an efficient proximal gradient descent algorithm. More importantly, the asymptotic estimation and selection consistencies of the proposed method are established without specifying any explicit model assumption, which assure that the true model structure in the PLMs can be correctly identified with high probability. The effectiveness of the proposed method is also supported by a variety of simulated and real-life examples.


19.

It is well known that variable selection in multiple regression can be unstable and that the model uncertainty can be considerable. The model uncertainty can be quantified and explored by bootstrap resampling; see Sauerbrei et al. (Biom J 57:531–555, 2015). Here approaches are introduced that use the results of bootstrap replications of the variable selection process to obtain more detailed information about the data. Analyses are based on dissimilarities between the results of the analyses of different bootstrap samples. Dissimilarities are computed between the vectors of predictions and between the sets of selected variables. The dissimilarities are used to map the models by multidimensional scaling, to cluster them, and to construct heat plots. Clusters can point to different interpretations of the data that could arise from different selections of variables supported by different bootstrap samples. A new measure of variable selection instability is also defined. The methodology can be applied to various regression models, estimators, and variable selection methods. It is illustrated by three real data examples, using linear regression and a Cox proportional hazards model, and model selection by AIC and BIC.

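One of the dissimilarities described, the set-based one, can be sketched directly. The selection method itself (AIC, BIC, etc.) is abstracted away, and `instability` below is a hypothetical mean-pairwise-distance summary for illustration, not necessarily the authors' measure.

```python
# Jaccard distance between the variable sets selected on two bootstrap
# samples, and a mean pairwise summary over all bootstrap replications
# (an illustrative instability summary, not the paper's definition).

def jaccard_distance(sel_a, sel_b):
    a, b = set(sel_a), set(sel_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def instability(selections):
    pairs = [(i, j) for i in range(len(selections))
             for j in range(i + 1, len(selections))]
    return sum(jaccard_distance(selections[i], selections[j])
               for i, j in pairs) / len(pairs)
```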

20.
When over-dispersion is considered in the Binomial model, the success probability of each independent event is usually treated as a continuous random variable. In this paper, we propose a mixed Binomial model in which the success probability follows a Kumaraswamy distribution (the KB model). We discuss the stochastic orders and dependence properties of this model, and fit these models to data; the numerical results show that for some data sets the KB model fits better than the Beta-Binomial (BB) model.
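A minimal sampling sketch of the KB mixture described above (parameter names are illustrative): draw the success probability p from a Kumaraswamy(a, b) distribution using its closed-form inverse CDF, then draw the count from Binomial(n, p).

```python
# Sample from the Kumaraswamy-Binomial (KB) mixture: the Kumaraswamy
# CDF is F(x) = 1 - (1 - x^a)^b, so its inverse has the closed form
# used below -- one reason the Kumaraswamy is a convenient mixing law.

import random

def kumaraswamy_binomial(n, a, b, rng):
    u = rng.random()
    p = (1 - (1 - u) ** (1 / b)) ** (1 / a)   # inverse CDF of Kumaraswamy(a, b)
    return sum(1 for _ in range(n) if rng.random() < p)
```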
