Similar articles
20 similar articles found.
1.
In nonlife insurance, frequency and severity are two essential building blocks in the actuarial modeling of insurance claims. In this paper, we propose a dependent modeling framework to jointly examine the two components in a longitudinal context where the quantity of interest is the predictive distribution. The proposed model accommodates the temporal correlation in both the frequency and the severity, as well as the association between the frequency and severity, using a novel copula regression. The resulting predictive claims distribution allows the claim history of both the frequency and the severity to be incorporated into ratemaking and other prediction applications. In the application, we examine the insurance claim frequencies and severities for specific peril types from a government property insurance portfolio, namely lightning and vehicle claims, which tend to be frequent in terms of their count. We find that the frequencies and severities of these frequent peril types tend to have a high serial correlation over time. Using dependence modeling in a longitudinal setting, we demonstrate how the prediction of these frequent claims can be improved.

2.
We introduce methods for visualization of data structured along trees, especially hierarchically structured collections of time series. To this end, we identify questions that often emerge when working with hierarchical data and provide an R package to simplify their investigation. Our key contribution is the adaptation of the visualization principles of focus-plus-context and linking to the study of tree-structured data. Our motivating application is to the analysis of bacterial time series, where an evolutionary tree relating bacteria is available a priori. However, we have identified common problem types where, if a tree is not directly available, it can be constructed from data and then studied using our techniques. We perform detailed case studies to describe the alternative use cases, interpretations, and utility of the proposed visualization methods.

3.
We consider visual methods based on mosaic plots for interpreting and modeling categorical data. Categorical data are most often modeled using loglinear models. For certain loglinear models, mosaic plots have unique shapes that do not depend on the actual data being modeled. These shapes reflect the structure of a model, defined by the presence and absence of particular model coefficients. Displaying the expected values of a loglinear model allows one to incorporate the residuals of the model graphically and to visually judge the adequacy of the loglinear fit. This procedure leads to stepwise interactive graphical modeling of loglinear models. We show that it often results in a deeper understanding of the structure of the data. Linking mosaic plots to other interactive displays offers additional power that allows the investigation of more complex dependence models than provided by static displays.

4.
Graphics play a crucial role in statistical analysis and data mining. Quantifying the structure in data that is visible in plots, and how people read that structure from plots, is an ongoing challenge. The lineup protocol provides a formal framework for data plots, making inference possible. The data plot is treated like a test statistic, and the lineup protocol acts like a comparison with the sampling distribution of the nulls. This article describes metrics for describing structure in data plots and evaluates them in relation to the choices that human readers made during several large Amazon Turk studies using lineups. The metrics that were more specific to the plot types tended to match subject choices better than generic metrics. The process that we followed to evaluate metrics will be useful for the general development of numerical measures of structure in plots, and also in future experiments on lineups for choosing blocks of pictures. Supplementary materials for this article are available online.
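
The lineup idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the null datasets are generated by permuting one variable (a common choice when testing for association, not something the abstract specifies), each viewer is assumed to choose independently, and the function names `make_lineup` and `lineup_p_value` are hypothetical.

```python
import math
import random


def make_lineup(x, y, m=20, seed=0):
    """Hide the observed (x, y) data among m - 1 permutation nulls.

    Returns (panels, true_position); each panel is an (x, y) pair of lists.
    """
    rng = random.Random(seed)
    true_position = rng.randrange(m)
    panels = []
    for i in range(m):
        y_panel = list(y)
        if i != true_position:
            rng.shuffle(y_panel)  # assumed null: no association between x and y
        panels.append((list(x), y_panel))
    return panels, true_position


def lineup_p_value(n_viewers, n_correct, m=20):
    """P(at least n_correct of n_viewers pick the real panel) under the null.

    Assumes independent viewers who each pick the real panel with chance 1/m.
    """
    p = 1.0 / m
    return sum(
        math.comb(n_viewers, k) * p ** k * (1.0 - p) ** (n_viewers - k)
        for k in range(n_correct, n_viewers + 1)
    )
```

With a single viewer and 20 panels, a correct pick corresponds to a p-value of 1/20, which is why lineups of size 20 are a natural default.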

5.
XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) ArcView, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System for portability as well as the ability to run across a network.

6.
We describe methods developed to interpolate and project flight paths of aircraft in controlled airspace over the continental United States from aperiodic position reports. The dynamic displays we have developed have a number of unusual features. Our visualizations can be viewed from either a fixed or a moving viewpoint. The direction and distance of the focal point from the viewing point are under program control (allowing viewing in a direction other than the direction of motion of the viewpoint). The maximum and minimum depth of field are under program control (allowing viewing of selected local subsets of the data).

7.
This article introduces a graphical goodness-of-fit test for copulas in more than two dimensions. The test is based on pairs of variables and can thus be interpreted as a first-order approximation of the underlying dependence structure. The idea is to first transform pairs of data columns with the Rosenblatt transform to bivariate standard uniform distributions under the null hypothesis. This hypothesis can be graphically tested with a matrix of bivariate scatterplots, Q-Q plots, or other transformations. Furthermore, additional information can be encoded as background color, such as measures of association or (approximate) p-values of tests of independence. The proposed goodness-of-fit test is designed as a basic graphical tool for detecting deviations from a postulated, possibly high-dimensional, dependence model. Various examples are given and the methodology is applied to a financial dataset. An implementation is provided by the R package copula. Supplementary material for this article is available online, which provides the R package copula and reproduces all the graphical results of this article.
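
As a rough illustration of the pairwise Rosenblatt transform mentioned above, the sketch below handles a single pair under an assumed Clayton copula. The choice of Clayton as the null model, the parameter name `theta`, and the function names are assumptions made for this example; the article's own implementation is in the R package copula, not in this Python code.

```python
def clayton_h(u1, u2, theta):
    """Conditional CDF C(u2 | u1) of a Clayton copula with theta > 0."""
    a = u1 ** (-theta) + u2 ** (-theta) - 1.0
    return u1 ** (-theta - 1.0) * a ** (-(1.0 + theta) / theta)


def rosenblatt_pair(u1, u2, theta):
    """Rosenblatt transform of one pair: (u1, u2) -> (u1, C(u2 | u1)).

    If (u1, u2) really follows the assumed copula, the output pair is
    independent standard uniform, which is what the scatterplots check.
    """
    return u1, clayton_h(u1, u2, theta)


def clayton_h_inverse(u1, w, theta):
    """Inverse of clayton_h in its second argument; usable for sampling."""
    return ((w ** (-theta / (1.0 + theta)) - 1.0) * u1 ** (-theta) + 1.0) ** (-1.0 / theta)
```

Feeding independent uniforms `(u1, w)` through `clayton_h_inverse` produces Clayton-dependent pairs, and `rosenblatt_pair` maps them back to `(u1, w)`; a scatterplot of transformed pairs that departs from uniform scatter suggests the assumed copula does not fit.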

8.
In this paper we examine the relationship between a newly developed local dependence measure, the local Gaussian correlation, and standard copula theory. We are able to describe characteristics of the dependence structure in different copula models in terms of the local Gaussian correlation. Further, we construct a goodness-of-fit test for bivariate copula models. An essential ingredient of this test is the use of a canonical local Gaussian correlation and Gaussian pseudo-observations which make the test independent of the margins, so that it is a genuine test of the copula structure. A Monte Carlo study reveals that the test performs very well compared to a commonly used alternative test. We also propose two types of diagnostic plots which can be used to investigate the cause of a rejected null. Finally, our methods are applied to a “classical” insurance data set.

9.
Three-dimensional dynamic scatterplots can reveal certain features of data that cannot be apprehended in marginal two-dimensional displays. Using graduate students as subjects, we sought to establish whether the detection of clusters and nonlinearity in 3-D plots varies by easily characterized properties of the data and the design of the display. We found that the probability of detection of clusters increased smoothly with cluster separation, and that, at a fixed level of separation, “diagonally” displaced clusters were easier to detect than “horizontally” displaced clusters. Cluster detection appeared to be affected to a smaller extent by the design of the display. Three further experiments addressed the detection of nonlinearity in 3-D dynamic scatterplots. Most subjects were able to respond in a reasonable manner to properties of the data, so that the probability of detection of nonlinearity increased with its level, particularly when the signal was strong. As in the experiment on cluster detection, subjects' performance was also affected, though to a lesser extent, by characteristics of the displays; for example, spinning the display horizontally in the regression plane was particularly effective. We discuss the implications of these results for the design of statistical software incorporating dynamic 3-D scatterplots.

10.
Decompositions of the plane into disjoint components separated by curves occur frequently. We describe a package of subroutines which provides facilities for defining, building, and modifying such decompositions and for efficiently solving various point and area location problems. Beyond the possibility that the specification of this package may be useful to others, we reach the broader conclusion that well-designed data structures and support routines allow the use of more conceptual or non-numerical portions of mathematics in the computational process, thereby greatly extending the potential scope of the use of computers in scientific problem solving. Ideas from conceptual mathematics, symbolic computation, and computer science can be utilized within the framework of scientific computing and have an important role to play in that area.

11.
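This paper studies statistical inference theory and methods for a class of semiparametric varying-coefficient partially linear models under several kinds of complex data. First, for complex data such as longitudinal data and measurement-error data, we study empirical likelihood inference for the semiparametric varying-coefficient partially linear model and propose a block empirical likelihood method and a bias-corrected empirical likelihood method, respectively. These methods effectively handle the difficulty that within-subject correlation in longitudinal data poses for constructing the empirical likelihood ratio function. Second, for complex data such as measurement-error data and missing data, we study variable selection for the model and propose a "bias-corrected" variable selection method and an imputation-based variable selection method, respectively. These methods select the important variables in the parametric and nonparametric components simultaneously, and variable selection is carried out together with estimation of the regression coefficients. With a suitably chosen penalty parameter, we prove that the variable selection method consistently identifies the true model and that the resulting regularized estimators have the oracle property.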

12.
Recent developments in data-driven science have led researchers to integrate data from several sources, diverse experimental procedures, or databases. This alone poses a major challenge in truthfully visualizing data, especially when the number of data points varies between classes. To aid the representation of datasets with differing sample sizes, we have developed a new type of plot that overcomes limitations of current standard visualization charts. SinaPlot is inspired by the strip chart and the violin plot and operates by letting the normalized density of points restrict the jitter along the x-axis. The plot displays the same contour as a violin plot but resembles a simple strip chart for a small number of data points. By normalizing jitter over all classes, the plot provides a fair representation for comparison between classes with a varying number of samples. In this way, the plot conveys the number of data points, the density distribution, outliers, and data spread in a very simple, comprehensible, and condensed format. The package for producing the plots is available for R through the CRAN network using the base graphics package and as a geom for ggplot through ggforce. We also provide access to a web server accepting Excel sheets to produce the plots (http://servers.binf.ku.dk:8890/sinaplot/).
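
The density-restricted jitter described above might be sketched as follows. This is not the algorithm of the sinaplot or ggforce packages: the Gaussian kernel density estimate, the rule-of-thumb bandwidth, the uniform jitter, and the names `kde_at` and `sina_offsets` are all assumptions chosen to keep the example small.

```python
import math
import random
import statistics


def kde_at(points, x, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def sina_offsets(groups, max_width=0.4, seed=0):
    """Map {class: values} to {class: horizontal offsets}.

    Each point is jittered uniformly within a band whose half-width is
    proportional to the estimated density at that point, scaled by the
    largest density found in any class (normalization over all classes).
    """
    rng = random.Random(seed)
    densities = {}
    for name, values in groups.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        # Rule-of-thumb bandwidth (an assumption); fall back to 1.0 if degenerate.
        bw = 1.06 * sd * len(values) ** (-0.2) if sd > 0 else 1.0
        densities[name] = [kde_at(values, v, bw) for v in values]
    global_max = max(d for ds in densities.values() for d in ds)
    return {
        name: [rng.uniform(-1.0, 1.0) * max_width * d / global_max for d in ds]
        for name, ds in densities.items()
    }
```

Plotting each value at `class_index + offset` on the x-axis gives wide scatter where points are dense and near-zero scatter in the tails, so isolated points stay close to the class axis.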

13.
We present CARTscans, a graphical tool that displays predicted values across a four-dimensional subspace. We show how these plots are useful for understanding the structure and relationships between variables in a wide variety of models, including (but not limited to) regression trees, ensembles of trees, and linear regressions with varying degrees of interactions. In addition, the common visualization framework allows diverse complex models to be visually compared in a way that illuminates the similarities and differences in the underlying methods, facilitates the choice of a particular model structure, and provides a useful check for implausible predictions of future observations in regions with little or no data.

14.
In this paper, we use directed acyclic graphs (DAGs) with temporal structure to describe models of nonignorable nonresponse mechanisms for binary outcomes in longitudinal studies, and we discuss identification of these models under the assumption that the sequence of variables has first-order Markov dependence, that is, the future variables are independent of the past variables conditional on the present variables. We give a stepwise approach for checking identifiability of DAG models. For an unidentifiable model, we propose adding completely observed variables such that the model becomes identifiable.
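
In symbols, the first-order Markov assumption stated above can be written as follows, where $V_t$ denotes the set of variables at time $t$ and $T$ the number of time points (this notation is assumed here for illustration and is not taken from the paper):

```latex
(V_{t+1}, \dots, V_T) \;\perp\!\!\!\perp\; (V_1, \dots, V_{t-1}) \;\mid\; V_t ,
\qquad t = 2, \dots, T-1 .
```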

15.
Interactive graphics provide a very important tool that facilitates exploratory data and model analysis, a crucial step in real-world applied statistics. Only a very limited set of software provides truly interactive graphics for data analysis, partly because such graphics are not easy to implement. Very often specialized software is created to offer graphics for a particular problem, but many fundamental plots are omitted because they are not considered new research. In this paper we discuss a general framework for building interactive graphics software on a sound foundation that offers a consistent user interface, fast prototyping of new plots, and extensibility to support interactive models. In addition, we discuss one implementation of the general framework: iPlots eXtreme, next-generation interactive graphics for the analysis of large data in R. It provides the most fundamental plot types and allows new interactive plots to be created. The implementation raises interactive graphics performance to an entirely new level. We briefly discuss several methods that allowed us to achieve this goal and illustrate the use of advanced programmability features in conjunction with R.

16.
The longitudinal study has become one of the most commonly adopted designs in medical research. The generalized estimating equations (GEE) method and/or mixed effects models are very often employed for causal inference. The related model diagnostic procedures are not yet fully formalized, and perhaps never will be. The main sources of difficulty are the wide variety of within-subject dependence structures and the varying number of repeated measurements. No single testing procedure, e.g., the runs test, can resolve all model diagnostic problems in longitudinal data analysis. Multiple quantitative indexes for model diagnostics are needed to account for this variety. We propose eight testing procedures for randomness, accompanied by conventional and/or non-conventional plots, to improve model diagnostics in longitudinal data analysis. The proposed approach is illustrated with four clinical studies in Taiwan.
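
For reference, the runs test named above as an example of a single randomness check can be sketched as below. This is the standard Wald-Wolfowitz runs test with a normal approximation, applied to the signs of residuals; it is not one of the paper's eight procedures, whose details the abstract does not give, and the function name `runs_test` is hypothetical.

```python
import math


def runs_test(residuals):
    """Wald-Wolfowitz runs test on residual signs (normal approximation).

    Returns (number_of_runs, z_statistic, two_sided_p_value).
    Zero residuals are dropped before counting.
    """
    signs = [r > 0 for r in residuals if r != 0]
    n1 = sum(signs)            # number of positive residuals
    n2 = len(signs) - n1       # number of negative residuals
    if n1 == 0 or n2 == 0:
        raise ValueError("need both positive and negative residuals")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0:
        raise ValueError("too few observations for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p_value
```

Too few runs suggests positively correlated residuals (for example, an unmodeled trend within a subject), while too many runs suggests alternation; neither pattern says which part of the model is at fault, which is consistent with the abstract's point that one test cannot cover all diagnostics.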

17.
This article first illustrates the use of mosaic displays for the analysis of multiway contingency tables. We then introduce several extensions of mosaic displays designed to integrate graphical methods for categorical data with those used for quantitative data. The scatterplot matrix shows all pairwise (bivariate marginal) views of a set of variables in a coherent display. One analog for categorical data is a matrix of mosaic displays showing some aspect of the bivariate relation between all pairs of variables. The simplest case shows the bivariate marginal relation for each pair of variables. Another case shows the conditional relation between each pair, with all other variables partialled out. For quantitative data this represents (a) a visualization of the conditional independence relations studied by graphical models, and (b) a generalization of partial residual plots. The conditioning plot, or coplot, shows a collection of partial views of several quantitative variables, conditioned by the values of one or more other variables. A direct analog of the coplot for categorical data is an array of mosaic plots of the dependence among two or more variables, stratified by the values of one or more given variables. Each such panel then shows the partial associations among the foreground variables; the collection of such plots shows how these associations change as the given variables vary.

18.
We present a unified semiparametric Bayesian approach based on Markov random field priors for analyzing the dependence of multicategorical response variables on time, space and further covariates. The general model extends dynamic, or state space, models for categorical time series and longitudinal data by including spatial effects as well as nonlinear effects of metrical covariates in flexible semiparametric form. Trend and seasonal components, different types of covariates and spatial effects are all treated within the same general framework by assigning appropriate priors with different forms and degrees of smoothness. Inference is fully Bayesian and uses MCMC techniques for posterior analysis. The approach in this paper is based on latent semiparametric utility models and is particularly useful for probit models. The methods are illustrated by applications to unemployment data and a forest damage survey.

19.
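The analysis of longitudinal data arises in many research fields. Such analyses often involve time-varying covariates, which traditional analysis of variance handles poorly. This paper examines the methodology of applying linear mixed-effects models to data of this kind and provides a SAS program for fitting the model. A worked example gives a detailed account of the method and procedure for analyzing longitudinal data with linear mixed-effects models.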

20.
This article presents individual conditional expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm. Classical partial dependence plots (PDPs) help visualize the average partial relationship between the predicted response and one or more features. In the presence of substantial interaction effects, the partial response relationship can be heterogeneous. Thus, an average curve, such as the PDP, can obfuscate the complexity of the modeled relationship. Accordingly, ICE plots refine the PDP by graphing the functional relationship between the predicted response and the feature for individual observations. Specifically, ICE plots highlight the variation in the fitted values across the range of a covariate, suggesting where and to what extent heterogeneities might exist. In addition to providing a plotting suite for exploratory analysis, we include a visual test for additive structure in the data-generating model. Through simulated examples and real datasets, we demonstrate how ICE plots can shed light on estimated models in ways PDPs cannot. Procedures outlined are available in the R package ICEbox.
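
The relationship between ICE curves and the PDP described above can be illustrated with a small sketch. It assumes a fitted model exposed as a plain function `predict(row)` that takes a dict of feature values; that interface and the names `ice_curves` and `partial_dependence` are hypothetical and are not the API of the ICEbox package.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions as `feature` sweeps over `grid`
    while the observation's other features stay at their observed values."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)      # copy so the original row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """The PDP is the pointwise average of the ICE curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


# Toy model with a pure interaction: the effect of x1 flips sign with x2.
def toy_predict(row):
    return row["x1"] * row["x2"]


rows = [{"x1": 0.0, "x2": -1.0}, {"x1": 0.0, "x2": 1.0}]
grid = [0.0, 1.0, 2.0]
curves = ice_curves(toy_predict, rows, "x1", grid)
pdp = partial_dependence(curves)
```

In this toy case one ICE curve falls and the other rises, while their average, the PDP, is flat at zero. That is the kind of heterogeneity an averaged curve hides and individual curves expose.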
