Found 20 similar articles (search time: 0 ms)
1.
Stuart G. Baker 《Journal of Computational and Graphical Statistics》2013,22(1):63-76
A simple matrix formula is given for the observed information matrix when the EM algorithm is applied to categorical data with missing values. The formula requires only the design matrices, a matrix linking the complete and incomplete data, and a few simple derivatives. It can be easily programmed using a computer language with operators for matrix multiplication, element-by-element multiplication and division, matrix concatenation, and creation of diagonal and block diagonal arrays. The formula is applicable whenever the incomplete data can be expressed as a linear function of the complete data, such as when the observed counts represent the sum of latent classes, a supplemental margin, or the number censored. In addition, the formula applies to a wide variety of models for categorical data, including those with linear, logistic, and log-linear components. Examples include a linear model for genetics, a log-linear model for two variables and nonignorable nonresponse, the product of a log-linear model for two variables and a logit model for nonignorable nonresponse, a latent class model for the results of two diagnostic tests, and a product of linear models under double sampling.
2.
Li Wangyue (李望月) 《Mathematics in Practice and Theory》(数学的实践与认识) 2014,(9)
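Based on data from the 2008 economic census, this paper empirically analyzes how micro data and macro aggregated data differ in statistical analysis, from the perspectives of descriptive statistics and regression analysis. The descriptive analysis finds that macro aggregated data are closer to a normal distribution than micro data, but that this no longer holds after log transformation. The regression analysis finds that, before heteroscedasticity and multicollinearity are removed, production functions estimated from micro data and from macro aggregated data differ in returns to scale, in the contribution rates of the production factors, and in the explanatory power of the factors for output. After heteroscedasticity and multicollinearity are removed, a large difference remains in the explanatory power of the factors for output.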
3.
The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.
4.
Peter Sutherland Anthony Rossini Thomas Lumley Nicholas Lewin-Koh Julie Dickerson Zach Cox 《Journal of Computational and Graphical Statistics》2013,22(3):509-529
This article describes constructing interactive and dynamic linked data views using the Java programming language. The data views are designed for data that have a multivariate component. The approach to displaying data comes from earlier research on building statistical graphics based on data pipelines, in which different aspects of data processing and graphical rendering are organized conceptually into segments of a pipeline. The software design takes advantage of the object-oriented nature of the Java language to open up the data pipeline, allowing developers to have greater control over their visualization applications. Importantly, new types of data views coded to adhere to a few simple design requirements can easily be integrated with existing pipe sections. This allows access to sophisticated linking and dynamic interaction across all (new and existing) view types. Pipe segments can be accessed from data analysis packages such as Omegahat or R, providing a tight coupling of visual and numerical methods.
5.
Chi-Ying Leung 《Annals of the Institute of Statistical Mathematics》1998,50(3):417-431
Classification between two populations dealing with both continuous and binary variables is handled by splitting the problem into different locations. Given the location specified by the values of the binary variables, discrimination is performed using the continuous variables. The location probability model with homoscedastic across location conditional dispersion matrices is adopted for this problem. In this paper, we consider the presence of continuous covariates with heterogeneous location conditional dispersion matrices. The continuous covariates have equal location specific means in both populations. Conditional homoscedasticity fails when strong interaction between the continuous and binary variables is present. A plug-in covariance adjusted rule is constructed and its asymptotic distribution is derived. An asymptotic expansion for the overall error rate is given. The result is extended to include binary covariates.
6.
Eric Rosenberg 《Journal of Heuristics》2000,6(1):9-20
This paper documents a model that was pivotal in deciding which of two architectures should be selected for a frame relay data communications network. The choices are either to continue using the current architecture, or to make a large incremental investment in new equipment which reduces the number of high speed inter-office trunks required to interconnect the switches. The analysis requires optimizing the mix of two types of customer port cards to determine the maximum customer port capacity of a switch. Simple approximations are used to estimate the number of inter-office trunks and trunk cards required. Based in large part on the costs computed by this model, an executive level decision was made to move to the new architecture.
7.
8.
《Journal of Computational and Graphical Statistics》2013,22(4):893-914
We consider the implications of streaming data for data analysis and data mining. Streaming data are becoming widely available from a variety of sources. In our case we consider the implications arising from Internet traffic data. By implication, streaming data are unlikely to be time homogeneous so that standard statistical and data mining procedures do not necessarily apply. Because it is essentially impossible to store streaming data, we consider recursive algorithms, algorithms that are adaptive and discount the past, and algorithms that create finite pseudo-samples. We also suggest some evolutionary graphics procedures that are suitable for streaming data. We begin with a discussion of Internet traffic in order to give the reader some sense of the time and data scale and visual resolution needed for such problems.
9.
We investigate structure and structural change within the French machinery industry from 1984–1991 in order to detect the apparent technology leaders and to get an account of the technological variety within the sector. The theoretical background of the paper is found in modern approaches to the economics of innovation and technology, where the very nature of technological knowledge and the local character of technological change are seen as a fundamental reason for the use of different technologies and for the different performances of firms. We apply a procedure that allows us to take into account such different performances and variety, Data Envelopment Analysis (DEA). As one major result, we find several best-practice technologies as well as a measure of technical inefficiency, allowing us to classify firms with respect to their relative technical performance. Moreover, technology leaders can be assigned to specific technology fields. The change within and between these fields over time is investigated.
10.
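Estimation and Adjustment of Reliability Parameters in the Case of Zero-Failure Data (total citations: 10, self-citations: 0, citations by others: 10)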
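This paper proposes a method for estimating and adjusting product reliability parameters under zero-failure sampling: the weighted hierarchical Bayes estimation method. Building on the hierarchical Bayes estimate of the failure rate for zero-failure data and on the hierarchical Bayes estimate of the failure rate after failure information is introduced, the reliability parameters are estimated and adjusted, which yields weighted hierarchical Bayes estimates of the failure rate and of the reliability. Finally, the method is applied to a practical engine problem, and the results show that it is feasible and convenient to use.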
11.
Conventional DEA models were introduced to deal with non-negative data. In the real world, however, outputs and/or inputs can sometimes take negative values. The DEA literature presents several approaches for evaluating the performance of units that operate with negative data. In this paper, we first give a brief review of these works and then present a new additive-based approach in this framework. The proposed model is designed to provide, for each observed unit, a target with non-negative values associated with the negative components, which other methods fail to provide. An empirical application in banking is then used to show the applicability of the proposed method and to compare it with the other approaches in the literature.
12.
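This paper performs seasonal cointegration tests at three frequencies on quarterly data for 1987-1996, using both the raw data and the log-transformed data, analyzes the characteristics of the risk level, and concludes that seasonal adjustment may destroy cointegration.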
13.
A Strong Representation of the Product-Limit Estimator for Left Truncated and Right Censored Data (total citations: 1, self-citations: 0, citations by others: 1)
In this paper we consider the TJW product-limit estimator F_n(x) of an unknown distribution function F when the data are subject to random left truncation and right censorship. An almost sure representation of the PL-estimator F_n(x) is derived with an improved error bound under some weaker assumptions. We obtain the strong approximation of F_n(x) − F(x) by Gaussian processes, and the functional law of the iterated logarithm is proved for the maximal deviation of the product-limit estimator from F. A sharp rate of convergence theorem concerning the smoothed TJW product-limit estimator is obtained. Asymptotic properties of kernel estimators of the density function based on the TJW product-limit estimator are given.
14.
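Estimation of the Failure Rate in the Case of Zero-Failure Data and Its Applications (total citations: 4, self-citations: 1, citations by others: 4)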
Han Ming (韩明) 《Acta Mathematica Scientia, Series A》(数学物理学报(A辑)) 2000,20(3):364-369
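For zero-failure data from the exponential distribution, this paper gives the Bayes estimate and the hierarchical Bayes estimate of the failure rate when the kernel of the prior density of the failure rate is a-1. For a hydraulic motor whose lifetime follows the exponential distribution, it also gives an estimate of the motor's reliability in the zero-failure case.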
15.
This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, which aims at satisfying the classification criteria of high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists, each having a different classification accuracy and number of rules over a specified range. The classification results of DOEA are analyzed and compared with existing rule-based and non-rule-based classifiers on 8 test problems obtained from the UCI Machine Learning Repository. It is shown that the DOEA produces comprehensible rules with competitive classification accuracy as compared to many methods in the literature. Results obtained from box plots and t-tests further examine its invariance to random partition of datasets.
An erratum to this article is available at .
16.
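For Type I interval-censored data under the Weibull distribution, this paper presents the theory and a computational method for a good lower confidence limit on product reliability.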
17.
18.
Journal of the Operational Research Society
19.
Qualitative factors are difficult to mathematically manipulate when calculating the efficiency in data envelopment analysis (DEA). The existing methods of representing the qualitative data by ordinal variables and assigning values to obtain efficiency measures only superficially reflect the precedence relationship of the ordinal data. This paper treats the qualitative data as fuzzy numbers, and uses the DEA multipliers associated with the decision making units (DMUs) being evaluated to construct the membership functions. Based on Zadeh’s extension principle, a pair of two-level mathematical programs is formulated to calculate the α-cuts of the fuzzy efficiencies. Fuzzy efficiencies contain more information for making better decisions. A performance evaluation of the chemistry departments of 52 UK universities is used for illustration. Since the membership functions are constructed from the opinion of the DMUs being evaluated, the results are more representative and persuasive.
20.
Deborah F. Swayne Dianne Cook Andreas Buja 《Journal of Computational and Graphical Statistics》2013,22(1):113-130
XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) ArcView™, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System™ for portability as well as the ability to run across a network.