
1.

A Simple Method for Computing the Observed Information Matrix When Using the EM Algorithm with Categorical Data

Stuart G. Baker 《Journal of computational and graphical statistics》2013,22(1):63-76

Abstract

A simple matrix formula is given for the observed information matrix when the EM algorithm is applied to categorical data with missing values. The formula requires only the design matrices, a matrix linking the complete and incomplete data, and a few simple derivatives. It can be easily programmed using a computer language with operators for matrix multiplication, element-by-element multiplication and division, matrix concatenation, and creation of diagonal and block diagonal arrays. The formula is applicable whenever the incomplete data can be expressed as a linear function of the complete data, such as when the observed counts represent the sum of latent classes, a supplemental margin, or the number censored. In addition, the formula applies to a wide variety of models for categorical data, including those with linear, logistic, and log-linear components. Examples include a linear model for genetics, a log-linear model for two variables and nonignorable nonresponse, the product of a log-linear model for two variables and a logit model for nonignorable nonresponse, a latent class model for the results of two diagnostic tests, and a product of linear models under double sampling.

2.

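Differences Between Micro Data and Macro Aggregate Data in Statistical Analysis: The Case of the C-D Production Function

李望月 《数学的实践与认识》2014,(9)

Using data from the 2008 economic census, this paper empirically analyzes how micro data and macro aggregate data differ in statistical analysis, from the perspectives of descriptive statistics and regression. In the descriptive analysis, macro aggregate data are found to be closer to normally distributed than micro data, although this no longer holds after a log transformation. In the regression analysis, production functions estimated from micro data and from macro aggregate data differ, before heteroscedasticity and multicollinearity are removed, in returns to scale, in the contribution rates of the production factors, and in the explanatory power of the factors for output. After heteroscedasticity and multicollinearity are removed, a large difference remains in the explanatory power of the factors for output.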
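As a rough illustration of the kind of regression the abstract refers to, the sketch below fits a Cobb-Douglas production function by ordinary least squares on the log-linear form ln Y = ln A + α ln K + β ln L. The data are synthetic and noise-free, and the function names and the use of plain normal equations are assumptions made for this example, not the paper's procedure.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_cobb_douglas(output, capital, labor):
    """OLS fit of ln Y = ln A + alpha ln K + beta ln L; returns (A, alpha, beta)."""
    rows = [[1.0, math.log(k), math.log(l)] for k, l in zip(capital, labor)]
    y = [math.log(v) for v in output]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_a, alpha, beta = solve_linear_system(xtx, xty)
    return math.exp(ln_a), alpha, beta


# Synthetic, noise-free "firm-level" data (hypothetical values).
random.seed(0)
capital = [random.uniform(1.0, 100.0) for _ in range(200)]
labor = [random.uniform(1.0, 100.0) for _ in range(200)]
output = [2.0 * k**0.3 * l**0.6 for k, l in zip(capital, labor)]

a_hat, alpha_hat, beta_hat = fit_cobb_douglas(output, capital, labor)
returns_to_scale = alpha_hat + beta_hat  # < 1 means decreasing returns to scale
```

Running the same fit once on firm-level records and once on sums over groups of firms is one simple way to see how aggregation can change the estimated elasticities.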

3.

Superheat: An R Package for Creating Beautiful and Extendable Heatmaps for Visualizing Complex Data

Rebecca L. Barter Bin Yu 《Journal of computational and graphical statistics》2013,22(4):910-922

The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.

4.

Orca: A Visualization Toolkit for High-Dimensional Data

Peter Sutherland Anthony Rossini Thomas Lumley Nicholas Lewin-Koh Julie Dickerson Zach Cox 《Journal of computational and graphical statistics》2013,22(3):509-529

Abstract

This article describes constructing interactive and dynamic linked data views using the Java programming language. The data views are designed for data that have a multivariate component. The approach to displaying data comes from earlier research on building statistical graphics based on data pipelines, in which different aspects of data processing and graphical rendering are organized conceptually into segments of a pipeline. The software design takes advantage of the object-oriented nature of the Java language to open up the data pipeline, allowing developers to have greater control over their visualization applications. Importantly, new types of data views coded to adhere to a few simple design requirements can easily be integrated with existing pipe sections. This allows access to sophisticated linking and dynamic interaction across all (new and existing) view types. Pipe segments can be accessed from data analysis packages such as Omegahat or R, providing a tight coupling of visual and numerical methods.

5.

The Covariance Adjusted Location Linear Discriminant Function for Classifying Data with Unequal Dispersion Matrices in Different Locations

Chi-Ying Leung 《Annals of the Institute of Statistical Mathematics》1998,50(3):417-431

Classification between two populations dealing with both continuous and binary variables is handled by splitting the problem into different locations. Given the location specified by the values of the binary variables, discrimination is performed using the continuous variables. The location probability model with homoscedastic across-location conditional dispersion matrices is adopted for this problem. In this paper, we consider the presence of continuous covariates with heterogeneous location conditional dispersion matrices. The continuous covariates have equal location-specific means in both populations. Conditional homoscedasticity fails when strong interaction between the continuous and binary variables is present. A plug-in covariance adjusted rule is constructed and its asymptotic distribution is derived. An asymptotic expansion for the overall error rate is given. The result is extended to include binary covariates.

6.

Telecommunications Network Case Study: Selecting a Data Network Architecture

Eric Rosenberg 《Journal of Heuristics》2000,6(1):9-20

This paper documents a model that was pivotal in deciding which of two architectures should be selected for a frame relay data communications network. The choices are either to continue using the current architecture, or to make a large incremental investment in new equipment which reduces the number of high speed inter-office trunks required to interconnect the switches. The analysis requires optimizing the mix of two types of customer port cards to determine the maximum customer port capacity of a switch. Simple approximations are used to estimate the number of inter-office trunks and trunk cards required. Based in large part on the costs computed by this model, an executive level decision was made to move to the new architecture.

7.

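Semi-Empirical Likelihood Confidence Intervals for the Difference Between Two Populations Under Mixed Missingness Mechanisms

白云霞 王历容 黎玲 秦永松 《数学研究》2009,42(1)

Suppose two populations x and y both have missing data, with distribution functions F(·) and G_θ(·) respectively, where F(·) is unknown and the probability density function g_θ(·) of G_θ(·) has a known form that depends only on some unknown parameters. The missing values are imputed by fractional imputation. Under certain conditions, the semi-empirical likelihood ratio statistic for the difference index of the two populations with missing data is shown to have asymptotic distribution χ_1^2, from which empirical likelihood confidence intervals for the difference index can be constructed.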

8.

On Some Techniques for Streaming Data: A Case Study of Internet Packet Headers

《Journal of computational and graphical statistics》2013,22(4):893-914

We consider the implications of streaming data for data analysis and data mining. Streaming data are becoming widely available from a variety of sources. In our case we consider the implications arising from Internet traffic data. By implication, streaming data are unlikely to be time homogeneous so that standard statistical and data mining procedures do not necessarily apply. Because it is essentially impossible to store streaming data, we consider recursive algorithms, algorithms which are adaptive and discount the past, and also algorithms that create finite pseudo-samples. We also suggest some evolutionary graphics procedures that are suitable for streaming data. We begin with a discussion of Internet traffic in order to give the reader some sense of the time and data scale and visual resolution needed for such problems.
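To make the idea of a recursive algorithm that discounts the past concrete, here is a minimal sketch of an exponentially discounted running mean and variance. It stores only three numbers, so no observations need to be kept. The class name, the `discount` parameter, and the choice of a weighted Welford-style update are assumptions for illustration and are not taken from the paper.

```python
class DiscountedMoments:
    """Running mean and variance in which older observations are down-weighted.

    After each update, every earlier observation's weight is multiplied by
    `discount`; the newest observation gets weight 1. With discount == 1.0
    this reduces to the ordinary mean and population variance.
    """

    def __init__(self, discount):
        if not 0.0 < discount <= 1.0:
            raise ValueError("discount must be in (0, 1]")
        self.discount = discount
        self.weight = 0.0      # total discounted weight seen so far
        self.mean = 0.0        # discounted mean
        self.sum_sq_dev = 0.0  # discounted sum of squared deviations

    def update(self, x):
        d = self.discount
        new_weight = d * self.weight + 1.0
        delta = x - self.mean
        self.mean += delta / new_weight
        # Weighted Welford update; the old sum is discounted before adding.
        self.sum_sq_dev = d * self.sum_sq_dev + delta * (x - self.mean)
        self.weight = new_weight

    @property
    def variance(self):
        return self.sum_sq_dev / self.weight if self.weight > 0 else 0.0


# Usage: a level shift in the stream is tracked quickly when discount < 1.
tracker = DiscountedMoments(discount=0.5)
for value in [0.0, 0.0, 0.0, 10.0]:
    tracker.update(value)
```

A smaller `discount` adapts faster to non-homogeneous traffic at the cost of noisier estimates; that trade-off has to be tuned for the time scale of interest.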

9.

Technological leadership and variety: A Data Envelopment Analysis for the French machinery industry

Jean Bernard Uwe Cantner Georg Westermann 《Annals of Operations Research》1996,68(3):361-377

We investigate structure and structural change within the French machinery industry from 1984–1991 in order to detect the apparent technology leaders and to get an account of the technological variety within the sector. The theoretical background of the paper is found in modern approaches to the economics of innovation and technology, where the very nature of technological knowledge and the local character of technological change are seen as a fundamental reason for the use of different technologies and for the different performances of firms. We apply a procedure that allows us to take into account such different performances and variety, Data Envelopment Analysis (DEA). As one major result, we find several best-practice technologies as well as a measure of technical inefficiency, allowing us to classify firms with respect to their relative technical performance. Moreover, technology leaders can be assigned to specific technology fields. The change within and between these fields over time is investigated.

10.

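Estimation and Adjustment of Reliability Parameters in the Case of Zero-Failure Data  Total citations: 10, self-citations: 0, citations by others: 10

韩明 《应用数学》2006,19(2):325-330

For zero-failure sampling, this paper proposes a method for estimating and adjusting product reliability parameters: the weighted hierarchical Bayes estimation method. Building on the hierarchical Bayes estimate of the failure rate with zero-failure data and the hierarchical Bayes estimate of the failure rate after failure information is introduced, the reliability parameters are estimated and adjusted, yielding weighted hierarchical Bayes estimates of the failure rate and the reliability. Finally, calculations are carried out for a practical engine problem, and the results show that the proposed method is feasible and convenient to apply.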

11.

A two-phase approach for setting targets in DEA with negative data

Reza Kazemi Matin Roza Azizi 《Applied Mathematical Modelling》2011

Conventional DEA models have been introduced to deal with non-negative data. In practice, however, some outputs and/or inputs can take negative values. The DEA literature offers several approaches for evaluating the performance of units that operate with negative data. In this paper, we first briefly review these works and then present a new additive-based approach in this framework. The proposed model is designed to provide, for each observed unit, a target with non-negative values for its negative components, which other methods fail to do. An empirical application in banking is then used to show the applicability of the proposed method and to compare it with the other approaches in the literature.

12.

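Seasonal Cointegration Analysis of the Consumption Function: An Empirical Study of Beijing Data

韩立岩 《数理统计与管理》1999,18(2):22-26,39

For quarterly data from 1987 to 1996, this paper carries out seasonal cointegration tests at three frequencies on both the raw data and the log-transformed data, analyzes the characteristics of the risk level, and concludes that seasonal adjustment may destroy cointegration.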

13.

A Strong Representation of the Product-Limit Estimator for Left Truncated and Right Censored Data  Total citations: 1, self-citations: 0, citations by others: 1

Yong Zhou Paul S. F. Yip 《Journal of multivariate analysis》1999,69(2):261

In this paper we consider the TJW product-limit estimator F_n(x) of an unknown distribution function F when the data are subject to random left truncation and right censorship. An almost sure representation of the PL-estimator F_n(x) is derived with an improved error bound under some weaker assumptions. We obtain the strong approximation of F_n(x) − F(x) by Gaussian processes, and the functional law of the iterated logarithm is proved for the maximal deviation of the product-limit estimator from F. A sharp rate of convergence theorem concerning the smoothed TJW product-limit estimator is obtained. Asymptotic properties of kernel estimators of the density function based on the TJW product-limit estimator are given.

14.

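Estimation of the Failure Rate in the Case of Zero-Failure Data and Its Application  Total citations: 4, self-citations: 1, citations by others: 4

韩明 《数学物理学报(A辑)》2000,20(3):364-369

For zero-failure data from the exponential distribution, this paper gives the Bayes estimate and the hierarchical Bayes estimate of the failure rate λ when the kernel of the prior density of λ is λ^(a−1). For a hydraulic motor whose lifetime follows the exponential distribution, an estimate of its reliability in the zero-failure case is also given.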
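For orientation, the following worked derivation shows how a Bayes estimate of a failure rate arises from zero-failure exponential data in the simplest conjugate setting. It is a sketch under stated assumptions, not the paper's result: the failure rate is written λ, the prior is taken to be a Gamma density with shape a and a hypothetical rate hyperparameter b, the data are m test rounds with n_i units observed up to time t_i without any failure, and squared-error loss is assumed so that the Bayes estimate is the posterior mean.

```latex
\begin{align*}
\pi(\lambda \mid a, b) &\propto \lambda^{a-1} e^{-b\lambda}, \qquad \lambda > 0,\\
L(\lambda) &= \prod_{i=1}^{m} e^{-n_i t_i \lambda} = e^{-T\lambda},
  \qquad T = \sum_{i=1}^{m} n_i t_i,\\
\pi(\lambda \mid \text{data}) &\propto \lambda^{a-1} e^{-(b+T)\lambda}
  \;\Longrightarrow\; \lambda \mid \text{data} \sim \mathrm{Gamma}(a,\; b+T),\\
\hat{\lambda}_{B} &= E[\lambda \mid \text{data}] = \frac{a}{b+T},\\
\hat{R}_{B}(t) &= E\!\left[e^{-\lambda t} \mid \text{data}\right]
  = \left(\frac{b+T}{b+T+t}\right)^{a}.
\end{align*}
```

A hierarchical Bayes estimate would go one step further and place a second-stage prior on the hyperparameters instead of fixing them; that step is not shown here.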

15.

A Dual-Objective Evolutionary Algorithm for Rules Extraction in Data Mining  Total citations: 1, self-citations: 0, citations by others: 1

K. C. Tan Q. Yu J. H. Ang 《Computational Optimization and Applications》2006,34(2):273-294

This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, which aims at satisfying the classification criteria of high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists each having different classification accuracy and number of rules over a specified range. The classification results of DOEA are analyzed and compared with existing rule-based and non-rule based classifiers based upon 8 test problems obtained from UCI Machine Learning Repository. It is shown that the DOEA produces comprehensible rules with competitive classification accuracy as compared to many methods in the literature. Results obtained from box plots and t-tests further examine its invariance to random partition of datasets. An erratum to this article is available at .
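The Pareto dominance concept mentioned above can be sketched for the two objectives named in the abstract, classification accuracy (higher is better) and number of rules (fewer is better). The candidate values and function names below are hypothetical, and the filter shown is only the non-dominated selection step, not the evolutionary algorithm itself.

```python
def dominates(a, b):
    """Return True if rule list `a` Pareto-dominates rule list `b`.

    Each argument is a pair (accuracy, n_rules). `a` dominates `b` when it is
    no worse on both objectives and strictly better on at least one.
    """
    acc_a, rules_a = a
    acc_b, rules_b = b
    no_worse = acc_a >= acc_b and rules_a <= rules_b
    strictly_better = acc_a > acc_b or rules_a < rules_b
    return no_worse and strictly_better


def non_dominated(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical rule lists as (accuracy, number of rules).
candidates = [(0.90, 12), (0.88, 5), (0.85, 5), (0.92, 20), (0.80, 3), (0.90, 15)]
front = non_dominated(candidates)
```

The surviving set offers a range of trade-offs, from short, easy-to-read rule lists to longer, more accurate ones, which matches the abstract's aim of returning several rule lists rather than a single best one.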

16.

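Lower Confidence Limits for Product Reliability with Type I Interval-Censored Data

魏中鹏 陈家鼎 《应用数学学报》2006,29(1):81-90

For Type I interval-censored data under the Weibull distribution, this paper presents the theory and a computational method for a good lower confidence limit on product reliability.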

17.

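Synthesized E-Bayes Estimation of the Failure Rate in the Case of Zero-Failure Data  Total citations: 1, self-citations: 0, citations by others: 1

徐天群 刘焕彬 陈跃鹏 《数理统计与管理》2011,30(4):644-654,613

The E-Bayes estimate of the failure rate with zero-failure data and the E-Bayes estimate after failure information is introduced are given, and on this basis synthesized E-Bayes estimates of the failure rate and the reliability are derived. Finally, calculations are carried out for a practical problem; the estimates are analyzed and compared with the non-Bayes confidence limits obtained when no prior information is assumed, and the results show that the method is reasonable.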

18.

Productivity Bargaining: A Case Study in the Steel Industry

Harding A. S. 《The Journal of the Operational Research Society》1972,23(4):604-604


19.

Qualitative factors in data envelopment analysis: A fuzzy number approach  Total citations: 1, self-citations: 0, citations by others: 1

Chiang Kao Pei-Huang Lin 《European Journal of Operational Research》2011,211(3):586-593

Qualitative factors are difficult to mathematically manipulate when calculating the efficiency in data envelopment analysis (DEA). The existing methods of representing the qualitative data by ordinal variables and assigning values to obtain efficiency measures only superficially reflect the precedence relationship of the ordinal data. This paper treats the qualitative data as fuzzy numbers, and uses the DEA multipliers associated with the decision making units (DMUs) being evaluated to construct the membership functions. Based on Zadeh’s extension principle, a pair of two-level mathematical programs is formulated to calculate the α-cuts of the fuzzy efficiencies. Fuzzy efficiencies contain more information for making better decisions. A performance evaluation of the chemistry departments of 52 UK universities is used for illustration. Since the membership functions are constructed from the opinion of the DMUs being evaluated, the results are more representative and persuasive.

20.

XGobi: Interactive Dynamic Data Visualization in the X Window System

Deborah F. Swayne Dianne Cook Andreas Buja 《Journal of computational and graphical statistics》2013,22(1):113-130

Abstract

XGobi is a data visualization system with state-of-the-art interactive and dynamic methods for the manipulation of views of data. It implements 2-D displays of projections of points and lines in high-dimensional spaces, as well as parallel coordinate displays and textual views thereof. Projection tools include dotplots of single variables, plots of pairs of variables, 3-D data rotations, various grand tours, and interactive projection pursuit. Views of the data can be reshaped. Points can be labeled and brushed with glyphs and colors. Lines can be edited and colored. Several XGobi processes can be run simultaneously and linked for labeling, brushing, and sharing of projections. Missing data are accommodated and their patterns can be examined; multiple imputations can be given to XGobi for rapid visual diagnostics. XGobi includes an extensive online help facility. XGobi can be integrated in other software systems, as has been done for the data analysis language S, the geographic information system (GIS) Arc View™, and the interactive multidimensional scaling program XGvis. XGobi is implemented in the X Window System™ for portability as well as the ability to run across a network.