Similar documents
1.
In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do not appropriately handle impossible combinations of variables, also known as structural zeros. Allowing nonzero probability for impossible combinations results in inaccurate estimates of joint and conditional probabilities, even for feasible combinations. We present an approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros. The basic idea is to treat the observed data as a truncated sample from an augmented dataset, thereby allowing us to exploit the conditional independence assumptions for computational expediency. As part of the approach, we develop an algorithm for collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation. We apply the approach to sample from a semiparametric version of the latent class model with structural zeros in the context of a key issue faced by national statistical agencies seeking to disseminate confidential data to the public: estimating the number of records in a sample that are unique in the population on a set of publicly available categorical variables. The latent class model offers remarkably accurate estimates of population uniqueness, even in the presence of a large number of structural zeros.
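The truncation idea can be illustrated with a toy latent class model. Everything below (the class weights, the conditional probabilities, the structural-zero cell) is hypothetical; it only shows how renormalizing over the feasible cells removes the mass wrongly assigned to impossible combinations:

```python
from itertools import product

# Hypothetical two-class latent class model over two binary variables.
# Within each class the variables are conditionally independent.
pi = [0.6, 0.4]                      # class weights
theta = [                            # theta[k][j][x] = P(X_j = x | class k)
    [[0.9, 0.1], [0.7, 0.3]],
    [[0.2, 0.8], [0.4, 0.6]],
]

def cell_prob(x):
    """Joint probability of cell x under the (untruncated) mixture."""
    return sum(pi[k] * theta[k][0][x[0]] * theta[k][1][x[1]] for k in range(2))

# Suppose the combination (1, 1) is impossible: a structural zero.
structural_zeros = {(1, 1)}
feasible = [x for x in product([0, 1], repeat=2) if x not in structural_zeros]

# Truncate: renormalize the mixture over the feasible cells only, so that
# no probability is wasted on the impossible combination.
mass_feasible = sum(cell_prob(x) for x in feasible)
truncated = {x: cell_prob(x) / mass_feasible for x in feasible}
```

The untruncated mixture assigns positive mass to the impossible cell, which biases every feasible-cell probability downward; the renormalization is the marginal correction that the paper's augmented-data sampler performs within MCMC.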

2.
Estimation of nonbinary random response
The paper presents a new approach to reducing the dimension of the factors that affect a non-binary response variable Y, which is relevant to the analysis of many stochastic models, for instance in biological and medical studies. The quality of estimating Y by a function of those factors is described by a specified error functional, which involves a penalty function to reflect the importance of the forecast for different response values. The joint distribution of the factors and the response variable is unknown, so it is natural to base statistical inference on estimates of the error functional constructed from a prediction algorithm and a cross-validation procedure. One of our main results gives a criterion for the strong consistency of such estimates as the number of observations tends to infinity; it allows the significant factors to be identified. We also introduce regularized versions of the estimates and establish a central limit theorem (CLT) for them. The statistical variant of our CLT permits the construction of approximate confidence intervals for the unknown error functional.
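A minimal sketch of the kind of cross-validated, penalty-weighted error estimate described above. The data-generating process, the penalty, and the majority-vote predictor are all hypothetical stand-ins for the paper's prediction algorithm:

```python
import random

random.seed(1)

# Hypothetical data: a binary factor X and a three-valued response Y
# whose distribution depends on X.
def draw_pair():
    x = random.randint(0, 1)
    y = random.choices([0, 1, 2], weights=[5, 3, 2] if x == 0 else [1, 2, 7])[0]
    return x, y

data = [draw_pair() for _ in range(200)]

# Penalty reflecting that mis-forecasting response value 2 is costlier.
def penalty(y_true, y_pred):
    if y_true == y_pred:
        return 0.0
    return 2.0 if y_true == 2 else 1.0

def predict(train, x):
    """Forecast Y by the most frequent response among training rows with factor x."""
    counts = {}
    for xi, yi in train:
        if xi == x:
            counts[yi] = counts.get(yi, 0) + 1
    return max(counts, key=counts.get) if counts else 0

# K-fold cross-validation estimate of the error functional E[penalty(Y, f(X))].
K = 5
fold = len(data) // K
errors = []
for k in range(K):
    test = data[k * fold:(k + 1) * fold]
    train = data[:k * fold] + data[(k + 1) * fold:]
    errors.extend(penalty(y, predict(train, x)) for x, y in test)
cv_error = sum(errors) / len(errors)
```

Comparing `cv_error` across candidate factor subsets is the mechanism by which the consistency result lets one identify the significant factors.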

3.
A hierarchical model is developed for the joint mortality analysis of pension scheme datasets. The proposed model allows for a rigorous statistical treatment of missing data. While our approach works for any missing data pattern, we are particularly interested in a scenario where some covariates are observed for members of one pension scheme but not the other. Therefore, our approach allows for the joint modelling of datasets which contain different information about individual lives. The proposed model generalizes the specification of parametric models when accounting for covariates. We consider parameter uncertainty using Bayesian techniques. Model parametrization is analysed in order to obtain an efficient MCMC sampler, and model selection is addressed. The inferential framework described here accommodates any missing-data pattern and proves useful for analysing statistical relationships among covariates. Finally, we assess the financial impact of using the covariates, and of making optimal use of the whole available sample when combining data from different mortality experiences.

4.
Inverse kinematics of the human body is a fundamental problem in human motion synthesis and in the capture and understanding of human motion. Because the human body is a complex articulated chain system, the inverse kinematics equations often have multiple solutions or none at all. Traditional methods solve the inverse kinematics problem analytically or by numerical iteration; they obtain good solutions when enough constraints are given, but cannot generate natural human poses from only a few constraints. In recent years, the idea of learning statistical model parameters from large datasets has been widely adopted. A classic machine-learning approach to inverse kinematics, the Gaussian mixture model-inverse kinematics (GMM-IK) solver, models the distribution of human pose data with a Gaussian mixture and estimates its parameters by expectation maximization. Building on advances in deep learning, this paper proposes a method that fuses an autoencoder neural network with numerical iteration and still produces natural human poses under few constraints. Compared with GMM-IK, the proposed method learns the pose distribution automatically with a neural network, dispensing with model assumptions and hand-crafted features; quantitative experiments show that its joint-position and joint-angle reconstruction errors are on average 25% and 39% lower, respectively, than those of GMM-IK. In terms of applications, the method can process optical motion capture data and can be used for human pose estimation from images and video.
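The numerical-iteration half of such a hybrid can be sketched with a classical Jacobian-transpose solver on a planar two-link arm. This toy example is not the paper's autoencoder model; it only shows the generic iterative IK step that such methods refine:

```python
import math

# Planar two-link arm with unit link lengths (hypothetical example).
def forward(t1, t2):
    """Forward kinematics: end-effector position for joint angles (t1, t2)."""
    x = math.cos(t1) + math.cos(t1 + t2)
    y = math.sin(t1) + math.sin(t1 + t2)
    return x, y

def solve_ik(target, t1=0.3, t2=0.3, alpha=0.05, iters=2000):
    """Jacobian-transpose iteration: theta += alpha * J^T * position_error."""
    for _ in range(iters):
        x, y = forward(t1, t2)
        ex, ey = target[0] - x, target[1] - y
        # Jacobian of the forward map with respect to (t1, t2).
        j11, j12 = -math.sin(t1) - math.sin(t1 + t2), -math.sin(t1 + t2)
        j21, j22 = math.cos(t1) + math.cos(t1 + t2), math.cos(t1 + t2)
        t1 += alpha * (j11 * ex + j21 * ey)
        t2 += alpha * (j12 * ex + j22 * ey)
    return t1, t2

target = (1.2, 0.5)
t1, t2 = solve_ik(target)
```

With few constraints this purely numerical step can land on any of the multiple solutions, which is exactly where a learned pose prior (GMM-IK, or the paper's autoencoder) is used to select a natural pose.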

5.
In recent years, a number of semiparametric statistical methods have been developed under the semiparametric density ratio model; such methods are often more robust than parametric methods and more efficient than nonparametric ones. In this paper, we propose a semiparametric hypothesis test for the difference between two population means, built mainly on a semiparametric estimator of that difference. We report theoretical results and simulation studies showing that when the data satisfy the normality assumption the method is slightly better than the commonly used parametric and nonparametric methods, and when the data do not satisfy normality its advantage is substantial. We also apply the proposed method to the analysis of two real datasets.

6.
The motion of an absolutely rigid body attached to a fixed base by a two-degrees-of-freedom joint in a uniform gravitational field parallel to the fixed axis of the joint is studied qualitatively. Various kinds of motion are described and analysed, depending on the total mechanical energy and the projection of the angular momentum of the body onto the fixed axis of the joint as well as on the inertial parameters of the system.

This paper is a continuation of [1].


7.
Estimation of the Extreme Flow Distributions by Stochastic Models
The t-year event is a commonly used characteristic to describe the extreme flood peak in hydrological designs. The annual maximum series (AMS) and the partial duration series (PDS) are two basic approaches in flood analyses. In this paper, we first derive the distribution of the maximum extreme, or the joint distribution of two or more maximum extremes, from historical records based on a stochastic model, and then estimate statistical characteristics, including the t-year event, from that distribution. In addition to the two classical approaches (AMS and PDS), two further approaches for estimating the unknown parameters are proposed in this paper. The first uses two or more annual maxima (MAMS) as the sample to estimate the distribution of the maximum extremes. The second uses a multivariate shock model to estimate the distribution of the maximum extremes for a multi-modal streamflow. The distribution of the extreme streamflow and the associated characteristics in the Bird Creek in Avant, Oklahoma, in the St. Johns River in Deland, Florida, and in the West Walker River in Coleville, California are estimated using the stochastic model. To investigate the performance of the estimation further, the stochastic models based on AMS, MAMS and PDS are also applied to simulated data. The results show that the stochastic model and the related methods are reliable.
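For orientation, the classical AMS route can be sketched as a Gumbel fit by the method of moments. The flow values and the distribution choice below are illustrative, not the paper's stochastic model:

```python
import math

# Hypothetical annual maximum flows (m^3/s) for a single gauging station.
ams = [310, 450, 280, 520, 610, 390, 700, 340, 480, 560, 420, 650]

n = len(ams)
xbar = sum(ams) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in ams) / (n - 1))

# Gumbel distribution fitted by the method of moments.
beta = s * math.sqrt(6) / math.pi     # scale parameter
mu = xbar - 0.5772 * beta             # location (0.5772: Euler-Mascheroni constant)

def t_year_event(T):
    """Flow exceeded on average once every T years: the (1 - 1/T) quantile."""
    return mu - beta * math.log(-math.log(1 - 1 / T))

q10 = t_year_event(10)
q100 = t_year_event(100)
```

The MAMS and shock-model approaches in the paper replace this single-series marginal fit with joint distributions of several maxima, but the resulting t-year quantile is read off in the same way.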

8.
This paper studies testing for conditional heteroskedasticity in dynamic panel data models. For a fixed-effects dynamic panel data model in which both n and T are large, an artificial autoregression is built from the squared first differences of the residuals, and a test statistic for conditional heteroskedasticity of the error sequence is constructed from the least squares estimate of the coefficient of this artificial autoregression. We show that under certain assumptions the resulting test is asymptotically chi-square distributed and simple and convenient to compute. The small-sample properties of the test are examined through simulation experiments, which show that it performs very well.
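The flavour of the artificial autoregression can be sketched as follows. This is a generic n·R² statistic computed from the squared first differences of simulated residuals (one cross-sectional unit shown); the paper's statistic, which handles the panel structure and its asymptotics, differs in detail:

```python
import random

random.seed(42)

# Hypothetical residual series from a fitted dynamic panel model.
resid = [random.gauss(0, 1) for _ in range(300)]

# Squared first differences of the residuals.
z = [(resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid))]

# Artificial AR(1) regression z_t = a + b * z_{t-1} + error, by least squares.
y, x = z[1:], z[:-1]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar

# n * R^2 of the artificial regression serves as the test statistic;
# large values indicate conditional heteroskedasticity.
ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
stat = n * (1 - ss_res / ss_tot)
```

In the paper the statistic built this way is shown to be asymptotically chi-square under the null, so `stat` would be compared against the corresponding chi-square quantile.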

9.
In certain cancer chemoprevention experiments both the number of observed tumors per animal and their times to detection are used in subsequent statistical analyses. The mathematical models used to represent these experiments usually include the Poisson distribution to characterize the tumor multiplicity data. Very often, however, there is excess variance due to interanimal heterogeneity of tumor response, so the number of induced tumors is better characterized by the negative binomial distribution. In this paper we modify an existing statistical technique, which explicitly acknowledges the confounding inherent in these systems, in order to provide a more efficient procedure for utilizing the information in a sample and to more accurately assess treatment effects.

10.
Data are often affected by uncertainty. Uncertainty is usually identified with randomness; nonetheless, other sources of uncertainty may occur. In particular, the empirical information may also be affected by imprecision. In these cases too it can be fruitful to analyze the underlying structure of the data. In this paper we address the problem of summarizing a sample of three-way imprecise data. In order to manage the different sources of uncertainty a twofold strategy is adopted. On the one hand, imprecise data are transformed into fuzzy sets by means of the so-called fuzzification process. The resulting fuzzy data are then analyzed by suitable generalizations of the Tucker3 and CANDECOMP/PARAFAC models, which are the two most popular three-way extensions of Principal Component Analysis. On the other hand, the statistical validity of the obtained underlying structure is evaluated by (nonparametric) bootstrapping. A simulation experiment is performed to assess whether the use of fuzzy data is helpful for summarizing three-way uncertain data. Finally, to show how our models work in practice, an application to real data is discussed.
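A minimal sketch of the fuzzification step, assuming imprecise observations recorded as intervals and converted to triangular fuzzy numbers centred at the interval midpoint (one common convention; the paper's fuzzification process may differ):

```python
# Turn an imprecise observation, recorded as an interval [lo, hi], into a
# triangular fuzzy number: membership rises linearly from lo to the midpoint
# and falls linearly back to zero at hi.
def triangular(lo, hi):
    c = (lo + hi) / 2
    def membership(u):
        if u <= lo or u >= hi:
            return 0.0
        if u <= c:
            return (u - lo) / (c - lo)
        return (hi - u) / (hi - c)
    return membership

# An observation known only to lie between 2.0 and 6.0.
mu = triangular(2.0, 6.0)
```

Each cell of the three-way array is fuzzified this way before the generalized Tucker3 / CANDECOMP-PARAFAC decompositions are applied to the fuzzy data.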

11.
A new sample business survey for agriculture, the REA survey, and a project of integration with the FADN network (RICA in Italy) have significantly changed the production of statistical information now available for the agricultural sector. On the basis of this information, new economic analyses are being developed on farms' performance, agricultural households' income and the Common Agricultural Policy (CAP). In this paper the authors use models to estimate the relationship between the levels of the variables of interest and their sampling errors, in order to make information on the accuracy of the estimates more accessible to final users (agricultural analysts, policy makers). The paper is the result of joint research by the three authors: Sections 1–3 and 6 are by Pizzoli, Sections 4 and 5 by Rondinelli, and Section 7 by Filiberti; the conclusions are joint work.

12.
The aim of this paper is the study of some random probability distributions, called hyper-Dirichlet processes. In the simplest situation considered in the paper these distributions charge the product of three sample spaces, with the property that the first and the last component are independent conditionally on the middle one. The laws of the marginals on the first two and on the last two components are specified to be Dirichlet processes with the same marginal parameter measure on the common second component. The joint law is then obtained as the hyper-Markov combination, introduced in [A.P. Dawid, S.L. Lauritzen, Hyper-Markov laws in the statistical analysis of decomposable graphical models, Ann. Statist. 21 (3) (1993) 1272-1317], of these two Dirichlet processes. The processes constructed in this way are in fact generalizations of the hyper-Dirichlet laws on contingency tables considered in the above paper. Our main result is the convergence to the hyper-Dirichlet process of the sequence of hyper-Dirichlet laws associated to finer and finer "discretizations" of the two parameter measures, which is proved by means of a suitable coupling construction.

13.
14.
In collecting clinical data, observations may be censored due to competing risks or patient withdrawal. Statistical inference for censored data is usually based on the assumption that the failure time and the censoring time are independent, but in practice they are often dependent, and dependent censoring makes the analysis of censored data more complicated. In this paper, we assume that the joint distribution of the failure time and the censoring time is a function of their marginal distributions; this function is called a copula. Under prespecified copulas, the maximum likelihood estimators for Cox proportional hazards models are worked out. The statistical analysis is carried out by simulation. When dependent censoring occurs, the proposed method performs better than the traditional method designed for the independent case, and simulation results show that it yields efficient estimates.
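A sketch of one commonly prespecified copula, the Clayton family, linking the marginal survival functions of the failure and censoring times. The exponential margins, rates, and dependence parameter below are illustrative assumptions, not values from the paper:

```python
import math

def clayton(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def joint_survival(t, c, lam_t=1.0, lam_c=0.5, theta=2.0):
    """P(T > t, C > c) with exponential margins coupled by a Clayton copula."""
    s_t = math.exp(-lam_t * t)   # marginal survival of the failure time T
    s_c = math.exp(-lam_c * c)   # marginal survival of the censoring time C
    return clayton(s_t, s_c, theta)
```

With theta > 0 the copula induces positive dependence (the joint survival exceeds the product of the margins), and as theta approaches 0 the independent-censoring model is recovered; the likelihood under a prespecified copula is built from exactly these joint quantities.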

15.
To improve the efficiency and effectiveness of supervisory sampling inspections of in-service elevators, a management indicator system covering elevator usage, basic elevator parameters, and manufacturing and maintenance information is constructed from a statistical analysis of the large sample of elevator safety inspection data from city G. Reflecting the nature of the inspection data, variables are first screened, risk grades are then constructed, and earlier methods are improved into a risk matrix approach; a logistic regression model is proposed to quantify the risk of the whole elevator, yielding a complete elevator risk assessment system. From a theoretical standpoint, comparing the two risk scoring models using the LIFT and K-S statistics shows that logistic regression stratifies risk more accurately. Practical application shows that the screening proportions provided for elevator safety sampling inspections by combining the logistic regression method with a weighting scheme based on average risk values are more reasonable and accurate than existing methods.
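A minimal sketch of a logistic risk score for a single elevator. The feature names and coefficients below are invented for illustration and are not the fitted values from the city G data:

```python
import math

# Hypothetical fitted coefficients of a logistic regression risk model.
coef = {"age_years": 0.08, "daily_runs": 0.002, "overdue_maintenance": 1.1}
intercept = -4.0

def risk(features):
    """Estimated probability that an inspection finds a serious defect."""
    score = intercept + sum(coef[k] * features[k] for k in coef)
    return 1.0 / (1.0 + math.exp(-score))

# An old, heavily used elevator with overdue maintenance vs. a new one.
old = risk({"age_years": 20, "daily_runs": 500, "overdue_maintenance": 1})
new = risk({"age_years": 2, "daily_runs": 500, "overdue_maintenance": 0})
```

Ranking the elevator fleet by such scores, and allocating inspection quotas in proportion to them, is the mechanism behind the screening proportions described in the abstract.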

16.
There are three classes of forest model used to simulate forest productivity across large areas and over long periods: growth and yield models, based on statistical relationships derived from measurements on trees; the so-called gap models, concerned with species succession and dynamics; and carbon balance or biomass models. The characteristics of each type are discussed and illustrated by reference to some of the more important models in current use. The emphasis in this paper is on the carbon balance models, particularly on a new model (3-PG), developed in a deliberate attempt to bridge the gap between growth and yield and carbon balance models, and on the companion model (3-PGS), derived from 3-PG to use satellite data as inputs to constrain the simulation calculations and improve estimates of growth over time. 3-PG/3-PGS run on monthly time steps, driven by weather data, and avoid the problems of over-parameterization and the requirement for a great deal of input data that limit the practical value of most carbon balance models. We present test results from 3-PG against experimental data and against forest plot (mensuration) data from large areas, as well as test results from 3-PGS against estimates of average forest growth over large areas and in plantations with different planting times, using AVHRR and Landsat MSS data to constrain the model outputs. The paper discusses the variability of natural forests and the difficulties this causes in validating models intended for use over large areas; the value of remote sensing as a means of overcoming this problem is considered.

17.

Most statistical methods are based on models, but most practical applications ignore the fact that the results depend on the model as well as on the data. This paper examines the size of this model dependence, and finds that there can be very considerable variation between the results of fitting different models to the same data, even if the models being considered are restricted to those which give an acceptable fit to the data. Under reasonable regularity conditions, we show that different empirically acceptable models can give rise to non-overlapping confidence intervals for the same parameter. Application papers need to recognize that the validity of conventional statistical results rests on the assumption that the underlying model is known to be correct, and that this is a much stronger requirement than merely confirming that the model gives a good fit to the data. The problem of model dependence is only partially resolved by using formal methods of model selection or model averaging.

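The point can be made concrete with a toy example: the same skewed sample, modelled once as normal and once as lognormal (using Cox's method for the lognormal mean), yields clearly different 95% confidence intervals for the same parameter. The data are invented for illustration:

```python
import math

# Skewed positive data; in a small sample both a normal and a lognormal
# model may fit "acceptably", yet their inferences for the mean differ.
data = [0.3, 0.5, 0.8, 1.1, 1.6, 2.4, 3.9, 6.2, 9.8, 15.0]
n = len(data)

# Normal-model 95% confidence interval for the mean.
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)
ci_normal = (xbar - 1.96 * math.sqrt(s2 / n), xbar + 1.96 * math.sqrt(s2 / n))

# Lognormal-model 95% CI for the mean (Cox's method on the log scale):
# the mean is exp(mu + v/2), with variance of its log estimated by
# v/n + v^2 / (2(n-1)).
logs = [math.log(x) for x in data]
mu = sum(logs) / n
v = sum((l - mu) ** 2 for l in logs) / (n - 1)
half = 1.96 * math.sqrt(v / n + v * v / (2 * (n - 1)))
ci_lognormal = (math.exp(mu + v / 2 - half), math.exp(mu + v / 2 + half))
```

Here the lognormal model produces a much wider, right-shifted interval than the normal model; with stronger skew or different parameters the two empirically acceptable models can, as the paper shows, give non-overlapping intervals.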

18.
Popularity of nontraditional approaches to the statistical classification problem has resulted from the potential of these techniques to outperform the standard parametric procedures when nonnormality is present. Proponents of these nontraditional models have thus recommended them when outliers are present in the data. However, research showing that the performance of these nontraditional models can vary widely depending on where the outlier data are located has not been fully illustrated. The research in this paper demonstrates how the mathematical programming approaches and the nearest neighbor discriminant models can be affected by the position of contaminated normal data, and that each of the models studied here may not be robust to all types of outliers. The results are also important because the study compares two recently proposed mathematical programming models, as well as two versions of the nearest neighbor model, with the standard classical parametric models. This combination of classification models does not appear to have been studied together under conditions of contaminated normal data in which numerous positions of the outliers are considered.

19.
To evaluate consumer loan applications, loan officers use many techniques such as judgmental systems, statistical models, or simply intuitive experience. In recent years, fuzzy systems and neural networks have attracted the growing interest of researchers and practitioners. This study compares the performance of artificial neuro-fuzzy inference systems (ANFIS) and multiple discriminant analysis models to screen potential defaulters on consumer loans. Using a modeling sample and a test sample, we find that the neuro-fuzzy system performs better than the multiple discriminant analysis approach to identify bad credit applications. Further, neuro-fuzzy systems have many advantages over traditional computational methods. Neuro-fuzzy system models are flexible, more tolerant of imprecise data, and can model non-linear functions of arbitrary complexity.

20.
Discriminatory processor sharing queues with multiple classes of customers (DPS queues) are an important but difficult research direction in queueing theory, with many practical applications in fields such as computer networks, manufacturing systems, and transportation networks. Recently, researchers have made key progress on DPS queues, deriving the generating function of the steady-state joint queue lengths, which yields the first two moments of the steady-state joint queue lengths. However, obtaining explicit expressions for the steady-state joint queue-length distribution from the generating function has remained a difficult and challenging problem for many years. Motivated by this, this paper applies the maximum entropy principle from information theory to provide a high-precision approximate expression whose first three moments match those of the exact expression. The paper also carries out efficient numerical computation by means of this approximate expression and, through numerical experiments, analyzes how the approximation depends on the original parameters of the queueing system. The approximate expression is therefore of theoretical significance for promoting practical applications of DPS queues. At the same time, the methodology and results given in this paper not only provide a new line in the study of DPS queues, but also offer a theoretical basis and technical support for applying information theory to the study of queueing systems, queueing networks and, more generally, stochastic models.
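The principle can be illustrated in its simplest form: among all distributions on the nonnegative integers with a given mean, maximum entropy selects the geometric distribution. The paper matches three moments of the joint queue-length distribution, which is the same idea with more moment constraints:

```python
# Maximum entropy on {0, 1, 2, ...} subject to a fixed mean m is the
# geometric distribution p(n) = (1 - r) r^n with r = m / (1 + m).
def maxent_geometric(mean, n_max=200):
    """Truncated list of maxent probabilities for the given mean."""
    r = mean / (1.0 + mean)
    return [(1 - r) * r ** n for n in range(n_max + 1)]

# A queue with mean length 3 (an illustrative value, not from the paper).
p = maxent_geometric(3.0)
approx_mean = sum(n * pn for n, pn in enumerate(p))
```

Adding second- and third-moment constraints turns the exponent into a cubic in n, and solving for the multipliers that reproduce the three exact moments gives the paper's high-precision approximation to the DPS joint queue-length distribution.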
