Similar Documents
1.
Design-based and design-based model-assisted estimators of the total of a variable with many zero values have high variance. Model-based estimators of a finite-population total built on the censored regression (tobit) model have been proposed earlier. The aim of the current research is to apply a semiparametric model to a variable with many zero values, to estimate the population total with model-based and model-assisted estimators, and to compare them with other known estimators by simulation.
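
The tobit model mentioned above treats the many zeros as values left-censored at 0. A minimal Python sketch of its log-likelihood, assuming one already-computed linear predictor per unit and a normal latent error; the function names are illustrative, and this is the parametric tobit baseline, not the semiparametric model the abstract proposes:

```python
import math

def _log_norm_cdf(z):
    # log Phi(z), using the complementary error function: Phi(z) = erfc(-z/sqrt(2)) / 2
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a tobit model left-censored at zero.

    y     : observed responses (values <= 0 are treated as censored zeros)
    xb    : linear predictors x_i' beta, one per unit
    sigma : standard deviation of the latent normal error
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for yi, mu in zip(y, xb):
        if yi <= 0:
            # censored unit: P(latent y* <= 0) = Phi(-mu / sigma)
            ll += _log_norm_cdf(-mu / sigma)
        else:
            # uncensored unit: normal density of y given mean mu and sd sigma
            z = (yi - mu) / sigma
            ll += -0.5 * math.log(2 * math.pi) - 0.5 * z * z - math.log(sigma)
    return ll
```

Maximizing this over beta and sigma gives the usual tobit fit; model-based totals then add predicted values for the non-sampled units.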

2.
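Determining the sample size is a key issue in sampling design. The traditional approach, which sets the sample size from information on the population variance and the survey cost, can lead to two outcomes: a sample that is too small to guarantee the desired estimation precision, or one that is too large and wastes survey funds. The real-time data computation and management functions of computer-assisted telephone interviewing (CATI) lay the foundation for applying sequential sampling. Using results computed from the units drawn earlier, one can specify how many further units must be drawn; the final sample size is then a best approximation to the truly desired sample size, and, compared with the traditional method, it better ensures that the preset precision requirement is met at minimum cost.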
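
The sequential idea above (estimate, then draw only what is still needed) can be sketched in Python. This is a simple iterative sketch under stated assumptions, not the article's exact procedure: the precision target is a normal-approximation margin of error for a mean, and `draw`, `margin`, `n_pilot` and `max_n` are hypothetical names.

```python
import math
import random
import statistics

def required_n(s, margin, z=1.96):
    """Smallest n with z * s / sqrt(n) <= margin (normal approximation)."""
    return math.ceil((z * s / margin) ** 2)

def sequential_sample(draw, margin, n_pilot=30, z=1.96, max_n=100_000):
    """Keep drawing until the data collected so far say the sample is large enough.

    draw    : zero-argument callable returning one new observation
              (a hypothetical stand-in for one completed CATI interview)
    margin  : desired half-width of the confidence interval for the mean
    n_pilot : size of the first-stage sample (must be at least 2)
    """
    data = [draw() for _ in range(n_pilot)]
    while len(data) < max_n:
        # re-estimate the required size from everything collected so far
        target = min(required_n(statistics.stdev(data), margin, z), max_n)
        if len(data) >= target:
            break
        # draw only the additional units the current estimate calls for
        data.extend(draw() for _ in range(target - len(data)))
    return data
```

For example, with `lambda: random.gauss(50, 10)` as the draw and `margin=1.0`, the final size should land in the vicinity of the textbook value (1.96 × 10 / 1)^2 ≈ 385, without that value having to be known in advance.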

3.
An additive model-assisted nonparametric method is investigated to estimate the finite population totals of massive survey data with the aid of auxiliary information. A class of estimators is proposed to improve the precision of the well-known Horvitz-Thompson estimator by combining spline and local polynomial smoothing methods. These estimators are calibrated, asymptotically design-unbiased, consistent and normal, and robust in the sense of asymptotically attaining the Godambe-Joshi lower bound to the anticipated variance. A consistent model selection procedure is further developed to select the significant auxiliary variables. The proposed method is fast enough to analyze large, high-dimensional survey data within seconds. The performance of the proposed method is assessed empirically via simulation studies.
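
The Horvitz-Thompson estimator and the general model-assisted (difference) form it is improved into can be sketched in Python. Assumption: `m` stands in for any already-fitted working model; the additive spline/local polynomial fit of the abstract is not reproduced here, and all names are illustrative.

```python
from typing import Callable, Sequence

def ht_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson estimator: sum of y_i / pi_i over the sampled units."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))

def model_assisted_total(x_pop: Sequence[float], x_s: Sequence[float],
                         y_s: Sequence[float], pi_s: Sequence[float],
                         m: Callable[[float], float]) -> float:
    """Difference (model-assisted) estimator of a population total.

    Predicts every population unit with the working model m, then adds the
    HT estimate of the total of the sample residuals y_i - m(x_i).
    """
    synthetic = sum(m(x) for x in x_pop)
    residuals = [y - m(x) for x, y in zip(x_s, y_s)]
    return synthetic + ht_total(residuals, pi_s)
```

When `m` predicts well, the residuals are small, so the HT correction term has much less variance than the HT estimator of the raw `y` values; the correction also keeps the estimator approximately design-unbiased even if `m` is misspecified.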

4.
Model-based search methods are a class of optimization techniques that search the solution space by sampling from an underlying probability distribution “model,” which is updated iteratively after evaluating the performance of the samples at each iteration. This paper aims to improve the sampling efficiency of model-based methods by considering a generalization where a population of distribution models is maintained and subsequently propagated from generation to generation. A key issue in the proposed approach is how to efficiently allocate the sampling budget among the population of models to maximize the algorithm performance. We formulate this problem as a generalized max k-armed bandit problem, and derive an efficient dynamic sample allocation scheme based on Markov decision theory to adaptively allocate computational resources. The proposed allocation scheme is then further used to update the current population to produce an improving population of models. Our preliminary numerical results indicate that the proposed procedure may considerably reduce the number of function evaluations needed to obtain high quality solutions, and thus further enhance the value of model-based methods for optimization problems that require expensive function evaluations for performance evaluation.

5.
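This paper studies estimation methods for a quantile model of residual lifetime under length-biased data, taking full account of the effect of the biased sampling mechanism on model estimation; ignoring this bias leads to seriously biased or even erroneous estimates. First, for length-biased right-censored data, we propose a log-linear regression model for the quantiles of residual lifetime and estimate the model parameters via estimating equations, both when the censoring variable is independent of the covariates and when it is not. Second, using empirical process and weak convergence theory, we establish the consistency and asymptotic normality of the parameter estimators. Finally, we evaluate the proposed estimation method through numerical simulations and apply it to an analysis of Oscar award data.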

6.
In this paper, we consider the inference problem for simple step-stress accelerated life test data from a one-parameter exponential distribution under type-I censored ordered ranked set sampling with a cumulative exposure model. Bayesian estimators and credible intervals for the model parameters are developed and compared with the corresponding estimators based on simple random sampling. Two real data sets and numerical simulation evaluations are presented to illustrate the results developed here. The simulation study indicates that the proposed Bayes estimators and credible intervals based on ordered ranked set sampling perform better than their counterparts under simple random sampling.

7.
Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly when dealing with high-dimensional data streams, which are nowadays a frequent data type. To overcome this limitation of model-based clustering, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic and incremental version of PCA. Model selection is also considered in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it to state-of-the-art online EM-based algorithms.

8.
To obtain reliable information on the age structure of the whale population, Japan conducted a feasibility study of scientific research in the Antarctic in 1987/88. Although the sample was not large enough, it provided the first data free from the problems of selectivity and whaling-ground bias. The analysis showed that the biological characteristics are highly heterogeneous, spatially and in other ways. We therefore recognize that the survey should be designed to collect samples uniformly from the whole research area in order to obtain unbiased estimates of population characteristics. In an actual biological field survey, however, it is difficult to keep the sampling fractions precisely the same for every sampling unit. It is therefore important to detect the heterogeneity in the sample and to poststratify the data accordingly. The methodology of estimation and model evaluation presented here will be useful for the development of biological field surveys in general.
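
Poststratification, as described above, reweights the sample so that each post-stratum counts in proportion to its population share rather than its (unequal) sampling fraction. A minimal Python sketch of the standard post-stratified mean, assuming the post-stratum population sizes N_h are known; this is the textbook estimator, not necessarily the exact estimator used by the authors, and the names are illustrative.

```python
from collections import defaultdict

def poststratified_mean(values, strata, pop_sizes):
    """Post-stratified estimator of a population mean.

    values    : observed values for the sampled units
    strata    : post-stratum label of each sampled unit
    pop_sizes : dict mapping each post-stratum label to its known size N_h
    """
    groups = defaultdict(list)
    for v, h in zip(values, strata):
        groups[h].append(v)
    empty = [h for h in pop_sizes if not groups[h]]
    if empty:
        raise ValueError(f"no sampled units in post-strata: {empty}")
    N = sum(pop_sizes.values())
    # weight each post-stratum sample mean by its population share N_h / N
    return sum(N_h / N * sum(groups[h]) / len(groups[h])
               for h, N_h in pop_sizes.items())
```

For instance, if area "a" was sampled three times as heavily as area "b" although both hold the same number of animals, the plain sample mean over-represents "a", while the post-stratified mean gives each area weight 1/2.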

9.
In the present paper we study switching state space models from a Bayesian point of view. We discuss various MCMC methods for Bayesian estimation, among them unconstrained Gibbs sampling, constrained sampling and permutation sampling. We address in detail the problem of unidentifiability, and discuss potential information available from an unidentified model. Furthermore the paper discusses issues in model selection such as selecting the number of states or testing for the presence of Markov switching heterogeneity. The model likelihoods of all possible hypotheses are estimated by using the method of bridge sampling. We conclude the paper with applications to simulated data as well as to modelling the U.S./U.K. real exchange rate.

10.
We propose a Bayesian approach for inference in the multivariate probit model, taking into account the association structure between binary observations. We model the association through the correlation matrix of the latent Gaussian variables. Conditional independence is imposed by setting some off-diagonal elements of the inverse correlation matrix to zero and this sparsity structure is modeled using a decomposable graphical model. We propose an efficient Markov chain Monte Carlo algorithm relying on a parameter expansion scheme to sample from the resulting posterior distribution. This algorithm updates the correlation matrix within a simple Gibbs sampling framework and allows us to infer the correlation structure from the data, generalizing methods used for inference in decomposable Gaussian graphical models to multivariate binary observations. We demonstrate the performance of this model and of the Markov chain Monte Carlo algorithm on simulated and real datasets. This article has online supplementary materials.

11.
This paper proposes an online surrogate model-assisted multiobjective optimization framework to identify optimal remediation strategies for groundwater contaminated with dense non-aqueous phase liquids. The optimization involves three objectives: minimizing the remediation cost and duration and maximizing the contamination removal rate. The proposed framework adopts a multiobjective feasibility-enhanced particle swarm optimization algorithm to solve the optimization model and uses an online surrogate model as a substitute for the time-consuming multiphase flow model when calculating contamination removal rates during the optimization process. The resulting approach allows decision makers to find a balance among remediation cost, remediation duration and contamination removal rate when remediating contaminated groundwater. The new algorithm is compared with the widely applied and well-known nondominated sorting genetic algorithm II (NSGA-II). The results show that the Pareto solutions obtained by the new algorithm have greater diversity and stability than those obtained by NSGA-II, indicating that the new algorithm is more suitable than NSGA-II for optimizing remediation strategies for contaminated groundwater. Additionally, the surrogate model and Pareto optimal set obtained by the proposed framework are compared with those of an offline surrogate model-assisted multiobjective optimization framework. The results indicate that the surrogate model accuracy and the Pareto front achieved by the proposed framework outperform those of the offline framework. Thus, we conclude that the proposed framework can effectively enhance surrogate model accuracy and further improve the overall performance of the Pareto solutions.

12.
Data from most complex surveys are subject to selection bias and clustering due to the sampling design. Results developed for a random sample from a super-population model may not apply. Ignoring the survey sampling weights may cause biased estimators and erroneous confidence intervals. In this paper, we use the design approach for fitting the proportional hazards (PH) model and prove formally the asymptotic normality of the sample maximum partial likelihood (SMPL) estimators under the PH model for both stochastically independent and clustered failure times. In the first case, we use the central limit theorem for martingales in the joint design-model space, and this enables us to obtain results for a general multistage sampling design under mild and easily verifiable conditions. In the case of clustered failure times, we require asymptotic normality in the sampling design space directly, and this holds for fewer sampling designs than in the first case. We also propose a variance estimator of the SMPL estimator. A key property of this variance estimator is that we do not have to specify the second-stage correlation model.

13.
The mixture of factor analyzers model, which has been used successfully for the model-based clustering of high-dimensional data, is extended to generalized hyperbolic mixtures. The development of a mixture of generalized hyperbolic factor analyzers is outlined, drawing upon the relationship with the generalized inverse Gaussian distribution. An alternating expectation-conditional maximization algorithm is used for parameter estimation, and the Bayesian information criterion is used to select the number of factors as well as the number of components. The performance of our generalized hyperbolic factor analyzers model is illustrated on real and simulated data, where it performs favourably compared to its Gaussian analogue and other approaches.

14.
Ranked set sampling (RSS) is a sampling approach that can produce improved statistical inference when the ranking process is perfect. While some inferential RSS methods are robust to imperfect rankings, other methods may fail entirely or lose efficiency. We develop a nonparametric procedure to assess whether the rankings of a given RSS are perfect. We generate pseudo-samples with a known ranking and compare them with the ranking of the given RSS sample. This general approach can accommodate any type of ranking, including perfect ranking. To generate pseudo-samples, we treat the given sample as the population and draw a perfect RSS from it. The test statistics are easy to implement for both balanced and unbalanced RSS. The proposed tests are compared by Monte Carlo simulation under different distributions and applied to a real data set.
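
The pseudo-sample step above (treat the given sample as the population and draw a perfect RSS from it) can be sketched for the balanced case in Python. Assumptions: sets are drawn with replacement, as in a bootstrap, and "perfect ranking" is modelled by sorting the actual values; `perfect_rss`, `set_size` and `cycles` are hypothetical names, and the test statistic itself is not reproduced.

```python
import random

def perfect_rss(population, set_size, cycles, rng=None):
    """Draw a balanced, perfectly ranked RSS from `population`.

    In each cycle, for rank r = 1..set_size: draw a set of `set_size` units
    with replacement, rank it without error (sort by true value), and keep
    its r-th smallest value. Returns a list of (rank, value) pairs.
    """
    rng = rng or random.Random()
    sample = []
    for _ in range(cycles):
        for r in range(1, set_size + 1):
            ranked_set = sorted(rng.choices(population, k=set_size))
            sample.append((r, ranked_set[r - 1]))
    return sample
```

Because the ranking here is known to be perfect, comparing rank-wise summaries of such pseudo-samples with those of the observed RSS is what lets a test detect imperfect rankings in the original data.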

15.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.

16.
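A model-calibration method using complete auxiliary information in stratified sampling data
Wu Changchun (伍长春), Zhang Runchu (张润楚). Chinese Quarterly Journal of Mathematics (数学季刊), 2006, 21(2): 309-316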
In stratified survey sampling, complete auxiliary information is sometimes available. One of the fundamental questions is how to use this complete auxiliary information effectively at the estimation stage. In this paper, we extend the model-calibration method to obtain estimators of the finite population mean that use complete auxiliary information from stratified survey data. We show that the resulting estimators use the auxiliary information effectively at the estimation stage and possess a number of attractive features, such as being asymptotically design-unbiased irrespective of the working model and approximately model-unbiased under the model. When a linear working model is used, the resulting estimators reduce to the usual calibration estimator (or GREG).

17.
Overdispersion in time series of counts is very common and has been well studied by many authors, but the opposite phenomenon, underdispersion, may also be encountered in real applications and has received little attention. Building on the popularity of the generalized Poisson distribution in count regression models and of Poisson INGARCH models in time series analysis, we introduce a generalized Poisson INGARCH model that can account for both overdispersion and underdispersion. Compared with the double Poisson INGARCH model, conditions for the existence and ergodicity of this process are easy to establish. We analyze the autocorrelation structure and derive expressions for the moments of order 1 and 2. We consider maximum likelihood estimators of the parameters and establish their consistency and asymptotic normality. We apply the proposed model to one overdispersed and one underdispersed real data example; the results indicate that the proposed methodology performs better than other conventional model-based methods in the literature.
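
INGARCH models of the kind described above drive the counts through a GARCH-like recursion for the conditional mean, lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1}. A minimal Python sketch of an INGARCH(1,1) recursion and simulator, assuming order (1,1) and, for simplicity, a plain Poisson conditional distribution in place of the generalized Poisson one; with this swap the process can only be overdispersed, so it does not reproduce the underdispersion the abstract targets. Function and parameter names are illustrative.

```python
import math
import random

def ingarch_means(x, omega, alpha, beta, lam0):
    """Conditional means lambda_t = omega + alpha * x_{t-1} + beta * lambda_{t-1}.

    Returns [lambda_0, lambda_1, ..., lambda_T] for counts x_0, ..., x_{T-1}.
    """
    lams = [lam0]
    for xt in x:
        lams.append(omega + alpha * xt + beta * lams[-1])
    return lams

def _poisson(lam, rng):
    # Knuth's multiplication method; adequate for moderate lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_poisson_ingarch(n, omega, alpha, beta, seed=0):
    """Simulate a Poisson INGARCH(1,1) path of length n."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = random.Random(seed)
    lam = omega / (1 - alpha - beta)  # start at the stationary mean
    xs = []
    for _ in range(n):
        x = _poisson(lam, rng)
        xs.append(x)
        lam = omega + alpha * x + beta * lam
    return xs
```

Under alpha + beta < 1 the stationary mean is omega / (1 - alpha - beta), e.g. 1 / (1 - 0.3 - 0.5) = 5 for the values used in the checks; a generalized Poisson version would keep this mean recursion and change only the conditional sampling step.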

18.
A promising area of research in fuzzy control is the model-based fuzzy controller. At the heart of this approach is a fuzzy relational model of the process to be controlled. Since this model is identified directly from process input-output data, ‘holes’ are likely to be present in the identified relational model. These holes are a real problem when the model is incorporated into a model-based controller, since the model cannot make any prediction at all if the system drifts into an unknown region. The present work deals with the completeness of the fuzzy relational model that forms the core of the controller. It proposes a post-processing scheme to ‘fill in’ the fuzzy relational model once it has been built, thereby improving its applicability for on-line control. A comparative study of the post-processed model and the conventional relational model is presented for the Box-Jenkins identification data and for a real-time, highly non-linear pH control identification application.

19.
Bayesian approaches to prediction and the assessment of predictive uncertainty in generalized linear models are often based on averaging predictions over different models, and this requires methods for accounting for model uncertainty. When there are linear dependencies among potential predictor variables in a generalized linear model, existing Markov chain Monte Carlo algorithms for sampling from the posterior distribution on the model and parameter space in Bayesian variable selection problems may not work well. This article describes a sampling algorithm based on the Swendsen-Wang algorithm for the Ising model, which works well when the predictors are far from orthogonal. In problems of variable selection for generalized linear models, different models can be indexed by a binary parameter vector, where each binary variable indicates whether or not a given predictor variable is included in the model. The posterior distribution on the model is a distribution on this collection of binary strings, and by thinking of this posterior distribution as a binary spatial field we apply a sampling scheme inspired by the Swendsen-Wang algorithm for the Ising model in order to sample from the model posterior distribution. The algorithm we describe extends a similar algorithm for variable selection problems in linear models. The benefits of the algorithm are demonstrated for both real and simulated data.

20.
This paper proposes a unified semiparametric method for the additive risk model under general biased sampling. Using the estimating equation approach, we propose estimators of both the regression parameters and the nonparametric function. An advantage is that our approach remains suitable for length-biased data even without information on the truncation variable. Large-sample properties of the proposed estimators, including consistency and asymptotic normality, are established. In addition, the finite-sample behavior of the proposed methods is examined and three real data sets are analyzed.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), 京ICP备09084417号