Similar literature
 19 similar documents found (search time: 140 ms)
1.
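Motivated by the large amount of skewed data encountered in practice, a mode regression model for skew-normal data is constructed. Because missing data are also common, imputation is used to handle the incomplete data set; to compare imputation performance, statistical inference is studied for the case where the response variable is missing at random. Maximum likelihood estimates of the mode regression model parameters are obtained by Gauss-Newton iteration, and the model's performance is compared under three imputation schemes: mean imputation, regression imputation, and mode imputation. Simulation studies and real-data anal...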

2.
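A missing-value imputation method based on the spatial autoregressive model   Total citations: 2, self-citations: 0, citations by others: 2
This paper studies the imputation of missing values in regional cross-sectional data. It discusses how to build a spatial autoregressive model when the data are spatially correlated and how to use that model to impute missing values, and it reports the imputation results obtained from a model fitted to real data.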

3.
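Missing data are the most common of the many factors that affect data quality. If missing data are handled improperly, the reliability of the analysis is directly affected and the purpose of the analysis is not achieved. For skew-normal data that are missing at random, this paper studies parameter estimation in the skew-normal mode mixture-of-experts model. Combining mode regression imputation with clustering, a stratified mode regression imputation method is proposed. The paper then compares the performance of three machine learning imputation methods (support vector machine imputation, random forest imputation, and neural network imputation) with that of three statistical imputation methods (stratified mean imputation, mode regression imputation, and stratified mode regression imputation). Monte Carlo simulation and a real-data example demonstrate the good performance of stratified mode regression imputation.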

4.
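Against the background of massive credit-reporting data, a shrinkage nearest-neighbor imputation method is proposed to reduce the computational cost of imputing missing data. The method completes the imputation in three stages. The first stage computes inclusion probabilities from the missing proportions of samples and variables and shrinks the data by unequal-probability sampling. The second stage uses between-sample distances to select the samples nearest to each incomplete sample as the training set. The third stage builds a random forest model for iterative imputation. Simulation studies on the Australian data set and a data set from Chinese banks show that the method substantially reduces the amount of computation while maintaining a certain level of imputation accuracy.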

5.
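This paper studies parameter estimation for linear regression models with missing skew-t-normal data. To bring the sample distribution closer to the true distribution, to improve the estimates of the regression coefficients, scale parameter, skewness parameter, and degrees-of-freedom parameter, and to make the estimates more stable, a modified stochastic regression imputation method suited to linear regression with missing skew-t-normal data is proposed. Simulations and a real-data study comparing it with stochastic regression imputation and multiple stochastic regression imputation show that the proposed method is effective and feasible.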

6.
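In economics and in experiments for improving industrial product quality, the mean and the dispersion need to be modeled simultaneously; missing data are also frequently encountered during data collection. Based on these two points, the article studies parameter estimation for double generalized linear models with missing data. Nearest-distance imputation and inverse-distance-weighted imputation are used to handle the missing data, and two methods, maximum extended quasi-likelihood estimation and maximum pseudo-likelihood estimation, are applied to estimate the unknown parameters. Simulations and a real example show that the model and the methods applied are useful and effective.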
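
The two imputation rules named in this abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes Euclidean distance on the covariates, a hypothetical `power` exponent of 2, and hypothetical function names (`nearest_impute`, `idw_impute`).

```python
import math

def _dist(a, b):
    """Euclidean distance between two covariate vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_impute(target, donors):
    """Nearest-distance imputation: copy the response of the closest donor.

    donors is a list of (covariate_vector, observed_response) pairs.
    """
    return min(donors, key=lambda d: _dist(target, d[0]))[1]

def idw_impute(target, donors, power=2.0):
    """Inverse-distance-weighted imputation (weights 1 / distance**power).

    The exponent `power` is an illustrative assumption, not taken from the article.
    """
    num = den = 0.0
    for x, y in donors:
        d = _dist(target, x)
        if d == 0.0:
            return y  # exact covariate match: use that donor's response
        w = 1.0 / d ** power
        num += w * y
        den += w
    return num / den
```

Nearest-distance imputation uses one donor only, while the weighted variant averages over all donors and lets close donors dominate.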

7.
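Missing data are a common problem in practical data analysis. Combining inverse probability weighting with imputation, the article proposes a Mallows model averaging method for handling missing data and proves that the resulting estimator is asymptotically optimal in the sense of achieving the lowest squared error. Compared with traditional inverse probability weighting, the method not only makes full use of the observed information but can also be applied when data are not missing at random. Compared with purely imputation-based methods, it inherits some advantages of imputation while avoiding the bias caused by wrongly imputing large blocks of data. Numerical simulations first verify that three simple imputation methods satisfy the conditions under which asymptotic optimality holds; the proposed Mallows model averaging method is then compared with existing model averaging methods for missing data, and the results show that the new method outperforms the others in most cases. Finally, the new method is applied to life expectancy data, and the empirical results further show that it is more robust than existing model averaging methods.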

8.
《数理统计与管理》2015,(4):621-627
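Based on the normal distribution, a joint mean and variance model for missing data is proposed. With the response variable missing at random, parameter estimation for the model is studied under three imputation methods: mean imputation, regression imputation, and stochastic regression imputation. A comparison of simulation and real-data results shows that stochastic regression imputation is the most useful and effective of the three.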
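
The three imputation methods compared in this abstract can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's procedure: it uses one covariate, an ordinary least squares line with constant residual variance (rather than a joint mean and variance model), and hypothetical function names (`fit_line`, `impute`).

```python
import random
import statistics

def fit_line(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd = (rss / max(len(xs) - 2, 1)) ** 0.5
    return a, b, sd

def impute(xs, ys, method, rng=None):
    """Fill None entries of ys with 'mean', 'regression' or 'stochastic' imputation."""
    if method not in ("mean", "regression", "stochastic"):
        raise ValueError(f"unknown method: {method}")
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    ox, oy = zip(*observed)
    if method == "mean":
        fill = statistics.fmean(oy)  # same value for every missing response
        return [fill if y is None else y for y in ys]
    a, b, sd = fit_line(ox, oy)
    rng = rng or random.Random(0)  # fixed seed only to make the sketch repeatable
    out = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x  # regression imputation: fitted value
            if method == "stochastic":
                y += rng.gauss(0.0, sd)  # add a residual draw to keep variability
        out.append(y)
    return out
```

Mean imputation ignores the covariate, regression imputation places every imputed value exactly on the fitted line, and the stochastic variant adds noise so that the imputed responses do not understate the residual variance.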

9.
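This paper considers estimation in linear models in which the covariates are measured with error and the response is missing. For the regression coefficients, two classes of least-squares-based estimators are proposed: one is built from the completely observed data only, while the other uses a complete data set constructed by simple imputation. Both classes of estimators are proved to be asymptotically normal.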

10.
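This paper discusses quantile estimation based on unbiased estimating equations when response data are missing. Two imputation methods using nonparametric smoothing techniques are proposed: a global nonparametric kernel imputation method and a local multiple imputation method. Both can be used to construct asymptotically unbiased estimating equations. With these estimating equations for missing data, common estimation methods can be used for statistical inference on the unknown quantiles. The resulting quantile estimators are proved to be consistent and asymptotically normal.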

11.
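For dynamic evaluation problems with incomplete indicator data, a dynamic stochastic algorithm based on incomplete data is proposed. First, missing values are divided into two types, discrete and continuous, according to their distribution along the time dimension, and a completion method is proposed for each type. Then, with the data completed, stochastic simulation is used to compute a superiority-degree matrix and to derive a possibility ranking of the evaluated objects. The algorithm avoids absolute rankings among the evaluated objects and offers considerable flexibility in explaining practical problems. Finally, a numerical example illustrates the algorithm in detail.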

12.
关菲  周艺  张晗 《运筹与管理》2022,31(11):9-14
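Collaborative filtering is one of the most widely used algorithms in personalized recommender systems. However, it has shortcomings in handling data sparsity and scalability. For data sparsity, this paper first fills in the missing values of the initial rating matrix with the Slope One algorithm, then predicts the target user's ratings with a collaborative filtering algorithm based on K-means clustering, and reports comparative experiments on the MovieLens data set. For scalability, the paper first proposes an improved K-means algorithm based on a center aggregation parameter, then gives the workflow of a collaborative filtering recommendation algorithm built on the improved K-means, and designs comparative experiments on the MovieLens data set. The experimental results show that the recommendation accuracy of the proposed methods is significantly improved and that the data sparsity and scalability problems are effectively alleviated. The conclusions therefore not only further enrich the existing theory of collaborative filtering recommendation algorithms but also provide a theoretical basis and decision reference for improving the accuracy of recommender systems.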
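
The first step described in this abstract, filling the missing entries of a rating matrix with Slope One, might look like the sketch below. It assumes the weighted Slope One variant and a dict-of-dicts rating structure; the abstract does not say which variant or data layout the authors use, and the name `slope_one_fill` is hypothetical.

```python
def slope_one_fill(ratings):
    """Fill missing ratings with weighted Slope One.

    ratings maps user -> {item: rating}. Returns a filled copy; an entry is
    left missing when no co-rated item pair is available to predict it.
    """
    items = sorted({i for r in ratings.values() for i in r})
    diff, count = {}, {}
    # Accumulate rating differences (item j minus item i) over users who rated both.
    for r in ratings.values():
        for j in r:
            for i in r:
                if i != j:
                    diff[(j, i)] = diff.get((j, i), 0.0) + r[j] - r[i]
                    count[(j, i)] = count.get((j, i), 0) + 1
    filled = {u: dict(r) for u, r in ratings.items()}
    for u, r in ratings.items():
        for j in items:
            if j in r:
                continue
            num = den = 0.0
            for i, r_ui in r.items():
                c = count.get((j, i), 0)
                if c:
                    # Average deviation plus the user's own rating, weighted by support.
                    num += (diff[(j, i)] / c + r_ui) * c
                    den += c
            if den:
                filled[u][j] = num / den
    return filled
```

The filled matrix could then be passed to a clustering-based collaborative filter; that later stage is not sketched here.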

13.
Maximum likelihood estimation in finite mixture distributions is typically approached as an incomplete data problem to allow application of the expectation-maximization (EM) algorithm. In its general formulation, the EM algorithm involves the notion of a complete data space, in which the observed measurements and incomplete data are embedded. An advantage is that many difficult estimation problems are facilitated when viewed in this way. One drawback is that the simultaneous update used by standard EM requires overly informative complete data spaces, which leads to slow convergence in some situations. In the incomplete data context, it has been shown that the use of less informative complete data spaces, or equivalently smaller missing data spaces, can lead to faster convergence without sacrificing simplicity. However, in the mixture case, little progress has been made in speeding up EM. In this article we propose a component-wise EM for mixtures. It uses, at each iteration, the smallest admissible missing data space by intrinsically decoupling the parameter updates. Monotonicity is maintained, although the estimated proportions may not sum to one during the course of the iteration. However, we prove that the mixing proportions will satisfy this constraint upon convergence. Our proof of convergence relies on the interpretation of our procedure as a proximal point algorithm. For performance comparison, we consider standard EM as well as two other algorithms based on missing data space reduction, namely the SAGE and AECME algorithms. We provide adaptations of these general procedures to the mixture case. We also consider the ECME algorithm, which is not a data augmentation scheme but still aims at accelerating EM. Our numerical experiments illustrate the advantages of the component-wise EM algorithm relative to these other methods.
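
For orientation, the baseline that this abstract compares against, standard EM with a simultaneous update of all components, can be sketched for a univariate Gaussian mixture. This is the textbook algorithm, not the component-wise EM proposed in the article; the function names and the variance floor are assumptions added for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, props, mus, variances):
    """Observed-data log-likelihood of the mixture."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, v) for p, m, v in zip(props, mus, variances)))
        for x in xs
    )

def em_step(xs, props, mus, variances):
    """One standard EM iteration: E-step, then update every component at once."""
    resp = []
    for x in xs:
        w = [p * normal_pdf(x, m, v) for p, m, v in zip(props, mus, variances)]
        total = sum(w)
        resp.append([wi / total for wi in w])  # posterior component memberships
    new_props, new_mus, new_vars = [], [], []
    for j in range(len(props)):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, xs)) / nj
        new_props.append(nj / len(xs))
        new_mus.append(mj)
        new_vars.append(max(vj, 1e-6))  # assumed floor to avoid a degenerate variance
    return new_props, new_mus, new_vars
```

Because every component is updated from the same responsibilities, the proportions sum to one after each step, in contrast to the component-wise scheme described in the abstract, where this holds only at convergence.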

14.
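Local influence analysis for multivariate $t$-distributed data   Total citations: 4, self-citations: 0, citations by others: 4
For multivariate $t$-distributed data, carrying out influence analysis directly from the probability density is difficult. By introducing weights that follow a Gamma distribution, this paper represents the distribution as a mixture of particular multivariate normal distributions. On this basis, the weights are treated as missing data and the EM algorithm is introduced, so that local influence analysis can be carried out using the conditional expectation of the complete-data likelihood function. The paper further gives a systematic study of local influence analysis under the weighted perturbation model, obtains the corresponding diagnostic statistics, and illustrates the effectiveness of the method with two examples.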

15.
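By augmenting the data with partially missing lifetime variables, a relatively simple likelihood function is obtained for the hazard-rate change-point model under censoring and truncation. The probability distribution of the augmented missing-data variables and the method for sampling them are discussed. The Monte Carlo EM algorithm is used to iterate on the unknown parameters. Gibbs sampling from the full conditional distributions of the parameters is carried out in combination with the Metropolis-Hastings algorithm, the parameters are estimated from the Gibbs samples, and the implementation steps of the MCMC method are described in detail. The results of simulation experiments show that the Bayes estimates of the parameters are fairly accurate.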

16.
How to recover missing data from incomplete samples is a fundamental problem in mathematics, and it has a wide range of applications in image analysis and processing. Although many existing methods, e.g. various data smoothing methods and PDE approaches, are available in the literature, there is always a need to find new methods leading to the best solution according to various cost functionals. In this paper, we propose an iterative algorithm based on tight framelets for image recovery from incomplete observed data. The algorithm is motivated by our framelet algorithm used in high-resolution image reconstruction, and it exploits the redundancy in tight framelet systems. We prove the convergence of the algorithm and also give its convergence factor. Furthermore, we derive the minimization properties of the algorithm and explore the roles of the redundancy of tight framelet systems. As an illustration of the effectiveness of the algorithm, we give an application of it in impulse noise removal.

17.
This paper presents a decomposition for the posterior distribution of the covariance matrix of normal models under a family of prior distributions when missing data are ignorable and monotone. This decomposition is an extension of Bartlett's decomposition of the Wishart distribution to monotone missing data. It is not only theoretically interesting but also practically useful. First, with monotone missing data, it allows more efficient drawing of parameters from the posterior distribution than the factorized likelihood approach. Furthermore, with nonmonotone missing data, it allows for a very efficient monotone data augmentation algorithm and thereby multiple imputation of the missing data needed to create a monotone pattern.

18.
In this paper, we carry out an in-depth theoretical investigation for inference with missing response and covariate data for general regression models. We assume that the missing data are missing at random (MAR) or missing completely at random (MCAR) throughout. Previous theoretical investigations in the literature have focused only on missing covariates or missing responses, but not both. Here, we consider theoretical properties of the estimates under three different estimation settings: complete case (CC) analysis, a complete response (CR) analysis that involves an analysis of those subjects with only completely observed responses, and the all case (AC) analysis, which is an analysis based on all of the cases. Under each scenario, we derive general expressions for the likelihood and devise estimation schemes based on the EM algorithm. We carry out a theoretical investigation of the three estimation methods in the normal linear model and analytically characterize the loss of information for each method, as well as derive and compare the asymptotic variances for each method assuming the missing data are MAR or MCAR. In addition, a theoretical investigation of bias for the CC method is also carried out. A simulation study and a real dataset are given to illustrate the methodology.

19.
《Discrete Applied Mathematics》2007,155(6-7):788-805
Computational methods for inferring haplotype information from genotype data are used in studying the association between genomic variation and medical condition. Recently, Gusfield proposed a haplotype inference method that is based on perfect phylogeny principles. A fundamental problem arises when one tries to apply this approach in the presence of missing genotype data, which is common in practice. We show that the resulting theoretical problem is NP-hard even in very restricted cases. To cope with missing data, we introduce a variant of haplotyping via perfect phylogeny in which a path phylogeny is sought. Searching for perfect path phylogenies is strongly motivated by the characteristics of human genotype data: 70% of real instances that admit a perfect phylogeny also admit a perfect path phylogeny. Our main result is a fixed-parameter algorithm for haplotyping with missing data via perfect path phylogenies. We also present a simple linear-time algorithm for the problem on complete data.
