Similar literature
 20 similar documents found; search took 312 ms
1.
Several techniques for resampling dependent data have already been proposed. In this paper we use missing values techniques to modify the moving blocks jackknife and bootstrap. More specifically, we consider the blocks of deleted observations in the blockwise jackknife as missing data which are recovered by missing values estimates incorporating the observation dependence structure. Thus, we estimate the variance of a statistic as a weighted sample variance of the statistic evaluated in a “complete” series. Consistency of the variance and the distribution estimators of the sample mean is established. Also, we apply the missing values approach to the blockwise bootstrap by including some missing observations between two consecutive blocks and we demonstrate the consistency of the variance and the distribution estimators of the sample mean. Finally, we present the results of an extensive Monte Carlo study to evaluate the performance of these methods for finite sample sizes, showing that our proposal provides variance estimates for several time series statistics with smaller mean squared error than previous procedures.
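For orientation, here is a minimal sketch of the plain moving blocks bootstrap that the abstract above takes as its starting point. It is not the authors' missing-values modification. The function name `moving_block_bootstrap_variance`, its defaults, and the choice to trim each resampled series back to the original length are illustrative assumptions.

```python
import random
from statistics import mean, pvariance


def moving_block_bootstrap_variance(series, block_len, n_boot=500, stat=mean, seed=0):
    """Estimate Var(stat(series)) with the plain moving blocks bootstrap.

    Hypothetical helper for illustration; `series` is a list of numbers.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    # All overlapping blocks of consecutive observations.
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    replicates = []
    for _ in range(n_boot):
        resampled = []
        for _ in range(n_blocks):
            # Draw whole blocks with replacement to keep short-range dependence.
            resampled.extend(rng.choice(blocks))
        replicates.append(stat(resampled[:n]))  # trim to the original length
    # Spread of the statistic across bootstrap replicates.
    return pvariance(replicates)
```

Resampling whole blocks rather than single observations is what lets the estimate reflect dependence within each block; the block length controls how much of that dependence is retained.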

2.
The problem of missing values is common in statistical analysis. One approach to dealing with missing values is to delete the incomplete cases from the data set. This approach may disregard valuable information, especially in small samples. An alternative approach is to reconstruct the missing values using the information in the data set. The major purpose of this paper is to investigate how a neural network approach performs compared to statistical techniques for reconstructing missing values. The backpropagation algorithm is used as the learning method to reconstruct missing values. The results of backpropagation are compared with results from two methods, viz., (1) using averages, and (2) using iterative regression analysis, to compute missing values. Experimental results show that backpropagation consistently outperforms other methods in both the training and the test data sets, and suggest that the neural network approach is a useful tool for reconstructing missing values in multivariate analysis.

3.
This paper formulates a nonlinear time series model which encompasses several standard nonlinear models for time series as special cases. It also offers two methods for estimating missing observations, one using prediction and fixed point smoothing algorithms and the other using optimal estimating equation theory. Recursive estimation of missing observations in an autoregressive conditionally heteroscedastic (ARCH) model and the estimation of missing observations in a linear time series model are shown to be special cases. Construction of optimal estimates of missing observations using estimating equation theory is discussed and applied to some nonlinear models. The authors were supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

4.
We develop two methods for imputing missing values in regression situations. We examine the standard fixed-effects linear-regression model Y = Xβ + ε, where the regressors X are fixed and ε is the error term. This research focuses on the problem of missing X values. A particular component of market-share analysis has motivated this research where the price and other promotional instruments of each brand are allowed to have their own impact on the total sales volume in a consumer-products category. When a brand is not distributed in a particular week, only a few of the many measures occurring in that observation are missing. ‘What values should be imputed for the missing measures?’ is the central question this paper addresses. This context creates a unique problem in the missing-data literature, i.e. there is no true value for the missing measure. Using influence functions from robust statistics, we develop two loss functions, each of which is a function of the missing and existing X values. These loss functions turn out to be sums of ratios of low-order polynomials. The minimization of either loss function is an unconstrained non-linear-optimization problem. The solution to this non-linear optimization leads to imputed values that have minimal influence on the estimates of the parameters of the regression model. Estimates using the method for replacing missing values are compared with estimates obtained via some conventional methods.

5.
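In time series modeling, missing data can greatly reduce model accuracy, so imputing the missing values is particularly important. Beijing air quality index (AQI) data are selected and 10% of the values are removed at random. The missing values are then imputed with the EM algorithm and with polyfit straight-line fitting, respectively; once the series is complete, an ARMA model is fitted and used for forecasting. The results show that imputation with the polyfit function gives the better results.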
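The straight-line imputation mentioned above could look roughly like the following sketch. The abstract does not say whether the line is fitted globally or locally, so the local window of neighbouring observed points is an assumption, as are the names `fit_line` and `impute_linear`. The least-squares fit is written out by hand to keep the example dependency-free; it computes the same degree-1 fit that a `polyfit`-style routine would.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b through the points (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # only one distinct x: fall back to a constant
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def impute_linear(series, window=2):
    """Fill None entries using a local straight-line fit (hypothetical helper).

    For each gap, the line is fitted to up to `window` observed points on
    each side; `window` is an assumed tuning parameter, not from the source.
    """
    observed = [(i, v) for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = [p for p in observed if p[0] < i][-window:]
        right = [p for p in observed if p[0] > i][:window]
        xs, ys = zip(*(left + right))
        a, b = fit_line(xs, ys)
        filled[i] = a * i + b  # evaluate the fitted line at the gap
    return filled
```

For example, `impute_linear([1.0, 2.0, None, 4.0, 5.0])` fits a line through the four observed points and fills the gap with its value at index 2.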

6.
Dealing with missing values is an important task in the field of data mining. Moreover, because of the properties of compositional data, traditional imputation methods may give undesirable results if they are applied directly to this type of data. As a result, the management of missing values in compositional data is of great significance. To solve this problem, this paper uses the relationship between compositional data and Euclidean data, and proposes a new method based on Random Forest for missing values in compositional data. This method has been implemented and evaluated using both simulated and real-world databases, and the experimental results reveal that the new imputation method can be widely used in various types of data sets and performs better than other methods.

7.
Statistical Inference for Stochastic Processes - The problem of linear interpolation in the context of a multivariate time series having multiple (possibly non-consecutive) missing values is...

8.
A new approach is proposed for forecasting a time series with multiple seasonal patterns. A state space model is developed for the series using the innovations approach which enables us to develop explicit models for both additive and multiplicative seasonality. Parameter estimates may be obtained using methods from exponential smoothing. The proposed model is used to examine hourly and daily patterns in hourly data for both utility loads and traffic flows. Our formulation provides a model for several existing seasonal methods and also provides new options, which result in superior forecasting performance over a range of prediction horizons. In particular, seasonal components can be updated more frequently than once during a seasonal cycle. The approach is likely to be useful in a wide range of applications involving both high and low frequency data, and it handles missing values in a straightforward manner.

9.
The available methods to handle missing values in principal component analysis only provide point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values onto the principal component analysis results are described. The first one consists in projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively of the variables). The second one consists in performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.

10.
We establish computationally flexible methods and algorithms for the analysis of multivariate skew normal models when missing values occur in the data. To facilitate the computation and simplify the theoretic derivation, two auxiliary permutation matrices are incorporated into the model for the determination of observed and missing components of each observation. Under missing at random mechanisms, we formulate an analytically simple ECM algorithm for computing parameter estimates and retrieving each missing value with a single-valued imputation. Gibbs sampling is used to perform a Bayesian inference on model parameters and to create multiple imputations for missing values. The proposed methodologies are illustrated through a real data set and comparisons are made with those obtained from fitting the normal counterparts.

11.
Generalized canonical correlation analysis is a versatile technique that allows the joint analysis of several sets of data matrices. The generalized canonical correlation analysis solution can be obtained through an eigenequation and distributional assumptions are not required. When dealing with multiple set data, the situation frequently occurs that some values are missing. In this paper, two new methods for dealing with missing values in generalized canonical correlation analysis are introduced. The first approach, which does not require iterations, is a generalization of the Test Equating method available for principal component analysis. In the second approach, missing values are imputed in such a way that the generalized canonical correlation analysis objective function does not increase in subsequent steps. Convergence is achieved when the value of the objective function remains constant. By means of a simulation study, we assess the performance of the new methods. We compare the results with those of two available methods: the missing-data passive method, introduced in Gifi’s homogeneity analysis framework, and the GENCOM algorithm developed by Green and Carroll. An application using World Bank data is used to illustrate the proposed methods.

12.
Exploring incomplete data using visualization techniques   Cited by: 1 (self-citations: 0, citations by others: 1)
Visualization of incomplete data makes it possible to explore the data and the structure of missing values simultaneously. This is helpful for learning about the distribution of the incomplete information in the data, and to identify possible structures of the missing values and their relation to the available information. The main goal of this contribution is to stress the importance of exploring missing values using visualization methods and to present a collection of such visualization techniques for incomplete data, all of which are implemented in the R package VIM. Providing such functionality for this widely used statistical environment, visualization of missing values, imputation and data analysis can all be done from within R without the need for additional software.

13.
Summary: Kernel estimators of conditional expectations are adapted for use in the analysis of stationary time series containing missing observations. Estimators of conditional expectations at fixed points are shown to have an asymptotic distribution with a relatively simple variance-covariance structure. The kernel method is also used to interpolate missing observations, and is shown to converge in probability to the least squares predictor. The results are established under the strong mixing condition and moment conditions, and the methods are applied to a real data set.
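As a rough illustration of a kernel estimator of a conditional expectation on a series with gaps, the sketch below computes a Nadaraya-Watson estimate of E[X_{t+1} | X_t = x] using only pairs in which both values are observed. Skipping incomplete pairs is an assumed, simple way to handle the gaps and is not necessarily the adaptation used in the paper; the Gaussian kernel, the `bandwidth` parameter, and the name `kernel_conditional_mean` are likewise illustrative choices.

```python
import math


def kernel_conditional_mean(series, x, bandwidth=1.0):
    """Nadaraya-Watson estimate of E[X_{t+1} | X_t = x] (hypothetical helper).

    `series` is a list in which missing observations are marked as None.
    """
    num = 0.0
    den = 0.0
    for prev, nxt in zip(series, series[1:]):
        if prev is None or nxt is None:
            continue  # use only pairs where both values are observed
        # Gaussian kernel weight: pairs whose X_t is close to x count more.
        weight = math.exp(-0.5 * ((x - prev) / bandwidth) ** 2)
        num += weight * nxt
        den += weight
    if den == 0.0:
        raise ValueError("no usable pairs near x")
    return num / den
```

A smaller `bandwidth` makes the estimate depend almost entirely on pairs whose first value is very close to `x`, at the cost of higher variance when such pairs are scarce.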

14.
Selecting relevant features to make a decision and expressing the relationships between these features is not a simple task. The decision maker must precisely define the alternatives and criteria which are more important for the decision making process. The Analytic Hierarchy Process (AHP) uses hierarchical structures to facilitate this process. The comparison is realized using pairwise matrices, which are filled in according to the decision maker's judgments. Subsequently, matrix consistency is tested and priorities are obtained by calculating the matrix principal eigenvector. Given an incomplete pairwise matrix, two procedures must be performed: first, it must be completed with suitable values for the missing entries and, second, the matrix must be improved until a satisfactory level of consistency is reached. Several methods are used to fill in missing entries for incomplete pairwise matrices with correct comparison values. Additionally, once pairwise matrices are complete and if comparison judgments between pairs are not consistent, some methods must be used to improve the matrix consistency and, therefore, to obtain coherent results. In this paper a model based on the Multi-Layer Perceptron (MLP) neural network is presented. Given an AHP pairwise matrix, this model is capable of completing missing values and improving the matrix consistency at the same time.
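The two AHP steps named above, obtaining priorities from the principal eigenvector and testing consistency, can be sketched for a complete pairwise matrix as follows. This is the conventional eigenvector calculation, not the MLP model the paper proposes. The power-iteration approach, the function names, and the use of Saaty's commonly quoted random-index table are assumptions added for illustration.

```python
# Saaty's commonly quoted random consistency index values, keyed by matrix size.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix, iterations=100):
    """Return (priority vector, estimate of lambda_max) by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        mw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(mw)  # w sums to 1, so sum(A w) estimates lambda_max
        w = [x / lam for x in mw]
    return w, lam


def consistency_ratio(matrix):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lam = ahp_priorities(matrix)
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

A perfectly consistent matrix, with entries a_ij = w_i / w_j, gives lambda_max equal to n and a ratio of zero. A ratio above roughly 0.1 is conventionally read as a sign that the judgments should be revised, although that threshold is a rule of thumb rather than something stated in the abstract.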

15.
Multi-step prediction is still an open challenge in time series prediction. Moreover, practical observations are often incomplete because of sensor failure or outliers causing missing data. Therefore, it is very important to carry out research on multi-step prediction of time series with random missing data. Based on nonlinear filters and multilayer perceptron artificial neural networks (ANNs), a novel approach for multi-step prediction of time series with random missing data is proposed in this study. Starting from the original nonlinear filters, which do not consider missing data, we first obtain generalized nonlinear filters by using a sequence of independent Bernoulli random variables to model random interruptions. Then the multi-step prediction model of time series with random missing data, which is suitable for online training with the generalized nonlinear filters, is established by using the ANN’s weights to represent the state vector and the ANN’s outputs to represent the observation equation. The ANN model based on the original nonlinear filters and the ANN model based on the generalized nonlinear filters are compared on multi-step prediction of time series with missing data. Numerical results have demonstrated that the generalized nonlinear filters based ANN are proportionally superior to the original nonlinear filters based ANN for multi-step prediction of time series with missing data.

16.
Many modern approaches to time series analysis belong to the class of methods based on approximating high‐dimensional spaces by low‐dimensional subspaces. A typical method would embed a given time series into a structured matrix and find a low‐dimensional approximation to this structured matrix. The purpose of this paper is twofold: (i) to establish a correspondence between a class of SVD‐compatible matrix norms on the space of Hankel matrices and weighted vector norms (and provide methods to construct this correspondence) and (ii) to motivate the importance of this for problems in time series analysis. Examples are provided to demonstrate the merits of judiciously selecting weights on imputing missing data and forecasting in time series. Copyright © 2016 John Wiley & Sons, Ltd.

17.
Multiple imputation (MI) methods have been widely applied in economic applications as a robust statistical way to incorporate data where some observations have missing values for some variables. However, in stochastic frontier analysis (SFA), application of these techniques has been sparse and the case for such models has not received attention in the appropriate academic literature. This paper fills this gap and explores the robust properties of MI within the stochastic frontier context. From a methodological perspective, we depart from the standard MI literature by demonstrating, conceptually and through simulation, that it is not appropriate to use imputations of the dependent variable within the SFA modelling, although they can be useful to predict the values of missing explanatory variables. Fundamentally, this is because efficiency analysis involves decomposing a residual into noise and inefficiency and as a result any imputation of a dependent variable would be imputing efficiency based on some concept of average inefficiency in the sample. A further contribution, which we discuss and illustrate for the first time in the SFA literature, is that using auxiliary variables (outside of those contained in the SFA model) can enhance the imputations of missing values. Our empirical example neatly articulates that often the source of missing data is only a sub-set of components comprising a part of a composite (or complex) measure and that the other parts that are observed are very useful in predicting the value.

18.
A first systematic attempt to use data containing missing values in data envelopment analysis (DEA) is presented. It is formally shown that allowing missing values into the data set can only improve estimation of the best-practice frontier. Technically, DEA can automatically exclude the missing data from the analysis if blank data entries are coded by appropriate numerical values.

19.
This article presents new computational techniques for multivariate longitudinal or clustered data with missing values. Current methodology for linear mixed-effects models can accommodate imbalance or missing data in a single response variable, but it cannot handle missing values in multiple responses or additional covariates. Applying a multivariate extension of a popular linear mixed-effects model, we create multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure. We also derive and implement a new EM algorithm for parameter estimation which converges more rapidly than traditional EM algorithms because it does not treat the random effects as “missing data,” but integrates them out of the likelihood function analytically. These techniques are illustrated on models for adolescent alcohol use in a large school-based prevention trial.

20.
A clustering method is presented for analysing multivariate binary data with missing values. When not all values are observed, Govaert [3] has studied the relations between clustering methods and statistical models. The author has shown how the identification of a mixture of Bernoulli distributions with the same parameter for all clusters and for all variables corresponds to a clustering criterion which uses L1 distance characterizing the MNDBIN method (Marchetti [8]). He first generalized this model by selecting parameters which can depend on variables and finally by selecting parameters which can depend both on variables and on clusters. We use the previous models to derive a clustering method adapted to missing data. This method optimizes a criterion by a standard iterative partitioning algorithm which removes the necessity either to ignore objects or to substitute the missing data. We study several versions of this algorithm and, finally, a brief account is given of the application of this method to some simulated data.
