Similar Documents
20 similar documents found.
1.
Compositional data have very complex mathematical properties, and many traditional statistical methods fail when applied to them, so their analysis requires special treatment and dedicated techniques. This paper focuses on how to compute correlation coefficients for compositional data. Ordinary correlation coefficients apply only to two sets of univariate data, and classical canonical correlation analysis cannot be used directly because of the special properties of compositional data. Combining the logratio transformation with canonical correlation analysis, we propose a method for computing correlation coefficients tailored to compositional data, which resolves this problem.
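A minimal sketch of the logratio-plus-CCA idea (our Python illustration, not the authors' exact procedure): apply an additive logratio (alr) transform to two compositional blocks to break the unit-sum constraint, then run ordinary canonical correlation analysis on the transformed data.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

def alr(X):
    """Additive logratio transform: log of each part relative to the last
    part; yields full-rank data, unlike the raw unit-sum composition."""
    return np.log(X[:, :-1] / X[:, -1:])

# Two synthetic compositional datasets (each row sums to 1).
X = rng.dirichlet(alpha=[2.0, 3.0, 5.0], size=100)
Y = rng.dirichlet(alpha=[1.0, 4.0, 2.0, 3.0], size=100)

cca = CCA(n_components=1)
U, V = cca.fit_transform(alr(X), alr(Y))
r = np.corrcoef(U[:, 0], V[:, 0])[0, 1]
print(f"first canonical correlation after the logratio transform: {r:.3f}")
```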

2.
This paper presents an overview of methods for the analysis of data structured in blocks of variables or in groups of individuals. More specifically, regularized generalized canonical correlation analysis (RGCCA), a unifying approach for multiblock data analysis, is extended so that it also serves as a unifying tool for multigroup data analysis. The versatility and usefulness of our approach are illustrated on two real datasets.
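For reference, the RGCCA optimization problem is commonly stated (following Tenenhaus and Tenenhaus, 2011; the notation here is a standard paraphrase, not taken from this abstract) as

```latex
\max_{a_1,\dots,a_J}\;\sum_{j \neq k} c_{jk}\, g\bigl(\operatorname{Cov}(X_j a_j,\; X_k a_k)\bigr)
\quad\text{s.t.}\quad \tau_j \lVert a_j \rVert^2 + (1-\tau_j)\operatorname{Var}(X_j a_j) = 1,
\quad j = 1,\dots,J,
```

where the design coefficients $c_{jk}$ encode which blocks are connected, $g$ is a convex scheme function, and the shrinkage parameters $\tau_j \in [0,1]$ provide the regularization that gives the method its name.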

3.
The problem of missing values is common in statistical analysis. One approach to dealing with missing values is to delete the incomplete cases from the data set, but this may discard valuable information, especially in small samples. An alternative is to reconstruct the missing values from the information in the data set. The major purpose of this paper is to investigate how a neural network approach performs compared to statistical techniques for reconstructing missing values. The backpropagation algorithm is used as the learning method, and its results are compared with two standard methods for computing missing values: (1) substituting variable averages and (2) iterative regression analysis. Experimental results show that backpropagation consistently outperforms the other methods on both the training and the test data sets, suggesting that the neural network approach is a useful tool for reconstructing missing values in multivariate analysis.
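A minimal sketch (our Python illustration, not the paper's experiment) of the comparison being described: impute a variable's missing entries once with the observed average and once with a small backpropagation-trained network that predicts it from the fully observed variables.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))                      # fully observed predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

missing = rng.random(n) < 0.2                    # 20% of y is missing
y_obs = y[~missing]

# Method 1: impute with the observed average.
mean_imputed = np.full(missing.sum(), y_obs.mean())

# Method 2: train a small network on complete cases, predict the missing ones.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X[~missing], y_obs)
net_imputed = net.predict(X[missing])

for name, imp in [("mean", mean_imputed), ("backprop", net_imputed)]:
    rmse = np.sqrt(np.mean((imp - y[missing]) ** 2))
    print(f"{name} imputation RMSE: {rmse:.3f}")
```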

4.
Distribution of the canonical correlation matrix
The generalized canonical correlation matrix is associated with canonical correlation analysis, multivariate analysis of variance, and a large variety of statistical tests and regression problems. In this paper two methods of deriving the distribution are given, and the exact distribution is given in an elegant form. The techniques of derivation are applicable to all versions of the generalized canonical correlation matrices and to nonnull distributions in generalized analysis of variance problems, and they also give rise to a simpler derivation of the distribution of the multiple correlation coefficient.

5.
Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
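A minimal sketch (our Python illustration; the decomposition into ordered blocks and the outer iteration are omitted) of the monotone ingredient: with a monotone missingness pattern, variables can be imputed sequentially, each drawn from a regression on the variables with fewer missing values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def impute_monotone(df, seed=0):
    """Sequentially impute a monotone missingness pattern: visit columns in
    order of increasing missingness and draw each column's missing entries
    from a regression on the already-completed earlier columns."""
    df = df.copy()
    rng = np.random.default_rng(seed)
    order = df.isna().sum().sort_values().index
    for j, col in enumerate(order):
        miss = df[col].isna()
        if not miss.any():
            continue
        preds = list(order[:j])
        if not preds:                            # no earlier columns: draw from the marginal
            obs = df.loc[~miss, col]
            df.loc[miss, col] = rng.normal(obs.mean(), obs.std(), miss.sum())
            continue
        model = LinearRegression().fit(df.loc[~miss, preds], df.loc[~miss, col])
        resid = df.loc[~miss, col] - model.predict(df.loc[~miss, preds])
        draw = model.predict(df.loc[miss, preds]) + rng.normal(0.0, resid.std(), miss.sum())
        df.loc[miss, col] = draw
    return df
```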

6.
The missing-data mechanism often depends on the values of the responses, which leads to nonignorable nonresponse. In such a situation, inference based on approaches that ignore the missing-data mechanism may not be valid, so a crucial step is to model the nature of the missingness. We specify a parametric model for the missingness mechanism and then propose a conditional score function approach for estimation. This approach imputes the score function by taking the conditional expectation of the score function for the missing data given the available information. Inference then proceeds by replacing unknown terms with the corresponding nonparametric estimators based on the observed data. The proposed score function does not suffer from the non-identifiability problem, and the proposed estimator is shown to be consistent and asymptotically normal. We also construct a confidence region for the parameter of interest using the empirical likelihood method. Simulation studies demonstrate that the proposed inference procedure performs well in many settings. We apply the proposed method to data from a growth hormone and exercise intervention study.
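Schematically (our symbols, mirroring the abstract's description rather than the authors' exact notation), the approach replaces the intractable full-data score with its conditional expectation given what was observed, and solves the resulting estimating equation:

```latex
\tilde{S}_i(\theta) \;=\; \mathbb{E}\bigl[\,S_i(\theta)\;\big|\;\text{observed data for subject } i\,\bigr],
\qquad
\sum_{i=1}^{n} \tilde{S}_i(\hat{\theta}) \;=\; 0,
```

with the unknown conditional expectations replaced by nonparametric estimators computed from the observed data.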

7.
This article presents new computational techniques for multivariate longitudinal or clustered data with missing values. Current methodology for linear mixed-effects models can accommodate imbalance or missing data in a single response variable, but it cannot handle missing values in multiple responses or additional covariates. Applying a multivariate extension of a popular linear mixed-effects model, we create multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure. We also derive and implement a new EM algorithm for parameter estimation which converges more rapidly than traditional EM algorithms because it does not treat the random effects as “missing data,” but integrates them out of the likelihood function analytically. These techniques are illustrated on models for adolescent alcohol use in a large school-based prevention trial.
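As a point of reference (our notation; the abstract does not restate the model), a multivariate linear mixed-effects model of the kind described can be written as

```latex
Y_i \;=\; X_i\beta \;+\; Z_i b_i \;+\; \varepsilon_i,
\qquad b_i \sim N(0,\Psi),
\qquad \text{rows of } \varepsilon_i \ \text{i.i.d.}\ N(0,\Sigma),
```

and the speedup described comes from maximizing $L(\beta,\Psi,\Sigma)=\prod_i \int f(Y_i \mid b_i)\,f(b_i)\,db_i$ directly, with the random effects $b_i$ integrated out analytically rather than treated as missing data in the E-step.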

8.
The dynamics of a system of hard spheres with inelastic collisions is investigated. This system is a model for granular flow. The map induced by a shift along the trajectory does not preserve the volume of the phase space, and the corresponding Jacobian is different from one. A special distribution function is defined as the product of the usual distribution function and the squared Jacobian. For this distribution function, the Liouville equation with boundary condition is derived. A sequence of correlation functions is defined for the canonical and grand canonical ensembles. The generalized BBGKY hierarchy and boundary condition are deduced for the correlation functions. (Published in Ukrains'kyi Matematychnyi Zhurnal, Vol. 57, No. 6, pp. 818–839, June 2005.)

9.
Typically, exact information about the whole subdifferential is not available for intrinsically nonsmooth objective functions such as marginal functions. The semismoothness of the objective function therefore cannot be proved, or is even violated, and in these cases standard nonsmooth methods cannot be used. In this paper, we propose a new approach to developing a converging descent method for this class of nonsmooth functions, based on the continuous outer subdifferentials that we introduce. On this basis we formulate a conceptual optimization algorithm and prove its global convergence, which leads to a constructive approach for creating a converging descent method. Within the algorithmic framework, neither semismoothness nor the calculation of exact subgradients is required, in contrast to other approaches, which are usually based on the assumption of semismoothness of the objective function.

10.
An approach to dealing with missing data, both during the design and during normal operation of a neuro-fuzzy classifier, is presented in this paper. Missing values are processed within a general fuzzy min–max neural network architecture utilising hyperbox fuzzy sets as input data cluster prototypes. An emphasis is put on ways of quantifying the uncertainty which missing data might have caused. This takes the form of a classification procedure whose primary objective is to reduce the number of viable alternatives rather than to produce one winning class without supporting evidence. Ways of selecting the most probable class among the viable alternatives found during the primary classification step, based on data frequency information, are also proposed for cases where a single prediction is required. The reliability of the classification and the completeness of information are communicated by producing upper and lower classification membership values, similar in essence to the plausibility and belief measures of evidence theory or the possibility and necessity values of fuzzy set theory. Similarities and differences between the proposed method and various fuzzy, neuro-fuzzy and probabilistic algorithms are also discussed. A number of simulation results for well-known data sets are provided in order to illustrate the properties and performance of the proposed approach.
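A minimal sketch (our Python illustration, not the paper's full min–max architecture) of the interval idea: when a dimension of the input is missing, treat it as fully compatible with the hyperbox for the upper membership bound and fully incompatible for the lower bound, so the classifier reports an interval of membership rather than a single degree.

```python
import numpy as np

def hyperbox_membership(x, v_min, v_max, gamma=4.0):
    """Fuzzy membership of point x in the hyperbox [v_min, v_max] with
    min-aggregation over dimensions; NaN entries of x mark missing values."""
    below = np.maximum(0.0, v_min - x)           # distance below the box
    above = np.maximum(0.0, x - v_max)           # distance above the box
    per_dim = 1.0 - np.minimum(1.0, gamma * (below + above))
    upper = np.where(np.isnan(x), 1.0, per_dim)  # missing dim: best case
    lower = np.where(np.isnan(x), 0.0, per_dim)  # missing dim: worst case
    return lower.min(), upper.min()

lo_m, up_m = hyperbox_membership(np.array([0.45, np.nan]),
                                 v_min=np.array([0.2, 0.5]),
                                 v_max=np.array([0.4, 0.7]))
print(f"membership interval: [{lo_m:.2f}, {up_m:.2f}]")
```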

11.
The nonparametric technique of Data Envelopment Analysis (DEA) has been used to measure technical efficiency. This approach has proven useful because, unlike regression analyses, it allows multiple outputs and does not require a priori functional form specification. DEA does, however, require correct model specification; inclusion of inappropriate variables or omission of relevant variables leads to distortions. The purpose of this paper is to develop an alternative methodology based on canonical correlation to measure technical efficiency for multiple output production correspondences. Using simulated data, the new methodology is compared with DEA. The results indicate that the canonical regression approach outperforms DEA in most cases.

12.
Robust techniques for multivariate statistical methods such as principal component analysis, canonical correlation analysis, and factor analysis have recently been constructed. In contrast to the classical approach, these robust techniques are able to resist the effect of outliers. However, there does not yet exist a graphical tool for identifying, in a comprehensive way, the data points that do not obey the model assumptions. Our goal is to construct such graphics based on empirical influence functions. These graphics not only detect the influential points but also classify the observations according to their robust distances. In this way the observations are divided into four classes: regular points, nonoutlying influential points, influential outliers, and noninfluential outliers. We thus gain additional insight into the data by detecting different types of deviating observations. Some real data examples are given to show how these plots can be used in practice.

13.
Mixed-integer quadratic programming
This paper considers mixed-integer quadratic programs in which the objective function is quadratic in both the integer and the continuous variables, and the constraints are linear in the variables of both types. Generalized Benders' decomposition is a suitable approach for solving such programs. However, the program does not become more tractable under this method, since the Benders' cuts are quadratic in the integer variables. A new equivalent formulation that renders the program tractable is developed, under which the dual objective function is linear in the integer variables and the dual constraint set is independent of these variables. Benders' cuts derived from the new formulation are linear in the integer variables, and the original problem is decomposed into a series of integer linear master problems and standard quadratic subproblems. The new formulation does not introduce new primary variables or new constraints into the computational steps of the decomposition algorithm.
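In generic form (our notation, not the paper's), the class of programs considered is

```latex
\min_{x \in \mathbb{R}^n,\; y \in \mathbb{Z}^m}\quad
x^{\top} Q x \;+\; y^{\top} R y \;+\; x^{\top} S y \;+\; c^{\top} x \;+\; d^{\top} y
\qquad\text{s.t.}\qquad A x + B y \le b,
```

quadratic in both variable types with linear constraints; the point of the new formulation is that the Benders' cuts it produces are linear in the integer variables $y$.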

14.
In data analysis problems where the data are represented by vectors of real numbers, it is often the case that some of the data-points will have “missing values”, meaning that one or more of the entries of the vector that describes the data-point is not observed. In this paper, we propose a new approach to the imputation of missing binary values. The technique we introduce employs a “similarity measure” introduced by Anthony and Hammer (2006). We compare experimentally the performance of our technique with ones based on the usual Hamming distance measure and multiple imputation.
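A minimal sketch (our Python illustration of the Hamming-distance baseline the paper compares against, not the Anthony–Hammer similarity measure): fill a missing binary entry with the value held by the nearest complete row under Hamming distance on the commonly observed columns.

```python
import numpy as np

def impute_binary(data, row, col):
    """Fill data[row, col] with the value held by the Hamming-nearest
    complete row, compared on the columns observed in `row`."""
    x = data[row]
    observed = ~np.isnan(x)
    observed[col] = False                        # never compare on the target column
    complete = ~np.isnan(data[:, col]) & (np.arange(len(data)) != row)
    dists = (data[complete][:, observed] != x[observed]).sum(axis=1)
    nearest = data[complete][np.argmin(dists), col]
    return int(nearest)

data = np.array([[1, 0, 1, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 0],
                 [1, 0, np.nan, 1]])
print("imputed value:", impute_binary(data, row=3, col=2))
```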

15.
Correspondence analysis, a data analytic technique used to study two‐way cross‐classifications, is applied to social relational data. Such data are frequently termed “sociometric” or “network” data. The method allows one to model forms of relational data and types of empirical relationships not easily analyzed using either standard social network methods or common scaling or clustering techniques. In particular, correspondence analysis allows one to model:

—two‐mode networks (rows and columns of a sociomatrix refer to different objects)

—valued relations (e.g. counts, ratings, or frequencies).

In general, the technique provides scale values for row and column units, visual presentation of relationships among rows and columns, and criteria for assessing “dimensionality” or graphical complexity of the data and goodness‐of‐fit to particular models. Correspondence analysis has recently been the subject of research by Goodman, Haberman, and Gilula, who have termed their approach to the problem “canonical analysis” to reflect its similarity to canonical correlation analysis of continuous multivariate data. This generalization links the technique to more standard categorical data analysis models, and provides a much‐needed statistical justification.

We review both correspondence and canonical analysis, and present these ideas by analyzing relational data on the 1980 monetary donations from corporations to nonprofit organizations in the Minneapolis-St. Paul metropolitan area. We also show how these techniques are related to dyadic independence models, first introduced by Holland, Leinhardt, Fienberg, and Wasserman in the early 1980s. The highlight of this paper is the relationship between correspondence and canonical analysis and these dyadic independence models, which are designed specifically for relational data. The paper concludes with a discussion of this relationship, and some data analyses that illustrate the fact that correspondence analysis models can be used as approximate dyadic independence models.
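A minimal sketch (our Python illustration) of the computational core shared by correspondence analysis and its canonical-analysis reformulation: a singular value decomposition of the standardized residuals of a two-way table, yielding row and column scores and the principal inertias used to judge dimensionality.

```python
import numpy as np

def correspondence_analysis(N):
    """Return row and column principal coordinates of a contingency table N,
    together with the principal inertias (squared singular values)."""
    P = N / N.sum()                              # correspondence matrix
    r = P.sum(axis=1)                            # row masses
    c = P.sum(axis=0)                            # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * d) / np.sqrt(r)[:, None]            # row principal coordinates
    G = (Vt.T * d) / np.sqrt(c)[:, None]         # column principal coordinates
    return F, G, d ** 2

N = np.array([[30., 10.,  5.],
              [10., 40., 10.],
              [ 5., 10., 30.]])
F, G, inertias = correspondence_analysis(N)
print("principal inertias:", np.round(inertias, 4))
```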

16.
The available methods for handling missing values in principal component analysis provide only point estimates of the parameters (axes and components) and estimates of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values in the principal component analysis results are described. The first consists of projecting the imputed data sets onto a reference configuration as supplementary elements to assess the stability of the individuals (respectively, of the variables). The second consists of performing a principal component analysis on each imputed data set and fitting each obtained configuration onto the reference one with Procrustes rotation. The latter strategy allows one to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
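A minimal sketch (our Python illustration, with synthetic stand-ins for the imputed data sets) of the first visualization strategy: project each imputed data set onto a reference PCA configuration as supplementary points and measure how much each individual's position varies.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
reference = rng.normal(size=(50, 4))             # stand-in for a reference imputation
pca = PCA(n_components=2).fit(reference)
ref_coords = pca.transform(reference)

spread = []
for _ in range(5):                               # stand-in for 5 imputed data sets
    imputed = reference + rng.normal(scale=0.1, size=reference.shape)
    coords = pca.transform(imputed)              # supplementary projection
    spread.append(np.linalg.norm(coords - ref_coords, axis=1))

# Per-individual spread across imputations: large values flag individuals
# whose positions are unstable because of the missing values.
print("mean positional spread:", np.mean(spread).round(3))
```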

17.
A novel interval set approach is proposed in this paper to induce classification rules from an incomplete information table, in which an interval-set-based model for representing uncertain concepts is presented. The extensions of the concepts in the incomplete information table are represented by interval sets, which set the upper and lower bounds of the uncertain concepts. Interval set operations are discussed, and the connectives of concepts are represented by operations on interval sets. Certain inclusion, possible inclusion, and weak inclusion relations between interval sets are presented and used to induce strong rules and weak rules from the incomplete information table, and the related properties of the inclusion relations are proved. It is concluded that strong rules hold whatever the missing values may be, while weak rules may hold when the missing values are replaced by certain known values. Moreover, a confidence function is defined to evaluate the weak rules. The proposed approach presents a new view of rule induction from incomplete data based on interval sets.

18.
This research attempts to solve the problem of dealing with missing data via the interface of Data Envelopment Analysis (DEA) and human behavior. Missing data is under continuing discussion in various research fields, especially those highly dependent on data. In both practice and research, necessary data often cannot be obtained, owing, for example, to procedural factors or a lack of needed responses, which raises the question of how to deal with missing data. In this paper, modified DEA models are developed to estimate an appropriate value for missing data within its interval, based on DEA and the Inter-dimensional Similarity Halo Effect; the estimated value of missing data is determined by the General Impression of the original DEA efficiency. To evaluate the effectiveness of this method, an impact factor is proposed. In addition, the advantages of the proposed approach are illustrated in comparison with previous methods.

19.
In this paper, we propose a new controller design approach for a special class of nonlinear systems. The controller design is simple and systematic and is based on the construction of a similarity transformation of the kind used to find canonical forms for linear controllable systems. In addition, the design does not require any coordinate transformation and is applied directly to the original structure of the system. The performance of the proposed controller is shown via a simulation example dealing with a typical synchronous generator.
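For context, this is the standard linear construction being borrowed (textbook material, not the paper's nonlinear design): a controllable single-input pair $(A, b)$ with controllability matrix $\mathcal{C}$ admits a similarity transformation $z = Tx$, built from $\mathcal{C}^{-1}$, that brings the system to controllable canonical (companion) form:

```latex
\mathcal{C} = \begin{bmatrix} b & Ab & \cdots & A^{n-1} b \end{bmatrix},
\qquad
T A T^{-1} =
\begin{bmatrix}
0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & \cdots & -a_{n-1}
\end{bmatrix},
\qquad
T b = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix},
```

where $a_0,\dots,a_{n-1}$ are the coefficients of the characteristic polynomial of $A$.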

20.
A method is described for fitting cubic smoothing splines to samples of equally spaced data. The method is based on the canonical decomposition of the linear transformation from the data to the fitted values. Techniques for estimating the required amount of smoothing, including generalized cross-validation, can easily be integrated into the calculations. For large samples the method is fast and does not require prohibitively large data storage.
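A minimal sketch in the same spirit (our Python illustration using a discrete second-difference smoother, not the paper's cubic-spline algorithm): for equally spaced data the fitted values are $\hat{y} = (I + \lambda K)^{-1} y$ with $K = D^{\top} D$, so a single eigendecomposition of $K$ gives the fit and the generalized cross-validation score for every $\lambda$ almost for free.

```python
import numpy as np

n = 200
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + np.random.default_rng(3).normal(scale=0.2, size=n)

D = np.diff(np.eye(n), n=2, axis=0)              # second-difference operator
evals, Q = np.linalg.eigh(D.T @ D)               # canonical decomposition, done once
y_rot = Q.T @ y

def gcv(lam):
    """GCV score: n * RSS / (n - tr(hat matrix))^2, all from the eigenvalues."""
    shrink = 1.0 / (1.0 + lam * evals)           # eigenvalues of the hat matrix
    resid = y - Q @ (shrink * y_rot)
    return n * (resid @ resid) / (n - shrink.sum()) ** 2

lams = 10.0 ** np.linspace(-2, 6, 50)
best = lams[np.argmin([gcv(l) for l in lams])]
fit = Q @ (y_rot / (1.0 + best * evals))
print(f"GCV-chosen lambda: {best:.3g}")
```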
