Found 20 similar articles; search time: 15 ms
1.
Annals of the Institute of Statistical Mathematics - In this paper, we propose improved statistical inference and variable selection methods for generalized linear models based on empirical...
2.
With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing these data. It simultaneously performs clustering and variable selection. The method is inspired by the plaid model. It may be seen as a multiplicative mixture model that allows for overlapping clustering. Unlike conventional clustering, within this model an observation may be explained by several clusters. This characteristic makes it especially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology, in terms of both variable selection and clustering. An application of our approach to the gene expression data of kidney renal cell carcinoma taken from The Cancer Genome Atlas validates some previously identified cancer biomarkers.
3.
Extremes - The distributed Hill estimator is a divide-and-conquer algorithm for estimating the extreme value index when data are stored in multiple machines. In applications, estimates based on the...
4.
Sándor Bozóki Linda Dezső Attila Poesz József Temesi 《Annals of Operations Research》2013,211(1):511-528
Pairwise comparison (PC) matrices are used in multi-attribute decision problems (MADM) in order to express the preferences of the decision maker. Our research focused on testing various characteristics of PC matrices. In a controlled experiment with university students (N=227) we have obtained 454 PC matrices. The cases have been divided into 18 subgroups according to the key factors to be analyzed. Our team conducted experiments with matrices of different sizes arising from different types of MADM problems. Additionally, the matrix elements have been obtained by different questioning procedures differing in the order of the questions. Results are organized to answer five research questions. Three of them are directly connected to the inconsistency of a PC matrix. Various types of inconsistency indices have been applied. We have found that the type of the problem and the size of the matrix had an impact on the inconsistency of the PC matrix. However, we have not found any impact of the questioning order. Incomplete PC matrices played an important role in our research. The decision makers' behavioral consistency was also analyzed in the case of incomplete matrices, using indicators measuring the deviation from the final order of alternatives and from the final score vector.
5.
In this paper, we propose a robust empirical likelihood (REL) inference for the parametric component in a generalized partial linear model (GPLM) with longitudinal data. We make use of bounded scores and leverage-based weights in the auxiliary random vectors to achieve robustness against outliers in both the response and covariates. Simulation studies demonstrate the good performance of our proposed REL method, which is more accurate and efficient than the robust generalized estimating equation (GEE) method (X. He, W.K. Fung, Z.Y. Zhu, Robust estimation in generalized partial linear models for clustered data, Journal of the American Statistical Association 100 (2005) 1176-1184). The proposed robust method is also illustrated by analyzing a real data set.
6.
Empirical likelihood inference for censored median regression with weighted empirical hazard functions (total citations: 1, self-citations: 0, citations by others: 1)
In recent years, median regression models have been shown to be useful for analyzing a variety of censored survival data in clinical trials. For inference on the regression parameter, there have been a variety of semiparametric procedures. However, the accuracy of such procedures in terms of coverage probability can be quite low when censoring is heavy. In this paper, based on weighted empirical hazard functions, we apply an empirical likelihood (EL) ratio method to the median regression model with censored data and derive the limiting distribution of the EL ratio. Confidence regions for the regression parameter can then be obtained accordingly. Furthermore, we compared the proposed method with the standard method through extensive simulation studies. The proposed method almost always outperformed the existing method.
7.
Dirk Temme 《Computational Statistics》2006,21(1):151-182
Summary: Graphical methods for the discovery of structural models from observational data provide interesting tools for applied researchers. A problem often faced in empirical studies is the presence of latent confounders which produce associations between the observed variables. Although causal inference algorithms exist which can cope with latent confounders, empirical applications assessing the performance of such algorithms are largely lacking. In this study, we apply the constraint-based Fast Causal Inference algorithm implemented in the software program TETRAD to a data set containing strategy and performance information about 608 business units. In contrast to the informative and reasonable results for the empirical data, simulation findings reveal problems in recovering some of the structural relations.
8.
《Mathematical and Computer Modelling》1995,21(7):29-42
Two models for the dynamics of an epidemic of S-I-R type are described in which the active population is randomly screened. Infectivity is not required to be constant in one of them. The positive screened individuals move into the class of “removed” together with the immune. Global existence and uniqueness results are established.
9.
In this paper, we consider the semiparametric regression model for longitudinal data. Due to the correlation within groups, a generalized empirical log-likelihood ratio statistic for the unknown parameters in the model is suggested by introducing the working covariance matrix. It is proved that the proposed statistic is asymptotically standard chi-squared under some suitable conditions, and hence it can be used to construct the confidence regions of the parameters. A simulation study is conducted to compare the proposed method with the generalized least squares method in terms of coverage accuracy and average lengths of the confidence intervals.
10.
Julien Villemonteix Emmanuel Vazquez Maryan Sidorkiewicz Eric Walter 《Journal of Global Optimization》2009,43(2-3):373-389
In many global optimization problems motivated by engineering applications, the number of function evaluations is severely limited by time or cost. To ensure that each of these evaluations usefully contributes to the localization of good candidates for the role of global minimizer, a stochastic model of the function can be built to conduct a sequential choice of evaluation points. Based on Gaussian processes and Kriging, the authors have recently introduced the informational approach to global optimization (IAGO) which provides a one-step optimal choice of evaluation points in terms of reduction of uncertainty on the location of the minimizers. To do so, the probability density of the minimizers is approximated using conditional simulations of the Gaussian process model behind Kriging. In this paper, an empirical comparison between the underlying sampling criterion called conditional minimizer entropy (CME) and the standard expected improvement sampling criterion (EI) is presented. Classical test functions are used as well as sample paths of the Gaussian model and an industrial application. They show the interest of the CME sampling criterion in terms of evaluation savings.
11.
12.
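On the statistical analysis of high-dimensional, dependent, and incomplete data (total citations: 4, self-citations: 0, citations by others: 4)
This paper consists of three parts. The first part briefly describes the development of statistics and the challenges it faces, and explains that the statistical analysis of high-dimensional, dependent, and incomplete data is a difficult problem that arises widely in modern science and technology and in the social economy. The second part surveys the results obtained by Chinese scholars in related areas. The last part offers a personal view of current research trends in this field.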
13.
This paper proposes an estimator combining empirical likelihood (EL) and the generalized method of moments (GMM) by allowing the sample average moment vector to deviate from zero and the sample weights to deviate from n^{-1}. The new estimator may be adjusted through a free parameter δ ∈ (0,1), with GMM behavior attained as δ → 0 and EL behavior as δ → 1. When the sample size is small and the number of moment conditions is large, the parameter space under which the EL estimator is defined may be restricted at or near the population parameter value. The support of the parameter space for the new estimator may be adjusted through δ. The new estimator performs well in Monte Carlo simulations.
14.
R. A. Bandaliev 《Mathematical Notes》2008,84(3-4):303-313
The main goal in this paper is to obtain an analog of the generalized Minkowski inequality and an embedding between the Lebesgue spaces with mixed norm and with variable summability exponent.
15.
16.
Variational Bayesian methods aim to address some of the weaknesses (computation time, storage costs and convergence monitoring) of mainstream Markov chain Monte Carlo based inference at the cost of a biased but more tractable approximation to the posterior distribution. We investigate the performance of variational approximations in the context of the mixed logit model, which is one of the most widely used models for discrete choice data. A typical treatment using the variational Bayesian methodology is hindered by the fact that the expectation of the so-called log-sum-exponential function has no explicit expression. Therefore, additional approximations are required to maintain tractability. In this paper we compare seven different possible bounds or approximations. We found that quadratic bounds are not sufficiently accurate. A recently proposed non-quadratic bound did perform well. We also found that the Taylor series approximation used in a previous study of variational Bayes for mixed logit models is only accurate for specific settings. Our proposed approximation based on quasi Monte Carlo sampling performed consistently well across all simulation settings while remaining computationally tractable.
17.
18.
《European Journal of Operational Research》2005,164(3):760-777
We consider the multiattribute design problem (MADP) which contains a considerable number of alternatives, resulting from the combination of a limited number of discrete levels of several quantitative and/or qualitative attributes. In order to solve such problems, the preferences of individual decision makers have to be measured. Though a considerable number of methods are available from different research areas, only a subset is applicable to MADP. In this paper, we report on an empirical study which considered the problem of designing a university and involved more than 300 respondents. Because of this large-scale design, we performed a paper-and-pencil investigation and selected methods which could concisely be applied in such a setting: the analytic hierarchy process (AHP) and the conjoint analysis (CA). The results show that both methods give useful models of the respondents' preferences. However, inspecting the utility functions determined in detail reveals considerable discrepancies between them. Most of the measures used for comparison indicate AHP to be the better choice for the special decision situation considered. In order to get a more general recommendation, we categorize different types of MADP and discuss the applicability of AHP and CA.
19.
Many researchers see the need for reject inference in credit scoring models to come from a sample selection problem whereby a missing variable results in omitted variable bias. Alternatively, practitioners often see the problem as one of missing data where the relationship in the new model is biased because the behaviour of the omitted cases differs from that of those who make up the sample for a new model. To attempt to correct for this, differential weights are applied to the new cases. The aim of this paper is to see if the use of both a Heckman style sample selection model and the use of sampling weights, together, will improve predictive performance compared with either technique used alone. This paper will use a sample of applicants in which virtually every applicant was accepted. This allows us to compare the actual performance of each model with the performance of models which are based only on accepted cases.
20.
We give conditions on H, a continuous and bounded real function in R^3, to obtain at least two solutions for the problem (Dir) below. H can be far from being constant in the sense of [9]. Our motivation is a better understanding of the Plateau problem for the prescribed mean curvature equation.