Similar Documents
20 similar documents found.
1.
2.
Causal relationships among variables can be depicted by a causal network of these variables. We propose a local structure learning approach for discovering the direct causes and the direct effects of a given target variable. In the approach, we first find the variable set of parents, children, and maybe some descendants (PCD) of the target variable, but generally we cannot distinguish the parents from the children in the PCD of the target variable. Next, to distinguish the causes from the effects of the target variable, we find the PCD of each variable in the PCD of the target variable, and we repeat the process of finding PCDs along the paths starting from the target variable. Without constructing a whole network over all variables, we find only a local structure around the target variable. Theoretically, we show the correctness of the proposed approach under the assumptions of faithfulness and causal sufficiency, and assuming that conditional independencies are correctly checked.
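
The following is a structural sketch (not the paper's algorithm verbatim) of the local search pattern described above: starting from the target, repeatedly find the PCD set of each newly reached variable along paths from the target, without ever constructing the whole network. The `find_pcd` argument is a placeholder for the paper's statistical PCD-discovery subroutine; a toy oracle dictionary stands in for it here so the traversal itself is runnable.

```python
# Sketch of expanding PCD sets outward from a target variable.
# `find_pcd` is a hypothetical stand-in for the paper's PCD-discovery step.
from collections import deque

def local_structure(target, find_pcd, max_depth=2):
    """Breadth-first expansion of PCD sets around `target`, up to `max_depth` hops."""
    local_edges = {}
    queue, seen = deque([(target, 0)]), {target}
    while queue:
        v, depth = queue.popleft()
        pcd = find_pcd(v)
        local_edges[v] = pcd
        if depth < max_depth:
            for w in pcd:
                if w not in seen:
                    seen.add(w)
                    queue.append((w, depth + 1))
    return local_edges

# Toy oracle PCD sets for the chain A -> T -> B, with T as the target
oracle = {"T": {"A", "B"}, "A": {"T"}, "B": {"T"}}
print(local_structure("T", oracle.get))
```

Distinguishing parents from children within these PCD sets is the statistical part of the approach and is not reproduced here.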

3.
Latent trait models such as item response theory (IRT) hypothesize a functional relationship between an unobservable, or latent, variable and an observable outcome variable. In educational measurement, a discrete item response is usually the observable outcome variable, and the latent variable is associated with an examinee’s trait level (e.g., skill, proficiency). The link between the two variables is called an item response function. This function, defined by a set of item parameters, models the probability of observing a given item response, conditional on a specific trait level. Typically in a measurement setting, neither the item parameters nor the trait levels are known, and so must be estimated from the pattern of observed item responses. Although a maximum likelihood approach can be taken in estimating these parameters, it usually cannot be employed directly. Instead, a method of marginal maximum likelihood (MML) is utilized, via the expectation-maximization (EM) algorithm. Alternating between an expectation (E) step and a maximization (M) step, the EM algorithm assures that the marginal log likelihood function will not decrease after each EM cycle, and will converge to a local maximum. Interestingly, the negative of this marginal log likelihood function is equal to the relative entropy, or Kullback-Leibler divergence, between the conditional distribution of the latent variables given the observable variables and the joint likelihood of the latent and observable variables. With an unconstrained optimization for the M-step proposed here, the EM algorithm as minimization of Kullback-Leibler divergence admits the convergence results due to Csiszár and Tusnády (Statistics & Decisions, 1:205–237, 1984), a consequence of the binomial likelihood common to latent trait models with dichotomous response variables. For this unconstrained optimization, the EM algorithm converges to a global maximum of the marginal log likelihood function, yielding an information bound that permits a fixed point of reference against which models may be tested. A likelihood ratio test between marginal log likelihood functions obtained through constrained and unconstrained M-steps is provided as a means for testing models against this bound. Empirical examples demonstrate the approach.
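
A minimal, self-contained sketch of MML estimation via EM for a Rasch-type model, using a fixed quadrature grid to approximate the latent-trait prior. The simulated data, grid, and convergence tolerance are illustrative choices; the article's unconstrained M-step and the Kullback-Leibler interpretation are not reproduced here.

```python
# EM for marginal maximum likelihood in a Rasch model (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_persons, n_items = 500, 10
true_b = np.linspace(-1.5, 1.5, n_items)            # true item difficulties
theta = rng.normal(size=(n_persons, 1))             # latent traits (unknown in practice)
p = 1.0 / (1.0 + np.exp(-(theta - true_b)))         # Rasch response probabilities
X = (rng.uniform(size=p.shape) < p).astype(float)   # observed 0/1 item responses

# Quadrature grid approximating a standard-normal prior on the trait
nodes = np.linspace(-4, 4, 41)
weights = np.exp(-0.5 * nodes**2)
weights /= weights.sum()

def item_probs(b):
    """P(correct | theta = node, item) for every node x item."""
    return 1.0 / (1.0 + np.exp(-(nodes[:, None] - b[None, :])))

b = np.zeros(n_items)
for _ in range(50):
    # E-step: posterior weight of each quadrature node for each examinee
    P = item_probs(b)                                       # (nodes, items)
    loglik = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T    # (persons, nodes)
    post = np.exp(loglik) * weights
    post /= post.sum(axis=1, keepdims=True)

    # M-step: maximize the expected complete-data log likelihood over difficulties
    def neg_q(b_new):
        P = item_probs(b_new)
        return -np.sum(post.T[:, :, None] *
                       (X[None, :, :] * np.log(P)[:, None, :] +
                        (1 - X[None, :, :]) * np.log(1 - P)[:, None, :]))

    b_next = minimize(neg_q, b, method="L-BFGS-B").x
    if np.max(np.abs(b_next - b)) < 1e-4:
        b = b_next
        break
    b = b_next

print(np.round(b, 2))        # estimated difficulties
print(np.round(true_b, 2))   # true difficulties used to simulate the data
```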

4.
New Bayesian cohort models designed to resolve the identification problem in cohort analysis are proposed in this paper. First, the basic cohort model, which represents the statistical structure of time-series social survey data in terms of age, period and cohort effects, is explained. The logit cohort model for qualitative data from a binomial distribution and the normal-type cohort model for quantitative data from a normal distribution are considered as two special cases of the basic model. In order to overcome the identification problem in cohort analysis, a Bayesian approach is adopted, based on the assumption that the effect parameters change gradually. A Bayesian information criterion, ABIC, is introduced for the selection of the optimal model. This approach is so flexible that both the logit and the normal-type cohort models can be made applicable, not only to standard cohort tables but also to general cohort tables in which the width of the age groups is not equal to the interval between periods. The practical utility of the proposed models is demonstrated by analysing two data sets from the literature on cohort analysis.

5.
The selection of the branching variable can greatly affect the speed of the branch and bound solution of a mixed-integer or integer linear program. Traditional approaches to branching variable selection rely on estimating the effect of the candidate variables on the objective function. We present a new approach that relies on estimating the impact of the candidate variables on the active constraints in the current LP relaxation. We apply this method to the problem of finding the first feasible solution as quickly as possible. Empirical experiments demonstrate a significant improvement compared to a state-of-the-art commercial MIP solver.
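
The idea of scoring branching candidates by their involvement in the active (tight) constraints of the current LP relaxation, rather than by estimated objective change, can be illustrated with a toy scoring rule. The rule below is an illustrative stand-in, not the paper's exact criterion.

```python
# Score fractional candidates by their coefficients in constraints tight at x_lp.
import numpy as np

def active_constraint_score(A, b, x_lp, frac_idx, tol=1e-7):
    """Weight of each candidate's coefficients in the tight rows of A x <= b."""
    slack = b - A @ x_lp
    active = np.abs(slack) <= tol
    return {j: float(np.sum(np.abs(A[active, j]))) for j in frac_idx}

# Tiny example: candidate 2 is more entangled with the tight constraint than candidate 1
A = np.array([[1.0, 2.0, 1.0],
              [3.0, 0.0, 2.0]])
b = np.array([4.0, 6.0])
x_lp = np.array([2.0, 0.5, 0.0])     # hypothetical LP relaxation solution
print(active_constraint_score(A, b, x_lp, frac_idx=[1, 2]))
```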

6.
The lexicographically-ordered CSP (“lexicographic CSP” or “LO-CSP” for short) combines a simple representation of preferences with the feasibility constraints of ordinary CSPs. Preferences are defined by a total ordering across all assignments, such that a change in assignment to a given variable is more important than any change in assignment to any less important variable. In this paper, we show how this representation can be extended to handle conditional preferences in two ways. In the first, for each conditional preference relation, the parents have higher priority than the children in the original lexicographic ordering. In the second, the relation between parents and children need not correspond to the importance ordering of variables. In this case, by obviating the “overwhelming advantage” effect with respect to the original variables and values, the representational capacity is significantly enhanced. For problems of the first type, any of the algorithms originally devised for ordinary LO-CSPs can also be used when some of the domain orderings are dependent on assignments to “parent” variables. For problems of the second type, algorithms based on lexical orders can be used if the representation is augmented by variables and constraints that link preference orders to assignments. In addition, the branch-and-bound algorithm originally devised for ordinary LO-CSPs can be extended to handle CSPs with conditional domain orderings.
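
A small sketch of the unconditional lexicographic order itself, which both extensions build on: variables are ranked by importance and values within each variable are ranked, so complete assignments compare like words in a dictionary. Conditional preferences (where a child's value ordering depends on its parents) are omitted here.

```python
# Lexicographic preference order over complete assignments (illustrative).
def lex_key(assignment, var_order, value_order):
    """Tuple whose natural order is the lexicographic preference order
    (smaller tuple = more preferred)."""
    return tuple(value_order[v].index(assignment[v]) for v in var_order)

var_order = ["x", "y", "z"]                        # x most important, z least
value_order = {"x": ["a", "b"], "y": [1, 0], "z": [0, 1]}
s1 = {"x": "a", "y": 0, "z": 0}
s2 = {"x": "a", "y": 1, "z": 1}
best = min([s1, s2], key=lambda s: lex_key(s, var_order, value_order))
print(best)   # s2: improving the more important variable y outweighs any change to z
```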

7.
This paper provides an estimation procedure for the average treatment effect through a random coefficient dummy endogenous variable model. A leading example of the model is estimating the effect of a training program on earnings. The model is composed of two equations: an outcome equation and a decision equation. Given the linear restriction in the outcome and decision equations, Chen (1999) provided a distribution-free estimation procedure under conditionally symmetric error distributions. In this paper we extend Chen’s estimator by relaxing the linear index into a nonparametric function, which greatly reduces the risk of model misspecification. A two-step approach is proposed: the first step uses a nonparametric regression estimator for the decision variable, and the second step uses an instrumental variables approach to estimate the average treatment effect in the outcome equation. The proposed estimator is shown to be consistent and asymptotically normally distributed. Furthermore, we investigate the finite-sample performance of our estimator in a Monte Carlo study and also use our estimator to study the return to college education in different periods in China. The estimates seem more reasonable than those of other commonly used estimators.
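
A rough numerical sketch of the two-step logic: (1) estimate the treatment decision nonparametrically, (2) use the fitted decision probability as an instrument for the endogenous treatment dummy in the outcome equation. The simulated design, the Nadaraya-Watson smoother, and the bandwidth are illustrative choices, not the paper's exact estimator.

```python
# Two-step estimation: nonparametric first stage, instrumental-variable second stage.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                          # decision covariate / instrument
u = rng.normal(size=n)                          # common error causing endogeneity
d = (np.sin(z) + 0.5 * u + rng.normal(size=n) > 0).astype(float)   # treatment dummy
y = 1.0 + 2.0 * d + u + rng.normal(size=n)      # true treatment coefficient is 2

# Step 1: Nadaraya-Watson estimate of P(D = 1 | Z)
h = 0.3
K = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
p_hat = (K @ d) / K.sum(axis=1)

# Step 2: IV estimate of the treatment coefficient, using p_hat as the instrument
X = np.column_stack([np.ones(n), d])
W = np.column_stack([np.ones(n), p_hat])
beta = np.linalg.solve(W.T @ X, W.T @ y)
print(beta)    # the second entry should be close to 2, whereas plain OLS would be biased
```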

8.
The time minimising assignment problem is the problem of finding an assignment of n jobs to n facilities, one to each, which minimises the total time for completing all the jobs. The usual assumption made in these problems is that all the jobs are commenced simultaneously. In this paper two generalisations of this assumption are considered, and algorithms are presented to solve these general problems. Numerical examples are worked out illustrating the algorithms.
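
With the simultaneous-start assumption, this is the classical bottleneck assignment problem: minimise the largest single completion time. The threshold-and-match scheme below is a generic textbook approach shown for orientation; the paper's algorithms for the generalised (non-simultaneous) cases are not reproduced.

```python
# Bottleneck assignment by thresholding times and testing for a perfect matching.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def bottleneck_assignment(T):
    """Smallest t such that a perfect job-facility matching uses only times <= t."""
    for t in np.sort(np.unique(T)):
        graph = csr_matrix((T <= t).astype(int))
        match = maximum_bipartite_matching(graph, perm_type="column")
        if (match >= 0).all():          # every job matched to some facility
            return t, match
    raise ValueError("no feasible assignment")

T = np.array([[5, 9, 1],
              [10, 3, 2],
              [8, 7, 4]])
print(bottleneck_assignment(T))   # minimal completion time 5; jobs 0,1,2 -> facilities 0,1,2
```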

9.
We extend the least angle regression algorithm using the information geometry of dually flat spaces. The extended least angle regression algorithm is used for estimating parameters in generalized linear regression, and it can also be used for selecting explanatory variables. We use the fact that a model manifold of an exponential family is a dually flat space. In estimating parameters, curves corresponding to bisectors in the Euclidean space play an important role. Originally, the least angle regression algorithm was used for estimating parameters and selecting explanatory variables in linear regression. It is an efficient algorithm in the sense that the number of iterations is the same as the number of explanatory variables. We extend the algorithm while keeping this efficiency. However, the extended least angle regression algorithm differs significantly from the original algorithm: it removes one explanatory variable in each iteration, whereas the original algorithm adds one explanatory variable in each iteration. We show results of the extended least angle regression algorithm for two types of datasets, illustrating its behavior; in particular, parameter estimates shrink progressively and vanish in turn.

10.
This article presents a Markov chain Monte Carlo algorithm for both variable and covariance selection in the context of logistic mixed effects models. This algorithm allows us to sample solely from standard densities with no additional tuning. We apply a stochastic search variable selection approach to select explanatory variables as well as to determine the structure of the random effects covariance matrix. Prior determination of explanatory variables and random effects is not a prerequisite, because the final structure is chosen in a data-driven manner in the course of the modeling procedure. To illustrate the method, we give two bank data examples.

11.
The focal problem for centralized multisensor multitarget tracking is the data association problem of partitioning the observations into tracks and false alarms so that an accurate estimate of the true tracks can be recovered. Large classes of these association problems can be formulated as multidimensional assignment problems, which are known to be NP-hard for three dimensions or more. The assignment problems that result from tracking are large scale, sparse and noisy. Solution methods must execute in real-time. The Greedy Randomized Adaptive Local Search Procedure (GRASP) has proven highly effective for solving many classes of NP-hard optimization problems. This paper introduces four GRASP implementations for the multidimensional assignment problem, which are combinations of two constructive methods (randomized reduced cost greedy and randomized max regret) and two local search methods (two-assignment-exchange and variable depth exchange). Numerical results are shown for two random problem classes and one tracking problem class.
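
A compact GRASP skeleton for a small three-dimensional assignment instance, showing the two phases: a randomized greedy construction from a restricted candidate list, followed by a simple exchange local search. The paper's specific constructive rules (randomized reduced cost greedy, max regret) and local searches (two-assignment exchange, variable depth exchange) are richer than this sketch.

```python
# GRASP skeleton for a toy 3-dimensional assignment problem.
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 5
C = rng.uniform(size=(n, n, n))                  # cost of selecting the triple (i, j, k)

def construct(alpha=0.3):
    """Randomized greedy construction from a restricted candidate list (RCL)."""
    free_j, free_k, sol = set(range(n)), set(range(n)), []
    for i in range(n):
        cands = sorted((C[i, j, k], j, k) for j in free_j for k in free_k)
        cut = cands[0][0] + alpha * (cands[-1][0] - cands[0][0])
        rcl = [c for c in cands if c[0] <= cut]
        _, j, k = rcl[rng.integers(len(rcl))]
        sol.append((j, k))
        free_j.discard(j)
        free_k.discard(k)
    return sol

def cost(sol):
    return sum(C[i, j, k] for i, (j, k) in enumerate(sol))

def exchange(sol):
    """First-improvement local search swapping the second-index assignment of two triples."""
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(range(n), 2):
            new = sol.copy()
            new[a], new[b] = (sol[b][0], sol[a][1]), (sol[a][0], sol[b][1])
            if cost(new) < cost(sol):
                sol, improved = new, True
    return sol

best = min((exchange(construct()) for _ in range(20)), key=cost)
print(round(cost(best), 3), best)
```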

12.
This paper discusses the relationship between the total causal effect and the local causal effects in a causal chain, and the identifiability of causal effects. We show a transmission relationship of causal effects along a causal chain. Based on this relationship, we give an approach to eliminating confounding bias by controlling for intermediate variables in a causal chain.
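
A small numerical illustration (not from the paper) of the transmission relationship in a linear chain: for X → M → Y, the total effect of X on Y equals the product of the local effects X → M and M → Y.

```python
# Total effect along a linear causal chain equals the product of local effects.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)        # local effect X -> M is 0.8
y = 1.5 * m + rng.normal(size=n)        # local effect M -> Y is 1.5

beta_xm = np.polyfit(x, m, 1)[0]
beta_my = np.polyfit(m, y, 1)[0]
beta_xy = np.polyfit(x, y, 1)[0]
print(beta_xm * beta_my, beta_xy)       # both are close to 0.8 * 1.5 = 1.2
```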

13.
Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online.
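
A schematic of the response-surface strategy for an average treatment effect: fit one flexible model for E[Y | X, Z], then average the difference between predictions at Z = 1 and Z = 0 over the sample. Gradient boosting is used below purely as a stand-in so the sketch stays self-contained; the article's method is BART, not gradient boosting.

```python
# Flexible response-surface modeling for an average treatment effect (stand-in for BART).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 3000
x = rng.normal(size=(n, 3))                                  # observed confounders
z = (x[:, 0] + rng.normal(size=n) > 0).astype(float)         # nonrandom treatment
y = np.sin(x[:, 0]) + x[:, 1] ** 2 + 2.0 * z + rng.normal(size=n)   # true effect is 2

model = GradientBoostingRegressor().fit(np.column_stack([x, z]), y)
y1 = model.predict(np.column_stack([x, np.ones(n)]))         # predicted outcome if treated
y0 = model.predict(np.column_stack([x, np.zeros(n)]))        # predicted outcome if untreated
print((y1 - y0).mean())                                      # sample-average effect, near 2
```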

14.
An extension of probabilistic PERT/CPM is proposed as a framework for soliciting expert opinion to characterize random variables for stochastic treatment in simulation models. By eliciting minimum, modal, ninetieth percentile, and maximum estimates, the distribution of variables with probability density functions of beta form can be explicitly characterized without relying on the traditional, but empirically unverified, assumption of a standard deviation equal to one-sixth of the range. This practical and inexpensive technique is illustrated by application to a wildfire protection planning problem – estimating the time required to produce a given length of fireline by different firefighting resources under diverse conditions. The estimated production times are an essential input to a planning model of initial attack on wildland fires used by the California Department of Forestry and Fire Protection, and provide that agency with useful rules-of-thumb for use in firefighter training.
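
A rough sketch of characterizing a beta distribution from elicited minimum, mode, ninetieth-percentile, and maximum values instead of assuming a standard deviation of one-sixth of the range: the mode fixes one relationship between the two shape parameters, and a one-dimensional root search matches the 90th percentile. The elicitation numbers below are made up for illustration.

```python
# Fit beta shape parameters to elicited (min, mode, 90th percentile, max).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import beta as beta_dist

def fit_beta(lo, mode, p90, hi):
    m = (mode - lo) / (hi - lo)            # mode rescaled to [0, 1]
    q = (p90 - lo) / (hi - lo)             # 90th percentile rescaled to [0, 1]

    def b_from_a(a):                       # enforce the mode: (a - 1) / (a + b - 2) = m
        return 1.0 + (a - 1.0) * (1.0 - m) / m

    def gap(a):                            # mismatch in the 90th percentile
        return beta_dist.ppf(0.9, a, b_from_a(a)) - q

    a = brentq(gap, 1.0 + 1e-6, 200.0)
    return a, b_from_a(a)

a, b = fit_beta(lo=20.0, mode=35.0, p90=60.0, hi=90.0)   # hypothetical elicited times
print(round(a, 3), round(b, 3))
```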

15.
A computer experiment-based optimization approach employs design of experiments and statistical modeling to represent a complex objective function that can only be evaluated pointwise by running a computer model. In large-scale applications, the number of variables is huge, and direct use of computer experiments would require an exceedingly large experimental design and, consequently, significant computational effort. If a large portion of the variables have little impact on the objective, then there is a need to eliminate these before performing the complete set of computer experiments. This is a variable selection task. The ideal variable selection method for this task should handle unknown nonlinear structure, should be computationally fast, and would be conducted after a small number of computer experiment runs, likely fewer runs (N) than the number of variables (P). Conventional variable selection techniques are based on assumed linear model forms and cannot be applied in this “large P and small N” problem. In this paper, we present a framework that adds a variable selection step prior to computer experiment-based optimization, and we consider data mining methods, using principal components analysis and multiple testing based on false discovery rate, that are appropriate for our variable selection task. An airline fleet assignment case study is used to illustrate our approach.
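
A hedged sketch of the screening step for the “large P and small N” setting: with far fewer runs than variables, test each variable's marginal association with the simulation output and keep those that survive a false-discovery-rate correction. This is a simple stand-in for the paper's data-mining step; the principal-components variant is omitted.

```python
# FDR-controlled marginal screening with fewer runs (N) than variables (P).
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(5)
N, P = 60, 500                                 # far fewer runs than variables
X = rng.uniform(size=(N, P))                   # small initial experimental design
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + 0.1 * rng.normal(size=N)   # only 2 active inputs

pvals = np.array([pearsonr(X[:, j], y)[1] for j in range(P)])
keep = multipletests(pvals, alpha=0.1, method="fdr_bh")[0]
print(np.flatnonzero(keep))                    # columns retained for the full experiment
```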

16.
We apply Bayesian methods to a model involving a binary nonrandom treatment intake variable and an instrumental variable in which the functional forms of some of the covariates in both the treatment intake and outcome distributions are unknown. Continuous and binary response variables are considered. Under the assumption that the functional form is additive in the covariates, we develop efficient Markov chain Monte Carlo-based approaches for summarizing the posterior distribution and for comparing various alternative models via marginal likelihoods and Bayes factors. We show in a simulation experiment that the methods are capable of recovering the unknown functions and are sensitive neither to the sample size nor to the degree of confounding as measured by the correlation between the errors in the treatment and response equations. In the binary response case, however, estimation of the average treatment effect requires larger sample sizes, especially when the degree of confounding is high. The methods are applied to an example dealing with the effect on wages of more than 12 years of education.

17.
This paper proposes a new approach for variable selection in partially linear errors-in-variables (EV) models for longitudinal data by penalizing appropriate estimating functions. We apply the SCAD penalty to simultaneously select significant variables and estimate unknown parameters. The rate of convergence and the asymptotic normality of the resulting estimators are established. Furthermore, with a proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure. A new algorithm is proposed for solving the penalized estimating equations. The asymptotic results are augmented by a simulation study.
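
For reference, the SCAD penalty and its derivative (the quantity that enters penalized estimating equations) can be written as a small helper; the formula below is the standard one with the usual default a = 3.7, not anything specific to this paper's longitudinal EV setting.

```python
# SCAD penalty and its first derivative (standard Fan-Li form).
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated at |theta| with tuning parameter lam."""
    t = np.abs(theta)
    return np.where(
        t <= lam,
        lam * t,
        np.where(
            t <= a * lam,
            (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))

print(scad_penalty(np.array([0.1, 1.0, 5.0]), lam=0.5))
```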

18.
The aggregation of financial and economic time series occurs in a number of ways. Temporal aggregation or systematic sampling is the commonly used approach. In this paper, we investigate the time interval effect of multiple regression models in which the variables are additive or systematically sampled. The correlation coefficient changes with the selected time interval when one variable is additive and the other is systematically sampled. It is shown that the squared correlation coefficient decreases monotonically as the differencing interval increases, approaching zero in the limit. When two random variables are both added or both systematically sampled, the correlation coefficient is invariant with time and equal to the one-period value. We find that the partial regression and correlation coefficients between two additive or systematically sampled variables approach the one-period values as n increases. When only one of the variables is systematically sampled, they approach zero in the limit. The time interval for association analyses between variables therefore should not be selected arbitrarily, or the statistical results are likely to be affected.
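
A quick simulation (not from the paper) of the aggregation effect described above: when one series is aggregated by summation (additive, a flow) and the other by systematic sampling (a stock), the squared correlation falls toward zero as the interval grows, whereas aggregating both series the same way leaves the correlation near its one-period value.

```python
# Effect of the aggregation interval on the squared correlation coefficient.
import numpy as np

rng = np.random.default_rng(6)
T = 120_000
x = rng.normal(size=T)
y = 0.7 * x + 0.7 * rng.normal(size=T)       # one-period correlation about 0.7

for m in (1, 5, 20, 60):                     # aggregation / differencing interval
    xs = x[: T - T % m].reshape(-1, m)
    ys = y[: T - T % m].reshape(-1, m)
    flow = xs.sum(axis=1)                    # temporally aggregated (additive)
    stock = ys[:, -1]                        # systematically sampled
    r_mixed = np.corrcoef(flow, stock)[0, 1]
    r_both = np.corrcoef(flow, ys.sum(axis=1))[0, 1]
    print(m, round(r_mixed**2, 3), round(r_both**2, 3))
```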

19.
This paper deals with estimation of production technology where the endogenous choice of input and output variables is explicitly recognized. To address this endogeneity issue, we assume that producers maximize return to the outlay. We start from a flexible (translog) transformation function with a single output and multiple inputs and show how the first-order conditions of maximizing return to the outlay can be used to come up with an ‘estimating equation’ that does not suffer from the econometric endogeneity problem although the output and input variables are chosen endogenously. This is because the regressors in this estimating equation are in ratio forms which are uncorrelated with the error term under the assumption that producers maximize return to the outlay. The analysis is then extended to the multiple outputs and multiple inputs case with technical inefficiency. Although the estimating equations in both the single and multiple output cases are neither production nor distance functions, they can be estimated in a straightforward manner using the standard stochastic frontier technique without worrying about endogeneity of the regressors. Thus, we provide a rationale for estimating the technology parameters consistently using an econometric model which requires data on only input and output quantities.

20.
A Monte Carlo study is conducted to compare the stochastic frontier method and the data envelopment analysis (DEA) method in measuring efficiency in situations where firms are subject to the effects of factors which are beyond managerial control. In making efficiency measurements and comparisons, one must separate the effects of the environment (the exogenous factors) from the effects of productive efficiency. There are two basic approaches to account for the effects of exogenous variables: (1) a one-step procedure which includes the exogenous variables directly in estimating the efficiency measures, and (2) a two-step procedure which first estimates the relative ‘gross’ efficiencies using inputs and outputs, and then analyzes the effects of the exogenous variables on the ‘gross’ efficiency. The results show that the magnitude of the exogenous variables does not appear to have any significant effect on the performance of the one-step stochastic frontier method as long as the exogenous variables are correctly identified and accounted for. However, the effects of exogenous variables are significant for the two-step approach, especially for the DEA method.
