Similar literature
1.
Abstract

A comparison and evaluation is made of recent proposals for multivariate matched sampling in observational studies, where the following three questions are answered: (1) Algorithms: In current statistical practice, matched samples are formed using “nearest available” matching, a greedy algorithm. Greedy matching does not minimize the total distance within matched pairs, though good algorithms exist for optimal matching that do minimize the total distance. How much better is optimal matching than greedy matching? We find that optimal matching is sometimes noticeably better than greedy matching in the sense of producing closely matched pairs, sometimes only marginally better, but it is no better than greedy matching in the sense of producing balanced matched samples. (2) Structures: In common practice, treated units are matched to one control, called pair matching or 1–1 matching, or treated units are matched to two controls, called 1–2 matching, and so on. It is known, however, that the optimal structure is a full matching in which a treated unit may have one or more controls or a control may have one or more treated units. Optimal 1–k matching is compared to optimal full matching, finding that optimal full matching is often much better. (3) Distances: Matching involves defining a distance between covariate vectors, and several such distances exist. Three recent proposals are compared. Practical advice is summarized in a final section.
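The difference between "nearest available" (greedy) matching and optimal matching described in this abstract can be illustrated with a minimal sketch. It assumes a single covariate with absolute difference as the distance, and it finds the optimal match by brute force over all assignments, which is only feasible for tiny examples; the function names and the data are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def greedy_match(treated, controls):
    """Nearest-available matching: each treated unit, taken in order,
    is paired with the closest control that has not been used yet."""
    unused = list(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(unused, key=lambda c: abs(t - controls[c]))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


def optimal_match(treated, controls):
    """Optimal pair matching by brute force: try every assignment of
    distinct controls to treated units and keep the one with the
    smallest total within-pair distance."""
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda p: sum(abs(t - controls[j]) for t, j in zip(treated, p)),
    )
    return list(enumerate(best))


def total_distance(pairs, treated, controls):
    """Sum of within-pair covariate distances."""
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)


# Hypothetical covariate values (for instance, ages).
treated = [50, 60]
controls = [65, 20]

greedy = greedy_match(treated, controls)
optimal = optimal_match(treated, controls)
print(total_distance(greedy, treated, controls))   # 55
print(total_distance(optimal, treated, controls))  # 35
```

In this toy case the first treated unit takes the control that the second treated unit needed, so the greedy total is larger. Practical optimal matching uses assignment or network-flow algorithms rather than enumeration.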

2.
Abstract

Observational or nonrandomized studies of treatment effects are often constructed with the aid of polynomial-time algorithms that optimally form matched treatment-control pairs or matched sets. Because each observational comparison may potentially be affected by bias, investigators often reinforce a single comparison with an additional comparison that is unlikely to be affected by the same biases, for instance using multiple control groups or evidence factors or control + instrument designs. Use of two comparisons affected by different biases may detect bias if the two comparisons disagree, or may show that two comparisons with different weaknesses concur in their conclusions. Even this simplest addition, a second comparison, creates design problems without polynomial-time solutions. Faced with a problem that no polynomial-time algorithm can solve, a so-called approximation algorithm is a type of compromise: it provides a solution in polynomial time that is provably not much worse than the unattainable optimal solution. Building upon existing techniques for related problems in operations research, we develop an approximation algorithm for minimum distance matching with near-fine balance for three comparison groups. This algorithm is a practical approach to most observational designs that add a second comparison. The method is applied to an observational study of the effects of side airbags on injury severity in the U.S. Fatality Analysis Reporting System. For many car makes and models, side airbags were initially unavailable, then later available as optional equipment for an additional fee, then still later provided as standard equipment. Within sets matched for make and model of car, for safety belt use, for direction of impact, and other covariates, we compare crashes in these three periods, where each comparison has different limitations. The method is implemented in the R package approxmatch, whose example reproduces some of the calculations.
Supplementary materials for this article are available online.

3.
In a tapered matched comparison, one group of individuals, called the focal group, is compared to two or more nonoverlapping matched comparison groups constructed from one population in such a way that successive comparison groups increasingly resemble the focal group. An optimally tapered matching solves two problems simultaneously: it optimally divides the single comparison population into nonoverlapping comparison groups and optimally pairs members of the focal group with members of each comparison group. We show how to use the optimal assignment algorithm in a new way to solve the optimally tapered matching problem, with implementation in R. This issue often arises in studies of groups defined by race, gender, or other categorizations such that equitable public policy might require an understanding of the mechanisms that produce disparate outcomes, where certain specific mechanisms would be judged illegitimate, necessitating reform. In particular, we use data from Medicare and the SEER Program of the National Cancer Institute as part of an ongoing study of black-white disparities in survival among women with endometrial cancer.

4.
The choice of covariate values for a given block design attaining minimum variance for estimation of each of the regression parameters of the model has attracted attention in recent times. In this article, we consider the problem of finding the optimum covariate design (OCD) for the estimation of covariate parameters in a binary proper equi-replicate block (BPEB) design model with covariates, which covers a large class of designs in common use. The construction of optimum designs is based mainly on Hadamard matrices.

5.
The traditional Cox model assumes a log-linear relationship between covariates and the underlying hazard function. However, the linearity may be invalid in real data. We study a Cox model which employs unknown parametric covariate transformations. This model is applicable to observational studies or randomized trials when a treatment effect is investigated after controlling for a confounding variable that may have a non-log-linear relationship with the underlying hazard function. While the proposed generalization is simple, the inferential issues are challenging due to the loss of identifiability under no effects of transformed covariates. Optimal tests are derived for certain alternatives. Rigorous parametric inference is established under regularity conditions and non-zero transformed covariate effects. The estimates perform well in simulation studies with realistic sample sizes, and the proposed tests are more powerful than the usual partial likelihood ratio test, which is no longer optimal. Data from a breast cancer trial are used to illustrate the model building strategy and the better fit of the proposed model compared with the traditional Cox model.

6.
In this article, we propose and explore a multivariate logistic regression model for analyzing multiple binary outcomes with incomplete covariate data where auxiliary information is available. The auxiliary data are extraneous to the regression model of interest but predictive of the covariate with missing data. Horton and Laird [N.J. Horton, N.M. Laird, Maximum likelihood analysis of logistic regression models with incomplete covariate data and auxiliary information, Biometrics 57 (2001) 34–42] describe how the auxiliary information can be incorporated into a regression model for a single binary outcome with missing covariates, and hence the efficiency of the regression estimators can be improved. We consider extending the method of [9] to the case of a multivariate logistic regression model for multiple correlated outcomes, and with missing covariates and completely observed auxiliary information. We demonstrate that in the case of moderate to strong associations among the multiple outcomes, one can achieve considerable gains in efficiency from estimators in a multivariate model as compared to the marginal estimators of the same parameters.

7.
This paper discusses the problem of testing the equality of two autoregressive functions against one-sided alternatives. The heteroscedastic error and stationary densities of the two independent strongly mixing strictly stationary time series can possibly be different. This paper adapts the covariate matching idea used in regression settings to construct a class of lag matched tests and derives their asymptotic normality under general one-sided local non-parametric alternatives. The paper also discusses asymptotically optimal tests against these alternatives within the proposed class of tests. MS Mathematics Subject Classifications: Primary 62M10, Secondary 62F03.

8.
Modern methods construct a matched sample by minimizing the total cost of a flow in a network, finding a pairing of treated and control individuals that minimizes the sum of within-pair covariate distances subject to constraints that ensure distributions of covariates are balanced. In aggregate, these methods work well; however, they can exhibit a lack of interest in a small number of pairs with large covariate distances. Here, a new method is proposed for imposing a minimax constraint on a minimum total distance matching. Such a match minimizes the total within-pair distance subject to various constraints including the constraint that the maximum pair difference is as small as possible. In an example with 1391 matched pairs, this constraint eliminates dozens of pairs with moderately large differences in age, but otherwise exhibits the same excellent covariate balance found without this additional constraint. A minimax constraint eliminates edges in the network, and can improve the worst-case time bound for the performance of the minimum cost flow algorithm, that is, a better match from a practical perspective may take less time to construct. The technique adapts ideas for a different problem, the bottleneck assignment problem, whose sole objective is to minimize the maximum within-pair difference; however, here, that objective becomes a constraint on the minimum cost flow problem. The method generalizes. Rather than constrain the maximum distance, it can constrain an order statistic. Alternatively, the method can minimize the maximum difference in propensity scores, and subject to doing that, minimize the maximum robust Mahalanobis distance. An example from labor economics is used to illustrate. Supplementary materials for this article are available online.
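The idea of turning the bottleneck objective into a constraint on a minimum total distance match can be sketched on a toy problem. The sketch below is not the network-flow method of the abstract: it assumes a small, hypothetical distance matrix and simply enumerates every pairing, first finding the smallest achievable maximum within-pair distance and then minimizing the total distance among pairings that respect that cap. All names are illustrative.

```python
from itertools import permutations


def all_pairings(dist):
    """Yield every assignment of distinct controls (columns) to treated
    units (rows) as a list of (treated, control) index pairs."""
    n_treated, n_controls = len(dist), len(dist[0])
    for perm in permutations(range(n_controls), n_treated):
        yield list(enumerate(perm))


def total(pairs, dist):
    """Total within-pair distance of a pairing."""
    return sum(dist[i][j] for i, j in pairs)


def worst(pairs, dist):
    """Largest within-pair distance of a pairing."""
    return max(dist[i][j] for i, j in pairs)


def min_total_match(dist):
    """Unconstrained minimum total distance pairing."""
    return min(all_pairings(dist), key=lambda p: total(p, dist))


def minimax_then_min_total(dist):
    """Step 1: find the smallest possible maximum pair distance (the
    bottleneck value). Step 2: among pairings whose worst pair does not
    exceed that value, minimize the total distance."""
    cap = min(worst(p, dist) for p in all_pairings(dist))
    feasible = [p for p in all_pairings(dist) if worst(p, dist) <= cap]
    return min(feasible, key=lambda p: total(p, dist))


# Hypothetical distances: rows are treated units, columns are controls.
dist = [[1, 6],
        [6, 9]]

plain = min_total_match(dist)
capped = minimax_then_min_total(dist)
print(total(plain, dist), worst(plain, dist))    # 10 9
print(total(capped, dist), worst(capped, dist))  # 12 6
```

Here the constrained match accepts a slightly larger total distance in exchange for removing the one poorly matched pair. In a flow formulation, the cap corresponds to deleting every edge whose distance exceeds it, which is why the constraint can shrink the network.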

9.
We address a generalization of the classical one-dimensional bin packing problem with unequal bin sizes and costs. We investigate lower bounds for this problem as well as exact algorithms. The main contribution of this paper is to show that embedding a tight network flow-based lower bound, dominance rules, as well as an effective knapsack-based heuristic in a branch-and-bound algorithm yields very good performance. In addition, we show that the particular case in which all item weights are larger than a third of the largest bin capacity can be restated and solved in polynomial time as a maximum-weight matching problem in a nonbipartite graph. We report the results of extensive computational experiments that provide evidence that large randomly generated instances are optimally solved within moderate CPU times.

10.
张君 (Zhang Jun), 《应用概率统计》 (Chinese Journal of Applied Probability and Statistics), 2012, 28(3): 319-330
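This paper considers estimation in a partially linear model in which the covariates in the linear part are measured with error and the number of parameters in the linear part diverges as the sample size increases. We replace the unobservable true variables with observable surrogate variables whose expectations are linearly related to the true variables. We propose an estimation method and study the consistency and asymptotic normality of the estimator. In addition, we study the rate at which the diverging parameters diverge. Simulations are used to illustrate the practical performance of the estimator.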

11.
When two groups of individuals are to be compared with respect to gene expression there will often be some potentially confounding variables that differ between the groups. Matching is an established approach for obtaining comparable groups and enabling subsequent univariate tests for each gene. Alternatively, the confounders might be incorporated directly into a multivariable regression model for adjustment. In contrast to univariate tests, such models can consider all genes simultaneously. Aiming to combine the advantages of both approaches, matching and multivariable modeling, we consider a matching-based boosting procedure for fitting risk prediction models in two-group settings. This may make it possible to identify and automatically remove problematic observations that might negatively affect the regression model. Therefore, we compare the ability to identify important covariates for this combination of matching and boosting with boosting alone for different covariate correlation structures in a simulation study. Furthermore, we analyze the prediction performance of these approaches on two gene expression microarray studies. The first study comprises patients with B-cell and T-cell type acute lymphoblastic leukemia and the second patients with acute megakaryoblastic leukemia. While the matching component can in principle guard against problematic observations, the combined approach is seen neither to improve identification of important covariates nor to improve prediction performance. Therefore, a combination of the two approaches cannot be recommended. Adjustment for potential confounders is seen to provide the best performance, i.e. a pure multivariable regression modeling strategy seems to be promising even in the presence of considerable heterogeneity.

12.
The use of maximum likelihood methods in analysing times to failure in the presence of unobserved randomly changing covariates requires constrained optimization procedures. An alternative approach using a generalized version of the EM-algorithm requires smoothed estimates of covariate values. Similar estimates are needed in evaluating past exposures to hazardous chemicals, radiation or other toxic materials when health effects only become evident long after their use. In this paper, two kinds of equation for smoothing estimates of unobserved covariates in survival problems are derived. The first shows how new information may be used to update past estimates of the covariates' values. The second can be used to project the covariates' trajectory from the present to the past. If the hazard function is quadratic in form, both types of smoothing equation can be derived in a closed analytical form. Examples of both types of equation are presented. Use of these equations in the extended EM-algorithm, and in estimating past exposures to hazardous materials, are discussed. © 1997 by John Wiley & Sons, Ltd.

13.
Pattern matching is a fundamental feature in many applications such as functional programming, logic programming, theorem proving, term rewriting and rule-based expert systems. Usually, patterns are pre-processed into a deterministic finite automaton. Using such an automaton allows one to determine the matched pattern(s) by a single scan of the input term. The matching automaton is typically based on left-to-right traversal of patterns. In this paper, we propose a method to build such an automaton. Then, we propose an incremental method to build a deterministic concise automaton for not necessarily sequential rewriting systems. With ambiguous patterns a subject term may be an instance of more than one pattern. To select the pattern to use, a priority rule is usually engaged. The pre-processing of the patterns adds new patterns, which are instances of the original ones. When the original patterns are ambiguous, some of the instances supplied may be irrelevant for the matching process. They may cause an unnecessary increase in the space requirements of the automaton and may also reduce the time efficiency of the matching process. Here, we devise a new pre-processing operation that recognises and avoids such irrelevant instances. This improves the space and time requirements of the matching automaton.

14.
In this paper we present a discrete survival model with covariates and random effects, where the random effects may depend on the observed covariates. The dependence between the covariates and the random effects is modelled through correlation parameters, and these parameters can only be identified for time-varying covariates. For time-varying covariates, however, it is possible to separate regression effects and selection effects in the case of a certain dependence structure between the random effects and the time-varying covariates that are assumed to be conditionally independent given the initial level of the covariate. The proposed model is equivalent to a model with independent random effects and the initial level of the covariates as further covariates. The model is applied to simulated data that illustrate some identifiability problems and further indicate how the proposed model may be an approximation to retrospectively collected data with incorrect specification of the waiting times. The model is fitted by maximum likelihood estimation that is implemented as iteratively reweighted least squares. © 1998 John Wiley & Sons, Ltd.

15.
A binary disease outcome is commonly modeled with continuous covariates (e.g., biochemical concentration) in medical research, and the corresponding exploration may employ a normal discrimination approach. The covariate relationship affects the estimated association between the binary outcome and the covariate of interest. The method of value deviated from a fitted value (fractional polynomial), which is abbreviated as VDFV, may reduce the estimation bias especially when the relationship between the covariates is nonlinear. However, when the extraneous variable relates to the outcome, the pooled data (cases and controls) are replaced by the control data only for the purpose of fitting values. Based on two association patterns, the extraneous variable unrelated to the outcome (I) and that related to the outcome (II), the simulation study reveals that VDFV-p (using pooled data) is reliable, with less bias and a smaller mean square error (MSE) in pattern (I) and that VDFV-c (using control data) shows less bias in pattern (II). The conventional covariate adjustment performs worse in (I) but fairly well in (II). Note that a huge MSE is never observed in VDFV-p or VDFV-c, while this is a common issue related to small sample size or sparse data in logistic regression. Two fetal studies are illustrated, one for pattern (I) and one for pattern (II).

16.
When basic necessary conditions for the existence of a balanced incomplete block design are satisfied, the design may still not exist or it may not be known whether it exists. In either case, other designs may be considered for the same parameters. In this article we introduce a class of alternative designs, which we will call virtually balanced incomplete block designs. From a statistical point of view these designs provide efficient alternatives to balanced incomplete block designs, and from a combinatorial point of view they offer challenging new questions. © 1995 John Wiley & Sons, Inc.

17.
In this paper we study alternating cycles in graphs embedded in a surface. We observe that 4-vertex-colorability of a triangulation on a surface can be expressed in terms of spanning quadrangulations, and we establish connections between spanning quadrangulations and cycles in the dual graph which are noncontractible and alternating with respect to a perfect matching. We show that the dual graph of an Eulerian triangulation of an orientable surface other than the sphere has a perfect matching M and an M-alternating noncontractible cycle. As a consequence, every Eulerian triangulation of the torus has a nonbipartite spanning quadrangulation. For an Eulerian triangulation G of the projective plane the situation is different: If the dual graph \(G^*\) is nonbipartite, then \(G^*\) has no noncontractible alternating cycle, and all spanning quadrangulations of G are bipartite. If the dual graph \(G^*\) is bipartite, then it has a noncontractible, M-alternating cycle for some (and hence any) perfect matching, G has a bipartite spanning quadrangulation and also a nonbipartite spanning quadrangulation.

18.
We generalize the bandit process with a covariate introduced by Woodroofe in several significant directions: a linear regression model characterizing the unknown arm, an unknown variance for regression residuals and general discounting sequence for a non-stationary model. With the Bayesian regression approach, we assume a normal-gamma conjugate prior distribution of the unknown parameters. It is shown that the optimal strategy is determined by a sequence of index values which are monotonic and determined by the observed value of the covariate and updated posterior distributions. We further show that the myopic strategy is not optimal in general. Such structural properties help to understand the tradeoff between information gathering and immediate expected payoff and may provide certain insight for covariate adjusted response adaptive design of clinical trials.

19.
A graph of order n is p-factor-critical, where p is an integer of the same parity as n, if the removal of any set of p vertices results in a graph with a perfect matching. 1-factor-critical graphs and 2-factor-critical graphs are factor-critical graphs and bicritical graphs, respectively. It is well known that every connected vertex-transitive graph of odd order is factor-critical and every connected nonbipartite vertex-transitive graph of even order is bicritical. In this article, we show that a simple connected vertex-transitive graph of odd order at least five is 3-factor-critical if and only if it is not a cycle.
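The definition of a p-factor-critical graph can be checked directly on small graphs. The sketch below is a brute-force illustration with hypothetical function names: it removes every set of p vertices and searches for a perfect matching in what remains by simple recursion. It assumes vertices are labelled 0 to n-1 and is only practical for very small graphs.

```python
from itertools import combinations


def has_perfect_matching(vertices, adj):
    """Return True if the subgraph induced on `vertices` has a perfect
    matching. The lowest-numbered vertex must be matched to one of its
    remaining neighbours, so try each neighbour and recurse."""
    vertices = sorted(vertices)
    if not vertices:
        return True
    if len(vertices) % 2 == 1:
        return False
    v, rest = vertices[0], vertices[1:]
    return any(
        has_perfect_matching([x for x in rest if x != u], adj)
        for u in rest
        if u in adj[v]
    )


def is_p_factor_critical(n, edges, p):
    """Check the definition: deleting any p vertices must leave a graph
    with a perfect matching (p should have the same parity as n)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return all(
        has_perfect_matching(set(range(n)) - set(removed), adj)
        for removed in combinations(range(n), p)
    )


# Two vertex-transitive graphs of order five: the cycle C5 and the
# complete graph K5.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
k5 = list(combinations(range(5), 2))

print(is_p_factor_critical(5, cycle5, 1))  # True
print(is_p_factor_critical(5, cycle5, 3))  # False
print(is_p_factor_critical(5, k5, 3))      # True
```

The output agrees with the statement in the abstract for these two cases: the cycle is factor-critical but not 3-factor-critical, while K5, which is not a cycle, is 3-factor-critical.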

20.
The maximum integer skew-symmetric flow problem (MSFP) generalizes both the maximum flow and maximum matching problems. It was introduced by Tutte [28] in terms of self-conjugate flows in antisymmetrical digraphs. He showed that for these objects there are natural analogs of classical theoretical results on usual network flows, such as the flow decomposition, augmenting path, and max-flow min-cut theorems. We give unified and shorter proofs for those theoretical results. We then extend to MSFP the shortest augmenting path method of Edmonds and Karp [7] and the blocking flow method of Dinits [4], obtaining algorithms with similar time bounds in the general case. Moreover, in the cases of unit arc capacities and unit node capacities our blocking skew-symmetric flow algorithm has time bounds similar to those established in [8, 21] for Dinits' algorithm. In particular, this implies an algorithm for finding a maximum matching in a nonbipartite graph in time, which matches the time bound for the algorithm of Micali and Vazirani [25]. Finally, extending a clique compression technique of Feder and Motwani [9] to particular skew-symmetric graphs, we speed up the implied maximum matching algorithm to run in time, improving the best known bound for dense nonbipartite graphs. Also other theoretical and algorithmic results on skew-symmetric flows and their applications are presented. Mathematics Subject Classification (1991): 90C27, 90B10, 90C10, 05C85
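For readers unfamiliar with the shortest augmenting path method mentioned in this abstract, the following is a minimal sketch of the classical Edmonds-Karp algorithm for ordinary maximum flow. It is not the skew-symmetric extension developed in the paper; the function name, the adjacency-matrix representation, and the example network are assumptions made for illustration. The example encodes a small bipartite matching problem as a unit-capacity flow, the standard reduction that the skew-symmetric framework generalizes to nonbipartite graphs.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Maximum flow by repeatedly augmenting along a shortest
    (fewest-edges) path found with breadth-first search in the
    residual graph. `capacity` is an n-by-n matrix of integers."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow  # no augmenting path remains
        # Find the bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push flow along the path and update reverse edges.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical bipartite graph: left vertices 1, 2; right vertices 3, 4;
# node 0 is the source and node 5 is the sink. All capacities are 1.
n = 6
capacity = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (0, 2), (1, 3), (1, 4), (2, 3), (3, 5), (4, 5)]:
    capacity[u][v] = 1

print(edmonds_karp(capacity, 0, 5))  # 2
```

The maximum flow value equals the size of a maximum matching in the bipartite graph. The first augmenting path pairs vertex 1 with vertex 3, and the second path uses a reverse edge to reroute vertex 1 to vertex 4 so that vertex 2 can take vertex 3.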
