Similar Literature
20 similar documents found.
1.
2.
This paper presents an extension of the standard regression tree method to clustered data. Previous work extending tree methods to accommodate correlated data is mainly based on the multivariate repeated-measures approach. We propose a "mixed effects regression tree" method in which the correlated observations are viewed as nested within clusters rather than as vectors of multivariate repeated responses. The proposed method can handle unbalanced clusters, allows observations within clusters to be split, and can incorporate random effects and observation-level covariates. We implemented the proposed method using a standard tree algorithm within the framework of the expectation-maximization (EM) algorithm. The simulation results show that the proposed regression tree method provides substantial improvements over standard trees when the random effects are non-negligible. A real data example is used to illustrate the method.

3.
Game trees are an important model of decision-making situations, both in artificial intelligence and in decision analysis. The model most frequently investigated in theoretical research consists of a uniform tree of height h and constant branching factor b, where the terminal positions are assigned the values of independent, identically distributed random variables [1, 3–10]. Our paper investigates two generalizations:
1.  Different levels of the tree may have different branching factors.
2.  The preferences of the two players may no longer be totally opposite.
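Both generalizations can be made concrete with a small backward-induction sketch in Python. It assumes a list `branching` giving one branching factor per level (generalization 1) and leaves that carry a payoff pair, one entry per player, with each player maximizing its own entry (generalization 2). The names `solve_game_tree` and `payoff` are hypothetical; with zero-sum payoffs `(v, -v)` the procedure reduces to ordinary minimax. This illustrates the model only, not the paper's probabilistic analysis of random leaf values.

```python
from typing import Callable, Sequence, Tuple

Payoff = Tuple[float, float]


def solve_game_tree(branching: Sequence[int],
                    payoff: Callable[[Tuple[int, ...]], Payoff],
                    path: Tuple[int, ...] = ()) -> Payoff:
    """Backward induction on a game tree with level-dependent branching.

    Level i has branching[i] children. Player 0 moves at even levels and
    player 1 at odd levels; each picks the child maximizing its own payoff.
    """
    level = len(path)
    if level == len(branching):          # terminal position
        return payoff(path)
    mover = level % 2
    children = [solve_game_tree(branching, payoff, path + (c,))
                for c in range(branching[level])]
    # max() keeps the first child among ties
    return max(children, key=lambda pair: pair[mover])


leaf_values = {(0, 0): 3, (0, 1): 5, (1, 0): 2, (1, 1): 9}
# Zero-sum case: player 1 maximizing -v is ordinary minimax.
print(solve_game_tree([2, 2], lambda p: (leaf_values[p], -leaf_values[p])))  # (3, -3)

# Non-opposite preferences: player 1 maximizes its own, different payoff.
pairs = {(0, 0): (3, 1), (0, 1): (5, 2), (1, 0): (2, 0), (1, 1): (9, 4)}
print(solve_game_tree([2, 2], lambda p: pairs[p]))  # (9, 4)
```

The same leaves yield different root outcomes once preferences stop being exact opposites, which is the kind of effect the second generalization studies.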

4.
We present an average-case analysis of the minimum spanning tree heuristic for the power assignment problem. The worst-case approximation ratio of this heuristic is 2. We show that in Euclidean d-dimensional space, when the vertex set consists of independent, identically distributed (i.i.d.) uniform random points in \([0,1]^d\) and the distance-power gradient equals the dimension d, the minimum-spanning-tree-based power assignment converges completely to a constant depending only on d.
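The heuristic can be sketched in a few lines of Python. This assumes the usual formulation in which a vertex must transmit with power at least |uv|^α to reach a neighbour u: build a Euclidean minimum spanning tree (here with an O(n²) Prim's algorithm) and give each vertex the largest α-th power among its incident MST edge lengths. The exponent `alpha` stands for the distance-power gradient (equal to d in the setting above); the function names are illustrative.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the current tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_power_assignment(points, alpha):
    """Power of each vertex = max over its incident MST edges of length**alpha."""
    power = [0.0] * len(points)
    for u, v in mst_edges(points):
        w = math.dist(points[u], points[v]) ** alpha
        power[u] = max(power[u], w)
        power[v] = max(power[v], w)
    return power


# Usage: i.i.d. uniform points in [0,1]^d with alpha = d.
d, n = 2, 200
rng = random.Random(1)
pts = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
print(sum(mst_power_assignment(pts, alpha=d)))
```

Repeating the last three lines for growing n gives an informal empirical view of the total power that the convergence result describes.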

5.
Treed Regression     
Abstract

Given a data set consisting of n observations on p independent variables and a single dependent variable, treed regression creates a binary tree with a simple linear regression function at each of the leaves. Each node of the tree consists of an inequality condition on one of the independent variables. The tree is generated from the training data by a recursive partitioning algorithm. Treed regression models are more parsimonious than CART models because there are fewer splits. Additionally, monotonicity in some or all of the variables can be imposed.
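One step of the recursive partitioning can be sketched as a search for the split that minimizes the total residual sum of squares when each child gets its own simple linear regression. The details here are assumptions, not the authors' algorithm: the sketch tries every variable and observed threshold, fits each leaf on whichever single predictor gives the smallest SSE, and requires `min_leaf` observations per side. A full treed regression would recurse on the children and add a stopping or pruning rule.

```python
def simple_ols_sse(x, y):
    """Residual sum of squares of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0   # constant x: fall back to the mean
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def leaf_sse(rows, y):
    """SSE of the best simple linear regression over the available predictors."""
    return min(simple_ols_sse([r[j] for r in rows], y) for j in range(len(rows[0])))


def best_split(X, y, min_leaf=3):
    """Return (total_sse, variable_index, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = (leaf_sse([X[i] for i in left], [y[i] for i in left])
                   + leaf_sse([X[i] for i in right], [y[i] for i in right]))
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best


# Piecewise-linear toy data: slope +2 up to x = 4, slope -1 afterwards.
X = [[x] for x in range(10)]
y = [2 * x if x <= 4 else 20 - x for x in range(10)]
print(best_split(X, y))  # (0.0, 0, 4)
```

A single split with a line in each leaf fits this data exactly, whereas a constant-leaf CART tree would need several splits, which is the parsimony argument made above.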

6.
Scenario tree modeling for multistage stochastic programs   Total citations: 2, self-citations: 0, citations by others: 2
An important issue in solving multistage stochastic programs is the approximate representation of the (multivariate) stochastic input process in the form of a scenario tree. In this paper, we develop heuristics, based on stability theory, for generating scenario trees from an initial set of scenarios. They rely on forward or backward tree-generation algorithms consisting of recursive scenario reduction and bundling steps. Conditions implying closeness of the optimal values of the original process and its tree approximation are established, relying on a recent stability result for multistage stochastic programs by Heitsch, Römisch and Strugarek (SIAM J Optim 17:511–525, 2006). Numerical experience is reported for constructing multivariate scenario trees in electricity portfolio management.
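Scenario reduction, one building block of these tree-generation heuristics, can be illustrated with the classical single-stage backward-reduction rule: repeatedly delete the scenario whose removal costs least (its probability times the distance to the nearest remaining scenario) and move its probability to that nearest scenario. This Python sketch is a simplified, hypothetical illustration using Euclidean distance between whole scenario paths; it does not implement the paper's multistage bundling steps or its stability-based tolerances.

```python
import math


def backward_reduction(scenarios, probs, n_keep):
    """Greedy single-scenario deletion until n_keep scenarios remain.

    Returns the indices of the kept scenarios and their new probabilities.
    """
    if not 1 <= n_keep <= len(scenarios):
        raise ValueError("n_keep must be between 1 and the number of scenarios")
    kept = list(range(len(scenarios)))
    p = list(probs)

    def nearest(i):
        """Closest kept scenario to scenario i, excluding i itself."""
        return min((j for j in kept if j != i),
                   key=lambda j: math.dist(scenarios[i], scenarios[j]))

    while len(kept) > n_keep:
        # Deletion cost: probability mass times the distance it must be moved.
        drop = min(kept, key=lambda i: p[i] * math.dist(scenarios[i], scenarios[nearest(i)]))
        target = nearest(drop)
        kept.remove(drop)
        p[target] += p[drop]   # redistribute to the nearest kept scenario
        p[drop] = 0.0
    return kept, [p[j] for j in kept]


# Usage: three two-period scenario paths reduced to two.
paths = [(1.0, 1.2), (1.0, 1.25), (1.0, 3.0)]
print(backward_reduction(paths, [0.3, 0.3, 0.4], n_keep=2))
```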

7.
Most everyday reasoning and decision making is based on uncertain premises. The premises or attributes that we must take into consideration are random variables, so we often have to deal with a high-dimensional multivariate random vector. A multivariate random vector can be represented graphically as a Markov network. Usually the structure of the Markov network is unknown. In this paper we construct a special type of junction tree in order to obtain good approximations of the true probability distribution. These junction trees are capable of revealing some of the conditional independences of the network. We have already introduced the concept of the t-cherry junction tree (E. Kovács and T. Szántai, in Proceedings of the IFIP/IIASA/GAMM Workshop on Coping with Uncertainty, 2010), based on the t-cherry tree graph structure. This approximation uses only two- and three-dimensional marginal probability distributions. Now we use k-th order t-cherry trees, also called simplex multitrees, to introduce the concept of the k-th order t-cherry junction tree. We prove that the k-th order t-cherry junction tree gives the best approximation among the family of k-width junction trees. Then we give a method that, starting from a k-th order t-cherry junction tree, constructs a (k+1)-th order t-cherry junction tree giving at least as good an approximation. In the last part we present some numerical results and some possible applications.
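For reference, a junction tree with cluster set \(\mathcal{C}\) and separator set \(\mathcal{S}\) defines the approximation below, written in the form we believe is standard in the t-cherry literature, where \(\nu_S\) is the number of clusters containing the separator \(S\). For a k-th order t-cherry junction tree the clusters contain k variables and the separators k−1. With k = 2 it reduces to the familiar Chow–Liu tree approximation, whose separators are single variables with \(\nu_{\{i\}} = \deg(i)\).

```latex
P_{\mathrm{J}}(\mathbf{x})
  = \frac{\prod_{C \in \mathcal{C}} P(\mathbf{x}_C)}
         {\prod_{S \in \mathcal{S}} P(\mathbf{x}_S)^{\nu_S - 1}},
\qquad
k = 2:\quad
P_{\mathrm{J}}(\mathbf{x})
  = \frac{\prod_{(i,j) \in E} P(x_i, x_j)}
         {\prod_{i} P(x_i)^{\deg(i) - 1}}
```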

8.
This paper considers the problem of simple linear regression with interval-censored data. That is, \(n\) pairs of intervals are observed instead of the \(n\) pairs of precise values for the two variables (dependent and independent). Each of these intervals is closed but possibly unbounded, and contains the corresponding (unobserved) value of the dependent or independent variable. The goal of the regression is to describe the relationship between (the precise values of) these two variables by means of a linear function. Likelihood-based Imprecise Regression (LIR) is a recently introduced, very general approach to regression for imprecisely observed quantities. The result of a LIR analysis is in general set-valued: it consists of all regression functions that cannot be excluded on the basis of likelihood inference. These regression functions are said to be undominated. Since the interval data can be unbounded, a robust regression method is necessary. Hence, we consider the robust LIR method based on the minimization of the residuals' quantiles. For this method, we prove that the set of all the intercept-slope pairs corresponding to the undominated regression functions is the union of finitely many polygons. We give an exact algorithm for determining this set (i.e., for determining the set-valued result of the robust LIR analysis), and show that it has worst-case time complexity \(O(n^{3}\log n)\). We have implemented this exact algorithm as part of the R package linLIR.

9.
We consider classes of stochastic linear programming problems that can be efficiently solved by deterministic algorithms. For two-stage recourse problems we identify two such classes. The first consists of problems where the number of stochastically independent random variables is relatively low; the second is the class of simple recourse problems. The proposed deterministic algorithm is successive discrete approximation. We also illustrate the impact of the required accuracy on the efficiency of this algorithm. For jointly chance-constrained problems with a random right-hand side and a multivariate normal distribution, we demonstrate, for a central cutting plane method, the increase in efficiency when lower accuracy is required. We support our arguments and findings with computational results.

10.
This article introduces graphical tools for visualizing multivariate functions, specializing to the case of visualizing multivariate density estimates. We visualize a density estimate by visualizing a series of its level sets. From each connected part of a level set a shape tree is formed. A shape tree is a tree whose nodes are associated with regions of the level set. With the help of a shape tree we define a transformation of a multivariate set to a univariate function. The shape trees are visualized with the shape plots and the location plot. By studying these plots one may identify the regions of the Euclidean space where the probability mass is concentrated. An application of shape trees to visualize the distribution of stock index returns is presented.

11.
We consider the random fragmentation process introduced by Kolmogorov, where a particle having some mass is broken into pieces and the mass is distributed among the pieces at random in such a way that the proportions of the mass shared among different daughters are specified by some given probability distribution (the dislocation law); this is repeated recursively for all pieces. More precisely, we consider a version where the fragmentation stops when the mass of a fragment is below some given threshold, and we study the associated random tree. Dean and Majumdar found a phase transition for this process: the number of fragmentations is asymptotically normal for some dislocation laws but not for others, depending on the position of roots of a certain characteristic equation. This parallels the behavior of discrete analogues with various random trees that have been studied in computer science. We give rigorous proofs of this phase transition, and add further details. The proof uses the contraction method. We extend some previous results for recursive sequences of random variables to families of random variables with a continuous parameter; we believe that this extension has independent interest.
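The stopped fragmentation process is easy to simulate, which can serve as a sanity check when reading the asymptotic results. In this Python sketch, `dislocation` is a hypothetical callable returning the mass proportions of the daughters (they should sum to 1, and a share of exactly 1 should occur with probability 0, otherwise the loop may not terminate); a fragment is split again only while its mass is at least `threshold`. The sketch only counts fragmentation events and makes no attempt to exhibit the phase transition.

```python
import random
from statistics import mean


def count_fragmentations(mass, threshold, dislocation, rng):
    """Number of splitting events before every fragment falls below threshold."""
    count = 0
    stack = [mass]
    while stack:
        m = stack.pop()
        if m < threshold:
            continue              # fragment is frozen
        count += 1
        stack.extend(m * share for share in dislocation(rng))
    return count


def halves(rng):
    """Deterministic dislocation law: split into two equal pieces."""
    return (0.5, 0.5)


def uniform_binary(rng):
    """Random dislocation law: split at a uniform point."""
    u = rng.random()
    return (u, 1.0 - u)


rng = random.Random(0)
samples = [count_fragmentations(1.0, 1e-2, uniform_binary, rng) for _ in range(100)]
print(mean(samples))
```

With `halves` and threshold 0.1, the fragments of mass 1, 1/2, 1/4 and 1/8 split (1 + 2 + 4 + 8 = 15 events) and the pieces of mass 1/16 freeze.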

12.
The elastic generalized assignment problem (eGAP) is a natural extension of the generalized assignment problem (GAP) in which the capacities are not fixed but can be adjusted; this adjustment can be expressed by continuous variables. These variables may be unbounded or restricted by a lower or an upper bound. This paper concerns techniques for reducing several variants of eGAP to GAP, which enables us to employ standard approaches for the GAP. The result is a heuristic that can be customized to provide solutions with objective values arbitrarily close to the optimum.

13.
The range over standard deviation of a set of univariate data points is given a natural multivariate extension through the Mahalanobis distance. The problem of finding extrema of this multivariate extension of "range over standard deviation" is investigated. The supremum (maximum) is found using Lagrangian methods, and an interval is given for the infimum. The independence of optimizing the Mahalanobis distance and the multivariate extension of range is demonstrated, and connections are explored in several examples using an analogue of the "hat" matrix of linear regression.

14.
Models of complex phenomena often involve hypothetical entities called "hidden causes," which cannot be observed directly and yet play a major role in understanding those phenomena. This paper examines the computational roles of these constructs and addresses the question of whether they can be discovered from empirical observations. Causal models are treated as trees of binary random variables in which the leaves are accessible to direct observation and the internal nodes, representing hidden causes, account for inter-leaf dependencies. In probabilistic terms, every two leaves are conditionally independent given the value of some internal node between them. We show that if the mechanism that drives the visible variables is indeed tree-structured, then the topology of the tree can be uncovered uniquely by observing pairwise dependencies among the leaves. The entire tree structure, including the strengths of all internal relationships, can be reconstructed in time proportional to n log n, where n is the number of leaves.

15.
In this paper we study exact distributions of runs on directed trees. Assuming that the collection of random variables indexed by the vertices of a directed tree has a directed Markov distribution, we extend the exact distribution theory of runs from random sequences to directed trees. The distribution of the number of success runs of a specified length along the direction of the tree is derived. A consecutive-k-out-of-n:F system on a directed tree is introduced and investigated. Assuming that the lifetimes of the components are independent and identically distributed, we give the exact distribution of the lifetime of this consecutive system. The results are not only theoretical but also suitable for computation.

16.
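For a linear model in which both the covariates and the response variable are missing, empirical likelihood confidence intervals for the mean of the response variable are constructed. Simulations show that the adjusted empirical likelihood confidence intervals have good coverage and precision. This further extends the study of linear models with missing data.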

17.
A massive amount of data on individual electricity consumption is now provided by new metering technologies and smart grids. These new data are especially useful for load profiling and load modeling at different scales of the electrical network. A new methodology based on a mixture of high-dimensional regression models is used to cluster individual customers. It uncovers clusters corresponding to different regression models. Temporal information is incorporated in preparation for the next step, fitting a forecasting model in each cluster. Only the electrical signal is used: it is sliced into consecutive curves and treated as a discrete time series of curves. The models are interpreted on a real smart meter dataset of Irish customers.

18.
This article presents a method for visualizing multivariate functions. The method is based on a tree structure, called the level set tree, built from the separated parts of the level sets of a function. The method is applied to visualizing estimates of multivariate density functions. With different graphical representations of level set trees we may visualize the number and location of modes, the excess masses associated with the modes, and certain shape characteristics of the estimate. Simulation examples are presented in which projecting the data to two dimensions does not help to reveal the modes of the density, but level set trees do allow the modes to be detected. I argue that level set trees provide a useful method for exploratory data analysis.
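In one dimension the construction is easy to sketch: each connected part of the level set {x : f(x) ≥ λ} on a grid is a maximal run of grid points, and a component at a higher level is attached to the component at the next lower level that contains it (level sets are nested, so exactly one such component exists). This Python sketch is a simplified, hypothetical 1-D illustration, whereas the article targets multivariate estimates; the leaves of the resulting tree correspond to the modes resolved by the chosen levels.

```python
def level_set_components(values, level):
    """Maximal index intervals (start, end) on which values >= level."""
    comps, start = [], None
    for i, v in enumerate(values):
        if v >= level and start is None:
            start = i
        elif v < level and start is not None:
            comps.append((start, i - 1))
            start = None
    if start is not None:
        comps.append((start, len(values) - 1))
    return comps


def level_set_tree(values, levels):
    """Nodes are (level, start, end); parent maps each node to the component
    at the previous (lower) level that contains it."""
    nodes, parent, prev = [], {}, []
    for lev in sorted(levels):
        cur = [(lev, s, e) for s, e in level_set_components(values, lev)]
        for node in cur:
            for p in prev:
                if p[1] <= node[1] and node[2] <= p[2]:
                    parent[node] = p
                    break
        nodes.extend(cur)
        prev = cur
    return nodes, parent


# A bimodal function sampled on a grid.
f = [0, 1, 3, 1, 0, 2, 4, 2, 0]
nodes, parent = level_set_tree(f, [0.0, 1.5, 3.5])
leaves = [n for n in nodes if n not in set(parent.values())]
print(len(leaves), leaves)
```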

19.
We analyze the eigenvalues of the adjacency matrices of a wide variety of random trees. Using general, broadly applicable arguments based on the interlacing inequalities for the eigenvalues of a principal submatrix of a Hermitian matrix and a suitable notion of local weak convergence for an ensemble of random trees that we call probability fringe convergence, we show that the empirical spectral distributions for many random tree models converge to a deterministic (model-dependent) limit as the number of vertices goes to infinity. Moreover, the masses assigned by the empirical spectral distributions to individual points also converge in distribution to constants. We conclude for ensembles such as the linear preferential attachment models, random recursive trees, and the uniform random trees that the limiting spectral distribution has a set of atoms that is dense in the real line. We obtain lower bounds on the mass assigned to zero by the empirical spectral measures via the connection between the number of zero eigenvalues of the adjacency matrix of a tree and the cardinality of a maximal matching on the tree. In particular, we employ a simplified version of an algorithm due to Karp and Sipser to construct maximal matchings and understand their properties. Moreover, we show that the total weight of a weighted matching is asymptotically equivalent to a constant multiple of the number of vertices when the edge weights are independent, identically distributed, nonnegative random variables with finite expected value, thereby significantly extending a result obtained by Aldous and Steele in the special case of uniform random trees. We greatly generalize a celebrated result obtained by Schwenk for the uniform random trees by showing that if any ensemble converges in the probability fringe sense and a very mild further condition holds, then, with probability converging to one, the spectrum of a realization is shared by at least one other (nonisomorphic) tree. 
For the linear preferential attachment model with parameter $a > -1$, we show that for any fixed $k$, the $k$ largest eigenvalues jointly converge in distribution to a nontrivial limit when rescaled by $n^{1/(2\gamma_a)}$, where $\gamma_a = a + 2$ is the Malthusian rate of growth parameter for an associated continuous-time branching process.
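The link between zero eigenvalues and matchings rests on the classical fact that, for a tree (indeed any forest) on n vertices with matching number ν, the adjacency matrix has exactly n − 2ν zero eigenvalues. A leaf-removal rule in the spirit of Karp and Sipser finds a maximum matching on a forest: repeatedly match a leaf with its unique neighbour and delete both. The Python sketch below is a simplified illustration, not the authors' algorithm; `leaf_matching` and `tree_nullity` are hypothetical names, and the input edges must form a forest.

```python
from collections import defaultdict


def leaf_matching(n, edges):
    """Maximum matching of a forest on vertices 0..n-1 via the leaf rule."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    removed = set()
    matching = []
    while leaves:
        leaf = leaves.pop()
        if leaf in removed or len(adj[leaf]) != 1:
            continue  # stale entry: vertex already deleted or now isolated
        (mate,) = adj[leaf]
        matching.append((leaf, mate))
        for w in (leaf, mate):        # delete both matched vertices
            removed.add(w)
            for x in adj[w]:
                adj[x].discard(w)
                if x not in removed and len(adj[x]) == 1:
                    leaves.append(x)  # x has just become a leaf
            adj[w].clear()
    return matching


def tree_nullity(n, edges):
    """Number of zero eigenvalues of a forest's adjacency matrix: n - 2*nu."""
    return n - 2 * len(leaf_matching(n, edges))


print(tree_nullity(3, [(0, 1), (1, 2)]))          # path P3 -> 1
print(tree_nullity(4, [(0, 1), (0, 2), (0, 3)]))  # star K_{1,3} -> 2
```

Matching a leaf to its neighbour is always consistent with some maximum matching, and deleting both vertices leaves a forest, so on trees the leaf rule alone suffices; on general graphs Karp–Sipser needs an additional random step once no leaves remain.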

20.
Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy, and medicine. Although trees are estimated, their uncertainties are generally discarded in statistical models for tree-valued data. Here, we explicitly model the multivariate uncertainty of tree estimates. We consider both the cases where uncertainty information arises extrinsically (through covariate information) and intrinsically (through the tree estimates themselves). The latter case is applicable to any procedure for tree estimation, and thus has broad relevance to the entire field of phylogenetics. The importance of accounting for tree uncertainty in tree space is demonstrated in two case studies. In the first instance, differences between gene trees are small relative to their uncertainties, while in the second, the differences are relatively large. Our main goal is visualization of tree uncertainty, and we demonstrate advantages of our method with respect to reproducibility, speed, and preservation of topological differences compared to visualization based on multidimensional scaling. The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded. Most importantly, it is a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417