Similar literature
1.
This article introduces graphical tools for visualizing multivariate functions, specializing to the case of visualizing multivariate density estimates. We visualize a density estimate by visualizing a series of its level sets. From each connected part of a level set a shape tree is formed. A shape tree is a tree whose nodes are associated with regions of the level set. With the help of a shape tree we define a transformation of a multivariate set to a univariate function. The shape trees are visualized with the shape plots and the location plot. By studying these plots one may identify the regions of the Euclidean space where the probability mass is concentrated. An application of shape trees to visualize the distribution of stock index returns is presented.

2.
The mode tree of Minnotte and Scott provides a valuable method of investigating features such as modes and bumps in an unknown density. By examining kernel density estimates for a range of bandwidths, we can learn a lot about the structure of a data set. Unfortunately, the basic mode tree can be strongly affected by small changes in the data, and gives no way to differentiate between important modes and those caused, for example, by outliers. The mode forest overcomes these difficulties by looking simultaneously at a large collection of mode trees, all based on some variation of the original data, by means such as resampling or jittering. The resulting graphic tool is both visually appealing and informative.
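
The idea of examining kernel density estimates over a range of bandwidths can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a fixed evaluation grid, and made-up toy data; the function name `kde_modes` and the chosen bandwidths are hypothetical and are not taken from the article. It only records where the local maxima fall at each bandwidth, which is the raw material a mode tree would plot.

```python
import numpy as np

def kde_modes(data, bandwidth, grid):
    """Return grid locations of local maxima of a Gaussian kernel density estimate."""
    data = np.asarray(data, dtype=float)
    # Evaluate the Gaussian KDE at every grid point.
    z = (grid[:, None] - data[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * bandwidth * np.sqrt(2 * np.pi))
    # A grid point counts as a mode if it is strictly higher than both neighbours.
    interior = density[1:-1]
    is_mode = (interior > density[:-2]) & (interior > density[2:])
    return grid[1:-1][is_mode]

# Hypothetical toy data: two tight groups of three observations each.
data = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
grid = np.linspace(-4.0, 4.0, 4001)

# Mode locations for a decreasing sequence of bandwidths.
mode_tree = {h: kde_modes(data, h, grid) for h in (5.0, 0.5, 0.02)}
for h, modes in mode_tree.items():
    print(f"h={h}: {len(modes)} mode(s)")
```

A mode forest, as described above, would repeat this computation on resampled or jittered copies of `data` and overlay the results.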

3.
Estimating phylogenetic trees is an important problem in evolutionary biology, environmental policy, and medicine. Although trees are estimated, their uncertainties are generally discarded in statistical models for tree-valued data. Here, we explicitly model the multivariate uncertainty of tree estimates. We consider both the cases where uncertainty information arises extrinsically (through covariate information) and intrinsically (through the tree estimates themselves). The latter case is applicable to any procedure for tree estimation, and thus has broad relevance to the entire field of phylogenetics. The importance of accounting for tree uncertainty in tree space is demonstrated in two case studies. In the first instance, differences between gene trees are small relative to their uncertainties, while in the second, the differences are relatively large. Our main goal is visualization of tree uncertainty, and we demonstrate advantages of our method with respect to reproducibility, speed, and preservation of topological differences compared to visualization based on multidimensional scaling. The proposal highlights that phylogenetic trees are estimated in an extremely high-dimensional space, resulting in uncertainty information that cannot be discarded. Most importantly, it is a method that allows biologists to diagnose whether differences between gene trees are biologically meaningful or due to uncertainty in estimation.

4.
5.
The goal of clustering is to detect the presence of distinct groups in a dataset and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes and assign each observation to the domain of attraction of a mode. The modal structure of a density is summarized by its cluster tree; modes of the density correspond to leaves of the cluster tree. Estimating the cluster tree is the primary goal of nonparametric cluster analysis. We adopt a plug-in approach to cluster tree estimation: estimate the cluster tree of the feature density by the cluster tree of a density estimate. For some density estimates the cluster tree can be computed exactly; for others we have to be content with an approximation. We present a graph-based method that can approximate the cluster tree of any density estimate. Density estimates tend to have spurious modes caused by sampling variability, leading to spurious branches in the graph cluster tree. We propose excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent. Excess mass can be used as a guide for pruning the graph cluster tree. We point out mathematical and algorithmic connections to single linkage clustering and illustrate our approach on several examples. Supplemental materials for the article, including an R package implementing generalized single linkage clustering, all datasets used in the examples, and R code producing the figures and numerical results, are available online.
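
A cluster tree is commonly described through the connected components of the upper level sets of a density: a branch splits at the level where one component breaks into two. The sketch below is a one-dimensional illustration under that assumption, not the graph-based method of the article. The density, the grid, the levels, and the name `level_set_components` are all hypothetical.

```python
import numpy as np

def level_set_components(density, level):
    """Return (start, end) index pairs of maximal runs where density >= level."""
    above = np.concatenate(([False], density >= level, [False]))
    # Each switch between False and True marks the start or end of a run.
    changes = np.flatnonzero(above[1:] != above[:-1])
    starts, ends = changes[0::2], changes[1::2] - 1
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical density estimate on a grid: an equal mixture of two Gaussian bumps.
x = np.linspace(-5.0, 5.0, 1001)
density = 0.5 * np.exp(-0.5 * (x + 2) ** 2) + 0.5 * np.exp(-0.5 * (x - 2) ** 2)
density /= density.sum() * (x[1] - x[0])  # normalize numerically on the grid

# Low level: one component; above the valley: two; above both peaks: none.
levels = [0.01, 0.15, 0.25]
components = {lam: level_set_components(density, lam) for lam in levels}
for lam in levels:
    print(f"level {lam}: {len(components[lam])} component(s)")
```

In more than one dimension the runs would be replaced by connected components of a graph on the observations, which is where the link to single linkage clustering mentioned above comes in.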

6.
The analysis of local changes in sequence data is of interest for various applications such as the segmentation of DNA and other genetic sequences, or financial data sequences. Patterns of change that can be characterized as local jump change or slope change are of special interest. We propose simple graphical tools to visualize such patterns of local change. The concept of mode trees, developed for the visualization of local patterns in densities, is adapted to visualize patterns of local change as a function of a threshold parameter by means of a change tree. The simultaneous visualization of scale effects, in analogy to SiZer, motivates another graphical device, the mutagram. We illustrate these concepts with several sets of sequence data.

7.
This article introduces a classification tree algorithm that can simultaneously reduce tree size, improve class prediction, and enhance data visualization. We accomplish this by fitting a bivariate linear discriminant model to the data in each node. Standard algorithms can produce fairly large tree structures because they employ a very simple node model, wherein the entire partition associated with a node is assigned to one class. We reduce the size of our trees by letting the discriminant models share part of the data complexity. Being themselves classifiers, the discriminant models can also help to improve prediction accuracy. Finally, because the discriminant models use only two predictor variables at a time, their effects are easily visualized by means of two-dimensional plots. Our algorithm does not simply fit discriminant models to the terminal nodes of a pruned tree, as this does not reduce the size of the tree. Instead, discriminant modeling is carried out in all phases of tree growth and the misclassification costs of the node models are explicitly used to prune the tree. Our algorithm is also distinct from the “linear combination split” algorithms that partition the data space with arbitrarily oriented hyperplanes. We use axis-orthogonal splits to preserve the interpretability of the tree structures. An extensive empirical study with real datasets shows that, in general, our algorithm has better prediction power than many other tree or nontree algorithms.

8.
We study multi-pattern matching in a scenario where the pattern set is to be matched to several texts and hence indexing the pattern set is affordable. Such scenarios arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are represented in the sample and in which quantity. We develop a generic search method that exploits bidirectional indexes both for the pattern set and texts, and analyze the best- and worst-case running time of the method on worst-case text. We show that finding the instance of the search method with minimum best-case running time on worst-case text is NP-hard. The positive result is that an instance with a logarithmic-factor approximation to the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees. Lastly, we provide a practical implementation of this approach. We show that one can obtain a 3-fold speedup over the basic scenario of searching each pattern independently with data sets typical in high-throughput DNA sequencing.

9.
We present a simple, efficient, and computationally cheap sampling method for exploring an un-normalized multivariate density on ℝ^d, such as a posterior density, called the Polya tree sampler. The algorithm constructs an independent proposal based on an approximation of the target density. The approximation is built from a set of (initial) support points - data that act as parameters for the approximation - and the predictive density of a finite multivariate Polya tree. In an initial "warming-up" phase, the support points are iteratively relocated to regions of higher support under the target distribution to minimize the distance between the target distribution and the Polya tree predictive distribution. In the "sampling" phase, samples from the final approximating mixture of finite Polya trees are used as candidates which are accepted with a standard Metropolis-Hastings acceptance probability. Several illustrations are presented, including comparisons of the proposed approach to Metropolis-within-Gibbs and the delayed rejection adaptive Metropolis algorithm.
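
The "sampling" phase described above is an instance of Metropolis-Hastings with an independence proposal. The following minimal sketch shows only that acceptance step. It substitutes a fixed Gaussian proposal for the Polya tree approximation, which is not constructed here, and the one-dimensional target, the proposal scale, and all names are illustrative assumptions.

```python
import math
import random

def log_target(x):
    """Un-normalized log density of the target; a N(1, 1) shape chosen for illustration."""
    return -0.5 * (x - 1.0) ** 2

PROPOSAL_SD = 2.0  # hypothetical fixed proposal N(0, 2^2), standing in for the Polya tree approximation

def log_proposal(x):
    """Log density of the proposal, up to an additive constant."""
    return -0.5 * (x / PROPOSAL_SD) ** 2

def independence_mh(n_samples, seed=0):
    rng = random.Random(seed)
    x = 0.0
    chain = []
    for _ in range(n_samples):
        y = rng.gauss(0.0, PROPOSAL_SD)  # candidate drawn independently of the current state
        # Log of the Metropolis-Hastings ratio for an independence proposal:
        # [target(y) * proposal(x)] / [target(x) * proposal(y)]
        log_ratio = (log_target(y) - log_target(x)) + (log_proposal(x) - log_proposal(y))
        if math.log(1.0 - rng.random()) < log_ratio:
            x = y  # accept the candidate
        chain.append(x)  # on rejection the current state is repeated
    return chain

chain = independence_mh(20000)
sample_mean = sum(chain) / len(chain)
sample_var = sum((v - sample_mean) ** 2 for v in chain) / len(chain)
print(f"mean ~ {sample_mean:.3f}, variance ~ {sample_var:.3f}")
```

Because the candidate does not depend on the current state, the closer the proposal is to the target, the closer the acceptance rate gets to one, which is the motivation for the warming-up phase.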

10.
Multidimensional multivariate data have been studied in different areas for quite some time. Commonly, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records that exhibit correlations between dimensions or variables. We propose a visualization method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. For visualization purposes, we propose a method to project the multidimensional clusters to a 2D or 3D layout. The projection method uses an optimized star coordinates layout. The optimization procedure minimizes the overlap of projected clusters and maximally maintains the cluster shapes, compactness, and distribution. The star coordinate visualization allows for an interactive analysis of the distribution of clusters and comprehension of the relations between clusters and the original dimensions. Clusters are visualized using nested sequences of density level sets, leading to a quantitative understanding of information content, patterns, and relationships.

11.
Spanning tree problems defined in a preference-based environment are addressed. In this approach, optimality conditions for the minimum-weight spanning tree problem (MST) are generalized for use with other, more general preference orders. The main goal of this paper is to determine which properties of the preference relations are sufficient to assure that the set of ‘most-preferred’ trees is the set of spanning trees verifying the optimality conditions. Finally, algorithms for the construction of the set of spanning trees fulfilling the optimality conditions are designed, improving the methods in previous papers.

12.
The aim of this paper is to present a method for selecting the optimal tree among the possible trees that can be generated starting from a data set. A quantitative criterion is used, formed as a linear combination of the quality measurements of the tree, namely, resubstitution error and linearity. The application of the method leads to a succession of optimal trees, in such a way that an element of the succession is associated with each possible value of the linear combination's parameter α.

13.
We introduce methods for visualization of data structured along trees, especially hierarchically structured collections of time series. To this end, we identify questions that often emerge when working with hierarchical data and provide an R package to simplify their investigation. Our key contribution is the adaptation of the visualization principles of focus-plus-context and linking to the study of tree-structured data. Our motivating application is to the analysis of bacterial time series, where an evolutionary tree relating bacteria is available a priori. However, we have identified common problem types where, if a tree is not directly available, it can be constructed from data and then studied using our techniques. We perform detailed case studies to describe the alternative use cases, interpretations, and utility of the proposed visualization methods.

14.
On spanning tree problems with multiple objectives (total citations: 4; self-citations: 0; citations by others: 4)
We investigate two versions of multiple objective minimum spanning tree problems defined on a network with vectorial weights. First, we want to minimize the maximum of Q linear objective functions taken over the set of all spanning trees (max-linear spanning tree problem, ML-ST). Secondly, we look for efficient spanning trees (multi-criteria spanning tree problem, MC-ST). Problem ML-ST is shown to be NP-complete. An exact algorithm which is based on ranking is presented. The procedure can also be used as an approximation scheme. For solving the bicriterion MC-ST, which in the worst case may have an exponential number of efficient trees, a two-phase procedure is presented. Based on the computation of extremal efficient spanning trees we use neighbourhood search to determine a sequence of solutions with the property that the distance between two consecutive solutions is less than a given accuracy. Partially supported by Deutsche Forschungsgemeinschaft and HCM Contract no. ERBCHRXCT 930087. Partially supported by Alexander von Humboldt-Stiftung.
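
The two problem versions can be made concrete on a tiny network. The sketch below enumerates all spanning trees by brute force, which is feasible only for very small graphs and is not the ranking-based or two-phase procedure of the paper. The triangle network, its weights, and all function names are hypothetical.

```python
from itertools import combinations

def is_spanning_tree(n_nodes, edges):
    """Check that `edges` (n_nodes - 1 of them) connect all nodes without a cycle."""
    parent = list(range(n_nodes))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # this edge would close a cycle
            return False
        parent[ru] = rv
    return True  # n_nodes - 1 acyclic edges imply connectivity

def tree_cost(weights, edges):
    """Component-wise sum of the vector weights of the chosen edges."""
    return tuple(sum(col) for col in zip(*(weights[e] for e in edges)))

def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# Hypothetical triangle network with Q = 2 objectives per edge.
weights = {(0, 1): (1, 4), (1, 2): (2, 2), (0, 2): (4, 1)}
n_nodes = 3

trees = [t for t in combinations(weights, n_nodes - 1) if is_spanning_tree(n_nodes, t)]
costs = {t: tree_cost(weights, t) for t in trees}

# ML-ST: minimize the maximum of the Q linear objectives.
ml_tree = min(trees, key=lambda t: max(costs[t]))
# MC-ST: keep every tree whose cost vector is not dominated by another tree.
efficient = [t for t in trees if not any(dominates(costs[s], costs[t]) for s in trees)]

print("ML-ST tree:", ml_tree, "cost:", costs[ml_tree])
print("efficient trees:", efficient)
```

In this toy instance every spanning tree is efficient, while only one of them minimizes the worst objective, so the two problem versions generally have different answers.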

15.
16.
17.
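Effects of different stand densities on the growth of middle-aged Chinese fir stands (total citations: 2; self-citations: 0; citations by others: 2)
Using a randomized block experimental design, this paper studies the effects of stand density and year on the volume, tree height, diameter at breast height (DBH), and crown width of middle-aged Chinese fir stands. Multivariate analysis of covariance, quadratic regression modeling, and partial correlation analysis were carried out with the SAS statistical software. The results show that the number of stems and DBH are each positively correlated with stand volume, whereas the number of stems and DBH are negatively correlated with each other. Stand density has a highly significant effect on the growth of middle-aged Chinese fir stands, and a density of 70 trees per mu is the most favorable for the growth of Chinese fir.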

18.
This paper provides a method of finding the optimal expansion process and discusses the marginal analysis for expansion of the competence set when the cost functions are asymmetric. The concept of a tree expansion process is introduced, and a method of finding the optimal tree expansion process is given. The paper also shows a way to identify the optimal competence set when both the expected return and cost are considered.

19.
Increasingly, biologists are constructing evolutionary trees on large numbers of overlapping sets of taxa, and then combining them into a ‘supertree’ that classifies all the taxa. In this paper, we ask how much coverage of the total set of taxa is required by these subsets in order to ensure that we have enough information to reconstruct the supertree uniquely. We describe two results: a combinatorial characterization of the covering subsets that ensures that at most one supertree can be constructed from the smaller trees (whichever trees these may be), and a more liberal analysis that asks only that the supertree is highly likely to be uniquely specified by the tree structure on the covering subsets.

20.
Fault tree analysis (FTA) is a powerful technique that is widely used for evaluating system safety and reliability. It can be used to assess the effects of combinations of failures on system behaviour but is unable to capture sequence-dependent dynamic behaviour. A number of extensions to fault trees have been proposed to overcome this limitation. Pandora, one such extension, introduces temporal gates and temporal laws to allow dynamic analysis of temporal fault trees (TFTs). It can be easily integrated in model-based design and analysis techniques. The quantitative evaluation of failure probability in Pandora TFTs is performed using exact probabilistic data about component failures. However, exact data can often be difficult to obtain. In this paper, we propose a method that combines expert elicitation and fuzzy set theory with Pandora TFTs to enable dynamic analysis of complex systems with limited or absent exact quantitative data. This gives Pandora the ability to perform quantitative analysis under uncertainty, which further increases its potential utility in the emerging field of model-based design and dependability analysis. The method has been demonstrated by applying it to a fault-tolerant fuel distribution system of a ship, and the results are compared with the results obtained by other existing techniques.
