Similar Articles
20 similar articles found.
1.
Graphs are powerful and versatile data structures that can be used to represent a wide range of different types of information. In this article, we introduce a method to analyze and then visualize an important class of data described over a graph: ensembles of paths. Analysis of such path ensembles is useful in a variety of applications, in diverse fields such as transportation, computer networks, and molecular dynamics. The proposed method generalizes the concept of band depth to an ensemble of paths on a graph, which provides a center-outward ordering on the paths. This ordering is, in turn, used to construct a generalization of the conventional boxplot or whisker plot, called a path boxplot, which applies to paths on a graph. The utility of the path boxplot is demonstrated for several examples of path ensembles, including paths defined over computer networks and roads. Supplementary materials for this article are available online.
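One way to picture the generalized band depth is as a containment count: a path is deep when many pairs of other paths "cover" it. The sketch below assumes a path lies inside the band of two others when its edges are contained in their edge union; the article's actual definition may differ, and all names are ours.

```python
from itertools import combinations

def band_depth(paths):
    # Represent each path (a node sequence) as its set of directed edges.
    edge_sets = [set(zip(p, p[1:])) for p in paths]
    depths = []
    for i, p in enumerate(edge_sets):
        others = edge_sets[:i] + edge_sets[i + 1:]
        pairs = list(combinations(others, 2))
        # Assumed containment rule: the pair's edge union covers the path.
        inside = sum(1 for q, r in pairs if p <= q | r)
        depths.append(inside / len(pairs) if pairs else 0.0)
    return depths
```

Sorting paths by decreasing depth yields the center-outward ordering from which a path boxplot (median path, central band, outlier paths) can be assembled.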

2.
We propose a novel “tree-averaging” model that uses an ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for its subset. We develop an efficient estimation procedure with improved estimation strategies in both the CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplementary materials for this article are available online.

3.
The global distribution and climatology of ice clouds are among the main uncertainties in climate modeling and prediction. In order to retrieve ice cloud properties from remote sensing measurements, the scattering properties of all cloud ice particle types must be known. The discrete dipole approximation (DDA) simulates scattering of radiation by arbitrarily shaped particles and is thus suitable for cloud ice crystals. The DDA models the particle as a collection of equal dipoles on a lattice, and is computationally much more expensive than approximations restricted to more regularly shaped particles. On a single computer, the calculation for an ice particle of a specific size, for a given scattering plane at one specific wavelength, can take several days. We have ported the core routines of the scattering suite “ADDA” to the Open Computing Language (OpenCL), a framework for programming parallel devices such as PC graphics cards (graphics processing units, GPUs) or multi-core CPUs. In a typical case we achieve a speed-up on a GPU, as compared to a CPU, by a factor of 5 in double precision and a factor of 15 in single precision. Spreading the work load over multiple GPUs will allow calculating the scattering properties of even large cloud ice particles.

4.
The study of the asymptotic behavior of minimizing trajectories on the Wasserstein space 𝒫(𝕋^d) has so far been limited to the case d = 1, as all prior studies relied heavily on the isometric identification of 𝒫(𝕋) with a subset of the Hilbert space L2(0,1). No analogous isometric identification is known when d > 1. In this article we propose a new approach, intrinsic to the Wasserstein space, which allows us to prove a weak KAM theorem on 𝒫(𝕋^d), the space of probability measures on the torus, for any d ≥ 1. This space is analyzed in detail, facilitating the study of the asymptotic behavior and invariant measures associated with minimizing trajectories of a class of Lagrangians of practical importance. © 2014 Wiley Periodicals, Inc.

5.
Data often comes in the form of a point cloud sampled from an unknown compact subset of Euclidean space. The general goal of geometric inference is then to recover geometric and topological features (e.g., Betti numbers, normals) of this subset from the approximating point cloud data. The study of distance functions allows one to address many of these questions successfully. However, one of the main limitations of this framework is that it does not cope well with outliers or with background noise. In this paper, we show how to extend the framework of distance functions to overcome this problem. Replacing compact subsets by measures, we introduce a notion of distance function to a probability distribution in ℝ^d. These functions share many properties with classical distance functions, which make them suitable for inference purposes. In particular, by considering appropriate level sets of these distance functions, we show that it is possible to reconstruct offsets of sampled shapes with topological guarantees even in the presence of outliers. Moreover, in settings where empirical measures are considered, these functions can be easily evaluated, making them of particular practical interest.
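The last point is concrete: for an empirical measure on n sample points with mass parameter m = k/n, the distance to the measure at a query point reduces to the root mean squared distance to its k nearest sample points. A minimal numpy sketch (function name ours):

```python
import numpy as np

def distance_to_measure(x, points, k):
    # Squared distances from x to every sample point, k smallest kept.
    d2 = np.sort(((points - x) ** 2).sum(axis=1))[:k]
    # Root mean squared distance to the k nearest neighbours.
    return float(np.sqrt(d2.mean()))
```

Unlike the nearest-neighbour distance (k = 1), this quantity is robust: a single outlier among the samples only perturbs one of the k averaged terms.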

6.
In the presence of huge losses from unsuccessful new product introductions, companies often seek forecast information from various sources. As the information can be costly, companies need to determine how much effort to put into acquiring it. Such a decision is strategically important because an insufficient investment may cause lack of knowledge of product profitability, which in turn may lead to introducing a loss-making product or scrapping a potentially profitable one. In this paper, we use decision analytical models to study information acquisition for new product introduction. Specifically, we consider a decision maker (DM) who, prior to introducing a new product, can purchase forecasts and use the information to update his knowledge of the market demand. We analyze and compare two approaches: the first is to determine the total amount of forecasts to purchase all at once; the second is to purchase forecasts sequentially and, based on the purchased forecasts, determine whether they are informative enough for making an introduction decision or an additional forecast is needed. We present dynamic programming formulations for both approaches and derive the optimal policies. Via a numerical study, we find that the second approach, i.e., purchasing forecasts sequentially, can generate a significant profit advantage over the first one when (1) the cost of acquiring forecasts is neither too high nor too low, (2) the precision of the forecasts is of a moderate level, and (3) the profit margin of the new product is small.
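The sequential approach can be caricatured with a conjugate-normal belief about demand: buy one forecast at a time and stop as soon as the posterior mean is decisively above or below a break-even threshold. Everything below (the names, the two-sigma stopping rule) is an illustrative assumption, not the paper's model.

```python
def update_normal(mu, var, y, obs_var):
    # Conjugate normal update of the demand belief after one forecast y.
    post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
    post_mu = post_var * (mu / var + y / obs_var)
    return post_mu, post_var

def sequential_purchase(forecasts, mu, var, obs_var, cost, threshold):
    # Buy forecasts one at a time; stop once the belief is decisive,
    # i.e. the mean sits more than two posterior sigmas from break-even.
    spent = 0.0
    for y in forecasts:
        if abs(mu - threshold) > 2.0 * var ** 0.5:
            break
        mu, var = update_normal(mu, var, y, obs_var)
        spent += cost
    return mu > threshold, spent
```

The all-at-once approach fixes the number of forecasts before seeing any of them; the sketch above instead stops buying as soon as additional information cannot plausibly change the decision, which is where the profit advantage in the numerical study comes from.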

7.
The Kantorovich–Rubinstein theorem provides a formula for the Wasserstein metric W1 on the space of regular probability Borel measures on a compact metric space. Dudley and de Acosta generalized the theorem to measures on separable metric spaces. Kellerer, using his own work on Monge–Kantorovich duality, obtained a rapid proof for Radon measures on an arbitrary metric space. The object of the present expository article is to give an account of Kellerer’s generalization of the Kantorovich–Rubinstein theorem, together with related matters. It transpires that a more elementary version of Monge–Kantorovich duality than that used by Kellerer suffices for present purposes. The fundamental relations that provide two characterizations of the Wasserstein metric are obtained directly, without the need for prior demonstration of density or duality theorems. The latter are proved, however, and used in the characterization of optimal measures and functions for the Kantorovich–Rubinstein linear programme. A formula of Dobrushin is proved.
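For orientation, the two characterizations of the Wasserstein metric referred to are the standard primal and dual forms:

```latex
W_1(\mu,\nu)
  \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int d(x,y)\,\mathrm{d}\pi(x,y)
  \;=\; \sup_{\operatorname{Lip}(f)\le 1}
        \left( \int f\,\mathrm{d}\mu - \int f\,\mathrm{d}\nu \right),
```

where Π(μ,ν) denotes the couplings of μ and ν, i.e., measures on the product space with marginals μ and ν; the equality of the infimum and the supremum is the Kantorovich–Rubinstein duality.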

8.
S. L. Dance, D. M. Livings, N. K. Nichols, PAMM, 2007, 7(1): 1026505–1026506
Ensemble square root filters are a method of data assimilation, where model forecasts are combined with observations to produce an improved state estimate, or analysis. There are a number of different algorithms in the literature and it is not clear which of these is the best for any given application. This work shows that in some implementations there can be a systematic bias in the analysis ensemble mean and consequently an accompanying shortfall in the spread of the analysis ensemble as expressed by the ensemble covariance matrix. We have established a set of necessary and sufficient conditions for the scheme to be unbiased. While these conditions are not a cure-all and cannot deal with independent sources of bias such as model and observation errors, they should be useful to designers of ensemble square root filters in the future. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
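In the scalar case the unbiasedness requirement is easy to see: the analysis perturbations must average to zero, so the ensemble mean coincides with the Kalman analysis mean and the ensemble variance matches the Kalman analysis variance. A toy sketch under these assumptions (not any of the published algorithms):

```python
import numpy as np

def esrf_analysis(ensemble, y, obs_var):
    # Scalar ensemble square root filter: Kalman update of the mean,
    # then rescale the forecast perturbations so the analysis variance
    # equals (1 - k) * pb. Zero-mean perturbations keep it unbiased.
    xb = ensemble.mean()
    pb = ensemble.var(ddof=1)
    k = pb / (pb + obs_var)
    xa = xb + k * (y - xb)
    alpha = np.sqrt(1.0 - k)       # square root factor (scalar, H = 1)
    return xa + alpha * (ensemble - xb)
```

A biased implementation would shift the perturbation mean away from zero, which drags the analysis mean off the Kalman value and, as the abstract notes, shrinks the apparent ensemble spread.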

9.
We study aspects of the Wasserstein distance in the context of self-similar measures. Computing this distance between two measures involves minimising certain moment integrals over the space of couplings, which are measures on the product space with the original measures as prescribed marginals. We focus our attention on self-similar measures associated to equicontractive iterated function systems consisting of two maps on the unit interval and satisfying the open set condition. We are particularly interested in understanding the restricted family of self-similar couplings, and our main achievement is the explicit computation of the first and second moment integrals for such couplings. We show that this family is enough to yield an explicit formula for the first Wasserstein distance, and we provide non-trivial upper and lower bounds for the second Wasserstein distance for these self-similar measures.
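For measures on the unit interval the first Wasserstein distance also admits the quantile-coupling formula W1(μ,ν) = ∫₀¹ |F_μ⁻¹(t) − F_ν⁻¹(t)| dt; for two equal-size samples this is simply the mean gap between sorted values, which is handy for checking explicit formulas numerically. A sketch (names ours):

```python
import numpy as np

def w1_sorted(x, y):
    # Optimal 1-D coupling matches i-th smallest to i-th smallest.
    return float(np.abs(np.sort(x) - np.sort(y)).mean())
```

Sampling two self-similar measures and feeding the samples to this function gives a Monte Carlo estimate against which closed-form W1 expressions can be validated.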

10.
For more than a decade, the number of research works that deal with ensemble methods applied to bankruptcy prediction has been increasing. Ensemble techniques present characteristics that, in most situations, allow them to achieve better forecasts than single models. However, the difference between the performance of an ensemble and that of its base classifier, and likewise between the performances of different ensembles, is often small. We therefore studied how to design an ensemble method that can achieve better forecasts than traditional ensembles. It relies on quantifying data that characterize the financial situation of a sample of companies with a set of self-organizing neural networks, where each network has two main characteristics: its size is chosen randomly, and the variables used to estimate its weights are selected by a criterion that ensures the fit between the structure of the network and the data used during learning. The results of our study show that this technique significantly reduces both the type I and type II errors obtained with conventional methods.

11.
Forecasting methods are routinely employed to predict the outcome of competitive events (CEs) and to shed light on the factors that influence participants’ winning prospects (e.g., in sports events or political elections). Combining statistical models’ forecasts, shown to be highly successful in other settings, has been neglected in CE prediction. Two particular difficulties arise when developing model-based composite forecasts of CE outcomes: the intensity of rivalry among contestants, and the strength/diversity trade-off among individual models. To overcome these challenges, we propose a range of surrogate measures of event outcome to construct a heterogeneous set of base forecasts. To effectively extract the complementary information concealed within these predictions, we develop a novel pooling mechanism which accounts for competition among contestants: a stacking paradigm integrating conditional logit regression and log-likelihood-ratio-based forecast selection. Empirical results using data related to horseracing events demonstrate that: (i) base model strength and diversity are important when combining model-based predictions for CEs; (ii) average-based pooling, commonly employed elsewhere, may not be appropriate for CEs (because it focuses exclusively on strength); and (iii) the proposed stacking ensemble provides statistically and economically accurate forecasts. These results have important implications for regulators of betting markets associated with CEs, and in particular for the accurate assessment of market efficiency.
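The conditional logit component handles the rivalry structure directly: each contestant in a race gets a linear score, and scores are turned into win probabilities by a softmax taken within the race, so the probabilities compete for a fixed unit of mass. A standard formulation, sketched here with names of our own choosing:

```python
import numpy as np

def win_probabilities(scores):
    # Conditional logit within one event: softmax over the contestants.
    # Subtracting the max is a standard numerical-stability shift.
    z = np.exp(np.asarray(scores, dtype=float) - np.max(scores))
    return z / z.sum()
```

Average-based pooling would combine each contestant's forecasts in isolation; normalizing within the event, as here, is what lets a stacked combination respect the constraint that exactly one contestant wins.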

12.
There exist many simple tools for jointly capturing variability and incomplete information by means of uncertainty representations. Among them are random sets, possibility distributions, probability intervals, and, more recently, Ferson’s p-boxes and Neumaier’s clouds, both defined by pairs of possibility distributions. In the companion paper, we extensively studied a generalized form of p-box and situated it with respect to other models. This paper focuses on the links between clouds and other representations. Generalized p-boxes are shown to be clouds with comonotonic distributions. In general, clouds cannot always be represented by random sets, in fact not even by two-monotone (convex) capacities.

13.
We investigate the optimal transportation problem on configuration space for the quadratic cost. It is shown that, as usual, provided that the corresponding Wasserstein distance is finite, there exists a unique optimal measure, and that this measure is supported by the graph of the derivative (in the sense of the Malliavin calculus) of a “concave” (in a sense defined below) function. For finite point processes, we give a necessary and sufficient condition for the Wasserstein distance to be finite.

14.
High angular resolution diffusion imaging (HARDI) has recently been of great interest in mapping the orientation of intravoxel crossing fibers; such orientation information allows one to infer the connectivity patterns prevalent among different brain regions and possible changes in such connectivity over time for various neurodegenerative and neuropsychiatric diseases. The aim of this article is to propose a penalized multiscale adaptive regression model (PMARM) framework to spatially and adaptively infer the orientation distribution function (ODF) of water diffusion in regions with complex fiber configurations. In PMARM, we reformulate HARDI image reconstruction as a weighted regularized least-squares regression (WRLSR) problem. Similarity and distance weights are introduced to account for the spatial smoothness of HARDI, while preserving its unknown discontinuities (e.g., edges between white matter and gray matter). An L1 penalty function is introduced to ensure sparse ODF solutions, while a scaled L1 weighted estimator is calculated to correct the bias introduced by the L1 penalty at each voxel. In PMARM, we integrate the multiscale adaptive regression models, the propagation-separation method, and Lasso (least absolute shrinkage and selection operator) to adaptively estimate ODFs across voxels. Experimental results indicate that PMARM reduces angle detection errors in fiber-crossing areas and provides more accurate reconstruction than standard voxel-wise methods. Supplementary materials for this article are available online.
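The L1-penalized least-squares step at each voxel is a standard Lasso problem, min_x ½‖Ax − b‖² + λ‖x‖₁. A minimal iterative soft-thresholding (ISTA) sketch in numpy illustrates the sparsity mechanism; this is a generic solver, not the PMARM code, and the names are ours:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1: shrink toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, b, lam, steps=2000):
    # ISTA: gradient step on the quadratic term, then soft-threshold.
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
    return x
```

The soft-threshold step is what zeroes out small ODF coefficients; the bias it introduces on the surviving coefficients (visible below as 3.0 shrinking to 2.0) is exactly what the scaled weighted estimator in the abstract is designed to correct.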

15.
When using linguistic approaches to solve decision problems, we need linguistic representation models. The symbolic model, the 2-tuple fuzzy linguistic representation model, and the continuous linguistic model are three existing linguistic representation models based on position indexes. Together with these three linguistic models, the corresponding ordered weighted averaging operators, such as the linguistic ordered weighted averaging operator, the 2-tuple ordered weighted averaging operator, and the extended ordered weighted averaging operator, have been developed, respectively. In this paper, we analyze the internal relationship among these operators, and propose a consensus operator under the continuous linguistic model (or the 2-tuple fuzzy linguistic representation model). The proposed consensus operator is based on the ordered weighted averaging operator and deviation measures. Some desired properties of the consensus operator are also presented. In particular, the consensus operator provides an alternative consensus model for group decision making. This consensus model preserves the original preference information given by the decision makers as much as possible, and supports the consensus process automatically, without a moderator.
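The common core of all these operators is the ordered weighted average: weights attach to rank positions in the descending ordering of the arguments, not to particular arguments. A minimal sketch (linguistic labels would first be mapped to their position indexes):

```python
def owa(values, weights):
    # Weights apply to positions in the descending ordering of values,
    # so owa([1, 3, 2], w) and owa([3, 2, 1], w) give the same result.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))
```

With weights (1, 0, …, 0) this reduces to the maximum, with (0, …, 0, 1) to the minimum, and with uniform weights to the plain mean, which is why the OWA family can interpolate between optimistic and pessimistic aggregation attitudes.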

16.
This article is devoted to providing a theoretical underpinning for ensemble forecasting with rapid fluctuations in body forcing and in boundary conditions. Ensemble averaging principles are proved under suitable “mixing” conditions on random boundary conditions and on random body forcing. The ensemble averaged model is a nonlinear stochastic partial differential equation, with the deviation process (i.e., the approximation error process) quantified as the solution of a linear stochastic partial differential equation.

17.
Breaking of ensemble equivalence between the microcanonical ensemble and the canonical ensemble may occur for random graphs whose size tends to infinity, and is signaled by a non-zero specific relative entropy between the two ensembles. In Garlaschelli et al. (2017) and Garlaschelli et al. (0000) it was shown that breaking occurs when the constraint is put on the degree sequence (configuration model). It is not known what the effect on the relative entropy is when the number of constraints is reduced, i.e., when only part of the nodes are constrained in their degree (and the remaining nodes are left unconstrained). Intuitively, the relative entropy is expected to decrease. However, this is not a trivial issue, because when constraints are removed, both the microcanonical ensemble and the canonical ensemble change. In this paper, a formula for the relative entropy valid for generic discrete random structures, recently formulated by Squartini and Garlaschelli, is used to prove that the relative entropy is monotone in the number of constraints when the constraint is on the degrees of the nodes. It is further shown that the expression for the relative entropy corresponds, in the dense regime, to the degrees in the microcanonical ensemble being asymptotically multivariate Dirac and in the canonical ensemble being asymptotically Gaussian.
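For orientation: the Squartini–Garlaschelli observation is that, because the canonical probability depends on a graph only through the value of the constraint, the relative entropy collapses to a single ratio evaluated at any graph G* realizing the hard constraint (notation ours):

```latex
S(P_{\mathrm{mic}} \,\|\, P_{\mathrm{can}})
  \;=\; \sum_{G} P_{\mathrm{mic}}(G)\,
        \log \frac{P_{\mathrm{mic}}(G)}{P_{\mathrm{can}}(G)}
  \;=\; \log \frac{P_{\mathrm{mic}}(G^{*})}{P_{\mathrm{can}}(G^{*})} .
```

Ensemble equivalence then amounts to this quantity growing slower than the number of nodes, i.e., the specific relative entropy vanishing in the limit.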

18.
In this paper, dimension-free Harnack inequalities are established on infinite-dimensional spaces. More precisely, we establish Harnack inequalities for the heat semigroup on the based loop group and for the Ornstein–Uhlenbeck semigroup on the abstract Wiener space. As an application, we establish the HWI inequality on the abstract Wiener space, which combines three important quantities in one inequality: the relative entropy “H”, the Wasserstein distance “W”, and the Fisher information “I”.

19.
Discrete approximation, which has been the prevailing scheme in stochastic programming in the past decade, has recently been extended to distributionally robust optimization (DRO). In this paper, we conduct a rigorous quantitative stability analysis of discrete approximation schemes for DRO, measuring the approximation error in terms of the discretization sample size. For the ambiguity set defined through equality and inequality moment conditions, we quantify the discrepancy between the discretized ambiguity sets and the original set with respect to the Wasserstein metric. To establish the quantitative convergence, we develop a Hoffman error bound theory with Hoffman constant calculation criteria in an infinite-dimensional space, which can be regarded as a byproduct of independent interest. For the ambiguity set defined by a Wasserstein ball, and by moment conditions combined with a Wasserstein ball, we present a similar quantitative stability analysis by taking full advantage of the convexity inherent in the Wasserstein metric. Efficient numerical methods for solving discrete approximation DRO problems with thousands of samples are also designed. In particular, we reformulate different types of discrete approximation problems into a class of saddle point problems with completely separable structures. The stochastic primal-dual hybrid gradient (PDHG) algorithm, in which each iteration updates a random subset of the sampled variables, is then applicable as a solution method for the reformulated saddle point problems. Some preliminary numerical tests are reported.

20.
A random forest (RF) predictor is an ensemble of individual tree predictors. As part of their construction, RF predictors naturally lead to a dissimilarity measure between the observations. One can also define an RF dissimilarity measure between unlabeled data: the idea is to construct an RF predictor that distinguishes the “observed” data from suitably generated synthetic data. The observed data are the original unlabeled data and the synthetic data are drawn from a reference distribution. Here we describe the properties of the RF dissimilarity and make recommendations on how to use it in practice.

An RF dissimilarity can be attractive because it handles mixed variable types well, is invariant to monotonic transformations of the input variables, and is robust to outlying observations. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection; for example, the Addcl 1 RF dissimilarity weighs the contribution of each variable according to how dependent it is on other variables.

We find that the RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions. In this application, biologically meaningful clusters can often be described with simple thresholding rules.
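The two ingredients above can be sketched compactly: synthetic contrast data drawn from the product of the empirical marginals (the Addcl-1 idea of destroying dependence while keeping marginals), and a dissimilarity built from how often a trained forest routes two observations to the same leaf. Function names are ours, and the leaf assignments are assumed to come from whatever forest implementation is in use:

```python
import numpy as np

def synthetic_addcl1(X, rng):
    # Contrast data: resample each column independently from its own
    # empirical marginal, which destroys between-variable dependence.
    return np.column_stack(
        [rng.choice(X[:, j], size=len(X)) for j in range(X.shape[1])]
    )

def rf_dissimilarity(leaf_ids):
    # leaf_ids: (n_samples, n_trees) array giving each sample's leaf
    # index in every tree of the trained forest.
    n, t = leaf_ids.shape
    prox = np.zeros((n, n))
    for k in range(t):
        col = leaf_ids[:, k]
        prox += (col[:, None] == col[None, :])
    # Proximity = fraction of trees placing the pair in the same leaf;
    # a common dissimilarity transform is sqrt(1 - proximity).
    return np.sqrt(1.0 - prox / t)
```

The resulting dissimilarity matrix can be fed to any distance-based clustering or multidimensional scaling routine, which is how the tumor-marker clusters in the application are obtained.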


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号