Similar Articles
1.
SiZer (significant zero crossing of the derivatives) is a multiscale smoothing method for exploring trends, maxima, and minima in data. In this article, a regression spline version of SiZer is proposed in a nonparametric regression setting via the fiducial method. The number of knots for spline interpolation is used as the scale parameter of the new SiZer, which controls the smoothness of the estimate. In the construction of the new SiZer, a multiple testing adjustment is made to control the row-wise false discovery rate (FDR) of SiZer. This adjustment is appealing for exploratory data analysis and has the potential to increase power. A special map is also produced on a continuous scale using p-values to assess the significance of features. Simulations and a real data application are carried out to investigate the performance of the proposed SiZer, including several comparisons with other existing SiZers. Supplementary materials for this article are available online.

2.
The SiZer methodology proposed by Chaudhuri and Marron (1999) is a valuable tool for conducting exploratory data analysis. Since its inception, different versions of SiZer have been proposed in the literature. Most of these SiZer variants target the mean structure of the data and are incapable of providing any information about the quantile composition of the data. To fill this need, this article proposes a quantile version of SiZer for the regression setting. By inspecting the SiZer maps produced by this new SiZer, real quantile structures hidden in a dataset can be more effectively revealed, while spurious features are filtered out. The utility of this quantile SiZer is illustrated via applications to both real data and simulated examples. This article has supplementary material online.

3.
This work develops a Bayesian approach to perform inference and prediction in Gaussian random fields based on spatial censored data. This type of data occurs often in the earth sciences, due either to limitations of the measuring device or to particular features of the sampling process used to collect the data. Inference and prediction on the underlying Gaussian random field are performed, through data augmentation, using Markov chain Monte Carlo methods. Previous approaches to dealing with spatial censored data are reviewed and their limitations pointed out. The proposed Bayesian approach is applied to a spatial dataset of depths of a geologic horizon that contains both left- and right-censored data, and comparisons are made between inferences based on the censored data and inferences based on "complete data" obtained by two imputation methods. It is seen that the differences in inference between the two approaches can be substantial.

4.
A Bayesian solution is given to the problem of making inferences about an unknown number of structural changes in a sequence of observations. Inferences are based on the posterior distribution of the number of change points and on the posterior probabilities of possible change points. Detailed analyses are given for binomial data and some regression problems, and numerical illustrations are provided. In addition, an approximation procedure to compute the posterior probabilities is presented.
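As a toy illustration of the kind of calculation involved, the sketch below computes posterior probabilities for a single change point in a Bernoulli sequence, assuming Beta(1, 1) priors on the success probability in each segment and a uniform prior on the change location. It is not the paper's formulation, which handles an unknown number of change points and regression problems as well.

```python
import numpy as np
from math import lgamma

def log_beta_binom(successes, trials, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli segment under a Beta(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + successes) + lgamma(b + trials - successes)
            - lgamma(a + b + trials))

def change_point_posterior(x):
    """Posterior over the location k of a single change point (split after x[:k])."""
    n = len(x)
    log_post = np.empty(n - 1)
    for k in range(1, n):
        left, right = x[:k], x[k:]
        log_post[k - 1] = (log_beta_binom(left.sum(), len(left))
                           + log_beta_binom(right.sum(), len(right)))
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

rng = np.random.default_rng(0)
x = np.concatenate([rng.binomial(1, 0.2, 30), rng.binomial(1, 0.7, 30)])
print(change_point_posterior(x).argmax() + 1)   # most probable change location
```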

5.
We consider the problem of deciding the best action time when observations are made sequentially. Specifically, we address a special type of optimal stopping problem in which observations are made from state-contingent distributions and there is uncertainty about the state. In this paper, the decision-maker's belief about the state is revised sequentially based on the previous observations. Using the independence of the observations given a distribution, the sequential Bayesian belief revision process is represented in a simple recursive form. The methodology developed in this paper provides a new theoretical framework for addressing state uncertainty in the action-timing problem context. Through a simulation analysis, we demonstrate the value of applying a Bayesian strategy that uses the sequential belief revision process. In addition, we evaluate the value of perfect information to gain more insight into the effects of using the Bayesian strategy in the problem.
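The recursive belief revision described here amounts to repeated application of Bayes' rule. The sketch below is a minimal illustration with invented ingredients (two states and normal observation distributions with made-up parameters), not the paper's model.

```python
import numpy as np
from scipy.stats import norm

def revise_belief(prior_high, y, mu_high=10.0, mu_low=5.0, sigma=2.0):
    """One recursive update of P(state = high) after observing y."""
    like_high = norm.pdf(y, mu_high, sigma)
    like_low = norm.pdf(y, mu_low, sigma)
    numerator = prior_high * like_high
    return numerator / (numerator + (1.0 - prior_high) * like_low)

rng = np.random.default_rng(1)
belief = 0.5                              # initial belief that the state is "high"
for y in rng.normal(10.0, 2.0, size=8):   # observations drawn from the "high" state
    belief = revise_belief(belief, y)
print(round(belief, 3))                   # belief moves towards 1
```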

6.
Testing for nonindependence among the residuals from a regression or time series model is a common approach to evaluating the adequacy of a fitted model. This idea underlies the familiar Durbin–Watson statistic, and previous work illustrates how the spatial autocorrelation among residuals can be used to test a candidate linear model. We propose here that a version of Moran's I statistic for spatial autocorrelation, applied to residuals from a fitted model, is a practical general tool for selecting model complexity under the assumption of iid additive errors. The "space" is defined by the independent variables, and the presence of significant spatial autocorrelation in the residuals is evidence that a more complex model is needed to capture all of the structure in the data. An advantage of this approach is its generality, which results from the fact that no properties of the fitted model are used other than consistency. The problem of smoothing parameter selection in nonparametric regression is used to illustrate the performance of model selection based on residual spatial autocorrelation (RSA). In simulation trials comparing RSA with established selection criteria based on minimizing mean square prediction error, smooths selected by RSA exhibit fewer spurious features such as minima and maxima. In some cases, at higher noise levels, RSA smooths achieved a lower average mean square error than smooths selected by generalized cross-validation (GCV). We also briefly describe a possible modification of the method for non-iid errors having short-range correlations, for example, time-series errors or spatial data. Some other potential applications are suggested, including variable selection in regression models.
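A minimal sketch of the RSA idea follows, assuming a one-dimensional independent variable, k-nearest-neighbour weights, and a permutation test; these are choices made here for illustration, not the authors' implementation.

```python
import numpy as np

def morans_i(residuals, weights):
    """Moran's I of a residual vector for a given (zero-diagonal) weight matrix."""
    r = residuals - residuals.mean()
    return len(r) / weights.sum() * (weights * np.outer(r, r)).sum() / (r @ r)

def rsa_pvalue(x, residuals, k=5, n_perm=999, seed=0):
    """Permutation p-value for residual spatial autocorrelation along x."""
    n = len(x)
    dist = np.abs(x[:, None] - x[None, :])
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]   # k nearest neighbours, excluding self
        w[i, nbrs] = 1.0
    observed = morans_i(residuals, w)
    rng = np.random.default_rng(seed)
    perms = [morans_i(rng.permutation(residuals), w) for _ in range(n_perm)]
    return (1 + sum(p >= observed for p in perms)) / (n_perm + 1)
```

With such a function, a smoothing parameter could be chosen as the least amount of smoothing at which the residuals no longer show significant autocorrelation.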

7.
New Bayesian cohort models designed to resolve the identification problem in cohort analysis are proposed in this paper. First, the basic cohort model, which represents the statistical structure of time-series social survey data in terms of age, period and cohort effects, is explained. The logit cohort model for qualitative data from a binomial distribution and the normal-type cohort model for quantitative data from a normal distribution are considered as two special cases of the basic model. To overcome the identification problem in cohort analysis, a Bayesian approach is adopted, based on the assumption that the effect parameters change gradually. A Bayesian information criterion, ABIC, is introduced for selecting the optimal model. This approach is flexible enough that both the logit and the normal-type cohort models can be applied not only to standard cohort tables but also to general cohort tables in which the range of the age groups is not equal to the interval between periods. The practical utility of the proposed models is demonstrated by analysing two data sets from the literature on cohort analysis.

8.
Conditional autoregressive (CAR) models have been extensively used for the analysis of spatial data in diverse areas, such as demography, economics, epidemiology and geography, as models for both latent and observed variables. In the latter case, the most common inferential method has been maximum likelihood, and the Bayesian approach has not been used much. This work proposes default (automatic) Bayesian analyses of CAR models. Two versions of Jeffreys prior, the independence Jeffreys and Jeffreys-rule priors, are derived for the parameters of CAR models, and properties of the priors and resulting posterior distributions are obtained. The two priors and their respective posteriors are compared based on simulated data. Also, frequentist properties of inferences based on maximum likelihood are compared with those based on the Jeffreys priors and the uniform prior. Finally, the proposed Bayesian analysis is illustrated by fitting a CAR model to a phosphate dataset from an archaeological region.

9.
Feature selection (FS) is an important pre-processing step in data mining and classification tasks. The aim of FS is to select a small subset of the most important and discriminative features. Traditional feature selection methods assume that the entire input feature set is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with time as new features stream in. A critical challenge for online streaming feature selection (OSFS) is the unavailability of the entire feature set before learning starts. Several efforts have been made to address the OSFS problem; however, they all require some prior knowledge about the entire feature space to select informative features. In this paper, the OSFS problem is considered from the rough sets (RS) perspective and a new OSFS algorithm, called OS-NRRSAR-SA, is proposed. The main motivation for this approach is that RS-based data mining does not require any domain knowledge other than the given dataset. The proposed algorithm uses classical significance analysis concepts from RS theory to handle the unknown feature space in OSFS problems. The algorithm is evaluated extensively on several high-dimensional datasets in terms of compactness, classification accuracy, run-time, and robustness against noise. Experimental results demonstrate that the algorithm achieves better results than existing OSFS algorithms across all of these criteria.

10.
Smoothing splines are an attractive method for scatterplot smoothing. The SiZer approach to statistical inference is adapted to this smoothing method, yielding SiZerSS. This allows quick and sure inference as to "which features in the smooth are really there" as opposed to "which are due to sampling artifacts", when using smoothing splines for data analysis. Applications of SiZerSS to mode, linearity, quadraticity and monotonicity tests are illustrated using a real data example. Some small-scale simulations are presented to demonstrate that SiZerSS and SiZerLL (the original local linear version of SiZer) often give similar performance in exploring data structure, but neither can completely replace the other.

11.
To down-weight the influence of distributional deviations and outliers, in this paper we carry out a robust Bayesian analysis for a general factor analytic model combined with a normal scale mixture model. A Gibbs sampler is used to draw random observations from the posterior. Statistical inferences are carried out based on the empirical distribution of these observations. Two real data sets are analyzed to illustrate the effectiveness of the proposed method.

12.
In this paper, we address the problem of learning discrete Bayesian networks from noisy data. A graphical model based on a mixture of Gaussian distributions with categorical mixing structure coming from a discrete Bayesian network is considered. The network learning is formulated as a maximum likelihood estimation problem and performed by employing an EM algorithm. The proposed approach is relevant to a variety of statistical problems for which Bayesian network models are suitable, from simple regression analysis to learning gene/protein regulatory networks from microarray data.
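As a rough illustration of the EM machinery involved (much simplified: a flat univariate Gaussian mixture rather than a mixture whose mixing structure comes from a discrete Bayesian network), one EM iteration might look as follows.

```python
import numpy as np
from scipy.stats import norm

def em_step(y, weights, means, sds):
    """One E-step and M-step for a K-component univariate Gaussian mixture."""
    # E-step: responsibility of each component for each observation
    dens = np.stack([w * norm.pdf(y, m, s) for w, m, s in zip(weights, means, sds)])
    resp = dens / dens.sum(axis=0)
    # M-step: update mixing weights, means and standard deviations
    nk = resp.sum(axis=1)
    new_weights = nk / len(y)
    new_means = (resp @ y) / nk
    new_sds = np.sqrt((resp * (y[None, :] - new_means[:, None]) ** 2).sum(axis=1) / nk)
    return new_weights, new_means, new_sds
```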

13.
We examine a contracting problem with asymmetric information in a monopoly pricing setting. Traditionally, the problem is modeled as a one-period Bayesian game, where the incomplete information about the buyers' preferences is handled with some subjective probability distribution. Here we suggest an iterative online method to solve the problem. We show that, when the buyers behave myopically, the seller can learn the optimal tariff by selling the product repeatedly. In a practical modification of the method, the seller offers linear tariffs and adjusts them until optimality is reached. The adjustment can be seen as gradient adjustment, and it can be done with limited information and in a way that benefits both the seller and the buyers. Our method exploits special features of the problem and is easily implementable.

14.
The coefficient of variation (CV) of a population is defined as the ratio of the population standard deviation to the population mean. It is regarded as a measure of stability or uncertainty, and indicates the dispersion of the data relative to the population mean. The CV is a dimensionless measure of scatter or dispersion and is readily interpretable, as opposed to other commonly used measures such as the standard deviation, mean absolute deviation or error factor, which are only interpretable for the lognormal distribution. The CV is often estimated by the ratio of the sample standard deviation to the sample mean, called the sample CV. Even for the normal distribution, the exact distribution of the sample CV is difficult to obtain, and hence it is difficult to draw inferences about the population CV in the frequentist framework. Different methods of estimating the sample standard deviation and the sample mean result in different shapes of the sampling distribution of the sample CV, from which inferences about the population CV can be made. In this paper we propose a simulation-based Bayesian approach to tackle this problem. A set of real data is used to generate the sampling distribution of the CV under the assumption that the data follow the three-parameter Gamma distribution. A probability interval is then constructed. The method also applies easily to the lognormal and Weibull distributions.
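The sketch below illustrates the simulation idea in a simplified form: a two-parameter gamma fitted by moments stands in for the paper's three-parameter Gamma assumption, and the resulting interval is closer to a parametric bootstrap interval than to the paper's Bayesian treatment.

```python
import numpy as np

def cv_interval(data, n_sim=10_000, level=0.95, seed=0):
    """Central probability interval for the CV from simulated sampling distributions."""
    rng = np.random.default_rng(seed)
    mean, var = data.mean(), data.var(ddof=1)
    shape, scale = mean**2 / var, var / mean          # method-of-moments gamma fit
    sims = rng.gamma(shape, scale, size=(n_sim, data.size))
    cvs = sims.std(axis=1, ddof=1) / sims.mean(axis=1)   # sample CV of each simulated sample
    half = 100 * (1 - level) / 2
    return np.percentile(cvs, [half, 100 - half])

data = np.random.default_rng(1).gamma(4.0, 2.0, size=50)   # stand-in for a real dataset
print(cv_interval(data))
```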

15.
Bayesian networks (BNs) are widely used graphical models for drawing statistical inferences about directed acyclic graphs. We present Graph_sampler, a fast, free, C-language program for structural inference on BNs. Graph_sampler uses a fully Bayesian approach in which the marginal likelihood of the data and prior information about the network structure are considered. The software can handle both continuous and discrete data, and two different models are formulated depending on the data type. It also provides a wide variety of structure priors, which can describe either global or local properties of the graph structure. Depending on the type of structure prior selected, a wide range of possible values can be considered for the prior, making it either informative or uninformative. We propose a new and much faster jumping-kernel strategy for the Metropolis–Hastings algorithm. The distributed C source code is very compact and fast, and uses little memory and disk storage. We performed several analyses on different simulated data sets and on synthetic as well as real networks to assess the performance of Graph_sampler.

16.
The relationship between viral load and CD4 cell count is one of the interesting questions in AIDS research. Statistical models are powerful tools for clarifying this important problem. The partially linear mixed-effects (PLME) model, which accounts for an unknown function of the time effect, is one of the important models for this purpose. Meanwhile, the mixed-effects modeling approach is well suited to longitudinal data analysis. However, the complex process of data collection in clinical trials makes it impossible to rely on one particular model to address all the issues. Asymmetric distributions, measurement error and left censoring are features that commonly arise in longitudinal studies. It is crucial to take these features into account in the modeling process to achieve reliable estimation and valid conclusions. In this article, we establish a joint model that accounts for all of these features within the framework of PLME models. A Bayesian inferential procedure is proposed to estimate the parameters of the joint model. A real data example is analyzed to demonstrate the proposed modeling approach, and the results are reported by comparing various scenario-based models.

17.
Geospatial reasoning has been an essential aspect of military planning since the invention of cartography. Although maps have always been a focal point for developing situational awareness, the dawning era of network-centric operations brings the promise of unprecedented battlefield advantage due to improved geospatial situational awareness. Geographic information systems (GIS) and GIS-based decision support systems are ubiquitous within current military forces, as well as civil and humanitarian organizations. Understanding the quality of geospatial data is essential to using it intelligently. A systematic approach to data quality requires: estimating and describing the quality of data as they are collected; recording the data quality as metadata; propagating uncertainty through models for data processing; exploiting uncertainty appropriately in decision support tools; and communicating to the user the uncertainty in the final product. There are shortcomings in the current state of practice of GIS applications in dealing with uncertainty. No single point solution can fully address the problem; rather, a system-wide approach is necessary. Bayesian reasoning provides a principled and coherent framework for representing knowledge about data quality, drawing inferences from data of varying quality, and assessing the impact of data quality on modeled effects. Use of a Bayesian approach also drives a requirement for appropriate probabilistic information in geospatial data quality metadata. This paper describes our research on data quality for military applications of geospatial reasoning, and describes model views appropriate for model builders, analysts, and end users.

18.
When the data have heavy tails or contain outliers, conventional variable selection methods based on penalized least squares or likelihood functions perform poorly. Using a Bayesian inference approach, we study the Bayesian variable selection problem for median linear models. A Bayesian estimation method is proposed based on Bayesian model selection theory, with a spike-and-slab prior placed on the regression coefficients, and an efficient posterior Gibbs sampling procedure is also given. Extensive numerical simulations and an analysis of the Boston house price data are used to illustrate the effectiveness of the proposed method.

19.
A well-known result in Bayesian inventory control is that, when unmet demand cannot be observed, the optimal Bayesian inventory level is always higher than the myopic inventory level, because the decision-maker needs to over-order in order to learn about the demand distribution. This result is based on risk neutrality, whereas in reality decision-makers wish to avoid risk. Based on Bayesian information updating, we study the multi-period newsvendor problem with partially observable demand under risk aversion, where the decision-maker's single-period utility function satisfies the additive independence axiom. By introducing unnormalized probabilities, we find that for a risk-averse decision-maker whose utility function exhibits constant absolute risk aversion, the optimal Bayesian inventory level is also higher than the myopic inventory level. The unnormalized probabilities simplify the dynamic programming equations and the proofs of the results.
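A minimal illustration of the informational issue behind this result is sketched below: a risk-neutral toy example with two candidate Poisson demand rates, where a stock-out only reveals that demand was at least the stocking level. The parameters and the two-point prior are invented for illustration and are not from the paper.

```python
from scipy.stats import poisson

def update_belief(p_high, stock, sales, mu_high=12, mu_low=6):
    """Posterior P(demand rate is high) after observing sales = min(demand, stock)."""
    if sales < stock:                        # demand fully observed
        like_high = poisson.pmf(sales, mu_high)
        like_low = poisson.pmf(sales, mu_low)
    else:                                    # stock-out: only know demand >= stock
        like_high = poisson.sf(stock - 1, mu_high)
        like_low = poisson.sf(stock - 1, mu_low)
    num = p_high * like_high
    return num / (num + (1.0 - p_high) * like_low)

belief = 0.5
for stock, sales in [(8, 8), (8, 8), (10, 7)]:   # two stock-outs, then a fully observed demand
    belief = update_belief(belief, stock, sales)
print(round(belief, 3))
```

Because a stock-out observation is less informative than a fully observed demand, carrying extra stock buys information; the abstract's point is that this ordering of the Bayesian and myopic stocking levels persists for a decision-maker with constant absolute risk aversion.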

20.
Credal nets are probabilistic graphical models that extend Bayesian nets to cope with sets of distributions. An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to accurate inferences. A transformation is also derived to reduce decision making in credal networks based on the maximality criterion to updating. The decision task is proved to have the same complexity as standard inference, being NP^PP-complete for general credal nets and NP-complete for polytrees. Similar results are derived for the E-admissibility criterion. Numerical experiments confirm the good performance of the method.
