Similar Documents
20 similar documents found (search time: 15 ms)
1.
Model checking is a topic of special interest in statistics, and it becomes more difficult when data are censored. This paper employs the relative belief ratio and the beta-Stacy process to develop a method for model checking in the presence of right-censored data. For a given model of interest, the proposed method compares the concentration of the posterior distribution to that of the prior distribution using a relative belief ratio. We propose a computational algorithm for the method and illustrate it through several data analysis examples.
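For orientation only, a relative belief ratio at a parameter value can be estimated by comparing density estimates of prior and posterior samples; values above 1 indicate evidence in favour of that value. The sketch below uses a hypothetical conjugate normal model, not the authors' beta-Stacy construction.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical setup: normal data with unknown mean, known sigma = 1, N(0, 2^2) prior.
data = rng.normal(loc=1.0, scale=1.0, size=30)
prior_draws = rng.normal(0.0, 2.0, size=20000)

# Conjugate posterior for the mean -- used only to draw posterior samples.
post_var = 1.0 / (1.0 / 2.0**2 + len(data) / 1.0**2)
post_mean = post_var * data.sum()
posterior_draws = rng.normal(post_mean, np.sqrt(post_var), size=20000)

# Relative belief ratio RB(mu) = posterior density / prior density at mu.
mu0 = 0.0
rb = gaussian_kde(posterior_draws)(mu0) / gaussian_kde(prior_draws)(mu0)
print(f"RB({mu0}) = {rb[0]:.3f}  (>1 means evidence in favour of mu = {mu0})")
```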

2.
In a host of business applications and biomedical and epidemiological studies, multicollinearity among predictor variables is a frequent issue in longitudinal data analysis with linear mixed models (LMMs). We consider an efficient estimation strategy for high-dimensional applications, where the dimension of the parameters is larger than the number of observations. In this paper, we are interested in estimating the fixed-effects parameters of the LMM when some prior information is assumed to be available in the form of linear restrictions on the parameters. We propose pretest and shrinkage estimation strategies using the ridge full model as the base estimator. We establish the asymptotic distributional bias and risks of the suggested estimators and investigate their relative performance with respect to the ridge full-model estimator. Furthermore, we compare the numerical performance of LASSO-type estimators with the pretest and shrinkage ridge estimators. The methodology is investigated using simulation studies and then demonstrated on an application exploring how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease.
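For context, the ridge full-model estimator used as the base estimator has a simple closed form. The sketch below shows it for a plain linear model (a simplification of the LMM fixed-effects setting) with a hypothetical design matrix in the p > n regime.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 120                      # p > n: high-dimensional setting
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(size=n)

lam = 1.0                           # ridge penalty
# Ridge estimator: (X'X + lam*I)^{-1} X'y, well-defined even when p > n.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print("first 5 ridge coefficients:", np.round(beta_ridge[:5], 2))
```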

3.
Time-varying autoregressive (TVAR) models are widely used for modeling non-stationary signals. Unfortunately, online joint adaptation of both states and parameters in these models remains a challenge. In this paper, we represent the TVAR model by a factor graph and solve the inference problem by automated message-passing-based inference for states and parameters. We derive structured variational update rules for a composite “AR node” with probabilistic observations that can be used as a plug-in module in hierarchical models, for example, to model the time-varying behavior of the hyper-parameters of a time-varying AR model. Our method includes tracking of the variational free energy (FE) as a Bayesian measure of TVAR model performance. The proposed methods are verified on a synthetic data set and validated on real-world data from temperature modeling and speech enhancement tasks.
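One common way to track time-varying AR coefficients, shown here purely for orientation (a Kalman-filter alternative, not the paper's message-passing scheme), is to place a random-walk model on the coefficients. The signal, drift, and noise levels below are made up.

```python
import numpy as np

rng = np.random.default_rng(6)
T, order = 400, 2
# Simulate a TVAR(2) signal whose first coefficient drifts slowly (hypothetical).
a = np.column_stack([1.4 + 0.2 * np.sin(np.linspace(0, 2 * np.pi, T)),
                     -0.7 * np.ones(T)])
y = np.zeros(T)
for t in range(order, T):
    y[t] = a[t] @ y[t - order:t][::-1] + 0.1 * rng.normal()

# Kalman filter: state = AR coefficients, random-walk dynamics on the state.
theta, P = np.zeros(order), np.eye(order)
q, r = 1e-4, 0.1 ** 2                    # process / measurement noise variances
estimates = np.zeros((T, order))
for t in range(order, T):
    h = y[t - order:t][::-1]             # regressor: most recent past samples
    P = P + q * np.eye(order)            # predict step: coefficients may drift
    k = P @ h / (h @ P @ h + r)          # Kalman gain
    theta = theta + k * (y[t] - h @ theta)
    P = P - np.outer(k, h) @ P
    estimates[t] = theta
print("final coefficient estimates:", estimates[-1].round(3))
print("true final coefficients:    ", a[-1].round(3))
```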

4.
Access to healthcare data such as electronic health records (EHR) is often restricted by laws established to protect patient privacy. These restrictions hinder the reproducibility of existing results based on private healthcare data and also limit new research. Synthetically generated healthcare data solve this problem by preserving privacy and enabling researchers and policymakers to drive decisions and methods based on realistic data. Healthcare data can include information about multiple in- and out-patient visits, making it a time-series dataset which is often influenced by protected attributes such as age, gender, and race. The COVID-19 pandemic has exacerbated health inequities, with certain subgroups experiencing poorer outcomes and less access to healthcare. To combat these inequities, synthetic data must “fairly” represent diverse minority subgroups so that the conclusions drawn on synthetic data are correct and the results can be generalized to real data. In this article, we develop two fairness metrics for synthetic data and examine all subgroups defined by protected attributes to analyze the bias in three published synthetic research datasets. These covariate-level disparity metrics revealed that synthetic data may not be representative at the univariate and multivariate subgroup levels; thus, fairness should be addressed when developing data generation methods. We discuss the need for measuring fairness in synthetic healthcare data to enable the development of robust machine learning models and the creation of more equitable synthetic healthcare datasets.
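One simple covariate-level check in this spirit (a hypothetical illustration, not the two metrics defined in the article) compares subgroup proportions between real and synthetic data:

```python
import pandas as pd

# Hypothetical real and synthetic cohorts with one protected attribute.
real = pd.DataFrame({"race": ["A", "A", "B", "C", "B", "A", "C", "B"]})
synth = pd.DataFrame({"race": ["A", "A", "A", "B", "A", "A", "B", "A"]})

p_real = real["race"].value_counts(normalize=True)
p_synth = synth["race"].value_counts(normalize=True).reindex(p_real.index, fill_value=0.0)

# Per-subgroup disparity: how far each subgroup's synthetic share deviates
# from its real share (0 = perfectly representative).
disparity = (p_synth - p_real).abs()
print(disparity)
```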

5.
Zhongwei Huang, Zhenwei Shi, Zhen Qin. Optik, 2013, 124(24): 6594–6598
Target detection in hyperspectral images is an important task. In this paper, we propose a sparsity-based algorithm for target detection in hyperspectral images. In the sparsity model, each hyperspectral pixel is represented by a linear combination of a few samples from an overcomplete dictionary, and the weight vector for this reconstruction is sparse. This model has been applied to hyperspectral target detection and solved with several greedy algorithms. As conventional greedy algorithms may be trapped in a local optimum, we consider an alternative way to regularize the model and find a more accurate solution. The proposed method is based on a convex relaxation technique. The original sparse representation problem is regularized with a properly designed weighted ℓ1 minimization and solved effectively with an existing solver. Experiments on synthetic and real hyperspectral data suggest that the proposed algorithm outperforms classical sparsity-based detection algorithms, such as Simultaneous Orthogonal Matching Pursuit (SOMP), Simultaneous Subspace Pursuit (SSP), and conventional ℓ1 minimization.
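A minimal sketch of the convex-relaxation idea, using a plain ISTA solver for a weighted ℓ1-penalised least-squares problem. The dictionary, weights, and step size are illustrative; the paper's specific weighting scheme is not reproduced here.

```python
import numpy as np

def weighted_ista(D, y, w, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||D x - y||_2^2 + lam * sum_i w_i |x_i| by ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)              # gradient of the smooth part
        z = x - g / L
        thr = lam * w / L                  # per-coefficient soft threshold
        x = np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)
    return x

rng = np.random.default_rng(2)
D = rng.normal(size=(60, 200))             # overcomplete dictionary (hypothetical)
x_true = np.zeros(200)
x_true[[3, 50, 120]] = [1.0, -0.5, 0.8]
y = D @ x_true + 0.01 * rng.normal(size=60)
w = np.ones(200)                           # uniform weights for illustration
x_hat = weighted_ista(D, y, w)
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 0.1))
```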

6.
In this paper, we study the phase transition property of an Ising model defined on a special random graph, the stochastic block model (SBM). Based on the Ising model, we propose a stochastic estimator to achieve exact recovery for the SBM. The stochastic algorithm can be transformed into an optimization problem, which includes maximum likelihood and maximum modularity as special cases. Additionally, we give an unbiased convergent estimator for the model parameters of the SBM, which can be computed in constant time. Finally, we use Metropolis sampling to realize the stochastic estimator and verify the phase transition phenomenon through experiments.
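For illustration, a plain Metropolis sampler for an Ising model on an arbitrary graph looks like the following. The graph, inverse temperature, and sweep count are hypothetical, and this is not the paper's estimator.

```python
import numpy as np

def metropolis_ising(adj, beta, n_sweeps=200, rng=None):
    """Sample spins s in {-1,+1}^n from an Ising model on the graph given by `adj`."""
    rng = rng or np.random.default_rng(0)
    n = adj.shape[0]
    s = rng.choice([-1, 1], size=n)
    for _ in range(n_sweeps):
        for i in rng.permutation(n):
            # Energy change of flipping spin i: dE = 2 * s_i * sum_j A_ij * s_j
            dE = 2.0 * s[i] * (adj[i] @ s)
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
    return s

# Tiny two-block graph (a stand-in for a draw from an SBM).
adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[a, b] = adj[b, a] = 1.0
print(metropolis_ising(adj, beta=0.8))
```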

7.
Sparse channel estimation for underwater acoustic OFDM based on basis pursuit denoising
尹艳玲, 乔钢, 刘凇佐, 周锋. 《物理学报》(Acta Physica Sinica), 2015, 64(6): 064301
To address the low accuracy of conventional ℓ2-norm channel estimation, a sparse channel estimation method based on basis pursuit denoising (BPDN) is proposed for underwater acoustic orthogonal frequency-division multiplexing (OFDM). Exploiting the sparsity of the underwater acoustic channel, the method estimates the channel impulse response with high accuracy from only a small number of observations. Compared with greedy pursuit algorithms, sparse signal estimation based on BPDN has a globally optimal solution; it estimates the signal under an ℓ2-ℓ1 norm criterion, accounts for noisy observations, and balances the sparsity of the estimated signal against the residual error by adjusting a regularization parameter. Simulations analyze the influence of pilot placement and the regularization parameter on the BPDN algorithm, as well as its performance relative to least-squares (LS) and orthogonal matching pursuit (OMP) channel estimation. Lake trial results show that, for sparse channels, the BPDN-based channel estimation method clearly outperforms the LS and OMP methods.

8.
The research concerns data collected in independent sets; more specifically, in local decision tables. A possible approach to managing these data is to build local classifiers based on each table individually. In the literature, many approaches to combining the final prediction results of independent classifiers can be found, but insufficient attention has been paid to the cooperation of tables and the formation of coalitions. Such an approach is expected to matter on two levels. First, the impact on the quality of classification: the ability to build combined classifiers for coalitions of tables should allow more generalized concepts to be learned, which in turn should affect the quality of classification of new objects. Second, combining tables into coalitions reduces computational complexity, since fewer classifiers need to be built. The paper proposes a new method for creating coalitions of local tables and generating an aggregated classifier for each coalition. Coalitions are generated by determining certain characteristics of attribute values occurring in local tables and applying the Pawlak conflict analysis model. In the study, classification and regression trees with the Gini index are built based on the aggregated table of each coalition. The system has a hierarchical structure, as in the next stage the decisions generated by the classifiers for the coalitions are aggregated using majority voting. The classification quality of the proposed system was compared with an approach that does not use local data cooperation and coalition creation; that baseline has a parallel structure, with decision trees built independently for the local tables. The paper shows that the proposed approach provides a significant improvement in classification quality and execution time. The Wilcoxon test confirmed that the differences in accuracy between the proposed method and the approach without coalitions are significant, with p = 0.005. The average accuracy values obtained for the proposed approach and the approach without coalitions are 0.847 and 0.812, respectively, so the difference is quite large. Moreover, the algorithm implementing the proposed approach performed up to 21 times faster than the algorithm implementing the approach without coalitions.
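The aggregation step in the final stage can be pictured with a small sketch: independent CART trees (Gini index) are fitted on separate local tables and their predictions are combined by majority voting. The dataset and its split into local tables below are hypothetical, and the coalition-forming step is omitted.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(3)

# Pretend the data arrived as three independent local tables.
idx = rng.permutation(len(y))
local_tables = np.array_split(idx, 3)

trees = [DecisionTreeClassifier(criterion="gini", random_state=0).fit(X[t], y[t])
         for t in local_tables]

# Majority voting over the local classifiers' predictions.
preds = np.stack([tree.predict(X) for tree in trees])
vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
print("ensemble accuracy:", np.mean(vote == y))
```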

9.
In this paper, the parameter estimation problem of a truncated normal distribution is discussed based on generalized progressive hybrid censored data. The maximum likelihood estimates of the unknown quantities are first derived through the Newton–Raphson algorithm and the expectation-maximization algorithm. Based on the asymptotic normality of the maximum likelihood estimators, we develop asymptotic confidence intervals. The percentile bootstrap method is also employed for the case of small sample sizes. Further, the Bayes estimates are evaluated under various loss functions, such as the squared error, general entropy, and LINEX loss functions. The Tierney–Kadane approximation, as well as an importance sampling approach, is applied to obtain the Bayesian estimates under proper prior distributions. The associated Bayesian credible intervals are constructed as well. Extensive numerical simulations are implemented to compare the performance of the different estimation methods. Finally, a real data example is analyzed to illustrate the inference approaches.
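As a simplified illustration of the likelihood machinery (complete samples only, with no progressive hybrid censoring), the maximum likelihood estimates of a truncated normal's location and scale can be obtained by direct numerical optimisation. The truncation bound and data below are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import minimize

a_trunc = 0.0                          # left-truncation point (hypothetical)
rng = np.random.default_rng(4)
true_mu, true_sigma = 1.0, 2.0
a_std = (a_trunc - true_mu) / true_sigma
x = truncnorm.rvs(a_std, np.inf, loc=true_mu, scale=true_sigma, size=500, random_state=rng)

def neg_loglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)          # keep sigma positive via log-parameterisation
    a = (a_trunc - mu) / sigma
    return -np.sum(truncnorm.logpdf(x, a, np.inf, loc=mu, scale=sigma))

res = minimize(neg_loglik, x0=[0.5, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"MLE: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```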

10.
Described here is a path-integral, sampling-based approach for the assimilation of sequential data into evolutionary models. Since it makes no assumptions of linearity in the dynamics or of Gaussianity in the statistics, it permits consideration of very general estimation problems. The method can be used for such tasks as computing a smoother solution, parameter estimation, and data/model initialization.

Speedup of the Monte Carlo sampling process is essential if the path integral method is to be a viable estimator on moderately large problems. Here, a variety of strategies are proposed and compared for their relative ability to improve the sampling efficiency of the resulting estimator. Details useful for its implementation and testing are provided as well.

The method is applied to a problem in which standard methods are known to fail: an idealized flow/drifter problem, which has been used as a testbed for assimilation strategies involving Lagrangian data. It is in this kind of context that the method may prove to be a useful assimilation tool in oceanic studies.

11.
The ever-increasing travel demand has brought great challenges to the organization, operation, and management of subway systems. An accurate estimate of the passenger flow distribution can help subway operators design corresponding operation plans and strategies scientifically. Although some studies have addressed the passenger flow distribution problem by analyzing passengers' path choice behavior based on AFC (automated fare collection) data, few focus on the passenger flow distribution while considering the passenger–train matching probability, which is the key to the problem. Moreover, the existing methods have not been applied to practical large-scale subway networks due to their computational complexity. To fill this research gap, this paper analyzes the relationship between passenger travel behavior and train operation in the space and time dimensions and formulates the passenger–train matching probability using multi-source data, including AFC records, train timetables, and the network topology. Then, a reverse derivation method, which reduces the number of possible train combinations for each passenger, is proposed to improve computational efficiency. An estimation method for the passenger flow distribution is then presented based on the passenger–train matching probability. Finally, two sets of experiments, including an accuracy verification experiment based on synthetic data and a comparison experiment based on real data from the Beijing subway, are conducted to verify the effectiveness of the proposed method. The results show that the proposed method achieves good accuracy and computational efficiency on a large-scale subway network.
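The core matching idea can be sketched very simply: given a passenger's tap-in and tap-out times and a train timetable, the feasible trains are those the passenger could physically have taken, and a uniform probability over that set is a crude stand-in for the matching probability. The times, timetable, and walking-time constants below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Train:
    train_id: str
    dep_origin: int      # departure time at the passenger's origin station (minutes)
    arr_dest: int        # arrival time at the passenger's destination station (minutes)

timetable = [Train("T1", 605, 623), Train("T2", 612, 630), Train("T3", 619, 637)]
tap_in, tap_out = 600, 635           # AFC gate times (hypothetical)
min_walk_in, min_walk_out = 2, 1     # minimum access / egress walking times

feasible = [t for t in timetable
            if t.dep_origin >= tap_in + min_walk_in       # could reach the platform in time
            and t.arr_dest + min_walk_out <= tap_out]     # could reach the exit gate in time
p = {t.train_id: 1.0 / len(feasible) for t in feasible}
print("matching probabilities:", p)
```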

12.
The estimation of the Individual Treatment Effect (ITE) on survival time is an important research topic in clinical causal inference. Various representation learning methods have been proposed to deal with its three key problems, i.e., reducing selection bias, handling censored survival data, and avoiding balancing non-confounders; however, none of them considers all three problems in a single method. In this study, by combining the Counterfactual Survival Analysis (CSA) model and Dragonnet from the literature, we first propose CSA–Dragonnet to deal with the three problems simultaneously. Moreover, we found that conclusions from traditional Randomized Controlled Trials (RCTs) or Retrospective Cohort Studies (RCSs) can offer valuable bound information for the counterfactual learning of the ITE, which has never been exploited by existing ITE estimation methods. Hence, we further propose CSA–Dragonnet with Embedded Prior Knowledge (CDNEPK), which formulates a unified expression of the prior knowledge given by RCTs or RCSs, inserts counterfactual prediction nets into CSA–Dragonnet, and defines loss terms based on the bounds on the ITE extracted from the prior knowledge. Semi-synthetic data experiments showed that CDNEPK has superior performance, and real-world experiments indicated that it can offer meaningful treatment advice.

13.
Although commercial motion-capture systems have been widely used in various applications, their complex setup limits the application scenarios available to ordinary consumers. To overcome the drawbacks in wearability, human posture reconstruction based on a few wearable sensors has been actively studied in recent years. In this paper, we propose a deep-learning-based sparse inertial sensor human posture reconstruction method. This method uses a bidirectional recurrent neural network (Bi-RNN) to build an a priori model of human motion from a large motion dataset, thereby mapping low-dimensional motion measurements to whole-body posture. To improve motion reconstruction performance for specific application scenarios, two fundamental problems in model construction are investigated: training data selection and sparse sensor placement. The deep-learning training data selection problem is to select independent and identically distributed (IID) data for a given scenario from an accumulated, imbalanced motion dataset with sufficient information. We formulate data selection as an optimization problem to obtain continuous and IID data segments that comply with a small reference dataset collected from the target scenario. A two-step heuristic algorithm is proposed to solve the data selection problem. On the other hand, the optimal sensor placement problem is studied to exploit the most information from partial observations of human movement. A method for evaluating the amount of motion information carried by any group of wearable inertial sensors, based on mutual information, is proposed, and a greedy search is adopted to obtain an approximately optimal sensor placement for a given number of sensors, so that maximum motion information and minimum redundancy are achieved. Finally, human posture reconstruction performance is evaluated with different training data and sensor placement selection methods, and experimental results show that the proposed method has advantages in both posture reconstruction accuracy and model training time. In the six-sensor configuration, the posture reconstruction errors of our model for walking, running, and playing basketball are 7.25°, 8.84°, and 14.13°, respectively.

14.
Estimating sentence-like units and sentence boundaries in human language is an important task in the context of natural language understanding. While this topic has been considered using a range of techniques, including rule-based approaches and supervised and unsupervised algorithms, a common aspect of these methods is that they inherently rely on a priori knowledge of human language in one form or another. Recently we have been exploring synthetic languages based on the concept of modeling behaviors using emergent languages. These synthetic languages are characterized by a small alphabet and limited vocabulary and grammatical structure. A particular challenge for synthetic languages is that there is generally no a priori language model available, which limits the use of many natural language processing methods. In this paper, we are interested in exploring how natural ‘chunks’ in synthetic language sequences may be discovered in terms of sentence-like units; the problem is how to do this with no linguistic or semantic language model. Our approach is to consider the problem from the perspective of information theory. We extend the basis of information geometry and propose a new concept, which we term information topology, to model the incremental flow of information in natural sequences. We introduce an information topology view of the incremental information and the incremental tangent angle of the Wasserstein-1 distance of the probabilistic symbolic language input. It is not suggested as a fully viable alternative for sentence boundary detection per se, but it provides a new conceptual method for estimating the structure and natural limits of information flow in language sequences without any semantic knowledge. We consider relevant existing performance metrics such as the F-measure, indicate their limitations, and introduce a new information-theoretic global performance measure based on modeled distributions. Although the methodology is not proposed for human language sentence detection, we provide some examples using human language corpora where potentially useful results are shown. The proposed model shows potential advantages for overcoming difficulties due to the disambiguation of complex language and potential improvements for human language methods.
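A bare-bones version of the incremental measurement underlying this view: as symbols arrive, compare the symbol distribution before and after each step with the Wasserstein-1 distance. The toy sequence and the way symbols are embedded on the real line are illustrative choices, not the paper's construction.

```python
from collections import Counter
from scipy.stats import wasserstein_distance

sequence = list("abcabcaabbccabc")        # toy symbolic sequence
alphabet = sorted(set(sequence))
positions = {s: i for i, s in enumerate(alphabet)}   # embed symbols as 0, 1, 2, ...

def distribution(prefix):
    counts = Counter(prefix)
    total = sum(counts.values())
    return [counts.get(s, 0) / total for s in alphabet]

values = [positions[s] for s in alphabet]
for t in range(2, len(sequence) + 1):
    prev, curr = distribution(sequence[:t - 1]), distribution(sequence[:t])
    d = wasserstein_distance(values, values, u_weights=prev, v_weights=curr)
    print(f"step {t:2d}: incremental W1 = {d:.4f}")
```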

15.
《X射线光谱测定》(X-Ray Spectrometry), 2004, 33(4): 301–305
The quantitative aspects of the use of random left-censoring as a statistical approach to accounting for detection limit effects in X-ray fluorescence (XRF) analysis were investigated using Monte Carlo simulations. More precisely, the performance of the Kaplan–Meier method applied to the estimation of the original concentration distributions from detection-limit-censored concentration measurements is discussed. The simulations were performed for assumed log-normal and log-stable concentration distributions, which are known to model fairly well both concentrations and detection limits for biomedical and environmental samples. In particular, the question of the accuracy of estimating the mean value and median of the discussed concentration distributions using the Kaplan–Meier estimator was addressed. It is demonstrated that both the mean value and the median of the concentration distribution can be estimated from censored data fairly precisely, typically within 4% for the log-normal and within 15% for the log-stable models, even for substantial censoring levels (up to 80%). Moreover, the estimation of the median is much more precise than that of the mean value, in particular for stable distributions. Finally, the simulations show that random left-censoring can be recommended as a standard tool for analysing detection-limit-censored concentration measurements of trace elements in XRF analysis. Copyright © 2004 John Wiley & Sons, Ltd.
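A compact illustration of the estimator involved: for left-censored concentrations (values below the detection limit), the standard Kaplan–Meier product-limit estimator for right-censored data can be applied to the negated values. The concentrations and detection flags below are made up.

```python
import numpy as np

def kaplan_meier(times, observed):
    """Right-censored Kaplan-Meier survival estimate evaluated at each event time."""
    order = np.argsort(times)
    times, observed = np.asarray(times)[order], np.asarray(observed)[order]
    n_at_risk, surv, curve = len(times), 1.0, []
    for t, d in zip(times, observed):
        if d:                              # an actual event (exact measurement)
            surv *= 1.0 - 1.0 / n_at_risk
            curve.append((t, surv))
        n_at_risk -= 1                     # censored or not, leaves the risk set
    return curve

# Hypothetical XRF concentrations; False = below the detection limit (left-censored),
# in which case `conc` holds the detection limit itself.
conc = np.array([5.1, 2.0, 7.3, 1.5, 3.8, 0.9, 6.2])
detected = np.array([True, False, True, False, True, False, True])

# Left-censoring trick: negate values so left-censoring becomes right-censoring.
for t, s in kaplan_meier(-conc, detected):
    print(f"P(concentration < {-t:.1f}) ~= {s:.3f}")
```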

16.
Entropy estimation faces numerous challenges when applied to various real-world problems. Our interest is in divergence and entropy estimation algorithms capable of rapid estimation for natural sequence data such as human and synthetic languages. This typically requires a large amount of data; however, we propose a new approach based on a new rank-based analytic Zipf–Mandelbrot–Li probabilistic model. Unlike previous approaches, which do not consider the nature of the probability distribution in relation to language, here we introduce a novel analytic Zipfian model which includes linguistic constraints. This provides more accurate distributions for natural sequences such as natural or synthetic emergent languages. Results are given which indicate the performance of the proposed ZML model. We derive an entropy estimation method which incorporates the linguistic-constraint-based Zipf–Mandelbrot–Li model into a new non-equiprobable coincidence counting algorithm, which is shown to be effective for tasks such as entropy rate estimation with limited data.
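To make the rank-based model concrete, a Zipf–Mandelbrot distribution assigns probability p(r) proportional to 1/(r + q)^s to the r-th ranked symbol, and a plug-in entropy follows directly. The parameters and vocabulary size below are arbitrary, and this plug-in calculation is not the coincidence-counting estimator of the paper.

```python
import numpy as np

def zipf_mandelbrot(vocab_size, s=1.1, q=2.7):
    """Rank-based probabilities p(r) proportional to 1/(r + q)^s for r = 1..V."""
    ranks = np.arange(1, vocab_size + 1)
    weights = 1.0 / (ranks + q) ** s
    return weights / weights.sum()

def entropy_bits(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = zipf_mandelbrot(vocab_size=1000)
print(f"model entropy: {entropy_bits(p):.3f} bits per symbol")
```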

17.
In this article, we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge, we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study, the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this first initial training the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.

18.
This paper investigates the statistical inference of inverse power Lomax distribution parameters under progressive first-failure censored samples. The maximum likelihood estimates (MLEs) and the asymptotic confidence intervals are derived based on an iterative procedure and the asymptotic normality theory of MLEs, respectively. Bayesian estimates of the parameters under the squared error and generalized entropy loss functions are obtained using independent gamma priors. For the Bayesian computation, Tierney–Kadane's approximation method is used. In addition, the highest posterior credible intervals of the parameters are constructed based on an importance sampling procedure. A Monte Carlo simulation study is carried out to compare the behavior of the various estimates developed in this paper. Finally, a real data set is analyzed for illustration purposes.

19.
The viewpoint taken in this paper is that data assimilation is fundamentally a statistical problem and that this problem should be cast in a Bayesian framework. In the absence of model error, the correct solution to the data assimilation problem is to find the posterior distribution implied by this Bayesian setting. Methods for dealing with data assimilation should then be judged by their ability to probe this distribution. In this paper, we propose a range of techniques for probing the posterior distribution, based around the Langevin equation, and we compare these new techniques with existing methods.

When the underlying dynamics is deterministic, the posterior distribution is on the space of initial conditions leading to a sampling problem over this space. When the underlying dynamics is stochastic the posterior distribution is on the space of continuous time paths. By writing down a density, and conditioning on observations, it is possible to define a range of Markov Chain Monte Carlo (MCMC) methods which sample from the desired posterior distribution, and thereby solve the data assimilation problem. The basic building-blocks for the MCMC methods that we concentrate on in this paper are Langevin equations which are ergodic and whose invariant measures give the desired distribution; in the case of path space sampling these are stochastic partial differential equations (SPDEs).

Two examples are given to show how data assimilation can be formulated in a Bayesian fashion. The first is weather prediction, and the second is Lagrangian data assimilation for oceanic velocity fields. Furthermore, the relationship between the Bayesian approach outlined here and the commonly used Kalman filter based techniques, prevalent in practice, is discussed. Two simple pedagogical examples are studied to illustrate the application of Bayesian sampling to data assimilation concretely. Finally, a range of open mathematical and computational issues arising from the Bayesian approach are outlined.
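A minimal sketch of the Langevin idea: the Metropolis-adjusted Langevin algorithm (MALA) proposes moves along the gradient of the log-posterior plus noise, and accepts or rejects them so that the invariant measure is the posterior. The toy Gaussian target below stands in for an initial-condition posterior; it is not the path-space SPDE sampler developed in the paper.

```python
import numpy as np

def mala(log_post, grad_log_post, x0, step=0.1, n_samples=5000, rng=None):
    rng = rng or np.random.default_rng(0)
    x, samples = np.asarray(x0, float), []
    for _ in range(n_samples):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = x + 0.5 * step * grad_log_post(x)
        prop = mean_fwd + np.sqrt(step) * rng.normal(size=x.shape)
        mean_bwd = prop + 0.5 * step * grad_log_post(prop)
        # Metropolis-Hastings correction keeps the target distribution invariant.
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (2 * step)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (2 * step)
        log_alpha = log_post(prop) - log_post(x) + log_q_bwd - log_q_fwd
        if np.log(rng.random()) < log_alpha:
            x = prop
        samples.append(x.copy())
    return np.array(samples)

# Toy 2-D Gaussian "posterior" over an initial condition (hypothetical).
cov_inv = np.linalg.inv(np.array([[1.0, 0.6], [0.6, 1.0]]))
log_post = lambda x: -0.5 * x @ cov_inv @ x
grad_log_post = lambda x: -cov_inv @ x
chain = mala(log_post, grad_log_post, x0=[2.0, -2.0])
print("posterior mean estimate:", chain[1000:].mean(axis=0).round(3))
```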


20.
Diffuse optical tomography (DOT) is a non-linear, ill-posed boundary-value and optimization problem which necessitates regularization. Bayesian methods are also suitable because the measurement data are sparse and correlated. In such problems, which are solved with iterative methods, the solution space must be kept small for stabilization and better convergence. These constraints lead to an extensive, overdetermined system of equations, so model-refinement criteria, especially total least squares (TLS), must be used to account for the model error. However, TLS is limited to linear systems, which is not achievable when applying traditional Bayesian methods. This paper presents an efficient method for model refinement using regularized total least squares (RTLS) applied to the linearized DOT problem, with a maximum a posteriori (MAP) estimator and a Tikhonov regularizer. This is done by combining Bayesian and regularization tools as preconditioner matrices, applying them to the equations, and then applying RTLS to the resulting linear equations. The preconditioning matrices are guided by patient-specific information as well as a priori knowledge gained from the training set. Simulation results illustrate that the proposed method improves image reconstruction performance and localizes the abnormality well.
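For reference, the Tikhonov-regularised least-squares step that anchors such a reconstruction has the familiar closed form below. This is a generic sketch with a random forward operator, not the RTLS/MAP formulation of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(80, 200))            # ill-posed forward operator (hypothetical)
x_true = np.zeros(200)
x_true[40:60] = 1.0
b = A @ x_true + 0.05 * rng.normal(size=80)

lam = 0.5                                 # Tikhonov regularisation parameter
# x_lam = argmin ||A x - b||^2 + lam^2 ||x||^2  =>  (A'A + lam^2 I) x = A'b
x_rec = np.linalg.solve(A.T @ A + lam**2 * np.eye(200), A.T @ b)
print("relative reconstruction error:",
      np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```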

