Similar Literature (20 records)
1.
In a host of business applications and biomedical and epidemiological studies, multicollinearity among predictor variables is a frequent issue in longitudinal data analysis with linear mixed models (LMM). We consider an efficient estimation strategy for high-dimensional data applications, where the dimension of the parameters is larger than the number of observations. In this paper, we are interested in estimating the fixed-effects parameters of the LMM when some prior information is assumed to be available in the form of linear restrictions on the parameters. We propose pretest and shrinkage estimation strategies using the ridge full model as the base estimator. We establish the asymptotic distributional bias and risks of the suggested estimators and investigate their relative performance with respect to the ridge full-model estimator. Furthermore, we compare the numerical performance of LASSO-type estimators with the pretest and shrinkage ridge estimators. The methodology is investigated using simulation studies and then demonstrated on an application exploring how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer's disease.
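As a minimal illustration of the pretest/shrinkage idea described in this abstract (a sketch, not the paper's LMM implementation), the following Python snippet builds a ridge full-model estimator and a positive-part Stein-type shrinkage toward a restricted submodel; the data, the zero-restrictions, and the shrinkage constant are all illustrative assumptions:

```python
# A minimal sketch of a Stein-type shrinkage estimator built on a ridge base
# estimator. The restriction "some coefficients are zero" and the shrinkage
# constant below are simple stand-ins for the paper's asymptotic quantities.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # induce multicollinearity
beta_true = np.r_[np.ones(5), np.zeros(p - 5)]  # sparse truth
y = X @ beta_true + rng.normal(size=n)

# Full-model ridge estimator (the base estimator).
beta_full = Ridge(alpha=1.0).fit(X, y).coef_

# Restricted estimator under the prior restriction beta_j = 0 for j >= 5.
beta_restr = np.r_[Ridge(alpha=1.0).fit(X[:, :5], y).coef_, np.zeros(p - 5)]

# Positive-part shrinkage: pull the full estimator toward the restricted one.
diff = beta_full - beta_restr
T = n * diff @ diff / np.var(y - X @ beta_full)   # crude distance statistic
k = p - 5                                         # number of restrictions
beta_shrink = beta_restr + max(0.0, 1 - (k - 2) / T) * diff

print("MSE full:  ", np.mean((beta_full - beta_true) ** 2))
print("MSE shrink:", np.mean((beta_shrink - beta_true) ** 2))
```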

2.
Maximum entropy network ensembles have been very successful in modelling sparse network topologies and in solving challenging inference problems. However, the sparse maximum entropy network models proposed so far have a fixed number of nodes and are typically not exchangeable. Here we consider hierarchical models for exchangeable networks in the sparse limit, i.e., with the total number of links scaling linearly with the total number of nodes. The approach is grand canonical, i.e., the number of nodes of the network is not fixed a priori: it is finite but can be arbitrarily large. In this way the grand canonical network ensembles circumvent the difficulties in treating infinite sparse exchangeable networks, which according to the Aldous-Hoover theorem must vanish. The approach can treat networks with a given degree distribution or networks with a given distribution of latent variables. When only a subgraph induced by a subset of nodes is known, this model allows a Bayesian estimation of the network size and of the degree sequence (or the sequence of latent variables) of the entire network, which can be used for network reconstruction.
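A minimal sketch of the hidden-variable construction mentioned in the abstract: the network size is drawn rather than fixed, and edge probabilities scale so that the number of links grows linearly with the number of nodes. The Poisson prior on N and the Pareto latent variables are illustrative assumptions, not the paper's ensemble:

```python
# Sample a sparse network with latent variables, where the edge probability
# p_ij ~ theta_i * theta_j / (<theta> N) keeps the expected number of links
# linear in N (the sparse limit). All distributional choices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N = rng.poisson(500)                      # network size not fixed a priori
theta = rng.pareto(2.5, size=N) + 1.0     # latent variables (heavy-tailed)
c = theta.mean() * N

P = np.minimum(np.outer(theta, theta) / c, 1.0)   # edge probabilities
A = (rng.random((N, N)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                                       # simple undirected graph

degrees = A.sum(axis=1)
print(f"N = {N}, links = {A.sum() // 2}, mean degree = {degrees.mean():.2f}")
```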

3.
Latent Variable Models (LVMs) are well-established tools for accomplishing a range of different data processing tasks. Applications exploit the ability of LVMs to identify latent data structure in order to improve data (e.g., through denoising) or to estimate the relation between latent causes and measurements in medical data. In the latter case, LVMs in the form of noisy-OR Bayes nets represent the standard approach for relating binary latents (which represent diseases) to binary observables (which represent symptoms). Bayes nets with a binary representation for symptoms may be perceived as a coarse approximation, however; in practice, real disease symptoms can range from absent through mild and intermediate to very severe. Therefore, using disease/symptom relations as motivation, we ask here how standard noisy-OR Bayes nets can be generalized to incorporate continuous observables, e.g., variables that model symptom severity on an interval from healthy to pathological. This transition from binary to interval data poses a number of challenges, including a transition from a Bernoulli to a Beta distribution to model symptom statistics. While noisy-OR-like approaches are constrained to model how the causes determine the observables' mean values, the use of Beta distributions additionally provides (and also requires) that the causes determine the observables' variances. To meet the challenges emerging when generalizing from Bernoulli- to Beta-distributed observables, we investigate a novel LVM that uses a maximum non-linearity to model how the latents determine the means and variances of the observables. Given the model and the goal of likelihood maximization, we then leverage recent theoretical results to derive an Expectation Maximization (EM) algorithm for the suggested LVM. We further show how variational EM can be used to scale the approach efficiently to large networks. Experimental results illustrate the efficacy of the proposed model on both synthetic and real data sets. Importantly, we show that the model produces reliable results in estimating causes, using proof-of-concept experiments and first tests based on real medical data and on images.
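To make the generative picture concrete, here is a minimal sketch (with assumed weights and priors, not the paper's trained model) of binary latents driving Beta-distributed observables through a maximum non-linearity, so that the causes set both means and variances:

```python
# Binary latents (diseases) determine, via a max non-linearity, the mean of
# Beta-distributed observables (symptom severities in [0, 1]). Weights,
# priors, and the precision parameter are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
H, D, N = 3, 6, 1000                     # latents, observables, samples
pi = np.full(H, 0.2)                     # prior activation probabilities
W = rng.uniform(0.3, 0.9, size=(H, D))   # latent-to-observable weights
mu0, nu = 0.05, 20.0                     # baseline mean and Beta precision

s = (rng.random((N, H)) < pi).astype(float)             # latent states
mu = np.maximum(mu0, (s[:, :, None] * W).max(axis=1))   # max non-linearity
# Beta with mean mu and precision nu: alpha = mu*nu, beta = (1-mu)*nu, so the
# causes determine both the mean and the variance of each observable.
y = rng.beta(mu * nu, (1.0 - mu) * nu)

print("observable means:", y.mean(axis=0).round(2))
```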

4.
Accurate evaluation of Bayesian model evidence for a given data set is a fundamental problem in model development. Since evidence evaluations are usually intractable, in practice variational free energy (VFE) minimization provides an attractive alternative, as the VFE is an upper bound on the negative model log-evidence (NLE). In order to improve the tractability of the VFE, it is common to manipulate the constraints in the search space for the posterior distribution of the latent variables. Unfortunately, constraint manipulation may also lead to a less accurate estimate of the NLE. Thus, constraint manipulation implies an engineering trade-off between tractability and accuracy of model evidence estimation. In this paper, we develop a unifying account of constraint manipulation for variational inference in models that can be represented by a (Forney-style) factor graph, for which we identify the Bethe Free Energy (BFE) as an approximation to the VFE. We derive well-known message passing algorithms from first principles as the result of minimizing the constrained BFE. The proposed method supports evaluation of the BFE in factor graphs for model scoring, and supports the development of new message-passing-based inference algorithms that can potentially improve evidence estimation accuracy.
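As a concrete instance of a message passing algorithm of the kind derived in the paper, the sketch below runs sum-product on a small chain-structured factor graph, where the Bethe approximation is exact; the potentials are arbitrary examples:

```python
# Sum-product message passing on a chain factor graph, one of the well-known
# algorithms that arise as stationary points of the constrained Bethe Free
# Energy. On a tree the computed marginals are exact.
import numpy as np

rng = np.random.default_rng(3)
T, K = 5, 3                                   # chain length, state count
phi = rng.random((T, K))                      # unary factors
psi = rng.random((T - 1, K, K))               # pairwise factors

fwd = np.zeros((T, K))
bwd = np.zeros((T, K))
fwd[0], bwd[-1] = phi[0], phi[-1]
for t in range(1, T):                         # forward messages
    fwd[t] = phi[t] * (fwd[t - 1] @ psi[t - 1])
for t in range(T - 2, -1, -1):                # backward messages
    bwd[t] = phi[t] * (psi[t] @ bwd[t + 1])

marg = fwd * bwd / phi                        # unary factor counted once
marg /= marg.sum(axis=1, keepdims=True)
print(marg.round(3))
```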

5.
Extracting latent nonlinear dynamics from observed time-series data is important for understanding a dynamic system against the background of the observed data. A state space model is a probabilistic graphical model for time-series data which describes the probabilistic dependence between latent variables at subsequent times and between latent variables and observations. Since, in many situations, the values of the parameters in the state space model are unknown, estimating the parameters from observations is an important task. The particle marginal Metropolis–Hastings (PMMH) method estimates the marginal posterior distribution of the parameters, obtained by marginalization over the distribution of the latent variables in the state space model. Although, in principle, one could estimate this marginal posterior by iterating the method indefinitely, in practice only a finite number of iterations is possible, and the estimated result then depends on the initial values. In this paper, we propose a replica exchange particle marginal Metropolis–Hastings (REPMMH) method to mitigate this problem by combining the PMMH method with the replica exchange method. The proposed method simultaneously realizes a global search at high temperature and a fine local search at low temperature. We evaluate the proposed method using simulated data obtained from the Izhikevich neuron model and a Lévy-driven stochastic volatility model, and we show that the proposed REPMMH method alleviates the initial-value dependence of PMMH and realizes efficient sampling of parameters in state space models compared with existing methods.
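The replica exchange mechanism can be illustrated on a toy multimodal target (a sketch only: REPMMH replaces the exact log-density below with a particle-filter estimate of the marginal likelihood):

```python
# Replica exchange Metropolis-Hastings on a bimodal 1-D target. Hot replicas
# roam globally; swaps let the cold (beta = 1) chain visit both modes.
# Target and temperature ladder are illustrative.
import numpy as np

rng = np.random.default_rng(4)

def log_target(x):                 # two well-separated modes
    return np.logaddexp(-0.5 * (x + 4) ** 2, -0.5 * (x - 4) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])    # inverse temperatures
x = np.zeros(len(betas))
chain = []
for it in range(20000):
    for k in range(len(betas)):            # MH update within each replica
        prop = x[k] + rng.normal(scale=1.0)
        if np.log(rng.random()) < betas[k] * (log_target(prop) - log_target(x[k])):
            x[k] = prop
    k = rng.integers(len(betas) - 1)       # propose swapping neighbors
    log_acc = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
    if np.log(rng.random()) < log_acc:
        x[k], x[k + 1] = x[k + 1], x[k]
    chain.append(x[0])                     # keep the beta = 1 replica

chain = np.array(chain)
print("visited both modes:", (chain < -2).any() and (chain > 2).any())
```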

6.
Recent astronomical observations indicate that the Universe is presently almost flat and undergoing a period of accelerated expansion. Within Einstein's general relativity, all these observations can be explained by the hypothesis of a dark energy component in addition to cold dark matter (CDM). Because the nature of this dark energy is unknown, alternative scenarios have been proposed to explain the currently accelerating Universe. The key point of these scenarios is to modify the standard FRW equation instead of invoking a mysterious dark energy component. The standard approach to constraining model parameters, based on the likelihood method, gives a best-fit model and confidence ranges for those parameters, but the set of parameters defining the model to be compared with observational data is always chosen arbitrarily. Because introducing new parameters generically improves the fit to a data set, the problem arises of eliminating parameters that play an insignificant role. The Bayesian information criterion (BIC) for model selection is dedicated to identifying the set of parameters that should be incorporated in the model. We divide the class of all accelerating cosmological models into two groups according to the two types of explanation of the Universe's acceleration. The Bayesian framework of model selection is then used to determine the set of parameters giving the preferred fit to the SNIa data. We find a few flat cosmological models that can be recommended by the Bayes factor, and we show that models with dark energy as a new fluid are favoured over models featuring a modified FRW equation.
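The BIC logic is easy to demonstrate on a toy nested-model comparison (illustrative data and models, with the noise level assumed known for simplicity):

```python
# BIC-based model comparison: an extra parameter must earn its place through
# the k*ln(n) complexity penalty. The data and candidate models are toys.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)  # truly linear data

def bic(resid, k, n, sigma=0.1):
    loglik = -0.5 * np.sum((resid / sigma) ** 2)   # Gaussian, known sigma
    return k * np.log(n) - 2.0 * loglik

for name, model, k in [("linear", lambda x, a, b: a + b * x, 2),
                       ("quadratic", lambda x, a, b, c: a + b * x + c * x**2, 3)]:
    p, _ = curve_fit(model, x, y)
    print(name, "BIC =", round(bic(y - model(x, *p), k, x.size), 1))
# The lower BIC should select the linear model: the quadratic's extra
# parameter improves the fit slightly, but not enough to offset k*ln(n).
```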

7.
This paper investigates the asymptotic properties of estimators obtained from the so-called CVA (canonical variate analysis) subspace algorithm proposed by Larimore (1983), in the case when the data is generated by a minimal state space system containing unit roots at the seasonal frequencies such that the yearly difference is a stationary vector autoregressive moving average (VARMA) process. The empirically most important special cases of such data-generating processes are the I(1) case as well as the case of seasonally integrated quarterly or monthly data. Increasingly, however, datasets with a higher sampling rate, such as hourly, daily, or weekly observations, are also available, for example for electricity consumption. In these cases the vector error correction (VECM) representation of the vector autoregressive (VAR) model is not very helpful, as it demands the parameterization of one matrix per seasonal unit root. Even for weekly series this amounts to 52 matrices under yearly periodicity; for hourly data it is prohibitive. For such processes, estimation by quasi-maximum likelihood maximization is extremely hard, since the Gaussian likelihood typically has many local maxima while the parameter space is often high-dimensional. Additionally, estimating a large number of models to test hypotheses on the cointegrating rank at the various unit roots becomes practically impossible, for example for weekly data. This paper shows that in this setting CVA provides consistent estimators of the transfer function generating the data, making it a valuable initial estimator for subsequent quasi-likelihood maximization. Furthermore, the paper proposes new tests for the cointegrating rank at the seasonal frequencies, which are easy to compute and numerically robust, making the method suitable for automatic modeling. A simulation study demonstrates by example that for processes of moderate to large dimension the new tests may outperform traditional tests based on long VAR approximations at sample sizes typically found in quarterly macroeconomic data. Further simulations show that the unit root tests are robust with respect to different distributions for the innovations as well as to GARCH-type conditional heteroskedasticity. Moreover, an application to Kaggle data on hourly electricity consumption by different American providers demonstrates the usefulness of the method in applications. The CVA algorithm therefore provides a very useful initial guess for subsequent quasi-maximum likelihood estimation and also delivers relevant information on the cointegrating ranks at the different unit root frequencies. It is thus a useful tool, for example in (but not limited to) automatic modeling applications where a large number of time series involving a substantial number of variables need to be modelled in parallel.
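The core CVA step, canonical correlations between stacked past and future obtained from an SVD of the whitened cross-covariance, can be sketched as follows (a toy VAR(1) and assumed horizons; the paper's full algorithm adds state and system estimation and the seasonal rank tests):

```python
# Canonical correlations between stacked past and future of a vector time
# series via SVD of the whitened cross-covariance. Horizons and the test
# process are illustrative.
import numpy as np

rng = np.random.default_rng(6)
T, s = 2000, 2
y = np.zeros((T, s))
for t in range(1, T):                         # toy stable VAR(1) data
    y[t] = 0.8 * y[t - 1] + rng.normal(size=s)

f = p = 4                                     # future/past horizons (assumed)
idx = np.arange(p, T - f)
Yp = np.hstack([y[idx - j - 1] for j in range(p)])   # stacked past
Yf = np.hstack([y[idx + j] for j in range(f)])       # stacked future

Sff, Spp, Sfp = Yf.T @ Yf, Yp.T @ Yp, Yf.T @ Yp
Wf = np.linalg.cholesky(np.linalg.inv(Sff))   # whitening: Wf' Sff Wf = I
Wp = np.linalg.cholesky(np.linalg.inv(Spp))
sv = np.linalg.svd(Wf.T @ Sfp @ Wp, compute_uv=False)
print("canonical correlations:", sv[:6].round(3))
# A gap in these singular values indicates the state order; the paper's
# seasonal cointegrating-rank tests are built from related quantities.
```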

8.
Genetic algorithm optimization of near-infrared spectral wavebands for sugar content of sand pear
Genetic algorithms are not constrained by restrictive assumptions about the search space, and use simple coding techniques and reproduction mechanisms to solve the optimization problems posed by complex near-infrared spectral data. In this work, a genetic-algorithm waveband selection method (R-SGA) was used to optimize the near-infrared spectral wavebands of sand pear. The optimal R-SGA factor numbers for the Fengshui, Yuanhuang, and Huangjin pears were 10, 12, and 16, respectively, and a single-cultivar GA-PLS model was built for each. The GA-PLS models for the Fengshui and Huangjin pears were more accurate than the full-spectrum PLS models, with RMSEP values of 0.608/0.632 and 0.524/0.540, respectively; the accuracy of the Yuanhuang GA-PLS model (RMSEP = 0.610) was comparable to that of the full-spectrum PLS model (RMSEP = 0.595). Waveband optimization analysis showed that a multi-cultivar mixed model for sand pear built with 552 data points has high robustness and predictive ability (RMSEC = 0.627, RMSEP = 0.641). The results show that waveband optimization based on genetic algorithms can improve the accuracy of sand pear sugar-content models and the efficiency of modeling, and that building a universal multi-cultivar sugar-content model for sand pear is feasible.
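A compact sketch of the GA-PLS wrapper idea on synthetic "spectra" (the GA settings and the cross-validated RMSE fitness below are assumptions for illustration, not the R-SGA configuration used in the study):

```python
# Genetic-algorithm band selection wrapped around a PLS model: binary masks
# over wavebands are evolved, scored by cross-validated RMSE.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n, bands = 120, 100
X = rng.normal(size=(n, bands))
y = X[:, 20:25].sum(axis=1) + 0.3 * rng.normal(size=n)  # informative bands

def fitness(mask):
    if mask.sum() < 5:
        return -np.inf
    pls = PLSRegression(n_components=5)
    rmse = -cross_val_score(pls, X[:, mask.astype(bool)], y,
                            scoring="neg_root_mean_squared_error", cv=5).mean()
    return -rmse                                        # higher is better

pop = (rng.random((20, bands)) < 0.3).astype(int)       # initial population
for gen in range(15):
    fit = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(fit)[::-1]]                    # elitist sort
    children = []
    for _ in range(len(pop) // 2):
        a, b = pop[rng.integers(8)], pop[rng.integers(8)]   # top-8 parents
        cut = rng.integers(1, bands)
        child = np.r_[a[:cut], b[cut:]]                 # one-point crossover
        flip = rng.random(bands) < 0.01                 # bit-flip mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    pop[len(pop) - len(children):] = children           # replace worst half

fit = np.array([fitness(m) for m in pop])
best = pop[int(np.argmax(fit))]
print("selected bands:", int(best.sum()), "CV RMSE:", round(-fit.max(), 3))
```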

9.
Factor analysis is a well-known statistical method for describing the variability among observed variables in terms of a smaller number of unobserved latent variables called factors. When dealing with multivariate time series, the temporal correlation structure of the data may be modeled by including correlations in the latent factors, but a crucial choice is the covariance function to be implemented. We show that analyzing multivariate time series in terms of latent Gaussian processes, which are mutually independent but each characterized by exponentially decaying temporal correlations, leads to an efficient implementation of the expectation–maximization algorithm for maximum likelihood estimation of the parameters, thanks to the properties of block-tridiagonal matrices. The proposed approach resolves an ambiguity known as the identifiability problem, which renders the solution of factor analysis determined only up to an orthogonal transformation. Samples with just two temporal points are sufficient for the parameter estimation; hence the proposed approach may be applied even in the absence of prior information about the correlation structure of the latent variables, by fitting the model to pairs of points with varying time delay. Our modeling allows one to make predictions of future values of the time series, and we illustrate the method by applying it to an analysis of published gene expression data from a HeLa cell culture.
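A minimal generative sketch of this model class, with latent factors whose correlations decay exponentially in time (on a regular grid these are AR(1) processes, whose tridiagonal precision is what makes the EM steps cheap); dimensions, time scales, and loadings are illustrative:

```python
# Independent latent factors with exponentially decaying temporal
# correlations, mixed linearly into observed multivariate time series.
import numpy as np

rng = np.random.default_rng(8)
T, q, d = 200, 2, 6                 # time points, factors, observables
tau = np.array([5.0, 20.0])         # correlation times of the factors
Lam = rng.normal(size=(d, q))       # factor loadings

rho = np.exp(-1.0 / tau)            # corr(x_t, x_{t+1}) = exp(-dt/tau)
x = np.zeros((T, q))
x[0] = rng.normal(size=q)
for t in range(1, T):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal(size=q)

y = x @ Lam.T + 0.2 * rng.normal(size=(T, d))   # observed series

# Empirical check: lag-1 autocorrelation of each factor matches exp(-1/tau).
emp = [np.corrcoef(x[:-1, k], x[1:, k])[0, 1] for k in range(q)]
print("target:", rho.round(3), "empirical:", np.round(emp, 3))
```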

10.
In this paper, a general framework is developed for determining the underlying parameters of general signal models through the application of maximum likelihood estimation theory for functions whose variables separate. This method extends previous work in sinusoidal and exponential estimation to models with other functional bases, such as exponential functions with nonconstant amplitudes and Bessel functions. Nonuniform spatial sampling is also possible with this technique. The maximum likelihood method is applied to the identification of wave components along one-dimensional structural elements. Results are given which demonstrate the viability and accuracy of the technique in estimating exponential and Bessel function model parameters from noisy simulation data.
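The separable-variables idea (often called variable projection) can be sketched as follows: for each candidate nonlinear parameter the linear amplitudes are eliminated by least squares, reducing the Gaussian maximum likelihood problem to a low-dimensional search. The damped-exponential model below is an illustrative example:

```python
# Variable projection for a separable model y = A(theta) c + noise: the
# linear amplitudes c are eliminated by least squares, leaving a 1-D search
# over the nonlinear parameter. Under Gaussian noise this is the MLE.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(9)
t = np.linspace(0, 1, 200)
y = 2.0 * np.exp(-3.0 * t) + 0.5 + 0.05 * rng.normal(size=t.size)

def residual_norm(theta):
    A = np.column_stack([np.exp(-theta * t), np.ones_like(t)])  # basis
    c, *_ = np.linalg.lstsq(A, y, rcond=None)   # eliminate linear params
    return np.sum((y - A @ c) ** 2)

res = minimize_scalar(residual_norm, bounds=(0.1, 10.0), method="bounded")
theta = res.x
A = np.column_stack([np.exp(-theta * t), np.ones_like(t)])
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"decay rate ~ {theta:.3f}, amplitudes ~ {np.round(c, 3)}")
```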

11.
Smart transportation is an important part of smart urban areas, and travel-characteristics analysis and traffic prediction modeling are the two key technical measures for building smart transportation systems. Although online car-hailing has developed rapidly and has a large number of users, most studies on travel characteristics focus not on online car-hailing but on taxis, buses, metros, and other traditional means of transportation. Moreover, the traditional univariate hybrid time-series traffic prediction model based on the autoregressive integrated moving average (ARIMA) ignores other explanatory variables. To fill the research gap in online car-hailing travel-characteristics analysis and to overcome this shortcoming, we analyzed online car-hailing travel characteristics on operational data sets along multiple dimensions, such as district, time, traffic jams, weather, air quality, and temperature. A traffic prediction method suitable for multivariate hybrid time-series modeling is proposed in this paper, which uses the maximal information coefficient (MIC) for feature selection and fuses the autoregressive integrated moving average with explanatory variables (ARIMAX) and long short-term memory (LSTM) models for data regression. The effectiveness of the proposed multivariate hybrid time-series traffic prediction model was verified on the online car-hailing operational data sets.
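The ARIMAX ingredient of the proposed hybrid can be sketched with statsmodels as below (synthetic demand data and an assumed (1,0,1) order; the paper additionally selects features by MIC and fuses an LSTM):

```python
# ARIMA with an exogenous regressor (e.g., a weather indicator) fit with
# statsmodels' SARIMAX. Series, order, and split point are illustrative.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(10)
T = 300
weather = rng.normal(size=T)                     # exogenous explanatory variable
demand = np.zeros(T)
for t in range(1, T):                            # AR(1) + exogenous effect
    demand[t] = 0.7 * demand[t - 1] + 0.5 * weather[t] + rng.normal(scale=0.3)

model = SARIMAX(demand[:250], exog=weather[:250], order=(1, 0, 1))
fit = model.fit(disp=False)
pred = fit.forecast(steps=50, exog=weather[250:])
print("out-of-sample RMSE:", round(np.sqrt(np.mean((pred - demand[250:]) ** 2)), 3))
```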

12.
Blood velocity estimation is complicated by the strong echoes received from the tissue surrounding the vessel under investigation. Proper blood velocity estimation necessitates the use of a filter for separation of the different signal components. Development of these filters and new estimators requires RF data in which the tissue component is known. In vivo RF data does not have this property. Instead, simulated data incorporating all relevant features of the measurement situation can be employed. One such feature is the motion of the surrounding tissue induced by pulsation, heartbeat, and breathing. This study develops models for these motions and incorporates them into the RF simulation program Field II, thereby obtaining realistic simulated data. A powerful tool for the evaluation of different filters and estimators is then available. The model parameters can be varied according to the physical situation with respect to the scan site and the individual to be scanned. The nature of pulsation is discussed, and a relation between the pressure in the carotid artery and the resulting vessel wall motion is derived.
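A minimal numpy sketch of imposing pulsatile wall motion on a cloud of simulated scatterers (Field II itself is a MATLAB program, and the harmonic motion profile below is an illustrative stand-in for the derived pressure-to-motion relation):

```python
# Pulsatile wall motion applied frame by frame to simulated scatterer
# positions; each frame would be handed to the field simulation.
import numpy as np

rng = np.random.default_rng(11)
pos = rng.uniform(-5e-3, 5e-3, size=(1000, 3))   # scatterer positions [m]

def wall_displacement(t, depth, heart_rate=1.2):
    """Radial displacement [m] at time t [s]; harmonics are illustrative."""
    phase = 2 * np.pi * heart_rate * t
    pulse = 150e-6 * np.sin(phase) + 50e-6 * np.sin(2 * phase + 0.8)
    return pulse * np.exp(-np.abs(depth) / 5e-3)  # decays away from the wall

frames = []
for t in np.arange(0.0, 1.0, 0.01):              # 10 ms frame interval
    moved = pos.copy()
    moved[:, 2] += wall_displacement(t, pos[:, 2])   # displace along depth
    frames.append(moved)

peak = np.abs(wall_displacement(np.arange(0, 1, 1e-3), 0.0)).max()
print(f"{len(frames)} frames, peak wall excursion = {1e6 * peak:.0f} um")
```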

13.
Volatility, which represents the magnitude of fluctuation of asset prices or returns, is used in finance to design optimal asset allocations and to price derivatives. Since volatility is unobservable, it is identified and estimated by latent variable models known as volatility fluctuation models. Almost all conventional volatility fluctuation models are linear time-series models, and thus have difficulty capturing nonlinear and/or non-Gaussian properties of volatility dynamics. In this study, we propose an entropy-based Student's t-process dynamical model (ETPDM) as a volatility fluctuation model combining nonlinear dynamics with non-Gaussian noise. The ETPDM estimates its latent variables and intrinsic parameters by robust particle filtering based on a generalized H-theorem for a relative entropy. To test the performance of the ETPDM, we carry out numerical experiments on financial time series and confirm its robustness for a small number of particles by comparison with conventional particle filtering.
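For context, the sketch below runs a plain bootstrap particle filter on a standard stochastic volatility model; the ETPDM replaces these Gaussian ingredients with Student's t-process dynamics and an entropy-robustified filter, so this is only the baseline idea:

```python
# Bootstrap particle filter for a standard stochastic volatility model:
# latent log-volatility follows an AR(1), returns are N(0, exp(h)).
# Model parameters and particle count are illustrative.
import numpy as np

rng = np.random.default_rng(12)
T, N = 500, 300                          # time steps, particles
mu, phi, sig = -1.0, 0.95, 0.3           # log-volatility dynamics

h = np.zeros(T)
h[0] = mu
for t in range(1, T):                    # simulate latent log-volatility
    h[t] = mu + phi * (h[t - 1] - mu) + sig * rng.normal()
r = np.exp(h / 2) * rng.normal(size=T)   # observed returns

part = rng.normal(mu, sig / np.sqrt(1 - phi**2), size=N)
est = np.zeros(T)
for t in range(T):
    part = mu + phi * (part - mu) + sig * rng.normal(size=N)  # propagate
    logw = -0.5 * (part + r[t] ** 2 * np.exp(-part))          # p(r_t | h_t)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = w @ part                                         # filtered mean
    part = part[rng.choice(N, size=N, p=w)]                   # resample

print("RMSE of filtered log-volatility:", round(np.sqrt(np.mean((est - h) ** 2)), 3))
```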

14.
Estimation of organic matter content in black soil based on an RF-GABPSO hybrid selection algorithm
To address the problems of excessive variable dimensionality and feature waveband selection in hyperspectral estimation of soil organic matter content, a hybrid feature selection method combining random forests with an adaptive search algorithm is proposed. First, an initial optimized variable set is obtained from random-forest variable importance; a genetic binary particle swarm optimization wrapper then further refines this initial set adaptively. For the estimation model itself, the random forest algorithm was chosen for its robustness and its ability to handle high-dimensional variables. Soil samples collected from a typical black soil region were taken as the study object, with visible/near-infrared spectra acquired by an ASD spectrometer and soil organic matter content obtained by chemical analysis as the data sources. After spectral transformation and resampling of the original spectra, the random forest-genetic binary particle swarm hybrid selection method was used to extract feature spectral intervals, and a random-forest model for organic matter content was constructed. Its prediction accuracy was compared with that of random-forest models built on the full spectrum, on wavebands selected by random forest alone, and on wavebands selected by the adaptive search algorithm alone. The results show that, with the spectral variables selected by the hybrid algorithm, the prediction coefficient of determination, root mean square error, and relative analysis error (RPD) were 0.838, 0.54%, and 2.534, respectively. This scheme achieves the highest prediction accuracy with the smallest number of variables, can estimate the organic matter content of black soil efficiently, and provides a reference for variable selection and modeling in organic matter estimation for other soil types.
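A minimal sketch of the first (random-forest importance) stage of the hybrid selection scheme on synthetic high-dimensional data; the screening threshold is an assumption, and the GA-BPSO wrapper stage is only indicated in a comment:

```python
# Random-forest variable importance used to screen high-dimensional spectral
# variables down to an initial optimized subset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(13)
n, p = 150, 400                              # samples, spectral variables
X = rng.normal(size=(n, p))
y = X[:, 50:55] @ np.ones(5) + 0.5 * rng.normal(size=n)   # informative bands

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
imp = rf.feature_importances_
keep = np.argsort(imp)[::-1][:40]            # initial optimized subset
print("informative bands recovered in top 40:",
      np.intersect1d(keep, np.arange(50, 55)).size, "of 5")
# A GA/binary-PSO wrapper would next search subsets of `keep`, scoring each
# candidate subset by the cross-validated error of a random-forest model.
```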

15.
We present cluster Monte Carlo algorithms for the XYZ quantum spin models. In the special case of S=1/2, the new algorithm can be viewed as a cluster algorithm for the 8-vertex model. As an example, we study the S=1/2 XY model in two dimensions with a representation in which the quantization axis lies in the easy plane. We find that the numerical autocorrelation time for the cluster algorithm remains of the order of unity and does not show any significant dependence on the temperature, the system size, or the Trotter number. On the other hand, the autocorrelation time for the conventional algorithm strongly depends on these parameters and can be very large. The use of improved estimators for thermodynamic averages further enhances the efficiency of the new algorithms.

16.
This paper discusses the estimation of the stress-strength reliability parameter R=P(Y&lt;X) based on complete samples, when stress and strength are two independent Poisson half-logistic distributed (PHLD) random variables. We address the estimation of R in the general case and when the scale parameter is common. Classical and Bayesian estimation (BE) techniques for R are studied. The maximum likelihood estimator (MLE) and its asymptotic distribution are obtained, and an approximate asymptotic confidence interval for R is computed using this distribution. The non-parametric percentile bootstrap and Student's bootstrap confidence intervals for R are discussed. The Bayes estimators of R are computed using a gamma prior and discussed under various loss functions, such as the squared error loss function (SEL), absolute error loss function (AEL), linear exponential error loss function (LINEX), generalized entropy error loss function (GEL), and maximum a posteriori (MAP). The Metropolis–Hastings algorithm is used to estimate the posterior distributions of the estimators of R. The highest posterior density (HPD) credible interval is constructed based on the SEL. Monte Carlo simulations are used to analyze numerically the performance of the MLE and Bayes estimators; the results are quite satisfactory in terms of mean square error (MSE) and confidence intervals. Finally, two real data studies demonstrate the performance of the proposed estimation techniques in practice and illustrate why the PHLD is a good candidate in reliability studies.
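As a simple, model-free counterpart to the estimators studied in the paper, R = P(Y &lt; X) can be estimated from two samples with a percentile-bootstrap interval (exponential toy data stand in for the Poisson half-logistic case):

```python
# Nonparametric estimate of the stress-strength reliability R = P(Y < X)
# with a percentile-bootstrap confidence interval. Samples are illustrative.
import numpy as np

rng = np.random.default_rng(14)
X = rng.exponential(scale=2.0, size=80)    # strength sample
Y = rng.exponential(scale=1.0, size=80)    # stress sample

def R_hat(x, y):
    return np.mean(y[None, :] < x[:, None])   # fraction of pairs with Y < X

boot = np.array([R_hat(rng.choice(X, X.size), rng.choice(Y, Y.size))
                 for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"R_hat = {R_hat(X, Y):.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
# True R for these independent exponentials is scale_x/(scale_x+scale_y) = 2/3.
```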

17.
In this paper, the parameter estimation problem for a truncated normal distribution is discussed based on generalized progressive hybrid censored data. The maximum likelihood estimates of the unknown quantities are first derived through the Newton–Raphson algorithm and the expectation maximization algorithm. Based on the asymptotic normality of the maximum likelihood estimators, we develop asymptotic confidence intervals. The percentile bootstrap method is also employed for small sample sizes. Further, Bayes estimates are evaluated under various loss functions, such as the squared error, general entropy, and LINEX loss functions. The Tierney and Kadane approximation, as well as the importance sampling approach, is applied to obtain the Bayesian estimates under proper prior distributions, and the associated Bayesian credible intervals are constructed. Extensive numerical simulations are implemented to compare the performance of the different estimation methods. Finally, a real example is analyzed to illustrate the inference approaches.
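A minimal sketch of truncated-normal maximum likelihood on complete (uncensored) data with known bounds, via direct optimization in scipy; the censoring schemes, EM steps, and Bayesian machinery of the paper are beyond this illustration:

```python
# MLE for a truncated normal by direct optimization of the negative
# log-likelihood. Truncation bounds are assumed known; data are simulated.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import truncnorm

rng = np.random.default_rng(15)
lower, upper = 0.0, 10.0                      # known truncation bounds
mu_true, sd_true = 3.0, 2.0
a, b = (lower - mu_true) / sd_true, (upper - mu_true) / sd_true
x = truncnorm.rvs(a, b, loc=mu_true, scale=sd_true, size=500, random_state=0)

def nll(params):
    mu, sd = params
    if sd <= 0:
        return np.inf
    a, b = (lower - mu) / sd, (upper - mu) / sd
    return -np.sum(truncnorm.logpdf(x, a, b, loc=mu, scale=sd))

res = minimize(nll, x0=[x.mean(), x.std()], method="Nelder-Mead")
print("MLE (mu, sigma):", np.round(res.x, 3))
```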

18.
For stationary power sources such as utility boilers, it is useful to have parametric models able to describe their behavior over a wide range of operating conditions and to predict Quantities of Interest (QOIs) that need to be consistent with experimental observations. The development of predictive simulation tools for large-scale systems cannot rely on full-order models, as the latter would lead to prohibitive costs when coupled to sampling techniques in the model parameter space. An alternative approach consists of using a Surrogate Model (SM). As the number of QOIs is often high and many SMs need to be trained, Principal Component Analysis (PCA) can be used to encode the set of QOIs in a much smaller set of scalars, called PCA scores. A SM is then built for each PCA score rather than for each QOI. The advantage of reducing the number of variables is twofold: computational costs are reduced (fewer SMs need to be trained) and information is preserved (correlations among the original variables are retained). The strategy is applied to a CFD model simulating the Alstom 15 MWth oxy-pilot Boiler Simulation Facility (BSF). In practice, experiments cannot provide full coverage of the pulverized-coal utility boiler, due to both practicality and costs. Values of the model's parameters which guarantee consistency with the experimental data of this test facility for 121 QOIs are found by training a SM based on the combination of Kriging and PCA, using only 5 latent variables.
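The PCA-plus-Kriging surrogate strategy can be sketched end-to-end on a toy "simulator" (the mapping below from 3 parameters to 121 QOIs is an illustrative stand-in for the CFD model):

```python
# Encode many correlated QOIs with PCA, then train one Gaussian-process
# ("Kriging") surrogate per retained PCA score instead of one per QOI.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(16)
n_train, n_par, n_qoi = 60, 3, 121
theta = rng.uniform(-1, 1, size=(n_train, n_par))          # model parameters
basis = rng.normal(size=(5, n_qoi))                        # hidden low-rank structure
Q = np.tanh(theta @ rng.normal(size=(n_par, 5))) @ basis   # "simulator" outputs

pca = PCA(n_components=5).fit(Q)                           # 121 QOIs -> 5 scores
scores = pca.transform(Q)
gps = [GaussianProcessRegressor().fit(theta, scores[:, k]) for k in range(5)]

theta_new = rng.uniform(-1, 1, size=(10, n_par))           # new input points
scores_new = np.column_stack([gp.predict(theta_new) for gp in gps])
Q_new = pca.inverse_transform(scores_new)                  # decode back to QOIs
print("reconstructed QOI matrix shape:", Q_new.shape)
```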

19.
Peng Gang. 《计算物理》 (Chinese Journal of Computational Physics), 2018, 35(1): 87-94
This paper studies the calculation of reactor core kinetics parameters based on continuous-energy adjoint-weighted Monte Carlo. These parameters mainly include the effective delayed neutron fraction, the prompt neutron generation time, and the prompt neutron decay constant. Building on the widely used iterated fission probability (IFP) method, the choice of the adjoint flux in the original IFP method is extended; track-length, collision, absorption, and tally-count estimators are compared, together with covariance- and variance-weighted calculations. Distribution ranges of the kinetics parameters are given for several probability intervals, and the iterated fission probability method is studied in depth: the influence on the results of next-generation event estimator counts and of IFP estimator counts at different intermediate generations is compared, from which a suitable number of iteration generations is derived. After the calculation is completed, the code automatically outputs statistics of the various particle distributions.

20.
Fang Haiyan, Liu Bing, Li Xiaoping, Sun Haifeng, Xue Mengfan, Shen Lirong, Zhu Jinpeng. 《物理学报》 (Acta Physica Sinica), 2016, 65(11): 119701
To improve the accuracy of time-delay estimation for accumulated pulse profiles in X-ray pulsar navigation, the spectral characteristics of the accumulated pulse profile and the shortcomings of the existing Taylor fast Fourier transform delay estimation algorithm are analyzed, and a profile delay estimation algorithm based on an optimal frequency band is proposed; the optimal band is determined by establishing, for different signal-to-noise ratios, the relation between the delay estimation error and the band employed. Experiments on simulated and measured data show that for short observation times or low photon flux the algorithm outperforms the commonly used approximate maximum likelihood (FAML), cross-correlation (CC), nonlinear least squares (NLS), and weighted nonlinear least squares (WNLS) methods; for longer observations or higher photon flux, its estimation accuracy is comparable to that of CC and NLS, while its computational cost is lower than that of NLS, FAML, and WNLS. The proposed algorithm is suitable for high-accuracy delay estimation with short-observation pulse profiles or low-flux pulsars.
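The frequency-domain, band-restricted delay estimation idea can be sketched as follows (toy profile, shift, and band; the paper's contribution is the SNR-dependent choice of the optimal band):

```python
# Time-delay estimation between a pulse template and a shifted noisy profile
# from the cross-spectrum phase slope, restricted to a chosen low-frequency
# band and weighted by template power. All settings are illustrative.
import numpy as np

rng = np.random.default_rng(17)
N = 1024
phase = np.arange(N) / N
template = np.exp(-0.5 * ((phase - 0.3) / 0.02) ** 2)   # pulse template
true_shift = 20                                         # delay in bins
profile = np.roll(template, true_shift) + 0.2 * rng.normal(size=N)

F_t, F_p = np.fft.rfft(template), np.fft.rfft(profile)
cross_phase = np.angle(F_t * np.conj(F_p))              # ~ 2*pi*k*shift/N
k = np.arange(F_t.size)
band = (k >= 1) & (k <= 15)                             # selected band
w = np.abs(F_t[band]) ** 2                              # weight by template power
slope = np.sum(w * cross_phase[band] * k[band]) / np.sum(w * k[band] ** 2)
print(f"estimated shift: {slope * N / (2 * np.pi):.1f} bins (true {true_shift})")
```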
