Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
In the parameter estimation of limiting extreme value distributions, most of the methods employed use only part of the available data. Using the peaks-over-threshold method for the Generalized Pareto Distribution (GPD), only the observations above a certain threshold are considered, so a large amount of information is wasted. The aim of this work is to make the most of the information provided by the observations in order to improve the accuracy of Bayesian parameter estimation. We present two new Bayesian methods to estimate the parameters of the GPD, taking into account the whole data set from the baseline distribution and the existing relations between the baseline and the limiting GPD parameters in order to define highly informative priors. We compare the Bayesian Metropolis–Hastings algorithm using only data over the threshold with the new methods when the baseline distribution is a stable distribution, whose properties ensure that the problem can be reduced to the study of standard distributions and also allow us to propose new estimators for the parameters of the tail distribution. Specifically, three cases of stable distributions were considered: the Normal, Lévy and Cauchy distributions, as main examples of the different tail behaviors of a distribution. Nevertheless, the methods are applicable to many other baseline distributions by finding relations between the baseline and GPD parameters via simulation studies. To illustrate this, we apply the methods to real air-pollution data from Badajoz (Spain), whose baseline distribution fits a Gamma, and show that the baseline-based methods improve the estimates compared to the Bayesian Metropolis–Hastings algorithm.
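As a minimal sketch of the peaks-over-threshold baseline the abstract builds on (not the authors' informative-prior methods), the exceedances of a heavy-tailed sample over a high threshold can be fitted by a GPD. The Cauchy baseline, the 99% threshold, and the use of SciPy are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.standard_cauchy(100_000)   # heavy-tailed Cauchy baseline sample

# Peaks-over-threshold: keep only exceedances above a high threshold.
u = np.quantile(data, 0.99)           # threshold choice is an assumption
excesses = data[data > u] - u

# Fit the GPD to the excesses with the location fixed at zero.
xi, _, sigma = stats.genpareto.fit(excesses, floc=0)
print(f"shape xi = {xi:.2f}, scale sigma = {sigma:.2f}")
# For a Cauchy baseline the tail index is 1, so xi should come out near 1.
```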

2.
Volatility, which represents the magnitude of fluctuations in asset prices or returns, is used in finance to design optimal asset allocations and to price derivatives. Since volatility is unobservable, it is identified and estimated by latent-variable models known as volatility fluctuation models. Almost all conventional volatility fluctuation models are linear time-series models and thus have difficulty capturing nonlinear and/or non-Gaussian properties of volatility dynamics. In this study, we propose an entropy-based Student's t-process dynamical model (ETPDM) as a volatility fluctuation model combining nonlinear dynamics with non-Gaussian noise. The ETPDM estimates its latent variables and intrinsic parameters by robust particle filtering based on a generalized H-theorem for a relative entropy. To test the performance of the ETPDM, we carry out numerical experiments on financial time series and confirm its robustness for a small number of particles by comparison with conventional particle filtering.
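For orientation, a plain bootstrap particle filter on a standard log-volatility model is sketched below; it is a conventional stand-in, not the ETPDM, and all parameter values are assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

# A standard log-volatility model (a conventional stand-in, not the ETPDM):
#   h_t = mu + phi*(h_{t-1} - mu) + sigma_v*eta_t,   y_t = exp(h_t/2)*eps_t
mu, phi, sigma_v, T = -1.0, 0.95, 0.2, 500
h = np.empty(T); h[0] = mu
for t in range(1, T):
    h[t] = mu + phi*(h[t-1] - mu) + sigma_v*rng.standard_normal()
y = np.exp(h/2) * rng.standard_normal(T)

# Bootstrap particle filter for the latent log-volatility h_t.
N = 1000                                     # number of particles (assumed)
particles = mu + sigma_v*rng.standard_normal(N)
h_filt = np.empty(T)
for t in range(T):
    # Propagate through the state equation, then weight by the likelihood
    # of y_t given each particle:  y_t | h_t ~ N(0, exp(h_t)).
    particles = mu + phi*(particles - mu) + sigma_v*rng.standard_normal(N)
    logw = -0.5*y[t]**2*np.exp(-particles) - 0.5*particles
    w = np.exp(logw - logw.max()); w /= w.sum()
    h_filt[t] = np.sum(w * particles)
    particles = particles[rng.choice(N, size=N, p=w)]  # multinomial resampling

print("RMSE of filtered log-volatility:", np.sqrt(np.mean((h_filt - h)**2)))
```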

3.
Preferential attachment is widely recognised as the principal driving force behind the evolution of many growing networks, and measuring the extent to which it occurs during the growth of a network is important for explaining its overall structure. Conventional methods require that the timeline of a growing network is known, that is, the order in which the nodes of the network appeared in time. But growing-network datasets commonly come with missing timelines, in which case the order of the nodes in time cannot be readily ascertained from the data. To address this shortcoming, we propose a Markov chain Monte Carlo algorithm for measuring preferential attachment in growing networks with missing timelines. Key to our approach is that any growing network model gives rise to a probability distribution over the space of networks. This enables a growing network model to be fitted to a growing-network dataset with a missing timeline, allowing not only the prevalence of preferential attachment to be estimated as a model parameter, but also the timeline itself. Parameter estimation is achieved by a novel Metropolis–Hastings sampling scheme that updates both the preferential attachment parameter and the timeline. A simulation study demonstrates that our method accurately measures the occurrence of preferential attachment in networks generated according to the underlying model. Moreover, our approach is illustrated on a small sub-network of the United States patent citation network. Since the timeline for this example is in fact known, we are able to validate our approach against the conventional methods, showing that they give mutually consistent estimates.
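A sketch of the conventional, known-timeline setting that the paper generalizes: grow a network with attachment kernel proportional to degree^alpha, then recover alpha by maximizing the attachment log-likelihood over a grid. The one-edge-per-arrival model and the grid range are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def grow_network(n, alpha):
    """Grow a network in which arriving node t attaches one edge to an
    existing node chosen with probability proportional to degree**alpha."""
    deg = np.zeros(n); deg[0] = deg[1] = 1.0      # seed: single edge 0-1
    choices = []
    for t in range(2, n):
        w = deg[:t]**alpha
        target = rng.choice(t, p=w/w.sum())
        choices.append((t, target))
        deg[t] += 1; deg[target] += 1
    return choices

def log_lik(choices, n, alpha):
    """Attachment log-likelihood for a KNOWN timeline; the paper's MCMC
    additionally samples over unknown timelines."""
    deg = np.zeros(n); deg[0] = deg[1] = 1.0
    ll = 0.0
    for t, target in choices:
        w = deg[:t]**alpha
        ll += np.log(w[target]/w.sum())
        deg[t] += 1; deg[target] += 1
    return ll

n, true_alpha = 2000, 1.0
choices = grow_network(n, true_alpha)
grid = np.linspace(0.5, 1.5, 21)
alpha_hat = grid[np.argmax([log_lik(choices, n, a) for a in grid])]
print("estimated alpha =", alpha_hat)
```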

4.
With the aim of improving the reconstruction of stochastic evolution equations from empirical time-series data, we derive a full representation of the generator of the Kramers–Moyal operator via a power-series expansion of the exponential operator. This expansion is necessary for deriving the different terms in a stochastic differential equation. With the full representation of this operator, we are able to separate finite-time corrections of the power-series expansion of arbitrary order into terms with and without derivatives of the Kramers–Moyal coefficients. We arrive at a closed-form solution expressed through conditional moments, which can be extracted directly from time-series data with a finite sampling interval. We provide all finite-time correction terms for parametric and non-parametric estimation of the Kramers–Moyal coefficients for discontinuous processes, which can be easily implemented, employing Bell polynomials, in time-series analyses of stochastic processes. With exemplary cases of insufficiently sampled diffusion and jump-diffusion processes, we demonstrate the advantages of our arbitrary-order finite-time corrections and their impact in distinguishing diffusion from jump-diffusion processes strictly from time-series data.
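The conditional-moment extraction the abstract refers to can be illustrated on an Ornstein–Uhlenbeck process: the first two Kramers–Moyal coefficients are estimated by binned conditional means, without the higher-order finite-time corrections the paper derives. All parameters are assumed:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate an Ornstein-Uhlenbeck process dX = -theta*X dt + sqrt(2D) dW.
theta, D, dt, n = 1.0, 0.5, 0.01, 200_000
x = np.empty(n); x[0] = 0.0
noise = np.sqrt(2*D*dt)*rng.standard_normal(n - 1)
for i in range(n - 1):
    x[i+1] = x[i] - theta*x[i]*dt + noise[i]

# Lowest-order Kramers-Moyal coefficients from binned conditional moments:
#   D1(x) ~ <X_{t+dt} - X_t | X_t = x> / dt
#   D2(x) ~ <(X_{t+dt} - X_t)^2 | X_t = x> / (2 dt)
dx = np.diff(x)
bins = np.linspace(-2, 2, 21)
idx = np.digitize(x[:-1], bins)
centers, D1, D2 = [], [], []
for b in range(1, len(bins)):
    m = idx == b
    if m.sum() > 100:
        centers.append(0.5*(bins[b-1] + bins[b]))
        D1.append(dx[m].mean()/dt)
        D2.append((dx[m]**2).mean()/(2*dt))

# The drift slope should approach -theta and D2 should approach D, up to
# the finite-dt biases that the paper's correction terms remove.
slope = np.polyfit(centers, D1, 1)[0]
print(f"drift slope = {slope:.3f} (true {-theta}), "
      f"mean D2 = {np.mean(D2):.3f} (true {D})")
```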

5.
Entropy estimation faces numerous challenges when applied to real-world problems. Our interest is in divergence and entropy estimation algorithms capable of rapid estimation for natural sequence data such as human and synthetic languages. This typically requires a large amount of data; we therefore propose a new approach based on a rank-based analytic Zipf–Mandelbrot–Li (ZML) probabilistic model. Unlike previous approaches, which do not consider the nature of the probability distribution in relation to language, we introduce a novel analytic Zipfian model that includes linguistic constraints. This provides more accurate distributions for natural sequences such as natural or synthetic emergent languages. Results are given which indicate the performance of the proposed ZML model. We derive an entropy estimation method which incorporates the linguistically constrained Zipf–Mandelbrot–Li model into a new non-equiprobable coincidence-counting algorithm, shown to be effective for tasks such as entropy rate estimation with limited data.
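A toy comparison, under assumed Zipf–Mandelbrot parameters, of the true entropy of a Zipf–Mandelbrot distribution with the naive plug-in estimate from a small sample, the bias regime that a model-assisted estimator of this kind targets:

```python
import numpy as np

rng = np.random.default_rng(4)

# Zipf-Mandelbrot probabilities p_k ∝ 1/(k + q)^s over a finite vocabulary.
V, s, q = 5000, 1.2, 2.7                    # illustrative assumed parameters
ranks = np.arange(1, V + 1)
p = 1.0/(ranks + q)**s
p /= p.sum()
H_true = -np.sum(p * np.log2(p))            # true entropy in bits

# Naive plug-in estimation from a small sample underestimates the entropy,
# which is the regime the rank-based, model-assisted estimator addresses.
sample = rng.choice(V, size=2000, p=p)
counts = np.bincount(sample, minlength=V)
phat = counts[counts > 0] / counts.sum()
H_plugin = -np.sum(phat * np.log2(phat))
print(f"true H = {H_true:.3f} bits, plug-in H = {H_plugin:.3f} bits")
```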

6.
There is a considerable literature on binomial regression models that utilize well-known link functions such as the logistic, probit, and complementary log-log functions. The conventional binomial model focuses on a single parameter representing one probability of success. However, we often encounter data for which two different success probabilities are of interest simultaneously. For instance, there are several offensive measures in baseball used to predict the future performance of batters. Under these circumstances, it is meaningful to consider more than one success probability. In this article, we employ a bivariate binomial distribution that possesses two success probabilities to conduct a regression analysis, with random effects incorporated under a Bayesian framework. Major League Baseball data are analyzed to demonstrate our methodology, and extensive simulation studies are conducted to investigate model performance.
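One common construction of a bivariate binomial (assumed here for illustration, not necessarily the article's exact parameterization) draws, for each of m trials, a pair of Bernoulli outcomes with joint cell probabilities; the two totals are then correlated binomial counts:

```python
import numpy as np

rng = np.random.default_rng(11)

# Each of m trials yields a PAIR of Bernoulli outcomes with joint cell
# probabilities (p11, p10, p01, p00); the two per-type totals are then
# correlated binomial counts.  All numbers are illustrative assumptions.
m = 20
p11, p10, p01 = 0.25, 0.15, 0.20
p00 = 1 - p11 - p10 - p01

cells = rng.multinomial(m, [p11, p10, p01, p00], size=10_000)
x = cells[:, 0] + cells[:, 1]   # type-1 successes: marginally Binomial(m, 0.40)
y = cells[:, 0] + cells[:, 2]   # type-2 successes: marginally Binomial(m, 0.45)
print("marginal success rates:", x.mean()/m, y.mean()/m)
print("correlation:", np.corrcoef(x, y)[0, 1])
```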

7.
A load-sharing system is a parallel system whose load is redistributed to the surviving components as each component in the system fails. Our focus is on statistical inference for the parameters of the lifetime distribution of each component in the system. In this paper, we introduce a methodology that augments the conventional procedure by treating the load-sharing system as being built from hypothetical latent random variables. We then develop an expectation–maximization (EM) algorithm for maximum likelihood estimation in the system with Lindley-distributed component lifetimes. We adopt several standard simulation techniques to compare the performance of the proposed methodology with that of a Newton–Raphson-type algorithm for maximum likelihood estimation of the parameter. Numerical results indicate that the proposed method is more effective, consistently reaching a global maximum.
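For complete, uncensored samples the Lindley MLE is available in closed form, which makes a useful sanity check; the EM machinery is needed once the load-sharing structure intervenes. The mixture representation used below to simulate Lindley variates is standard, and the parameter values are assumed:

```python
import numpy as np

rng = np.random.default_rng(12)

# Lindley(theta) via its mixture representation: with probability
# theta/(1+theta) draw Exp(theta), otherwise Gamma(shape 2, rate theta).
theta, n = 1.5, 5000
mix = rng.random(n) < theta/(1 + theta)
x = np.where(mix, rng.exponential(1/theta, n), rng.gamma(2, 1/theta, n))

# Closed-form MLE for a complete sample: solving d/dtheta log L = 0 gives
# the quadratic  xbar*theta^2 + (xbar - 1)*theta - 2 = 0.
xbar = x.mean()
theta_hat = (-(xbar - 1) + np.sqrt((xbar - 1)**2 + 8*xbar))/(2*xbar)
print(f"theta-hat = {theta_hat:.3f} (true {theta})")
```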

8.
Formal Bayesian comparison of two competing models, based on the posterior odds ratio, amounts to estimating the Bayes factor, which equals the ratio of the two marginal data density values. In models with a large number of parameters and/or latent variables, these are expressed by high-dimensional integrals that are often computationally infeasible, so other methods of evaluating the Bayes factor are needed. In this paper, a new method of estimating the Bayes factor is proposed. Simulation examples confirm the good performance of the proposed estimators. Finally, the new estimators are used to formally compare different hybrid Multivariate Stochastic Volatility–Multivariate Generalized Autoregressive Conditional Heteroskedasticity (MSV-MGARCH) models, which have a large number of latent variables. The empirical results show, among other things, that the validity of reducing the hybrid MSV-MGARCH model to the MGARCH specification depends on the analyzed data set as well as on prior assumptions about model parameters.
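A toy illustration of marginal data densities and the Bayes factor, in a conjugate Gaussian example where the closed form is known and can be checked against brute-force Monte Carlo over the prior (the route that fails in high dimension and motivates specialized estimators):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Toy Bayes factor: y_i ~ N(theta, 1) with prior theta ~ N(0, tau^2) under
# M1, against the fixed model theta = 0 under M0.  M1's marginal density
# has a closed form here, so the Monte Carlo estimate can be verified.
y = rng.normal(0.3, 1.0, size=50)
n, ybar, tau = len(y), y.mean(), 1.0

log_m0 = stats.norm.logpdf(y, 0, 1).sum()
# Marginally under M1, ybar ~ N(0, tau^2 + 1/n); all other factors cancel.
log_m1 = (log_m0
          + stats.norm.logpdf(ybar, 0, np.sqrt(tau**2 + 1/n))
          - stats.norm.logpdf(ybar, 0, np.sqrt(1/n)))

# Brute-force Monte Carlo over the prior; models with many latent variables
# make this infeasible, which is what motivates specialized estimators.
thetas = rng.normal(0, tau, size=100_000)
ll = (-0.5*((y[None, :] - thetas[:, None])**2).sum(axis=1)
      - 0.5*n*np.log(2*np.pi))
log_m1_mc = np.log(np.mean(np.exp(ll - ll.max()))) + ll.max()

print(f"log BF(M1:M0): exact {log_m1 - log_m0:.3f},"
      f" Monte Carlo {log_m1_mc - log_m0:.3f}")
```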

9.
An important aspect of using entropy-based models and proposed “synthetic languages” is the seemingly simple task of knowing how to identify the probabilistic symbols. If the system has discrete features, this task may be trivial; however, for observed analog behaviors described by continuous values, the question arises of how such symbols should be determined. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, whose aim is primarily data compression and fidelity, the goal here is to produce a symbolic output sequence that incorporates some linguistic properties and is hence useful in forming language-based models. In this paper, we therefore present methods for symbolization which take such properties into account in the form of probabilistic constraints. In particular, we propose new symbolization algorithms which constrain the symbols to follow a Zipf–Mandelbrot–Li distribution that approximates the behavior of language elements. We introduce a novel constrained EM algorithm which is shown to effectively learn to produce symbols approximating a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-language-understandable equivalent.
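A simple stand-in for constraint-driven symbolization (not the constrained-EM learner itself): choose quantizer bin edges at the quantiles implied by a target Zipf–Mandelbrot–Li distribution, so the emitted symbol stream matches the desired Zipfian statistics. All parameters are assumed:

```python
import numpy as np

rng = np.random.default_rng(13)

# Target Zipf-Mandelbrot-Li-style symbol probabilities (parameters assumed).
K, s, q = 16, 1.1, 2.0
p = 1.0/(np.arange(1, K + 1) + q)**s
p /= p.sum()

# "Analog behavior" stand-in, symbolized by placing the quantizer bin edges
# at the quantiles implied by the target probabilities, so the emitted
# symbol stream approximates the desired Zipfian statistics.
signal = rng.standard_normal(50_000)
edges = np.quantile(signal, np.cumsum(p)[:-1])
symbols = np.digitize(signal, edges)

emp = np.bincount(symbols, minlength=K)/symbols.size
print("target   :", np.round(p[:5], 3))
print("empirical:", np.round(emp[:5], 3))
```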

10.
Clustering is a major unsupervised learning technique and is widely applied in data mining and statistical data analysis. Typical examples include k-means, fuzzy c-means, and Gaussian mixture models, which are categorized as hard, soft, and model-based clustering, respectively. We propose a new clustering method, called Pareto clustering, based on the Kolmogorov–Nagumo average, which is defined via the survival function of the Pareto distribution. The proposed algorithm encompasses all the aforementioned clusterings plus maximum-entropy clustering. We introduce a probabilistic framework for the proposed method, in which the underlying distribution that gives consistency is discussed. We build a minorize-maximization algorithm to estimate the parameters in Pareto clustering. We compare its performance with existing methods in simulation studies and benchmark dataset analyses to demonstrate its practical utility.
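The Kolmogorov–Nagumo (quasi-arithmetic) average underlying the method can be written down directly; here phi is taken as a Pareto survival function with assumed parameter values:

```python
import numpy as np

# Kolmogorov-Nagumo average: phi^{-1}( sum_i w_i * phi(x_i) ), with phi
# taken as a Pareto survival function S(x) = (1 + x/sigma)^(-gamma).
sigma, gamma = 1.0, 2.0   # assumed illustrative parameters

def phi(x):
    return (1.0 + x/sigma)**(-gamma)

def phi_inv(u):
    return sigma*(u**(-1.0/gamma) - 1.0)

def kn_average(x, weights=None):
    x = np.asarray(x, dtype=float)
    w = np.full(x.size, 1.0/x.size) if weights is None else weights
    return phi_inv(np.sum(w * phi(x)))

x = np.array([0.5, 1.0, 4.0])
print("arithmetic mean:", x.mean(), "| KN average:", kn_average(x))
# Since phi is decreasing and convex, the KN average down-weights large
# values relative to the arithmetic mean; per the abstract, this family
# of averages subsumes the standard clusterings as special cases.
```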

11.
The minimal dominating set for a digraph (directed graph) is a prototypical hard combinatorial optimization problem. In a previous paper, we studied this problem using the cavity method. The solution found for a given graph gives a very good estimate of the minimal dominating set size, and we further developed the one-step replica symmetry breaking theory to determine the ground-state energy of the undirected minimal dominating set problem, whose solution space exhibits both a condensation transition and a clustering transition on regular random graphs. We also developed a zero-temperature survey propagation algorithm on undirected Erdős–Rényi graphs to find the ground-state energy. In this paper, we continue to develop the one-step replica symmetry breaking theory to find the ground-state energy for the directed minimal dominating set problem. We find the following. (i) The warning propagation equation does not converge when the connectivity exceeds the core percolation threshold of 3.704; positive edges carry two types of warning, while negative edges carry one. (ii) We determine the ground-state energy and the transition point of the Erdős–Rényi random graph. (iii) The survey propagation decimation algorithm yields results comparable to those of the belief propagation decimation algorithm.
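For scale, a greedy baseline for the directed dominating set (every node must be in the set or have an in-neighbor in it) is easy to write; it is not survey propagation, just the kind of simple heuristic that message-passing decimation is compared against. Graph parameters are assumed:

```python
import random

rng = random.Random(0)

def greedy_mds(n, edges):
    """Greedy directed dominating set: every node must either be chosen or
    be covered by the out-neighborhood of some chosen node."""
    out_nbrs = {i: set() for i in range(n)}
    for u, v in edges:
        out_nbrs[u].add(v)
    dominated, chosen = set(), set()
    while len(dominated) < n:
        # Pick the node covering the most currently-undominated nodes.
        best = max(range(n),
                   key=lambda u: len(({u} | out_nbrs[u]) - dominated))
        chosen.add(best)
        dominated |= {best} | out_nbrs[best]
    return chosen

# Directed Erdos-Renyi graph with mean out-degree c (values assumed).
n, c = 200, 3.0
edges = [(u, v) for u in range(n) for v in range(n)
         if u != v and rng.random() < c/n]
print("greedy dominating-set size:", len(greedy_mds(n, edges)))
```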

12.
This paper describes some of the analytic tools developed recently by Ghirlanda and Guerra in the investigation of the distribution of overlaps in the Sherrington–Kirkpatrick spin glass model and of Parisi's ultrametricity. In particular, we introduce for this task a simplified (but also generalized) model for which the Gaussian analysis becomes easier. Moments of the Hamiltonian and derivatives of the free energy are expressed as polynomials in the overlaps. Using the essential tool of self-averaging, we describe with full rigour various overlap identities and replica-independence properties that in fact hold in rather large generality. The results are presented in a language accessible to probabilists and analysts.

13.
A rolling-bearing fault identification method based on particle-filter state estimation is proposed, consisting of two main steps: fault model building and fault identification. In the model-building step, autoregressive (AR) models are fitted to the vibration signals of the rolling bearing under its different fault states and serve as the fault models. In the identification step, the model for the normal state is converted into a state-space model and a particle filter is designed; the different fault states are then estimated, relevant features of the filter residuals are extracted, and these are combined with the model-parameter features in a BP neural network classifier to identify the fault. Finally, the effectiveness of the method is verified on the rolling-bearing vibration data from Case Western Reserve University (USA).
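The model-building step can be sketched with standard tools: fit an AR model to each signal and extract residual statistics as features (the particle-filter and BP-network stages are omitted here). The synthetic signals and sampling rate below are assumptions; real experiments would use the CWRU recordings:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(6)

def kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2

def ar_features(signal, order=8):
    """Fit an AR model; its coefficients plus residual statistics serve as
    fault-discriminating features for a downstream classifier."""
    fit = AutoReg(signal, lags=order).fit()
    resid = fit.resid
    return np.concatenate([fit.params, [resid.std(), kurtosis(resid)]])

# Stand-ins for vibration signals in two bearing conditions.
t = np.arange(4096) / 12_000                    # 12 kHz sampling, assumed
normal = np.sin(2*np.pi*60*t) + 0.1*rng.standard_normal(t.size)
impacts = (rng.random(t.size) < 0.01) * rng.standard_normal(t.size)
faulty = normal + 0.8*impacts                   # impulsive fault signature

print("normal [resid std, kurtosis]:", np.round(ar_features(normal)[-2:], 3))
print("faulty [resid std, kurtosis]:", np.round(ar_features(faulty)[-2:], 3))
```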

14.
The statistical inference of the reliability and parameters of the stress–strength model has received great attention in the field of reliability analysis. Under the generalized progressive hybrid censoring (GPHC) scheme, it is important to discuss the point and interval estimates of the reliability of the multicomponent stress–strength (MSS) model in which the stress and strength variables follow different distributions, here assuming that stress follows the Chen distribution and strength follows the Gompertz distribution. In the present study, the Newton–Raphson method was adopted to derive the maximum likelihood estimates (MLEs) of the model parameters, and the corresponding asymptotic distribution was used to construct the asymptotic confidence interval (ACI). Subsequently, the exact confidence interval (ECI) of the parameters was calculated. A hybrid Markov chain Monte Carlo (MCMC) method was adopted to determine the approximate Bayesian estimates (BEs) of the unknown parameters and the highest posterior density credible interval (HPDCI). A simulation study with a real dataset was conducted for the BEs under the squared error loss function (SELF) and the MLEs of the model parameters and reliability, comparing the bias and mean squared errors (MSEs). In addition, the three interval estimates were compared in terms of average interval length (AIL) and coverage probability (CP).
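A much-simplified stand-in (exponential stress and strength, complete samples, a single component) shows the MLE-then-plug-in pattern for stress–strength reliability; the paper's Chen/Gompertz lifetimes and GPHC censoring add bookkeeping but follow the same logic. For exponentials, R = P(X < Y) = lambda/(lambda + mu):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)

# Stress X ~ Exp(lam), strength Y ~ Exp(mu), complete samples.  Reliability
# is R = P(X < Y) = lam/(lam + mu), estimated by plugging in the MLEs.
lam_true, mu_true = 2.0, 1.0
x = rng.exponential(1/lam_true, 100)      # stress sample
y = rng.exponential(1/mu_true, 100)       # strength sample

def neg_log_lik(params):
    lam, mu = params
    if lam <= 0 or mu <= 0:
        return np.inf
    return -(x.size*np.log(lam) - lam*x.sum()
             + y.size*np.log(mu) - mu*y.sum())

res = minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
lam_hat, mu_hat = res.x
print(f"R-hat = {lam_hat/(lam_hat + mu_hat):.3f}"
      f" (true {lam_true/(lam_true + mu_true):.3f})")
```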

15.
The Kullback–Leibler divergence KL(p,q) is the standard measure of error when a true probability distribution p is approximated by a probability distribution q. Its efficient computation is essential in many tasks, for instance in approximate computation or as a measure of error when learning a probability distribution. For high-dimensional distributions, such as those associated with Bayesian networks, direct computation can be infeasible. This paper considers the efficient computation of the Kullback–Leibler divergence between two probability distributions, each coming from a different Bayesian network, possibly with different structures. The approach is based on an auxiliary deletion algorithm to compute the necessary marginal distributions, using a cache of operations on potentials so that past computations are reused whenever necessary. The algorithms are tested with Bayesian networks from the bnlearn repository. Computer code in Python is provided, built on pgmpy, a library for working with probabilistic graphical models.
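For intuition, the KL divergence between two tiny discrete Bayesian networks can be computed by brute-force enumeration of the joint; the exponential cost of doing this at scale is exactly what the cached variable-elimination approach avoids. The network structures and probability values are assumed:

```python
import numpy as np
from itertools import product

# Network p: A -> B, with P(A) and P(B|A).  Network q: A and B independent.
# All probability values are illustrative assumptions.
pA = {0: 0.3, 1: 0.7}
pB_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
qA = {0: 0.5, 1: 0.5}
qB = {0: 0.6, 1: 0.4}

# KL(p||q) by brute-force enumeration of the joint distribution; the cost
# grows exponentially with the number of variables, which is why caching
# and variable elimination matter at scale.
kl = 0.0
for a, b in product([0, 1], repeat=2):
    pj = pA[a]*pB_given_A[a][b]
    qj = qA[a]*qB[b]
    kl += pj*np.log(pj/qj)

print(f"KL(p||q) = {kl:.4f} nats")
```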

16.
The ever-increasing travel demand has brought great challenges to the organization, operation, and management of subway systems. An accurate estimation of the passenger flow distribution can help subway operators design operation plans and strategies scientifically. Although some studies have addressed passenger flow distribution by analyzing passengers' path-choice behavior based on AFC (automated fare collection) data, few focus on the passenger flow distribution while considering the passenger–train matching probability, which is the key problem of passenger flow distribution; moreover, existing methods have not been applied to practical large-scale subway networks because of their computational complexity. To fill this research gap, this paper analyzes the relationship between passenger travel behavior and train operation in the space and time dimensions and formulates the passenger–train matching probability using multi-source data, including AFC data, train timetables, and the network topology. A reverse-derivation method, which reduces the number of possible train combinations per passenger, is then proposed to improve computational efficiency. An estimation method for the passenger flow distribution is presented based on the passenger–train matching probability. Finally, two sets of experiments, an accuracy verification experiment based on synthetic data and a comparison experiment based on real data from the Beijing subway, are conducted to verify the effectiveness of the proposed method. The results show that the proposed method achieves good accuracy and computational efficiency on a large-scale subway network.
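The core feasibility logic of passenger–train matching can be sketched as follows; the walking-time bounds, toy timetable, and uniform probability over feasible trains are illustrative assumptions, which the paper refines using multi-source data:

```python
from dataclasses import dataclass

# Times are in minutes after some reference instant.
@dataclass
class Train:
    train_id: str
    dep_origin: float    # departure time at the passenger's entry station
    arr_dest: float      # arrival time at the passenger's exit station

timetable = [
    Train("T1", dep_origin=8.0,  arr_dest=20.0),
    Train("T2", dep_origin=14.0, arr_dest=26.0),
    Train("T3", dep_origin=20.0, arr_dest=32.0),
]

def feasible_trains(tap_in, tap_out, trains, min_access=1.0, max_egress=5.0):
    """A train is feasible if the passenger could board it after tapping in
    and tap out within the egress-time bound after it arrives."""
    return [tr.train_id for tr in trains
            if tr.dep_origin >= tap_in + min_access
            and 0.0 <= tap_out - tr.arr_dest <= max_egress]

cands = feasible_trains(7.5, 28.0, timetable)
# With no further information, each feasible train gets equal probability;
# the paper sharpens this matching using train loads and network topology.
print({tid: 1/len(cands) for tid in cands})
```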

17.
We study the random link traveling salesman problem, where the lengths l_ij between cities i and j are taken to be independent, identically distributed random variables. We discuss a theoretical approach, the cavity method, that has been proposed for finding the optimal tour length over this random ensemble, given the assumption of replica symmetry. Using finite-size scaling and a renormalized model, we test the cavity predictions against the results of simulations and find excellent agreement over a range of distributions. We thus provide numerical evidence that the replica-symmetric solution to this problem is the correct one. Finally, we note a surprising result concerning the distribution of kth-nearest-neighbor links in optimal tours, and invite a theoretical understanding of this phenomenon.
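A small random-link instance with a nearest-neighbor tour plus 2-opt improvement gives the kind of upper-bound simulation that cavity predictions are tested against (a generic heuristic, not the paper's renormalized-model methodology; the instance size is assumed):

```python
import numpy as np

rng = np.random.default_rng(10)

# Random-link TSP instance: symmetric lengths l_ij are i.i.d. uniform(0,1)
# with no underlying geometry.
N = 120
u = np.triu(rng.random((N, N)), 1)
l = u + u.T
np.fill_diagonal(l, np.inf)

# Nearest-neighbor construction of an initial tour.
tour, seen = [0], {0}
while len(tour) < N:
    nxt = min((j for j in range(N) if j not in seen),
              key=lambda j: l[tour[-1], j])
    tour.append(nxt); seen.add(nxt)

def tour_len(t):
    return sum(l[t[i], t[(i + 1) % N]] for i in range(N))

# 2-opt: keep reversing segments while any reversal shortens the tour.
improved = True
while improved:
    improved = False
    for i in range(N - 1):
        for j in range(i + 2, N):
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % N]
            if l[a, c] + l[b, d] < l[a, b] + l[c, d] - 1e-12:
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True

print(f"2-opt tour length for N={N}: {tour_len(tour):.3f}")
```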

18.
This paper investigates the randomness assignment problem for a class of continuous-time stochastic nonlinear systems, where variance and entropy are employed to describe the systems under investigation. In particular, the system model is formulated by a stochastic differential equation. Owing to the nonlinearities of the systems, the probability density functions of the system state and output cannot be characterised as Gaussian even when the system is driven by Brownian motion. To deal with this non-Gaussian randomness, we present a novel backstepping-based design approach that converts the stochastic nonlinear system into a linear stochastic process, so that the variance and entropy of the system variables can be formulated analytically by solving the Fokker–Planck–Kolmogorov equation. In this way, the design parameter of the backstepping procedure can then be chosen to achieve the variance and entropy assignment. In addition, the stability of the proposed design scheme is guaranteed, and the multivariate case is also discussed. Simulation results are provided to validate the design approach and show the effectiveness of the proposed algorithm.
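The linear target of the backstepping transformation is easy to check numerically: for dx = -a x dt + sigma dW, the Fokker–Planck–Kolmogorov equation gives a Gaussian stationary density with variance sigma^2/(2a) and differential entropy 0.5 ln(2 pi e var). The parameter values below are assumed:

```python
import numpy as np

rng = np.random.default_rng(14)

# Linear SDE dx = -a*x dt + sigma*dW (the closed-loop form targeted by the
# backstepping transformation).  The stationary FPK solution is Gaussian
# with variance sigma^2/(2a) and entropy 0.5*ln(2*pi*e*var), so choosing
# the design parameter a assigns both quantities.
a, sigma, dt, T = 2.0, 1.0, 1e-3, 200_000
x = np.zeros(T)
sq = sigma*np.sqrt(dt)
for t in range(T - 1):
    x[t+1] = x[t] - a*x[t]*dt + sq*rng.standard_normal()

x_ss = x[T//2:]                                  # discard the transient
var_theory = sigma**2/(2*a)
print(f"variance: sim {x_ss.var():.4f} vs theory {var_theory:.4f}")
print(f"entropy : sim {0.5*np.log(2*np.pi*np.e*x_ss.var()):.4f}"
      f" vs theory {0.5*np.log(2*np.pi*np.e*var_theory):.4f}")
```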

19.
In this paper, we analyze mathematical models of various nonlinear oscillators arising in different fields of engineering. Approximate solutions for different oscillator variants are obtained using feedforward neural networks (NNs) trained with the backpropagated Levenberg–Marquardt algorithm (BLMA). Data sets for the different problem scenarios, used for the supervised learning of the BLMA, are generated by the fourth-order Runge–Kutta method (RK-4) with the "NDSolve" package in Mathematica. The quality of the approximate solution produced by the NN-BLMA is assessed through training, validation, and testing against the reference data set. For each model, convergence analysis, error histograms, regression analysis, and curve fitting are considered to study the robustness and accuracy of the design scheme.
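A Python sketch of the same pipeline under stated substitutions: reference data from a hand-rolled RK-4 integrator (standing in for Mathematica's NDSolve) and a small scikit-learn network (which offers L-BFGS/Adam rather than Levenberg–Marquardt). The oscillator coefficients are assumed:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def f(t, y):
    """Duffing-type oscillator x'' + 0.2 x' + x + x^3 = 0.3 cos t (assumed)."""
    x, v = y
    return np.array([v, -0.2*v - x - x**3 + 0.3*np.cos(t)])

def rk4(f, y0, t):
    """Classical fourth-order Runge-Kutta on a fixed grid."""
    y = np.empty((len(t), len(y0))); y[0] = y0
    for i in range(len(t) - 1):
        h = t[i+1] - t[i]
        k1 = f(t[i], y[i])
        k2 = f(t[i] + h/2, y[i] + h*k1/2)
        k3 = f(t[i] + h/2, y[i] + h*k2/2)
        k4 = f(t[i] + h,   y[i] + h*k3)
        y[i+1] = y[i] + h*(k1 + 2*k2 + 2*k3 + k4)/6
    return y

t = np.linspace(0, 20, 2001)
y = rk4(f, np.array([1.0, 0.0]), t)

# Random train/test split of the trajectory, learning the map t -> x(t).
idx = np.random.default_rng(7).permutation(len(t))
train, test = idx[:1600], idx[1600:]
net = MLPRegressor(hidden_layer_sizes=(30, 30), solver="lbfgs",
                   max_iter=5000, random_state=0)
net.fit(t[train, None], y[train, 0])
err = np.abs(net.predict(t[test, None]) - y[test, 0])
print(f"max held-out error = {err.max():.4f}")
```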

20.
The diffusion of a particle in a one-dimensional random force field (Sinai diffusion) is studied using the replica method. This method, which maps the problem onto a quantum one, is shown to be a simple and direct way to calculate the long-time diffusive behavior. Results are obtained for the distribution of the local Green's function, the particle distribution, and persistence.
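Sinai diffusion is simple to simulate directly: walkers on a lattice where each site has a fixed, randomly drawn probability of hopping right. A single environment realization with assumed parameters already hints at the ultra-slow <x^2> ~ (log t)^4 spreading (clean scaling requires disorder averaging over many environments):

```python
import numpy as np

rng = np.random.default_rng(8)

# Quenched random environment: one fixed right-hop probability per site.
L, steps, walkers = 4001, 100_000, 200
p_right = rng.uniform(0.25, 0.75, size=L)   # bounds are assumptions

x = np.full(walkers, L // 2)                # all walkers start at the center
for t in range(1, steps + 1):
    move_right = rng.random(walkers) < p_right[x]
    x = np.clip(x + np.where(move_right, 1, -1), 0, L - 1)
    if t in (100, 1_000, 10_000, 100_000):
        print(f"t={t:>6}  <x^2>={np.mean((x - L//2)**2):9.1f}"
              f"  (log t)^4={np.log(t)**4:9.1f}")
```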
