Similar Documents
20 similar documents found.
1.
We first convert a higher-order hidden Markov model into an equivalent first-order vector-valued hidden Markov model via the Hadar equivalence transformation, then use the dynamic programming principle to derive a Viterbi algorithm for the first-order vector-valued model, and finally, through the equivalence between the two models, obtain a generalized dynamic-programming Viterbi algorithm for higher-order hidden Markov models. The result generalizes, to some extent, the Viterbi decoding algorithms found in almost all of the hidden Markov model literature, thereby further enriching the algorithmic theory of higher-order hidden Markov models.
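The paper's reduction turns the higher-order decoding problem into a first-order one, so the workhorse is the standard first-order Viterbi recursion. Below is a minimal NumPy sketch of that recursion, assuming a finite observation alphabet; the Hadar transformation itself (stacking consecutive hidden states into vector-valued states) is not shown, and all names are illustrative.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden-state path of a first-order HMM, in the log domain.

    pi  : (S,)   initial state distribution
    A   : (S, S) transitions, A[i, j] = P(next = j | current = i)
    B   : (S, O) emissions,   B[i, k] = P(symbol k | state i)
    obs : list of observed symbol indices
    """
    S, T = len(pi), len(obs)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]       # best log-prob of paths ending in each state
    psi = np.zeros((T, S), dtype=int)          # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA         # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):              # follow back-pointers from the final state
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```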

2.
3.
In this paper we consider a simulated annealing algorithm for multiobjective optimization problems. With a suitable choice of the acceptance probabilities, the algorithm is shown to converge asymptotically, that is, the Markov chain that describes the algorithm converges with probability one to the Pareto optimal set.
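A sketch of one common acceptance rule for multiobjective simulated annealing (minimization): a candidate that worsens some objective is accepted with Boltzmann probability in the largest worsening. The paper's specific acceptance probabilities, which drive its convergence proof, may differ, and all function names here are illustrative.

```python
import math
import random

def dominates(f, g):
    """f Pareto-dominates g (minimization): no worse in every objective, better in one."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def mo_anneal(objectives, neighbor, x0, t0=1.0, alpha=0.999, steps=20000):
    """Multiobjective simulated annealing with a scalarized acceptance rule."""
    x, fx = x0, objectives(x0)
    archive = [(x, fx)]                                   # running Pareto-set estimate
    t = t0
    for _ in range(steps):
        y = neighbor(x)
        fy = objectives(y)
        worsening = max(a - b for a, b in zip(fy, fx))    # <= 0 if y is no worse anywhere
        if random.random() < math.exp(-max(worsening, 0.0) / t):
            x, fx = y, fy
            if not any(dominates(g, fy) for _, g in archive):
                archive = [(z, g) for z, g in archive if not dominates(fy, g)]
                archive.append((y, fy))
        t *= alpha                                        # geometric cooling schedule
    return archive
```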

4.
We present a numerical algorithm for pricing derivatives on electricity prices. The algorithm is based on approximating the generator of the underlying price process on a lattice of prices, resulting in an approximation of the stochastic process by a continuous time Markov chain. We numerically study the rate of convergence of the algorithm for the case of the Merton jump-diffusion model and apply the algorithm to calculate prices and sensitivities of both European and Bermudan electricity derivatives when the underlying price follows a stochastic process which exhibits both fast mean-reversion and jumps of large magnitude.
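As a rough illustration of the lattice idea, the sketch below builds an upwind finite-difference generator for a mean-reverting diffusion and prices a European claim by exponentiating the generator. The jump component of the Merton model and the Bermudan exercise feature are omitted, and the grid and parameters are invented for the example.

```python
import numpy as np
from scipy.linalg import expm

def ctmc_generator(grid, drift, vol):
    """Upwind finite-difference generator of dX = drift(X)dt + vol(X)dW on a
    uniform grid; upwinding keeps off-diagonal rates non-negative, and the
    boundary states are left absorbing for simplicity."""
    n, h = len(grid), grid[1] - grid[0]
    G = np.zeros((n, n))
    for i in range(1, n - 1):
        mu, half_s2 = drift(grid[i]), 0.5 * vol(grid[i]) ** 2
        up = half_s2 / h**2 + max(mu, 0.0) / h      # jump rate to grid[i+1]
        dn = half_s2 / h**2 + max(-mu, 0.0) / h     # jump rate to grid[i-1]
        G[i, i + 1], G[i, i - 1], G[i, i] = up, dn, -(up + dn)
    return G

# European call on a fast mean-reverting price, struck at 50, r = 5%, T = 1.
grid = np.linspace(0.0, 100.0, 201)
G = ctmc_generator(grid, drift=lambda x: 2.0 * (50.0 - x), vol=lambda x: 10.0)
payoff = np.maximum(grid - 50.0, 0.0)
prices = np.exp(-0.05) * (expm(G) @ payoff)         # value at each starting node (T = 1)
```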

5.
We show that the LP formulation for an undiscounted multi-chain Markov decision problem can be put in a block upper-triangular form by a polynomial time procedure. Each minimal block (after an appropriate dynamic revision) gives rise to a single-chain Markov decision problem which can be treated independently. An optimal solution to each single-chain problem can be connected by auxiliary dual programs to obtain an optimal solution to a multi-chain problem.

6.
This paper studies relative value iteration algorithms for average-reward Markov decision processes (MDPs). An expression for the contraction factor under the span semi-norm is derived, and it is proved that when this factor is less than 1, both the relative value iteration algorithm presented here and its small-step variant converge to the optimal solution.
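For reference, a tabular sketch of relative value iteration with the span semi-norm stopping rule; the paper's contraction-factor expression and its small-step variant are not reproduced, and the array shapes are this sketch's assumptions.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-8, max_iter=100000):
    """Relative value iteration for a finite average-reward MDP.

    P : (A, S, S) transition matrices; r : (S, A) one-step rewards.
    Stops when the span semi-norm of (Th - h) is below tol; at that point
    the optimal gain lies between the min and max of (Th - h).
    """
    A, S = P.shape[0], r.shape[0]
    h = np.zeros(S)
    for _ in range(max_iter):
        Q = r + np.stack([P[a] @ h for a in range(A)], axis=1)   # (S, A)
        Th = Q.max(axis=1)
        diff = Th - h
        if diff.max() - diff.min() < tol:        # span semi-norm stopping rule
            break
        h = Th - Th[0]                           # renormalize at a reference state
    gain = 0.5 * (diff.max() + diff.min())       # average reward per step
    return gain, h, Q.argmax(axis=1)
```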

7.
In this paper we consider Markov decision processes with discounted cost and a random discount rate in Borel spaces. We establish the dynamic programming algorithm in both the finite- and infinite-horizon cases, provide conditions for the existence of measurable selectors, and present an example of a consumption-investment problem. This research was partially supported by PROMEP grant 103.5/05/40.
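The Borel-space setting and the random discount rate are beyond a short code example, but the finite-state, constant-rate analogue of the finite-horizon dynamic programming algorithm looks as follows; shapes and names are this sketch's assumptions.

```python
import numpy as np

def backward_induction(P, cost, beta, N, terminal):
    """Finite-horizon dynamic programming for a discounted-cost MDP.

    P        : (A, S, S) transition matrices
    cost     : (S, A) one-stage costs
    beta     : constant discount factor
    terminal : (S,) terminal cost
    Returns value functions V[t] and minimizing policies pol[t], t = 0..N-1.
    """
    A, S, _ = P.shape
    V = np.zeros((N + 1, S))
    pol = np.zeros((N, S), dtype=int)
    V[N] = terminal
    for t in range(N - 1, -1, -1):             # recurse backwards in time
        Q = cost + beta * np.stack([P[a] @ V[t + 1] for a in range(A)], axis=1)
        V[t], pol[t] = Q.min(axis=1), Q.argmin(axis=1)
    return V, pol
```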

8.
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm has favorable potential applicability when the number of actions is very large or even infinite.

9.
In this paper, we study discounted Markov decision processes on an uncountable state space. We allow a utility (reward) function to be unbounded both from above and below. A new feature in our approach is an easily verifiable rate of growth condition introduced for a positive part of the utility function. This assumption, in turn, enables us to prove the convergence of a value iteration algorithm to a solution to the Bellman equation. Moreover, by virtue of the optimality equation we show the existence of an optimal stationary policy.

10.
Value iteration and optimization of multiclass queueing networks
Chen, Rong-Rong; Meyn, Sean. Queueing Systems, 1999, 32(1-3): 65-97.
This paper considers in parallel the scheduling problem for multiclass queueing networks, and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when the algorithm is not initialized properly. The most typical case where the initial value function is taken to be zero may be a particularly bad choice. In contrast, if the value iteration algorithm is initialized with a stochastic Lyapunov function, then the following hold: (i) a stochastic Lyapunov function exists for each intermediate policy, and hence each policy is regular (a strong stability condition), (ii) intermediate costs converge to the optimal cost, and (iii) any limiting policy is average cost optimal. It is argued that a natural choice for the initial value function is the value function for the associated deterministic control problem based upon a fluid model, or the approximate solution to Poisson’s equation obtained from the LP of Kumar and Meyn. Numerical studies show that either choice may lead to fast convergence to an optimal policy.
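A minimal sketch of how the initial value function enters average-cost value iteration, assuming a finite (e.g. truncated) state space; the fluid-model value function and the Kumar-Meyn LP solution that the paper recommends as V0 are problem-specific and not constructed here.

```python
import numpy as np

def value_iteration(P, cost, V0, n_iter=500):
    """Average-cost value iteration V_{k+1}(s) = min_a [cost(s, a) + (P_a V_k)(s)],
    started from V0 rather than from zero. With V0 a stochastic Lyapunov function
    (e.g. a fluid-model value function), each intermediate policy is stabilizing."""
    A = P.shape[0]
    V, policies = np.array(V0, dtype=float), []
    for _ in range(n_iter):
        Q = cost + np.stack([P[a] @ V for a in range(A)], axis=1)
        policies.append(Q.argmin(axis=1))       # greedy policy at this iteration
        V = Q.min(axis=1)
        V -= V.min()                            # relative normalization keeps V bounded
    return V, policies
```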

11.
This paper develops a rare event simulation algorithm for a discrete-time Markov chain in the first orthant. The algorithm gives a very good estimate of the stationary distribution along one of the axes and it is shown to be efficient. A key idea is to study an associated time reversed Markov chain that starts at the rare event. We will apply the algorithm to a Markov chain related to a Jackson network with two stations.

12.
This work develops a discrete event model for a multi-product multi-stage production and storage (P&S) problem subject to random demand. The intervention problem consists of three types of possible decisions made at the end of one stage, which depend on the observed demand (or lack thereof) for each item: (i) to proceed further with the production of the same product, (ii) to proceed with the production of another product, or (iii) to halt the production. The intervention problem is formulated in terms of dynamic programming (DP) operators, and each possible solution induces a homogeneous Markov chain that characterizes the dynamics. However, solving the DP problem directly is not a viable task in situations involving a moderately large number of products with many production stages, and the idea of the paper is to step back from strict optimality, with monitored precision, and rely instead on stability. The notion of stochastic stability brought to bear requires a finite set of positive recurrent states, and the paper derives necessary and sufficient conditions for a policy to induce such a set in the studied P&S problem. An approximate value iteration algorithm is proposed, which applies to the broader class of control problems described by homogeneous Markov chains that satisfy a structural condition pointed out in the paper. This procedure iterates in a finite subset of the state space, circumventing the computational burden of standard dynamic programming. To benchmark the approach, the proposed algorithm is applied to a simple two-product P&S system.

13.
A Markov chain plays an important role in an interacting multiple model (IMM) algorithm which has been shown to be effective for target tracking systems. Such systems are described by a mixing of continuous states and discrete modes. The switching between system modes is governed by a Markov chain. In real world applications, this Markov chain may change or needs to be changed. Therefore, one may be concerned about a target tracking algorithm with the switching of a Markov chain. This paper concentrates on fault-tolerant algorithm design and algorithm analysis of IMM estimation with the switching of a Markov chain. Monte Carlo simulations are carried out and several conclusions are given.

14.
We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities. We propose equivalent linear programming, second-order cone programming and semidefinite programming problems for the robust constrained Markov decision processes when the uncertain running cost vectors belong to polytopic, ellipsoidal, and semidefinite cone uncertainty sets, respectively. As an application, we study a variant of a machine replacement problem and perform numerical experiments on randomly generated instances of various sizes.
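The robust and constrained ingredients require conic solvers, but the underlying nominal LP for a discounted-cost MDP is easy to state. The sketch below uses scipy.optimize.linprog and treats the running costs as known, so it is only the starting point that the paper's robust formulations generalize.

```python
import numpy as np
from scipy.optimize import linprog

def mdp_lp(P, cost, beta):
    """Nominal LP for a discounted-cost MDP: maximize sum_s V(s) subject to
    V(s) <= cost(s, a) + beta * (P_a V)(s) for every state-action pair."""
    A, S, _ = P.shape
    rows = [np.eye(S) - beta * P[a] for a in range(A)]   # (I - beta P_a) V <= cost_a
    res = linprog(c=-np.ones(S),                         # linprog minimizes, so negate
                  A_ub=np.vstack(rows),
                  b_ub=np.concatenate([cost[:, a] for a in range(A)]),
                  bounds=[(None, None)] * S, method="highs")
    return res.x                                         # optimal value function
```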

15.
The chain graph (CG) is a general class of graphical Markov models. Different chain graphs may describe the same conditional independence structure; such CGs are said to be Markov equivalent. In 1990 Frydenberg showed that every class of Markov equivalent CGs contains a CG, called the largest chain graph, with the greatest number of lines. This paper presents an efficient algorithm for finding the largest chain graph of the Markov equivalence class of a given CG. The computational complexity of the algorithm is O(n³), which is more efficient than the O(n!) complexity of existing algorithms. A more intuitive graphical characterization of the largest chain graph is also provided, based on the algorithm in this paper.

16.
This paper studies a stochastic linear quadratic (LQ) control problem in the infinite time horizon with Markovian jumps in parameter values. In contrast to the deterministic case, the cost weighting matrices of the state and control are allowed to be indefinite here. When the generator matrix of the jump process – which is assumed to be a Markov chain – is known and time-invariant, the well-posedness of the indefinite stochastic LQ problem is shown to be equivalent to the solvability of a system of coupled generalized algebraic Riccati equations (CGAREs) that involves equality and inequality constraints. To analyze the CGAREs, linear matrix inequalities (LMIs) are utilized, and the equivalence between the feasibility of the LMIs and the solvability of the CGAREs is established. Finally, an LMI-based algorithm is devised to solve the CGAREs via semidefinite programming, and numerical results are presented to illustrate the proposed algorithm.

17.
Previously known works on the generalization of the least-squares regularized regression algorithm are usually based on the assumption of independent and identically distributed (i.i.d.) samples. In this paper we go beyond this classical framework by studying the generalization of the least-squares regularized regression algorithm with Markov chain samples. We first establish a novel concentration inequality for uniformly ergodic Markov chains, then establish bounds on the generalization of the least-squares regularized regression algorithm with uniformly ergodic Markov chain samples, and show that the algorithm with such samples is consistent.

18.
Factored Markov Decision Processes (MDPs) provide a compact representation for modeling sequential decision making problems with many variables. Approximate linear programming (LP) is a prominent method for solving factored MDPs. However, it cannot be applied to models with large treewidth due to the exponential number of constraints. This paper proposes a novel and efficient approximate method to represent the exponentially many constraints. We construct an augmented junction graph from the factored MDP and represent the constraints using a set of cluster constraints and separator constraints, where the cluster constraints reduce the number of constraints and the separator constraints enforce consistency between neighboring clusters to improve accuracy. In the case where the junction graph is tree-structured, our method provides an equivalent representation of the original constraints. In other cases, it provides a good trade-off between computation and accuracy. Experimental results on different models show that our algorithm outperforms other approximate linear programming algorithms in computational cost or expected reward.

19.
Online (also called “recursive” or “adaptive”) estimation of fixed model parameters in hidden Markov models is a topic of much interest in time series modeling. In this work, we propose an online parameter estimation algorithm that combines two key ideas. The first one, which is deeply rooted in the Expectation-Maximization (EM) methodology, consists in reparameterizing the problem using complete-data sufficient statistics. The second ingredient consists in exploiting a purely recursive form of smoothing in HMMs based on an auxiliary recursion. Although the proposed online EM algorithm resembles a classical stochastic approximation (or Robbins–Monro) algorithm, it is sufficiently different to resist conventional analysis of convergence. We thus provide limited results which identify the potential limiting points of the recursion as well as the large-sample behavior of the quantities involved in the algorithm. The performance of the proposed algorithm is numerically evaluated through simulations in the case of a noisily observed Markov chain. In this case, the algorithm reaches estimation results that are comparable to those of the maximum likelihood estimator for large sample sizes. The supplemental material for this article, available online, includes an appendix with the proofs of Theorem 1 and Corollary 1 stated in Section 4, as well as the MATLAB/OCTAVE code used to implement the algorithm in the case of the noisily observed Markov chain considered in Section 5.

20.
Sampling from an intractable probability distribution is a common and important problem in scientific computing. A popular approach to solve this problem is to construct a Markov chain which converges to the desired probability distribution, and run this Markov chain to obtain an approximate sample. In this paper, we provide two methods to improve the performance of a given discrete reversible Markov chain. These methods require the knowledge of the stationary distribution only up to a normalizing constant. Each of these methods produces a reversible Markov chain which has the same stationary distribution as the original chain, and dominates the original chain in the ordering introduced by Peskun [11]. We illustrate these methods on two Markov chains, one connected to hidden Markov models and one connected to card shuffling. We also prove a result which shows that the Metropolis-Hastings algorithm preserves the Peskun ordering for Markov transition matrices.
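The paper's Peskun-improvement constructions are not reproduced here, but the baseline they modify, a Metropolis-Hastings chain that needs the target only up to a normalizing constant, can be sketched in a few lines; the proposal and target below are illustrative.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, x0, n):
    """Metropolis-Hastings sampler; log_pi may omit the normalizing constant,
    since it cancels in the acceptance ratio. log_q(frm, to) = log q(to | frm)."""
    x, out = x0, []
    for _ in range(n):
        y = propose(x)
        log_alpha = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
        if np.log(np.random.rand()) < log_alpha:   # accept with prob min(1, alpha)
            x = y
        out.append(x)
    return np.array(out)

# Example: standard normal target with a symmetric random-walk proposal,
# whose q-terms cancel and can simply be returned as 0.
samples = metropolis_hastings(
    log_pi=lambda x: -0.5 * x * x,
    propose=lambda x: x + np.random.normal(scale=0.5),
    log_q=lambda frm, to: 0.0,
    x0=0.0, n=5000)
```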
