2.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
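For concreteness, the linear programming formulation used here can be illustrated on a small finite instance. The sketch below is a minimal finite-state analogue (the paper itself works on Borel spaces and replaces μ by its empirical measure μ_n, which is not reproduced here); all data are hypothetical, and the LP is the standard occupation-measure formulation of a discounted constrained MDP.

```python
# Minimal sketch: occupation-measure LP for a discounted constrained MDP.
# Finite, hypothetical instance; the paper's Borel-space / empirical-measure
# construction is not reproduced here.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, :] = distribution of s'
c = rng.uniform(0, 1, size=(nS, nA))            # running cost
d = rng.uniform(0, 1, size=(nS, nA))            # constraint cost
k = 6.0                                         # constraint bound (chosen loosely)
mu0 = np.full(nS, 1.0 / nS)                     # initial distribution

# Variables x[s, a] >= 0 (discounted occupation measure), flattened row-major.
# Flow constraints: sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu0(s').
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        col = s * nA + a
        A_eq[s, col] += 1.0
        A_eq[:, col] -= gamma * P[s, a]

res = linprog(c.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[k],  # discounted constraint cost <= k
              A_eq=A_eq, b_eq=mu0,
              bounds=(0, None), method="highs")
assert res.status == 0, res.message
x = res.x.reshape(nS, nA)
print("optimal constrained cost:", res.fun)
print("induced randomized policy:\n", x / x.sum(axis=1, keepdims=True))
```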
3.
Edilson F. Arruda, Operations Research Letters, 2011, 39(3): 193-197
This note addresses the time aggregation approach to ergodic finite-state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.
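The state-space reduction involved can be pictured with the standard censoring identity for Markov chains: watching a chain only while it visits the controllable states yields a smaller chain on those states. Below is a minimal numpy sketch under a fixed policy, assuming the uncontrollable set is left in finite time (so that I − P_uu is invertible); the paper's time-aggregation construction also transforms the costs, which is omitted here.

```python
# Sketch: censoring a Markov chain on its controllable states.
# With P partitioned as [[P_cc, P_cu], [P_uc, P_uu]], the censored chain is
#   P_tilde = P_cc + P_cu (I - P_uu)^{-1} P_uc,
# valid when (I - P_uu) is invertible (the uncontrollable set is transient).
import numpy as np

P = np.array([[0.2, 0.3, 0.5, 0.0],
              [0.1, 0.4, 0.2, 0.3],
              [0.3, 0.3, 0.2, 0.2],
              [0.5, 0.1, 0.2, 0.2]])   # hypothetical chain under a fixed policy
nc = 2                                 # states 0..1 controllable, 2..3 not

P_cc, P_cu = P[:nc, :nc], P[:nc, nc:]
P_uc, P_uu = P[nc:, :nc], P[nc:, nc:]
P_tilde = P_cc + P_cu @ np.linalg.solve(np.eye(P.shape[0] - nc) - P_uu, P_uc)
print(P_tilde, P_tilde.sum(axis=1))    # a stochastic matrix: rows sum to 1
```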
4.
M. Sun, Journal of Optimization Theory and Applications, 1993, 79(2): 405-413
We introduce a revised simplex algorithm for solving a typical type of dynamic programming equation arising from a class of finite Markov decision processes. The algorithm also applies to several types of optimal control problems with diffusion models after discretization. It is based on the regular simplex algorithm, the duality concept in linear programming, and certain special features of the dynamic programming equation itself. Convergence is established for the new algorithm. The algorithm is potentially attractive when the number of actions is very large or even infinite.
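For orientation, in the standard discounted finite MDP setting the dynamic programming equation is equivalent to a linear program, which is what makes simplex-type methods applicable; a generic textbook form (not necessarily the exact equation treated in the paper) is

```latex
% Discounted DP equation and its standard primal LP (generic form).
v(s) = \max_{a \in A(s)} \Big\{ r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v(s') \Big\},
\qquad
\min_{v} \sum_{s} v(s)
\quad \text{s.t.} \quad
v(s) \ge r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v(s') \;\; \forall s,a .
```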
5.
We characterize the value function and the optimal stopping time for a large class of optimal stopping problems where the underlying process to be stopped is a fairly general Markov process. The main result is inspired by recent findings for Lévy processes obtained essentially via the Wiener–Hopf factorization. The main ingredient in our approach is the representation of the β-excessive functions as expected suprema. A variety of examples is given.
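In the finite-state discounted case the characterization reduces to a fixed point that is easy to compute; the sketch below (hypothetical chain and payoff g) iterates the standard stopping operator V = max(g, βPV) and reads off the stopping region.

```python
# Sketch: value iteration for discounted optimal stopping of a finite chain.
# V(x) = max( g(x), beta * (P V)(x) ); it is optimal to stop where V = g.
import numpy as np

beta = 0.95
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])        # hypothetical random-walk-like chain
g = np.array([0.0, 1.0, 2.5])          # payoff collected on stopping

V = g.copy()
for _ in range(10_000):
    V_new = np.maximum(g, beta * (P @ V))
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

print("value:", V)
print("optimal stopping set:", np.where(g >= beta * (P @ V))[0])
```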
7.
Ke Liu, Acta Mathematicae Applicatae Sinica (English Series), 1999, 15(2): 183-189
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...
8.
We study controlled Markov processes where multiple decisions need to be made for each state. We present conditions on the cost structure and the state transition mechanism of the process under which optimal decisions are restricted to a subset of the decision space. As a result, the numerical computation of the optimal policy may be significantly expedited.
9.
Operations Research Letters, 2022, 50(2): 218-223
We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities. We propose equivalent linear programming, second-order cone programming and semidefinite programming problems for robust constrained Markov decision processes when the uncertain running cost vectors belong to polytopic, ellipsoidal, and semidefinite cone uncertainty sets, respectively. As an application, we study a variant of a machine replacement problem and perform numerical experiments on randomly generated instances of various sizes.
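In the polytopic case the robust counterpart remains a linear program: the worst case over a polytope of cost vectors is attained at a vertex, so a single epigraph variable over the vertices suffices. A minimal sketch with hypothetical data follows (the extra constraint costs of the constrained model are dropped for brevity, and the ellipsoidal/semidefinite cases would need a conic solver instead):

```python
# Sketch: robust MDP objective with polytopic cost uncertainty.
# min_x max_k c_k^T x  ==  min_{x,t} t  s.t.  c_k^T x <= t for every vertex c_k,
# subject to the usual occupation-measure (flow) constraints.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, :] = distribution of s'
mu0 = np.full(nS, 1.0 / nS)
C = rng.uniform(0, 1, size=(4, nS * nA))         # 4 vertices of the cost polytope

# Flow constraints on x, with a zero column appended for the epigraph variable t.
A_eq = np.zeros((nS, nS * nA + 1))
for s in range(nS):
    for a in range(nA):
        col = s * nA + a
        A_eq[s, col] += 1.0
        A_eq[:, col] -= gamma * P[s, a]

A_ub = np.hstack([C, -np.ones((C.shape[0], 1))]) # c_k^T x - t <= 0
obj = np.zeros(nS * nA + 1)
obj[-1] = 1.0                                    # minimize t = worst-case cost

res = linprog(obj, A_ub=A_ub, b_ub=np.zeros(C.shape[0]),
              A_eq=A_eq, b_eq=mu0,
              bounds=[(0, None)] * (nS * nA) + [(None, None)],
              method="highs")
assert res.status == 0, res.message
print("robust optimal cost:", res.fun)
```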
10.
The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on and study the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of this operator. Various turnpike results relating the finite- and infinite-horizon models are also given.
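One common way to write the dynamic programming operator for this target-level criterion is to augment the state with the remaining target, which is rescaled by the discount factor after each transition; a generic form (details may differ from the paper's exact operator) is

```latex
% U(x,t) = optimal probability that the total discounted reward from x
% exceeds the target t.
U(x, t) = \max_{a \in A(x)} \sum_{y} p(y \mid x, a)\,
          U\!\Big(y, \frac{t - r(x,a)}{\beta}\Big).
```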
11.
Optimization, 2012, 61(9): 1133-1150
This article presents a new method of linear programming (LP) for solving Markov decision processes (MDPs) based on the simplex method (SM). SM has been shown to be the most efficient method in many practical problems; unfortunately, the classical SM has exponential worst-case complexity. Therefore, new SMs have emerged for obtaining optimal solutions in the most efficient way. The cosine simplex method (CSM) is one of them. CSM is based on the Karush–Kuhn–Tucker conditions and is able to efficiently solve general LP problems. This work presents a new method, named the Markov Cosine Simplex Method (MCSM), for solving MDP problems, which is an extension of CSM. In this article, the efficiency of MCSM is compared to that of the traditional revised simplex method (RSM); experimental results show that MCSM is far more efficient than RSM.
12.
In this paper we consider the problem of optimal stopping and continuous control of some local parameters of piecewise-deterministic Markov processes (PDPs). Optimality equations are obtained in terms of a set of variational inequalities as well as in terms of the first-jump-time operator of the PDP. It is shown that if the final cost function is absolutely continuous along trajectories, then so is the value function of the optimal stopping problem with continuous control. These results unify and generalize previous ones in the current literature.
13.
Operations Research Letters, 2021, 49(1): 55-61
This note presents a technique that is useful for the study of piecewise deterministic Markov decision processes (PDMDPs) with general policies and unbounded transition intensities. This technique produces an auxiliary PDMDP from the original one. The auxiliary PDMDP possesses certain desired properties, which may not be possessed by the original PDMDP. We apply this technique to risk-sensitive PDMDPs with total cost criteria, and comment on its connection with the uniformization technique.
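For context, uniformization converts a continuous-time chain with bounded jump intensities into a discrete-time chain observed at Poisson event times; the unbounded-intensity setting of this note is precisely where the construction below breaks down. A minimal sketch with a hypothetical generator:

```python
# Sketch: uniformization of a CTMC with conservative generator Q.
# Pick Lambda >= max_x |Q[x, x]|; then P = I + Q / Lambda is stochastic, and
# the CTMC equals the DTMC P sampled at the events of a Poisson(Lambda) clock.
import numpy as np

Q = np.array([[-2.0, 1.5, 0.5],
              [1.0, -3.0, 2.0],
              [0.0, 0.5, -0.5]])       # rows sum to 0
Lam = np.max(-np.diag(Q))              # uniformization constant
P = np.eye(Q.shape[0]) + Q / Lam
print(P, P.sum(axis=1))                # rows sum to 1
```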
14.
Strongly excessive functions play an important role in the theory of Markov decision processes and Markov games. In this paper the following question is investigated: what are the properties of Markov decision processes which possess a strongly excessive function? A probabilistic characterization is presented in the form of a random drift through a partitioned state space. For strongly excessive functions which have a positive lower bound, a characterization is given in terms of the lifetime distribution of the process. Finally, we give a characterization in terms of the spectral radius.
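For reference, a common form of the definition (conventions vary slightly across the literature): a function v > 0 on the state space is strongly excessive when, for some contraction factor α ∈ (0,1) and all states x and actions a,

```latex
\sum_{y} p(y \mid x, a)\, v(y) \;\le\; \alpha\, v(x), \qquad \alpha \in (0,1).
```

In the finite case, a positive v with Pv ≤ αv for a nonnegative matrix P forces the spectral radius ρ(P) ≤ α < 1, which is the kind of spectral-radius characterization alluded to above.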
15.
Markov stopping games with random priority
Krzysztof Szajowski, Mathematical Methods of Operations Research, 1994, 39(1): 69-84
In this paper, a construction of Nash equilibria is given for a random-priority, finite-horizon, two-person non-zero-sum game with stopping of a Markov process. The method is used to solve the two-person non-zero-sum game version of the secretary problem. Each player can choose only one applicant. If both players would like to select the same one, then a lottery chooses the player. The aim of the players is to choose the best candidate. An analysis of the solutions for different lotteries is given. Some lotteries admit equilibria with equal Nash values for the players. The research was supported in part by the Committee of Scientific Research under Grant KBN 211639101.
16.
V. Aggarwal, R. Chandrasekaran, K. P. K. Nair, Journal of Optimization Theory and Applications, 1977, 21(1): 27-37
A finite-state Markov decision process is considered in which two rewards are associated with each action in each state. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, both depend on the starting state. A finite algorithm for computing the solution is also given.
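On a small instance the result is easy to check by brute force: since an optimal pure stationary strategy exists, one can enumerate all such strategies, evaluate both discounted rewards in closed form, and compare ratios per starting state. A sketch with hypothetical data (the paper gives a genuinely finite algorithm rather than enumeration):

```python
# Sketch: optimize the ratio of two discounted rewards by enumerating pure
# stationary policies: V_i = (I - gamma * P_pi)^{-1} r_i, ratio = V_1 / V_2.
import itertools
import numpy as np

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, :] = distribution of s'
r1 = rng.uniform(0, 1, size=(nS, nA))
r2 = rng.uniform(0.1, 1, size=(nS, nA))          # keep the denominator positive

best = np.full(nS, -np.inf)
I = np.eye(nS)
for pi in itertools.product(range(nA), repeat=nS):
    P_pi = np.array([P[s, pi[s]] for s in range(nS)])
    V1 = np.linalg.solve(I - gamma * P_pi, [r1[s, pi[s]] for s in range(nS)])
    V2 = np.linalg.solve(I - gamma * P_pi, [r2[s, pi[s]] for s in range(nS)])
    best = np.maximum(best, V1 / V2)             # best ratio per starting state

print("optimal ratio for each starting state:", best)
```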
17.
This paper studies the optimal stopping time for semi-Markov processes (SMPs) under the discounted optimization criterion with unbounded cost rates. In our work, we introduce an explicit construction of the equivalent semi-Markov decision processes (SMDPs). The equivalence is embodied in the expected discounted cost functions of SMPs and SMDPs; that is, every stopping time of the SMP induces a policy of the SMDP such that the value functions are equal, and vice versa. The existence of the optimal stopping time of SMPs is proved by this equivalence relation. Next, we give the optimality equation of the value function and develop an effective iterative algorithm for computing it. Moreover, we show that optimal and ε-optimal stopping times can be characterized by the hitting times of special sets. Finally, to illustrate the validity of our results, an example of a maintenance system is presented.
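The iterative algorithm can be illustrated in the simplest sub-case: exponential sojourn times (so the SMP is a continuous-time Markov chain), stopping allowed at jump epochs, discount rate α, running cost rate c, and terminal cost g. All data below are hypothetical, and this specialization is ours, not the paper's.

```python
# Sketch: value iteration for discounted stopping of an SMP, specialized to
# exponential sojourns with rate lam(x). One step from x contributes
# c(x)/(alpha+lam(x)) in expected discounted running cost, and discounts the
# continuation value by E[exp(-alpha*T)] = lam(x)/(alpha+lam(x)).
import numpy as np

alpha = 0.1
lam = np.array([1.0, 2.0, 0.5])        # sojourn rates
c = np.array([0.3, 1.0, 0.2])          # running cost rates
g = np.array([2.0, 0.5, 3.0])          # cost paid on stopping
P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])        # embedded jump chain

V = g.copy()
for _ in range(10_000):
    cont = c / (alpha + lam) + (lam / (alpha + lam)) * (P @ V)
    V_new = np.minimum(g, cont)        # stop or continue, minimizing cost
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

print("value:", V)
print("stop where g <= continuation:", np.where(g <= cont)[0])
```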
19.
Optimization, 2012, 61(4-5): 495-505
This paper investigates properties of the optimality equation and of optimal policies in discrete-time Markov decision processes with expected discounted total rewards, under weak conditions requiring only that the model be well defined and the optimality equation hold. The optimal value function is characterized as a solution of the optimality equation, and the structure of optimal policies is also given.
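Written out (here for a denumerable state space; the paper's setting allows integrals in place of sums), the optimality equation in question is the familiar discounted-reward fixed-point relation

```latex
v^{*}(s) \;=\; \sup_{a \in A(s)} \Big\{ r(s,a)
  + \beta \sum_{s'} p(s' \mid s, a)\, v^{*}(s') \Big\}.
```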