Similar Documents (20 results)
1.
2.
3.
4.
This paper considers a first passage model for discounted semi-Markov decision processes with denumerable states and nonnegative costs. The criterion to be optimized is the expected discounted cost incurred during a first passage time to a given target set. We first construct a semi-Markov decision process under a given semi-Markov decision kernel and a policy. Then, we prove that the value function satisfies the optimality equation and that there exists an optimal (or ε-optimal) stationary policy under suitable conditions, using a minimum nonnegative solution approach. Further, we give some properties of optimal policies. In addition, a value iteration algorithm for computing the value function and optimal policies is developed, and an example is given. Finally, it is shown that our model is an extension of the first passage models for both discrete-time and continuous-time Markov decision processes.
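To make the value-iteration idea concrete, here is a minimal sketch on a discrete-time stand-in: a small finite model with per-step discount β and one absorbing target state. All transition data, costs, and the discount factor are made up for illustration; the paper's algorithm operates on semi-Markov models with denumerable states.

```python
import numpy as np

# Value iteration for a first-passage criterion: minimize the expected
# discounted cost accumulated until the chain first enters the target set B.
# Costs stop accruing once B is reached, so V is pinned to 0 on B.
n_states, n_actions, beta = 5, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, y]
c = rng.uniform(1.0, 2.0, size=(n_states, n_actions))             # nonnegative costs
target = [4]                                                      # target set B

V = np.zeros(n_states)
for _ in range(10_000):
    V_new = (c + beta * P @ V).min(axis=1)   # Bellman operator
    V_new[target] = 0.0                      # first passage: stop on B
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = (c + beta * P @ V).argmin(axis=1)   # greedy stationary policy
print("first-passage values:", np.round(V, 4))
print("stationary policy:", policy)
```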

5.
Optimization, 2012, 61(2-3): 271-283
This paper presents a new class of Markov decision processes: continuous-time shock Markov decision processes, which model Markovian controlled systems sequentially shocked by their environment. Between two adjacent shocks, the system can be modeled as a continuous-time Markov decision process. At each shock, however, the system's parameters change and an instantaneous state transition occurs. After presenting the model, we prove that the optimality equation, which consists of countably many equations, has a unique solution in some function space Ω.

6.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in numerically approximating the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μₙ of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which bounds the probability that the estimation error is larger than some given constant; this bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
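A minimal sketch of the linear programming formulation on a tiny finite model, with discounted occupation measures as the LP variables. The model data, the constraint bound k, and the initial distribution ν are invented for illustration; the paper works on Borel spaces with the empirical measure μₙ in place of μ.

```python
import numpy as np
from scipy.optimize import linprog

nS, nA, beta = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[x, a, y]
c = rng.uniform(0, 1, (nS, nA))                 # cost to be minimized
d = rng.uniform(0, 1, (nS, nA))                 # constraint cost
k = 8.0                                         # bound on discounted d-cost (assumed feasible)
nu = np.full(nS, 1.0 / nS)                      # initial distribution

# Variables: discounted occupation measure rho[x, a] >= 0, flattened.
# Balance: sum_a rho(y, a) - beta * sum_{x, a} rho(x, a) P(y | x, a) = nu(y).
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = float(y == x) - beta * P[x, a, y]

res = linprog(c.ravel(),                           # expected discounted cost
              A_ub=d.ravel()[None, :], b_ub=[k],   # discounted d-cost <= k
              A_eq=A_eq, b_eq=nu, bounds=(0, None))
assert res.success, res.message
print("optimal constrained discounted cost:", round(res.fun, 4))
```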

7.
The computational problem of transient solutions for denumerable-state Markov processes (MPs) has been solved by Hsu and Yuan [12], who derived an efficient algorithm with uniform error. However, when the state space of an MP has two or more dimensions, even for computational methods dealing with stationary solutions, only the case where one of the dimensions is infinite and all the others are finite has been studied. In this paper, we study transient solutions for multidimensional denumerable-state MPs and give an algorithm with uniform error. Some numerical results are presented. Supported by the National Natural Science Foundation of China.
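For a finite state space, the standard way to compute transient solutions with a uniform error bound is uniformization (Jensen's method): expand the transient distribution as a Poisson mixture and truncate once the remaining tail mass falls below the tolerance. A minimal sketch on a made-up 3-state generator; the paper's contribution is handling multidimensional denumerable spaces, which this sketch does not attempt.

```python
import numpy as np

# Transient distribution p(t) of a CTMC with generator Q via uniformization:
# p(t) = sum_k Poisson(k; Lam*t) * p0 @ P^k, where P = I + Q/Lam.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -3.0, 2.0],
              [ 0.0,  0.5, -0.5]])
p0 = np.array([1.0, 0.0, 0.0])   # initial distribution
t, eps = 1.0, 1e-10              # horizon and uniform error tolerance

Lam = max(-Q.diagonal())         # uniformization rate
P = np.eye(len(Q)) + Q / Lam     # embedded DTMC kernel, entries in [0, 1]

w = np.exp(-Lam * t)             # Poisson weight for k = 0
term = p0.copy()
p_t = w * term
acc = w
k = 0
while 1.0 - acc > eps:           # remaining Poisson tail bounds the error
    k += 1
    term = term @ P
    w *= Lam * t / k
    p_t += w * term
    acc += w

print("transient distribution p(t):", np.round(p_t, 6))
print("mass (should be ~1):", p_t.sum())
```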

8.
9.
We consider a discrete-time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered a decomposition of limiting-average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs, and we construct hierarchical decomposition algorithms for both.
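One natural reading of such a decomposition, sketched below under our own modeling choices: split the states into strongly connected components of the one-step reachability graph, then solve the resulting hierarchy of sub-MDPs in topological order, so each sub-MDP sees already-solved downstream values as constants. The random sparse model and the SCC-based splitting are an illustrative stand-in, not the algorithm of the paper.

```python
import numpy as np
from graphlib import TopologicalSorter
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

nS, nA, beta = 6, 2, 0.95
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
P[P < 0.25] = 0.0                     # sparsify so nontrivial components appear
for x in range(nS):                   # guard: keep every row a distribution
    for a in range(nA):
        if P[x, a].sum() == 0.0:
            P[x, a, x] = 1.0
P /= P.sum(axis=2, keepdims=True)
r = rng.uniform(0, 1, (nS, nA))

# x -> y if some action moves x to y with positive probability.
reach = (P.sum(axis=1) > 0).astype(int)
n_comp, comp = connected_components(csr_matrix(reach), connection='strong')

# Condensation DAG: component u depends on v if u can jump into v in one
# step, so v must be solved first.
deps = {u: set() for u in range(n_comp)}
for x in range(nS):
    for y in range(nS):
        if reach[x, y] and comp[x] != comp[y]:
            deps[comp[x]].add(comp[y])

V = np.zeros(nS)
for label in TopologicalSorter(deps).static_order():
    idx = np.where(comp == label)[0]
    for _ in range(100_000):          # value iteration on one sub-MDP
        V_new = (r[idx] + beta * P[idx] @ V).max(axis=1)
        if np.max(np.abs(V_new - V[idx])) < 1e-12:
            V[idx] = V_new
            break
        V[idx] = V_new

print("component of each state:", comp)
print("optimal values:", np.round(V, 4))
```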

10.
The paper is concerned with a discounted Markov decision process with an unknown parameter which is estimated anew at each stage. Algorithms are proposed which are intermediate between, and include, the classical (but time-consuming) principle of estimation and control and the simpler nonstationary value iteration, which converges more slowly. These algorithms perform one single policy-improvement step after each estimation, and then the policy thus obtained is evaluated completely (policy iteration) or incompletely (policy-value iteration). The results show that both methods lead to asymptotically discount-optimal policies. In addition, these results are generalized to cases where systematic errors do not vanish as the number of stages increases. Dedicated to Prof. Dr. K. Hinderer on the occasion of his 60th birthday.
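A minimal sketch of the "one improvement step after each estimation" scheme in its policy-iteration variant: re-estimate the unknown transition law from observed transitions, take a single improvement step, and evaluate the resulting policy completely. The true model, the Laplace-smoothed estimator, and the simulation loop are our own illustration; exploration issues are ignored here.

```python
import numpy as np

nS, nA, beta = 4, 2, 0.9
rng = np.random.default_rng(3)
P_true = rng.dirichlet(np.ones(nS), size=(nS, nA))   # unknown to the controller
r = rng.uniform(0, 1, (nS, nA))

counts = np.ones((nS, nA, nS))        # Laplace-smoothed transition counts
policy = np.zeros(nS, dtype=int)

def evaluate(pi, P):
    """Complete policy evaluation: solve (I - beta * P_pi) V = r_pi."""
    P_pi = P[np.arange(nS), pi]
    r_pi = r[np.arange(nS), pi]
    return np.linalg.solve(np.eye(nS) - beta * P_pi, r_pi)

x = 0
for stage in range(2000):
    a = policy[x]
    y = rng.choice(nS, p=P_true[x, a])                   # observe a transition
    counts[x, a, y] += 1
    P_hat = counts / counts.sum(axis=2, keepdims=True)   # estimate anew
    V = evaluate(policy, P_hat)                          # complete evaluation
    policy = (r + beta * P_hat @ V).argmax(axis=1)       # one improvement step
    x = y

print("policy found:", policy)
print("its value under the true model:", np.round(evaluate(policy, P_true), 3))
```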

11.
12.
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1-6]. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we…

13.
We study Markov jump decision processes with both continuously and instantaneously acting decisions and with deterministic drift between jumps. Such decision processes were recently introduced and studied, from the point of view of discrete-time approximations, by Van der Duyn Schouten. We obtain necessary and sufficient optimality conditions for these decision processes in terms of equations and inequalities of quasi-variational type. By means of the latter we find simple necessary and sufficient conditions for the existence of stationary optimal policies in such processes with finite state and action spaces, in both the discounted and the average per-unit-time reward cases.

14.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems: a simpler Markov decision process is continuously deformed until it is identical with the original problem. Algorithms and performance bounds are given.
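A sketch of the homotopy idea under our own choices: start from an easy problem with a closed-form solution (stay-put transitions), blend its kernel toward the target kernel along a parameter grid, and warm-start value iteration at each step. The grid and the easy problem are illustrative, not the paper's construction.

```python
import numpy as np

nS, nA, beta = 5, 2, 0.9
rng = np.random.default_rng(4)
P1 = rng.dirichlet(np.ones(nS), size=(nS, nA))   # target transitions
r = rng.uniform(0, 1, (nS, nA))
P0 = np.zeros_like(P1)
P0[:, :, :] = np.eye(nS)[:, None, :]             # easy problem: every action stays put

# The easy problem solves in closed form: V(x) = max_a r(x, a) / (1 - beta).
V = r.max(axis=1) / (1 - beta)
iters_used = []
for lam in np.linspace(0.0, 1.0, 11)[1:]:
    P = (1 - lam) * P0 + lam * P1                # deformed kernel
    for it in range(10_000):                     # warm-started value iteration
        V_new = (r + beta * P @ V).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    iters_used.append(it)

print("optimal values of the target problem:", np.round(V, 4))
print("iterations per homotopy step:", iters_used)
```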

15.
Optimization, 2012, 61(2): 255-269
Constrained Markov decision processes with compact state and action spaces are studied under long-run average reward or cost criteria. By introducing a corresponding Lagrange function, a saddle-point theorem is given, by which the existence of a constrained optimal pair of initial state distribution and policy is shown. Also, under the Doeblin hypothesis, a functional characterization of a constrained optimal policy is obtained.
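A sketch of the Lagrangian idea: fold the constraint into the reward with a multiplier λ and search λ for the saddle point. For simplicity this sketch uses a discounted criterion, a finite model, and scalar bisection; the paper itself treats long-run averages on compact spaces, and randomized mixing at the breakpoint multiplier is ignored here.

```python
import numpy as np

nS, nA, beta, k = 4, 3, 0.9, 5.0     # k: bound on the discounted d-cost (assumed)
rng = np.random.default_rng(8)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r = rng.uniform(0, 1, (nS, nA))      # reward to maximize
d = rng.uniform(0, 1, (nS, nA))      # constraint cost
nu = np.full(nS, 1.0 / nS)           # initial distribution

def solve(lam):
    """Optimal policy for the scalarized reward r - lam * d, and its d-cost."""
    V = np.zeros(nS)
    for _ in range(5000):
        V_new = ((r - lam * d) + beta * P @ V).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    pi = ((r - lam * d) + beta * P @ V).argmax(axis=1)
    P_pi = P[np.arange(nS), pi]
    cost = nu @ np.linalg.solve(np.eye(nS) - beta * P_pi, d[np.arange(nS), pi])
    return pi, cost

lo, hi = 0.0, 50.0                   # the d-cost of the lam-optimal policy is
for _ in range(60):                  # nonincreasing in lam, so bisect
    mid = 0.5 * (lo + hi)
    _, cost = solve(mid)
    lo, hi = (mid, hi) if cost > k else (lo, mid)

pi, cost = solve(hi)
print("multiplier:", round(hi, 4), " policy:", pi, " d-cost:", round(cost, 4))
```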

16.
We show how to find a sequence of policies for essentially finite-state dynamic programs such that the corresponding vector of optimal returns converges pointwise to that of a denumerable-state dynamic program. The corresponding result for stochastic games is also given.
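The flavor of such finite-state approximations can be seen on a simple denumerable model: solve a sequence of N-state truncations and watch the optimal value at a fixed state converge as N grows. The queue-like controlled walk below, and the reflecting truncation, are our own illustration.

```python
import numpy as np

beta = 0.9

def solve_truncated(N, tol=1e-10):
    """Value iteration on the N-state truncation of a controlled random walk:
    action 0 ("serve") moves down w.p. 0.7 / up w.p. 0.3 at extra cost 0.5;
    action 1 ("idle") moves down w.p. 0.4 / up w.p. 0.6 at no extra cost.
    Holding cost equals the state index; mass leaving the truncation reflects."""
    idx = np.arange(N)
    up = np.minimum(idx + 1, N - 1)
    down = np.maximum(idx - 1, 0)
    c = np.stack([idx + 0.5, idx + 0.0], axis=1)
    V = np.zeros(N)
    while True:
        Q0 = c[:, 0] + beta * (0.7 * V[down] + 0.3 * V[up])
        Q1 = c[:, 1] + beta * (0.4 * V[down] + 0.6 * V[up])
        V_new = np.minimum(Q0, Q1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

for N in (10, 20, 40, 80, 160):
    print(N, round(solve_truncated(N)[0], 6))   # value at state 0 converges in N
```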

17.
We consider a class of discrete-time Markov control processes with Borel state and action spaces, and i.i.d. disturbances taking values in ℝᵈ with unknown distribution μ. Under mild semi-continuity and compactness conditions, and assuming that μ is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results are obtained by taking advantage of some well-known facts in the theory of density estimation. This approach allows us to avoid the restrictive conditions on the state space and/or on the system's transition law imposed in recent works, and, on the other hand, it clearly shows the way to other applications of nonparametric (density) estimation to adaptive control. Research partially supported by the Third World Academy of Sciences under Research Grant No. MP 898-152.
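A sketch of the density-estimation step behind such adaptive policies: estimate the unknown disturbance density nonparametrically from observed disturbances, then compute expectations in the Bellman operator against the estimate. The clipped random-walk dynamics, the quadrature grid, and the stand-in value function are all our own illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
xi_obs = rng.normal(0.3, 1.0, size=500)   # observed i.i.d. disturbances
f_hat = gaussian_kde(xi_obs)              # nonparametric density estimate

grid = np.linspace(-5, 5, 2001)           # quadrature grid for plug-in E[.]
w = f_hat(grid)
w /= w.sum()                              # normalized quadrature weights

def expected_next_value(x, a, V):
    """Plug-in estimate of E[V(x + a + xi)] under the estimated density."""
    x_next = np.clip(x + a + grid, -5, 5)  # system x' = clip(x + a + xi)
    return np.sum(w * V(x_next))

V = lambda s: -np.abs(s)                  # stand-in value function
for a in (-1.0, 0.0, 1.0):
    print(a, round(expected_next_value(0.0, a, V), 4))
```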

18.
This paper describes a computational comparison of value iteration algorithms for discounted Markov decision processes.
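A small stand-in for such a comparison: standard (Jacobi) value iteration versus Gauss-Seidel sweeps on the same random discounted MDP, counting iterations to a common accuracy. The test model and the pair of variants compared are our own choices, not necessarily those of the paper.

```python
import numpy as np

nS, nA, beta, tol = 20, 3, 0.95, 1e-8
rng = np.random.default_rng(6)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r = rng.uniform(0, 1, (nS, nA))

def jacobi():
    """Standard value iteration: update all states from the old iterate."""
    V, n = np.zeros(nS), 0
    while True:
        V_new = (r + beta * P @ V).max(axis=1)
        n += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, n
        V = V_new

def gauss_seidel():
    """Sweep the states in order, using freshly updated components."""
    V, n = np.zeros(nS), 0
    while True:
        delta = 0.0
        for x in range(nS):
            v = (r[x] + beta * P[x] @ V).max()
            delta = max(delta, abs(v - V[x]))
            V[x] = v
        n += 1
        if delta < tol:
            return V, n

V1, n1 = jacobi()
V2, n2 = gauss_seidel()
print("Jacobi iterations:", n1, " Gauss-Seidel sweeps:", n2)
print("max value difference:", np.max(np.abs(V1 - V2)))
```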

19.
Fitting the value function in a Markovian decision process by a linear superposition of M basis functions reduces the problem dimensionality from the number of states down to M, with good accuracy retained if the value function is a smooth function of its argument, the state vector. This paper provides, for both the discounted and undiscounted cases, three algorithms for computing the coefficients in the linear superposition: linear programming, policy iteration, and least squares.
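A least-squares sketch of the basis-function idea for a fixed policy: approximate V ≈ Φw with M ≪ nS basis functions and fit w so the Bellman residual is orthogonal to the basis (an LSTD-style projected equation). The chain, the smooth reward, and the polynomial basis are made up for illustration; the paper's three algorithms and its control setting are not reproduced here.

```python
import numpy as np

nS, M, beta = 200, 5, 0.95
rng = np.random.default_rng(7)
P = rng.dirichlet(np.ones(nS), size=nS)            # fixed-policy transition matrix
s = np.linspace(0, 1, nS)                          # embed states in [0, 1]
r = np.sin(2 * np.pi * s)                          # smooth reward in the state
Phi = np.stack([s**k for k in range(M)], axis=1)   # polynomial basis, nS x M

# Projected Bellman equation: Phi^T (Phi - beta * P @ Phi) w = Phi^T r.
A = Phi.T @ (Phi - beta * P @ Phi)
b = Phi.T @ r
w = np.linalg.solve(A, b)                          # M x M solve instead of nS x nS

V_exact = np.linalg.solve(np.eye(nS) - beta * P, r)
print("M =", M, "coefficients:", np.round(w, 3))
print("max |V_fit - V_exact|:", np.max(np.abs(Phi @ w - V_exact)))
```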

20.