Markov programming with policy constraints |
| |
Authors: | N. A. J. Hastings, D. Sadjadi |
| |
Institution: | Monash University, Melbourne, Australia; University of Bradford, Bradford, England |
| |
Abstract: | In the theory and applications of Markov decision processes, introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy-constrained Markov decision process is one in which selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal-gain policy in the policy-constrained case. The method uses bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms the basis for a bounded enumeration procedure that yields the optimal policy. |
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
|
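
The abstract gives only an outline of the method, so the following Python sketch is one possible reading of "rank by bounds, then enumerate", not the authors' procedure. It assumes ergodic chains and uses a standard inequality: for any vector v, the gain of a stationary policy is at most the largest value of q_i + sum_j p_ij v_j - v_i over the states. The function names, the two-state data, and the constraint (`feasible`) are all hypothetical and chosen only for illustration.

```python
from itertools import product

def relative_values(P, q, sweeps=50):
    """Relative values from value iteration on the UNCONSTRAINED problem.
    Any vector v gives a valid bound; a good v just gives a tighter one."""
    n = len(q)
    v = [0.0] * n
    for _ in range(sweeps):
        w = [max(q[i][a] + sum(P[i][a][j] * v[j] for j in range(n))
                 for a in range(len(q[i]))) for i in range(n)]
        v = [x - w[-1] for x in w]  # normalise so the last state has value 0
    return v

def upper_bound(P, q, policy, v):
    """Upper bound on the gain of `policy`: max_i (q_i + sum_j p_ij v_j - v_i)."""
    n = len(q)
    return max(q[i][policy[i]]
               + sum(P[i][policy[i]][j] * v[j] for j in range(n)) - v[i]
               for i in range(n))

def gain(P, q, policy, iters=2000):
    """Exact gain via the stationary distribution (power iteration).
    Assumption: the chain under `policy` is ergodic."""
    n = len(q)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][policy[i]][j] for i in range(n)) for j in range(n)]
    return sum(pi[i] * q[i][policy[i]] for i in range(n))

def best_feasible_policy(P, q, feasible):
    """Rank feasible policies by their bound, then evaluate them in that
    order, stopping once the next bound cannot beat the best gain found."""
    v = relative_values(P, q)
    policies = [d for d in product(*(range(len(row)) for row in q)) if feasible(d)]
    ranked = sorted(policies, key=lambda d: upper_bound(P, q, d, v), reverse=True)
    best, best_gain, evaluated = None, float("-inf"), 0
    for d in ranked:
        if upper_bound(P, q, d, v) <= best_gain:
            break  # every remaining policy has an even smaller bound
        g = gain(P, q, d)
        evaluated += 1
        if g > best_gain:
            best, best_gain = d, g
    return best, best_gain, evaluated

# Illustrative data: P[state][action] is a transition row,
# q[state][action] is the expected one-step reward.
P = [[[0.5, 0.5], [0.8, 0.2]],
     [[0.4, 0.6], [0.7, 0.3]]]
q = [[6.0, 4.0],
     [-3.0, -5.0]]

# Hypothetical constraint: action 1 may not be used in both states.
def feasible(policy):
    return policy != (1, 1)

best, best_gain, evaluated = best_feasible_policy(P, q, feasible)
print(best, round(best_gain, 4), evaluated)
```

In this sketch the ranking list is built once from the bounds, and exact gains are computed only for policies whose bound still exceeds the best gain found so far. How much enumeration is saved depends on how tight the bounds are, which in turn depends on the choice of v.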