An optimality principle for Markovian decision processes

Authors:	Paul J. Schweitzer, Bezalel Gavish

Affiliation:	IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA; IBM Israel Scientific Center, Haifa, Israel

Abstract:	The following optimality principle is established for finite undiscounted or discounted Markov decision processes: If a policy is (gain, bias, or discounted) optimal in one state, it is also optimal for all states reachable from this state using this policy. The optimality principle is used constructively to demonstrate the existence of a policy that is optimal in every state, and then to derive the coupled functional equations satisfied by the optimal return vectors. This reverses the usual sequence, where one first establishes (via policy iteration or linear programming) the solvability of the coupled functional equations, and then shows that the solution is indeed the optimal return vector and that the maximizing policy for the functional equations is optimal for every state.
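
The principle stated in the abstract can be checked numerically on a small example. The sketch below covers the discounted case only and is not taken from the paper: the three-state MDP, the discount factor `BETA`, and the helper names `evaluate`, `reachable`, and `check_principle` are all hypothetical, chosen for illustration. It enumerates the deterministic stationary policies, treats the per-state maximum over them as the optimal return (an assumption of this sketch), and verifies that any policy optimal in a state is also optimal in every state reachable from it under that policy.

```python
from itertools import product

# Hypothetical 3-state, 2-action discounted MDP (illustrative, not from the paper).
BETA = 0.9
STATES = [0, 1, 2]
# P[s][a] is the transition distribution over states; R[s][a] is the one-step reward.
P = {
    0: {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]},
    1: {0: [0.0, 1.0, 0.0], 1: [1.0, 0.0, 0.0]},
    2: {0: [0.0, 0.0, 1.0], 1: [0.5, 0.5, 0.0]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.0}, 2: {0: 0.0, 1: 0.5}}


def evaluate(policy, sweeps=2000):
    """Approximate the discounted return vector of a stationary policy by
    repeatedly applying v <- r_pi + BETA * P_pi v (a contraction)."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            R[s][policy[s]]
            + BETA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in STATES
        ]
    return v


def reachable(policy, start):
    """States reachable from `start` (including itself) when following `policy`."""
    seen = {start}
    stack = [start]
    while stack:
        s = stack.pop()
        for t, p in enumerate(P[s][policy[s]]):
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def check_principle(tol=1e-9):
    """Return True if every policy that is optimal in some state is also
    optimal in all states reachable from that state under the policy."""
    policies = list(product([0, 1], repeat=len(STATES)))
    values = {pi: evaluate(pi) for pi in policies}
    # Assumption: optimal return = best value over deterministic stationary policies.
    best = [max(values[pi][s] for pi in policies) for s in STATES]
    for pi in policies:
        for s in STATES:
            if abs(values[pi][s] - best[s]) <= tol:
                for t in reachable(pi, s):
                    if abs(values[pi][t] - best[t]) > tol:
                        return False
    return True
```

In this toy model the policy `(1, 0, 0)` is optimal in states 0 and 1 but not in state 2. State 2 is not reachable from state 0 under that policy, so the principle is not violated, and the example also shows why optimality in one state does not by itself give optimality in every state.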

This article is indexed in ScienceDirect and other databases.