Abstract: | In this paper, we discuss the structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards, {S, (A(i), i∈S), q, t, r, V_α}, where the state space S is a countable set; in each state i∈S, the available action set A(i) is an arbitrary set, and (A(i), 𝒜(i)) is a measurable space; q is a time-homogeneous family of state-jump laws; t is a family of distributions of the state-jump times, which likewise depends only on the current state and the current action; and V_α is the α-discounted total expected reward.
|