Temporal Difference-Based Policy Iteration for Optimal Control of Stochastic Systems
Authors: Kang Cheng, Shumin Fei, Kanjian Zhang, Xiaomei Liu, Haikun Wei
Institution: 1. School of Automation, Southeast University, Nanjing, Jiangsu, 210096, China
Abstract: This paper presents a unified policy iteration approach to the optimal control problem for stochastic systems with discounted average cost and a continuous state space. The approach combines temporal difference learning-based algorithms for approximating the potential function with policy improvement based on the performance difference formula. The approximation algorithms are derived by solving a fixed-point equation based on the Poisson equation, and they can be viewed as continuous versions of the least squares policy evaluation (LSPE) and least squares temporal difference (LSTD) algorithms. Simulations illustrate the effectiveness of the approach.
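To illustrate the kind of least squares temporal difference evaluation the abstract refers to, the following is a minimal sketch of textbook LSTD for a fixed policy on a continuous state space. It is not the authors' algorithm: the scalar system x' = a*x + w, the quadratic cost x^2, the feature vector (x^2, 1), and all names and parameter values (`lstd_quadratic`, `exact_quadratic`, `a`, `sigma`, `gamma`) are assumptions chosen so that the exact value function is known in closed form. The policy improvement step is not shown.

```python
import random


def lstd_quadratic(a=0.8, sigma=0.5, gamma=0.9, n_steps=50000, seed=0):
    """Generic LSTD sketch (illustrative assumption, not the paper's method).

    Fits V(x) ~ theta0 * x^2 + theta1 for the closed-loop system
    x' = a*x + w, w ~ N(0, sigma^2), with one-step cost x^2 and
    discount factor gamma, from a single simulated trajectory.
    """
    rng = random.Random(seed)
    x = 0.0
    A = [[0.0, 0.0], [0.0, 0.0]]  # accumulates phi * (phi - gamma * phi_next)^T
    b = [0.0, 0.0]                # accumulates phi * cost
    for _ in range(n_steps):
        x_next = a * x + rng.gauss(0.0, sigma)
        phi = (x * x, 1.0)
        phi_next = (x_next * x_next, 1.0)
        cost = x * x
        for i in range(2):
            b[i] += phi[i] * cost
            for j in range(2):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
        x = x_next
    # Solve the 2x2 system A * theta = b with Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    theta0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    theta1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return theta0, theta1


def exact_quadratic(a=0.8, sigma=0.5, gamma=0.9):
    """Closed-form V(x) = p*x^2 + q for the same assumed system and cost."""
    p = 1.0 / (1.0 - gamma * a * a)
    q = gamma * p * sigma * sigma / (1.0 - gamma)
    return p, q
```

Because the assumed features span the exact value function, the LSTD estimate returned by `lstd_quadratic()` should approach the pair returned by `exact_quadratic()` as the trajectory length grows, which gives a simple sanity check for the sketch.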
This article is indexed in SpringerLink and other databases.