MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode



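Cite this article:	WU Guanhan, JIA Weimin, ZHAO Jianwei, GAO Feifei, YAO Minli. MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode[J]. Journal of Electronics & Information Technology, 2022, 44(3): 940-950.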

Author names:	WU Guanhan, JIA Weimin, ZHAO Jianwei, GAO Feifei, YAO Minli

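Affiliations:	1. Rocket Force University of Engineering, Xi’an 710038; 2. Jiuquan Satellite Launch Center, Jiuquan 735000; 3. Tsinghua University, Beijing 100084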

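Abstract:	Integrated space-air-ground communication, a development direction for future 6G, compensates for the insufficient coverage of current wireless communication. This paper proposes a Multi-Unmanned Aerial Vehicle (Multi-UAV) assisted communication algorithm based on Multi-Agent Reinforcement Learning (MARL). It seeks an approximate Nash equilibrium in a hybrid game mode formed by users and UAVs, and addresses the joint optimization of UAV trajectory design, multidimensional resource scheduling, and user access strategy in a dynamic environment. Combined with Mar...
Keywords:	Multi-UAV assisted communication; multi-agent reinforcement learning; hybrid game; Nash equilibrium
Received:	2021-07-02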
MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode

WU Guanhan, JIA Weimin, ZHAO Jianwei, GAO Feifei, YAO Minli. MARL-based Design of Multi-Unmanned Aerial Vehicle Assisted Communication System with Hybrid Gaming Mode[J]. Journal of Electronics & Information Technology, 2022, 44(3): 940-950.

Authors:	WU Guanhan, JIA Weimin, ZHAO Jianwei, GAO Feifei, YAO Minli

Institution:	1. Rocket Force University of Engineering, Xi’an 710038, China; 2. Jiuquan Satellite Launch Center, Jiuquan 735000, China; 3. Tsinghua University, Beijing 100084, China

Abstract:	As a development direction for future 6G, integrated space-air-ground communication compensates for the insufficient coverage of current wireless communication. This paper proposes a Multi-Unmanned Aerial Vehicle (Multi-UAV) assisted communication algorithm based on Multi-Agent Reinforcement Learning (MARL). The algorithm seeks an approximate Nash equilibrium in a hybrid game model composed of users and UAVs, and thereby addresses the joint optimization of UAV trajectory design, multidimensional resource scheduling, and user access strategy in a dynamic environment. The Markov game concept is used to model this continuous decision process under a Centralized Training Distributed Execution (CTDE) mechanism, and the Proximal Policy Optimization (PPO) algorithm is extended to the multi-agent domain. Two policy output modes are designed for the action space, in which discrete and continuous actions coexist. The implementation is then improved by incorporating a Beta policy. Finally, simulation experiments verify the effectiveness of the algorithm.

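Keywords:	Multi-UAV assisted communication; Multi-Agent Reinforcement Learning (MARL); hybrid game; Nash equilibrium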
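
The abstract mentions a hybrid action space (discrete and continuous actions together), a Beta policy, and PPO. The following is a minimal generic sketch of how those pieces can fit together, not the authors' implementation. All function names, the example probabilities, the Beta parameters, and the clipping range `eps=0.2` are assumptions made for illustration. In practice the probabilities and Beta parameters would be produced by each agent's policy network.

```python
import math
import random


def beta_log_prob(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm


def sample_hybrid_action(discrete_probs, alpha, beta, low, high, rng):
    """Sample one discrete choice and one bounded continuous value.

    Returns (choice, value, log_prob). The log-probability is taken in the
    unit interval; the constant scaling to [low, high] cancels in PPO ratios.
    """
    choice = rng.choices(range(len(discrete_probs)), weights=discrete_probs, k=1)[0]
    u = rng.betavariate(alpha, beta)
    u = min(max(u, 1e-6), 1.0 - 1e-6)  # keep the logarithms finite
    log_prob = math.log(discrete_probs[choice]) + beta_log_prob(u, alpha, beta)
    value = low + (high - low) * u  # Beta support is bounded, so no clipping of actions
    return choice, value, log_prob


def ppo_clip_term(new_log_prob, old_log_prob, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate term (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: 3 discrete options and one continuous value in [0, 10].
    choice, value, logp = sample_hybrid_action([0.2, 0.5, 0.3], 2.0, 3.0, 0.0, 10.0, rng)
    print(choice, value, logp)
    print(ppo_clip_term(logp, logp, advantage=1.0))
```

A Beta distribution has bounded support, so a sampled continuous action always lies inside its valid range after rescaling. This is one common motivation for preferring it to a Gaussian policy when actions are bounded.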