Similar Articles
20 similar articles found
1.
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide only a partial view of the true state of the environment. In realistic settings, however, a harsh environment may cause one or more agents to exhibit arbitrarily faulty or malicious behavior, which can be enough to make existing coordination mechanisms fail. In this paper, we study a practical scenario for multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work on extremely noisy environments assumed that the noise intensity of the environment is known in advance; when the noise intensity changes, that method must adjust its model configuration to learn in the new environment, which limits its practical application. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model that selects not only correct but also relevant information for each agent at every time step in noisy environments. Its multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with their action policies. Empirical results show that FT-Attn outperforms previous state-of-the-art methods in several extremely noisy environments, in both cooperative and competitive scenarios, and comes much closer to the upper-bound performance. Furthermore, FT-Attn provides more general fault tolerance and does not rely on prior knowledge of the noise intensity of the environment.
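The abstract does not include implementation details, but the core mechanism it names, per-agent multi-head attention over incoming messages, can be sketched compactly. The following is a minimal PyTorch sketch, not the authors' code; the `MessageAttention` name and all dimensions are assumptions.

```python
# Hypothetical sketch of per-agent multi-head attention over incoming
# messages, in the spirit of FT-Attn (not the authors' implementation).
import torch
import torch.nn as nn

class MessageAttention(nn.Module):
    def __init__(self, obs_dim: int, msg_dim: int, embed_dim: int = 64, heads: int = 4):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)   # query from own observation
        self.kv = nn.Linear(msg_dim, embed_dim)      # keys/values from messages
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)

    def forward(self, own_obs, messages):
        # own_obs: (batch, obs_dim); messages: (batch, n_agents, msg_dim)
        q = self.query(own_obs).unsqueeze(1)         # (batch, 1, embed)
        kv = self.kv(messages)                       # (batch, n_agents, embed)
        # Attention weights let the agent down-weight faulty/irrelevant senders.
        pooled, weights = self.attn(q, kv, kv)
        return pooled.squeeze(1), weights

att = MessageAttention(obs_dim=10, msg_dim=8)
obs = torch.randn(32, 10)
msgs = torch.randn(32, 5, 8)   # messages from 5 neighbors, some possibly noisy
context, w = att(obs, msgs)
print(context.shape, w.shape)  # torch.Size([32, 64]) torch.Size([32, 1, 5])
```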

2.
Mobile crowdsensing (MCS) has attracted considerable attention in the past few years as a new paradigm for large-scale information sensing. Unmanned aerial vehicles (UAVs) play a significant role in MCS tasks and serve as crucial nodes in the newly proposed space-air-ground integrated network (SAGIN). In this paper, we incorporate SAGIN into an MCS task and present a Space-Air-Ground integrated Mobile CrowdSensing (SAG-MCS) problem. Based on multi-source observations from embedded sensors and satellites, an aerial UAV swarm is required to carry out energy-efficient data collection and recharging tasks. To date, few studies have explored such a multi-task MCS problem with the cooperation of a UAV swarm and satellites. To address this multi-agent problem, we propose a novel deep reinforcement learning (DRL) based method called Multi-Scale Soft Deep Recurrent Graph Network (ms-SDRGN). Our ms-SDRGN approach incorporates a multi-scale convolutional encoder to process multi-source raw observations for better feature exploitation. We also use a graph attention mechanism to model inter-UAV communication and aggregate information from neighbors, and a gated recurrent unit for long-term performance. In addition, a stochastic policy is learned through a maximum-entropy method with an adjustable temperature parameter. Specifically, we design a heuristic reward function to encourage the agents to achieve global cooperation under partial observability. We train the model to convergence and conduct a series of case studies. Evaluation results are statistically significant and show that ms-SDRGN outperforms three state-of-the-art DRL baselines in SAG-MCS: compared with the best-performing baseline, ms-SDRGN improves the reward by 29.0% and the CFE score by 3.8%. We also investigate the scalability and robustness of ms-SDRGN in DRL environments with diverse observation scales and demanding communication conditions.
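As a rough illustration of the encoder-plus-recurrence part of the described architecture, here is a minimal sketch of a multi-scale convolutional encoder feeding a GRU. All layer sizes, kernel choices, and names are assumptions, not the paper's configuration.

```python
# Minimal sketch of a multi-scale convolutional encoder feeding a GRU,
# loosely following the ms-SDRGN description; all sizes are assumptions.
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    def __init__(self, in_ch: int = 3, feat: int = 32):
        super().__init__()
        # Parallel branches with different kernel sizes capture
        # observations at several spatial scales.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, feat, k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):                      # x: (batch, in_ch, H, W)
        feats = [self.pool(torch.relu(b(x))).flatten(1) for b in self.branches]
        return torch.cat(feats, dim=1)         # (batch, 3 * feat)

encoder = MultiScaleEncoder()
gru = nn.GRU(input_size=96, hidden_size=64, batch_first=True)

obs_seq = torch.randn(8, 10, 3, 21, 21)        # (batch, time, C, H, W)
z = torch.stack([encoder(obs_seq[:, t]) for t in range(10)], dim=1)
out, h = gru(z)                                # recurrent summary over time
print(out.shape)                               # torch.Size([8, 10, 64])
```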

3.
The traditional successive cancellation (SC) decoding algorithm suffers from error propagation during decoding, so resolving error propagation is key to improving SC decoding performance. In this paper, we propose a new algorithm that combines reinforcement learning with SC flip (SCF) decoding of polar codes, called the Q-learning-assisted SCF (QLSCF) decoding algorithm. The proposed QLSCF algorithm uses reinforcement learning to select candidate bits for SC flip decoding: we establish a reinforcement learning model for candidate-bit selection, and the agent selects the candidate bits used to decode the information sequence. In our scheme, the decoding delay caused by metric ordering is removed from the decoding process. Simulation results demonstrate that the decoding delay of the proposed algorithm is reduced compared with the critical-set-based SCF decoding algorithm, without loss of performance.
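The candidate-bit selection can be illustrated with a toy reinforcement learning loop. The sketch below is a simplified stand-in, not the paper's QLSCF algorithm: the decoder environment is faked by a function that signals whether a flip repairs the decode, and the single-state Q-table degenerates to a bandit over flip positions.

```python
# Toy Q-learning loop for choosing which bit to flip after a failed
# SC decode; the environment here is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)
n_bits = 16                      # candidate flip positions (assumed)
Q = np.zeros(n_bits)             # single-state Q-table over flip actions
alpha, eps = 0.1, 0.2

def flip_and_redecode(bit: int) -> float:
    """Hypothetical environment: returns +1 if flipping `bit` fixes the
    decode (e.g. CRC passes), else -1. A real system would rerun SC decoding."""
    true_error_bit = 5
    return 1.0 if bit == true_error_bit else -1.0

for episode in range(500):
    # epsilon-greedy selection of the candidate bit to flip
    a = rng.integers(n_bits) if rng.random() < eps else int(np.argmax(Q))
    r = flip_and_redecode(a)
    Q[a] += alpha * (r - Q[a])   # bandit-style update (single decision step)

print("learned best flip position:", int(np.argmax(Q)))  # -> 5
```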

4.
Recently, deep reinforcement learning (RL) algorithms have achieved significant progress in the multi-agent domain. However, training for increasingly complex tasks is time-consuming and resource-intensive. To alleviate this problem, efficient leveraging of historical experience is essential; it is under-explored in previous studies because most existing methods fail to achieve this goal in a continuously dynamic system owing to their complicated design. In this paper, we propose a method for knowledge reuse called “KnowRU”, which can easily be deployed in the majority of multi-agent reinforcement learning (MARL) algorithms without requiring complicated hand-coded design. We employ the knowledge-distillation paradigm to transfer knowledge among agents, shortening the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments with state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results show that KnowRU outperforms recently reported methods: it not only accelerates the training phase but also improves training performance, underlining the importance of knowledge reuse for MARL.
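The knowledge-distillation paradigm mentioned above has a standard form that can be sketched briefly: the student policy is regularized toward the teacher's softened output distribution. The temperature, weighting, and function names below are assumptions for illustration, not KnowRU's actual design.

```python
# Sketch of the knowledge-distillation idea: a student agent's policy is
# pulled toward a teacher policy from a previous task via a KL term.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, rl_loss, T=2.0, beta=0.5):
    # Soft targets from the teacher, softened by temperature T.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
    # Total objective: the usual RL loss plus the distillation term.
    return rl_loss + beta * kl

student = torch.randn(32, 6, requires_grad=True)   # logits over 6 actions
teacher = torch.randn(32, 6)
loss = distillation_loss(student, teacher, rl_loss=torch.tensor(1.0))
loss.backward()
print(float(loss))
```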

5.
Poker is considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, which it shares with many realistic problems such as auctions, pricing, cyber security, and operations. However, it remains unclear whether playing an equilibrium policy in multi-player games is wise, and it is infeasible to theoretically validate whether a policy is optimal. Designing an effective optimal policy learning method therefore has greater practical significance. This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. First, we build an Actor network to make decisions with imperfect information and a Critic network to evaluate policies with perfect information. Second, we propose novel multi-player poker policy update methods: the asynchronous policy update algorithm (APU) for multi-player multi-policy scenarios and the dual-network asynchronous policy update algorithm (Dual-APU) for multi-player shared-policy scenarios. Finally, we validate the proposed methods on the most popular variant, six-player Texas hold ’em. The experiments demonstrate that the learned policies perform well and gain steadily compared with existing approaches. In sum, Actor-Critic-based policy learning for imperfect-information games performs well on poker and can be transferred to other imperfect-information games. Training with perfect information and testing with imperfect information in this way is an effective and explainable approach to learning an approximately optimal policy.
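The asymmetric-information setup, an actor trained on imperfect information and a critic on perfect information, can be sketched as follows. This is a minimal illustration with made-up dimensions, not the paper's networks or update rules.

```python
# Minimal sketch of the asymmetric actor-critic setup described above:
# the actor sees only a private (imperfect) observation, while the critic
# is trained on the full (perfect) state. Sizes are illustrative.
import torch
import torch.nn as nn

obs_dim, state_dim, n_actions = 20, 60, 5   # assumed dimensions

actor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                      nn.Linear(128, n_actions))          # imperfect info
critic = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                       nn.Linear(128, 1))                 # perfect info

obs = torch.randn(1, obs_dim)            # e.g. own cards + public cards
full_state = torch.randn(1, state_dim)   # all players' cards (training only)

dist = torch.distributions.Categorical(logits=actor(obs))
action = dist.sample()
value = critic(full_state)

# Policy-gradient step: the advantage uses the privileged critic, but the
# actor itself never needs the hidden information at play time.
advantage = torch.tensor(1.0) - value.detach()   # stand-in return of 1.0
actor_loss = -(dist.log_prob(action) * advantage).mean()
actor_loss.backward()
```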

6.
Device-to-device (D2D) technology enables direct communication between devices, which can effectively alleviate the shortage of spectrum resources in 5G communication. Since channels are shared among multiple D2D user pairs, serious inter-pair interference may arise. To reduce interference, effectively increase network capacity, and improve wireless spectrum utilization, this paper proposes a distributed resource allocation algorithm that combines a deep Q network (DQN) with an unsupervised learning network. First, a DQN is constructed to solve channel allocation in a distributed manner in a dynamic, unknown environment. Then, a deep power-control neural network trained with an unsupervised learning strategy outputs an optimized power-control scheme that maximizes the transmit sum-rate over the spectrum, subject to the corresponding constraints. In contrast to traditional centralized approaches, which require collecting instantaneous global network information, the proposed algorithm uses each transmitter as a learning agent that makes channel selection and power-control decisions from a small amount of locally collected state information. Simulation results show that the proposed algorithm is more effective at increasing convergence speed and maximizing the transmit sum-rate than traditional centralized and distributed algorithms.
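The unsupervised part of the scheme, training a power-control network directly against the achievable sum-rate rather than against labeled targets, can be sketched with a toy channel model. Everything below (gain model, noise level, network sizes) is an assumption for illustration.

```python
# Sketch of unsupervised power control: the network's loss is the negative
# sum-rate, so it learns powers without labeled optimal solutions.
import torch
import torch.nn as nn

n_pairs, noise = 4, 0.1
power_net = nn.Sequential(nn.Linear(n_pairs * n_pairs, 64), nn.ReLU(),
                          nn.Linear(64, n_pairs), nn.Sigmoid())  # powers in (0,1)
opt = torch.optim.Adam(power_net.parameters(), lr=1e-3)

for step in range(200):
    H = torch.rand(n_pairs, n_pairs) + 0.1       # H[j, i]: gain from tx j to rx i
    p = power_net(H.flatten().unsqueeze(0)).squeeze(0)
    received = H.t() @ p                         # total power arriving at each rx
    signal = p * H.diag()                        # intended-link component
    sinr = signal / (received - signal + noise)  # interference = received - signal
    sum_rate = torch.log2(1.0 + sinr).sum()
    loss = -sum_rate                             # maximize rate = minimize -rate
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final sum-rate (bits/s/Hz):", round(float(sum_rate), 3))
```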

7.
In mobile edge computing systems, the edge server placement problem is mainly treated as a multi-objective optimization problem and solved with mixed integer programming, heuristic, or meta-heuristic algorithms. These methods, however, have serious defects such as poor scalability, convergence to local optima, and difficult parameter tuning. To overcome these defects, we propose a novel edge server placement algorithm based on a deep Q-network and reinforcement learning, dubbed DQN-ESPA, which can achieve optimal placements without relying on previous placement experience. In DQN-ESPA, the edge server placement problem is modeled as a Markov decision process, formalized by its state space, action space, and reward function, and then solved with a reinforcement learning algorithm. Experimental results on real datasets from Shanghai Telecom show that DQN-ESPA outperforms state-of-the-art algorithms, including the simulated annealing placement algorithm (SAPA), Top-K placement algorithm (TKPA), K-Means placement algorithm (KMPA), and random placement algorithm (RPA). In particular, when jointly considering access delay and workload balance, DQN-ESPA achieves up to 13.40% and 15.54% better placement performance for 100 and 300 edge servers, respectively.
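As a toy illustration of the MDP formalization named above, the sketch below encodes a placement as the state, a server relocation as the action, and a weighted combination of access delay and workload balance as the reward. The delay and balance models and the weights are placeholders, not the paper's.

```python
# Toy MDP formalization of edge-server placement (hypothetical models).
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_servers = 20, 5
coords = rng.random((n_sites, 2))       # candidate base-station locations
load = rng.random(n_sites)              # workload at each site

def reward(placement: np.ndarray) -> float:
    # Access delay: each site is served by its nearest placed server.
    d = np.linalg.norm(coords[:, None] - coords[placement][None], axis=2)
    delay = d.min(axis=1).mean()
    # Workload balance: spread of the load assigned to each server.
    assign = d.argmin(axis=1)
    per_server = np.array([load[assign == k].sum() for k in range(n_servers)])
    return -(delay + 0.5 * per_server.std())   # weighted multi-objective reward

state = rng.choice(n_sites, n_servers, replace=False)   # current placement
action = (0, 7)                                          # move server 0 to site 7
state[action[0]] = action[1]
print("reward after action:", reward(state))
```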

8.
With the development and application of multi-agent systems, multi-agent cooperation has become an important problem in artificial intelligence, and multi-agent reinforcement learning (MARL) is one of the most effective methods for solving multi-agent cooperative tasks. However, the huge sample complexity of traditional reinforcement learning methods leads to two kinds of training waste in MARL for cooperative tasks: homogeneous agents are trained independently and repetitively, and the multi-agent system must be trained from scratch whenever a new teammate is added. To tackle these two problems, we propose knowledge reuse methods for MARL. On the one hand, we share experience and policies among agents to mitigate training waste; on the other hand, we reuse the policies learned by the original team to avoid knowledge waste when a new agent is added. Experimentally, the Pursuit task demonstrates that sharing experience and policies accelerates training while simultaneously enhancing performance. Additionally, transferring the policies learned by the N-agent team enables the (N+1)-agent team to immediately perform cooperative tasks successfully, and only minor additional training is needed for the multi-agent team to reach the same optimal performance as training from scratch.
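The two reuse ideas can be sketched in a few lines: homogeneous agents act through one shared policy and pool their experience, and a new teammate starts from a copy of the learned weights. This is an illustrative stand-in, not the paper's implementation.

```python
# Sketch of experience/policy sharing and policy transfer for a new teammate.
import copy
import torch
import torch.nn as nn

shared_policy = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 4))
shared_buffer = []                      # experience pooled from ALL homogeneous agents

def act(obs):                           # every agent calls the same network
    with torch.no_grad():
        return int(shared_policy(obs).argmax())

for agent_id in range(3):               # N = 3 homogeneous pursuers
    obs = torch.randn(12)
    shared_buffer.append((agent_id, obs, act(obs)))

# Adding a teammate: transfer the learned policy rather than retraining.
new_agent_policy = copy.deepcopy(shared_policy)
print(len(shared_buffer), "transitions pooled;",
      "new agent initialized with transferred weights")
```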

9.
The fully completed BDS-3 short-message communication system, known as the short-message satellite communication system (SMSCS), will be widely used in traditionally blind communication areas in the future. However, the short-message processing resources of short-message satellites are relatively scarce. To improve the resource utilization of the satellite system and ensure adequate service quality for short-message terminals, short-message satellite processing resources must be allocated and scheduled within multi-satellite coverage areas. To this end, a short-message satellite resource allocation algorithm based on deep reinforcement learning (DRL-SRA) is proposed. First, exploiting the characteristics of the SMSCS, a multi-objective joint optimization model for satellite resource allocation is established to reduce the path transmission loss of short-message terminals while achieving satellite load balancing and adequate quality of service. Then, the dimensionality of the input data is reduced using a region-division strategy and a feature extraction network, and the continuous state space is handled by a deep reinforcement learning algorithm based on the deep deterministic policy gradient (DDPG) framework. Simulation results show that the proposed algorithm reduces the path transmission loss of short-message terminals, improves the quality of service, and increases the resource utilization efficiency of the short-message satellite system while maintaining an appropriate satellite load balance.
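The DDPG framework the algorithm builds on can be sketched as follows: a deterministic actor for continuous allocation decisions, a critic over state-action pairs, and a Polyak-averaged target network. Dimensions and the soft-update rate below are assumptions.

```python
# Minimal DDPG skeleton for continuous resource allocation (illustrative).
import copy
import torch
import torch.nn as nn

state_dim, action_dim = 32, 4            # e.g. per-satellite load/terminal state
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())   # continuous allocation
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_target = copy.deepcopy(actor)

def soft_update(target, source, tau=0.005):
    # Polyak averaging keeps the target network slowly tracking the learner.
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1 - tau).add_(tau * s)

state = torch.randn(1, state_dim)
action = actor(state)                               # deterministic policy
q = critic(torch.cat([state, action], dim=1))       # Q(s, a)
soft_update(actor_target, actor)
print(action.shape, float(q))
```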

10.
With the development of artificial intelligence, intelligent communication jamming decision making has become an important research direction in cognitive electronic warfare. In this paper, we consider a complex intelligent jamming decision scenario in which, in a non-cooperative setting, both communicating parties adjust their physical-layer parameters to avoid jamming, while the jammer achieves accurate jamming by interacting with the environment. However, when the situation becomes complex and the number of cases grows large, traditional reinforcement learning fails to converge and requires a large number of interactions, both of which are fatal and unrealistic in a real warfare environment. To solve this problem, we propose a deep reinforcement learning algorithm based on the maximum-entropy soft actor-critic (SAC) framework. We extend the original SAC algorithm with an improved Wolpertinger architecture in order to reduce the number of interactions and improve the accuracy of the algorithm. The results show that the proposed algorithm performs excellently in various jamming scenarios and achieves accurate, fast, and continuous jamming of both communicating parties.
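The Wolpertinger idea referenced above maps a continuous "proto-action" from the actor to a small set of nearby discrete actions and lets the critic pick among them, which keeps evaluation cheap on large discrete action spaces. The sketch below uses a toy action set and Q-function; the (frequency, power) embedding is an assumption.

```python
# Sketch of Wolpertinger action selection over a toy discrete action set.
import numpy as np

rng = np.random.default_rng(0)
# Discrete action set: all (frequency, power) combinations, embedded in R^2.
actions = np.array([(f, p) for f in np.linspace(0, 1, 20)
                            for p in np.linspace(0, 1, 10)])

def critic_q(a: np.ndarray) -> float:
    return -np.linalg.norm(a - np.array([0.42, 0.7]))   # toy Q-function

def wolpertinger_select(proto: np.ndarray, k: int = 8) -> np.ndarray:
    # k-nearest-neighbour lookup keeps refinement cost O(k), not O(|A|).
    dists = np.linalg.norm(actions - proto, axis=1)
    candidates = actions[np.argsort(dists)[:k]]
    return candidates[int(np.argmax([critic_q(a) for a in candidates]))]

proto_action = rng.random(2)            # continuous output of the actor
chosen = wolpertinger_select(proto_action)
print("proto:", proto_action, "-> discrete action:", chosen)
```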

11.
Machine learning research has been able to solve problems in multiple domains, and optimisation is an open area of research to which it can be applied. Optimisation problems can be solved by a metaheuristic algorithm, which can find a solution in a reasonable amount of time. However, finding an appropriate metaheuristic algorithm, configured suitably to solve a given set of optimisation problems, is itself a time-consuming problem. The proposal described in this article is an approach that automatically creates metaheuristic algorithms given a set of optimisation problems. These metaheuristic algorithms are created by modifying their logical structure through an evolutionary process. This process employs an extension of the reinforcement learning approach that considers multiple agents in their environment, with a learning agent composed of an analysis process and a process for modifying the algorithms. In the experiments performed, the approach succeeded in creating a metaheuristic algorithm that solved different continuous-domain optimisation problems. The implications of this work are immediate, as it describes a basis for generating metaheuristic algorithms through online evolution.

12.
Most previous studies on multi-agent systems aim to coordinate agents toward a common goal, but their lack of scalability and transferability prevents them from being applied to large-scale multi-agent tasks. To deal with these limitations, we propose a deep reinforcement learning (DRL) based multi-agent coordination control method for mixed cooperative–competitive environments. To improve scalability and transferability in large-scale multi-agent systems, we construct inter-agent communication and use hierarchical graph attention networks (HGAT) to process each agent's local observations and the messages received from its neighbors. We also adopt gated recurrent units (GRU) to address partial observability by recording historical information. Simulation results on a cooperative task and a competitive task show not only the superiority of our method but also its scalability and transferability across tasks of various scales.

13.
Anomaly detection has traditionally been studied using mathematical and statistical methods and has been widely applied in many fields. Recently, reinforcement learning has achieved exceptional successes in areas such as AlphaGo and video game playing, yet little research has applied reinforcement learning to anomaly detection. This paper therefore proposes an adaptable asynchronous advantage actor-critic reinforcement learning model for this field, and evaluates its performance against classical machine learning models and generative adversarial models with variants. The basic principles of the related models are introduced first, followed by the problem definitions, modelling process, and testing. The proposed model handles both sequence and image anomalies by using an attention-mechanism network for the former and a convolutional network for the latter. Finally, performance is evaluated and compared with classical models on public benchmark datasets (NSL-KDD, AWID, CICIDS-2017, and DoHBrw-2020). Experiments confirm the effectiveness of the proposed model, with higher rewards and lower loss rates on the datasets during training and testing, and with precision, recall, and F1 scores higher than, or at least comparable to, state-of-the-art models. We conclude that the proposed model can outperform, or at least match, existing anomaly detection models.

14.
To overcome the drawbacks of existing WSN node fault diagnosis methods, namely the difficulty of performing online diagnosis and the still-insufficient diagnostic accuracy, this paper designs an online fault diagnosis method for WSN nodes based on the Sarsa algorithm and an improved ant colony algorithm. First, a network model of the monitored area and a WSN node fault diagnosis model are established. Then, principal component analysis is applied to reduce the dimensionality of the node fault samples and thereby improve diagnostic efficiency; a hierarchical tree is built with the sample data as levels and the fault diagnosis classes as the nodes of each level, and an improved Sarsa algorithm computes the Q-value of each node, which is used to initialize the pheromone on the paths of the ant colony algorithm. Finally, an improved ant colony algorithm finds the paths from the first level to the nodes of each level, and the node with the highest pheromone at each level is taken as the final fault diagnosis class. Simulation experiments in Matlab show that the proposed method can effectively diagnose WSN node faults and, compared with other methods, offers higher diagnostic accuracy and online capability, making it an effective node fault diagnosis method.
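The Sarsa-to-pheromone handoff can be sketched with toy structures: a Q-table over (level, class) pairs is learned on-policy, then shifted to seed the pheromone matrix. The hierarchy size, reward, and hyperparameters below are illustrative assumptions, not the paper's.

```python
# Sketch of the on-policy Sarsa update whose Q-values seed the ant colony's
# pheromone table; all structures are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_levels, n_classes = 4, 5              # hierarchy depth x fault classes
Q = np.zeros((n_levels, n_classes))
alpha, gamma, eps = 0.1, 0.9, 0.1

def reward(level: int, cls: int) -> float:
    return 1.0 if cls == 2 else 0.0     # assume class 2 matches the sample

def choose(level: int) -> int:          # epsilon-greedy over classes
    return rng.integers(n_classes) if rng.random() < eps else int(Q[level].argmax())

for episode in range(300):
    a = choose(0)
    for level in range(n_levels):
        r = reward(level, a)
        if level + 1 < n_levels:
            a_next = choose(level + 1)  # on-policy: use the action actually taken
            Q[level, a] += alpha * (r + gamma * Q[level + 1, a_next] - Q[level, a])
            a = a_next
        else:
            Q[level, a] += alpha * (r - Q[level, a])

pheromone = Q - Q.min() + 1e-3          # Q-values initialize the pheromone levels
print(pheromone.argmax(axis=1))         # most reinforced class at each level
```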

15.
Despite the increasing applications, demands, and capabilities of drones, in practice they have only limited autonomy for accomplishing complex missions, resulting in slow, vulnerable operations and difficulty adapting to dynamic environments. To lessen these weaknesses, we present a computational framework for deducing the original intent of drone swarms by monitoring their movements. We focus on interference, a phenomenon that drones do not initially anticipate but that results in complicated operations due to its significant impact on performance and its challenging nature. We infer interference from predictability, first applying various machine learning methods, including deep learning, and then computing entropy to compare against interference. Our framework begins by building a set of computational models, called double transition models, from the drone movements, and reveals reward distributions using inverse reinforcement learning. These reward distributions are then used to compute entropy and interference across a variety of drone scenarios specified by combinations of combat strategies and command styles. Our analysis confirms that drone scenarios experience more interference, higher performance, and higher entropy as they become more heterogeneous. However, the direction of interference (positive vs. negative) depends more on the combination of combat strategy and command style than on homogeneity.
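As a small illustration of the entropy computation described above, the sketch below turns a hypothetical IRL-recovered reward distribution over movement primitives into a softmax policy and measures its Shannon entropy; the numbers are invented to show the qualitative effect.

```python
# Toy predictability measure: entropy of a policy derived from
# hypothetical IRL-recovered rewards (values are illustrative).
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical reward mass over 6 movement primitives for two swarm types.
homogeneous = np.array([0.70, 0.15, 0.05, 0.05, 0.03, 0.02])
heterogeneous = np.array([0.25, 0.20, 0.18, 0.15, 0.12, 0.10])

for name, r in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    policy = np.exp(r) / np.exp(r).sum()       # softmax over recovered rewards
    print(name, "entropy:", round(entropy(policy), 3))
# The more heterogeneous rewards yield higher entropy, i.e. lower predictability.
```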

16.
The pursuit–evasion game is a classical maneuver-confrontation problem in the multi-agent systems (MASs) domain. This paper develops an online decision technique based on deep reinforcement learning (DRL) to address environment sensing and decision-making in pursuit–evasion games. A control-oriented framework built on the DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm implements multi-agent cooperative decision-making, avoiding the tedious state variables required by the traditionally complicated modeling process. To address the mismatch between a model and the real scenario, this paper introduces adversarial disturbances and proposes a novel adversarial attack trick together with an adversarial-learning MADDPG (A2-MADDPG) algorithm. By applying the adversarial attack trick to the agents themselves, real-world uncertainties are modeled, which makes training more robust. During training, adversarial learning preprocesses the actions of the multiple agents, enabling them to respond properly to uncertain dynamic changes in the MAS. Experimental results verify that the proposed approach provides superior performance and effectiveness for both pursuers and evaders, and that both sides learn the corresponding confrontation strategy during training.

17.
Dempster–Shafer theory (DST), which is widely used in information fusion, can process uncertain information without prior information; however, when the evidence to be combined is highly conflicting, it may lead to counter-intuitive results. Moreover, existing methods are not strong enough to process conflicting evidence in real time and online. To solve these problems, a novel information fusion method is proposed that combines the uncertainty of the evidence with reinforcement learning (RL). Specifically, we consider two degrees of uncertainty: the uncertainty of the original basic probability assignment (BPA) and the uncertainty of its negation, with Deng entropy used to measure the uncertainty of the BPAs. The two uncertainty degrees serve as the condition for measuring information quality, and adaptive conflict processing is performed by RL combined with the two uncertainty degrees. Dempster's combination rule (DCR) is then computed to achieve multi-sensor information fusion, and finally a decision scheme based on the correlation coefficient is used to make the decision. The proposed method not only realizes adaptive management of conflicting evidence, but also improves the accuracy of multi-sensor information fusion and reduces information loss. Numerical examples verify the effectiveness of the proposed method.
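Two of the building blocks named above are concrete enough to sketch: Deng entropy, E_d(m) = -Σ m(A) log₂(m(A) / (2^|A| - 1)), and Dempster's combination rule with conflict normalization. The frames and mass assignments below are illustrative, and the RL-based conflict management is omitted.

```python
# Sketch of Deng entropy and Dempster's combination rule (DCR).
import math

def deng_entropy(bpa: dict) -> float:
    # E_d(m) = -sum over focal sets A of m(A) * log2( m(A) / (2^|A| - 1) )
    return -sum(m * math.log2(m / (2 ** len(A) - 1))
                for A, m in bpa.items() if m > 0)

def dempster_combine(m1: dict, m2: dict) -> dict:
    out, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:
                out[C] = out.get(C, 0.0) + a * b
            else:
                conflict += a * b           # mass falling on the empty set
    k = 1.0 - conflict                      # normalization constant
    return {A: v / k for A, v in out.items()}

m1 = {frozenset("a"): 0.6, frozenset("ab"): 0.4}
m2 = {frozenset("a"): 0.5, frozenset("b"): 0.3, frozenset("ab"): 0.2}
print("Deng entropy of m1:", round(deng_entropy(m1), 4))
fused = dempster_combine(m1, m2)
print("fused:", {tuple(sorted(A)): round(v, 3) for A, v in fused.items()})
```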

18.
Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a content-centric network. Power control and optimal scheduling can significantly improve a wireless multicast network's performance under fading, but the model-based approaches to power control and scheduling studied earlier do not scale to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, approximating the Q-function with a deep neural network, to obtain a power control policy that matches the optimal policy for a small network, and we show that a power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint, and we demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queuing strategy along with power control. We demonstrate the scalability, tracking, and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach applies to general large-state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.
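The multi-timescale constraint handling can be sketched as a fast Q-learning update paired with a slowly adapted Lagrange multiplier that prices transmit power. The fading model, power levels, and step sizes below are toy assumptions, not the paper's system.

```python
# Sketch of two-timescale learning under an average-power constraint:
# fast Q updates, slow Lagrange-multiplier (power price) updates.
import numpy as np

rng = np.random.default_rng(0)
powers = np.array([0.0, 0.5, 1.0, 2.0])       # discrete transmit power levels
P_avg = 0.8                                    # average power budget
lam, Q = 0.0, np.zeros(len(powers))
alpha_fast, alpha_slow, eps = 0.05, 0.001, 0.1
avg_power = 0.0

for t in range(1, 20001):
    h = rng.exponential(1.0)                   # fading gain this slot
    a = rng.integers(len(powers)) if rng.random() < eps else int(Q.argmax())
    rate = np.log2(1.0 + h * powers[a])
    # Fast timescale: maximize rate minus the power "price" lambda.
    r = rate - lam * powers[a]
    Q[a] += alpha_fast * (r - Q[a])
    # Slow timescale: raise the price when power spending exceeds the budget.
    avg_power += (powers[a] - avg_power) / t
    lam = max(0.0, lam + alpha_slow * (powers[a] - P_avg))

print(f"lambda={lam:.3f}, long-run average power={avg_power:.3f} (budget {P_avg})")
```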

19.
The breakthrough of wireless energy transmission (WET) technology has greatly promoted wireless rechargeable sensor networks (WRSNs). A promising way to overcome the energy constraint problem in WRSNs is mobile charging, in which a mobile charger charges sensors via WET. Recently, more and more studies have addressed mobile charging scheduling under dynamic charging environments, but they ignore the joint optimal design of charging sequence scheduling and charging ratio control (JSSRC). This paper proposes a novel attention-shared multi-agent actor–critic-based deep reinforcement learning approach for JSSRC (AMADRL-JSSRC). In AMADRL-JSSRC, we employ two heterogeneous agents, a charging sequence scheduler and a charging ratio controller, each with independent actor and critic networks, and we design their reward functions by considering the tour length and the number of dead sensors, respectively. AMADRL-JSSRC trains decentralized policies in the multi-agent environment, using a centralized critic network that shares an attention mechanism and selects relevant policy information for each agent at every charging decision. Simulation results demonstrate that the proposed AMADRL-JSSRC can efficiently prolong the lifetime of the network and reduce the number of dead sensors compared with baseline algorithms.

20.
In this paper we present a fluorescence microscopy investigation of the digestive process of Litonotus lamella, which we used as a prototype of raptorial feeding behaviour. The sequence of its digestive processes, from the capture of the prey (Euplotes crassus) to the formation of digestive vacuoles, was visualized using Acridine Orange, a high-quantum-yield fluorescent vital probe with a strong affinity for acid vesicle compartments. Acid vesicle localization and displacement during digestion proceeded through the following phases: in starved Litonotus, vesicles were localized in the neck of the ciliate; after capture, they were displaced around the vacuole containing the prey in early digestion, whereas in advanced digestion tens of digestive vacuoles containing Euplotes fragments were spread over the body of the predator; 1 h after capture, the digestive process was complete, and the acid vesicles reorganized in the neck of the predator. The most important feature we observed was that digestion occurred not in a single phagosome, as in filter-feeding ciliates such as Paramecium, but in many digestive vacuoles scattered throughout the predator's cytoplasm, which asynchronously digest prey fragments.
