Similar Documents
Found 20 similar documents (search time: 15 ms).
1.
Because of their convincing performance, there is a growing interest in using evolutionary algorithms for reinforcement learning. We propose learning of neural network policies by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable-metric search algorithm for continuous optimization. We argue that this approach, which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for reinforcement learning, in particular because it is based on ranking policies (and therefore robust against noise), efficiently detects correlations between parameters, and infers a search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance.
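The following is a minimal, hypothetical sketch of the neuroevolution idea, not the authors' CMA-NeuroES: full CMA-ES additionally adapts a covariance matrix and step size, while this sketch keeps only the rank-based selection and recombination that make the search tolerant of noisy returns. The fitness function is a stub standing in for pole-balancing rollouts.

```python
# Simplified rank-based evolution strategy for neuroevolution (NOT full
# CMA-ES: covariance-matrix and step-size adaptation are omitted).
import numpy as np

rng = np.random.default_rng(0)

def episode_return(weights):
    # Stub fitness standing in for a pole-balancing rollout: negative
    # squared distance to a hidden "good" weight vector.
    target = np.linspace(-1.0, 1.0, weights.size)
    return -np.sum((weights - target) ** 2)

dim, lam, mu, sigma = 8, 16, 4, 0.3        # weights, offspring, parents, step size
mean = np.zeros(dim)                       # mean of the search distribution
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w /= w.sum()                               # rank-based recombination weights

for _ in range(200):
    pop = mean + sigma * rng.standard_normal((lam, dim))
    returns = np.array([episode_return(x) for x in pop])
    elite = pop[np.argsort(-returns)[:mu]]  # selection uses only the ranking,
    mean = w @ elite                        # which is what tolerates noisy returns
print("best return:", episode_return(mean))
```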

2.
3.
Fast and simple approximation schemes for generalized flow
We present fast and simple fully polynomial-time approximation schemes (FPTAS) for generalized versions of maximum flow, multicommodity flow, minimum cost maximum flow, and minimum cost multicommodity flow. We extend and refine fractional packing frameworks introduced in FPTASs for traditional multicommodity flow and packing linear programs. Our FPTASs dominate the previous best known complexity bounds for all of these problems, some by more than a factor of n², where n is the number of nodes. This is accomplished in part by introducing an efficient method of solving a sequence of generalized shortest path problems. Our generalized multicommodity FPTASs are now as fast as the best non-generalized ones. We believe our improvements make it practical to solve generalized multicommodity flow problems via combinatorial methods.
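As a hedged illustration of the fractional packing framework such FPTASs build on, here is a Garg-Könemann-style sketch for plain (non-generalized) maximum flow; the paper's extension to generalized flow (gain factors on edges, generalized shortest paths) is omitted, and the small graph and epsilon value are illustrative.

```python
import math, heapq

def shortest_path(adj, lengths, s, t):
    # Dijkstra under the current edge lengths; returns edge ids of an s-t path.
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue
        for v, eid in adj[u]:
            nd = d + lengths[eid]
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, (u, eid)
                heapq.heappush(pq, (nd, v))
    if t not in dist or dist[t] >= 1.0:      # stop once every path is "long"
        return None
    path, v = [], t
    while v != s:
        u, eid = prev[v]
        path.append(eid)
        v = u
    return path

edges = [("s", "a", 4), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 4)]
cap = [c for _, _, c in edges]
adj = {}
for eid, (u, v, _) in enumerate(edges):
    adj.setdefault(u, []).append((v, eid))
    adj.setdefault(v, [])

eps = 0.1
delta = (1 + eps) * ((1 + eps) * len(edges)) ** (-1 / eps)
lengths = [delta / c for c in cap]           # packing lengths start tiny
flow = [0.0] * len(edges)

while (path := shortest_path(adj, lengths, "s", "t")) is not None:
    bottleneck = min(cap[e] for e in path)
    for e in path:
        flow[e] += bottleneck                # push a full bottleneck of flow...
        lengths[e] *= 1 + eps * bottleneck / cap[e]   # ...and inflate its length

scale = math.log((1 + eps) / delta, 1 + eps)  # corrects the capacity overshoot
value = sum(flow[e] for e, (u, _, _) in enumerate(edges) if u == "s") / scale
print("approximate max flow:", round(value, 3))   # exact optimum here is 5
```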

4.
This paper studies the learning rates of a broad class of regularized regression algorithms in reproducing kernel Hilbert spaces. When analyzing the sample error of these algorithms, we employ a reweighted empirical process, which ensures that the variance and the penalty functional are simultaneously controlled by a threshold, thereby avoiding a tedious iteration procedure. The learning rates obtained here are faster than those reported in the previous literature.
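A concrete member of the algorithm class analysed here is kernel ridge regression (regularized least squares in an RKHS). The sketch below, with an assumed Gaussian kernel and an illustrative regularization parameter, shows the estimator whose learning rates such analyses bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X = rng.uniform(-3, 3, n)
y = np.sin(X) + 0.1 * rng.standard_normal(n)      # noisy samples of sin

def gauss_kernel(a, b, width=0.5):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * width ** 2))

lam = 1e-3                                        # regularization parameter
K = gauss_kernel(X, X)
# Representer theorem: the regularized minimizer is f = sum_i alpha_i K(x_i, .)
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

x_new = np.array([0.0, 1.5])
print("predictions:", gauss_kernel(x_new, X) @ alpha)  # near sin(0)=0, sin(1.5)=0.997
print("training MSE:", np.mean((K @ alpha - y) ** 2))
```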

5.
With the rapid development of artificial intelligence in recent years, applying various learning techniques to solve mixed-integer linear programming (MILP) problems has emerged as a burgeoning research domain. Apart from constructing end-to-end models directly, integrating learning approaches with some modules in the traditional methods for solving MILPs is also a promising direction. The cutting plane method is one of the fundamental algorithms used in modern MILP solvers, and the selection of ...
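The abstract is truncated, but the module it targets, cut selection, is commonly driven by a per-cut score. Below is a hedged sketch of the classical efficacy score (how far a cut pushes away the LP relaxation optimum), with a note where a learned scorer, as studied in this line of work, would slot in; the cuts and the LP point are made up.

```python
import numpy as np

x_lp = np.array([1.4, 2.6])                  # fractional optimum of the LP relaxation
candidate_cuts = [                           # each cut: a^T x <= b
    (np.array([1.0, 0.0]), 1.0),
    (np.array([1.0, 1.0]), 3.0),
    (np.array([0.0, 1.0]), 2.0),
]

def efficacy(a, b, x):
    # Euclidean distance by which the cut separates x (positive = violated).
    return (a @ x - b) / np.linalg.norm(a)

# A learned selector would replace `efficacy` with a trained scorer over
# features of (cut, LP state); the surrounding selection loop stays the same.
scores = [efficacy(a, b, x_lp) for a, b in candidate_cuts]
best = int(np.argmax(scores))
print("selected cut:", candidate_cuts[best], "score:", round(scores[best], 3))
```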

6.
7.
Prabir Daripa, PAMM 2007, 7(1): 2020065–2020066
A brief review of our fast algorithms is given in this short paper.

8.
Reinforcement learning schemes perform direct on-line search in control space. This makes them appropriate for modifying control rules to obtain improvements in the performance of a system. The effectiveness of a reinforcement learning strategy is studied here through the training of a learning classifier system (LCS) that controls the movement of an autonomous vehicle in simulated paths including left and right turns. The LCS comprises a set of condition-action rules (classifiers) that compete to control the system and evolve by means of a genetic algorithm (GA). Evolution and operation of classifiers depend upon an appropriate credit assignment mechanism based on reinforcement learning. Different design options and the role of various parameters have been investigated experimentally. The performance of vehicle movement under the proposed evolutionary approach is superior compared with that of other (neural) approaches based on reinforcement learning that have been applied previously to the same benchmark problem.
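As a rough sketch of reinforcement-based credit assignment in an LCS (in the spirit of the classic bucket brigade, not necessarily the exact mechanism of this paper), the toy below lets rules bid their strength for control, pass bids backwards, and collect external reward. The rule set and sensor stream are illustrative, and the GA that evolves rule conditions is omitted.

```python
import random

random.seed(0)
# classifier: condition on the sensed symbol, an action, and a strength
rules = [{"cond": "L", "act": "steer_left",  "s": 10.0},
         {"cond": "L", "act": "steer_right", "s": 10.0},
         {"cond": "R", "act": "steer_right", "s": 10.0},
         {"cond": "R", "act": "steer_left",  "s": 10.0}]
BID = 0.1                     # fraction of its strength a classifier bids
prev_winner = None

for _ in range(500):
    sensed = random.choice("LR")                  # toy path: left or right turn
    matching = [r for r in rules if r["cond"] == sensed]
    winner = max(matching, key=lambda r: r["s"] * random.uniform(0.9, 1.1))
    bid = BID * winner["s"]
    winner["s"] -= bid                            # winner pays its bid...
    if prev_winner is not None:
        prev_winner["s"] += bid                   # ...to the previous stage's winner
    correct = (sensed == "L") == (winner["act"] == "steer_left")
    winner["s"] = max(winner["s"] + (1.0 if correct else -1.0), 0.1)
    prev_winner = winner

for r in rules:                                   # correct rules accumulate strength
    print(r["cond"], r["act"], round(r["s"], 1))
```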

9.
We present an algorithm which aggregates states online while learning to behave optimally in an average-reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the original bounds for UCRL.
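A sketch of the aggregation test alone, under the assumption of Hoeffding-style intervals on mean rewards: two states become candidates for merging when their confidence intervals overlap. The actual algorithm also matches transition estimates and runs this test inside UCRL's optimistic loop; the counts below are invented.

```python
import math

def conf_interval(total_reward, visits, delta=0.05):
    # Hoeffding-style interval for a mean reward in [0, 1].
    mean = total_reward / visits
    width = math.sqrt(math.log(2 / delta) / (2 * visits))
    return mean - width, mean + width

# per-state statistics: (cumulative observed reward, visit count)
stats = {"s1": (42.0, 60), "s2": (45.0, 70), "s3": (12.0, 80)}
intervals = {s: conf_interval(r, n) for s, (r, n) in stats.items()}

def may_aggregate(a, b):
    lo_a, hi_a = intervals[a]
    lo_b, hi_b = intervals[b]
    return lo_a <= hi_b and lo_b <= hi_a          # the intervals overlap

names = list(stats)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        verdict = "aggregate" if may_aggregate(a, b) else "keep apart"
        print(a, b, "->", verdict)
```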

10.
The CPR (“cumulative proportional reinforcement”) learning rule stipulates that an agent chooses a move with a probability proportional to the cumulative payoff she obtained in the past with that move. Previously considered for strategies in normal form games (Laslier, Topol and Walliser, Games and Econ. Behav., 2001), the CPR rule is here adapted for actions in perfect information extensive form games. The paper shows that the action-based CPR process converges with probability one to the (unique) subgame perfect equilibrium.
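The CPR rule itself is compact enough to state in code. The sketch below applies it at a single decision node with a made-up payoff stub; in the paper, payoffs come from repeated plays of a perfect-information extensive form game.

```python
import random

random.seed(0)
actions = ["left", "right"]
cum_payoff = {a: 1.0 for a in actions}       # positive initial mass so both get tried

def payoff(action):                          # stub: "right" is better on average
    return random.uniform(0, 1) + (0.5 if action == "right" else 0.0)

for _ in range(2000):
    total = sum(cum_payoff.values())
    # CPR: draw an action with probability proportional to its cumulative payoff
    action = "left" if random.uniform(0, total) < cum_payoff["left"] else "right"
    cum_payoff[action] += payoff(action)

total = sum(cum_payoff.values())
print({a: round(p / total, 3) for a, p in cum_payoff.items()})
```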

11.
12.
A new approach to developing parallel and distributed algorithms for scheduling tasks on parallel computers is proposed. A game-theoretic model, in which genetic-algorithm-based learning machines called classifier systems act as players in a game, serves as the theoretical framework of the approach. An experimental study of such a system shows its self-organizing features and its capacity for collective behaviour. Following this approach, a parallel and distributed scheduler is described. A simple version of the proposed scheduler has been implemented. Results of the experimental study of the scheduler demonstrate its high performance.

13.
14.
Live virtual machine migration can have a major impact on how a cloud system performs, as it consumes a significant amount of network resources such as bandwidth. A virtual machine migration occurs when a host becomes over-utilised or under-utilised. In this paper, we propose a network-aware live migration strategy that monitors the current demand level of bandwidth when network congestion occurs and performs appropriate actions based on what it is experiencing. A reinforcement learning technique acts as a decision-support system, enabling an agent to learn an optimal time to schedule a virtual machine migration depending on the current bandwidth usage in a data centre. We show from our results that an autonomous agent can learn to utilise available network resources such as bandwidth when network saturation occurs at peak times.
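A hedged sketch of the decision-support idea: tabular Q-learning over coarsely discretized bandwidth-utilisation levels, choosing between migrating now and waiting. The traffic dynamics, reward shaping, and discretization below are invented stand-ins, not the paper's simulation model.

```python
import random

random.seed(0)
STATES = ["low", "medium", "high"]           # discretized bandwidth utilisation
ACTIONS = ["migrate", "wait"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    # Toy dynamics: migrating while the network is congested is penalised.
    if action == "migrate":
        reward = {"low": 1.0, "medium": -0.2, "high": -1.0}[state]
    else:
        reward = -0.05                       # small cost for postponing
    return reward, random.choice(STATES)     # exogenous traffic fluctuation

state = "medium"
for _ in range(20000):
    if random.random() < eps:
        action = random.choice(ACTIONS)      # epsilon-greedy exploration
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = nxt

for s in STATES:                             # learned rule: migrate only when quiet
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```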

15.
The generalization of policies in reinforcement learning is a main issue, both from the point of view of theoretical models and for their applicability. However, generalizing from a set of examples or searching for regularities is a problem which has already been intensively studied in machine learning. Thus, existing domains such as Inductive Logic Programming have already been linked with reinforcement learning. Our work uses techniques in which generalizations are constrained by a language bias, in order to regroup similar states. Such generalizations are principally based on the properties of concept lattices. To guide the possible groupings of similar states of the environment, we propose a general algebraic framework, considering the generalization of policies through a partition of the set of states and using a language bias as a priori knowledge. We give a practical application as an example of our theoretical approach by proposing and experimenting with a bottom-up algorithm.
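As an illustration of generalization through a partition constrained by a language bias, the sketch below groups states by the set of predicates they satisfy; blocks ordered by inclusion of these predicate sets are exactly the kind of structure a concept lattice organizes. The predicates and states are hypothetical.

```python
from collections import defaultdict

# language bias: the only properties the generalization may talk about
predicates = {
    "wall_ahead": lambda s: s["dist_wall"] < 1.0,
    "goal_right": lambda s: s["goal_bearing"] > 0,
}

states = [
    {"id": 0, "dist_wall": 0.5, "goal_bearing":  0.3},
    {"id": 1, "dist_wall": 0.4, "goal_bearing":  0.9},
    {"id": 2, "dist_wall": 3.0, "goal_bearing": -0.2},
    {"id": 3, "dist_wall": 0.2, "goal_bearing": -0.7},
]

partition = defaultdict(list)
for s in states:
    # a block is identified by the set of predicates the state satisfies
    block = frozenset(name for name, p in predicates.items() if p(s))
    partition[block].append(s["id"])

for block, members in partition.items():
    print(sorted(block), "-> states", members)   # one action per block, not per state
```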

16.
Under the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore, the optimal pivot of the simplex method is crucial. In this paper, we propose the optimal rule to find all the shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search. Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into a tree-search mode while avoiding repeated basis variables. Secondly...
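Since the abstract is cut off, only the generic ingredient is sketched here: the UCB1 rule that Monte Carlo tree search uses to decide which child (here, which candidate pivot) to explore next. The statistics are invented.

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    if visits == 0:
        return math.inf                      # unvisited children are tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# candidate pivots at one search-tree node: (sum of rollout values, visit count)
children = {"pivot_x3": (8.0, 12), "pivot_x5": (5.0, 6), "pivot_x7": (0.0, 0)}
parent_visits = sum(v for _, v in children.values())

choice = max(children, key=lambda k: ucb1(*children[k], parent_visits))
print("explore next:", choice)
```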

17.
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.
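A minimal sketch of the core idea under simplifying assumptions: synthesize an "artificial trajectory" by repeatedly jumping to the sampled one-step transition whose start state is nearest to the current synthetic state (for the action the policy picks), accumulating the sampled rewards. The batch and the metric are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# batch of one-step transitions (s, a, r, s') on a 1-D state space
S = rng.uniform(0, 10, 200)
A = rng.choice([-1.0, 1.0], 200)
S_next = np.clip(S + A + 0.1 * rng.standard_normal(200), 0, 10)
R = -np.abs(S_next - 5.0)                    # reward: stay near state 5

def synthetic_return(s0, policy, horizon=10):
    s, total = s0, 0.0
    for _ in range(horizon):
        idx = np.where(A == policy(s))[0]            # transitions with that action
        j = idx[np.argmin(np.abs(S[idx] - s))]       # nearest sampled start state
        total += R[j]
        s = S_next[j]                                # follow the sampled outcome
    return total

policy = lambda s: 1.0 if s < 5.0 else -1.0          # move toward state 5
print("estimated return from s0=1:", round(synthetic_return(1.0, policy), 3))
```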

18.
We present a self-adaptive and distributed metaheuristic called Coalition-Based Metaheuristic (CBM). This method is based on the Agent Metaheuristic Framework (AMF) and hyper-heuristic approach. In CBM, several agents, grouped in a coalition, concurrently explore the search space of a given problem instance. Each agent modifies a solution with a set of operators. The selection of these operators is determined by heuristic rules dynamically adapted by individual and collective learning mechanisms. The intention of this study is to exploit AMF and hyper-heuristic approaches to conceive an efficient, flexible and modular metaheuristic. AMF provides a generic model of metaheuristic that encourages modularity, and hyper-heuristic approach gives some guidelines to design flexible search methods. The performance of CBM is assessed by computational experiments on the vehicle routing problem.
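A hedged sketch of the individual learning mechanism only: an agent keeps one weight per operator, draws operators in proportion to those weights, and reinforces operators that improved its current solution. The two operators and the toy objective are invented, and the coalition-level (collective) learning is omitted.

```python
import random

random.seed(0)
solution = [random.uniform(0, 1) for _ in range(20)]
cost = lambda sol: sum(sol)                  # toy objective: minimise the sum

def op_perturb(sol):                         # random restart of one coordinate
    i = random.randrange(len(sol))
    s = sol[:]; s[i] = random.uniform(0, 1); return s

def op_decrease(sol):                        # greedy: shrink the largest coordinate
    s = sol[:]; i = max(range(len(s)), key=lambda j: s[j]); s[i] *= 0.5; return s

operators = [op_perturb, op_decrease]
weights = [1.0, 1.0]

for _ in range(300):
    k = random.choices(range(len(operators)), weights=weights)[0]
    candidate = operators[k](solution)
    if cost(candidate) < cost(solution):     # operator helped: accept and reinforce
        solution = candidate
        weights[k] += 1.0
    weights[k] = max(weights[k] * 0.999, 0.1)  # slow decay keeps exploration alive

print("final cost:", round(cost(solution), 3),
      "operator weights:", [round(w, 1) for w in weights])
```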

19.
Games can be easy to construct but difficult to solve, given the current methods available for finding a Nash equilibrium. This issue is one of many that face modern game theorists and analysts who need to model situations with multiple decision-makers. This paper explores the use of reinforcement learning, a standard artificial intelligence technique, as a means to solve a simple dynamic airline pricing game. Three different reinforcement learning approaches are compared: SARSA, Q-learning and Monte Carlo learning. The pricing game solution is surprisingly sophisticated given the game's simplicity, and this sophistication is reflected in the learning results. The paper also discusses the extra analytical benefit obtained from applying reinforcement learning to these types of problems.
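The three compared methods differ mainly in their update target, which a few lines make concrete. The sketch contrasts the SARSA (on-policy) and Q-learning (off-policy) updates on one observed transition; Monte Carlo learning would instead wait for the full episode return. The states, prices, and numbers are invented, not the paper's game.

```python
# One observed transition (s, a, r, s') and the next action a' the policy took.
alpha, gamma = 0.1, 0.95
Q = {("demand_high", "price_low"): 0.4, ("demand_high", "price_high"): 0.7,
     ("demand_low",  "price_low"): 0.2, ("demand_low",  "price_high"): 0.1}

s, a, r, s_next = "demand_high", "price_low", 1.0, "demand_low"
a_next = "price_high"                                   # action actually taken next

sarsa_target = r + gamma * Q[(s_next, a_next)]          # bootstrap from a' (on-policy)
qlearn_target = r + gamma * max(Q[(s_next, b)]          # bootstrap from the greedy
                                for b in ("price_low", "price_high"))  # action (off-policy)

print("SARSA update:     ", Q[(s, a)] + alpha * (sarsa_target - Q[(s, a)]))
print("Q-learning update:", Q[(s, a)] + alpha * (qlearn_target - Q[(s, a)]))
```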

20.
This paper studies the problem of synthesizing control policies for uncertain continuous-time nonlinear systems from linear temporal logic (LTL) specifications using model-based reinforcement learning (MBRL). Rather than taking an abstraction-based approach, we view the interaction between the LTL formula’s corresponding Büchi automaton and the nonlinear system as a hybrid automaton whose discrete dynamics match exactly those of the Büchi automaton. To find satisfying control policies, we pose a sequence of optimal control problems associated with states in the accepting run of the automaton and leverage control barrier functions (CBFs) to prevent specification violation. Since solving many optimal control problems for a nonlinear system is computationally intractable, we take a learning-based approach in which the value function of each problem is learned online in real-time. Specifically, we propose a novel off-policy MBRL algorithm that allows one to simultaneously learn the uncertain dynamics of the system and the value function of each optimal control problem online while adhering to CBF-based safety constraints. Unlike related approaches, the MBRL method presented herein decouples convergence, stability, and safety, allowing each aspect to be studied independently, leading to stronger safety guarantees than those developed in related works. Numerical results are presented to validate the efficacy of the proposed method.
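A sketch of the CBF ingredient alone, for a hypothetical scalar control-affine system: the safety filter minimally modifies a desired control so the barrier condition Lf h + Lg h u >= -alpha*h holds, which has a closed form when there is a single input. The dynamics, barrier, and gain are illustrative; the paper couples such constraints with the automaton state and a learned model of the uncertain dynamics.

```python
def cbf_filter(x, u_des, alpha=1.0):
    # toy system: x_dot = f(x) + g(x) * u  with  f(x) = x, g(x) = 1
    f, g = x, 1.0
    # safe set {x <= 2} encoded as h(x) = 2 - x >= 0, so dh/dx = -1
    h, dh = 2.0 - x, -1.0
    lf_h, lg_h = dh * f, dh * g            # Lie derivatives of h
    if lf_h + lg_h * u_des >= -alpha * h:  # desired input already satisfies the CBF
        return u_des
    return (-alpha * h - lf_h) / lg_h      # closest input on the constraint boundary

# a nominal controller keeps pushing the state up; the filter caps it near x = 2
for x in [0.0, 1.5, 1.9, 2.0]:
    print(f"x={x:.1f}  u_des=1.0  u_safe={cbf_filter(x, 1.0):.3f}")
```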
