Similar Documents
Found 20 similar documents (search time: 15 ms).
1.
Because of their convincing performance, there is a growing interest in using evolutionary algorithms for reinforcement learning. We propose learning of neural network policies by the covariance matrix adaptation evolution strategy (CMA-ES), a randomized variable-metric search algorithm for continuous optimization. We argue that this approach, which we refer to as CMA Neuroevolution Strategy (CMA-NeuroES), is ideally suited for reinforcement learning, in particular because it is based on ranking policies (and therefore robust against noise), efficiently detects correlations between parameters, and infers a search direction from scalar reinforcement signals. We evaluate the CMA-NeuroES on five different (Markovian and non-Markovian) variants of the common pole balancing problem. The results are compared to those described in a recent study covering several RL algorithms, and the CMA-NeuroES shows the overall best performance.
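The following is a minimal, hypothetical sketch of the neuroevolution idea, not the authors' CMA-NeuroES: full CMA-ES additionally adapts a covariance matrix and step size, while this sketch keeps only the rank-based selection and recombination that make the search tolerant of noisy returns. The fitness function is a stub standing in for pole-balancing rollouts.

```python
# Simplified rank-based evolution strategy for neuroevolution (NOT full
# CMA-ES: covariance-matrix and step-size adaptation are omitted).
import numpy as np

rng = np.random.default_rng(0)

def episode_return(weights):
    # Stub fitness standing in for a pole-balancing rollout: negative
    # squared distance to a hidden "good" weight vector.
    target = np.linspace(-1.0, 1.0, weights.size)
    return -np.sum((weights - target) ** 2)

dim, lam, mu, sigma = 8, 16, 4, 0.3        # weights, offspring, parents, step size
mean = np.zeros(dim)                       # mean of the search distribution
w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
w /= w.sum()                               # rank-based recombination weights

for _ in range(200):
    pop = mean + sigma * rng.standard_normal((lam, dim))
    returns = np.array([episode_return(x) for x in pop])
    elite = pop[np.argsort(-returns)[:mu]]  # selection uses only the ranking,
    mean = w @ elite                        # which is what tolerates noisy returns
print("best return:", episode_return(mean))
```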

2.
3.
Fast and simple approximation schemes for generalized flow
We present fast and simple fully polynomial-time approximation schemes (FPTAS) for generalized versions of maximum flow, multicommodity flow, minimum cost maximum flow, and minimum cost multicommodity flow. We extend and refine fractional packing frameworks introduced in FPTASs for traditional multicommodity flow and packing linear programs. Our FPTASs dominate the previous best known complexity bounds for all of these problems, some by more than a factor of n², where n is the number of nodes. This is accomplished in part by introducing an efficient method of solving a sequence of generalized shortest path problems. Our generalized multicommodity FPTASs are now as fast as the best non-generalized ones. We believe our improvements make it practical to solve generalized multicommodity flow problems via combinatorial methods.
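As a hedged illustration of the fractional packing framework such FPTASs build on, here is a Garg-Könemann-style sketch for plain (non-generalized) maximum flow; the paper's extension to generalized flow (gain factors on edges, generalized shortest paths) is omitted, and the small graph and epsilon value are illustrative.

```python
import math, heapq

def shortest_path(adj, lengths, s, t):
    # Dijkstra under the current edge lengths; returns edge ids of an s-t path.
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue
        for v, eid in adj[u]:
            nd = d + lengths[eid]
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, (u, eid)
                heapq.heappush(pq, (nd, v))
    if t not in dist or dist[t] >= 1.0:      # stop once every path is "long"
        return None
    path, v = [], t
    while v != s:
        u, eid = prev[v]
        path.append(eid)
        v = u
    return path

edges = [("s", "a", 4), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 4)]
cap = [c for _, _, c in edges]
adj = {}
for eid, (u, v, _) in enumerate(edges):
    adj.setdefault(u, []).append((v, eid))
    adj.setdefault(v, [])

eps = 0.1
delta = (1 + eps) * ((1 + eps) * len(edges)) ** (-1 / eps)
lengths = [delta / c for c in cap]           # packing lengths start tiny
flow = [0.0] * len(edges)

while (path := shortest_path(adj, lengths, "s", "t")) is not None:
    bottleneck = min(cap[e] for e in path)
    for e in path:
        flow[e] += bottleneck                # push a full bottleneck of flow...
        lengths[e] *= 1 + eps * bottleneck / cap[e]   # ...and inflate its length

scale = math.log((1 + eps) / delta, 1 + eps)  # corrects the capacity overshoot
value = sum(flow[e] for e, (u, _, _) in enumerate(edges) if u == "s") / scale
print("approximate max flow:", round(value, 3))   # exact optimum here is 5
```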

4.
This paper studies the learning rates of a broad class of regularized regression algorithms in reproducing kernel Hilbert spaces. When analyzing the sample error of these algorithms, we employ a reweighted empirical process, which ensures that the variance and the penalty functional are simultaneously controlled by a threshold, thereby avoiding a tedious iteration procedure. The learning rates obtained here are faster than those reported in the previous literature.
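A concrete member of the algorithm class analysed here is kernel ridge regression (regularized least squares in an RKHS). The sketch below, with an assumed Gaussian kernel and an illustrative regularization parameter, shows the estimator whose learning rates such analyses bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X = rng.uniform(-3, 3, n)
y = np.sin(X) + 0.1 * rng.standard_normal(n)      # noisy samples of sin

def gauss_kernel(a, b, width=0.5):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * width ** 2))

lam = 1e-3                                        # regularization parameter
K = gauss_kernel(X, X)
# Representer theorem: the regularized minimizer is f = sum_i alpha_i K(x_i, .)
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

x_new = np.array([0.0, 1.5])
print("predictions:", gauss_kernel(x_new, X) @ alpha)  # near sin(0)=0, sin(1.5)=0.997
print("training MSE:", np.mean((K @ alpha - y) ** 2))
```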

5.
With the rapid development of artificial intelligence in recent years, applying various learning techniques to solve mixed-integer linear programming (MILP) problems has emerged as a burgeoning research domain. Apart from constructing end-to-end models directly, integrating learning approaches with some modules in the traditional methods for solving MILPs is also a promising direction. The cutting plane method is one of the fundamental algorithms used in modern MILP solvers, and the selection of ...
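The abstract is truncated, but the module it targets, cut selection, is commonly driven by a per-cut score. Below is a hedged sketch of the classical efficacy score (how far a cut pushes away the LP relaxation optimum), with a note where a learned scorer, as studied in this line of work, would slot in; the cuts and the LP point are made up.

```python
import numpy as np

x_lp = np.array([1.4, 2.6])                  # fractional optimum of the LP relaxation
candidate_cuts = [                           # each cut: a^T x <= b
    (np.array([1.0, 0.0]), 1.0),
    (np.array([1.0, 1.0]), 3.0),
    (np.array([0.0, 1.0]), 2.0),
]

def efficacy(a, b, x):
    # Euclidean distance by which the cut separates x (positive = violated).
    return (a @ x - b) / np.linalg.norm(a)

# A learned selector would replace `efficacy` with a trained scorer over
# features of (cut, LP state); the surrounding selection loop stays the same.
scores = [efficacy(a, b, x_lp) for a, b in candidate_cuts]
best = int(np.argmax(scores))
print("selected cut:", candidate_cuts[best], "score:", round(scores[best], 3))
```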

6.
7.
Prabir Daripa, PAMM 2007, 7(1): 2020065–2020066
A brief review of our fast algorithms is given in this short paper.

8.
Reinforcement learning schemes perform direct on-line search in control space. This makes them appropriate for modifying control rules to obtain improvements in the performance of a system. The effectiveness of a reinforcement learning strategy is studied here through the training of a learning classifier system (LCS) that controls the movement of an autonomous vehicle in simulated paths including left and right turns. The LCS comprises a set of condition-action rules (classifiers) that compete to control the system and evolve by means of a genetic algorithm (GA). Evolution and operation of classifiers depend upon an appropriate credit assignment mechanism based on reinforcement learning. Different design options and the role of various parameters have been investigated experimentally. The performance of vehicle movement under the proposed evolutionary approach is superior compared with that of other (neural) approaches based on reinforcement learning that have been applied previously to the same benchmark problem.
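As a rough sketch of reinforcement-based credit assignment in an LCS (in the spirit of the classic bucket brigade, not necessarily the exact mechanism of this paper), the toy below lets rules bid their strength for control, pass bids backwards, and collect external reward. The rule set and sensor stream are illustrative, and the GA that evolves rule conditions is omitted.

```python
import random

random.seed(0)
# classifier: condition on the sensed symbol, an action, and a strength
rules = [{"cond": "L", "act": "steer_left",  "s": 10.0},
         {"cond": "L", "act": "steer_right", "s": 10.0},
         {"cond": "R", "act": "steer_right", "s": 10.0},
         {"cond": "R", "act": "steer_left",  "s": 10.0}]
BID = 0.1                     # fraction of its strength a classifier bids
prev_winner = None

for _ in range(500):
    sensed = random.choice("LR")                  # toy path: left or right turn
    matching = [r for r in rules if r["cond"] == sensed]
    winner = max(matching, key=lambda r: r["s"] * random.uniform(0.9, 1.1))
    bid = BID * winner["s"]
    winner["s"] -= bid                            # winner pays its bid...
    if prev_winner is not None:
        prev_winner["s"] += bid                   # ...to the previous stage's winner
    correct = (sensed == "L") == (winner["act"] == "steer_left")
    winner["s"] = max(winner["s"] + (1.0 if correct else -1.0), 0.1)
    prev_winner = winner

for r in rules:                                   # correct rules accumulate strength
    print(r["cond"], r["act"], round(r["s"], 1))
```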

9.
We present an algorithm which aggregates states online while learning to behave optimally in an average-reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the original bounds for UCRL.
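A sketch of the aggregation test alone, under the assumption of Hoeffding-style intervals on mean rewards: two states become candidates for merging when their confidence intervals overlap. The actual algorithm also matches transition estimates and runs this test inside UCRL's optimistic loop; the counts below are invented.

```python
import math

def conf_interval(total_reward, visits, delta=0.05):
    # Hoeffding-style interval for a mean reward in [0, 1].
    mean = total_reward / visits
    width = math.sqrt(math.log(2 / delta) / (2 * visits))
    return mean - width, mean + width

# per-state statistics: (cumulative observed reward, visit count)
stats = {"s1": (42.0, 60), "s2": (45.0, 70), "s3": (12.0, 80)}
intervals = {s: conf_interval(r, n) for s, (r, n) in stats.items()}

def may_aggregate(a, b):
    lo_a, hi_a = intervals[a]
    lo_b, hi_b = intervals[b]
    return lo_a <= hi_b and lo_b <= hi_a          # the intervals overlap

names = list(stats)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        verdict = "aggregate" if may_aggregate(a, b) else "keep apart"
        print(a, b, "->", verdict)
```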

10.
The CPR (“cumulative proportional reinforcement”) learning rule stipulates that an agent chooses a move with a probability proportional to the cumulative payoff she obtained in the past with that move. Previously considered for strategies in normal form games (Laslier, Topol and Walliser, Games and Econ. Behav., 2001), the CPR rule is here adapted for actions in perfect information extensive form games. The paper shows that the action-based CPR process converges with probability one to the (unique) subgame perfect equilibrium.
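The CPR rule itself is compact enough to state in code. The sketch below applies it at a single decision node with a made-up payoff stub; in the paper, payoffs come from repeated plays of a perfect-information extensive form game.

```python
import random

random.seed(0)
actions = ["left", "right"]
cum_payoff = {a: 1.0 for a in actions}       # positive initial mass so both get tried

def payoff(action):                          # stub: "right" is better on average
    return random.uniform(0, 1) + (0.5 if action == "right" else 0.0)

for _ in range(2000):
    total = sum(cum_payoff.values())
    # CPR: draw an action with probability proportional to its cumulative payoff
    action = "left" if random.uniform(0, total) < cum_payoff["left"] else "right"
    cum_payoff[action] += payoff(action)

total = sum(cum_payoff.values())
print({a: round(p / total, 3) for a, p in cum_payoff.items()})
```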

11.
12.
A new approach to developing parallel and distributed algorithms for scheduling tasks on parallel computers is proposed. A game-theoretic model, in which genetic-algorithm-based learning machines called classifier systems act as players in a game, serves as the theoretical framework of the approach. An experimental study of such a system shows its self-organizing features and its capacity for collective behaviour. Following this approach, a parallel and distributed scheduler is described. A simple version of the proposed scheduler has been implemented. Results of the experimental study of the scheduler demonstrate its high performance.

13.
14.
Live virtual machine migration can have a major impact on how a cloud system performs, as it consumes a significant amount of network resources such as bandwidth. A virtual machine migration occurs when a host becomes over-utilised or under-utilised. In this paper, we propose a network-aware live migration strategy that monitors the current demand level of bandwidth when network congestion occurs and performs appropriate actions based on what it is experiencing. A reinforcement learning technique acts as a decision-support system, enabling an agent to learn an optimal time to schedule a virtual machine migration depending on the current bandwidth usage in a data centre. We show from our results that an autonomous agent can learn to utilise available network resources such as bandwidth when network saturation occurs at peak times.
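A hedged sketch of the decision-support idea: tabular Q-learning over coarsely discretized bandwidth-utilisation levels, choosing between migrating now and waiting. The traffic dynamics, reward shaping, and discretization below are invented stand-ins, not the paper's simulation model.

```python
import random

random.seed(0)
STATES = ["low", "medium", "high"]           # discretized bandwidth utilisation
ACTIONS = ["migrate", "wait"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    # Toy dynamics: migrating while the network is congested is penalised.
    if action == "migrate":
        reward = {"low": 1.0, "medium": -0.2, "high": -1.0}[state]
    else:
        reward = -0.05                       # small cost for postponing
    return reward, random.choice(STATES)     # exogenous traffic fluctuation

state = "medium"
for _ in range(20000):
    if random.random() < eps:
        action = random.choice(ACTIONS)      # epsilon-greedy exploration
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = nxt

for s in STATES:                             # learned rule: migrate only when quiet
    print(s, "->", max(ACTIONS, key=lambda a: Q[(s, a)]))
```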

15.
The generalization of policies in reinforcement learning is a main issue, both from the point of view of theoretical models and for their applicability. However, generalizing from a set of examples or searching for regularities is a problem which has already been intensively studied in machine learning. Thus, existing domains such as Inductive Logic Programming have already been linked with reinforcement learning. Our work uses techniques in which generalizations are constrained by a language bias, in order to regroup similar states. Such generalizations are principally based on the properties of concept lattices. To guide the possible groupings of similar states of the environment, we propose a general algebraic framework, considering the generalization of policies through a partition of the set of states and using a language bias as a priori knowledge. We give a practical application as an example of our theoretical approach by proposing and experimenting with a bottom-up algorithm.
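As an illustration of generalization through a partition constrained by a language bias, the sketch below groups states by the set of predicates they satisfy; blocks ordered by inclusion of these predicate sets are exactly the kind of structure a concept lattice organizes. The predicates and states are hypothetical.

```python
from collections import defaultdict

# language bias: the only properties the generalization may talk about
predicates = {
    "wall_ahead": lambda s: s["dist_wall"] < 1.0,
    "goal_right": lambda s: s["goal_bearing"] > 0,
}

states = [
    {"id": 0, "dist_wall": 0.5, "goal_bearing":  0.3},
    {"id": 1, "dist_wall": 0.4, "goal_bearing":  0.9},
    {"id": 2, "dist_wall": 3.0, "goal_bearing": -0.2},
    {"id": 3, "dist_wall": 0.2, "goal_bearing": -0.7},
]

partition = defaultdict(list)
for s in states:
    # a block is identified by the set of predicates the state satisfies
    block = frozenset(name for name, p in predicates.items() if p(s))
    partition[block].append(s["id"])

for block, members in partition.items():
    print(sorted(block), "-> states", members)   # one action per block, not per state
```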

16.
Under the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case. Therefore, the optimal pivot of the simplex method is crucial. In this paper, we propose the optimal rule to find all the shortest pivot paths of the simplex method for linear programming problems based on Monte Carlo tree search. Specifically, we first propose the SimplexPseudoTree to transfer the simplex method into a tree-search mode while avoiding repeated basis variables. Secondly...
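Since the abstract is cut off, only the generic ingredient is sketched here: the UCB1 rule that Monte Carlo tree search uses to decide which child (here, which candidate pivot) to explore next. The statistics are invented.

```python
import math

def ucb1(total_value, visits, parent_visits, c=1.4):
    if visits == 0:
        return math.inf                      # unvisited children are tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

# candidate pivots at one search-tree node: (sum of rollout values, visit count)
children = {"pivot_x3": (8.0, 12), "pivot_x5": (5.0, 6), "pivot_x7": (0.0, 0)}
parent_visits = sum(v for _, v in children.values())

choice = max(children, key=lambda k: ucb1(*children[k], parent_visits))
print("explore next:", choice)
```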

17.
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.
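A minimal sketch of the core idea under simplifying assumptions: synthesize an "artificial trajectory" by repeatedly jumping to the sampled one-step transition whose start state is nearest to the current synthetic state (for the action the policy picks), accumulating the sampled rewards. The batch and the metric are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# batch of one-step transitions (s, a, r, s') on a 1-D state space
S = rng.uniform(0, 10, 200)
A = rng.choice([-1.0, 1.0], 200)
S_next = np.clip(S + A + 0.1 * rng.standard_normal(200), 0, 10)
R = -np.abs(S_next - 5.0)                    # reward: stay near state 5

def synthetic_return(s0, policy, horizon=10):
    s, total = s0, 0.0
    for _ in range(horizon):
        idx = np.where(A == policy(s))[0]            # transitions with that action
        j = idx[np.argmin(np.abs(S[idx] - s))]       # nearest sampled start state
        total += R[j]
        s = S_next[j]                                # follow the sampled outcome
    return total

policy = lambda s: 1.0 if s < 5.0 else -1.0          # move toward state 5
print("estimated return from s0=1:", round(synthetic_return(1.0, policy), 3))
```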

18.
We present a self-adaptive and distributed metaheuristic called Coalition-Based Metaheuristic (CBM). This method is based on the Agent Metaheuristic Framework (AMF) and hyper-heuristic approach. In CBM, several agents, grouped in a coalition, concurrently explore the search space of a given problem instance. Each agent modifies a solution with a set of operators. The selection of these operators is determined by heuristic rules dynamically adapted by individual and collective learning mechanisms. The intention of this study is to exploit AMF and hyper-heuristic approaches to conceive an efficient, flexible and modular metaheuristic. AMF provides a generic model of metaheuristic that encourages modularity, and hyper-heuristic approach gives some guidelines to design flexible search methods. The performance of CBM is assessed by computational experiments on the vehicle routing problem.
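A hedged sketch of the individual learning mechanism only: an agent keeps one weight per operator, draws operators in proportion to those weights, and reinforces operators that improved its current solution. The two operators and the toy objective are invented, and the coalition-level (collective) learning is omitted.

```python
import random

random.seed(0)
solution = [random.uniform(0, 1) for _ in range(20)]
cost = lambda sol: sum(sol)                  # toy objective: minimise the sum

def op_perturb(sol):                         # random restart of one coordinate
    i = random.randrange(len(sol))
    s = sol[:]; s[i] = random.uniform(0, 1); return s

def op_decrease(sol):                        # greedy: shrink the largest coordinate
    s = sol[:]; i = max(range(len(s)), key=lambda j: s[j]); s[i] *= 0.5; return s

operators = [op_perturb, op_decrease]
weights = [1.0, 1.0]

for _ in range(300):
    k = random.choices(range(len(operators)), weights=weights)[0]
    candidate = operators[k](solution)
    if cost(candidate) < cost(solution):     # operator helped: accept and reinforce
        solution = candidate
        weights[k] += 1.0
    weights[k] = max(weights[k] * 0.999, 0.1)  # slow decay keeps exploration alive

print("final cost:", round(cost(solution), 3),
      "operator weights:", [round(w, 1) for w in weights])
```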

19.
Games can be easy to construct but difficult to solve, given the current methods available for finding a Nash equilibrium. This issue is one of many that face modern game theorists and analysts who need to model situations with multiple decision-makers. This paper explores the use of reinforcement learning, a standard artificial intelligence technique, as a means to solve a simple dynamic airline pricing game. Three different reinforcement learning approaches are compared: SARSA, Q-learning and Monte Carlo learning. The pricing game solution is surprisingly sophisticated given the game's simplicity, and this sophistication is reflected in the learning results. The paper also discusses the extra analytical benefit obtained from applying reinforcement learning to these types of problems.
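The three compared methods differ mainly in their update target, which a few lines make concrete. The sketch contrasts the SARSA (on-policy) and Q-learning (off-policy) updates on one observed transition; Monte Carlo learning would instead wait for the full episode return. The states, prices, and numbers are invented, not the paper's game.

```python
# One observed transition (s, a, r, s') and the next action a' the policy took.
alpha, gamma = 0.1, 0.95
Q = {("demand_high", "price_low"): 0.4, ("demand_high", "price_high"): 0.7,
     ("demand_low",  "price_low"): 0.2, ("demand_low",  "price_high"): 0.1}

s, a, r, s_next = "demand_high", "price_low", 1.0, "demand_low"
a_next = "price_high"                                   # action actually taken next

sarsa_target = r + gamma * Q[(s_next, a_next)]          # bootstrap from a' (on-policy)
qlearn_target = r + gamma * max(Q[(s_next, b)]          # bootstrap from the greedy
                                for b in ("price_low", "price_high"))  # action (off-policy)

print("SARSA update:     ", Q[(s, a)] + alpha * (sarsa_target - Q[(s, a)]))
print("Q-learning update:", Q[(s, a)] + alpha * (qlearn_target - Q[(s, a)]))
```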

20.
This paper studies the problem of synthesizing control policies for uncertain continuous-time nonlinear systems from linear temporal logic (LTL) specifications using model-based reinforcement learning (MBRL). Rather than taking an abstraction-based approach, we view the interaction between the LTL formula’s corresponding Büchi automaton and the nonlinear system as a hybrid automaton whose discrete dynamics match exactly those of the Büchi automaton. To find satisfying control policies, we pose a sequence of optimal control problems associated with states in the accepting run of the automaton and leverage control barrier functions (CBFs) to prevent specification violation. Since solving many optimal control problems for a nonlinear system is computationally intractable, we take a learning-based approach in which the value function of each problem is learned online in real-time. Specifically, we propose a novel off-policy MBRL algorithm that allows one to simultaneously learn the uncertain dynamics of the system and the value function of each optimal control problem online while adhering to CBF-based safety constraints. Unlike related approaches, the MBRL method presented herein decouples convergence, stability, and safety, allowing each aspect to be studied independently, leading to stronger safety guarantees than those developed in related works. Numerical results are presented to validate the efficacy of the proposed method.
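A sketch of the CBF ingredient alone, for a hypothetical scalar control-affine system: the safety filter minimally modifies a desired control so the barrier condition Lf h + Lg h u >= -alpha*h holds, which has a closed form when there is a single input. The dynamics, barrier, and gain are illustrative; the paper couples such constraints with the automaton state and a learned model of the uncertain dynamics.

```python
def cbf_filter(x, u_des, alpha=1.0):
    # toy system: x_dot = f(x) + g(x) * u  with  f(x) = x, g(x) = 1
    f, g = x, 1.0
    # safe set {x <= 2} encoded as h(x) = 2 - x >= 0, so dh/dx = -1
    h, dh = 2.0 - x, -1.0
    lf_h, lg_h = dh * f, dh * g            # Lie derivatives of h
    if lf_h + lg_h * u_des >= -alpha * h:  # desired input already satisfies the CBF
        return u_des
    return (-alpha * h - lf_h) / lg_h      # closest input on the constraint boundary

# a nominal controller keeps pushing the state up; the filter caps it near x = 2
for x in [0.0, 1.5, 1.9, 2.0]:
    print(f"x={x:.1f}  u_des=1.0  u_safe={cbf_filter(x, 1.0):.3f}")
```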
