首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 25 毫秒
1.
Accelerating autonomous learning by using heuristic selection of actions   总被引:2,自引:0,他引:2  
This paper investigates how to make improved action selection for online policy learning in robotic scenarios using reinforcement learning (RL) algorithms. Since finding control policies using any RL algorithm can be very time consuming, we propose to combine RL algorithms with heuristic functions for selecting promising actions during the learning process. With this aim, we investigate the use of heuristics for increasing the rate of convergence of RL algorithms and contribute with a new learning algorithm, Heuristically Accelerated Q-learning (HAQL), which incorporates heuristics for action selection to the Q-Learning algorithm. Experimental results on robot navigation show that the use of even very simple heuristic functions results in significant performance enhancement of the learning rate.  相似文献   

2.
Intelligent optimization refers to the promising technique of integrating learning mechanisms into (meta-)heuristic search. In this paper, we use multi-agent reinforcement learning for building high-quality solutions for the multi-mode resource-constrained project scheduling problem (MRCPSP). We use a network of distributed reinforcement learning agents that cooperate to jointly learn a well-performing constructive heuristic. Each agent, being responsible for one activity, uses two simple learning devices, called learning automata, that learn to select a successor activity order and a mode, respectively. By coupling the reward signals for both learning tasks, we can clearly show the advantage of using reinforcement learning in search. We present some comparative results, to show that our method can compete with the best performing algorithms for the MRCPSP, yet using only simple learning schemes without the burden of complex fine-tuning.  相似文献   

3.
This paper presents a highly effective reinforcement learning enhancement of multi-neighborhood tabu search for the max-mean dispersion problem. The reinforcement learning component uses the Q-learning mechanism that incorporates the accumulated feedback information collected from the actions performed during the search to guide the generation of diversified solutions. The tabu search component employs 1-flip and reduced 2-flip neighborhoods to collaboratively perform the neighborhood exploration for attaining high-quality local optima. A learning automata method is integrated in tabu search to adaptively determine the probability of selecting each neighborhood. Computational experiments on 80 challenging benchmark instances demonstrate that the proposed algorithm is favorably competitive with the state-of-the-art algorithms in the literature, by finding new lower bounds for 3 instances and matching the best known results for the other instances. Key elements and properties are also analyzed to disclose the source of the benefits of our integration of learning mechanisms and tabu search.  相似文献   

4.
The convergence properties for reinforcement learning approaches, such as temporal differences and Q-learning, have been established under moderate assumptions for discrete state and action spaces. In practice, however, many systems have either continuous action spaces or a large number of discrete elements. This paper presents an approximate dynamic programming approach to reinforcement learning for continuous action set-point regulator problems, which learns near-optimal control policies based on scalar performance measures. The continuous-action space (CAS) algorithm uses derivative-free line search methods to obtain the optimal action in the continuous space. The theoretical convergence properties of the algorithm are presented. Several heuristic stopping criteria are investigated and practical application is illustrated by two example problems—the inverted pendulum balancing problem and the power system stabilization problem.  相似文献   

5.
In this paper, we consider the batch mode reinforcement learning setting, where the central problem is to learn from a sample of trajectories a policy that satisfies or optimizes a performance criterion. We focus on the continuous state space case for which usual resolution schemes rely on function approximators either to represent the underlying control problem or to represent its value function. As an alternative to the use of function approximators, we rely on the synthesis of “artificial trajectories” from the given sample of trajectories, and show that this idea opens new avenues for designing and analyzing algorithms for batch mode reinforcement learning.  相似文献   

6.
Reinforcement learning schemes perform direct on-line search in control space. This makes them appropriate for modifying control rules to obtain improvements in the performance of a system. The effectiveness of a reinforcement learning strategy is studied here through the training of a learning classifier system (LCS) that controls the movement of an autonomous vehicle in simulated paths including left and right turns. The LCS comprises a set of condition-action rules (classifiers) that compete to control the system and evolve by means of a genetic algorithm (GA). Evolution and operation of classifiers depend upon an appropriate credit assignment mechanism based on reinforcement learning. Different design options and the role of various parameters have been investigated experimentally. The performance of vehicle movement under the proposed evolutionary approach is superior compared with that of other (neural) approaches based on reinforcement learning that have been applied previously to the same benchmark problem.  相似文献   

7.
提出了一种新的算法.这个算法通过潜在地牺牲控制策略的最优性来获取其鲁棒性.这是因为,如果在理论模型与实际的物理系统之间存在不匹配,或者实际系统是非静态的,或者控制动作的可使用性随时间的变化而变化时,那么鲁棒性就可能成为一个十分重要的问题.主要工作是给出了一组逼近算法和它们的收敛结果.利用广义平均算子来替代最优算子max(或min),对激励学习中的一类最重要的算法——动态规划算法——进行了研究,并讨论了它们的收敛性,目的就是为了提高激励学习算法的鲁棒性.同时使用了更具一般性的风险敏感度性能评价体系,发现基于动态规划的学习算法中的一般结论在这种体系之下并不完全成立.  相似文献   

8.
Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach where one optimizes the SMDP performance criterion with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Our proposed policy gradient SMDP algorithm and its application to admission control is novel.  相似文献   

9.
Local search methods are widely used to improve the performance of evolutionary computation algorithms in all kinds of domains. Employing advanced and efficient exploration mechanisms becomes crucial in complex and very large (in terms of search space) problems, such as when employing evolutionary algorithms to large-scale data mining tasks. Recently, the GAssist Pittsburgh evolutionary learning system was extended with memetic operators for discrete representations that use information from the supervised learning process to heuristically edit classification rules and rule sets. In this paper we first adapt some of these operators to BioHEL, a different evolutionary learning system applying the iterative learning approach, and afterwards propose versions of these operators designed for continuous attributes and for dealing with noise. The performance of all these operators and their combination is extensively evaluated on a broad range of synthetic large-scale datasets to identify the settings that present the best balance between efficiency and accuracy. Finally, the identified best configurations are compared with other classes of machine learning methods on both synthetic and real-world large-scale datasets and show very competent performance.  相似文献   

10.
The generalization of policies in reinforcement learning is a main issue, both from the theoretical model point of view and for their applicability. However, generalizing from a set of examples or searching for regularities is a problem which has already been intensively studied in machine learning. Thus, existing domains such as Inductive Logic Programming have already been linked with reinforcement learning. Our work uses techniques in which generalizations are constrained by a language bias, in order to regroup similar states. Such generalizations are principally based on the properties of concept lattices. To guide the possible groupings of similar states of the environment, we propose a general algebraic framework, considering the generalization of policies through a partition of the set of states and using a language bias as an a priori knowledge. We give a practical application as an example of our theoretical approach by proposing and experimenting a bottom-up algorithm.  相似文献   

11.
We address an important problem in telecommunications planning: the multiperiod network expansion problem. Our aim is to show that it can be efficiently solved using a new local search approach. To achieve our goal, we first show how to adapt the problem's formulation to meet our local search program's requirements. We then describe GLIT (Generic Local Improvement Template), a new, generic auto-calibrating local search algorithm; the different neighbouring strategies that we designed specifically for this problem; as well as a Genetic algorithm for this problem. We compare and discuss the performance of these algorithms using extensive computational tests on a large range of instances with up to 7500 arcs. These experiments show that GLIT clearly outperforms competitive approaches, especially when it is coupled with Genetic algorithms. We also show that the hybrid algorithms Genetic/Tabu, Genetic/Simulated Annealing, and finally Genetic/GLIT all outperform both pure Genetic and pure local search algorithms.  相似文献   

12.
Several meta-heuristic algorithms, such as evolutionary algorithms (EAs) and genetic algorithms (GAs), have been developed for solving feature selection problems due to their efficiency for searching feature subset spaces in feature selection problems. Recently, hybrid GAs have been proposed to improve the performance of conventional GAs by embedding a local search operation, or sequential forward floating search mutation, into the GA. Existing hybrid algorithms may damage individuals’ genetic information obtained from genetic operations during the local improvement procedure because of a sequential process of the mutation operation and the local improvement operation. Another issue with a local search operation used in the existing hybrid algorithms is its inappropriateness for large-scale problems. Therefore, we propose a novel approach for solving large-sized feature selection problems, namely, an EA with a partial sequential forward floating search mutation (EAwPS). The proposed approach integrates a local search technique, that is, the partial sequential forward floating search mutation into an EA method. Two algorithms, EAwPS-binary representation (EAwPS-BR) for medium-sized problems and EAwPS-integer representation (EAwPS-IR) for large-sized problems, have been developed. The adaptation of a local improvement method into the EA speeds up the search and directs the search into promising solution areas. We compare the performance of the proposed algorithms with other popular meta-heuristic algorithms using the medium- and large-sized data sets. Experimental results demonstrate that the proposed EAwPS extracts better features within reasonable computational times.  相似文献   

13.
In this paper effectiveness of several agent strategy learning algorithms is compared in a new multi-agent Farmer–Pest learning environment. Learning is often utilized by multi-agent systems which can deal with complex problems by means of their decentralized approach. With a number of learning methods available, a need for their comparison arises. This is why we designed and implemented new multi-dimensional Farmer–Pest problem domain, which is suitable for benchmarking learning algorithms. This paper presents comparison results for reinforcement learning (SARSA) and supervised learning (Naïve Bayes, C4.5 and Ripper). These algorithms are tested on configurations with various complexity with not delayed rewards. The results show that algorithm performances depend highly on the environment configuration and various conditions favor different learning algorithms.  相似文献   

14.
In this paper we investigate the use of hyper-heuristic methodologies for predicting DNA sequences. In particular, we utilize Sequencing by Hybridization. We believe that this is the first time that hyper-heuristics have been investigated in this domain. A hyper-heuristic is provided with a set of low-level heuristics and the aim is to decide which heuristic to call at each decision point. We investigate three types of hyper-heuristics. Two of these (simulated annealing and tabu search) draw their inspiration from meta-heuristics. The choice function hyper-heuristic draws its inspiration from reinforcement learning. We utilize two independent sets of low-level heuristics. The first set is based on a previous tabu search method, with the second set being a significant extension to this basic set, including utilizing a different representation and introducing the definition of clusters. The datasets we use comprises two randomly generated datasets and also a publicly available biological dataset. In total, we carried out experiments using 70 different combinations of heuristics, using the three datasets mentioned above and investigating six different hyper-heuristic algorithms. Our results demonstrate the effectiveness of a hyper-heuristic approach to this problem domain. It is necessary to provide a good set of low-level heuristics, which are able to both intensify and diversify the search but this approach has demonstrated very encouraging results on this extremely difficult and important problem domain.  相似文献   

15.
The paper deals with a risk averse dynamic programming problem with infinite horizon. First, the required assumptions are formulated to have the problem well defined. Then the Bellman equation is derived, which may be also seen as a standalone reinforcement learning problem. The fact that the Bellman operator is contraction is proved, guaranteeing convergence of various solution algorithms used for dynamic programming as well as reinforcement learning problems, which we demonstrate on the value iteration and the policy iteration algorithms.  相似文献   

16.
A neural fuzzy control system with structure and parameter learning   总被引:8,自引:0,他引:8  
A general connectionist model, called neural fuzzy control network (NFCN), is proposed for the realization of a fuzzy logic control system. The proposed NFCN is a feedforward multilayered network which integrates the basic elements and functions of a traditional fuzzy logic controller into a connectionist structure which has distributed learning abilities. The NFCN can be constructed from supervised training examples by machine learning techniques, and the connectionist structure can be trained to develop fuzzy logic rules and find membership functions. Associated with the NFCN is a two-phase hybrid learning algorithm which utilizes unsupervised learning schemes for structure learning and the backpropagation learning scheme for parameter learning. By combining both unsupervised and supervised learning schemes, the learning speed converges much faster than the original backpropagation algorithm. The two-phase hybrid learning algorithm requires exact supervised training data for learning. In some real-time applications, exact training data may be expensive or even impossible to obtain. To solve this problem, a reinforcement neural fuzzy control network (RNFCN) is further proposed. The RNFCN is constructed by integrating two NFCNs, one functioning as a fuzzy predictor and the other as a fuzzy controller. By combining a proposed on-line supervised structure-parameter learning technique, the temporal difference prediction method, and the stochastic exploratory algorithm, a reinforcement learning algorithm is proposed, which can construct a RNFCN automatically and dynamically through a reward-penalty signal (i.e., “good” or “bad” signal). Two examples are presented to illustrate the performance and applicability of the proposed models and learning algorithms.  相似文献   

17.
Motivated by the problem of learning a linear regression model whose parameter is a large fixed-rank non-symmetric matrix, we consider the optimization of a smooth cost function defined on the set of fixed-rank matrices. We adopt the geometric framework of optimization on Riemannian quotient manifolds. We study the underlying geometries of several well-known fixed-rank matrix factorizations and then exploit the Riemannian quotient geometry of the search space in the design of a class of gradient descent and trust-region algorithms. The proposed algorithms generalize our previous results on fixed-rank symmetric positive semidefinite matrices, apply to a broad range of applications, scale to high-dimensional problems, and confer a geometric basis to recent contributions on the learning of fixed-rank non-symmetric matrices. We make connections with existing algorithms in the context of low-rank matrix completion and discuss the usefulness of the proposed framework. Numerical experiments suggest that the proposed algorithms compete with state-of-the-art algorithms and that manifold optimization offers an effective and versatile framework for the design of machine learning algorithms that learn a fixed-rank matrix.  相似文献   

18.
This paper introduces the Two-Echelon Production-Routing Problem. This problem is motivated from the petrochemical industry, enlarging the supply chain integration by taking into account production, inventory, and routing decisions in a two-echelon vendor-managed inventory system. We describe, model, and design a branch-and-cut (B&C) to solve the problem under different inventory policies. We also propose a novel exact algorithm, by employing parallel computing techniques, in order to combine local search procedures within a traditional B&C scheme. We evaluate the performance of our methods through extensive computational experiments, both by comparing the algorithms, the effectiveness of the different inventory policies, and the impact of these policies on the partial costs. We derive many managerial insights based on the results. We also validate our new exact algorithm by solving similar problems from the literature, such as the two-echelon multi-depot inventory-routing (2E-MDIRP) and the classical multi-vehicle production-routing problem (MV-PRP). Computational experiments show that our method is very competitive. Based on 512 experiments for the 2E-MDIRP, our algorithm was able to find 111 new best known solutions (BKS), besides proving 412 optimal solutions, against 298 from the literature. For 336 experiments over small and medium size MV-PRP instances, we proved 242 optimal solutions, 11 more than the exact methods from the literature, besides providing 95 new BKS. Moreover, we were the first to tackle large MV-PRP instances exactly, and in this case, our algorithm provides all BKS for instances up to 50 customers, 20 periods and 5 vehicles, outperforming all meta/matheuristics procedures from the literature.  相似文献   

19.
We survey a new approach that the author and his co-workers have developed to formulate stochastic control problems (predominantly queueing systems) asmathematical programming problems. The central idea is to characterize the region of achievable performance in a stochastic control problem, i.e., find linear or nonlinear constraints on the performance vectors that all policies satisfy. We present linear and nonlinear relaxations of the performance space for the following problems: Indexable systems (multiclass single station queues and multiarmed bandit problems), restless bandit problems, polling systems, multiclass queueing and loss networks. These relaxations lead to bounds on the performance of an optimal policy. Using information from the relaxations we construct heuristic nearly optimal policies. The theme in the paper is the thesis that better formulations lead to deeper understanding and better solution methods. Overall the proposed approach for stochastic control problems parallels efforts of the mathematical programming community in the last twenty years to develop sharper formulations (polyhedral combinatorics and more recently nonlinear relaxations) and leads to new insights ranging from a complete characterization and new algorithms for indexable systems to tight lower bounds and nearly optimal algorithms for restless bandit problems, polling systems, multiclass queueing and loss networks.  相似文献   

20.
This paper presents a meta-algorithm for approximating the Pareto optimal set of costly black-box multiobjective optimization problems given a limited number of objective function evaluations. The key idea is to switch among different algorithms during the optimization search based on the predicted performance of each algorithm at the time. Algorithm performance is modeled using a machine learning technique based on the available information. The predicted best algorithm is then selected to run for a limited number of evaluations. The proposed approach is tested on several benchmark problems and the results are compared against those obtained using any one of the candidate algorithms alone.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号