首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
The convergence properties for reinforcement learning approaches, such as temporal differences and Q-learning, have been established under moderate assumptions for discrete state and action spaces. In practice, however, many systems have either continuous action spaces or a large number of discrete elements. This paper presents an approximate dynamic programming approach to reinforcement learning for continuous action set-point regulator problems, which learns near-optimal control policies based on scalar performance measures. The continuous-action space (CAS) algorithm uses derivative-free line search methods to obtain the optimal action in the continuous space. The theoretical convergence properties of the algorithm are presented. Several heuristic stopping criteria are investigated and practical application is illustrated by two example problems—the inverted pendulum balancing problem and the power system stabilization problem.  相似文献   

2.
Abstract

This paper presents an algorithm, named adaptive projected subgradient method that can minimize asymptotically a certain sequence of nonnegative convex functions over a closed convex set in a real Hilbert space. The proposed algorithm is a natural extension of the Polyak's subgradient algorithm, for nonsmooth convex optimization problem with a fixed target value, to the case where the convex objective itself keeps changing in the whole process. The main theorem, showing the strong convergence of the algorithm as well as the asymptotic optimality of the sequence generated by the algorithm, can serve as a unified guiding principle of a wide range of set theoretic adaptive filtering schemes for nonstationary random processes. These include not only the existing adaptive filtering techniques; e.g., NLMS, Projected NLMS, Constrained NLMS, APA, and Adaptive parallel outer projection algorithm etc., but also new techniques; e.g., Adaptive parallel min-max projection algorithm, and their embedded constraint versions. Numerical examples show that the proposed techniques are well-suited for robust adaptive signal processing problems.  相似文献   

3.
Carsten Carstensen  Hella Rabus 《PAMM》2008,8(1):10049-10052
The need to develop reliable and efficient adaptive algorithms using mixed finite element methods arises from various applications in fluid dynamics and computational continuum mechanics. In order to save degrees of freedom, not all but just some selected set of finite element domains are refined and hence the fundamental question of convergence requires a new mathematical argument as well as the question of optimality. We will present a new adaptive algorithm for mixed finite element methods to solve the model Poisson problem, for which optimal convergence can be proved. The a posteriori error control of mixed finite element methods dates back to Alonso (1996) Error estimators for a mixed method. and Carstensen (1997) A posteriori error estimate for the mixed finite element method. The error reduction and convergence for adaptive mixed finite element methods has already been proven by Carstensen and Hoppe (2006) Error Reduction and Convergence for an Adaptive Mixed Finite Element Method, Convergence analysis of an adaptive nonconforming finite element methods. Recently, Chen, Holst and Xu (2008) Convergence and Optimality of Adaptive Mixed Finite Element Methods. presented convergence and optimality for adaptive mixed finite element methods following arguments of Rob Stevenson for the conforming finite element method. Their algorithm reduces oscillations, before applying and a standard adaptive algorithm based on usual error estimation. The proposed algorithm does this in a natural way, by switching between the reduction of either the estimated error or oscillations. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

4.
5.
We study the motion-planning problem for a convex m -gon P in a planar polygonal environment Q bounded by n edges. We give the first algorithm that constructs the entire free configuration space (the three-dimensional space of all free placements of P in Q ) in time that is near-quadratic in mn , which is nearly optimal in the worst case. The algorithm is also conceptually simple. Previous solutions were incomplete, more expensive, or produced only part of the free configuration space. Combining our solution with parametric searching, we obtain an algorithm that finds the largest placement of P in Q in time that is also near-quadratic in mn . In addition, we describe an algorithm that preprocesses the computed free configuration space so that reachability queries can be answered in polylogarithmic time. Received September 9, 1997, and in revised form September 24, 1998.  相似文献   

6.
Abstract Adaptive management requires that predictive models be explicit and transparent to improve decisions by comparing management actions, directing further research and monitoring, and facilitating learning. The rufa subspecies of red knots (Calidris canutus rufa), which has recently exhibited steep population declines, relies on horseshoe crab (Limulus polyphemus) eggs as their primary food source during stopover in Delaware Bay during spring migration. We present a model with two different parameterizations for use in the adaptive management of horseshoe crab harvests in the Delaware Bay that links red knot mass gain, annual survival, and fecundity to horseshoe crab dynamics. The models reflect prevailing hypotheses regarding ecological links between these two species. When reported crab harvest from 1998 to 2008 was applied, projections corresponded to the observed red knot population abundances depending on strengths of the demographic relationship between these species. We compared different simulated horseshoe crab harvest strategies to evaluate whether, given this model, horseshoe crab harvest management can affect red knot conservation and found that restricting harvest can benefit red knot populations. Our model is the first to explicitly and quantitatively link these two species and will be used within an adaptive management framework to manage the Delaware Bay system and learn more about the specific nature of the linkage between the two species.  相似文献   

7.
Games can be easy to construct but difficult to solve due to current methods available for finding the Nash Equilibrium. This issue is one of many that face modern game theorists and those analysts that need to model situations with multiple decision-makers. This paper explores the use of reinforcement learning, a standard artificial intelligence technique, as a means to solve a simple dynamic airline pricing game. Three different reinforcement learning approaches are compared: SARSA, Q-learning and Monte Carlo Learning. The pricing game solution is surprisingly sophisticated given the game's simplicity and this sophistication is reflected in the learning results. The paper also discusses extra analytical benefit obtained from applying reinforcement learning to these types of problems.  相似文献   

8.
We study incidence properties among cosets of infinite loops, with emphasis on well‐structured varieties such as antiautomorphic loops and Bol loops. While cosets in groups are either disjoint or identical, we find that the incidence structure in general loops can be much richer. Every symmetric design, for example, can be realized as a canonical collection of cosets of a infinite loop. We show that in the variety of antiautomorphic loops the poset formed by set inclusion among intersections of left cosets is isomorphic to that formed by right cosets. We present an algorithm that, given a infinite Bol loop S, can in some cases determine whether |S| divides |Q| for all infinite Bol loops Q with S?Q, and even whether there is a selection of left cosets of S that partitions Q. This method results in a positive confirmation of Lagrange's Theorem for Bol loops for a few new cases of subloops. Finally, we show that in a left automorphic Moufang loop Q (in particular, in a commutative Moufang loop Q), two left cosets of S?Qare either disjoint or they intersect in a set whose cardinality equals that of some subloop of S.  相似文献   

9.
Due to the advantage of achieving a better performance under weak regularization, elastic net has attracted wide attention in statistics, machine learning, bioinformatics, and other fields. In particular, a variation of the elastic net, adaptive elastic net (AEN), integrates the adaptive grouping effect. In this paper, we aim to develop a new algorithm: Adaptive Elastic Net with Conditional Mutual Information (AEN-CMI) that further improves AEN by incorporating conditional mutual information into the gene selection process. We apply this new algorithm to screen significant genes for two kinds of cancers: colon cancer and leukemia. Compared with other algorithms including Support Vector Machine, Classic Elastic Net, Adaptive Lasso and Adaptive Elastic Net, the proposed algorithm, AEN-CMI, obtains the best classification performance using the least number of genes.  相似文献   

10.
This paper introduces a novel neuro-fuzzy approach for learning and modeling so-called Multi-Input Multi-Output Coupling (MIMO) systems, i.e., systems where the output variables may depend upon all system's input variables. This strong coupling makes the MIMO systems behavior highly oscillatory in time and, as a consequence, it makes these systems not particularly suitable to be learned and represented by using conventional approaches. In order to address this issue, our proposal presents an adaptive supervised learning algorithm capable of forming a suitable collection of Timed Automata based Fuzzy Systems that model the dynamic behavior of a given MIMO system. The adaptive learning is accomplished by taking advantage of the theories coming from the area of times series analysis (such as the Adaptive Piecewise Constant Approximation method) with a well-known neuro-fuzzy framework of the Adaptive Neuro Fuzzy Inference System (ANFIS). In experiments, where our proposal has been tested on the Fuzz-IEEE 2011 Fuzzy Competition dataset, the proposed supervised learning algorithm significantly reduces the output error measure and achieves better performance than the one provided by a conventional application of the ANFIS algorithm.  相似文献   

11.
 A subsemigroup S of a semigroup Q is an order in Q if for every there exist such that , where a and d are contained in (maximal) subgroups of Q, and and are their inverses in these subgroups. A regular semigroup S is strict if it is a subdirect product of completely (0-)simple semigroups. We construct all orders and involutions in Auinger’s model of a strict regular semigroup. This is used to find necessary and sufficient conditions on an involution on an order S in a strict regular semigroup Q for extendibility to an involution on Q.  相似文献   

12.
 A subsemigroup S of a semigroup Q is an order in Q if for every there exist such that , where a and d are contained in (maximal) subgroups of Q, and and are their inverses in these subgroups. A regular semigroup S is strict if it is a subdirect product of completely (0-)simple semigroups. We construct all orders and involutions in Auinger’s model of a strict regular semigroup. This is used to find necessary and sufficient conditions on an involution on an order S in a strict regular semigroup Q for extendibility to an involution on Q. (Received 27 April 1999; in revised form 20 October 1999)  相似文献   

13.
Summary. Adaptive methods of approximation arise in many settings including numerical methods for PDEs and image processing. They can usually be described by a tree which records the adaptive decisions. This paper is concerned with the fast computation of near optimal trees based on n adaptive decisions. The best tree based on n adaptive decisions could be found by examining all such possibilities. However, this is exponential in n and could be numerically prohibitive. The main result of this paper is to show that it is possible to find near optimal trees using computations linear in n.Mathematics Subject Classification (2000): 65Y20, 65N50, 41A63, 41A15, 68W40, 68W25This work has been supported in part by the Office of Naval Research contracts 03-10051, (N00014-00-1-0470), the Army Research Office contract DAAD 19-02-1-0028, and the National Science Foundation grants DMS 9872890, DMS 0221642.  相似文献   

14.
We present an adaptive wavelet method for the numerical solution of elliptic operator equations with nonlinear terms. This method is developed based on tree approximations for the solution of the equations and adaptive fast reconstruction of nonlinear functionals of wavelet expansions. We introduce a constructive greedy scheme for the construction of such tree approximations. Adaptive strategies of both continuous and discrete versions are proposed. We prove that these adaptive methods generate approximate solutions with optimal order in both of convergence and computational complexity when the solutions have certain degree of Besov regularity.  相似文献   

15.
The unquantified set theory MLSR containing the symbols ∪, ∖, ≠, ∈,R (R(x) is interpreted as a rank ofx) is considered. It is proved that there exists an algorithm which for any formulaQ of the MLSR theory decides whetherQ is true or not using the spacec|Q|3 (|Q| is the length ofQ).  相似文献   

16.
We are given n coins of which k are heavy (defective), while the remaining nk are light (good). We know both the weight of the good coins and the weight of the defective ones. Therefore, if we weigh a subset Q ? S with a spring scale, then the outcome will tell us exactly the number of defectives contained in Q. The problem, known as Counterfeit Coins problem, is to identify the set of defective coins by minimizing the number of weighings, also called queries. It is well known that Θ(klog k +1(n/k)) queries are enough, even for non‐adaptive algorithms, in case kcn for some constant 0 < c < 1. A natural interesting generalization arises when we are required to identify any subset of mk defectives. We show that while for randomized algorithms \begin{align*}\tilde{\Theta}(m)\end{align*} queries are sufficient, the deterministic non‐adaptive counterpart still requires Θ(klog k +1(n/k)) queries, in case kn/28; therefore, finding any subset of defectives is not easier than finding all of them by a non‐adaptive deterministic algorithm. © 2012 Wiley Periodicals, Inc. Random Struct. Alg., 2012  相似文献   

17.
In this paper we propose a discrete algorithm for a tracking control of a two-wheeled mobile robot (WMR), using an advanced Adaptive Critic Design (ACD). We used Dual-Heuristic Programming (DHP) algorithm, that consists of two parametric structures implemented as Neural Networks (NNs): an actor and a critic, both realized in a form of Random Vector Functional Link (RVFL) NNs. In the proposed algorithm the control system consists of the DHP adaptive critic, a PD controller and a supervisory term, derived from the Lyapunov stability theorem. The supervisory term guaranties a stable realization of a tracking movement in a learning phase of the adaptive critic structure and robustness in face of disturbances. The discrete tracking control algorithm works online, uses the WMR model for a state prediction and does not require a preliminary learning. Verification has been conducted to illustrate the performance of the proposed control algorithm, by a series of experiments on the WMR Pioneer 2-DX.  相似文献   

18.
Mehrotra's predictor-corrector algorithm [3] is currently considered to be one of the most practically efficient interior-point methods for linear programming. Recently, Zhang and Zhang [18] studied the global convergence properties of the Mehrotra-type predictor-corrector approach and established polynomial complexity bounds for two interior-point algorithms that use the Mehrotra predictor-corrector approach. In this paper, we study the asymptotic convergence rate for the Mehrotra-type predictor-corrector interior-point algorithms. In particular, we construct an infeasible-interior-point algorithm based on the second algorithm proposed in [18] and show that while retaining a complexity bound ofO(n 1.5 t)-iterations, under certain conditions the algorithm also possesses aQ-subquadratic convergence, i.e., a convergence ofQ-order 2 with an unboundedQ-factor.Research supported in part by NSF DMS-9102761 and DOE DE-FG02-93ER25171.  相似文献   

19.
Let p be an odd rational prime and K a finite extension of \Bbb Qp {\Bbb Q}_p . We give a complete classification of those finite abelian extensions L/K L/K in which any ideal of the valuation ring of L is free over its associated order in \Bbb Qp[Gal(L/K)] {\Bbb Q}_p[Gal(L/K)] . In an appendix W. Bley describes an algorithm which can be used to determine the structure of Galois stable ideals in abelian extensions of number fields. The algorithm is applied to give several new and interesting examples.  相似文献   

20.
Most researchers established their inventory lot-size models under trade credit financing by assuming that the supplier offers the retailer fully permissible delay in payments and the products received are all non-defective. However, in the real business environment, it often can be observed that the supplier offers the retailer a fully permissible delay in payments only when the order quantity is greater than or equal to the predetermined quantity Q d . In addition, an arriving order lot usually contains some defective items due to imperfect production processes or other factors. To capture this reality, the paper extends Huang (2007) economic order quantity (EOQ) model with partially permissible delay in payments to consider defective items. We formulate the proposed problem as a profit maximization EOQ model in which the replenishment cycle time is the decision variable. Then we use the arithmetic-geometric mean inequality approach to determine the optimal solution under various situations. An algorithm to obtain the optimal solution is also provided. Finally, the numerical examples and sensitivity analysis are given to illustrate the results.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号