Interaction dynamics of two reinforcement learners
Authors: Walter J. Gutjahr

Affiliation: (1) Dept. of Statistics and Decision Support Systems, University of Vienna, Universitaetsstrasse 5/9, A-1010 Wien, Austria
Abstract: The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents, or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform "stimulus-response-consequence" learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move serves both as the stimulus for the other agent's next move and as part of the consequence of the other agent's previous move. After deriving general properties of the model, in particular concerning convergence to limit cycles, we concentrate on the asymptotic case where the learning rate tends to zero ("slow learning"). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined, and for the special case of the prisoner's dilemma, the dynamics are analyzed in more detail under the assumptions that the two agents start either with the same or with different reaction probabilities.
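The learning scheme described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model: the update rule (a Bush-Mosteller-style linear reward scheme), the payoff values, and all names and parameters (Learner, simulate, rate, PAYOFF) are illustrative assumptions; it only shows the structure of two stimulus-response learners adapting to each other through alternating moves in a prisoner's dilemma.

```python
import numpy as np

# Prisoner's dilemma payoffs: PAYOFF[own_move, other_move].
# Moves: 0 = cooperate, 1 = defect. The numbers are illustrative only.
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

class Learner:
    """Stimulus-response learner: one cooperation probability per stimulus
    (the stimulus being the opponent's most recent move)."""

    def __init__(self, p_cooperate, rate):
        self.p = np.array(p_cooperate, dtype=float)  # p[stimulus] = P(cooperate | stimulus)
        self.rate = rate                              # a small rate mimics the "slow learning" regime

    def act(self, stimulus, rng):
        return 0 if rng.random() < self.p[stimulus] else 1

    def update(self, stimulus, action, reward, reward_max=5.0):
        # Bush-Mosteller-style linear reward update: an assumed rule standing in
        # for the paper's stimulus-response-consequence learning scheme.
        target = 1.0 if action == 0 else 0.0
        self.p[stimulus] += self.rate * (reward / reward_max) * (target - self.p[stimulus])

def simulate(steps=10_000, rate=0.01, seed=0):
    rng = np.random.default_rng(seed)
    agent_a = Learner([0.5, 0.5], rate)
    agent_b = Learner([0.5, 0.5], rate)
    last_b = 0  # arbitrary initial stimulus for agent A
    for _ in range(steps):
        # Alternating moves: A responds to B's last move, then B responds to A's move.
        move_a = agent_a.act(last_b, rng)
        move_b = agent_b.act(move_a, rng)
        # In the paper, the opponent's response is part of the consequence of the
        # previous move; here both agents are simply rewarded with the payoff of
        # the resulting move pair (a simplification).
        agent_a.update(last_b, move_a, PAYOFF[move_a, move_b])
        agent_b.update(move_a, move_b, PAYOFF[move_b, move_a])
        last_b = move_b
    return agent_a.p, agent_b.p

if __name__ == "__main__":
    print(simulate())
```

With equal initial reaction probabilities the two probability vectors evolve along similar trajectories, which is the symmetric starting condition analyzed in the paper; changing the initial values of the two learners corresponds to the asymmetric case.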
| |
Keywords: Dynamic systems; interaction dynamics; multiagent systems; prisoner's dilemma; reinforcement learning
|