1 Fast Multiagent Learning: Cashing in on Team Knowledge
Published: 2008
In large distributed systems, it is often difficult for components to learn behavior that benefits the full system when each has only a limited view of the world. The key culprit is the mismatch between the traditionally slow learning process of the agents and the relatively rapid changes in the environment. In this paper, we present a theoretical result that significantly improves the agents' learning speed by allowing each agent to receive rewards based on Actions Not Taken (ANT). The speedup comes from giving the agent a counterfactual reward, which estimates the reward it would have received had it taken a particular action. We then present results that demonstrate the applicability of this method to the congestion problem known as the El Farol Bar Problem. Furthermore, because these counterfactual rewards are partially based on the actions of the other agents in the system, the improvements become more pronounced as the system size increases.
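The idea of rewarding actions not taken can be illustrated with a minimal Python sketch. It is not the paper's formulation: the congestion objective (each night contributes x * exp(-x / capacity) for attendance x), the epsilon-greedy value learners, and all names and parameters (`system_reward`, `counterfactual_rewards`, `run`, `alpha`, `epsilon`) are assumptions chosen for illustration. After each week, an agent updates its estimate for every night, using the reward the system would have produced had that agent alone switched nights.

```python
import math
import random


def system_reward(attendance, capacity):
    # Assumed congestion objective: a night with attendance x contributes
    # x * exp(-x / capacity), so overcrowded nights add little value.
    return sum(x * math.exp(-x / capacity) for x in attendance)


def counterfactual_rewards(attendance, taken, capacity):
    # For each night, the system reward had this one agent attended that
    # night instead, with every other agent's choice held fixed.
    rewards = []
    for night in range(len(attendance)):
        moved = list(attendance)
        moved[taken] -= 1
        moved[night] += 1
        rewards.append(system_reward(moved, capacity))
    return rewards


def run(num_agents=30, num_nights=7, capacity=4, weeks=200,
        alpha=0.1, epsilon=0.05, use_ant=True, seed=0):
    rng = random.Random(seed)
    # One value estimate per agent per night (hypothetical learner design).
    values = [[0.0] * num_nights for _ in range(num_agents)]
    history = []
    for _ in range(weeks):
        # Epsilon-greedy choice of a night for every agent.
        choices = []
        for q in values:
            if rng.random() < epsilon:
                choices.append(rng.randrange(num_nights))
            else:
                choices.append(max(range(num_nights), key=q.__getitem__))
        attendance = [0] * num_nights
        for c in choices:
            attendance[c] += 1
        for q, taken in zip(values, choices):
            cf = counterfactual_rewards(attendance, taken, capacity)
            # With use_ant, every night's estimate is updated each week;
            # without it, only the night actually attended is updated.
            nights = range(num_nights) if use_ant else [taken]
            for n in nights:
                q[n] += alpha * (cf[n] - q[n])
        history.append(system_reward(attendance, capacity))
    return history
```

Calling `run(use_ant=True)` and `run(use_ant=False)` with the same seed returns the weekly system reward under each update rule, so the two learning curves can be compared directly. The counterfactual for the night actually attended equals the observed system reward, so the standard update is a special case of this one.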