Reinforcement learning, which spurs AI to complete goals using rewards or punishments, is a form of training that’s led to gains in robotics, speech synthesis, and more. Unfortunately, it’s data-intensive, which motivated research teams one from Google Brain (one of Google’s AI research divisions) and the other from Alphabet’s DeepMind to prototype more efficient means of executing it. In a pair of preprint papers, the researchers propose Adaptive Behavior Policy Sharing (ABPS), an algorithm that allows the sharing of experience adaptively selected from a pool of AI agents, and a framework Universal Value Function Approximators (UVFA) that simultaneously learns directed exploration policies with the same AI, with different trade-offs between exploration and exploitation.
The teams claim ABPS achieves superior performance in several Atari games, reducing variance on top agents by 25%. As for UVFA, it doubles the performance of base agents in “hard exploration” in many of the same games while maintaining a high score across the remaining games; it’s the first algorithm to achieve a high score in Pitfall without human demonstrations or hand-crafted features.