Google Brain and DeepMind researchers attack reinforcement learning efficiency

Reinforcement learning, which spurs AI to complete goals using rewards or punishments, is a form of training that's led to gains in robotics, speech synthesis, and more. Unfortunately, it's data-intensive, which motivated two research teams, one from Google Brain (one of Google's AI research divisions) and the other from Alphabet's DeepMind, to prototype more efficient ways of carrying it out. In a pair of preprint papers, the researchers propose Adaptive Behavior Policy Sharing (ABPS), an algorithm that shares experience adaptively selected from a pool of AI agents, and a framework, Universal Value Function Approximators (UVFA), that simultaneously learns directed exploration policies with the same AI, with different trade-offs between exploration and exploitation.
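To make the ABPS idea concrete, here is a minimal, hypothetical sketch (not the papers' actual method): behavior-policy selection framed as an epsilon-greedy bandit over a pool of agents, where the agent whose recent episodes scored best is favored to generate the next batch of shared experience. All class and parameter names are illustrative assumptions.

```python
import random

class AdaptivePolicyPool:
    """Illustrative sketch only: choose which agent in a pool acts next,
    favoring agents whose episodes have scored well (epsilon-greedy bandit).
    The experience an agent generates would then be shared with the pool."""

    def __init__(self, n_agents, epsilon=0.1):
        self.epsilon = epsilon
        self.returns = [0.0] * n_agents  # running mean return per agent
        self.counts = [0] * n_agents     # episodes attributed to each agent

    def select(self):
        # Explore: occasionally let a random agent's policy drive behavior.
        if random.random() < self.epsilon:
            return random.randrange(len(self.returns))
        # Exploit: otherwise pick the best-performing agent so far.
        return max(range(len(self.returns)), key=lambda i: self.returns[i])

    def update(self, agent, episode_return):
        # Incrementally update that agent's running-mean return.
        self.counts[agent] += 1
        self.returns[agent] += (
            episode_return - self.returns[agent]
        ) / self.counts[agent]
```

In a training loop, `select()` would pick the behavior policy for the next episode, the resulting transitions would go into a replay buffer visible to every agent, and `update()` would record the episode's return for that agent.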

The teams claim ABPS achieves superior performance in several Atari games, reducing the variance of top agents by 25%. UVFA, for its part, doubles the performance of base agents on "hard exploration" games among many of the same titles while maintaining a high score across the rest, and it's the first algorithm to achieve a high score in Pitfall without human demonstrations or hand-crafted features.
