Researchers from NVIDIA’s newly opened robotics research lab in Seattle will present a new proof-of-concept approach to reinforcement learning that aims to improve how robots trained in simulation perform in the real world. The work will be shown at this week’s International Conference on Robotics and Automation (ICRA) in Montreal.
NVIDIA said the research is part of a growing trend in the deep learning and robotics communities toward relying on simulation for training. Because training happens virtually, there is no risk of damage or injury, and the robot can practice a potentially unlimited number of times before it is deployed in the real world.
“In robotics, you generally want to train things in simulation because you can cover a wide spectrum of scenarios that are difficult to get data for in the real world,” said Ankur Handa, one of the lead researchers on the project. “The idea behind this work is to train the robot to do something in the simulator that would be tedious, and time-consuming in real life.”
Closing the reality gap
One of the challenges, however, is the discrepancy between what is in the real world and what is in the simulator, Handa said.
“Due to the imprecise simulation modules and lack of high fidelity replication of real-world scenes, policies learned in simulations often cannot be directly applied on real-world systems, a phenomenon that is also known as the reality gap,” the researchers state in their paper. “In this work, we focus on closing the reality gap by learning policies on distributions of simulated scenarios that are optimized for a better policy transfer.”
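To make the idea of learning on a distribution of simulated scenarios more concrete, here is a minimal Python sketch built on heavily simplified assumptions. The "simulator" is a one-line formula for how far a pushed block slides, friction is the only randomized parameter, and the "policy" is a single push speed chosen by grid search. All names and numbers are hypothetical and do not come from NVIDIA's paper or code.

```python
import random

G = 9.81  # gravitational acceleration, m/s^2
TARGET = 0.5  # desired slide distance in metres (hypothetical task)
CANDIDATES = [0.5 + 0.01 * i for i in range(251)]  # push speeds 0.5 to 3.0 m/s


def slide_distance(push_speed, friction):
    """Toy 'simulator': a block launched at push_speed slows at friction * G."""
    return push_speed ** 2 / (2 * friction * G)


def sample_scenarios(n, mean, std, seed=0):
    """Randomize the simulator: draw n friction coefficients from a Gaussian,
    clipped away from zero so the toy physics stays well defined."""
    rng = random.Random(seed)
    return [max(0.05, rng.gauss(mean, std)) for _ in range(n)]


def expected_error(push_speed, scenarios, target=TARGET):
    """Average squared miss distance across all sampled scenarios."""
    misses = [(slide_distance(push_speed, mu) - target) ** 2 for mu in scenarios]
    return sum(misses) / len(misses)


def best_push(scenarios, target=TARGET):
    """Pick the candidate push speed with the lowest average error."""
    return min(CANDIDATES, key=lambda s: expected_error(s, scenarios, target))


# A policy tuned to a single simulated friction value...
point_policy = best_push([0.3])
# ...versus one tuned across a randomized range of friction values.
randomized = sample_scenarios(200, mean=0.3, std=0.08)
robust_policy = best_push(randomized)

print(f"single-scenario push speed: {point_policy:.2f} m/s")
print(f"randomized-scenario push speed: {robust_policy:.2f} m/s")
```

Because the randomized policy is selected for its average performance over many sampled frictions, it can give up some accuracy at the single nominal value in exchange for a lower expected error across the whole sampled range.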
Using a cluster of 64 NVIDIA Tesla V100 GPUs and the cuDNN-accelerated TensorFlow deep learning framework, NVIDIA researchers trained a robot to perform two tasks: placing a peg in a hole and opening a drawer.
For the simulation, the team used the NVIDIA FleX physics engine to simulate the tasks and develop the SimOpt algorithm described in their paper. For each task, the robot learns from more than 9,600 simulations over about 1.5 to 2 hours, which enables it to accurately swing a peg into a hole and open the drawer.
“Closing the simulation to reality transfer loop is an important component for a robust transfer of robotic policies,” the researchers stated. “In this work, we demonstrated that adapting simulation randomization using real world data can help in learning simulation parameter distributions that are particularly suited for a successful policy transfer without the need for exact replication of the real world environment.”
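The loop described in this quote, which uses real-world data to reshape the distribution of simulation parameters, can be sketched in Python as follows. This is an illustrative toy, not NVIDIA's SimOpt implementation. The 1-D block simulator, the friction parameter, the trajectory discrepancy measure, and the cross-entropy-method update are all stand-ins chosen for brevity. The sketch also covers only the distribution-update step and leaves out policy training.

```python
import random
import statistics

G = 9.81  # gravitational acceleration, m/s^2


def rollout(friction, v0=2.0, dt=0.05, steps=20):
    """Toy 1-D simulator (a stand-in, not FleX): a block launched at speed v0
    slows under Coulomb friction. Returns its position at each time step."""
    x, v = 0.0, v0
    positions = []
    for _ in range(steps):
        v = max(0.0, v - friction * G * dt)
        x += v * dt
        positions.append(x)
    return positions


def discrepancy(sim, real):
    """Mean squared difference between simulated and real trajectories."""
    return sum((s - r) ** 2 for s, r in zip(sim, real)) / len(real)


def adapt_distribution(real_traj, mean, std, iters=15, samples=64,
                       elite_frac=0.2, seed=0):
    """Cross-entropy-method update of a Gaussian over the friction parameter,
    pulling it toward values whose rollouts match the real trajectory."""
    rng = random.Random(seed)
    n_elite = max(2, int(samples * elite_frac))
    for _ in range(iters):
        # Sample candidate simulator parameters from the current distribution.
        candidates = [max(0.0, rng.gauss(mean, std)) for _ in range(samples)]
        # Rank candidates by how closely their rollouts match reality.
        ranked = sorted(candidates,
                        key=lambda mu: discrepancy(rollout(mu), real_traj))
        elite = ranked[:n_elite]
        # Refit the Gaussian to the best matches; keep a small floor on the
        # spread so training can still use a randomized range of scenarios.
        mean = statistics.mean(elite)
        std = max(statistics.stdev(elite), 0.01)
    return mean, std


# Hypothetical "real world": friction the simulator does not know in advance.
real_traj = rollout(0.35)
mean, std = adapt_distribution(real_traj, mean=0.1, std=0.2)
print(f"adapted friction distribution: mean={mean:.3f}, std={std:.3f}")
```

The distribution starts from a poor guess and, after a few iterations, concentrates around the friction value that reproduces the observed trajectory. The real world never has to be replicated exactly. It only has to be matched closely enough that the scenarios sampled for training resemble it.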