January 30, 2019      

Researchers at the Massachusetts Institute of Technology have created a program that can train a robotic arm to play Jenga, a game that requires not only physical interaction but also perception data from both touch and vision.

An article on the experiment, titled “See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion,” appears in today’s issue of the journal Science Robotics. Nima Fazeli and colleagues built a robotic arm gripper that emulates the multisensory action needed to play Jenga, a game in which blocks must be removed from a tower and then placed on top without collapsing the tower.

A robotic arm learns how to play Jenga. Source: Nima Fazeli

The researchers wanted to explore how robots can learn about their environment by moving beyond machine vision alone. “Current learning methodologies struggle with these challenges and have not exploited physics nearly as richly as we believe that humans do,” the researchers said. “Most robotic learning systems still use purely visual data, without a sense of touch; this fundamentally limits how quickly and flexibly a robot can learn about the world … As a consequence, these systems require far more training data than humans do to learn new models or new tasks, and they generalize much less broadly and less robustly.”

Jenga dynamics include touch and sight

The team used Jenga to evaluate how well its hierarchical learning approach lets a robot acquire manipulation skills, and compared that approach with other learning methods. “Jenga is a quintessential example of a contact-rich task where we need to integrate with the tower to learn and to infer block mechanics and multimodal behavior by combining touch and sight,” the researchers said.

For the experiment, the researchers used the robot arm, an Intel RealSense D415 camera, and an ATI Gamma force/torque sensor mounted at the wrist. They then wrote machine learning software to implement the learning approach.

Initially, without a goal, the robot engaged in a “short exploration phase where it captured the structure of the tower, even ‘hidden’ Jenga pieces, and learned potential outcomes based on the force and visual relationships between all the Jenga pieces,” researchers said.

The robot then formulated general concepts to infer behavior of each of the blocks as it played. “For example, it inferred the concept that moving ‘stuck’ blocks does not help progress the game.” The robot could then adjust its behavior to maximize its progress in the game.
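The idea of sorting blocks into concepts such as “stuck” can be illustrated with a toy sketch. This is not the researchers' model, which the article does not detail. It assumes each exploratory push is summarized by two hypothetical numbers, a resisting force from the wrist sensor and a displacement seen by the camera, and it uses made-up thresholds to label the push.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Push:
    block_id: int
    force_n: float   # resisting force felt at the wrist, newtons (hypothetical feature)
    moved_mm: float  # block displacement seen by the camera, mm (hypothetical feature)

# Illustrative thresholds only; not taken from the paper.
STUCK_FORCE_N = 1.0
FREE_MOVE_MM = 2.0

def label_push(p: Push) -> str:
    """Label one push by combining touch (force) and sight (displacement)."""
    if p.force_n >= STUCK_FORCE_N and p.moved_mm < FREE_MOVE_MM:
        return "stuck"      # pushes back hard and barely moves
    if p.force_n < STUCK_FORCE_N and p.moved_mm >= FREE_MOVE_MM:
        return "free"       # slides easily
    return "uncertain"

def extractable_blocks(pushes: list[Push]) -> list[int]:
    """Blocks that were labelled 'free' at least once and never 'stuck'."""
    labels: dict[int, set[str]] = {}
    for p in pushes:
        labels.setdefault(p.block_id, set()).add(label_push(p))
    return sorted(b for b, seen in labels.items()
                  if "free" in seen and "stuck" not in seen)

pushes = [Push(1, 0.2, 5.0), Push(2, 3.0, 0.1), Push(3, 0.5, 1.0), Push(1, 0.3, 4.0)]
print(extractable_blocks(pushes))  # prints [1]
```

In this sketch the robot would skip block 2, which resists and does not move, and would focus on block 1. A learned model would replace the fixed thresholds with categories inferred from the exploration data.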

The researchers concluded that, compared with three other state-of-the-art learning paradigms, including neural network and reinforcement learning, the robot using the new approach was fastest at reaching a certain number of successful block extractions (within 100 games).