MIT Researchers Put Voice Commands in Context

MIT CSAIL is working to help robots interact better with humans.

February 01, 2018      
Geoffrey Oldmixon

Imagine there are two boxes on a table — a box of saltine crackers and a box of granulated sugar. You tell the robot by voice what each box contains and then direct it, “Pick up the snack.” Deducing that sugar is a raw material and, therefore, unlikely to be someone’s snack, the robot selects the crackers.

That scenario is being developed by researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), as they look at robot memory in a new way.

In the academic paper, “Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context,” CSAIL authors Rohan Paul, Andrei Barbu, Sue Felshin, Boris Katz, and Nicholas Roy last year presented an Amazon Alexa-like system dubbed “ComText” (for “commands in context”). ComText is intended to enable robots to understand a wide range of voice commands requiring contextual knowledge about objects and their environments.

“Where humans understand the world as a collection of objects and people and abstract concepts, machines view it as pixels, point-clouds and 3-D maps generated from sensors,” explained Paul, one of the lead authors of the paper. “This semantic gap means that, for robots to understand what we want them to do, they need a much richer representation of what we do and say.”

Declarative memory — the recall of concepts, facts, dates, etc. — includes semantic memory (general facts) and episodic memory (personal facts). Most approaches to robot learning have focused only on semantic memory.

ComText is designed to observe a range of visuals and natural language to glean “episodic memory” about an object’s size, shape, position, type, and even whether it belongs to someone. From this knowledge base, it can then reason, infer meaning, and respond to commands.
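
The paper formalizes this with “temporal grounding graphs,” but the underlying idea — pairing general (semantic) knowledge about object categories with user-declared (episodic) facts to resolve a command — can be illustrated with a toy sketch. The sketch below is purely hypothetical: the names (SEMANTIC, EpisodicMemory, resolve) and the rule-based matching are invented for illustration and are not ComText’s actual model.

```python
# Toy illustration (not ComText itself): combine semantic facts
# ("crackers are a snack, sugar is a raw ingredient") with episodic
# facts declared by the speaker ("the left box holds crackers"),
# then resolve a spoken command against both kinds of memory.

from dataclasses import dataclass, field

# Semantic memory: general world knowledge about object categories.
SEMANTIC = {
    "crackers": {"edible", "snack"},
    "sugar": {"edible", "raw_ingredient"},
}

@dataclass
class EpisodicMemory:
    """Episodic memory: facts accrued from what this user said."""
    facts: list = field(default_factory=list)  # (object_id, attribute, value)

    def tell(self, obj, attribute, value):
        self.facts.append((obj, attribute, value))

def resolve(requested_category, memory):
    """Return objects whose declared contents fall under the requested category."""
    matches = []
    for obj, attr, contents in memory.facts:
        if attr == "contains" and requested_category in SEMANTIC.get(contents, set()):
            matches.append(obj)
    return matches

memory = EpisodicMemory()
memory.tell("left_box", "contains", "crackers")   # "The left box has crackers."
memory.tell("right_box", "contains", "sugar")     # "The right box has sugar."

print(resolve("snack", memory))   # ['left_box'] -> the robot picks up the crackers
```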

“The main contribution is this idea that robots should have different kinds of memory, just like people,” said research scientist Barbu. “We have the first mathematical formulation to address this issue, and we’re exploring how these two types of memory play and work off of each other.”

“This work is a nice step towards building robots that can interact much more naturally with people,” said Luke Zettlemoyer, an associate professor of computer science at the University of Washington not involved in the research. “In particular, it will help robots better understand the names that are used to identify objects in the world, and interpret instructions that use those names to better do what users ask.”
