“Today, functional robots are restricted to carefully engineered and controlled environments like factories and warehouses,” says Yaser Sheikh, an assistant research professor of robotics at Carnegie Mellon University in Pittsburgh.
“For them to provide useful service in our everyday environment, they have to be able to handle unpredictable and changing social spaces.”
If robots are ever to become faithful human companions, capable of interacting directly with users to dutifully perform a variety of different tasks, engineers will need to develop better robot training technologies.
Programming: robot instruction is typically accomplished with code that is either hard-wired into the system or loaded in as needed. Either approach is acceptable for a robot designed to handle only basic, rote tasks.
However, when configuring a robot capable of handling numerous complex tasks, particularly work that must be modified on the fly, a closer, more personal approach is necessary: training.
Training is emerging as the key to better robot interaction with humans and human environments.
Typical application needs:
- Factory and assembly-line robots that work alongside humans
- Self-driving cars that anticipate what drivers might do on the road
- Personal robots that assist people in health care facilities
- Surveillance droids that watch targets
- Telepresence robots that can automatically avoid moving into awkward positions
Visual cues when expecting the unexpected
Sheikh is one of many robotics researchers worldwide working on technologies and approaches designed to make robots capable of autonomous actions in a variety of different situations.
Like many robotic engineers, he feels that visual cues could someday help a robot understand what its human operator needs at a particular moment in much the same way that people can tell whether someone is being serious or jovial, relaxed or urgent.
Sheikh is participating in a research project at Carnegie Mellon’s Robotics Institute that aims to perfect a method for detecting where people’s gazes intersect in the hope that such visual clues could someday be observed and interpreted by robots.
The researchers have tested their method using groups of people equipped with head-mounted video cameras. By noting where the subjects’ gazes converged in a three-dimensional space, the researchers were able to determine if they were listening to a single speaker, interacting as a group or even following the bouncing ball in a ping-pong game.
“It is timely to investigate this research because social cameras have proliferated widely, such as in smartphones, camcorders or wearable cameras, to the extent that many and, soon, most social events of interest will be captured by one or more such cameras,” Sheikh says.
The system uses crowdsourcing technology to provide subjective information about social groups that would otherwise be difficult or impossible for a robot to ascertain.
The researchers’ algorithm for determining “social saliency” could eventually be used to evaluate a variety of social cues, such as the expressions on people’s faces or body movements, or data collected from other types of visual or audio sensors.
“In the future, robots will need to interact organically with people, and to do so they must understand their social environment, not just their physical environment,” Sheikh says. He notes that head-mounted cameras might someday be used routinely by people who work with robots in cooperative teams.
The research has already been tested in three real-world settings: a meeting involving two work groups, a musical performance and a party in which participants played pool and ping-pong and chatted in small groups.
Sheikh notes that the head-mounted cameras provided precise data about what people were looking at in each of the social settings.
Algorithm counts “gaze concurrences”
The algorithm developed by the research team was able to automatically estimate the number and 3D position of “gaze concurrences,” the points where the gazes of multiple people intersected.
The researchers were surprised by the level of detail they were able to detect. In the party setting, for instance, the algorithm didn’t just indicate that people were looking at the ping-pong table; the gaze concurrence video actually shows the flight of the ball as it bounces and is batted back and forth.
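The underlying geometry can be pictured with a small sketch. This is not the CMU team’s published algorithm, only a least-squares illustration: given each wearer’s head position and gaze direction, a gaze concurrence can be approximated as the 3D point closest to all of the gaze rays. The function name and sample coordinates below are invented for the example.

```python
import numpy as np

def gaze_concurrence(origins, directions):
    """Estimate the 3D point nearest, in a least-squares sense, to a set of gaze rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(np.asarray(origins, float), np.asarray(directions, float)):
        d = d / np.linalg.norm(d)
        # (I - d d^T) projects onto the plane perpendicular to the gaze direction,
        # so ||(I - d d^T)(p - o)|| is the distance from a point p to this gaze ray's line.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    # Solve the normal equations for the point minimizing the summed squared distances.
    return np.linalg.solve(A, b)

# Three hypothetical wearers whose gazes all converge near (1, 2, 0.5).
origins = [[0.0, 0.0, 1.7], [2.0, 0.0, 1.6], [1.0, 4.0, 1.8]]
directions = [[1.0, 2.0, -1.2], [-1.0, 2.0, -1.1], [0.0, -2.0, -1.3]]
print(gaze_concurrence(origins, directions))  # ~[1.0, 2.0, 0.5]
```

Clusters of such points, tracked over time, are what let the system follow something as fast-moving as a ping-pong ball.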
The researchers are currently developing a control system based on their research. “Whether we commercialize it or not depends on how useful it turns out to be,” Sheikh says.
“We’re looking into applications where a virtual director automatically coordinates a system of robotic cameras based on the gaze behavior of the audience of a sports game or a concert.”
A Helping Claw
Cornell University researchers in Ithaca, N.Y., are also exploring ways of enabling a robot to detect and address human needs. Understanding when and where to pour a beer, for example, can be difficult for a robot because of the many variables it encounters while assessing the situation.
Using a Microsoft Kinect 3D camera and a database of 3D videos, the Cornell researchers have developed a robot that can quickly identify specific activities it sees, consider what uses are possible with the objects in the immediate area and then determine how those uses can fit with various activities.
The robot then generates a set of possible actions it may have to make or respond to in the future, such as eating, drinking, cleaning or putting items away, and then selects the most probable task. As the action continues, the robot constantly updates and refines its predictions.
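As a rough illustration of that select-and-refine loop, the sketch below keeps a simple belief over candidate activities and updates it as new cues are observed. It is a simplified, Bayesian-style stand-in rather than the Cornell lab’s actual model, and the activity names, cues and probabilities are hypothetical.

```python
# Hypothetical sketch: keep a belief over candidate future activities and
# refine it as new observations arrive (a simple Bayesian-style update).

def normalize(scores):
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

# Uniform prior over what the person will do next (activity names are invented).
belief = normalize({"drinking": 1.0, "eating": 1.0, "cleaning": 1.0, "putting_away": 1.0})

# How likely each observed cue is under each candidate activity (made-up numbers).
likelihood = {
    "hand_moves_toward_cup": {"drinking": 0.80, "eating": 0.10, "cleaning": 0.05, "putting_away": 0.05},
    "cup_lifted":            {"drinking": 0.90, "eating": 0.02, "cleaning": 0.03, "putting_away": 0.05},
}

for cue in ["hand_moves_toward_cup", "cup_lifted"]:
    belief = normalize({a: belief[a] * likelihood[cue][a] for a in belief})
    predicted = max(belief, key=belief.get)
    print(f"after '{cue}': most probable activity is {predicted} ({belief[predicted]:.2f})")
    # A robot refilling a cup could pause its pour once 'drinking' dominates the belief.
```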
“The purpose of this project is to enable robots to anticipate future human activities so that they can perform better assistive tasks with reactive responses,” says graduate student Hema Koppula, a researcher in Cornell’s Personal Robotics Lab.
When an ordinary robot is instructed to refill a person’s cup, for example, it plans its movements in advance and follows them.
But if a person sitting at the table happens to raise the cup to drink from it, the robot might end up spilling the drink.
“Having the ability to anticipate the person’s future actions can help the robot avoid making such mistakes,” Koppula notes. “People anticipate others’ actions all the time when interacting.”
The Cornell technology aims to bring anticipation capabilities to a wide range of robots.
Yet achieving such a seemingly ordinary capability isn’t easy for a robot, since anticipating and responding to human behavior involves so many different variables. The Cornell robot essentially builds a “vocabulary” of small actions it can put together in various ways to recognize a variety of big activities.
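One way to picture that composition, purely as a toy illustration with invented sub-action and activity names, is to match the stream of recognized small actions against per-activity templates and score how far along each template the observations have progressed:

```python
# Toy illustration: recognize a larger activity from a sequence of smaller
# sub-actions by matching against activity templates (all names are invented).

ACTIVITY_TEMPLATES = {
    "drinking":     ["reach_cup", "lift_cup", "tilt_cup"],
    "putting_away": ["reach_cup", "lift_cup", "walk_to_shelf", "place_cup"],
    "cleaning":     ["reach_cloth", "wipe_table"],
}

def score_activities(observed):
    """Score each activity by the fraction of its template matched so far."""
    scores = {}
    for activity, template in ACTIVITY_TEMPLATES.items():
        matched = 0
        for expected, seen in zip(template, observed):
            if expected != seen:
                break
            matched += 1
        scores[activity] = matched / len(template)
    return scores

# After seeing two sub-actions, "drinking" scores highest (2 of its 3 steps seen).
print(score_activities(["reach_cup", "lift_cup"]))
```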
In tests, the robot made correct predictions 82 percent of the time when looking one second into the future, 71 percent for three seconds and 57 percent for 10 seconds.
An upgraded model of the Microsoft Kinect could even further improve the robot’s capabilities, Koppula says. The project has received funding from both Microsoft and the U.S. Army.
“This technology can be used anywhere where humans and robots are together,” Koppula says. “Our technology for understanding and anticipating human actions will allow robots to work well with humans.”