Project Natal Grows Up
When Microsoft released a video demonstration of its then code-named Project Natal, the company’s gesture-based interface to the widely popular Xbox 360 console, the presentation focused primarily on innovative menu interfaces within the Xbox’s media center. Shortly thereafter, the technology community began to speculate on the revolutionary gaming interfaces that the device, later named Kinect, could offer.
Since the release of the Kinect, however, it has become clear that Microsoft’s newest hardware, designed to change the state of the art of gaming and media, could also revolutionize another industry: robotics.
Kinect went on sale in November 2010. Within the first two months, more than 8 million units were sold. Only a subset of these were bundled with Xbox 360 consoles; the remainder were sold as standalone units. The vast majority of those $150 standalone units have gone to existing Xbox 360 owners to augment their systems, as intended. But many have also gone to developers excited to use the Kinect’s powerful sensors to interact with hardware systems, something Microsoft may never have expected.
Kinect is a single set-top unit that combines an RGB (red, green, blue) visible spectrum camera, an infrared (IR) spectrum 3D camera, and a multiarray microphone. The unit itself is placed on a tilt stand to allow it to automatically scan a room to capture an optimal view of the people interacting with it. Use of Kinect is entirely movement based and, unlike game controllers such as the Nintendo Wiimote, no handheld hardware is required. A user wishing to see the next page of a menu interface on the Xbox may gesture from right to left, akin to turning a page, and the Kinect will recognize the gesture. Games such as Dance Central use Kinect to track the movements of a player’s entire body and match them against required dance movements that determine a player’s score.
The 3D camera used in Kinect was developed by an Israeli company called PrimeSense Ltd. The PrimeSense camera can resolve depth differences as small as about 1 inch at distances of up to 7 feet, a range that is ideal for detecting the gestures and full-body movements of the person using it. At a 3-foot distance, the camera can distinguish facial expressions. Unlike many cameras used for intelligent vision systems, Kinect’s cameras are designed so as not to be affected by varying lighting conditions in a room, meaning that Kinect will work equally well in dark rooms, artificially lit rooms, and sunlit rooms. This is an unusual capability in consumer camera hardware and a key capability for machine-vision systems intended to operate in uncontrolled environments.
Kinect’s $150 retail price reflects a concerted design effort on the part of Microsoft to make use of commercial off-the-shelf hardware as a way of holding down the price. The low-cost PrimeSense camera is the most enabling component with respect to price; infrared 3D cameras have run into the thousands of dollars per unit in past years. While the PrimeSense camera may not have the resolution of those higher-end 3D cameras, it is sufficient for many nonprecision applications: gaming, of course, and now robotics.
How Kinect Can Be Used in Robotics
Kinect’s low price, simple mechanical and electrical packaging, and interface quickly attracted hobbyists and researchers looking for interesting ways to interact with robots, and interesting ways for robots to interact with their environments. Robotics developers have already demonstrated platforms that use Kinect for navigation, control, and interfacing with operators.
One of the most compelling uses of Kinect is for terrain mapping and obstacle avoidance. All autonomous (or even semi-autonomous) mobile platforms must have some way of detecting features in their environment so they may safely move around within it. Kinect’s 3D camera can generate a point cloud, similar to that generated by arrays of laser-based LIDAR systems, that may be used to create a map of an environment, including such things as walls, doorways, and desks in an office. The 3D camera can also detect features of both indoor and outdoor terrain, such as stairs, boulders, and inclines. A robot can then use this information to determine a clear path to follow. Additionally, Kinect can be used for obstacle detection, to avoid both existing obstacles, such as furniture, and those that suddenly appear, such as a person stepping into the path of a robot.
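As a rough illustration of how a depth image becomes a point cloud that a robot can check for obstacles, the Python sketch below back-projects each depth pixel with a simple pinhole camera model and then reports the nearest point inside a corridor ahead of the robot. The focal lengths, image center, and corridor dimensions are illustrative assumptions rather than Kinect calibration values, and the function names are hypothetical.

```python
import numpy as np


def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 3 array of XYZ points.

    fx, fy, cx, cy are pinhole intrinsics in pixels (assumed values, not
    taken from any real sensor calibration).
    """
    rows, cols = depth_m.shape
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    z = depth_m
    x = (u - cx) * z / fx   # lateral offset from the optical axis
    y = (v - cy) * z / fy   # vertical offset from the optical axis
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with no depth reading (encoded here as 0 or NaN).
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]


def nearest_obstacle(points, half_width_m=0.3, half_height_m=0.5):
    """Distance (m) to the closest point in the corridor ahead, or None."""
    in_corridor = (np.abs(points[:, 0]) <= half_width_m) & (
        np.abs(points[:, 1]) <= half_height_m
    )
    if not in_corridor.any():
        return None
    return float(points[in_corridor, 2].min())


if __name__ == "__main__":
    # Synthetic scene: a wall 3.0 m away with a box 1.2 m away near the center.
    depth = np.full((480, 640), 3.0)
    depth[200:280, 300:340] = 1.2
    cloud = depth_to_points(depth, fx=575.0, fy=575.0, cx=319.5, cy=239.5)
    print(cloud.shape, nearest_obstacle(cloud))
```

A real navigation stack would go further, for example by filtering sensor noise and projecting the points into a floor-plane map, but the same back-projection step sits underneath.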
Another of Kinect’s gaming capabilities, gesture control, is also applicable to robotics. A person might use Kinect to control a mobile robot platform, using a standardized set of gestures to instruct it to “stop,” “go forward,” “turn left,” and so forth. In many ways, this is the next iteration of the wireless supervisory control pioneered by developers using the Nintendo Wiimote to achieve a similar control interface. But significantly, Kinect enables more complex interactions than the Wiimote. For example, a humanoid robot might be trained to follow a certain pattern of movements by using Kinect to watch the operator perform the sequence first. Kinect would enable the robot to recognize the person’s limbs, head, torso, and other objects that are perhaps held by the person, such as a pointer, something that is currently not possible with the Wiimote and other remote controls.
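To make the gesture-to-command idea concrete, here is a minimal Python sketch that assumes some skeleton tracker already supplies 3D positions for the operator’s head, torso, and hands. The Joint type, the axis convention, the pose rules, and the velocity values are all hypothetical choices made for illustration; they are not part of any Kinect library.

```python
from dataclasses import dataclass


@dataclass
class Joint:
    x: float  # meters, positive toward the operator's right (assumed convention)
    y: float  # meters, positive upward
    z: float  # meters, distance from the sensor


def classify_gesture(head, torso, left_hand, right_hand, reach_m=0.4):
    """Map a static pose to a command name, or None if no pose is recognized."""
    # Both hands raised above the head: stop.
    if left_hand.y > head.y and right_hand.y > head.y:
        return "stop"
    # Left arm extended out to the operator's left side.
    if torso.x - left_hand.x > reach_m:
        return "turn left"
    # Right arm extended out to the operator's right side.
    if right_hand.x - torso.x > reach_m:
        return "turn right"
    # Either hand pushed toward the sensor.
    if torso.z - min(left_hand.z, right_hand.z) > reach_m:
        return "go forward"
    return None


# (linear m/s, angular rad/s); positive angular velocity turns left.
VELOCITIES = {
    "stop": (0.0, 0.0),
    "go forward": (0.2, 0.0),
    "turn left": (0.0, 0.5),
    "turn right": (0.0, -0.5),
}


def gesture_to_velocity(gesture):
    """Unrecognized or missing gestures fall back to a full stop."""
    return VELOCITIES.get(gesture, (0.0, 0.0))
```

Falling back to a stop when no gesture is recognized is a deliberate choice in this sketch: a robot that loses sight of its operator should halt rather than keep executing the last command.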
Yet another potential robotics application is facial-expression recognition. Kinect is designed to differentiate (though not necessarily interpret) expressions out of the box. If programmers can access that capability and expand it with recognition of emotions, coupled with an “emotional” response, it could result in robots that are more social and thus better able to interact with their human operators.
Who Is Using Kinect?
Kinect’s low cost and ready accessibility make it an exciting tool for hobbyists and researchers. Though many individuals have already begun exploiting Kinect’s capabilities and relative ease of integration with applications, several high-profile efforts stand out. Willow Garage, the Menlo Park, Calif.-based developer of ROS (Robot Operating System) and the PR-2 mobile robot, is one example. Willow Garage built on the open-source “libfreenect” library to support Kinect integration on ROS. The company then sponsored a January “ROS 3D” contest for the best Kinect application using a system running ROS. One winning team developed a visual SLAM (simultaneous localization and mapping) algorithm using Kinect’s 3D and color cameras. The group’s algorithm can be easily reproduced and used by other development teams to design novel mobile platform navigation applications. Another effort used Kinect as a method of teleoperating a humanoid robot. The team’s effort enabled the robot to imitate the gestures and movements of the operator.
At UC Berkeley, as part of the ROS 3D contest, researchers in the Hybrid Systems Lab developed a quadrotor aerial robot that utilizes Kinect’s IR camera for obstacle avoidance. Other academics have experimented with similar capabilities. Still others are finding practical applications with immediate impact, such as search-and-rescue operations for disasters like the February earthquake in Christchurch, New Zealand. For this scenario, a team from the University of Warwick in the United Kingdom designed a robot that uses Kinect as a 3D mapping system to record the layout of a collapsed building and aid rescue workers in determining safe paths for recovering people trapped under rubble.
Developers have found a wide range of applications for Kinect, pointing to myriad opportunities for a similar, commercialized sensor system in the greater robotics industry. But to move this sensor and these applications from the laboratory to the field, one of two things must happen. In one scenario, robotics professionals hoping to take advantage of Kinect’s capabilities could work directly with the sensor OEMs (such as PrimeSense, for an IR camera) and develop their own software that works similarly to Kinect’s interface. This would push Microsoft out of the loop, providing more flexibility for those hoping to develop similar interfaces, but increasing the development time and cost of proprietary systems.
The alternative is for Microsoft to be involved in the process; that is, to officially legitimize the use of Kinect on platforms beyond the Xbox 360 and thus take full advantage of the system’s popularity. Besides the obvious financial incentive to Microsoft, this scenario would allow the industry to move forward more quickly with what could be game-changing technology.
Just days after Kinect’s release in November 2010, the open-source developer group Adafruit Industries announced it would award a $2,000 prize to whoever provided an open-source driver for the Kinect. Microsoft’s initial reaction was negative. The company stated that it “did not condone” any modifications to the Kinect system. But it seems that this stemmed from confusion about the nature of the hacking efforts. Developers were not trying to modify Kinect itself in a way that could interfere with Xbox 360 game play, which would be a valid concern for a company hoping to maintain fairness in the online gaming community. They were only interested in reverse engineering the interface in order to use Kinect for nongaming applications. Once this was understood, Microsoft quickly changed course and released a statement in support of developers interested in using Kinect for new applications.
The original developer effort primarily involved understanding the types of messages sent over the USB connection to Kinect. Using built-in tools in Mac OS and Linux operating systems (or with a bit more effort, in Windows) and hardware analyzers to “sniff” the USB communication lines, people were able to discover what messages were sent to the motor, cameras, and microphones to control them, and what type of data was being sent back to the host computer. Through some trial-and-error experimentation, it was possible to determine the nature of each command and how the system responded to it.
Once this low-level information is understood, developers can write higher-level command libraries that are more accessible to others wishing to work with the Kinect. For example, the set of hexadecimal commands that tell the microphone to listen for input, and the similarly formatted reply messages that contain the data from the microphone, can be abstracted into more intuitive commands and data formats that an average C++ programmer is familiar with. Other developers can then use these libraries to implement higher-level controllers. In the case of the microphone, this might be a controller that moves a motor mount toward a loud sound in front of the Kinect’s microphone array. A roboticist could in turn use that controller in a robot designed to interact face-to-face with humans, so that it turns to look at a person when it hears a voice.
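The layering described above can be sketched in a few lines. The example is written in Python rather than C++ for brevity, and everything in it is hypothetical: the opcode, the packet layout, and the class and function names are invented for illustration and do not reproduce the real Kinect USB protocol or any existing library. The bottom layer packs raw bytes, the middle layer exposes a readable set_angle() call, and the top layer is a small controller that steps a pan mount toward a sound when it is loud enough.

```python
import struct

# Hypothetical opcode; the real Kinect USB protocol is not reproduced here.
CMD_SET_PAN = 0x31


class FakeTransport:
    """Stands in for a USB link and records every packet written to it."""

    def __init__(self):
        self.sent = []

    def write(self, packet):
        self.sent.append(packet)


class PanMount:
    """Library layer: hides packet encoding behind a readable set_angle()."""

    def __init__(self, transport, min_deg=-90, max_deg=90):
        self.transport = transport
        self.min_deg = min_deg
        self.max_deg = max_deg
        self.angle_deg = 0

    def set_angle(self, angle_deg):
        # Clamp to the mount's travel limits before encoding.
        angle = max(self.min_deg, min(self.max_deg, int(round(angle_deg))))
        # Assumed layout: 1-byte opcode + signed 16-bit little-endian angle.
        self.transport.write(struct.pack("<Bh", CMD_SET_PAN, angle))
        self.angle_deg = angle
        return angle


def turn_toward_sound(mount, bearing_deg, loudness_db,
                      threshold_db=60.0, gain=0.5):
    """Controller layer: take a proportional step toward a loud sound.

    bearing_deg is the sound direction relative to the mount's current
    heading; quiet sounds below threshold_db are ignored.
    """
    if loudness_db < threshold_db:
        return mount.angle_deg
    return mount.set_angle(mount.angle_deg + gain * bearing_deg)


if __name__ == "__main__":
    link = FakeTransport()
    mount = PanMount(link)
    turn_toward_sound(mount, bearing_deg=40.0, loudness_db=70.0)
    print(mount.angle_deg, link.sent[-1].hex())
```

Someone building the face-to-face robot only ever calls turn_toward_sound(); the byte-level details stay inside PanMount, which is the point of the abstraction.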
Indeed, Kinect developers are making contributions at each of these levels, and videos of robot applications found on YouTube and other sites demonstrate how developers have taken advantage of many of these tools. Kinect application development still requires a good deal of programming knowledge, but some steps are being taken to enable better-supported and more widespread use.
In fact, the broad popularity of Kinect among the developer community recently inspired an announcement from Microsoft that it plans to release an official Kinect SDK (software development kit) for use by hobbyists and academics in spring 2011. This first SDK license will not cover commercial use, though Microsoft indicated in its announcement that it does plan to release a commercial SDK later in the year.
Microsoft’s willingness to work with developers is exciting and advantageous for the industry, though it does introduce some challenges. Microsoft’s SDK will only support Windows 7, for example, a potential hurdle to academics and hobbyists who primarily use Linux-based systems, which offer better transparency to their hardware commands. It also ties users to Microsoft’s development and release cycle, as well as pricing, which could be enabling for some commercial applications but limiting for others.
Future Uses of Kinect
Kinect’s true impact on the robotics industry is yet to be seen. But it’s easy to predict some of the earliest effects. From a system standpoint, Kinect has some natural niches in commercial applications, especially in telepresence. Telepresence, one of the biggest up-and-coming robotics industries, is currently dominated by robots in the $10,000 to $30,000 price range, a price point due in large part to the expensive navigation and obstacle avoidance sensors now used on their platforms. Working as a standalone unit, Kinect could potentially replace the majority of that functionality in addition to providing the RGB camera and microphone already used on telepresence robots. This would quickly drive down the cost of telepresence units.
Kinect may also find its way onto more and more systems used in harsh environments, such as the University of Warwick search-and-rescue robot. Compared with the cost of traditional terrain mapping and obstacle avoidance sensors, the low-cost Kinect, combined with an open-source package such as OpenSLAM, would make such a robot nearly disposable and more likely to be accepted by those industries.
But it is not just the use of the Kinect itself that will be so critical to the industry. Kinect’s success will drive down the costs of components like the PrimeSense 3D camera. Other vendors will likely follow, making 3D camera technology more compelling for certain navigation applications compared with expensive alternatives like LIDAR.
The novel hardware and Microsoft’s software will continue to work in concert with pre-existing open-source libraries that many robotics developers already use in machine vision, navigation, and object-recognition applications. Even though the official Microsoft SDK will be key to robust use of Kinect, the open-source community likely will continue to have a strong impact on Kinect’s utility in robotics.
Kinect, and Microsoft along with it, are primed to play an important role in computer vision, mobility, and human-robot interaction in the robotics industry over the next several years. Microsoft has made an important decision in agreeing to support the developer community. Though Microsoft considers its Robotics Developer Studio to be its primary inroad to the robotics field, it may in fact be Kinect that truly helps the company make its mark in the industry. From academic research labs to hobbyists’ garages to commercial manufacturers, Kinect’s innovative, impressive, inexpensive technology will have robots interacting, navigating, and sensing in ways that were not possible before.
The Bottom Line
Kinect’s impressive functionality as a control interface has led developers to envision its many potential applications within the robotics industry. Research efforts already have demonstrated how it may be used as an aid to navigation and as a way for humanoid robots to learn capabilities by mimicking the movements of a human instructor, captured by the device’s infrared and color cameras.
Many more applications will become possible in the months ahead when Microsoft releases a noncommercial and then a commercial software development kit. Applications such as gesture and facial recognition are certain to become more sophisticated over time. And the system’s low cost could make it the replacement device of choice for robots whose sales are currently limited by the high price of sensors. Widespread use of Kinect could represent a ground-breaking development in robotics, perhaps as significant as the mouse and touchpad technology were to personal computers.