When speaking of robots, people tend to imagine a wide range of different machines: Pepper, a social robot from SoftBank; Atlas, the Boston Dynamics humanoid robot that can now do backflips; or in movies and television, the cyborg assassin from the Terminator films or the lifelike figures from Westworld.

HSBC Bank welcomes SoftBank Robotics’ humanoid robot Pepper to its Fifth Avenue branch in New York. (Credit: Mark Von Holden/AP Images for HSBC)
People not familiar with the robotics industry tend to hold polarized views. Either they have unrealistically high expectations of robots with human-level intelligence, or they underestimate the potential of new research and technologies.
Over the past year, I have been asked by friends in the venture capital and startup scene about what’s “actually” going on in deep reinforcement learning and robotics. How are AI-enabled robots different from traditional ones? Do they have the potential to revolutionize various industries? What can and can’t they do now?
These questions tell me how surprisingly challenging it can be to understand the current technological progress and industry landscape, let alone make predictions about the future. This article is a humble attempt to demystify AI-enabled robotics, in particular robotics powered by deep reinforcement learning, something we hear a lot about but understand only superficially or not at all.
Our first question: What are AI-enabled robots and what makes them unique?
Robot evolution — from automation to autonomy
“Machine learning addresses a class of questions that were previously ‘hard for computers and easy for people,’ or, perhaps more usefully, ‘hard for people to describe to computers.’” — Benedict Evans, the a16z podcast.
The most important difference that AI brings to robotics is enabling a move away from automation (hard-programmed) toward true autonomy (self-directed). You don’t really see the difference if the robot only does one thing. But when the robot needs to handle several tasks, or respond to humans or changes in the environment, it needs a certain level of autonomy.
To explain the evolution of robots, we will borrow definitions from the autonomous car space. For the purposes outlined below, we will use this definition of a robot: “programmable machines capable of carrying out complex actions automatically.”
Level 0 – No automation
People operate machines, and no robots are involved.
Level 1 – Driver assistance (or single-task automation for robotics)
A single function or task is automated, but the robot does not necessarily use information about the environment. This is how robots are used traditionally in automotive or manufacturing industries. Robots are programmed to repeatedly perform specific tasks with high precision and speed. Until recently, most robots in the field have not been able to sense or adapt to changes in the environment.
Level 2 – Partial automation
A machine assists with certain functions, using sensory input from the environment to make decisions. For example, robots can identify and handle different objects with a vision sensor. However, traditional computer vision requires pre-registration and clear instructions for each object. Robots lack the ability to deal with changes, surprises, or new objects.
Level 3 – Conditional autonomy
The machine controls all monitoring of the environment, but it still requires a human’s attention and (instant) intervention for unpredictable events.
Level 4 – High autonomy
The machine is fully autonomous in certain situations or defined areas.
Level 5 – Complete autonomy
A machine is fully autonomous in all situations.
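To keep the taxonomy handy, here is a minimal Python sketch, with hypothetical names of my own rather than any standard, of how a team might encode these levels when tagging a robot’s capabilities:

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """Autonomy levels borrowed from the autonomous-car taxonomy above."""
    NO_AUTOMATION = 0   # humans operate machines directly
    SINGLE_TASK = 1     # one task automated, no environmental sensing
    PARTIAL = 2         # sensor-driven decisions for pre-registered objects
    CONDITIONAL = 3     # self-monitoring, but humans handle surprises
    HIGH = 4            # fully autonomous within a defined area or situation
    COMPLETE = 5        # fully autonomous in all situations

def needs_human_fallback(level: AutonomyLevel) -> bool:
    """Levels 3 and below still rely on a human to step in."""
    return level <= AutonomyLevel.CONDITIONAL
```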
Where are we now in terms of autonomy level?
Today, most robots used in factories are open-loop, or non-feedback controlled, which means their actions are independent of sensor feedback (Level 1).
A few robots in the field sense and act based on sensor feedback (Level 2). A collaborative robot, or cobot, is designed to be more versatile and to work alongside humans; the trade-off is less power and lower speed, especially compared to industrial robots. Although a cobot is relatively easier to program, it’s not necessarily autonomous: human workers often need to hand-guide it every time there’s a change in the task or environment.
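To make the contrast between Level 1 and Level 2 concrete, here is a minimal sketch of the two control styles. Every call below (move_to, read, detect, and so on) is a hypothetical placeholder, not a real robot SDK:

```python
# Sketch only: all functions and objects here are hypothetical placeholders.

WAYPOINTS = [(0.3, 0.1, 0.2), (0.3, 0.4, 0.2), (0.0, 0.4, 0.2)]

def open_loop_cycle(robot):
    """Level 1: replay pre-programmed motions regardless of what is
    actually in front of the robot."""
    for pose in WAYPOINTS:
        robot.move_to(pose)

def feedback_cycle(robot, camera, detector):
    """Level 2: sense the environment and act on what is detected, but
    only for objects the vision system was registered to recognize."""
    image = camera.read()
    detection = detector.detect(image)   # returns None for unknown objects
    if detection is None:
        robot.stop_and_wait_for_human()  # cannot handle surprises on its own
        return
    robot.move_to(detection.grasp_pose)
    robot.close_gripper()
```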
We’ve started to spot pilot projects with AI-enabled robots operating at Level 3 or 4 autonomy. Warehouse piece-picking is a good example. In an e-commerce fulfillment warehouse, human workers need to pick and place millions of different products into boxes based on customer requirements.
Traditional computer vision cannot handle such a wide variety of objects because each item needs to be registered, and each robot needs to be programmed beforehand. However, deep learning and reinforcement learning now enable robots to learn to handle various objects with minimal help from humans.
There might be some goods the robot has never encountered before, for which it needs help or a demonstration from human workers (Level 3). But the algorithm improves and gets closer to full autonomy as the robot collects more data and learns from trial and error (Level 4).
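As a rough illustration of this trial-and-error loop (not any particular vendor’s system; env and policy are hypothetical stand-ins), piece-picking can be framed as reinforcement learning in which every attempted grasp becomes training data, and uncertain items trigger a human demonstration:

```python
# Sketch of piece-picking framed as reinforcement learning.
# `env` and `policy` are hypothetical stand-ins, not a specific product or library.

def train_picking_policy(env, policy, episodes=10_000):
    """Improve a grasping policy from its own successes and failures."""
    for _ in range(episodes):
        observation = env.reset()                  # camera view of the bin
        action = policy.select_grasp(observation)  # where and how to grasp
        success = env.execute(action)              # attempt the grasp
        reward = 1.0 if success else 0.0           # grasp success is the only signal
        policy.update(observation, action, reward)

        # Level 3 behavior: ask for a human demonstration when the policy
        # is too uncertain about a never-before-seen item.
        if policy.uncertainty(observation) > 0.9:
            demo = env.request_human_demonstration()
            policy.update_from_demo(demo)
    return policy
```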
Like the autonomous car industry, robotics startups are also taking different approaches to autonomy for their robots. Some believe in a collaborative future between humans and robots, and focus on Level 3 mastery.
Others believe in a fully autonomous future, skipping Level 3 and focusing on Level 4, and eventually Level 5.
This is one reason why it’s so difficult to assess the actual level of autonomy in some cases. For example, a startup could claim that it’s working on Level 3 human-centered artificial intelligence (for example, teleoperation), while the solution is actually a “mechanical Turk.”
On the other hand, startups targeting Levels 4 and 5 cannot achieve desirable results overnight, which could scare early adopters away, making data collection even more difficult in the early stages.
The rise of AI-enabled robots in warehouses and beyond
On the bright side, robots are being used in many more use cases and industries than cars are, making Level 4 more accessible for robots than for cars. We will first see AI-enabled robots up and running in warehouses, because the warehouse is a semi-controlled environment and piece picking is a critical but fault-tolerant task.
On the other hand, autonomous home or surgical robots will arrive much later, because more uncertainties exist in the operating environment, and some tasks are not recoverable. We will see AI-enabled robots used across more scenarios and industries as the precision and reliability of the technology improve.
There are only about 3 million robots in the world, most of which work on handling, welding, and assembly tasks. So far, very few robot arms are being used in warehouses, agriculture, or industries other than automotive and electronics, due to the limitations of computer vision mentioned earlier.
Over the next 20 years, we will see explosive growth and a changing industry landscape brought by next-generation robots as deep learning, reinforcement learning, and cloud computing unlock the potential for these robots.
Not all industries will adopt automation at the same pace, however, because of the incentives of current players and the technical complexities mentioned earlier.
Next-generation AI-enabled robotics startup landscape
What are some of the growth opportunities in the AI-enabled robotics sector? What are the different approaches and business models taken by startups and incumbents in this market? Here’s an overview of some example companies in each segment. Please note, this is by no means an exhaustive landscape; we welcome your input and feedback to make it more complete.
Vertical vs. horizontal approach
The most interesting thing I discovered when looking into the startup scene is the split between two fundamentally different approaches: vertical and horizontal.
Most startups in Silicon Valley, such as Covariant and Osaro, focus on developing solutions for specific vertical markets, such as e-commerce fulfillment, manufacturing, or agriculture. This full-stack approach makes sense, because the technology is still nascent.
Instead of relying on others to supply critical modules or components, building an end-to-end solution might be faster, and gives companies more control over the end use cases and performance.
However, scalable use cases are not that easy to identify. Warehouse piece-picking is low-hanging fruit, combining relatively high customer willingness to pay with technical feasibility. Almost every warehouse has the same piece-picking needs.
But in manufacturing, assembly tasks can vary from factory to factory. Manufacturing also requires higher accuracy and speed than warehouse tasks. Even though machine learning allows robots to improve over time, at the moment it still cannot achieve the same accuracy as traditional closed-loop robots, because it learns from trial and error.
This is why startups such as Mujin and CapSen Robotics choose traditional computer vision over deep reinforcement learning approaches. But this requires every object to be registered, so the training time, the flexibility (inability to adapt to changes), and the unit economics don’t really make sense. Once deep reinforcement learning reaches the performance threshold and becomes more mainstream, this traditional approach could become irrelevant.
Another issue with startups is that their valuations tend to be high. We often see startups raising tens of millions of dollars from Silicon Valley without the promise of any significant revenue stream. It’s easy for entrepreneurs to paint a rosy future for deep reinforcement learning, but the reality is that it will take years to get there. Venture capitalists bet on teams with strong talent and technology, even though these companies are still far from meeting revenue goals.
The more practical, and less often used, approach is to go horizontal: building a technology stack and enablers that can be used across different industries. We can simplify the robotics technology stack into three components:
- Sensing (input)
- Processing
- Actuation (output)
There are also development tools in addition to these three. I use the term processing loosely here to include everything that is not sensing or actuation: controllers, machine learning, operating systems, and modules for robots.
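Viewed this way, a single control cycle is a sense-process-act pipeline, and a horizontal vendor supplies one layer of it. The sketch below uses hypothetical class names purely to show where each layer plugs in:

```python
# Hypothetical sense -> process -> act pipeline; each class stands in for
# whichever vendor or module supplies that layer of the stack.

class Sensing:
    def read(self):
        """Raw input: camera frames, force-torque readings, encoder values."""
        raise NotImplementedError

class Processing:
    def plan(self, observation):
        """Controllers, machine learning models, motion planning, safety checks."""
        raise NotImplementedError

class Actuation:
    def execute(self, command):
        """Drive the motors and grippers that carry out the planned motion."""
        raise NotImplementedError

def control_cycle(sensing: Sensing, processing: Processing, actuation: Actuation) -> None:
    """One pass through the stack: sense, decide, act."""
    observation = sensing.read()
    command = processing.plan(observation)
    actuation.execute(command)
```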
I believe this processing and tooling segment has the most potential for growth in the near future. One pain point for robotics customers is that the market is extremely fragmented: every robot maker has its own proprietary languages and interfaces, making it difficult for system integrators and end users to integrate robots with their systems.
As the industry matures and more robots are being used beyond automotive and electronics factories, we will need standard operating systems, protocols, and interfaces for better efficiencies and shorter time to market.
Several startups are working on this modular approach. For example, Veo Robotics is developing safety modules to allow industrial robots and humans to work together, and Realtime Robotics provides solutions to accelerate motion planning.
These are just a few observations I have made from working with and talking to experts in the industry. I look forward to hearing more about your thoughts, and exchanging notes with entrepreneurs, professors, and venture capitalists in this space.