Lots of companies are working to develop self-driving cars. And almost all of them use lidar, a type of sensor that uses lasers to build a three-dimensional map of the world around the car.
But Tesla CEO Elon Musk argues that these companies are making a big mistake.
“They’re all going to dump lidar,” Musk said at an April event showcasing Tesla’s self-driving technology.
“Anyone relying on lidar is doomed.”
“Lidar is really a shortcut,” added Tesla AI guru Andrej Karpathy. “It sidesteps the fundamental problems of visual recognition that is necessary for autonomy. It gives a false sense of progress, and is ultimately a crutch.”
In recent weeks I asked a number of experts about these claims. And I encountered a lot of skepticism.
“In a sense all of these sensors are crutches,” argued Greg McGuire, a researcher at MCity, the University of Michigan’s testing ground for autonomous vehicles. “That’s what we build, as engineers, as a society—we build crutches.”
Self-driving cars are going to need to be extremely safe and reliable to be accepted by society, McGuire said. And a key principle for high reliability is redundancy. Any single sensor will fail eventually. Using several different types of sensors makes it less likely that a single sensor’s failure will lead to disaster.
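To see why redundancy matters, consider a back-of-the-envelope calculation. The failure rates below are made-up numbers, and the math assumes the sensors fail independently—which real-world sensors often don't—but it illustrates why engineers like overlapping sensing modes:

```python
# Toy illustration of sensor redundancy. The failure rates are made-up
# numbers, and the calculation assumes failures are statistically
# independent, which is optimistic for real sensors.
p_camera_fail = 1e-3   # chance a camera misses an object in a given frame
p_lidar_fail = 1e-3    # chance lidar misses the same object
p_radar_fail = 1e-2    # chance radar misses it

# With one sensor, the miss rate is just that sensor's failure rate.
single_sensor_miss = p_camera_fail

# With three independent sensors, all of them must fail at the same time.
all_sensors_miss = p_camera_fail * p_lidar_fail * p_radar_fail

print(f"camera only: {single_sensor_miss:.0e}")           # 1e-03
print(f"camera + lidar + radar: {all_sensors_miss:.0e}")  # 1e-08
```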
“Once you get out into the real world, and get beyond ideal conditions, there’s so much variability,” argues industry analyst (and former automotive engineer) Sam Abuelsamid. “It’s theoretically possible that you can do it with cameras alone, but to really have the confidence that the system is seeing what it thinks it’s seeing, it’s better to have other orthogonal sensing modes”—sensing modes like lidar.
Camera-only algorithms can work surprisingly well
On April 22, the same day Tesla held its autonomy event, a trio of Cornell researchers published a research paper that offered some support for Musk’s claims about lidar. Using nothing but stereo cameras, the computer scientists achieved breakthrough results on KITTI, a popular image recognition benchmark for self-driving systems. Their new technique produced results far superior to previously published camera-only results—and not far behind results that combined camera and lidar data.
Unfortunately, media coverage of the Cornell paper created confusion about what the researchers had actually found. Gizmodo’s writeup, for example, suggested the paper was about where cameras are mounted on a vehicle—a topic that wasn’t even mentioned in the paper. (Gizmodo re-wrote the article after researchers contacted them.)
To understand what the paper actually showed, we need a bit of background about how software converts raw camera images into a labeled three-dimensional model of a car’s surroundings. In the KITTI benchmark, an algorithm is considered a success if it can accurately place a three-dimensional bounding box around each object in a scene.
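Here's a simplified sketch of that scoring idea. KITTI actually evaluates oriented boxes that can rotate around the vertical axis, so this axis-aligned version (with an illustrative function name) only captures the gist:

```python
import numpy as np

def axis_aligned_iou_3d(box_a, box_b):
    """Intersection-over-union of two axis-aligned 3-D boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max). The real
    KITTI benchmark scores oriented boxes, so this is a simplification.
    """
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    # Overlap along each axis, clamped at zero when the boxes don't touch.
    overlap = np.clip(np.minimum(a[3:], b[3:]) - np.maximum(a[:3], b[:3]), 0, None)
    inter = overlap.prod()
    vol_a = (a[3:] - a[:3]).prod()
    vol_b = (b[3:] - b[:3]).prod()
    return inter / (vol_a + vol_b - inter)

# A predicted box counts as a "hit" if its IoU with a ground-truth box
# clears the benchmark's threshold (0.5 or 0.7, depending on the setting).
print(axis_aligned_iou_3d((0, 0, 0, 2, 2, 4), (0.5, 0, 0, 2.5, 2, 4)) >= 0.5)
```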
Software typically tackles this problem in two steps. First, the images are run through an algorithm that assigns a distance estimate to each pixel. This can be done using a pair of cameras and the parallax effect. Researchers have also developed techniques to estimate pixel distances using a single camera. In either case, a second algorithm uses the depth estimates to group pixels together into discrete objects, like cars, pedestrians, or cyclists.
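Here's a rough sketch of that first step using OpenCV's stereo block matcher. The camera parameters and file names are placeholders, and the Cornell work used learned depth estimators rather than classical block matching:

```python
import cv2
import numpy as np

# Step 1 sketch: estimate per-pixel depth from a stereo pair using the
# parallax (disparity) between the left and right images. The camera
# parameters below are placeholders, not values from any real setup.
FOCAL_PX = 721.5      # focal length in pixels (assumed)
BASELINE_M = 0.54     # distance between the two cameras, in meters (assumed)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block-matching stereo algorithm from OpenCV; it returns disparity in
# fixed-point format (multiplied by 16), hence the division below.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth is inversely proportional to disparity: depth = focal * baseline / d.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
```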
The Cornell computer scientists focused on this second step. Most other researchers working on camera-only approaches have represented the pixel data as a two-dimensional image, with distance as an additional value for each pixel alongside red, green, and blue. Researchers would then typically run these two-dimensional images through a convolutional neural network (see our in-depth explainer here) that has been trained for the task.
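In code, that conventional approach looks something like the toy PyTorch sketch below, where depth is simply stacked as a fourth channel alongside red, green, and blue (the network itself is a stand-in, not any published model):

```python
import torch
import torch.nn as nn

# Sketch of the conventional "2-D" approach: depth becomes a fourth
# channel next to R, G, B, and the stack is fed to a convolutional
# network. The network below is a toy stand-in, not a published model.
rgb = torch.rand(1, 3, 375, 1242)      # a camera image (KITTI-sized)
depth = torch.rand(1, 1, 375, 1242)    # per-pixel depth estimates
rgbd = torch.cat([rgb, depth], dim=1)  # shape: (1, 4, 375, 1242)

toy_cnn = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)
features = toy_cnn(rgbd)
```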
But the Cornell team realized that using a two-dimensional representation was counterproductive because pixels that are close together in a two-dimensional image might be far apart in three-dimensional space. A vehicle in the foreground, for example, might appear directly in front of a tree that’s dozens of meters away.
So the Cornell researchers converted the pixels from each stereo image pair into the type of three-dimensional point cloud that is generated natively by lidar sensors. The researchers then fed this “pseudo-lidar” data into existing object recognition algorithms that are designed to take a lidar point cloud as an input.
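Conceptually, the conversion is a standard pinhole-camera back-projection. The sketch below—with an illustrative function name and simplified conventions, not the paper's actual code—shows how a depth map becomes the kind of point cloud a lidar-based detector expects:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into an (N, 3) point cloud.

    This mirrors the idea behind "pseudo-lidar": instead of keeping depth
    as an extra image channel, every pixel becomes an (x, y, z) point in
    3-D space, the same kind of data a lidar sensor produces natively.
    The intrinsics (fx, fy, cx, cy) come from camera calibration; the
    conventions here are a simplified sketch, not the paper's code.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (us - cx) * z / fx   # pinhole model: pixel column -> lateral offset
    y = (vs - cy) * z / fy   # pixel row -> vertical offset
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth estimate

# Two pixels that sit side by side in the image can end up many meters
# apart in the resulting point cloud if their depths differ -- e.g. a car
# in the foreground directly "in front of" a distant tree.
```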
“You could close the gap significantly”
“Our approach achieves impressive improvements over the existing state-of-the-art in image-based performance,” they wrote. In one version of the KITTI benchmark (“hard” 3-D detection with an IoU of 0.5), for example, the previous best result for camera-only data was an accuracy of 30%. The Cornell team managed to boost this to 66%.
In other words, one reason that cameras plus lidar performed better than cameras alone had nothing to do with the superior accuracy of lidar’s distance measurements. Rather, it was because the “native” data format produced by lidar happened to be easier for machine-learning algorithms to work with.
“What we showed in our paper is you could close the gap significantly” by converting camera-based data into a lidar-style point cloud, said Kilian Weinberger, a co-author of the Cornell paper, in a phone interview.
Still, Weinberger acknowledged, “there’s still a fair margin between lidar and non-lidar.” We mentioned before that the Cornell team achieved 66% accuracy on one version of the KITTI benchmark. Using the same algorithm on actual lidar point cloud data produced an accuracy of 86%.