Visual SLAM is Oculus’ solution to room-scale VR.

Update 09/14/2019

Facebook recently released a technical blog post on Oculus Insight's visual-inertial SLAM, which confirmed the analysis in this article, including my prediction that an IMU is used as part of the "inertial" system. In addition, in 2016, Facebook detailed its first-generation SLAM system with direct references to ORB-SLAM, SVO, and LSD-SLAM. Both articles are great technical reads, with system architecture diagrams and solutions to the SLAM accuracy and efficiency problems on an embedded device.

The HTC Solution

Room-scale VR has been a work in progress for years, with manufacturers trying different hardware and wiring schemes to track the user within a room.

To support a fully immersive room-scale VR experience, HTC VIVE has to bundle lighthouse base stations with the headset to accurately report the headset's position and orientation. The system also tracks the motion of the VIVE Controllers, typically held in the user's hands, and VIVE Trackers, which can be attached to any physical accessory or controller. The system works exceptionally well once everything is set up, and it is quite accurate, but it requires wires, and tracking is always limited to the physical range covered by the lighthouse stations.

Oculus has been behind in the development of room-scale VR for many years, or at least on the surface. Enter Oculus Quest, an all-in-one VR system with no wires and no PC required. During the Oculus Quest announcement, the Oculus Insight feature immediately caught my full attention.

This technology works by detecting thousands of points in your environment to compute an accurate position of the headset every millisecond. — Oculus

What is Visual SLAM?

Oculus solved the problem using Visual SLAM. SLAM stands for simultaneous localization and mapping, and research in this area goes back many years. SLAM is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it.
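To make the "localization" half of the problem concrete, here is a toy sketch in Python with NumPy. It is purely illustrative and not anything Oculus has published: if the map (a set of landmark positions) is already known, localization reduces to a least-squares fit of the agent's position against range measurements to those landmarks, solved here with Gauss-Newton iterations. Full SLAM must estimate the landmarks and the pose at the same time, which is what makes it hard.

```python
import numpy as np

def localize(landmarks, ranges, guess, iters=20):
    """Toy localization with a KNOWN map: estimate a 2-D position
    from range measurements to known landmarks via Gauss-Newton."""
    x = np.asarray(guess, dtype=float)
    for _ in range(iters):
        diffs = x - landmarks                  # (N, 2) offsets to landmarks
        dists = np.linalg.norm(diffs, axis=1)  # predicted ranges
        residual = dists - ranges              # prediction error
        J = diffs / dists[:, None]             # Jacobian of range w.r.t. x
        # Least-squares step: solve J @ dx ≈ -residual
        dx = np.linalg.lstsq(J, -residual, rcond=None)[0]
        x = x + dx
    return x

# Hypothetical map and agent position, for illustration only.
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_pos = np.array([1.0, 1.0])
ranges = np.linalg.norm(true_pos - landmarks, axis=1)  # noise-free ranges
est = localize(landmarks, ranges, guess=[2.0, 2.0])
print(est)  # converges to approximately [1.0, 1.0]
```

In real Visual SLAM the "ranges" come from camera features rather than a range finder, and the landmark positions themselves are unknowns in the optimization, which is exactly the chicken-and-egg coupling described below.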

SLAM is a chicken-and-egg problem: you need the map to accurately localize yourself within the environment, but to build a usable map from sensor data, you must already know the sensor's localization, in this case the position and orientation of the agent carrying the sensors. If you are not convinced this is a hard problem for a VR system, imagine standing in a forest trying to draw a map of the woods with a cheap laser range finder, a compass, or just your eyes.

The applications of SLAM go beyond VR. Every self-driving car, flying drone, and even Mars rover needs a localization system, although some systems know the map, or at least what the map looks like, ahead of time, so a true SLAM system is not always required. In the case of Oculus Quest, however, the task is harder because the system does not initially know the map of the user's living room, and the computation has to run on a mobile processor. The "visual" part of a Visual SLAM system usually means one or more cameras. Compared to the lidar-based systems found in most autonomous cars, a camera-based visual system is much cheaper to build.

Anyone relying on lidar is doomed. — Elon Musk

Oculus Quest with Visual SLAM

Oculus has been lagging behind in room-scale technology for several years, or has it? While HTC was perfecting its tracking hardware, Facebook made several significant hires. Most notably, according to LinkedIn, Jakob Engel has been working at Facebook Reality Labs as a Research Lead since 2016. Jakob is the author of LSD-SLAM (large-scale direct monocular SLAM) and a student of the renowned computer vision expert Daniel Cremers.

Oculus did not stop there, hiring another Visual SLAM expert, Raúl Mur-Artal, in 2017. Raúl is the author of the well-known ORB-SLAM and ORB-SLAM2 packages. You can read more about them in Jeroen's Medium article.

It is not clear whether Oculus is using LSD-SLAM, ORB-SLAM2, or a brand-new system. Regardless of the underlying system, using Visual SLAM to solve the room-scale VR problem poses many challenges. Oculus Quest is a mobile system with computing constraints. Moreover, most Visual SLAM systems rely on the vehicle or agent looping around the environment so the system can perform loop closures to build the map. In a VR system, a user generally stands in a small area, which gives the SLAM system limited information to produce the map. Mapping at night can also be difficult with limited visual cues, although IR can help in this case. In addition, the cameras also need to track the controllers, so the visual input alone is probably not enough. I suspect that Oculus used the four wide-angle camera setup along with additional sensors such as an IMU to solve this problem.
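As a hedged illustration of why an IMU would help (the actual Oculus Insight fusion pipeline is not detailed in the announcement, and every number below is made up), here is a minimal 1-D complementary-filter sketch: high-rate IMU dead reckoning is smooth but drifts because of sensor bias, while occasional camera-rate position fixes pull the fused estimate back toward the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.001            # hypothetical 1 kHz IMU update rate
steps = 2000          # simulate 2 seconds
true_vel = 0.5        # true headset velocity (m/s), 1-D toy world
bias = 0.05           # uncorrected IMU bias -> dead-reckoning drift
k = 0.2               # blend gain for the visual correction

true_pos = 0.0
imu_only = 0.0        # dead reckoning from the biased IMU alone
fused = 0.0           # IMU prediction plus periodic visual fixes

for t in range(steps):
    true_pos += true_vel * dt
    meas_vel = true_vel + bias + rng.normal(0, 0.01)  # noisy, biased IMU
    imu_only += meas_vel * dt
    fused += meas_vel * dt
    if t % 30 == 0:   # ~33 Hz camera delivers an absolute position fix
        visual = true_pos + rng.normal(0, 0.002)      # noisy visual pose
        fused = (1 - k) * fused + k * visual          # complementary blend

print(abs(imu_only - true_pos))  # drifts on the order of bias * time
print(abs(fused - true_pos))     # stays bounded near the visual accuracy
```

Production systems fuse these sources with far more sophisticated estimators (for example, an extended Kalman filter or sliding-window optimization), but the division of labor is the same: the IMU covers the milliseconds between camera frames, and the visual system anchors the drift.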

The Future is Here

Visual SLAM is the future for Oculus because the device is no longer limited to mapping a single room. Depending on the density of the required map and the computing power of the system, Oculus Quest could support multi-room or even arena-scale VR. Several applications already demonstrate Oculus Quest's capability beyond room-scale.



I am building robotics, deep learning and SLAM solutions with support for large scale simulation, training, and testing.

Ken Wang
