Updated: Mar 19, 2019
During the 1980s, the very first SLAM algorithms were not based on vision. Instead, they used single-beam laser rangefinders to obtain limited but direct measurements of the distance between a SLAM agent and the obstacles around it. Over time, SLAM algorithms and their sensing devices evolved to encompass a much wider range of hardware.
Most modern SLAM systems are vision-based, using one or more cameras as their main sensing devices. Cameras offer many advantages, such as the inherent richness of visual information, but also come with a set of limitations. To bypass these limitations, we can integrate visual and non-visual sensors and attain the best of both worlds. For example, a laser-based depth sensor is largely insensitive to illumination changes but degrades in rainy conditions, whereas a camera is sensitive to illumination but less affected by rain. Using both devices together helps overcome the limits of either sensor alone.
Any visual SLAM system must extract the depth of the feature points in an image. In Monocular Visual SLAM, this extraction can be achieved only by correlating feature points observed at different points in time while the camera moves. Unfortunately, this method does not provide enough information to disambiguate the absolute scale of the reconstruction, so there is the risk of introducing scale drift.
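The scale ambiguity can be seen directly from the pinhole projection model. The sketch below (illustrative code, not from the article; the scene points, translation, and focal length are made-up values) shows that scaling both the scene and the camera motion by the same factor produces identical image observations, which is why the true scale is unobservable from a single moving camera:

```python
import numpy as np

def project(points, cam_t, f=500.0):
    """Project 3D points into a pinhole camera translated by cam_t (no rotation)."""
    p = points - cam_t               # world -> camera frame
    return f * p[:, :2] / p[:, 2:3]  # perspective division

points = np.array([[0.5, 0.2, 4.0], [-0.3, 0.1, 6.0]])  # hypothetical scene
t = np.array([0.1, 0.0, 0.0])                           # camera translation

s = 3.0                              # arbitrary scale factor
img_a = project(points, t)           # original scene and motion
img_b = project(s * points, s * t)   # scaled scene and scaled motion

# Both configurations yield the same pixel observations,
# so the scale s cannot be recovered from the images alone.
print(np.allclose(img_a, img_b))     # True
```

Because every image-only measurement is invariant to this global scaling, small errors in the estimated scale accumulate over time as the scale drift mentioned above.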
Passive stereo setups
The first solution to depth ambiguity involves a second camera rigidly mounted at a known distance from the main camera. In Stereo Visual SLAM, the known baseline between the cameras, combined with the stereo disparity between matched features, is used to triangulate the depth of the feature points. This is known as a passive range technique, as it requires no dedicated illuminator, only ambient light. The advantages of stereo setups include their long potential operating range and the low cost and power consumption of the devices. The disadvantages include dependence on environmental lighting conditions and a direct correlation between the working depth range and the baseline between the two cameras.
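The triangulation described above reduces, for a rectified stereo pair, to the standard relation Z = f·B/d. The minimal sketch below uses hypothetical focal length and baseline values to show both the depth computation and why depth range correlates with the baseline:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# FOCAL_PX and BASELINE_M are hypothetical calibration values.
FOCAL_PX = 700.0     # focal length in pixels
BASELINE_M = 0.12    # distance between the two cameras, meters

def depth_from_disparity(disparity_px):
    """Triangulate depth (meters) from the disparity (pixels) of a matched feature."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_PX * BASELINE_M / disparity_px

# A feature matched with a 21-pixel disparity lies 4 m away:
print(depth_from_disparity(21.0))  # 4.0
```

Note that for a fixed minimum measurable disparity, the maximum usable depth grows linearly with the baseline B, which is the correlation between working range and camera separation mentioned above.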
Active RGB-D setups
RGB-D SLAM differs from passive sensing solutions in that it pairs the main visual camera with an active depth sensor. An active 3D sensor measures depth by illuminating the scene with a controlled light source and measuring the backscattered light. There are two main categories of active sensors:
Projected-light sensors, also known as structured light sensors or fringe projection scanners, combine a standard 2D camera with a light projector at a known baseline. The projector illuminates the scene with a known pattern, whose distortion on the surface is observed by the camera to infer the depth and shape of the observed surfaces.
Time-of-flight (ToF) sensors use the speed of light to measure the distance of surfaces. The scene is illuminated in either a pulsed or modulated fashion, and the time delay between light emission and light detection is used to compute the depth.
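The projected-light principle is geometrically the same triangulation as stereo, with the projector taking the place of the second camera: the pattern's lateral shift in the image plays the role of disparity. A minimal sketch, with hypothetical focal length, baseline, and reference-plane values:

```python
# Projected-light (structured light) ranging sketch. The camera and
# projector form a triangulation pair with a known baseline; depth is
# recovered from the pattern shift relative to a calibrated reference
# plane. All numeric values below are hypothetical.
FOCAL_PX = 600.0          # camera focal length, pixels
PROJ_BASELINE_M = 0.08    # camera-projector baseline, meters
REF_PLANE_M = 2.0         # depth of the calibration reference plane, meters

def depth_from_shift(shift_px):
    """Depth from the observed pattern shift (pixels) vs. the reference plane.

    Derived from the triangulation relation d = f*B/Z: the shift is the
    change in disparity between the surface and the reference plane.
    """
    ref_disp = FOCAL_PX * PROJ_BASELINE_M / REF_PLANE_M
    return FOCAL_PX * PROJ_BASELINE_M / (ref_disp + shift_px)

# Zero shift means the surface lies exactly on the reference plane:
print(depth_from_shift(0.0))  # 2.0
```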
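For a pulsed ToF sensor, the depth computation described above reduces to depth = c·t/2, since the emitted light travels to the surface and back. A minimal sketch with a hypothetical delay value:

```python
# Pulsed time-of-flight ranging sketch: depth = c * t / 2, where t is the
# measured round-trip delay of the light pulse. Delay value is hypothetical.
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_depth(round_trip_s):
    """Depth in meters from the measured round-trip delay in seconds."""
    return C * round_trip_s / 2.0

# A 20 ns round trip corresponds to roughly 3 m:
print(round(tof_depth(20e-9), 3))  # 2.998
```

Modulated (continuous-wave) ToF sensors recover the same quantity indirectly, from the phase shift between the emitted and received signals rather than a raw time delay.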