NeRF and Gaussian Splatting for Robotics and Mapping
08.14.2024
Neural radiance fields, better known as NeRFs, and Gaussian Splatting have recently captured the attention of many in the tech community. Both are methods for building and representing environments for the purpose of Novel View Synthesis (NVS): given a set of images of a scene from multiple known viewpoints, the task is to realistically render new synthetic images from angles not included in the original data. The introduction of NeRF was a significant advancement, and NeRF and Gaussian Splatting now form a spectrum of techniques with different tradeoffs, opening up new possibilities for 3D visualization and asset creation.
Beyond their visual appeal, NeRFs and Gaussian Splatting offer a way to tackle some real challenges.
AI models require a significant amount of training data. While many models can be trained using publicly available data, robotics presents the unique challenge of needing data specific to the platform and use-case. For example, an object detection model trained to recognize people from images typically captured at head height may not perform well on a small mobile robot with a camera mounted close to the ground. To adapt the model to this new camera height, additional training data specific to this perspective is required for fine-tuning.
When we render an image using Novel View Synthesis, we have precise knowledge of the camera position, providing us with highly accurate ground-truth data. Simulation is a popular solution in robotics to address the data scarcity issue, but it introduces the additional challenge of sim-to-real transfer. If the simulation data is too idealized and not sufficiently realistic, models that were trained on this data may struggle to generalize to real-world scenarios. NeRF and Gaussian Splatting offer a way to bridge this realism gap, producing more lifelike synthetic images compared to traditional rendering pipelines.
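To illustrate how known camera poses turn rendered views into ground truth: if we also know the 3D geometry of an object in the scene, its 2D annotation can be derived automatically by projecting that geometry into the rendered image. The sketch below uses hypothetical values for the intrinsics `K` and camera pose `T_cw`; it projects the corners of a 1 m cube and takes their bounding rectangle as a free 2D detection label.

```python
import numpy as np

def project_points(points_w, T_cw, K):
    """Project 3D world points into an image given a camera pose and intrinsics."""
    # Transform world points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]
    # Pinhole projection (assumes all points lie in front of the camera).
    uv = (K @ pts_c.T).T
    return uv[:, :2] / uv[:, 2:3]

# Hypothetical setup: a 640x480 camera at the world origin, looking down +Z,
# and a 1 m cube centred 4 m in front of it.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
T_cw = np.eye(4)
corners = np.array([[x, y, 4.0 + z]
                    for x in (-0.5, 0.5)
                    for y in (-0.5, 0.5)
                    for z in (-0.5, 0.5)])
uv = project_points(corners, T_cw, K)
# The 2D ground-truth box is the projected corners' axis-aligned bounding rectangle.
bbox_min, bbox_max = uv.min(axis=0), uv.max(axis=0)
```

Because the pose is known exactly, the resulting label has no annotation noise, which is precisely what makes NVS-rendered images attractive for fine-tuning.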
NeRF and Gaussian Splatting can also benefit from visual and lidar mapping software. As input, both methods require precise poses for the training images. A popular choice is the Structure-from-Motion tool COLMAP, but the optimized trajectory poses from our mapping tool can serve as an excellent alternative. Additionally, the lidar point cloud can be leveraged to initialize the Gaussian Splatting optimization, yielding significantly better results.
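The point-cloud initialization mentioned above can be sketched roughly as follows: seed one Gaussian per lidar point, with its mean at the point and an isotropic scale derived from the distance to its nearest neighbours, so sparse regions get larger splats. This is a simplified, hypothetical helper; real Gaussian Splatting pipelines additionally optimize rotations, opacities, and spherical-harmonic color coefficients, and would use a KD-tree rather than brute-force distances.

```python
import numpy as np

def init_gaussians_from_points(points, colors, k=3):
    """Seed one Gaussian per lidar point (simplified 3DGS-style initialization).

    Means come from the points themselves; each Gaussian's isotropic scale is
    the mean distance to its k nearest neighbours.
    """
    n = len(points)
    # Brute-force pairwise squared distances (fine for a sketch; use a KD-tree at scale).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself
    knn_dists = np.sqrt(np.sort(d2, axis=1)[:, :k])
    scales = knn_dists.mean(axis=1, keepdims=True).repeat(3, axis=1)
    return {
        "means": points.astype(np.float32),
        "scales": scales.astype(np.float32),
        "rotations": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),  # identity quaternions
        "opacities": np.full((n, 1), 0.1, dtype=np.float32),
        "colors": colors.astype(np.float32),
    }
```

Starting the optimization from geometry the lidar has already measured, rather than from random points, gives the splats far less structure to discover from scratch.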
As we explore different ways to enhance our maps with more semantic information, NVS can also play a role in bridging between 2D and 3D. While several perception models exist that work directly on 3D data, 2D models remain more mature, and we want to be able to leverage them as well. Not only can we use NVS to generate images for running 2D models directly, but we can also use known relationships between 2D and 3D to help supervise the training of 3D models.
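One concrete form of that 2D-to-3D bridge: with a known camera pose, predictions from a 2D segmentation model can be transferred onto 3D map points by projecting each point into the image and sampling the label map. The helper below is a hypothetical sketch under simplifying assumptions; a real pipeline would also check depth for occlusion and fuse labels across many views.

```python
import numpy as np

def label_points_from_2d(points_w, seg_map, T_cw, K):
    """Transfer 2D semantic labels onto 3D points via a known camera pose.

    Projects each world point into the image and samples the segmentation
    mask; points behind the camera or outside the frame get label -1.
    (Sketch only: no occlusion handling or multi-view fusion.)
    """
    h, w = seg_map.shape
    pts_h = np.hstack([points_w, np.ones((len(points_w), 1))])
    pts_c = (T_cw @ pts_h.T).T[:, :3]
    labels = np.full(len(points_w), -1, dtype=np.int64)
    in_front = pts_c[:, 2] > 0
    uv = (K @ pts_c[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = seg_map[uv[valid, 1], uv[valid, 0]]
    return labels
```

Run in the other direction, the same projection relationship can supervise a 3D model: render a view, run a mature 2D model on it, and treat the lifted labels as training targets.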
NeRF and Gaussian Splatting represent significant advancements in Novel View Synthesis, unlocking new possibilities in this space. Progress in this area is still very active, with many researchers publishing improvements to both methods and developing hybrid approaches that leverage their complementary strengths. These innovations are making NeRF-based approaches faster and Gaussian Splatting approaches more compressed and scalable to large environments. At Kudan, we are excited by these developments and continue to monitor advancements in this space to understand their potential impact on mapping and perception.
For more details, please contact us here.