Towards Next-Gen Autonomous Mobile Robotics: A Technical Deep Dive into Visual-Data-Driven AMRs Powered by Kudan Visual SLAM and NVIDIA Isaac Perceptor
03.18.2025
1. Introduction
Autonomous Mobile Robots (AMRs) have rapidly transformed industries by automating complex, repetitive, and hazardous tasks. Initially guided by basic sensors and pre-defined paths, today’s AMRs are evolving into sophisticated visual-data-driven systems, capable of interpreting their environment in real time through advanced 3D vision and AI-powered perception technologies.
At the forefront of this revolution are integrated solutions combining precise positioning, robust spatial awareness, and real-time data processing. Recognizing these emerging demands, we have brought together two cutting-edge technologies: Kudan Visual SLAM, renowned for its accuracy, robustness and adaptability in challenging real-world environments, and NVIDIA Isaac Perceptor, built on NVIDIA Isaac ROS, a collection of AI-driven workflows providing comprehensive 3D perception capabilities.
This integration empowers AMRs to leverage richer inputs from 3D cameras, enhancing their ability to perceive, understand, and navigate complex, unstructured, and dynamic environments. By combining Kudan’s Visual SLAM, which delivers robust and precise localization and mapping, with NVIDIA Isaac Perceptor’s powerful multi-camera 3D surround vision, robots gain unprecedented awareness and adaptability, pushing the boundaries of what’s possible in industrial automation, logistics, and beyond.
In this deep dive, we’ll explore how the integration of Kudan Visual SLAM and NVIDIA Isaac Perceptor is setting new performance standards for AMR perception, and driving the next leap in the evolution of autonomous robotics.
Key highlights of this blog:
- Advantages of 3D cameras over 2D LiDAR
- Software architecture and key modules
- Performance evaluation in industrial settings
- Benefits: cost efficiency, enhanced mapping, superior obstacle detection, precise localization
- Future innovations
2. Integrating KdVisual with Isaac Perceptor
Traditional 2D LiDAR-based systems, though effective, have significant limitations: high sensor cost, an inability to detect obstacles above or below the scan plane, difficulty with dynamic elements, and limited adaptability to environmental changes. Addressing these constraints, the integration of Kudan Visual SLAM with NVIDIA Isaac Perceptor offers a revolutionary solution by leveraging 3D cameras for comprehensive spatial awareness.
With Isaac Perceptor, AMRs can generate accurate global occupancy maps and local 3D obstacle maps from 3D camera data, eliminating the dependency on costly LiDAR hardware. When combined with Kudan’s Visual SLAM technology, which excels in precise tracking and robust mapping, this approach delivers significant advantages:
- Cost Efficiency: Utilizing 3D cameras dramatically reduces hardware costs compared to LiDAR sensors, offering comparable or superior performance.
- Expanded Coverage: 3D cameras enable comprehensive perception and detection of both ground-level and overhead obstacles, providing thorough environmental understanding and enhancing operational safety and collision avoidance performance.
- Dual Functionality: Cameras used for localization simultaneously perform obstacle detection, eliminating redundant sensor hardware and further reducing overall system costs.
- Dynamic Obstacle Resilience: Operating within visual space provides inherent resilience against dynamic environments. Additionally, integrating UNet-based segmentation masks allows dynamic obstacles to be efficiently filtered during map generation and tracking phases, resulting in cleaner maps and more reliable localization.
- Enhanced Adaptability to Complex Environments: Unlike traditional 2D LiDAR-based methods, which face challenges in featureless areas like long corridors and dynamic environments with frequent scenery changes, 3D camera-based Visual SLAM leverages both appearance-based and geometric cues. This enables reliable localization even in low-texture spaces, open areas with minimal structural features, and environments undergoing continuous visual transformations.
Further enhancing the integration, NVIDIA’s Nvblox package is leveraged to produce both global and local costmaps directly from depth images. These costmaps facilitate smooth, reliable navigation with the ROS 2 Nav2 stack, optimizing safety and operational efficiency.
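As an illustration, the snippet below sketches how a Nav2 costmap could be pointed at the Nvblox distance-map slice. This is a minimal sketch, not the production configuration: the plugin name `nvblox::NvbloxCostmapLayer` and the slice topic follow the publicly documented nvblox_nav2 examples and should be verified against your installed versions.

```python
# Hedged sketch: generating Nav2 costmap parameters that consume the Nvblox
# map slice. Plugin and topic names are assumptions based on the public
# nvblox_nav2 examples; adapt them to your setup.
import yaml

nav2_params = {
    'local_costmap': {'local_costmap': {'ros__parameters': {
        'plugins': ['nvblox_layer', 'inflation_layer'],
        'nvblox_layer': {
            'plugin': 'nvblox::NvbloxCostmapLayer',              # assumed plugin name
            'nvblox_map_slice_topic': '/nvblox_node/map_slice',  # assumed topic name
        },
        'inflation_layer': {'plugin': 'nav2_costmap_2d::InflationLayer'},
    }}},
}

# Write the parameter file that a Nav2 bringup launch would load.
with open('nav2_nvblox_params.yaml', 'w') as f:
    yaml.safe_dump(nav2_params, f)
```

Once the Nvblox slice is exposed as a costmap layer, Nav2 treats it like any other layer, so standard inflation and planner behavior carry over unchanged.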
Through advanced sensor fusion and robust loop closure mechanisms, Kudan Visual SLAM harnesses Isaac Perceptor’s functionality to deliver a custom SLAM solution, ensuring precise and reliable localization. By effectively merging visual data-driven perception with cutting-edge AI techniques, this integration sets a new standard for AMR navigation, enabling safe, efficient, and cost-effective operations without reliance on expensive traditional sensors.
Key Takeaway: The integration of Kudan Visual SLAM and NVIDIA Isaac Perceptor leverages 3D cameras to overcome the limitations of traditional 2D LiDAR systems, offering cost-effective, robust, and adaptable navigation solutions for AMRs.
3. Software Architecture
The following diagram illustrates the overall software architecture of the integrated package, highlighting the interactions between Kudan Visual SLAM and Isaac Perceptor.
Key Software Modules and Roles:
- Stereo Images: Captured using stereo cameras.
- Kudan Visual SLAM: Provides precise positioning and orientation during both mapping and tracking.
- Isaac ROS ESS Depth Inference: Generates depth images from stereo camera inputs, which are utilized by Nvblox.
- UNet Segmentation Mask: Detects dynamic obstacles (primarily people), ensuring their removal from Nvblox costmaps and Kudan Visual SLAM maps for accurate localization.
- Isaac ROS Nvblox: Transforms depth images into meshes and costmaps for navigation.
- Nav2 (ROS 2 Navigation Stack): Enables the robot to navigate safely and efficiently based on the generated costmaps.
- Rviz: Provides visualization capabilities for maps and allows operators to monitor the robot during mapping and navigation.
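To make the wiring concrete, here is a hedged ROS 2 launch sketch of the module graph above. Package and plugin names for the Isaac ROS components follow their public repositories but should be treated as assumptions to verify; the Kudan Visual SLAM node is a placeholder, since its ROS 2 interface is proprietary.

```python
# Hedged launch sketch of the perception graph: stereo -> ESS depth,
# UNet segmentation, and Nvblox, plus a placeholder Kudan Visual SLAM node.
from launch import LaunchDescription
from launch_ros.actions import ComposableNodeContainer, Node
from launch_ros.descriptions import ComposableNode

def generate_launch_description():
    perception = ComposableNodeContainer(
        name='perception_container',
        namespace='',
        package='rclcpp_components',
        executable='component_container_mt',
        composable_node_descriptions=[
            # Stereo images -> depth via ESS (plugin name per the public
            # isaac_ros_ess package; verify against your installed version).
            ComposableNode(
                package='isaac_ros_ess',
                plugin='nvidia::isaac_ros::dnn_stereo_depth::ESSDisparityNode',
                parameters=[{'engine_file_path': '/path/to/ess.engine'}],  # placeholder path
            ),
            # Person segmentation via UNet (plugin name is an assumption).
            ComposableNode(
                package='isaac_ros_unet',
                plugin='nvidia::isaac_ros::unet::UNetDecoderNode',
            ),
            # Depth -> 3D reconstruction and costmaps (plugin name is an assumption).
            ComposableNode(
                package='nvblox_ros',
                plugin='nvblox::NvbloxNode',
            ),
        ],
    )
    # Hypothetical Kudan Visual SLAM node; package and executable names here
    # are placeholders only, as Kudan's ROS 2 wrapper is not public.
    kdvisual = Node(package='kdvisual_ros2', executable='kdvisual_ros2_node')
    return LaunchDescription([perception, kdvisual])
```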
Data Flow: Perception to Navigation
The system operates in two distinct phases:
Mapping Phase:
- Visual Map Creation: Kudan Visual SLAM generates a highly accurate visual SLAM map.
- Global Costmap Generation: Nvblox constructs a global costmap using depth images and trajectory from Kudan Visual SLAM.
- Dynamic Obstacle Filtering: UNet segmentation masks ensure dynamic obstacles are excluded from both visual and global maps, enhancing map quality.
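As a concrete illustration of the filtering step, the node below subscribes to a depth image and a UNet segmentation mask, invalidates pixels labeled as dynamic, and republishes the cleaned depth for Nvblox to integrate. It is a minimal sketch: the topic names and the convention that non-zero mask values denote dynamic classes are assumptions for this example.

```python
# Hedged sketch: remove dynamic obstacles (e.g. people) from the depth stream
# before map integration. Topic names below are assumptions; substitute the
# ESS depth and UNet mask topics used in your setup.
import rclpy
from rclpy.node import Node
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from message_filters import ApproximateTimeSynchronizer, Subscriber

class DepthMaskFilter(Node):
    """Zeroes out depth pixels labeled as dynamic before map integration."""

    def __init__(self):
        super().__init__('depth_mask_filter')
        self.bridge = CvBridge()
        self.pub = self.create_publisher(Image, '/depth_filtered', 10)
        depth_sub = Subscriber(self, Image, '/ess/depth')          # assumed topic name
        mask_sub = Subscriber(self, Image, '/unet/segmentation')   # assumed topic name
        self.sync = ApproximateTimeSynchronizer([depth_sub, mask_sub],
                                                queue_size=10, slop=0.05)
        self.sync.registerCallback(self.on_pair)

    def on_pair(self, depth_msg, mask_msg):
        depth = self.bridge.imgmsg_to_cv2(depth_msg)   # float32 depth in meters
        mask = self.bridge.imgmsg_to_cv2(mask_msg)     # per-pixel class ids
        filtered = depth.copy()                        # assumes mask and depth share resolution
        filtered[mask > 0] = 0.0   # assumption: non-zero ids are dynamic classes
        out = self.bridge.cv2_to_imgmsg(filtered, encoding='32FC1')
        out.header = depth_msg.header                  # keep stamp/frame for TF lookups
        self.pub.publish(out)

def main():
    rclpy.init()
    rclpy.spin(DepthMaskFilter())

if __name__ == '__main__':
    main()
```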
Tracking and Navigation Phase:
- Kudan Visual SLAM uses the previously created visual map for precise robot localization.
- Nvblox’s local 3D reconstruction and costmaps enable the ROS 2 Nav2 stack to navigate the robot safely and efficiently in complex environments.
- Dynamic obstacles continue to be filtered by segmentation masks, further improving the tracking reliability of Kudan Visual SLAM.
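For a sense of how this phase is driven in practice, the sketch below sends a navigation goal through Nav2 once localization (here provided by Kudan Visual SLAM) and the Nvblox costmaps are active. It uses the nav2_simple_commander helper; the frame and coordinates are illustrative only.

```python
# Minimal usage sketch: commanding Nav2 with a goal pose in the SLAM map frame.
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator

def main():
    rclpy.init()
    navigator = BasicNavigator()
    navigator.waitUntilNav2Active()        # blocks until Nav2 lifecycle nodes are ready

    goal = PoseStamped()
    goal.header.frame_id = 'map'           # the SLAM map frame
    goal.header.stamp = navigator.get_clock().now().to_msg()
    goal.pose.position.x = 5.0             # example goal: 5 m ahead in the map frame
    goal.pose.orientation.w = 1.0

    navigator.goToPose(goal)
    while not navigator.isTaskComplete():
        feedback = navigator.getFeedback()  # e.g. distance remaining, recovery count
    print(navigator.getResult())

if __name__ == '__main__':
    main()
```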
Integration Challenges and Solutions
One of the key challenges in this setup is ensuring perfect alignment and consistency between the visual map and the global costmap. To achieve this, a two-step mapping process was implemented:
- Optimized Visual Mapping with Kudan Visual SLAM: The process begins by generating a highly optimized and loop-closure-refined visual SLAM map using Kudan Visual SLAM. This ensures accurate and stable localization.
- Nvblox Map Generation from Aligned Trajectory: Instead of creating both maps independently, the trajectory derived from the Kudan Visual SLAM map is used as the basis for generating the Nvblox map from the same dataset. This approach effectively integrates the benefits of Kudan’s loop closure and optimization processes with Nvblox’s clean occupancy grid generation, ensuring consistency between both mapping layers.
By structuring the mapping phase this way, the system maximizes the strengths of both technologies, leading to highly accurate, stable, and consistent map representations for AMR navigation.
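A simplified sketch of the second step is shown below: the optimized trajectory exported by Kudan Visual SLAM is replayed as the pose source while the recorded depth frames are re-integrated into the occupancy map. The TUM trajectory format and the `integrate_depth` callback are assumptions standing in for the actual export format and the Nvblox integration call.

```python
# Hedged sketch of the two-step mapping idea: poses come from the loop-closed
# KdVisual trajectory rather than from an independent mapping run.
import numpy as np

def load_tum_trajectory(path):
    """TUM format: timestamp tx ty tz qx qy qz qw, one pose per line (assumed)."""
    data = np.loadtxt(path)
    return data[:, 0], data[:, 1:4], data[:, 4:8]    # stamps, positions, quaternions

def nearest_pose(stamp, stamps, positions, quats):
    i = int(np.argmin(np.abs(stamps - stamp)))       # nearest neighbor in time
    return positions[i], quats[i]

def build_map(depth_frames, traj_path, integrate_depth):
    """depth_frames: iterable of (stamp, depth_image) pairs from the same dataset.
    integrate_depth: hypothetical callback standing in for the map backend."""
    stamps, positions, quats = load_tum_trajectory(traj_path)
    for stamp, depth in depth_frames:
        p, q = nearest_pose(stamp, stamps, positions, quats)
        integrate_depth(depth, position=p, orientation=q)  # pose from optimized trajectory
```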
Key Takeaway: The software architecture integrates Kudan Visual SLAM and NVIDIA Isaac Perceptor through a two-phase approach, ensuring precise localization and efficient navigation while addressing challenges like map alignment.
4. Performance Evaluation & Key Benefits
In this section, we present the evaluation results, demonstrating the performance and advantages of the integrated system.
4.1 Evaluation Environment and System Setup
The evaluation was conducted in a 3,000-square-meter operational warehouse, featuring pallet racks and a dynamic industrial environment. The Nova Carter Robot was used as the test platform, equipped with:
- 4 Hawk stereo cameras for 3D vision-based navigation
- 2 2D LiDAR units for baseline comparison
- 1 Hesai XT-32 3D-LiDAR for ground truth mapping and tracking
- 1 Laser distance meter for ground truth tracking
The evaluation focused on four critical performance aspects:
- Mapping Quality – Accuracy in constructing a costmap to support navigation, obstacle avoidance, and path planning.
- Collision Avoidance Performance – Effectiveness in detecting and avoiding obstacles in real time.
- Tracking Accuracy – Precision in maintaining accurate localization while moving through the environment.
- Tracking Robustness – Stability and reliability of localization in dynamic, complex settings.
4.2 Evaluation Results
– Mapping Quality
The following image compares:
- Ground Truth Map: Built using 3D-LiDAR, including all detected static and dynamic obstacles.
- Nvblox Costmap: Generated exclusively from camera inputs, demonstrating accurate representation of occupied and free space.
Observations:
- The Nvblox costmap successfully replicates the warehouse environment using only camera-based perception, proving its viability as an alternative to LiDAR-based mapping.
- Dynamic obstacles were automatically filtered out of the Nvblox costmap using the UNet Segmentation Mask, improving navigation accuracy. In contrast, the ground truth map intentionally retains all dynamic elements for comparison.
- The entire map-building process was highly efficient and fully automated, requiring no manual intervention and ensuring scalability and deployment readiness.
One of the key benefits of using Nvblox for costmap generation is its ability to construct fully 3D representations of the environment. The video below illustrates how Nvblox is used to build rich, structured 3D maps, providing a clearer and more detailed perception compared to traditional 2D costmaps.
– Collision Avoidance Performance
To assess real-time obstacle detection and collision avoidance capabilities, we compared Nvblox-based 3D perception with traditional 2D-LiDAR.
During the test, a low-lying pallet, which was not present in the global costmap, was placed along the robot’s path.
Nvblox’s camera-based perception accurately detected the pallet in real time, dynamically updating the costmap and enabling the robot to adjust its trajectory and avoid a collision.
Traditional 2D LiDAR, however, failed to detect the pallet due to its scanning limitations at lower heights, causing the robot to collide with the obstacle.

Comparison of detection results for low-lying pallet between Nvblox and 2D-LiDAR
Similar tests were conducted with various obstacles, including human workers, boxes, loose cables, and low-reflectivity objects. In all cases, Nvblox exhibited equal or superior obstacle avoidance performance.
This evaluation highlights a critical advantage of camera-based 3D mapping: the ability to detect and react to unexpected obstacles across different heights, significantly enhancing navigation safety in dynamic and unstructured environments.
– Tracking Accuracy
To assess the localization accuracy of Kudan Visual SLAM, we conducted the following controlled tests, focusing on the system’s ability to maintain precise localization while navigating through different scenarios.
Test Scenarios
Two distinct navigation patterns were tested:
- Straight-Line Navigation – The robot moves in a 25-meter-long linear path to assess drift and consistency.
- Circular Navigation – The robot follows a 30-meter-long looped trajectory to assess drift and localization stability with turns.
Evaluation Metrics
The accuracy assessment was based on two key criteria:
- Goal-Point Accuracy: The precision in reaching pre-defined target locations.
- Trajectory Accuracy: The deviation of the estimated trajectory from the ground truth.
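For reference, trajectory accuracy of this kind is typically reported as absolute trajectory error (ATE) after rigid alignment to the ground-truth path. The sketch below shows a standard numpy implementation (the Kabsch/Umeyama closed form); it illustrates the metric rather than the exact evaluation scripts used here.

```python
# Hedged sketch: ATE against a 3D-LiDAR reference trajectory.
# Assumes est and ref are time-synchronized Nx3 arrays of positions.
import numpy as np

def align_rigid(est, ref):
    """Closed-form rigid alignment (Kabsch/Umeyama, no scale) of est onto ref."""
    mu_e, mu_r = est.mean(axis=0), ref.mean(axis=0)
    H = (est - mu_e).T @ (ref - mu_r)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_r - R @ mu_e
    return R, t

def ate_rmse(est, ref):
    """Root-mean-square position error after rigid alignment."""
    R, t = align_rigid(est, ref)
    err = (R @ est.T).T + t - ref
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))
```

Goal-point accuracy, by contrast, reduces to the norm of the difference between the reached position and the laser-measured goal position, aggregated over the repeated runs.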
Ground Truth Reference
To ensure reliable benchmarking, we used:
- 3D LiDAR-based localization (e.g., Kudan 3D-LiDAR SLAM) as the trajectory reference.
- Laser Distance Meters to validate goal-point accuracy.
Testing Methodology
To ensure statistical reliability, each test scenario was repeated 10 times under identical conditions.
During the evaluation, data was collected and analyzed to compare Kudan Visual SLAM’s localization performance against the ground truth measurements. Kudan Visual SLAM relied solely on natural environmental structures, without the use of any artificial markers (e.g., 2D markers) to aid localization.
Results
The following section presents the evaluation findings:
1. Goal-Point Accuracy:
2. Trajectory Accuracy:
Trajectory Accuracy – 3 sample results for Straight-Line Navigation
Trajectory Accuracy – 3 sample results for Circle Navigation
The evaluation results indicate that Kudan Visual SLAM provides tracking accuracy sufficient for AMR operations in industrial environments.
For applications requiring higher precision, such as docking scenarios, Kudan Visual SLAM can integrate 2D markers to enhance accuracy further, achieving sub-centimeter precision when necessary.
– Tracking Robustness
The final test assessed the tracking stability and robustness of Kudan Visual SLAM in comparison to 2D LiDAR-based localization under challenging conditions. The evaluation included four key scenarios:
- Structural Changes – Assessing the system’s adaptability when environmental structures are modified.
- Lighting Variations – Evaluating localization performance under different lighting conditions.
- Dynamic Objects – Testing resilience against moving obstacles in the environment.
- Recovery from Tracking Loss – Measuring the system’s ability to regain localization after temporary loss.
Structural Changes
Lighting Variations
Dynamic Objects
Recovery from Tracking Loss
These results demonstrate that Kudan Visual SLAM maintains strong, stable localization across all test scenarios, including dynamic environments. Such conditions are challenging for 2D LiDAR-based localization, whose single scan plane makes it less adaptable to environmental changes.
These findings highlight Kudan Visual SLAM’s superior adaptability and reliability, ensuring consistent performance beyond static, structured settings. This presents a key advantage for AMR deployment in real-world industrial environments, as it effectively operates using only natural environmental structures, without the need for artificial markers or additional localization aids.
4.3 Key Benefits
Based on the performance evaluation conducted, the integration of Kudan Visual SLAM and NVIDIA Isaac Perceptor provides cost-effective, high-precision, and adaptive navigation for AMRs in dynamic industrial environments.
In summary, the key benefits include:
- Cost Efficiency – Eliminates reliance on LiDARs, using 3D cameras for both localization and obstacle detection, reducing hardware and maintenance costs.
- Enhanced Mapping & Awareness – Generates accurate, real-time 3D costmaps with automated dynamic obstacle filtering, offering superior spatial understanding over traditional 2D maps.
- Superior Obstacle Detection & Collision Avoidance – Detects low-lying, overhead, and unexpected obstacles missed by 2D LiDAR, improving safety and operational reliability.
- High-Precision & Robust Localization – Maintains stable tracking even in long trajectories, dynamic environments, and lighting variations.
- Adaptability to Complex Environments – Handles structural changes, moving objects, and tracking recovery better than single-plane 2D LiDAR, ensuring cost-efficient and scalable real-world deployment.
By leveraging camera-based perception, this integration delivers scalable, cost-effective, and highly adaptable AMR navigation, proving a viable alternative to LiDAR-based systems for industrial automation.
Key Takeaway: The integrated system excels in mapping quality, collision avoidance, tracking accuracy, and robustness, offering a cost-effective and adaptable alternative to traditional LiDAR-based systems.
5. Future Innovations
By leveraging advanced deep learning techniques, AMRs can now extract richer, more detailed insights from visual data, going beyond simple mapping to truly understanding their surroundings. This shift from geometric perception to semantic comprehension enables robots to recognize objects, predict movement patterns, and filter out dynamic elements, retaining only stable, relevant information.
However, this is just the beginning of a much larger innovation journey. As AI technologies continue to evolve, particularly in image-based intelligence, AMRs will develop an even deeper contextual awareness. Through continuous learning from real-world interactions, they will dynamically adapt to unpredictable industrial environments, refining their decision-making and improving resilience over time. This will result in robots that don’t just follow pre-set paths but intelligently respond to their surroundings in real time.
The Rise of Contextually Aware AMRs
Future vision-powered AMRs will integrate multi-modal AI, combining computer vision with language processing to enhance comprehension. They won’t just detect and classify objects—they will understand their functional significance within a given task.
For example, an AMR won’t just identify machinery or furniture; it will infer the purpose of spaces based on their content, recognizing a room filled with tools as a workshop or an area with desks and monitors as an office. This deeper contextual interpretation elevates AMRs from simple automation tools to intelligent agents capable of autonomous decision-making and proactive problem-solving.
Expanding Recognition and Human Collaboration
Open-vocabulary recognition will further expand AMRs’ adaptability, allowing them to identify and interact with previously unseen objects and environments. By leveraging foundation models trained on diverse datasets, AMRs will generalize recognition beyond pre-programmed scenarios, ensuring robust, real-time adaptability in complex, evolving environments.
Additionally, integrating AMRs with sophisticated language models will enable seamless human-robot collaboration. These robots will interpret natural language instructions and clearly explain their decisions, fostering trust and cooperation between humans and machines. This intuitive interaction will streamline workflows, improve operational efficiency, and enhance workplace safety.
The Future of Vision-Based AMRs
These advancements are paving the way for AMRs that operate with unprecedented intelligence and autonomy. Vision-powered AMRs will no longer be passive observers—they will become proactive, adaptive, and collaborative entities, revolutionizing industrial environments with smarter, safer, and more contextually aware robotic solutions.

6. Conclusion
This deep dive into Visual-Data-Driven AMRs has showcased how the integration of Kudan Visual SLAM and NVIDIA Isaac Perceptor is redefining AMR navigation with cost-effective mapping, superior obstacle detection, and high-precision localization.
The results confirm that 3D camera-based perception is a scalable alternative to LiDAR, enhancing adaptability in dynamic environments.
As robotics continues to evolve, we invite AMR developers, researchers, and industry professionals to explore this integrated software package, leveraging its robust and high-performance localization and perception capabilities to enhance AMR efficiency and reliability.
Looking ahead, commercial deployments of the AMR integrating this software stack are on the horizon, demonstrating real-world applications at scale. Stay tuned for updates as we advance toward the next phase of vision-powered AMR adoption in industrial automation and beyond.