Introduction
In a significant move for the open-source robotics community, Ant Group has officially released LingBot-Depth on Hugging Face. This massive RGB-D dataset, totaling over 2.7 TB, represents one of the most comprehensive resources available for training spatial perception models in the field of Embodied AI.
As robots move from controlled environments to complex, real-world settings, the ability to accurately perceive depth and 3D geometry becomes critical. LingBot-Depth aims to accelerate research in this area by providing a high-quality foundation for Masked Depth Modeling (MDM), a technique in which models learn to reconstruct missing or noisy depth information from visual cues.
A Massive Scale for Modern AI
The LingBot-Depth dataset is notable not just for its quality, but for its sheer scale and diversity. With over 3 million examples, it provides the necessary data density to train robust foundation models that can generalize across different hardware and environments.
Key statistics of the release include:
- Total Size: 2.71 TB of compressed data.
- Data Pairs: Over 3 million triplets consisting of RGB images, raw sensor depth, and high-precision ground truth.
- Format: High-resolution RGB images paired with 16-bit PNG depth maps, with depth values encoded in millimeters (mm).
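Given the stated format, decoding a depth map is straightforward: interpret the 16-bit values as millimeters and convert to meters, treating zeros as missing readings. A minimal sketch (the synthetic array stands in for a real PNG payload, which you would typically load with a library such as Pillow; the zero-means-invalid convention is a common practice, not confirmed by the dataset card):

```python
import numpy as np

# Synthetic stand-in for a 16-bit PNG depth map (values in millimeters).
# In practice: depth_mm = np.array(Image.open("depth.png"), dtype=np.uint16)
depth_mm = np.array(
    [[0, 512, 1024],
     [2048, 0, 4096]],
    dtype=np.uint16,
)

# Convert millimeters -> meters for downstream geometry.
depth_m = depth_mm.astype(np.float32) / 1000.0

# Zeros commonly mark sensor dropouts (assumed convention here).
valid = depth_mm > 0

mean_valid_depth = depth_m[valid].mean()
```

Keeping the raw map in `uint16` until the final conversion avoids precision loss, and carrying the validity mask alongside the depth lets later stages ignore holes rather than averaging them in as zero depth.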
Multi-Modal Data Composition
To ensure the dataset is useful for a wide range of applications, the Robbyant team structured it into three distinct sub-datasets:
- RobbyReal: This section contains data from a variety of industrial depth sensors, including the Orbbec Gemini 335 and Intel RealSense D400 series. It captures the intricacies of real-world physics, including challenging surfaces like mirrors and glass.
- RobbyVla: Dedicated to Vision-Language-Action (VLA) robotics, this portion was collected during manipulation tasks using Franka and UR7e robot arms. It is particularly valuable for researchers working on fine-motor control and interactive perception.
- RobbySim: High-fidelity simulations provide perfectly labeled data for edge cases that are difficult to capture in reality, ensuring that models trained on LingBot-Depth are resilient to rare environmental conditions.
Advancing Spatial Perception
The release of this dataset accompanies the paper "Masked Depth Modeling for Spatial Perception," which introduces the MDM framework. By providing raw depth data alongside ground truth, Ant Group allows researchers to benchmark how well their models can "clean up" the noisy output of consumer-grade sensors.
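The masked-reconstruction idea can be sketched in a few lines: hide patches of the raw sensor depth, have a model fill them in, and score the result against ground truth on the masked region only. The NumPy illustration below is a toy stand-in, not the paper's method; the patch size, mask ratio, and mean-fill "model" are all illustrative placeholders (a real MDM network would predict the masked depth from the RGB image):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth depth and a noisy raw-sensor version of it (meters).
gt_depth = rng.uniform(0.5, 3.0, size=(64, 64)).astype(np.float32)
raw_depth = (gt_depth + rng.normal(0.0, 0.01, size=gt_depth.shape)).astype(np.float32)

# Randomly mask square patches of the raw depth (sizes are illustrative).
patch, ratio = 8, 0.5
mask = np.zeros(gt_depth.shape, dtype=bool)
for i in range(0, 64, patch):
    for j in range(0, 64, patch):
        if rng.random() < ratio:
            mask[i:i + patch, j:j + patch] = True

# Zero out the masked patches to form the model input.
masked_input = np.where(mask, 0.0, raw_depth)

# Placeholder "model": fill masked patches with the mean of visible depth.
prediction = masked_input.copy()
prediction[mask] = masked_input[~mask].mean()

# Reconstruction loss, evaluated only where depth was hidden.
l1_loss = np.abs(prediction[mask] - gt_depth[mask]).mean()
```

The same masked-region loss is what makes the raw-plus-ground-truth pairing in LingBot-Depth useful as a benchmark: the model is judged precisely on the regions where the sensor gave it nothing.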
This is a critical step for Embodied AI, as it reduces the reliance on expensive, high-end LiDAR systems and allows more affordable robots to navigate and interact with the world using standard RGB-D cameras.
Conclusion
The release of LingBot-Depth on Hugging Face is a milestone for open robotics research. By democratizing access to 2.7 TB of high-quality spatial data, Ant Group is providing the community with the "fuel" needed to power the next generation of intelligent, perceptive robots.
Whether you are working on autonomous navigation, robotic manipulation, or foundational vision models, LingBot-Depth offers a robust and versatile starting point for your research.