Ant Group Releases LingBot-Depth: A 2.7 TB RGB-D Dataset for Robotics

Ant Group has published LingBot-Depth on Hugging Face, a massive 2.7 TB dataset featuring over 3 million RGB-D examples for advancing spatial perception in embodied AI.

by HowAIWorks Team
Tags: Ant Group, LingBot-Depth, Hugging Face, RGB-D Dataset, Robotics, Embodied AI, Computer Vision, VLA

Introduction

In a significant move for the open-source robotics community, Ant Group has officially released LingBot-Depth on Hugging Face. This massive RGB-D dataset, totaling over 2.7 TB, represents one of the most comprehensive resources available for training spatial perception models in the field of Embodied AI.

As robots move from controlled environments to complex, real-world settings, the ability to accurately perceive depth and 3D geometry becomes critical. LingBot-Depth aims to accelerate research in this area by providing a high-quality foundation for Masked Depth Modeling (MDM), a technique that allows AI to reconstruct missing or noisy depth information using visual cues.

A Massive Scale for Modern AI

The LingBot-Depth dataset is notable not just for its quality, but for its sheer scale and diversity. With over 3 million examples, it provides the necessary data density to train robust foundation models that can generalize across different hardware and environments.

Key statistics of the release include:

  • Total Size: 2.71 TB of compressed data.
  • Data Pairs: Over 3 million triplets, each consisting of an RGB image, raw sensor depth, and a high-precision ground-truth depth map.
  • Format: High-resolution images and 16-bit PNG depth maps stored in millimeters (mm).

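Since the depth maps are stored as 16-bit PNGs with values in millimeters, a small conversion step is typically needed before feeding them to a model. The sketch below shows one way to do this in Python, assuming (as is common for this encoding) that a pixel value of 0 marks a missing sensor return; the filename and helper names are illustrative, not part of the dataset's official tooling.

```python
import numpy as np
from PIL import Image

def depth_mm_to_meters(depth_mm):
    """Convert a uint16 depth map in millimeters to float32 meters.

    Zero-valued pixels (no sensor return) are mapped to NaN so they
    can be excluded from losses and metrics downstream.
    """
    depth_m = depth_mm.astype(np.float32) / 1000.0
    depth_m[depth_mm == 0] = np.nan
    return depth_m

def load_depth_m(path):
    # Read the 16-bit PNG (values in millimeters, per the dataset
    # description) and convert to meters.
    depth_mm = np.asarray(Image.open(path), dtype=np.uint16)
    return depth_mm_to_meters(depth_mm)
```

A call like `load_depth_m("depth.png")` then yields a float array in meters, with invalid pixels marked as NaN.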
Multi-Modal Data Composition

To ensure the dataset is useful for a wide range of applications, the Robbyant team structured it into three distinct sub-datasets:

  • RobbyReal: This section contains data from a variety of industrial depth sensors, including the Orbbec Gemini 335 and Intel RealSense D400 series. It captures the intricacies of real-world physics, including challenging surfaces like mirrors and glass.
  • RobbyVla: Dedicated to Vision-Language-Action (VLA) robotics, this portion was collected during manipulation tasks using Franka and UR7e robot arms. It is particularly valuable for researchers working on fine-motor control and interactive perception.
  • RobbySim: High-fidelity simulations provide perfectly labeled data for edge cases that are difficult to capture in reality, ensuring that models trained on LingBot-Depth are resilient to rare environmental conditions.

Advancing Spatial Perception

The release of this dataset accompanies the paper "Masked Depth Modeling for Spatial Perception," which introduces the MDM framework. By providing raw depth data alongside ground truth, Ant Group allows researchers to benchmark how well their models can "clean up" the noisy output of consumer-grade sensors.

This is a critical step for Embodied AI, as it reduces the reliance on expensive, high-end LiDAR systems and allows more affordable robots to navigate and interact with the world using standard RGB-D cameras.
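To make the idea concrete, the input-corruption step behind Masked Depth Modeling can be sketched as hiding random square patches of a depth map, which the model must then reconstruct from the RGB image and the surviving depth. The patch size, mask ratio, and function below are illustrative defaults, not values or code from the paper.

```python
import numpy as np

def mask_depth_patches(depth, patch=16, mask_ratio=0.5, rng=None):
    """Zero out random square patches of a depth map.

    Mimics the corruption step of Masked Depth Modeling: a model sees
    the RGB image plus the surviving depth patches and is trained to
    reconstruct the hidden regions. Parameters here are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = depth.shape
    assert h % patch == 0 and w % patch == 0, "dims must divide patch size"
    gh, gw = h // patch, w // patch
    # Choose which patches to hide, then expand the patch-level mask
    # to pixel resolution.
    idx = rng.choice(gh * gw, size=int(gh * gw * mask_ratio), replace=False)
    grid = np.zeros(gh * gw, dtype=np.uint8)
    grid[idx] = 1
    pixel_mask = np.kron(grid.reshape(gh, gw),
                         np.ones((patch, patch), dtype=np.uint8)).astype(bool)
    masked = depth.copy()
    masked[pixel_mask] = 0.0  # 0 = "missing", matching real sensor dropouts
    return masked, pixel_mask
```

Training a reconstruction model on such masked inputs, with the dataset's high-precision ground truth as the target, is what lets it later "clean up" the holes and noise of consumer-grade sensors.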

Conclusion

The release of LingBot-Depth on Hugging Face is a milestone for open robotics research. By democratizing access to 2.7 TB of high-quality spatial data, Ant Group is providing the community with the "fuel" needed to power the next generation of intelligent, perceptive robots.

Whether you are working on autonomous navigation, robotic manipulation, or foundational vision models, LingBot-Depth offers a robust and versatile starting point for your research.

Frequently Asked Questions

What is LingBot-Depth?
LingBot-Depth is a large-scale RGB-D dataset released by Ant Group's Robbyant team. It contains over 3 million examples and 2.7 TB of data, including real-world captures, simulations, and robotics manipulation data.

What are the dataset's main components?
The dataset includes three main components: RobbyReal (real-world sensor data), RobbyVla (data from VLA robotics tasks), and RobbySim (high-fidelity simulated environments).

What is the dataset designed for?
It is designed for training and benchmarking spatial perception models, specifically those using Masked Depth Modeling (MDM) to improve depth estimation accuracy from noisy sensor data.

Which depth sensors were used to collect the data?
The dataset features data from multiple industrial-grade sensors, including the Orbbec Gemini 335 and various Intel RealSense models (D415, D435, D455).
