First-Person View · Real-World Video Dataset
This dataset is recorded by professional practitioners performing their actual jobs; for example, coffee scenes are recorded by real baristas working at their stations. The first-person operation footage follows complex real-world tasks from start to finish, so the data has long-range temporal dependencies and every task is shown through to completion. The first-person real-world roaming footage likewise shows the full process of spatial exploration and path planning.
The first-person hand-operation footage covers thousands of professional operation types, including farm work, factory work, kitchen work, auto repair, hairdressing, craftsmanship, pet grooming, and more. The first-person roaming footage covers diverse scenarios such as urban streets, natural landscapes, and warehouse spaces. This breadth is intended to help models move beyond scene-specific limitations and generalize far better.
The dataset is recorded entirely in real-world settings and focuses on first-person hand-operation and roaming videos from professionals in specific jobs.
Every frame is captured from a real scene, which helps models learn physical dynamics and the logic of how operations are carried out.
Two flagship products covering complete data solutions for hand operations and spatial roaming
Providing core data assets for world model and embodied intelligence pretraining
First-Person Long-Range Complex Hand Operation
A first-person dataset of hand operations during long-range, complex tasks. It captures human hand movements and hand-object interactions as complex tasks are carried out, providing high-quality training data for robotics and embodied intelligence.
First-Person Real-World Roaming
A first-person dataset of people moving through real-world environments. It collects spatial-perception data from human activity across a wide range of indoor and outdoor scenes, providing authentic data for spatial intelligence and world models.
From raw data collection to finished data product delivery
The pipeline relies on close cooperation between algorithms and human experts, using each where it is strongest to achieve both efficiency and accuracy.
Sample data: three examples each from Operative Stream (OS, hand operation) and Real Roam (RR, roaming)
Operative Stream (OS)
Text annotation (optional): This video shows a person carefully arranging and wrapping a fresh bouquet made up mainly of orange-yellow roses in a flower shop.
Ego Lens Echo Dataset basic storage format: MP4 file. Data parameters: 1080p, 30 fps.
Text annotation (optional): This video shows a chef preparing fish soup, including slicing the fish, blanching the ingredients, and placing them in a bowl with side dishes.
Text annotation (optional): This video shows an auto mechanic performing an oil change on a vehicle in the workshop, including draining the oil by loosening the drain bolt under the chassis and removing and cleaning the oil filter.
Real Roam (RR)
Text annotation (optional): This video shows a first-person walk through a narrow alley in an old residential area, with weathered brick walls, low single-story buildings, and parked electric bikes visible along the way.
Text annotation (optional): This video shows a first-person view of cycling along a suburban road on a sunny day, with residential buildings, parked vehicles, and distant mountains visible along the way.
Text annotation (optional): This video shows a first-person view at a ski resort on a sunny day, including riding a magic carpet (conveyor lift) up the slope and skiing down a wide, smooth run, with mountains in the background.
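The sample clips are stated to be MP4 files at 1080p and 30 fps. A minimal sketch for checking a downloaded clip against those parameters follows, assuming FFmpeg's `ffprobe` is installed and that "1080p" means a frame height of 1080 pixels. The function names `probe_video` and `matches_spec` are hypothetical and are not part of any official dataset tooling.

```python
import json
import subprocess
import sys
from fractions import Fraction

# Assumption: "1080p" is read as a frame height of 1080 pixels; 30 fps as stated.
EXPECTED_HEIGHT = 1080
EXPECTED_FPS = 30


def probe_video(path: str) -> dict:
    """Return width, height and r_frame_rate of the first video stream via ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate",
        "-of", "json", path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)["streams"][0]


def matches_spec(stream: dict, tolerance: float = 0.01) -> bool:
    """Check a probed stream against the stated 1080p / 30 fps parameters."""
    # ffprobe reports the frame rate as a ratio string such as "30/1".
    fps = float(Fraction(stream["r_frame_rate"]))
    return stream["height"] == EXPECTED_HEIGHT and abs(fps - EXPECTED_FPS) <= tolerance


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_clip.py clip.mp4
    info = probe_video(sys.argv[1])
    print(info, "matches spec:", matches_spec(info))
```

The default tolerance is strict, so a clip encoded at 29.97 fps (`30000/1001`) is reported as not matching. Pass a larger `tolerance` if such clips should be accepted.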