Leveraging unlabeled or weakly labeled egocentric video via self-supervised objectives
Investigate whether, and how, incorporating weakly labeled or unlabeled egocentric human video through self-supervised objectives during pretraining improves the performance and generalization of the EgoScale flow-based Vision–Language–Action policy for dexterous manipulation.
References
Looking forward, several directions remain open. As egocentric human data continues to grow, incorporating weaker or unlabeled video via self-supervised objectives may further amplify these benefits.
— EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data
(2602.16710 - Zheng et al., 18 Feb 2026) in Conclusion