Action-Geometry Prediction with 3D Geometric Prior for Bimanual Manipulation

Nov 1, 2025
Chongyang Xu, Haipeng Li, Cheng Shen, Haoqiang Fan, Ziliang Feng, Shuaicheng Liu
Abstract
Building on pre-trained 3D geometric foundation models such as VGGT or pi3, this work unifies geometry-aware latents with semantic features to jointly predict future action sequences and 3D scene evolution, enabling robust, coordination-aware bimanual manipulation directly from RGB observations.
Type
Publication
Submitted to CVPR 2026
Chongyang Xu
Authors
🎓 Ph.D. Student @ Sichuan University
🔭 Embodied AI Intern @ Tongyi Robotics

Hello! 👋

I’m Chongyang, a researcher who’s into physical AI & robotics and equally passionate about sports, music, humanities, and sociology. I’m doing multimodal learning and reinforcement learning in the grandest simulator of all: life, one episode at a time, learning what’s worth the strife.

I’m openly seeking collaborations; if you have any research ideas or projects, feel free to reach out!

Education 🎓

I’ve been studying at Sichuan University for 7 years and have fallen deeply in love with Chengdu. I received my B.Eng. in Software Engineering and am now pursuing my Ph.D. in Computer Science.