I'm a PhD student (2019-) at CMU Robotics, advised by Prof. Deva Ramanan. My research focuses on computer vision and machine learning; I'm particularly interested in learning structure (e.g., 3D, motion, physics) from partial observations (e.g., monocular videos).
I obtained an MS degree from CMU Robotics and a BEng degree from Xi'an Jiaotong University. I've also had the opportunity to intern at Meta AI, Google Research, Argo AI, TuSimple, and SVCL at UC San Diego. Our research proposal won a 2021 Qualcomm Innovation Fellowship.
GitHub / Google Scholar / Email / Resume
Given monocular videos of an articulated object, PPR builds 3D models of the object and the background scene whose physical configurations (relative scales, contacts, and control torque profiles) satisfy dynamics and contact constraints.
RAC learns category-level deformable 3D models from monocular videos. It disentangles morphology and motion and allows for motion retargeting.
Given casual videos capturing a deformable object, BANMo reconstructs an animatable 3D model in a differentiable volume rendering framework.
Given a long video or multiple short videos, ViSER jointly optimizes articulated 3D shapes and a pixel-surface embedding to establish dense correspondences over video frames.
Given several (8-16) unposed images of the same instance, NeRS optimizes a textured 3D reconstruction along with illumination parameters at test time.
A template-free approach for articulated shape reconstruction from a single video by combining differentiable rendering and data-driven correspondence and segmentation priors.
We propose a neural architecture powered by geometric reasoning that decomposes two frames into a rigid background and multiple moving rigid bodies, parameterized by 3D rigid transformations and depth.
We describe a neural architecture to upgrade 2D optical flow to 3D scene flow using optical expansion, which reveals changes in depth of scene elements over frames, e.g., things moving closer will get bigger.
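The geometry behind this can be sketched in a few lines, assuming a calibrated pinhole camera and treating optical expansion as the per-pixel ratio of depths between frames; the function and variable names below are illustrative, not taken from the paper:

```python
import numpy as np

def sceneflow_from_expansion(p1, flow, tau, Z1, K):
    """Upgrade 2D optical flow to 3D scene flow using optical expansion.

    p1:   (N, 2) pixel coordinates in frame 1
    flow: (N, 2) optical flow vectors (frame 1 -> frame 2)
    tau:  (N,) optical expansion (local scale change); tau ~ Z1 / Z2
    Z1:   (N,) per-pixel depth in frame 1
    K:    (3, 3) camera intrinsics
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Back-project frame-1 pixels to 3D points.
    X1 = np.stack([(p1[:, 0] - cx) / fx * Z1,
                   (p1[:, 1] - cy) / fy * Z1,
                   Z1], axis=-1)

    # Expansion reveals the change in depth: things moving closer get bigger,
    # so a patch that doubles in size has roughly halved its depth.
    Z2 = Z1 / tau
    p2 = p1 + flow
    X2 = np.stack([(p2[:, 0] - cx) / fx * Z2,
                   (p2[:, 1] - cy) / fy * Z2,
                   Z2], axis=-1)

    return X2 - X1  # per-pixel 3D scene flow
```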
We introduce several simple modifications to optical flow volumetric layers that 1) significantly reduce computation and parameters, 2) enable test-time adaptation of the cost volume size, and 3) converge much faster.
To address the problem of real-time stereo matching on high-resolution imagery, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy.
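The incremental search idea can be illustrated with a toy 1-D scanline matcher using absolute-difference costs (a minimal sketch of the coarse-to-fine principle, not the actual network): the full disparity range is searched only at the coarsest level, and each finer level refines the upsampled hypothesis within a small window.

```python
import numpy as np

def match_scanline(left, right, center, radius):
    """Per-pixel 1D matching within +/- radius of a per-pixel disparity hypothesis."""
    n = left.shape[0]
    best = center.copy()
    best_cost = np.full(n, np.inf)
    for off in range(-radius, radius + 1):
        d = np.clip(center + off, 0, n - 1)
        # Absolute-difference cost of matching left[x] to right[x - d].
        cost = np.abs(left - right[np.clip(np.arange(n) - d, 0, n - 1)])
        upd = cost < best_cost
        best[upd] = d[upd]
        best_cost[upd] = cost[upd]
    return best

def coarse_to_fine(left, right, max_disp, radius=2):
    """Exhaustive search only at the coarsest level; small refinements after."""
    if max_disp <= radius:
        # Coarsest level: the +/- radius window already covers [0, max_disp].
        return match_scanline(left, right, np.zeros(left.shape[0], dtype=int), radius)
    # Recurse on half-resolution scanlines with half the disparity range,
    # then upsample the coarse disparities as the hypothesis to refine.
    d_coarse = coarse_to_fine(left[::2], right[::2], max_disp // 2, radius)
    center = 2 * np.repeat(d_coarse, 2)[: left.shape[0]]
    return match_scanline(left, right, center, radius)
```

Each level only evaluates 2 * radius + 1 candidates per pixel, so the total work stays near-linear in image size instead of scaling with the full disparity range.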
We recast the continuous problem of depth regression as discrete binary classification, whose output is a set of occupancy probabilities on a 3D voxel grid. This output reliably and efficiently captures multi-modal depth distributions in ambiguous cases.
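A toy example shows why per-voxel occupancy helps in ambiguous cases (the discretization below is a one-hot nearest-bin sketch under invented numbers, not the paper's training setup): along one pixel's ray, classification keeps both depth modes, whereas regressing a single value averages them.

```python
import numpy as np

def depth_to_occupancy(depth, bins):
    """Discretize a continuous depth into binary occupancy labels over a
    1-D grid of depth bins (one column of the 3D voxel grid)."""
    labels = np.zeros(len(bins))
    labels[np.argmin(np.abs(bins - depth))] = 1.0
    return labels

# A pixel on an object boundary is ambiguous: it may belong to the
# foreground at 2 m or the background at 8 m, with equal probability.
bins = np.linspace(1.0, 10.0, 10)
occupancy = 0.5 * depth_to_occupancy(2.0, bins) + 0.5 * depth_to_occupancy(8.0, bins)

# Per-voxel occupancy preserves both modes of the depth distribution ...
modes = bins[occupancy > 0.25]  # -> depths near 2 m and 8 m
# ... while regressing the expectation answers ~5 m, a depth nothing occupies.
mean_depth = (occupancy * bins).sum() / occupancy.sum()
```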