Gengshan Yang

Research

I'm interested in 3D computer vision and its intersections with rendering, simulation, robotics, and animal behavior. My research topics include 3D/4D reconstruction, inverse problems (e.g., inverse rendering, physics and control), and motion generation.


Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos
Gengshan Yang, Andrea Bajcsy, Shunsuke Saito*, Angjoo Kanazawa*
ICLR, 2025

We learn interactive behavior models of an agent grounded in 3D from casual videos.

DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Jeff Tan, Donglai Xiang, Shubham Tulsiani, Deva Ramanan, Gengshan Yang
3DV, 2025 (Oral)

Dynamic 3D humans with cloth and object interactions, reconstructed from a single video. This is enabled by a two-layer motion field that fuses a 3D human prior with generic pixel priors (e.g., normal, flow).

Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation
Ruihan Gao, Kangle Deng, Gengshan Yang, Wenzhen Yuan, Jun-Yan Zhu
NeurIPS, 2024

Using tactile sensing to enhance geometric details for 3D generation.

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon Luiten
CVPR, 2024

3D Gaussian Splatting for SLAM enables precise camera tracking and high-fidelity reconstruction using an RGB-D camera.

SLoMo: A General System for Legged Robot Motion Imitation from Casual Videos
John Z. Zhang, Shuo Yang, Gengshan Yang, Arun Bishop, Swaminathan Gurumurthy, Deva Ramanan, Zachary Manchester
RA-L, 2023 / ICRA, 2024

An end-to-end motion transfer framework from monocular videos to legged robots.

PPR: Physically Plausible Reconstruction from Monocular Videos
Gengshan Yang, Shuo Yang, John Z. Zhang, Zachary Manchester, Deva Ramanan
ICCV, 2023 (Oral)

Given monocular videos, PPR builds 4D models of the object and the environment whose physical configurations satisfy dynamics and contact constraints.

Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis
Chonghyuk Song, Gengshan Yang, Kangle Deng, Jun-Yan Zhu, Deva Ramanan
ICCV, 2023

Total-Recon explains an RGB-D video with compositional 4D neural fields, which enables extreme view synthesis including embodied views, 3rd-person views, and bird's-eye views.

RAC: Reconstructing Animatable Categories from Videos
Gengshan Yang, Chaoyang Wang, N Dinesh Reddy, Deva Ramanan
CVPR, 2023

RAC learns category-level deformable 3D models from monocular videos. It disentangles morphology and motion and allows for motion retargeting.

Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
Jeff Tan, Gengshan Yang, Deva Ramanan
CVPR, 2023

We distill offline-optimized dynamic NeRFs into efficient video shape, pose, and appearance predictors.

3D-aware Conditional Image Synthesis
Kangle Deng, Gengshan Yang, Deva Ramanan, Jun-Yan Zhu
CVPR, 2023

A 3D-aware conditional generative model for controllable image synthesis. Given a 2D label map, such as a segmentation or edge map, our model learns to synthesize images consistent from different viewpoints.

BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi, Hanbyul Joo
CVPR, 2022 (Oral)

Given casual videos capturing a deformable object, BANMo reconstructs an animatable 3D model in a differentiable volume rendering framework.

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan
NeurIPS, 2021 (Spotlight)

Given a long video or multiple short videos, ViSER jointly optimizes articulated 3D shapes and a pixel-surface embedding to establish dense correspondences over video frames.

NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild
Jason Y. Zhang, Gengshan Yang, Shubham Tulsiani*, Deva Ramanan*
NeurIPS, 2021

Given several (8-16) unposed images of the same instance, NeRS optimizes for a textured 3D reconstruction along with the illumination parameters at test time.

LASR: Learning Articulated Shape Reconstruction from a Monocular Video
Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Huiwen Chang, Deva Ramanan, William T. Freeman, Ce Liu
CVPR, 2021

A template-free approach for articulated shape reconstruction from a single video by combining differentiable rendering and data-driven correspondence and segmentation priors.

Learning to Segment Rigid Motions from Two Frames
Gengshan Yang, Deva Ramanan
CVPR, 2021

We analyze how to decompose two frames into a rigid background and multiple moving rigid bodies and propose a neural architecture to segment rigid motion groups given two frames.

Upgrading Optical Flow to 3D Scene Flow through Optical Expansion
Gengshan Yang, Deva Ramanan
CVPR, 2020 (Oral)

We describe a neural architecture to upgrade 2D optical flow to 3D scene flow using optical expansion, which reveals changes in depth of scene elements over frames, e.g., things moving closer will get bigger.
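The "things moving closer will get bigger" intuition can be checked with a small numeric sketch. It assumes an ideal pinhole camera and a fronto-parallel segment that does not rotate, so its projected width is inversely proportional to depth; the focal length, widths, depths, and function names below are hypothetical values chosen for illustration, not the paper's network or code.

```python
def projected_width(focal_px, width_m, depth_m):
    """Pinhole projection of a fronto-parallel segment: w = f * W / Z."""
    return focal_px * width_m / depth_m


def depth_ratio_from_expansion(expansion):
    """Relative depth change Z2 / Z1 implied by an image-space scale change s = w2 / w1."""
    return 1.0 / expansion


# Hypothetical setup: a 2 m wide object moves from 10 m to 8 m away.
f, W = 500.0, 2.0
Z1, Z2 = 10.0, 8.0

w1 = projected_width(f, W, Z1)
w2 = projected_width(f, W, Z2)

s = w2 / w1                          # optical expansion (> 1: the object got bigger)
tau = depth_ratio_from_expansion(s)  # recovered Z2 / Z1, without knowing f or W

print(f"expansion s = {s:.3f}, depth ratio Z2/Z1 = {tau:.3f}")
```

Note that only the ratio of depths is recovered from expansion alone; absolute depth still needs another cue.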

Volumetric Correspondence Networks for Optical Flow
Gengshan Yang, Deva Ramanan
NeurIPS, 2019

We introduce several simple modifications to the volumetric layers used for optical flow that 1) significantly reduce computation and parameters, 2) enable test-time adaptation of the cost volume size, and 3) speed up convergence.

Hierarchical Deep Stereo Matching on High-resolution Images
Gengshan Yang, Joshua Manela, Michael Happold, Deva Ramanan
CVPR, 2019

To address the problem of real-time stereo matching on high-res imagery, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy.

Inferring Distributions Over Depth from a Single Image
Gengshan Yang, Peiyun Hu, Deva Ramanan
IROS, 2019

We cast the continuous problem of depth regression as discrete binary classification, whose output is the occupancy probabilities on a 3D voxel grid. Such output reliably and efficiently captures multi-modal depth distributions in ambiguous cases.
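A toy sketch of why per-bin occupancy can represent ambiguity that a single regressed depth cannot. It assumes one camera ray discretized into six depth bins with independent occupancy probabilities; the bin centers, probabilities, threshold, and function name are made-up illustrations, not the paper's model or outputs.

```python
def occupied_depths(occupancy, bin_centers, threshold=0.5):
    """Return the depths of bins whose occupancy probability exceeds the threshold."""
    return [d for p, d in zip(occupancy, bin_centers) if p > threshold]


# Hypothetical ray: two plausible surfaces, e.g. a thin foreground object
# in front of a wall, at about 2 m and 5 m.
bin_centers = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
occupancy = [0.05, 0.90, 0.10, 0.05, 0.85, 0.10]

modes = occupied_depths(occupancy, bin_centers)

# Collapsing both hypotheses into one number lands between the two surfaces,
# in a region the per-bin probabilities mark as unlikely to be occupied.
single_estimate = sum(modes) / len(modes)

print("occupied depths:", modes)
print("averaged single estimate:", single_estimate)
```

The per-bin output keeps both hypotheses available to downstream consumers, whereas the averaged value corresponds to neither surface.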

Gengshan Yang | 杨庚山
