ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. - Citegraph

Paper Info

Title
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction.

Abstract
We introduce ViSER, a method for recovering articulated 3D shapes and dense 3D trajectories from monocular videos. Previous work on high-quality reconstruction of dynamic 3D shapes typically relies on multiple camera views, strong category-specific priors, or 2D keypoint supervision. We show that none of these are required if one can reliably estimate long-range correspondences in a video, making use of only 2D object masks and two-frame optical flow as inputs. ViSER infers correspondences by matching 2D pixels to a canonical, deformable 3D mesh via video-specific surface embeddings that capture the pixel appearance of each surface point. These embeddings behave as a continuous set of keypoint descriptors defined over the mesh surface, which can be used to establish dense long-range correspondences across pixels. The surface embeddings are implemented as coordinate-based MLPs that are fit to each video via self-supervised losses. Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with loose clothing and unusual poses as well as animal videos from DAVIS and YTVOS. Project page: viser-shape.github.io.
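
The matching step described in the abstract (2D pixels matched to points on a canonical mesh through learned embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `match_pixels_to_surface`, the cosine-similarity score, the softmax `temperature`, and the use of discrete mesh vertices in place of a coordinate-based MLP are all assumptions made for illustration.

```python
import numpy as np

def match_pixels_to_surface(pixel_emb, vertex_emb, vertices, temperature=0.1):
    """Hypothetical sketch: match pixel descriptors to mesh vertices.

    pixel_emb:  (N, D) per-pixel embeddings (assumed given, e.g. from a CNN)
    vertex_emb: (V, D) per-vertex surface embeddings (assumed given)
    vertices:   (V, 3) canonical mesh vertex positions
    Returns soft 3D matches (N, 3) and hard nearest-vertex indices (N,).
    """
    # L2-normalize so the dot product below is a cosine similarity.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    v = vertex_emb / np.linalg.norm(vertex_emb, axis=1, keepdims=True)
    sim = p @ v.T  # (N, V) similarity of every pixel to every vertex

    # Numerically stable softmax over vertices for each pixel.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Soft match: similarity-weighted average of canonical vertex positions.
    soft_points = weights @ vertices
    # Hard match: index of the most similar vertex.
    hard_idx = sim.argmax(axis=1)
    return soft_points, hard_idx
```

Running the same matching on two frames yields, for each pixel, a canonical surface point; pixels in different frames that map to the same surface point can then be treated as long-range correspondences. A lower `temperature` makes the soft match approach the hard nearest-vertex match.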

Year	Venue	DocType
2021	Annual Conference on Neural Information Processing Systems	Conference
Citations	PageRank	References
0	0.34	0
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
Gengshan Yang	1	15	3.34
Deqing Sun	2	1061	44.84
Varun Jampani	3	184	19.44
Daniel Vlasic	4	1	1.03
Forrester Cole	5	0	0.34
Ce Liu	6	3347	188.04
Deva Ramanan	7	10678	566.72
