TY - GEN
T1 - Learning the 3D Fauna of the Web
AU - Li, Zizhang
AU - Litvak, Dor
AU - Li, Ruining
AU - Zhang, Yunzhi
AU - Jakab, Tomas
AU - Rupprecht, Christian
AU - Wu, Shangzhe
AU - Vedaldi, Andrea
AU - Wu, Jiajun
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Learning 3D models of all animals in nature requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by learning our model from 2D Internet images. We show that prior approaches, which are category-specific, fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward manner in seconds.
AB - Learning 3D models of all animals in nature requires massively scaling up existing solutions. With this ultimate goal in mind, we develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly. One crucial bottleneck of modeling animals is the limited availability of training data, which we overcome by learning our model from 2D Internet images. We show that prior approaches, which are category-specific, fail to generalize to rare species with limited training images. We address this challenge by introducing the Semantic Bank of Skinned Models (SBSM), which automatically discovers a small set of base animal shapes by combining geometric inductive priors with semantic knowledge implicitly captured by an off-the-shelf self-supervised feature extractor. To train such a model, we also contribute a new large-scale dataset of diverse animal species. At inference time, given a single image of any quadruped animal, our model reconstructs an articulated 3D mesh in a feed-forward manner in seconds.
KW - 3D reconstruction
KW - animal reconstruction
KW - single-view 3D
UR - https://www.scopus.com/pages/publications/85196874274
U2 - 10.1109/CVPR52733.2024.00931
DO - 10.1109/CVPR52733.2024.00931
M3 - Conference contribution
AN - SCOPUS:85196874274
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 9752
EP - 9762
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - Institute of Electrical and Electronics Engineers
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -