Non-Rigid Structure-from-Motion

Reconstructing the 3D shape of a deformable object from a monocular image sequence is challenging because multiple shape configurations can produce the same image projection. We address the reconstruction of volumetric non-rigid 3D geometry under full perspective projection by employing a 3D template model of the object in its rest pose. Volumetric non-rigid reconstruction is even more challenging than the reconstruction of planar-like surfaces, because only the front part of the object surface is visible in the image, while the back part and the interior have to be inferred without direct image information. As the object deforms in front of a single camera, its non-rigid shape is reconstructed sequentially by estimating the camera parameters and the deformation with respect to the template model in an optimization framework.

Volumetric Deformable Structure from Motion

Non-rigid structure from motion plays an important role in computer vision applications such as human-computer interaction, motion capture, and tracking. Without a shape deformation model, this task is severely under-constrained, because multiple shape configurations can project to the same image location. Our approach builds on volumetric template-based methods and can be divided into two components:

  • Template computation of rest pose
  • Estimation of camera and shape deformation

First, a 3D template model of the rest pose is computed. For this purpose, the object is captured in its initial state with a multi-view camera setup, so that traditional rigid reconstruction techniques can be employed for template generation.

The template serves as a geometric and topological prior for the next step, in which the template model is deformed to satisfy the constraints imposed by each new input image depicting the object in a deformed state. The energy function that is minimized comprises two main terms: one accounts for data fitting, and the other controls the smoothness of the deformation. The data fitting term enforces that specific 3D surface points project to the correct image locations and penalizes volume configurations that project outside the object silhouette. The deformation is regularized in three ways: temporal smoothness, surface smoothness, and volume preservation.
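To make the structure of such an energy concrete, the following is a minimal sketch in Python and not the implementation from the paper. It assumes a tetrahedral template mesh, a pinhole camera at the origin looking along +z, and a simple weighted sum of terms. All function names and the weights `w_temporal` and `w_volume` are hypothetical. The silhouette penalty and the surface smoothness term are omitted for brevity.

```python
from typing import Dict, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Tet = Tuple[int, int, int, int]


def project(p: Point3, focal: float = 1.0) -> Point2:
    """Pinhole projection; camera at the origin looking along +z (assumed model)."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def data_term(verts: Sequence[Point3], matches: Dict[int, Point2],
              focal: float = 1.0) -> float:
    """Squared reprojection error over (vertex index -> observed 2D point) matches."""
    total = 0.0
    for idx, (u, v) in matches.items():
        pu, pv = project(verts[idx], focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


def temporal_term(verts: Sequence[Point3], prev: Sequence[Point3]) -> float:
    """Squared vertex displacement relative to the previous frame."""
    return sum((a - b) ** 2 for p, q in zip(verts, prev) for a, b in zip(p, q))


def tet_volume(a: Point3, b: Point3, c: Point3, d: Point3) -> float:
    """Signed volume of a tetrahedron: det([b-a, c-a, d-a]) / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return det / 6.0


def volume_term(verts: Sequence[Point3], template: Sequence[Point3],
                tets: Sequence[Tet]) -> float:
    """Squared change in volume of each tetrahedron relative to the template."""
    total = 0.0
    for t in tets:
        now = tet_volume(*(verts[i] for i in t))
        rest = tet_volume(*(template[i] for i in t))
        total += (now - rest) ** 2
    return total


def energy(verts: Sequence[Point3], prev: Sequence[Point3],
           template: Sequence[Point3], tets: Sequence[Tet],
           matches: Dict[int, Point2], focal: float = 1.0,
           w_temporal: float = 0.1, w_volume: float = 1.0) -> float:
    """Weighted sum of the data term and two of the regularizers (weights are illustrative)."""
    return (data_term(verts, matches, focal)
            + w_temporal * temporal_term(verts, prev)
            + w_volume * volume_term(verts, template, tets))


if __name__ == "__main__":
    template = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0), (0.0, 0.0, 3.0)]
    tets = [(0, 1, 2, 3)]
    matches = {i: project(p) for i, p in enumerate(template)}
    # Slide one vertex along its viewing ray: its projection does not change.
    deformed = list(template)
    deformed[3] = (0.0, 0.0, 4.0)
    print(data_term(deformed, matches), energy(deformed, template, template, tets, matches))
```

In the demo, sliding a vertex along its viewing ray leaves the data term at zero, so only the temporal and volume terms penalize the change. This illustrates why a monocular data term alone is under-constrained and why the regularizers are needed.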


L. Kausch, A. Hilsmann, P. Eisert
Template-Based 3D Non-Rigid Shape Estimation from Monocular Image Sequences, Proc. Vision, Modeling, and Visualization (VMV), Bonn, Germany, Sep. 2017.