We present a shape-from-texture (SFT) formulation that is equivalent to a single-plane/multiple-view pose estimation problem under perspective projection. As in the classical SFT setting, we assume that the texture is composed of one or more repeating texture elements, called texels, and that these texels are small enough to be modeled as planar patches. In contrast to the classical setting, we do not assume that a fronto-parallel view of the texture element is known a priori. Instead, we formulate the SFT problem like a Structure-from-Motion (SFM) problem, given n views of the same planar texture patch.
We assume a full perspective camera model with known intrinsics and estimate the patch poses from homographies estimated between the distorted texel appearances in the image. Each homography between two arbitrary patches yields an estimate of the normal vector of one of the two patches (the reference patch) and of the rigid motion between the two patches. By using each patch in turn as the reference for all other patches, we obtain enough constraints to set up a stable linear cost function for optimizing the 3D poses of the texel patches. A smooth surface is then computed by regression with approximating thin-plate splines, using the estimated patch centroids as data points. The final reconstruction is determined up to a single global scale factor.
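The link between a homography, the reference patch normal, and the rigid motion can be illustrated with the standard textbook plane-induced homography H = R + t nᵀ / d, written in normalized image coordinates (that is, with the known intrinsics already removed). The sketch below is an illustration under that assumption and is not necessarily the parameterization used in the paper. All function names and numeric values are hypothetical. It checks that H transfers image points of the reference patch to the second patch, and that scaling t and d together leaves H unchanged, which is the origin of the global scale ambiguity.

```python
import math

def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def plane_homography(R, t, n, d):
    """Plane-induced homography H = R + t n^T / d (normalized coordinates).

    The reference patch lies on the plane n . X = d, and the second patch
    is obtained by the rigid motion X' = R X + t.
    """
    return [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]

def point_on_plane(x, y, n, d):
    """Intersect the viewing ray through (x, y, 1) with the plane n . X = d."""
    s = d / (n[0] * x + n[1] * y + n[2])
    return [s * x, s * y, s]

def max_transfer_error(R, t, n, d, image_points):
    """Largest gap between H-transferred points and true reprojections."""
    H = plane_homography(R, t, n, d)
    worst = 0.0
    for x, y in image_points:
        X = point_on_plane(x, y, n, d)                  # 3D point on reference patch
        X2 = [a + b for a, b in zip(matvec(R, X), t)]   # same point after the motion
        h = matvec(H, [x, y, 1.0])                      # homography transfer
        worst = max(worst,
                    abs(X2[0] / X2[2] - h[0] / h[2]),
                    abs(X2[1] / X2[2] - h[1] / h[2]))
    return worst

# Hypothetical example values (assumptions, not taken from the paper).
n = [0.0, -0.6, 0.8]     # unit normal of the reference patch
d = 4.0                  # plane offset, so that n . X = d
R = rot_y(0.2)           # rotation between the two patches
t = [0.5, 0.0, 0.3]      # translation between the two patches
corners = [(-0.2, -0.2), (0.2, -0.2), (0.2, 0.2), (-0.2, 0.2)]

error = max_transfer_error(R, t, n, d, corners)

# Scaling t and d by the same factor gives the same homography.
H = plane_homography(R, t, n, d)
H_scaled = plane_homography(R, [2.0 * ti for ti in t], n, 2.0 * d)
scale_gap = max(abs(H[i][j] - H_scaled[i][j])
                for i in range(3) for j in range(3))

print(error, scale_gap)
```

Since only the ratio t / d enters H, a homography alone cannot fix the absolute size of the scene, which is consistent with a reconstruction that is determined only up to one global scale factor.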
A. Hilsmann, D.C. Schneider, P. Eisert:
Template-free Shape from Texture with Perspective Cameras, Proc. British Machine Vision Conference (BMVC) 2011, Dundee, Scotland, September 2011.