3D Video Coding

We have been developing coding solutions for 3D video for several years and have successfully contributed to the major international standards for 3D video coding. With the emergence of a new generation of 3D displays that do not require glasses, 3D video formats such as multiview video plus depth (MVD) [1] have gained importance. We have played a leading role in developing the 3D video extension of H.265 / HEVC (3D-HEVC); a comprehensive overview of its technical features can be found in [2].

The role of coding and transmission in the 3D video processing chain is illustrated in figure 2. The acquisition of MVD consists of capturing a real-world 3D scene with two or more cameras (multiview video), followed by sender-side processing to obtain sample-accurate depth maps, i.e., depth estimation. After the MVD sequences have been encoded, the bit stream is transmitted to the receiver side. The scene geometry information provided by the depth maps makes it possible to render views of the scene from additional perspectives via depth-image-based rendering (DIBR). Hence, a suitable set of views for different 3D displays and applications can be rendered from the decoded video and depth data.
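
As a rough illustration of the DIBR step, the following Python sketch converts quantized depth samples to disparities and forward-warps one image row to a neighbouring virtual view. It assumes rectified, horizontally aligned cameras and the common convention that depth samples store quantized inverse depth between z_near and z_far. All function and parameter names, the sign of the shift, and the integer rounding are assumptions made for this example; they are not taken from 3D-HEVC or from any reference renderer.

```python
def depth_sample_to_disparity(v, focal_length, baseline, z_near, z_far, bit_depth=8):
    """Map a quantized depth sample to a horizontal disparity in samples.

    Assumes v encodes inverse depth linearly: v = 0 is z_far, v = max is z_near.
    """
    v_max = (1 << bit_depth) - 1
    inv_z = (v / v_max) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_length * baseline * inv_z


def warp_row(texture_row, depth_row, focal_length, baseline, z_near, z_far):
    """Forward-warp one row to a virtual camera shifted by `baseline`.

    Positions that receive no sample stay None (disocclusion holes).
    """
    width = len(texture_row)
    warped = [None] * width
    warped_depth = [-1] * width  # z-buffer on depth samples, larger = nearer
    for x in range(width):
        disparity = depth_sample_to_disparity(
            depth_row[x], focal_length, baseline, z_near, z_far)
        # Assumed sign convention: content moves left in the virtual view.
        target = x - int(round(disparity))
        if 0 <= target < width and depth_row[x] > warped_depth[target]:
            warped[target] = texture_row[x]
            warped_depth[target] = depth_row[x]
    return warped


# Toy row: two foreground samples (depth 255) in front of a far background.
row = warp_row([10, 20, 30, 40, 50, 60], [0, 0, 255, 255, 0, 0],
               focal_length=100.0, baseline=0.1, z_near=5.0, z_far=1000.0)
print(row)  # [30, 40, None, None, 50, 60]
```

In this toy row the two foreground samples move by two positions, cover the background behind them, and leave two unfilled positions. A practical renderer would fill such holes, for example by blending a second reference view or by inpainting.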

3D-HEVC builds upon the multi-layer coding design of HEVC, where each video or depth sequence of each view represents a different layer and prediction between layers is enabled by so-called inter-layer dependencies. The different supported types of inter-layer prediction are illustrated in figure 3:

  • Inter-view prediction between the different views of either video or depth (as known from Multiview Video Coding).
  • Inter-component prediction between the video and depth component of the same view.
  • Combined inter-view / inter-component prediction between video and depth of different views.

The base layer (L0 in figure 3) is a video layer that has no inter-layer dependencies and is consequently fully compliant with HEVC. Additional block-level coding tools have been included for coding the dependent video and depth layers (L1 to L5 in figure 3). Compared to simulcast coding, bit rate savings of about 70% are achieved for the dependent layers [2].
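
The three prediction types can be told apart from just two properties of a layer: which view it belongs to and whether it carries video or depth. The following Python sketch shows that classification. The `Layer` type and the `prediction_type` function are hypothetical names introduced for this example, and the assignment of views and components to L0 through L5 in figure 3 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    view: int       # index of the camera view
    is_depth: bool  # False: video component, True: depth component


def prediction_type(current, reference):
    """Name the inter-layer prediction type used when `current` refers to `reference`."""
    same_view = current.view == reference.view
    same_component = current.is_depth == reference.is_depth
    if same_view and same_component:
        raise ValueError("a layer cannot be its own inter-layer reference")
    if same_component:
        return "inter-view"
    if same_view:
        return "inter-component"
    return "combined inter-view / inter-component"


base_video = Layer(view=0, is_depth=False)  # no dependencies, plain HEVC
base_depth = Layer(view=0, is_depth=True)
side_video = Layer(view=1, is_depth=False)
side_depth = Layer(view=1, is_depth=True)

print(prediction_type(side_video, base_video))  # inter-view
print(prediction_type(base_depth, base_video))  # inter-component
print(prediction_type(side_depth, base_video))  # combined inter-view / inter-component
```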


We have developed a wide variety of 3D video coding tools that are designed to exploit the statistical dependencies between video and depth views and that are explicitly adapted to the specific properties of depth maps:

  • Inter-view prediction of motion data [4]:

Disparity-compensated prediction of motion parameters and residual data for dependent video layers, using inter-view reference pictures and coded or estimated depth information.

  • Geometry-based depth coding [5]-[7]:

New intra and inter-component prediction modes, using wedgelet and contour block partitions, and a complementary residual adaptation method in the spatial domain.

  • Motion parameter inheritance for depth [8]:

Inter-component prediction of motion vectors and block partitioning, reusing the co-located information of a video reference picture.
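
The wedgelet partitions named under geometry-based depth coding can be pictured as follows: a straight line splits a depth block into two regions, and each region is approximated by a single constant value. The Python sketch below shows only this geometric idea. The function names, the use of the rounded region mean as the constant, and the way the line is specified by two points are assumptions made for this example; the codec itself selects and signals partitions in its own way, as described in [5]-[7].

```python
def wedgelet_mask(size, start, end):
    """Split a size x size block into two regions along the line start -> end.

    Points are (x, y) tuples. Samples strictly on one side of the line are
    marked True, all other samples False.
    """
    (x0, y0), (x1, y1) = start, end
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            # The sign of the 2D cross product tells on which side a sample lies.
            side = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
            row.append(side > 0)
        mask.append(row)
    return mask


def constant_partition_prediction(block, mask):
    """Predict every sample by the rounded mean of the region it belongs to."""
    sums = {False: 0, True: 0}
    counts = {False: 0, True: 0}
    for block_row, mask_row in zip(block, mask):
        for sample, region in zip(block_row, mask_row):
            sums[region] += sample
            counts[region] += 1
    means = {r: round(sums[r] / counts[r]) if counts[r] else 0 for r in sums}
    return [[means[region] for region in mask_row] for mask_row in mask]


# Toy 4x4 depth block with a sharp diagonal edge between two flat regions.
block = [[50 if y > x else 200 for x in range(4)] for y in range(4)]
mask = wedgelet_mask(4, start=(0, 0), end=(3, 3))
prediction = constant_partition_prediction(block, mask)
print(prediction == block)  # True
```

Because depth maps mostly consist of flat areas separated by sharp edges, a block like the one above is represented exactly by one line and two constants, which is why partitions of this kind suit depth data.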

To increase the end-to-end quality of a 3D video coding system, we have also developed improvements for the decoder-side view synthesis (DIBR).

NOTE — Links to the specification text and the reference software of 3D-HEVC can be found on the HEVC support site.

References

  1. K. Müller, P. Merkle, T. Wiegand, 3-D Video Representation Using Depth Maps, Proceedings of the IEEE, vol. 99, no. 4, pp. 643-656, April 2011.
  2. G. Tech, Y. Chen, K. Müller, J.-R. Ohm, A. Vetro, Y.-K. Wang, Overview of the Multiview and 3D Extensions of High Efficiency Video Coding, IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 35-49, January 2016, doi: 10.1109/TCSVT.2015.2477935.
  3. K. Müller, H. Schwarz, D. Marpe, C. Bartnik, S. Bosse, H. Brust, T. Hinz, H. Lakshman, P. Merkle, F.H. Rhee, G. Tech, M. Winken, T. Wiegand, 3D High-Efficiency Video Coding for Multi-View Video and Depth Data, IEEE Transactions on Image Processing, vol. 22, no. 9, pp. 3366-3378, September 2013.
  4. H. Schwarz, T. Wiegand, Inter-view prediction of motion data in multiview video coding, Picture Coding Symposium (PCS 2012), pp. 101-104, 7-9 May 2012.
  5. P. Merkle, K. Müller, D. Marpe, T. Wiegand, Depth Intra Coding for 3D Video based on Geometric Primitives, IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 3, pp. 570-582, March 2016, doi: 10.1109/TCSVT.2015.2407791.
  6. P. Merkle, C. Bartnik, K. Müller, D. Marpe, T. Wiegand, 3D video: Depth coding based on inter-component prediction of block partitions, Picture Coding Symposium (PCS 2012), pp. 149-152, 7-9 May 2012.
  7. P. Merkle, K. Müller, T. Wiegand, Coding of depth signals for 3D video using wedgelet block segmentation with residual adaptation, IEEE International Conference on Multimedia and Expo (ICME 2013), pp. 1-6, 15-19 July 2013.
  8. M. Winken, H. Schwarz, T. Wiegand, Motion vector inheritance for high efficiency 3D video plus depth coding, Picture Coding Symposium (PCS 2012), pp. 53-56, 7-9 May 2012.
  9. G. Tech, H. Schwarz, K. Müller, T. Wiegand, 3D video coding using the synthesized view distortion change, Picture Coding Symposium (PCS 2012), pp. 25-28, 7-9 May 2012.
  10. G. Tech, H. Schwarz, K. Müller, T. Wiegand, Synthesized View Distortion Based 3D Video Coding for Extrapolation and Interpolation of Views, IEEE International Conference on Multimedia and Expo (ICME 2012), pp. 634-639, 9-13 July 2012.
  11. S. Bosse, H. Schwarz, T. Hinz, T. Wiegand, Encoder control for renderable regions in high efficiency multiview video plus depth coding, Picture Coding Symposium (PCS 2012), pp. 129-132, 7-9 May 2012.