The HEVC extension for 3D video coding supports the coding of multiple views and associated depth data. It adds new coding tools to the HEVC design, which improve the compression capabilities for dependent video views and depth data.
Recent improvements in 3D video technology have led to a growing interest in 3D video. The number of cinema screens capable of showing 3D movies, as well as the number of movies produced in 3D, has increased steadily in recent years. With the availability of 3D-capable TV sets and Blu-ray players, the introduction of the first 3D broadcast channels, and the release of 3D Blu-ray discs, 3D video has also started to reach consumers’ homes. Autostereoscopic displays, which provide a 3D viewing experience without glasses, are steadily improving and are considered a promising technology for future 3D home entertainment. In contrast to common stereo displays, autostereoscopic displays require not just two but a multitude of different views to provide the 3D viewing experience. Since the bit rate required for coding multiview video with the MVC extension of H.264/AVC increases approximately linearly with the number of coded views, MVC is not well suited for delivering 3D content to autostereoscopic displays.
A promising alternative is the transmission of 3D video in the Multiview Video plus Depth (MVD) format. In the MVD format, typically only a few views are actually coded, but each of them is associated with coded depth data, which represent the basic geometry of the captured video scene. Based on the transmitted video pictures and depth maps, additional views suitable for displaying 3D video content on autostereoscopic displays can be generated at the receiver side using depth-image-based rendering (DIBR) techniques.
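To make the DIBR idea concrete, the following is a minimal sketch of how one row of an additional view could be rendered from a coded video picture and its depth map. It is not the renderer used in the codec. It assumes rectified, horizontally aligned cameras, 8-bit depth samples that store inverse depth between hypothetical `z_near` and `z_far` planes (255 = nearest), integer-sample warping, and a target camera positioned such that content shifts to the left. All function and parameter names are illustrative.

```python
def depth_to_disparity(v, focal_px, baseline, z_near, z_far):
    """Convert an 8-bit depth sample v (255 = nearest) to a disparity in pixels.

    Assumes the depth map stores quantized inverse depth between the
    z_near and z_far planes, a common convention for MVD test material.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal_px * baseline * inv_z


def warp_row(texture_row, depth_row, focal_px, baseline, z_near, z_far):
    """Forward-warp one row of a rectified view to a horizontally shifted camera.

    Positions that no source sample maps to stay None; these are the
    disocclusions that a real renderer would fill (e.g. by inpainting).
    """
    width = len(texture_row)
    out = [None] * width
    out_depth = [-1] * width  # z-buffer in depth-sample units (larger = closer)
    for x, (sample, v) in enumerate(zip(texture_row, depth_row)):
        d = depth_to_disparity(v, focal_px, baseline, z_near, z_far)
        xt = x - int(round(d))  # assumed shift direction: content moves left
        # Foreground samples (larger depth value) win when two samples collide.
        if 0 <= xt < width and v > out_depth[xt]:
            out[xt] = sample
            out_depth[xt] = v
    return out


if __name__ == "__main__":
    texture = [10, 20, 30, 40, 50, 60]
    depth = [0, 0, 255, 255, 0, 0]  # a foreground object in the middle
    print(warp_row(texture, depth, focal_px=100, baseline=0.1, z_near=5, z_far=10))
```

In this toy example the foreground samples move by two pixels and the background by one, so the foreground covers part of the background and leaves a hole behind it, which is the typical behaviour that view synthesis has to handle.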
The Image and Video Coding Group and the 3D Coding Group developed an extension of HEVC for coding 3D video data in the MVD format. The basic structure of the 3D video codec is shown in Figure 1. As in MVC, all video pictures and depth maps that represent the video scene at the same time instant form an access unit, and the access units of the input MVD signal are coded consecutively. Inside an access unit, the video picture of the so-called independent view is transmitted first, directly followed by the associated depth map. Thereafter, the video pictures and depth maps of the other views are transmitted; a video picture is always directly followed by its associated depth map. In principle, each component signal is coded using an HEVC-based coder, and the corresponding bitstream packets are multiplexed to form the 3D video bitstream. The independent view is coded using an unmodified HEVC coder. The corresponding sub-bitstream can be extracted from the 3D bitstream, decoded with an HEVC decoder, and displayed on a conventional 2D display. The other components are coded using modified HEVC coders, which are extended with additional coding tools and inter-component prediction techniques that employ already coded data inside the same access unit, as indicated by the red arrows in Figure 1. To allow the depth data to be optionally discarded from the bitstream, e.g., for decoding a two-view video suitable for conventional stereo displays, the inter-component prediction can be configured so that the video pictures can be decoded independently of the depth data.
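The transmission order inside the 3D bitstream can be illustrated with a short sketch. It only models the ordering of components and the extraction of sub-bitstreams as described above; it does not model actual bitstream syntax, and the tuple layout and function names are assumptions made for illustration.

```python
def coding_order(num_time_instants, view_order):
    """Yield (time, view, component) tuples in transmission order.

    view_order lists view identifiers in coding order; the first entry is
    the independent (base) view.
    """
    for t in range(num_time_instants):    # one access unit per time instant
        for view in view_order:           # independent view first
            yield (t, view, "texture")    # video picture ...
            yield (t, view, "depth")      # ... directly followed by its depth map


def extract_base_view(order, base_view):
    """Keep only the video pictures of the independent view (2D-compatible part)."""
    return [(t, v, c) for (t, v, c) in order if v == base_view and c == "texture"]


def discard_depth(order):
    """Keep all video pictures and drop the depth maps (e.g. for stereo displays)."""
    return [(t, v, c) for (t, v, c) in order if c == "texture"]


if __name__ == "__main__":
    order = list(coding_order(num_time_instants=2, view_order=[0, 1, 2]))
    for entry in order:
        print(entry)
    print(extract_base_view(order, base_view=0))
```

Dropping the depth maps as in `discard_depth` only yields a decodable stream if the encoder was configured so that video pictures do not depend on depth data, as noted above.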
While one of the video views (the base view) is coded with HEVC, additional tools have been developed for coding the dependent video views as well as the depth data.
For dependent video views, the following tools have been added:
- Disparity-compensated prediction (as known from MVC)
- Inter-view prediction of motion parameters
- Inter-view prediction of residual data
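As an illustration of the first tool in this list, the sketch below shows the basic idea of disparity-compensated prediction: the prediction for a block of a dependent view is fetched from an already coded picture of another view at the same time instant, displaced by a disparity vector. It is a simplified sketch that assumes a purely horizontal, integer-sample disparity and border clamping. A real codec also supports fractional-sample interpolation, and the function names here are hypothetical.

```python
def disparity_compensated_block(ref_view, x, y, width, height, disparity):
    """Fetch a prediction block from an already coded picture of another view.

    ref_view is a list of rows; disparity is a horizontal integer offset.
    Sample positions outside the picture are clamped to the picture border.
    """
    pic_w = len(ref_view[0])
    block = []
    for row in range(y, y + height):
        block.append([ref_view[row][min(max(col + disparity, 0), pic_w - 1)]
                      for col in range(x, x + width)])
    return block


def residual(current_block, prediction):
    """Sample-wise difference that would be transform coded."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current_block, prediction)]


if __name__ == "__main__":
    base_view = [[0, 1, 2, 3, 4, 5],
                 [10, 11, 12, 13, 14, 15]]
    # Block of the dependent view whose content sits 2 samples further right
    # in the base view.
    current = [[3, 4], [13, 14]]
    pred = disparity_compensated_block(base_view, x=1, y=0, width=2, height=2, disparity=2)
    print(residual(current, pred))
```

When the disparity matches the scene geometry, the residual becomes small, which is why inter-view prediction improves the coding efficiency of dependent views.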
For coding of depth data, the following tools have been added:
- Disparity-compensated prediction for depth data of dependent views
- Decreased motion parameter accuracy
- New modes for intra prediction and inter-component prediction
- A new mode for inheriting the motion parameters from the associated video view
- A new encoder control concept for depth data that estimates the distortion in synthesized views instead of using the distortion in the depth domain
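The motivation for the last item can be shown with a toy example: a depth map is never displayed itself, so what matters is the error a depth coding error causes in a rendered view. The sketch below is a simplified illustration, not the encoder control used in the codec. It assumes a hypothetical linear mapping from depth sample to integer disparity (`max_disparity`), a one-row forward warp, and crude hole filling, and it compares the distortion measured in the depth domain with the distortion measured in the synthesized view.

```python
def render_row(texture_row, depth_row, max_disparity):
    """Render one row of a virtual view (toy forward warp with hole filling)."""
    width = len(texture_row)
    out = [None] * width
    best = [-1] * width  # larger depth sample = closer to the camera
    for x, (sample, v) in enumerate(zip(texture_row, depth_row)):
        xt = x - round(max_disparity * v / 255)  # assumed linear depth-to-disparity
        if 0 <= xt < width and v > best[xt]:
            out[xt], best[xt] = sample, v
    last = 0  # crude hole filling: repeat the nearest sample to the left
    for i in range(width):
        if out[i] is None:
            out[i] = last
        else:
            last = out[i]
    return out


def ssd(a, b):
    """Sum of squared differences between two rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def synthesized_view_distortion(texture_row, depth_orig, depth_coded, max_disparity):
    """Distortion in the rendered view caused by coding errors in the depth map."""
    reference = render_row(texture_row, depth_orig, max_disparity)
    rendered = render_row(texture_row, depth_coded, max_disparity)
    return ssd(reference, rendered)


if __name__ == "__main__":
    depth_orig = [0, 0, 0, 0, 255, 255, 255, 255]
    depth_coded = [0, 0, 0, 0, 0, 255, 255, 255]  # one wrong sample at an edge
    print("depth-domain SSD:", ssd(depth_orig, depth_coded))
    print("flat texture:", synthesized_view_distortion([50] * 8, depth_orig, depth_coded, 2))
    print("textured:", synthesized_view_distortion(
        [10, 20, 30, 40, 50, 60, 70, 80], depth_orig, depth_coded, 2))
```

The same depth error produces a large depth-domain distortion in both cases, but it is invisible in the rendered view when the texture is flat and visible when the texture varies. An encoder control that measures distortion in the synthesized view can therefore spend bits on depth where they actually affect the rendered output.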
Furthermore, to increase the end-to-end quality of a 3D video coding system, we investigated:
- Improvements to decoder-side view synthesis based on depth data
- A depth-aware encoder control that codes areas of dependent views that can be synthesized from the base view at a lower fidelity
The 3D HEVC extension was proposed to MPEG and VCEG and was chosen as the starting point for the development of an HEVC-based 3D video coding standard.
References
- H. Schwarz, C. Bartnik, S. Bosse, H. Brust, T. Hinz, H. Lakshman, D. Marpe, P. Merkle, K. Müller, H. Rhee, G. Tech, M. Winken, and T. Wiegand, "3D Video Coding Using Advanced Prediction, Depth Modeling, and Encoder Control Methods", IEEE Intl. Conf. on Image Processing, Oct. 2012.
- G. Tech, H. Schwarz, K. Müller, and T. Wiegand, "Effects of synthesized View Distortion based 3D Video Coding on the Quality of interpolated and extrapolated Views", IEEE Intl. Conf. on Multimedia and Expo, July 2012.
- H. Schwarz, C. Bartnik, S. Bosse, H. Brust, T. Hinz, H. Lakshman, D. Marpe, P. Merkle, K. Müller, H. Rhee, G. Tech, M. Winken, and T. Wiegand, "3D Video Coding Using Advanced Prediction, Depth Modeling, and Encoder Control Methods", Picture Coding Symposium, May 2012.
- P. Merkle, C. Bartnik, K. Müller, D. Marpe, and T. Wiegand, "3D Video: Depth Coding Based on Inter-component Prediction of Block Partitions", Picture Coding Symposium, May 2012.
- H. Schwarz and T. Wiegand, "Inter-View Prediction of Motion Data in Multiview Video Coding", Picture Coding Symposium, May 2012.
- G. Tech, H. Schwarz, K. Müller, and T. Wiegand, "3D Video Coding using the Synthesized View Distortion Change", Picture Coding Symposium, May 2012.
- M. Winken, H. Schwarz, and T. Wiegand, "Motion Vector Inheritance for High Efficiency 3D Video plus Depth Coding", Picture Coding Symposium, May 2012.
- S. Bosse, H. Schwarz, T. Hinz, and T. Wiegand, "Encoder Control for Renderable Regions in High Efficiency Multiview Video Plus Depth Coding", Picture Coding Symposium, May 2012.