SVC Extension of H.264/AVC

The Scalable Video Coding (SVC) amendment of the H.264/AVC standard provides network-friendly scalability at the bit stream level with a moderate increase in decoder complexity relative to single-layer H.264/AVC. It supports functionalities such as bit rate, format, and power adaptation, graceful degradation in lossy transmission environments (see Figure 1), and lossless rewriting of quality-scalable SVC bit streams to single-layer H.264/AVC bit streams. These functionalities benefit transmission and storage applications. SVC achieves significantly better coding efficiency and supports a higher degree of scalability than the scalable profiles of prior video coding standards.

The desire for scalable video coding, which allows on-the-fly adaptation to application requirements such as the display and processing capabilities of target devices and varying transmission conditions, originates from the continuous evolution of receiving devices and the increasing use of transmission systems with widely varying connection quality. Video coding today is used in a wide range of applications, from multimedia messaging, video telephony, and video conferencing, through mobile TV and wireless and Internet video streaming, to standard- and high-definition TV broadcasting. In particular, the Internet and wireless networks are becoming increasingly important for video applications. Video transmission in such systems is exposed to variable transmission conditions, which can be dealt with using scalability features. Furthermore, video content is delivered to a variety of decoding devices with heterogeneous display and computational capabilities (see Figure 2). In these heterogeneous environments, flexible adaptation of once-encoded content is desirable, while at the same time enabling interoperability of encoder and decoder products from different manufacturers.

Scalability was already present in the video coding standards MPEG-2 Video, H.263, and MPEG-4 Visual in the form of scalable profiles. However, the provision of spatial and quality scalability in these standards comes with a considerable growth in decoder complexity and a significant reduction in coding efficiency (i.e., a bit rate increase for a given level of reconstruction quality) compared to the corresponding non-scalable profiles. These drawbacks, which limited the success of the scalable profiles of the former specifications, are addressed by the new SVC amendment of the H.264/AVC standard.

Types of Scalability

The main supported scalability types are temporal, spatial, and quality scalability.

Performance

The diagrams in Figure 3 show two examples in which the coding efficiency of SVC is compared with that of single-layer H.264/AVC coding. The coding efficiency is measured in terms of bit rate and average luma peak signal-to-noise ratio (PSNR) of the video frames. Temporal scalability with 5 levels is provided by all bit streams used for comparison; it does not have any negative impact on the rate-distortion results and is already supported in single-layer H.264/AVC. For both the SVC bit streams and the H.264/AVC bit streams, the same basic encoder configuration was used and a similar type of encoder optimization was applied. For the SVC bit streams, an additional cross-layer optimization was applied that makes it possible to trade off the coding efficiency of the base layer against that of the enhancement layer.
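The quality measure named above can be sketched as follows. This is a minimal illustration that assumes 8-bit luma samples (peak value 255) and frames given as flat lists of sample values; the function names `luma_psnr` and `average_luma_psnr` are hypothetical and not taken from any reference software.

```python
import math

def luma_psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized luma frames (flat lists of samples).

    Assumes 8-bit video by default (peak = 255).
    """
    if len(original) != len(decoded) or not original:
        raise ValueError("frames must be non-empty and equally sized")
    # Mean squared error over all luma samples of the frame
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)

def average_luma_psnr(original_frames, decoded_frames):
    """Arithmetic mean of the per-frame luma PSNR values."""
    values = [luma_psnr(o, d) for o, d in zip(original_frames, decoded_frames)]
    return sum(values) / len(values)
```

A rate-distortion point as plotted in such diagrams would then pair the bit rate of a bit stream with the value returned by `average_luma_psnr` for its decoded frames.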

In Figure 3(a), SVC with quality scalability is compared to H.264/AVC single-layer coding. All SVC rate-distortion points are extracted from a single bit stream, while for single-layer coding each rate-distortion point represents a separate non-scalable bit stream. In Figure 3(b), SVC with spatial scalability is compared to H.264/AVC single-layer coding and to simulcast of two spatial resolutions with H.264/AVC. The comparison shows that SVC can provide a suitable degree of scalability at the cost of an approximately 10% bit rate increase relative to single-layer H.264/AVC coding. The size of this increase depends on the degree of scalability, the bit rate range, and the spatial resolution of the included representations. The comparison also shows that SVC is clearly superior to simulcasting single-layer H.264/AVC streams for different spatial resolutions or bit rates.
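To make the comparison with simulcast concrete, the following sketch works through the arithmetic for one case. The two single-layer bit rates are hypothetical values chosen for illustration, not measurements from Figure 3, and the 10% overhead is only the approximate figure quoted above, which varies with the configuration.

```python
def simulcast_rate(layer_rates_kbps):
    """Simulcast sends every representation as its own single-layer stream."""
    return sum(layer_rates_kbps)

def svc_rate(highest_layer_rate_kbps, overhead=0.10):
    """One scalable stream: single-layer rate of the top representation plus overhead.

    The default overhead of 0.10 is the approximate value from the text.
    """
    return highest_layer_rate_kbps * (1.0 + overhead)

# Hypothetical single-layer bit rates (kbit/s) for two spatial resolutions
single_layer = {"low_res": 300.0, "high_res": 1000.0}

simulcast = simulcast_rate(single_layer.values())  # both streams together
svc = svc_rate(single_layer["high_res"])           # one stream containing both
saving = 1.0 - svc / simulcast                     # fraction saved versus simulcast

print(f"simulcast: {simulcast:.0f} kbit/s, SVC: {svc:.0f} kbit/s, "
      f"saving: {saving:.1%}")
```

Under these assumed numbers the scalable stream needs roughly 15% less bit rate than simulcast while still allowing the low-resolution representation to be extracted.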

Contributions of the Image & Video Coding Group

The image and video coding group contributed the following techniques to the SVC design and encoding concepts:

  • The first SVC Model that became the first Working Draft
  • Hierarchical prediction structures for providing temporal scalability, which also generally improve coding efficiency and increase the effectiveness of inter-layer prediction tools in spatial and quality scalable coding
  • Inter-layer prediction tools for spatial and quality scalable coding
  • The concept of decoding enhancement layers with a single motion compensation loop as part of the inter-layer prediction design
  • The key picture concept for efficiently controlling the drift in packet-based quality scalable coding
  • The concept of transform coefficient partitioning for increasing the granularity of packet-based quality scalable coding
  • An RD-optimized multi-layer encoder control for improving the coding efficiency of SVC encoders
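As an illustration of how hierarchical prediction structures yield temporal scalability, the sketch below assigns temporal levels to frames and extracts a sub-stream with a reduced frame rate. It assumes a dyadic hierarchy with a group of 16 pictures, which gives the 5 temporal levels used in the comparison above; the dyadic layout and the names `temporal_level` and `extract_substream` are assumptions made for this example and are not prescribed by the text.

```python
def temporal_level(frame_index, gop_size=16):
    """Temporal level of a frame in an assumed dyadic hierarchy.

    Level 0 is the coarsest level (every gop_size-th frame); each further
    level doubles the frame rate.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    max_level = gop_size.bit_length() - 1  # 4 for gop_size = 16
    pos = frame_index % gop_size
    if pos == 0:
        return 0
    # The number of trailing zero bits tells how coarse a grid the frame lies on
    trailing_zeros = (pos & -pos).bit_length() - 1
    return max_level - trailing_zeros

def extract_substream(frame_indices, max_temporal_level, gop_size=16):
    """Keep only frames whose temporal level does not exceed the target level."""
    return [i for i in frame_indices
            if temporal_level(i, gop_size) <= max_temporal_level]

# Dropping the highest level (4) halves the frame rate
half_rate = extract_substream(range(17), max_temporal_level=3)
```

Because frames of a given level are predicted only from frames of the same or a lower level in such a structure, discarding all frames above a chosen level leaves a decodable sub-stream.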