Key Picture Concept in SVC

The key picture concept in SVC improves the coding efficiency for packet-based quality scalable coding.

Quality scalability can be considered as a special case of spatial scalability with identical picture sizes for base and enhancement layer. This case, which is also referred to as coarse-grain quality scalable coding (CGS), is supported by the general concept for spatial scalable coding. The same inter-layer prediction mechanisms are employed, but without using the corresponding upsampling operations. When utilizing inter-layer prediction, a refinement of texture information is typically achieved by re-quantizing the residual texture signal in the enhancement layer with a smaller quantization step size relative to that used for the preceding CGS layer. As a specific feature of this configuration, the deblocking of the reference layer intra signal for inter-layer intra prediction is omitted. Furthermore, inter-layer intra and residual prediction are directly performed in the transform coefficient domain in order to reduce the decoding complexity.
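
The refinement by re-quantization described above can be illustrated with a minimal Python sketch. It assumes a plain uniform scalar quantizer acting on a single transform coefficient; the function names and the step sizes are hypothetical choices for illustration, and the actual SVC transform and quantizer design is not reproduced here.

```python
def quantize(value, step):
    """Uniform scalar quantizer (illustrative): integer level for `value`."""
    return round(value / step)


def dequantize(level, step):
    """Reconstruct a coefficient value from its quantization level."""
    return level * step


def cgs_refine(coeff, base_step, enh_step):
    """Code one coefficient in a base layer and one CGS enhancement layer.

    The enhancement layer predicts from the base layer reconstruction in the
    coefficient domain and re-quantizes only the remaining residual with the
    smaller step size `enh_step`.
    """
    base_recon = dequantize(quantize(coeff, base_step), base_step)
    residual = coeff - base_recon                      # inter-layer residual
    enh_recon = base_recon + dequantize(quantize(residual, enh_step), enh_step)
    return base_recon, enh_recon


base_recon, enh_recon = cgs_refine(37.0, base_step=16, enh_step=4)
print(base_recon, enh_recon)  # base error is 5, enhancement error is 1
```

With these example values the base layer reconstructs 32 and the enhancement layer reconstructs 36, so the smaller step size reduces the reconstruction error without any upsampling step.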

The CGS concept only allows a few selected bit rates to be supported in a scalable bit stream. In general, the number of supported rate points is identical to the number of layers. Switching between different CGS layers can only be done at defined points in the bit stream. Furthermore, the CGS concept becomes less efficient when the relative rate difference between successive CGS layers gets smaller. To increase the flexibility of bit stream adaptation and error robustness, and also to improve the coding efficiency for bit streams that have to provide a variety of bit rates, the SVC design includes a variation of the CGS approach, which is referred to as medium-grain quality scalability (MGS). MGS differs from CGS in two respects: a modified high-level signalling, which allows switching between different MGS layers in any access unit, and the so-called key picture concept, which allows the adjustment of a suitable trade-off between drift and enhancement layer coding efficiency for hierarchical prediction structures.

Drift describes the effect that the motion-compensated prediction loops at encoder and decoder are not synchronized, e.g., because quality refinement packets are discarded from a bit stream. Figure 1 illustrates different concepts for trading off enhancement layer coding efficiency and drift for packet-based quality scalable coding.

  • Base layer only control: For fine-grain quality scalable (FGS) coding in MPEG-4 Visual, the prediction structure was chosen in a way that drift is completely avoided. As illustrated in Figure 1(a), motion compensation in MPEG-4 FGS is only performed using the base layer reconstruction as reference, and thus any loss or modification of a quality refinement packet does not have any impact on the motion compensation loop. The drawback of this approach, however, is that it significantly decreases enhancement layer coding efficiency in comparison to single-layer coding. Since only base layer reconstruction signals are used for motion-compensated prediction, the portion of bit rate that is spent for encoding MPEG-4 FGS enhancement layers of a picture cannot be exploited for the coding of following pictures that use this picture as reference.
  • Enhancement layer only control: For quality scalable coding in H.262/MPEG-2 Video, the other extreme case of possible prediction structures was specified. Here, the reference with the highest available quality is always employed for motion-compensated prediction as depicted in Figure 1(b). This enables highly efficient enhancement layer coding and ensures low complexity, since only a single reference picture needs to be stored for each time instant. However, any loss of quality refinement packets results in a drift that can only be controlled by intra updates. It should be noted that H.262/MPEG-2 Video does not allow partial discarding of quality refinement packets inside a video sequence, and thus the drift issue can be completely avoided in conforming H.262/MPEG-2 Video bit streams by controlling the reconstruction quality of both the base and the enhancement layer during encoding.
  • Two-loop control: As an alternative, a concept with two motion compensation loops as illustrated in Figure 1(c) could be employed. This concept is similar to spatial scalable coding as specified in H.262/MPEG-2 Video, H.263, and MPEG-4 Visual. Although the base layer is not influenced by packet losses in the enhancement layer, any loss of a quality refinement packet results in a drift for the enhancement layer reconstruction.
  • SVC key picture concept: For MGS coding in SVC, an alternative approach using so-called key pictures (see Figure 1(d)) has been introduced. For each picture, a flag is transmitted that signals whether the base quality reconstruction or the enhancement layer reconstruction of the reference pictures is employed for motion-compensated prediction. In order to limit the memory requirements, a second syntax element signals whether the base quality representation of a picture is additionally reconstructed and stored in the decoded picture buffer. In order to limit the decoding overhead for such key pictures, SVC specifies that motion parameters must not change between the base and enhancement layer representations of key pictures. Thus, key pictures can also be decoded with a single motion-compensation loop.
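
The difference between these control concepts can be made concrete with a toy simulation. The following Python sketch is only an illustration under strong assumptions: pictures are single integer samples, prediction is a simple chain from the previous picture (key pictures predict from the previous key picture), and the decoder receives no enhancement packets at all. The names `quant`, `simulate`, and `key_period` are hypothetical and do not correspond to SVC syntax elements.

```python
def quant(x, step):
    """Quantize integer x to the nearest multiple of step (integer arithmetic)."""
    return step * ((x + step // 2) // step)


def simulate(source, key_period, base_step=8, enh_step=1):
    """Per-picture drift of a base-only decoder in a toy scalar codec.

    key_period = 1: every picture is a key picture (base layer only control).
    key_period = 0: no key pictures (enhancement layer only control).
    key_period = N: every N-th picture is a key picture.
    """
    enc_base = enc_enh = 0   # encoder reconstructions of the previous picture
    enc_key_base = 0         # encoder base reconstruction of the last key picture
    dec = dec_key = 0        # decoder reconstructions (enhancement packets lost)
    drift = []
    for t, s in enumerate(source):
        is_key = key_period > 0 and t % key_period == 0
        # Key pictures predict from base quality, others from highest quality.
        enc_ref = enc_key_base if is_key else enc_enh
        dec_ref = dec_key if is_key else dec
        base_res = quant(s - enc_ref, base_step)
        enc_base = enc_ref + base_res
        enc_enh = enc_base + quant(s - enc_base, enh_step)
        dec = dec_ref + base_res  # the enhancement residual never arrives
        if is_key:
            enc_key_base, dec_key = enc_base, dec
        # Drift: mismatch between decoder output and the encoder's base reconstruction.
        drift.append(abs(dec - enc_base))
    return drift


source = [10, 13, 17, 22, 25, 31, 36, 40, 45]
for period in (1, 0, 4):
    print(period, simulate(source, period))
```

In this toy model, `key_period=1` never drifts, `key_period=0` lets the mismatch persist from picture to picture, and `key_period=4` returns the drift to zero at every key picture, which mirrors the re-synchronization role that key pictures play.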

Figure 1(d) illustrates how the key picture concept can be efficiently combined with hierarchical prediction structures. All pictures of the coarsest temporal layer are transmitted as key pictures, and only for these pictures the base quality reconstruction is inserted in the decoded picture buffer. Thus, no drift is introduced in the motion compensation loop of the coarsest temporal layer. In contrast to that, all temporal refinement pictures typically use the reference with the highest available quality for motion-compensated prediction, which enables a high coding efficiency for these pictures. Since the key pictures serve as re-synchronization points between encoder and decoder reconstruction, drift propagation is efficiently limited to neighboring pictures of higher temporal layers. The trade-off between enhancement layer coding efficiency and drift can be adjusted by the choice of the GOP size or the number of hierarchy stages. It should be noted that both the quality scalability structure in H.262/MPEG-2 Video (no picture is coded as key picture) and the FGS coding approach in MPEG-4 Visual (all pictures are coded as key pictures) basically represent special cases of the SVC key picture concept. With the MGS concept, any enhancement layer NAL unit can be discarded from a quality scalable bit stream, and thus packet-based quality scalable coding is provided.
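
The combination with hierarchical prediction structures can be sketched as follows. This Python example assumes a dyadic hierarchy whose GOP size is a power of two and treats exactly the pictures of temporal layer 0 as key pictures; the function names are hypothetical and the layer numbering is an illustrative convention rather than a quotation of the standard.

```python
def temporal_layer(poc, gop_size):
    """Temporal layer of picture `poc` in a dyadic hierarchy.

    Assumes `gop_size` is a power of two. Layer 0 is the coarsest layer.
    """
    if poc % gop_size == 0:
        return 0
    layer = gop_size.bit_length() - 1  # log2(gop_size) for powers of two
    while poc % 2 == 0:
        poc //= 2
        layer -= 1
    return layer


def is_key_picture(poc, gop_size):
    """In this sketch, all pictures of the coarsest temporal layer are key pictures."""
    return temporal_layer(poc, gop_size) == 0


for poc in range(9):
    layer = temporal_layer(poc, 8)
    print(poc, layer, "key" if is_key_picture(poc, 8) else "")
```

Under these assumptions, a picture in temporal layer k is at most k prediction steps away from a key picture, so fewer hierarchy stages shorten the path along which drift can propagate, while more stages leave more pictures that predict from the highest available quality. A GOP size of 1 makes every picture a key picture, which corresponds to the base layer only control case.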


  1. H. Schwarz, T. Hinz, H. Kirchhoffer, D. Marpe, and T. Wiegand, "Technical Description of the HHI proposal for SVC CE1," ISO/IEC JTC1/SC29/WG11, doc. M11244, Palma de Mallorca, Spain, Oct. 2004.
  2. H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Trans. on Circuits and Systems for Video Technology, Sept. 2007.