Tools for Parallel Processing and Ultra-Low Delay Support

New high-level syntax structures have been specifically designed to improve the parallel processing capabilities of video coding implementations. For that purpose, the H.265/HEVC standard supports the partitioning of a picture into slices, slice segments, and tiles, as well as so-called wavefront parallel processing (WPP).

The parallel processing tools allow each picture to be subdivided into multiple partitions that can be processed in parallel. Each partition contains an integer number of coding tree units (CTUs) that may or may not have dependencies on CTUs of other partitions. When WPP or tiles are enabled, a separate slice segment subset is typically used for each partition, and the corresponding entry point offsets (in the slice segment header) indicate the start positions of all picture partition substreams (except for the first substream) in the slice segment. This allows each core to directly access the partition it has been assigned to decode.
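The role of the entry point offsets can be illustrated with a minimal Python sketch. It assumes that the offsets have already been converted to substream sizes in bytes (in the HEVC syntax they are signalled as entry_point_offset_minus1 values, so 1 would be added first). The function name and the example numbers are hypothetical and not taken from any decoder implementation.

```python
from itertools import accumulate


def substream_ranges(entry_point_offsets, data_length):
    """Return (start, end) byte ranges, end exclusive, one per substream.

    Assumption: entry_point_offsets[i] is the size in bytes of substream i,
    so substream i + 1 starts at the running sum of the preceding sizes.
    The first substream always starts at byte 0 and needs no offset.
    """
    starts = [0] + list(accumulate(entry_point_offsets))
    ends = starts[1:] + [data_length]
    if any(start >= end for start, end in zip(starts, ends)):
        raise ValueError("offsets do not fit the slice segment data length")
    return list(zip(starts, ends))


# Hypothetical slice segment data of 400 bytes holding four substreams.
ranges = substream_ranges([100, 80, 120], 400)
```

With these ranges, each core can seek straight to its own substream instead of parsing the preceding ones.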

When WPP is enabled in HEVC, each CTU row of a picture constitutes a separate partition, so that as many threads as there are CTU rows in a picture can work in parallel on the individual CTU rows. The number of CTU rows is given by the picture height in luma samples divided by the luma CTB size (its width or, equivalently, its height), rounded up to the next integer.
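As a small worked example, the number of CTU rows, and hence the upper bound on the number of WPP threads, can be computed as follows. The function name is chosen here for illustration only.

```python
def num_ctu_rows(pic_height_luma, ctb_size):
    """Number of CTU rows: picture height divided by CTB size, rounded up."""
    if pic_height_luma <= 0 or ctb_size <= 0:
        raise ValueError("picture height and CTB size must be positive")
    # Integer ceiling division without floating point.
    return -(-pic_height_luma // ctb_size)


# A picture of height 1080 with 64x64 luma CTBs has 17 CTU rows,
# the last of which is only partially covered by the picture.
rows_1080p = num_ctu_rows(1080, 64)
```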

The processing of each CTU row starts with a delay of two consecutive CTUs relative to its preceding CTU row. In this way, no dependencies between consecutive CTU rows are broken at the partition boundaries, except for the CABAC context variables at the end of each CTU row. To mitigate the potential loss in coding efficiency that would result from a conventional CABAC initialization at the start of each CTU row, the content of the partially adapted CABAC context variables is propagated from the encoded/decoded second CTU of the preceding CTU row to the first CTU of the current CTU row, as shown in Fig. 2.
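The two-CTU delay can be made concrete with a simplified scheduling sketch. It assumes one thread per CTU row and one time step per CTU, which is an idealization and not a statement about any real encoder or decoder. The function name is hypothetical.

```python
def wavefront_schedule(rows, cols, delay=2):
    """Earliest start step of every CTU under the WPP dependencies.

    Assumptions: one thread per CTU row, one time step per CTU.
    A CTU may start once its left neighbor is finished and once the
    CTU at column c + delay - 1 of the row above is finished (clamped
    to the last column at the right picture border).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            after_left = start[r][c - 1] + 1 if c > 0 else 0
            if r > 0:
                dep_col = min(c + delay - 1, cols - 1)
                after_above = start[r - 1][dep_col] + 1
            else:
                after_above = 0
            start[r][c] = max(after_left, after_above)
    return start


# Three CTU rows of five CTUs each: every row starts two steps
# after the row above it.
schedule = wavefront_schedule(3, 5)
```

Under these assumptions, the first CTU of a row starts exactly when the second CTU of the preceding row has been finished, which is also the point at which the partially adapted CABAC context variables are available for propagation.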

The fragmentation of slices by the use of dependent slice segments was first proposed in [3]. According to this fragmentation concept, a slice in HEVC is defined as a set of slice segments, where the first segment of a slice is the independent slice segment, followed by zero or more dependent slice segments, as illustrated in Fig. 1.
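The slice definition above can be sketched as a simple grouping step. The input is assumed to be one boolean per slice segment in decoding order, mirroring the dependent_slice_segment_flag of the slice segment header. The function name and the example sequence are hypothetical.

```python
def group_into_slices(dependent_flags):
    """Group slice segment indices into slices.

    Each slice starts with an independent slice segment (flag False)
    and continues with zero or more dependent segments (flag True).
    """
    slices = []
    for index, is_dependent in enumerate(dependent_flags):
        if not is_dependent:
            slices.append([index])  # an independent segment opens a new slice
        elif not slices:
            raise ValueError("the first slice segment must be independent")
        else:
            slices[-1].append(index)  # a dependent segment extends the current slice
    return slices


# Six slice segments forming three slices.
slices = group_into_slices([False, True, True, False, False, True])
```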

References

[1] H. Schwarz, T. Schierl, and D. Marpe, Block Structures and Parallelism Features in HEVC, in High Efficiency Video Coding (HEVC): Algorithms and Architectures, Springer, 2014.
[2] F. Henry and S. Pateux, Wavefront parallel processing, JCT-VC document E196, Geneva, Switzerland, March 2011.
[3] T. Schierl, V. George, A. Henkel, and D. Marpe, Dependent Slices, JCT-VC document I0229, Geneva, Switzerland, April 2012.

Dr.-Ing. Detlev Marpe

Prof. Dr.-Ing. Heiko Schwarz

Prof. Dr.-Ing. Thomas Wiegand