Wavefronts for HEVC Parallelism

Unlike H.264/Advanced Video Coding (AVC), where parallelism was an afterthought, the current High Efficiency Video Coding (HEVC) draft contains several proposals that aim to make the codec more “parallelizable.” H.264/AVC supports slices, which were introduced mainly to limit quality loss in the case of transmission errors but can also be used to parallelize the decoder. Employing slices for parallelism, however, introduces two problems. First, using many slices to increase parallelism incurs significant coding losses. Second, the number of slices is determined by the encoder: a decoder that relies on slices to reach real-time performance may fall short when it receives a video sequence with only one or a few slices per frame. One of the two parallelization approaches included in HEVC is Wavefront Parallel Processing (WPP). WPP creates picture partitions that can be processed in parallel without incurring high coding losses.

In WPP, rows of treeblocks are processed in parallel while preserving all coding dependencies. Because processing a treeblock requires the left, top-left, top, and top-right treeblocks to be available for prediction to operate correctly, a shift of at least two treeblocks is enforced between consecutive rows of treeblocks processed in parallel. As a result, WPP requires more inter-core communication than tiles in the non-cross-border filtering mode. Inter-core communication is typically not a burden for today’s multi-core processor architectures, so WPP is suitable for both software and hardware implementations. In particular, implementations of WPP are straightforward because WPP does not affect the ability to perform single-step processing (i.e., entropy coding, predictive coding, and in-loop filtering can all be applied in a single processing step). An example use case for WPP is high-quality streaming over robust channels. In combination with dependent slices, WPP can also be used in ultra-low-delay applications.
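The two-treeblock shift can be illustrated with a small scheduling model. The sketch below is not taken from any HEVC implementation: the function names are hypothetical, and it assumes one thread per treeblock row, a uniform cost of one time step per treeblock, and only the left and top-right dependencies (which together imply the top and top-left ones).

```python
from collections import Counter


def wpp_schedule(rows, cols):
    """Earliest start step of every treeblock in a rows x cols picture.

    Assumptions (illustrative only): one thread per row, every treeblock
    costs exactly one step, and a treeblock may start once its left
    neighbour and its top-right neighbour have finished.
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # Left neighbour in the same row must be finished.
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:
                # Top-right neighbour; the last column falls back to the
                # last treeblock of the row above.
                top_right = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][top_right] + 1)
            start[r][c] = ready
    return start


def total_steps(start):
    """Number of steps until the last treeblock has finished."""
    return start[-1][-1] + 1


def peak_parallelism(start):
    """Largest number of treeblocks processed in the same step."""
    counts = Counter(step for row in start for step in row)
    return max(counts.values())


if __name__ == "__main__":
    schedule = wpp_schedule(rows=4, cols=6)
    for row in schedule:
        print(" ".join(f"{step:2d}" for step in row))
    print("steps:", total_steps(schedule))
    print("peak parallel treeblocks:", peak_parallelism(schedule))
```

Under these assumptions each row starts two steps after the row above it, so the wavefront needs a ramp-up and a ramp-down phase during which some threads are idle. Real treeblocks have varying decode times, so an actual decoder would synchronize on completion events rather than on fixed steps.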

Overlapped Wavefront (OWF) allows for overlapping the execution of consecutive pictures using wavefronts. When a thread has finished a treeblock row in the current picture and no more rows are available, it can start processing the next picture instead of waiting for the current picture to finish. 
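The effect of overlapping pictures can be sketched with a simple step-based simulation. Everything in it is an assumption made for illustration: the function name `decode_steps`, the uniform one-step cost per treeblock, and in particular the inter-picture rule, which lets a row of the next picture proceed only once the co-located row plus `mv_rows` further rows of the previous picture are complete (a stand-in for a limited vertical motion-vector range).

```python
def decode_steps(pictures, rows, cols, threads, overlap, mv_rows=1):
    """Count steps needed to decode `pictures` pictures with wavefronts.

    Illustrative model only: every treeblock costs one step, each thread
    owns one treeblock row at a time, and rows are handed out in decode
    order. With overlap=False a picture starts only after the previous
    picture is complete; with overlap=True idle threads may take rows of
    the next picture right away.
    """
    done = [[0] * rows for _ in range(pictures)]  # treeblocks finished per row
    pending = [(p, r) for p in range(pictures) for r in range(rows)]
    active = [None] * threads                     # row owned by each thread
    finished_rows = 0
    steps = 0
    while finished_rows < pictures * rows:
        # Hand out rows to idle threads.
        for t in range(threads):
            if active[t] is None and pending:
                p, _ = pending[0]
                prev_complete = p == 0 or all(n == cols for n in done[p - 1])
                if overlap or prev_complete:
                    active[t] = pending.pop(0)
        # All threads see the state from the beginning of the step.
        snapshot = [row[:] for row in done]
        for t, job in enumerate(active):
            if job is None:
                continue
            p, r = job
            c = snapshot[p][r]  # next treeblock column in this row
            # Intra-picture rule: row above must be two treeblocks ahead.
            above_ok = r == 0 or snapshot[p][r - 1] >= min(c + 2, cols)
            # Hypothetical inter-picture rule (assumed reference area).
            ref_row = min(r + mv_rows, rows - 1)
            ref_ok = p == 0 or snapshot[p - 1][ref_row] == cols
            if above_ok and ref_ok:
                done[p][r] += 1
                if done[p][r] == cols:
                    active[t] = None
                    finished_rows += 1
        steps += 1
    return steps


if __name__ == "__main__":
    plain = decode_steps(3, 4, 6, threads=4, overlap=False)
    owf = decode_steps(3, 4, 6, threads=4, overlap=True)
    print("wavefront only:", plain, "steps")
    print("overlapped wavefront:", owf, "steps")
```

In this model the overlapped variant finishes in fewer steps because threads that would otherwise idle during the ramp-down of one picture already work on the ramp-up of the next. How much is gained in practice depends on the real inter-picture dependencies, which this sketch only approximates.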
