Generic Quadtree-Based Block Partitioning in HEVC

Video coding standards such as MPEG-1 Part 2, H.262/MPEG-2 Part 2, and H.264/AVC divide pictures into disjoint square blocks, referred to as macroblocks. Similarly, HEVC specifies a division into square coding tree blocks (CTBs). Unlike the previous standards, which use a fixed macroblock size of 16×16 luma samples, HEVC signals the CTB size for each video sequence and allows sizes from 16×16 up to 64×64 luma samples. Each CTB forms the root of a quadtree: starting from the root, each node may be recursively split into four square coding blocks (CBs) of equal size. The resulting tree is called the coding quadtree. A further quadtree structure is nested at the leaves of the coding quadtree. It indicates further subdivisions for residual coding and is referred to as the residual quadtree (RQT).
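The recursive splitting described above can be sketched in a few lines of Python. This is only an illustration, not the decision process of any real encoder: the function name `split_coding_quadtree`, the predicate `should_split`, and the example rule `split_top_left` are hypothetical, and the minimum coding block size of 8×8 luma samples is an assumption that the text above does not state.

```python
def split_coding_quadtree(x, y, size, should_split, min_size=8):
    """Return the leaf coding blocks of one CTB as (x, y, size) tuples.

    should_split is a hypothetical callback standing in for the encoder's
    split decision; min_size=8 is an assumed lower bound on the CB size.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four equally sized children: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_coding_quadtree(
                    x + dx, y + dy, half, should_split, min_size
                )
        return leaves
    return [(x, y, size)]


def split_top_left(x, y, size):
    """Hypothetical rule: keep splitting only the block at the CTB origin."""
    return x == 0 and y == 0


# One 64x64 CTB, split repeatedly in its top-left corner.
leaves = split_coding_quadtree(0, 0, 64, split_top_left)
for block in leaves:
    print(block)
```

Whatever split rule is used, the leaves tile the CTB without gaps or overlap, so their areas always sum to the CTB area.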

The highly variable quadtree-based structure allows flexible adaptation to the characteristics of the input video signal. Although each color component could in principle use a dedicated partitioning structure, HEVC specifies a single partitioning structure for all color components.

The Nested Quadtree Structure

The right side of the above figure shows an example of the nested quadtree structure. Solid lines denote the coding quadtree and dashed lines denote the nested quadtrees for residual coding. The corresponding CTB and its subdivision into coding and residual blocks are shown on the left side of the figure. Again, solid lines denote the coding block boundaries while dashed lines denote the residual block boundaries. Residual blocks are also referred to as transform blocks because transform coding is applied to them.

The coding quadtree in this example has four levels. Its root at level 0 corresponds to the full CTB size, i.e., the maximum coding block size. At level 3, the edge length of the coding blocks is one eighth of the CTB edge length. In general, let $N_{\max}$ be the edge length of the CTB, which is always a power of two. The edge length at level $i$ is then $N_i = 2^{-i} \times N_{\max}$.
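As a quick numerical check of this relation, the following sketch computes the edge length for each level of a four-level tree. The function name `edge_length` is hypothetical, and the 64-sample CTB is just one of the sizes HEVC allows.

```python
def edge_length(n_max, level):
    """Edge length of a block at the given quadtree level: n_max / 2**level."""
    if n_max <= 0 or n_max & (n_max - 1):
        raise ValueError("n_max must be a positive power of two")
    if level < 0 or (n_max >> level) == 0:
        raise ValueError("level is out of range for this CTB size")
    # Halving the edge length once per level is a right shift by `level`.
    return n_max >> level


# Levels 0..3 of a 64x64 CTB, matching the four-level example in the text.
lengths = [edge_length(64, i) for i in range(4)]
print(lengths)
```

At level 3 the result is 8, which is one eighth of the CTB edge length of 64.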

CTBs are processed in raster scan order, while the coding and transform blocks within a CTB are processed in depth-first order (also known as z-scan order). As a result, the top and left neighboring coding blocks are always available, i.e., they have been processed before the current coding block. Hence, they can be used to facilitate the coding of the current coding block.
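The availability property can be checked with a small sketch. It assumes a uniformly split 16×16 block with 4×4 leaves, which is a simplification of the irregular trees that occur in practice; the names `z_scan` and `neighbors_available` are hypothetical.

```python
def z_scan(x, y, size, min_size):
    """Yield the (x, y) origins of min_size blocks in depth-first order."""
    if size == min_size:
        yield (x, y)
        return
    half = size // 2
    # Children are visited top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            yield from z_scan(x + dx, y + dy, half, min_size)


def neighbors_available(order, step):
    """Check that each block's left and top neighbors come earlier in order."""
    position = {origin: index for index, origin in enumerate(order)}
    for (x, y) in order:
        for neighbor in ((x - step, y), (x, y - step)):
            # Neighbors outside the block are not in `position` and are skipped.
            if neighbor in position and position[neighbor] > position[(x, y)]:
                return False
    return True


order = list(z_scan(0, 0, 16, 4))
print(order)
print(neighbors_available(order, 4))
```

The check only covers neighbors inside the same block. Neighbors in other CTBs are covered by the raster scan order of the CTBs themselves.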

Prediction Blocks and Residual Quadtree

Either intra (spatial) or inter (temporal) prediction is used for each coding block. Each coding block may be further divided into so-called prediction blocks (PBs). PB partitioning is restricted: a coding block may consist of one PB with the same edge length as the CB, two rectangular PBs, or four square PBs. In all cases, the prediction residual signal goes through the transform coding stage, which uses a variable block-size DCT in HEVC. With the help of the nested RQT, coding blocks may be further subdivided into smaller transform blocks. This leads to cases where the prediction and transform block sizes are not equal. More generally, the transform block size is always smaller than or equal to the prediction block size. This is shown in the above figure for the coding block labeled "7". HEVC supports transform block sizes from 4×4 to 32×32 for the luma component.
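The partitioning options above can be summarized in a short sketch. The mode names (`"one"`, `"two_horizontal"`, `"two_vertical"`, `"four"`) and both function names are hypothetical labels chosen for illustration. The sketch also assumes that the two rectangular PBs are equal halves, and it ignores any limit on the RQT depth, so it lists candidate transform sizes rather than what a particular configuration permits.

```python
def prediction_blocks(cb_size, mode):
    """Return the (width, height) of each PB for a hypothetical mode name."""
    half = cb_size // 2
    shapes = {
        "one": [(cb_size, cb_size)],             # one PB, same size as the CB
        "two_horizontal": [(cb_size, half)] * 2,  # two PBs stacked vertically
        "two_vertical": [(half, cb_size)] * 2,    # two PBs side by side
        "four": [(half, half)] * 4,               # four square PBs
    }
    return shapes[mode]


def transform_sizes(cb_size, min_tb=4, max_tb=32):
    """List candidate luma transform block sizes inside a coding block.

    The 4..32 range comes from the text; RQT depth limits are ignored.
    """
    size = min(cb_size, max_tb)
    sizes = []
    while size >= min_tb:
        sizes.append(size)
        size //= 2
    return sizes


print(prediction_blocks(16, "two_vertical"))
print(transform_sizes(64))
```

For a 64×64 coding block the largest candidate is 32×32, so such a block has to be split at least once in the RQT before transform coding.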

  1. D. Marpe, H. Schwarz, S. Bosse, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, T. Nguyen, S. Oudin, M. Siekmann, K. Sühring, M. Winken, and T. Wiegand, "Video Compression Using Nested Quadtree Structures, Leaf Merging and Improved Techniques for Motion Representation and Entropy Coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 20, No. 12, pp. 1676-1687, Dec. 2010.