Generic Quadtree-Based Approach for Block Partitioning

In most video coding standards, such as MPEG-1 Part 2, H.262/MPEG-2 Part 2, or H.264/AVC, a picture is divided into square macroblocks. Similarly, in HEVC a picture is divided into square coding tree blocks (CTBs). Unlike in the previous standards, where the macroblock size is always 16×16 luma samples, the CTB size is signaled for each video sequence and may range from 16×16 to 64×64 luma samples. Each CTB can be recursively split into four square coding blocks (CBs), resulting in a so-called coding quadtree. Associated with each CTB is a nested quadtree structure that indicates the subdivision of the CTB for the purpose of prediction and residual coding. The coding quadtree is the outer quadtree of this nested structure and determines how the CTB is subdivided into coding blocks, while the inner (nested) quadtree specifies the further division of the coding blocks into transform blocks. Dividing each picture into variable-sized blocks makes it possible to adapt to the specific characteristics of the input video signal to be coded. Note that, in principle, it is possible to specify different partitionings for the individual signal components (Y, Cb, Cr) or for groups of signal components (e.g., one partitioning for luma and a different one for chroma). However, the latest draft specification of HEVC specifies a single partitioning for all signal components.
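The recursive splitting of a CTB into coding blocks can be illustrated with a minimal sketch. The function name `coding_quadtree`, the `should_split` callback, and the demo split rule are hypothetical and chosen only for illustration; a real encoder would derive the split decision from a rate-distortion criterion, and a decoder would read it from the bitstream.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, edge length) in luma samples


def coding_quadtree(x: int, y: int, size: int, min_size: int,
                    should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Return the leaf coding blocks of one CTB in depth-first order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Visit the quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += coding_quadtree(x + dx, y + dy, half,
                                          min_size, should_split)
        return leaves
    # No further split: this block is a leaf of the coding quadtree.
    return [(x, y, size)]


def demo_rule(x: int, y: int, size: int) -> bool:
    """Hypothetical rule: split the CTB, then only its top-left quadrant."""
    return size == 64 or (size == 32 and x == 0 and y == 0)


blocks = coding_quadtree(0, 0, 64, 8, demo_rule)
for block in blocks:
    print(block)
```

With this rule, a 64×64 CTB yields four 16×16 coding blocks in its top-left quadrant and three 32×32 coding blocks elsewhere, which together cover the CTB exactly once.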

Nested Quadtree Structure

The figure above shows an example of a nested quadtree structure. The coding quadtree is shown in solid lines, and the nested residual quadtrees for transform coding are shown in dashed lines. On the left-hand side, the corresponding CTB (bold lines) and its subdivision into coding blocks (solid) and transform blocks (dashed) are shown. Here, the coding quadtree has four levels, with the root at level 0 corresponding to the full CTB size (the maximum coding block size) and level 3 corresponding to coding blocks whose edge length is one eighth of the CTB edge length. In general, the edge length of a coding block at level $i$ is $2^{-i} \cdot N_{\max}$, where $N_{\max}$ is the edge length of the square block of luma samples associated with the CTB. Note that $N_{\max}$ is always a power of two. In the encoding and decoding process, the CTBs are processed in raster scan order, and the coding and transform blocks within each CTB are processed in depth-first order. As a result, the top and left neighboring blocks are always coded before the current block, so that data already transmitted for these blocks can be used to facilitate encoding or decoding of the current block.
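The edge-length rule and the ordering property can be checked with a short sketch. The function names `cb_edge_length` and `z_order_index` are hypothetical, and the sketch assumes a CTB that is split uniformly down to the smallest blocks, in which case the depth-first order is the z-scan obtained by interleaving the bits of the block column and row.

```python
def cb_edge_length(n_max: int, level: int) -> int:
    """Edge length 2**(-level) * n_max of a coding block at a quadtree level."""
    if n_max <= 0 or n_max & (n_max - 1):
        raise ValueError("n_max must be a power of two")
    return n_max >> level


def z_order_index(bx: int, by: int, bits: int) -> int:
    """Depth-first (z-scan) position of the block in column bx, row by."""
    index = 0
    for b in range(bits):
        index |= ((bx >> b) & 1) << (2 * b)      # column bit: even position
        index |= ((by >> b) & 1) << (2 * b + 1)  # row bit: odd position
    return index


# Edge lengths for levels 0..3 of a 64x64 CTB.
print([cb_edge_length(64, level) for level in range(4)])

# In an 8x8 grid of blocks, the left and top neighbors of every block
# are visited earlier than the block itself.
grid = 8
left_ok = all(z_order_index(x - 1, y, 3) < z_order_index(x, y, 3)
              for y in range(grid) for x in range(1, grid))
top_ok = all(z_order_index(x, y - 1, 3) < z_order_index(x, y, 3)
             for y in range(1, grid) for x in range(grid))
print(left_ok, top_ok)
```

The check over the whole grid reflects why this scan order is convenient: whatever block is current, its top and left neighbors have already been processed.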

Prediction Blocks and Residual Quadtree

For each coding block, either intra (spatial) or inter (temporal) prediction is used. For that purpose, each coding block is further divided into so-called prediction blocks (PBs). The PB partitioning is restricted: a coding block can consist of one PB with the same size as the CB, two rectangular PBs, or four square PBs. In all cases, the prediction residual, i.e., the difference between the original input signal and the prediction signal, is transform coded using a variable block-size DCT. Note that, according to the residual quadtree (RQT), the coding blocks can be further subdivided into smaller transform blocks, so that the block sizes for prediction and for DCT transform coding do not have to be the same. This is shown in the above figure for the coding block labeled 7. Transform block sizes in the range of 4×4 to 32×32 are supported for both the luma and chroma components. The transform kernel for each supported transform block size is given by a separable integer approximation of the 2D DCT-II (type-II Discrete Cosine Transform) of the corresponding block size.
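The separable structure of the transform can be sketched as follows. This is a floating-point, orthonormal DCT-II and not the integer kernel defined by HEVC; the helper names `dct_matrix`, `matmul`, `transpose`, and `dct2d` are hypothetical. The point is only that a 2D transform of an N×N block reduces to two 1D transforms, one along the columns and one along the rows.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal n x n DCT-II basis; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def dct2d(block: Matrix) -> Matrix:
    """Separable 2D DCT-II: transform the columns, then the rows."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A constant 4x4 residual block puts all of its energy into the DC coefficient.
residual = [[10.0] * 4 for _ in range(4)]
coeffs = dct2d(residual)
print(round(coeffs[0][0], 6))
```

For a constant block, only the coefficient at position (0, 0) is non-zero, which illustrates why smooth residuals are cheap to code after the transform.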