Model-Aided Video Coding

In recent years, several video coding standards, such as H.261, H.263, MPEG-1, and MPEG-2, have been introduced. They mainly address the compression of generic video data for digital storage and communication services. These schemes exploit the statistics of the video signal without knowledge of the semantic content of the frames and can therefore be used robustly for arbitrary scenes.

Higher coding efficiency can be expected from model-based video coders when semantic information about the scene can be exploited. This is often done using 3-D models that describe the shape and texture of the objects in the scene. The 3-D object descriptions are encoded only once. Individual video frames are then characterized by the 3-D motion and deformation of these objects, which can be transmitted at extremely low bit-rates, making the scheme well suited to very low bit-rate applications.

However, this type of coder is restricted to scenes composed of objects that are known to the decoder. One typical class of such scenes is head-and-shoulder sequences, which are frequent in applications such as video telephony and video conferencing. For head-and-shoulder scenes, bit-rates of about 1 kbit/s with acceptable quality can be achieved. This has also motivated the recently specified synthetic and natural hybrid coding (SNHC) part of the MPEG-4 standard. SNHC allows the transmission of a 3-D face model that can be animated to generate different facial expressions.

On this webpage we demonstrate an extension of an H.263 video coder that uses information from a model-based coder. Instead of predicting the current frame exclusively from the previously decoded frame, the coder may additionally predict it from the synthetic frame produced by the model-based coder. The coder decides which prediction is more efficient in terms of rate-distortion performance. Hence, the coding efficiency does not fall below that of H.263 when the model-based coder cannot describe the current scene. On the other hand, if the objects in the scene match the 3-D models in the codec, a significant improvement in coding efficiency can be achieved.
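The page does not spell out how the rate-distortion decision is made. A common way to make such a decision is Lagrangian optimization: for each block, choose the prediction that minimizes J = D + λR. The sketch below assumes per-block decisions, the sum of squared differences (SSD) as distortion, and rate estimates supplied by the caller; `Candidate`, `choose_prediction`, and all numbers in it are hypothetical and are not taken from the coder described here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Block = List[List[int]]  # 2-D block of 8-bit luminance samples


@dataclass
class Candidate:
    """One prediction hypothesis for a block (hypothetical structure)."""
    name: str          # e.g. "previous frame" or "model frame"
    prediction: Block  # motion-compensated prediction for this block
    rate_bits: float   # assumption: estimated bits for motion info + residual


def ssd(a: Block, b: Block) -> int:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def choose_prediction(original: Block,
                      candidates: Sequence[Candidate],
                      lam: float) -> Tuple[str, float]:
    """Return the candidate name with the lowest Lagrangian cost J = D + lam * R."""
    best_name, best_cost = "", float("inf")
    for cand in candidates:
        cost = ssd(original, cand.prediction) + lam * cand.rate_bits
        if cost < best_cost:
            best_name, best_cost = cand.name, cost
    return best_name, best_cost


if __name__ == "__main__":
    original = [[100, 102], [98, 101]]
    # Plain H.263 path: prediction from the previously decoded frame.
    previous = Candidate("previous frame", [[90, 95], [92, 99]], rate_bits=20)
    # Model-aided path: prediction from the synthetic model frame.
    model = Candidate("model frame", [[99, 102], [97, 100]], rate_bits=28)
    print(choose_prediction(original, [previous, model], lam=5.0))
    # ('model frame', 143.0)
```

Because the previous-frame candidate always stays in the candidate set, the selected cost is never higher than that of the plain H.263 option for the same block, which mirrors the fallback argument above. When the model frame is badly wrong (for example, under occlusion), its distortion term dominates and the previous-frame prediction wins.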

Experimental Results

Experiments are conducted using two self-recorded natural CIF sequences, Peter and Eckehard, consisting of 200 and 100 frames, respectively. Both sequences are encoded at a frame rate of 8.33 Hz. Rate-distortion curves are generated by varying the DCT quantizer parameter over the values 10, 15, 20, 25, and 31. The generated bit-streams are decodable and reproduce at the decoder the same PSNR values as at the encoder. The data for the first INTRA frame and the initial 3-D model are excluded from the results, thus simulating steady-state behavior. The face model description, which does not change during the sequence, consists of the positions of 316 control points and a texture map of 450 x 512 pixels, both of which have to be encoded.
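All quality figures on this page are given as PSNR. The page does not state whether PSNR is computed on luminance only or how values are averaged over frames, so the minimal sketch below simply assumes 8-bit samples (peak value 255) from a flattened frame; the helper name `psnr` is hypothetical.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two equally long sequences of 8-bit samples."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples of the frame.
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    ref = [52, 55, 61, 66]
    dec = [50, 55, 60, 68]
    print(f"{psnr(ref, dec):.2f} dB")  # 44.61 dB
```

On this scale, a gain of about 3 dB corresponds to roughly halving the mean squared error.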

We first show rate-distortion curves for the proposed coder compared with the H.263 test model, TMN-10. The following abbreviations are used for the two coders:

  • TMN-10: The result produced by the H.263 test model, TMN-10, using Annexes D, F, I, J, and T.
  • MAC: Model-aided coder, i.e., H.263 extended by model-based prediction.

Figure 1: Rate-distortion plot for the sequence Eckehard (left) and Peter (right).

Figure 2: Frame 27 of the Eckehard sequence coded at approximately the same bit-rate using TMN-10 and MAC. Upper image: TMN-10 (34.4 dB PSNR, 1264 bits); lower image: MAC (37.02 dB PSNR, 1170 bits). MPEG movie (1.8 Mb) of the sequence.

Figure 3: Frame 120 of the Peter sequence coded at approximately the same bit-rate using TMN-10 and MAC. Upper image: TMN-10 (33.88 dB PSNR, 1680 bits); lower image: MAC (37.34 dB PSNR, 1682 bits). MPEG movie (1.9 Mb) of the sequence.

Figure 4: Rate-distortion plot for the sequence Akiyo.

Figure 5: Frame 150 of the Akiyo sequence coded at approximately the same bit-rate using TMN-10 and MAC. Upper image: TMN-10 (31.08 dB PSNR, 720 bits); lower image: MAC (33.19 dB PSNR, 725 bits). MPEG movie (1.4 Mb) of the sequence.

Figure 6: Frame 60 of the Claire sequence coded at approximately the same bit-rate using TMN-10 and MAC. Upper image: TMN-10 (33.21 dB PSNR, 752 bits); lower image: MAC (35.05 dB PSNR, 761 bits). Quicktime movie (2.2 Mb) of the sequence.

Figure 7: Frame 99 of the Illumination sequence coded at approximately the same bit-rate using TMN-10 and MAC. Upper image: TMN-10 (31.91 dB PSNR, 1960 bits); lower image: MAC (34.55 dB PSNR, 1868 bits). Quicktime movie (2.1 Mb) of the sequence.

Figure 8: Head-and-shoulder scene with an object occluding the face. Lower left: MAC (36.54 dB PSNR, 12.38 kbit/s); lower right: TMN-10 at the same bit-rate as MAC (33.54 dB PSNR, 12.27 kbit/s); upper left: TMN-10 at the same PSNR as MAC (36.43 dB PSNR, 18.13 kbit/s); upper right: model-based coder (about 0.5 kbit/s). Quicktime movie (5.8 Mb) of the sequence. (Separate Quicktime movie (2.8 Mb) or MPEG movie (2.3 Mb) of the lower two frames. The original frames and the corresponding model frames are shown in this MPEG movie (1.5 Mb).)
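The caption of the occluded head-and-shoulder scene supports two common comparisons: the bit-rate saving of MAC at similar PSNR, and its PSNR gain at similar bit-rate. The short computation below derives both from the caption's numbers. The helper name `rate_saving_percent` is hypothetical, and the result is only approximate because the matched values (36.54 vs. 36.43 dB, 12.38 vs. 12.27 kbit/s) are not exactly equal.

```python
def rate_saving_percent(reference_kbps: float, proposed_kbps: float) -> float:
    """Bit-rate saving of the proposed coder relative to the reference, in percent."""
    return 100.0 * (reference_kbps - proposed_kbps) / reference_kbps


# Values copied from the occlusion example caption.
MAC_KBPS, MAC_PSNR_DB = 12.38, 36.54
TMN_SAME_PSNR_KBPS = 18.13      # TMN-10 at roughly the MAC PSNR (36.43 dB)
TMN_SAME_RATE_PSNR_DB = 33.54   # TMN-10 at roughly the MAC bit-rate (12.27 kbit/s)

print(f"Rate saving at similar PSNR: "
      f"{rate_saving_percent(TMN_SAME_PSNR_KBPS, MAC_KBPS):.1f} %")  # 31.7 %
print(f"PSNR gain at similar rate: "
      f"{MAC_PSNR_DB - TMN_SAME_RATE_PSNR_DB:.2f} dB")               # 3.00 dB
```

Even with part of the face occluded, MAC needs roughly 32 % fewer bits than TMN-10 for about the same quality in this example.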