In conventional encoder control algorithms, the distortion between an original picture $s$ and the reconstructed picture $s'$ is measured by the sum of squared errors $D_{SSE}(s,s')$ or its normalized logarithm, the PSNR. However, it is well known that the PSNR in general does not correlate well with subjective judgments of image quality. We developed an encoder control algorithm that is based on a weighted distortion measure in which the weights vary spatio-temporally and reflect the local visual error sensitivity of the underlying content. To define the weighted distortion measure $D_{WSSE}$, we partition a video frame into blocks $B_k$ and define

$$ D_{WSSE}(s,s')=\sum_{k}w_k(s)\cdot D_{SSE,k}(s,s').$$

Here, $D_{SSE,k}$ is the sum of squared errors on $B_k$ and $w_k(s)$ are local weights. The less activity, measured by a high-pass filter in the spatial or temporal direction, is present on $B_k$, the larger the weights $w_k(s)$ become. This reflects the well-known fact that, due to reduced perceptual masking capabilities, low-frequency content is subjectively more sensitive to reconstruction errors than high-frequency content.
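The definition of $D_{WSSE}$ can be illustrated with a minimal sketch. Note that the concrete choices below are assumptions made for illustration only and are not taken from the text above: the activity measure (mean absolute response of a 4-neighbour Laplacian high-pass filter, spatial only), the weight formula $(a_{ref}/a_k)^{\beta}$, and the parameters `block`, `beta`, and `a_min` are all hypothetical. The only property the sketch is meant to show is that blocks with less high-pass activity receive larger weights.

```python
import numpy as np

def block_activity(block):
    """Hypothetical activity measure: mean absolute response of a
    4-neighbour Laplacian high-pass filter on the block interior."""
    b = block.astype(np.float64)
    hp = (4.0 * b[1:-1, 1:-1]
          - b[:-2, 1:-1] - b[2:, 1:-1]
          - b[1:-1, :-2] - b[1:-1, 2:])
    return np.abs(hp).mean()

def weighted_sse(s, s_rec, block=8, beta=0.5, a_min=1.0):
    """Return (D_WSSE, weights) for a grayscale picture s and its
    reconstruction s_rec. Partial blocks at the borders are ignored."""
    s = s.astype(np.float64)
    s_rec = s_rec.astype(np.float64)
    height, width = s.shape
    activities, sses = [], []
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            orig = s[y:y + block, x:x + block]
            rec = s_rec[y:y + block, x:x + block]
            activities.append(block_activity(orig))    # depends on s only
            sses.append(np.sum((orig - rec) ** 2))     # D_SSE,k
    # Clamp to avoid division by zero on perfectly flat blocks.
    activities = np.maximum(np.array(activities), a_min)
    sses = np.array(sses)
    a_ref = activities.mean()                  # assumed reference activity
    weights = (a_ref / activities) ** beta     # low activity -> large w_k
    return float(np.sum(weights * sses)), weights
```

With all weights equal to one, the function reduces to the ordinary sum of squared errors; the weights themselves depend only on the original picture $s$, matching the notation $w_k(s)$.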

The well-established block-based encoder control for the unweighted sum of squared errors can still be used for our perceptual error measure. The only difference is that the Lagrangian parameter varies spatio-temporally, depending on the weights $w_k$. More precisely, if $\lambda$ is a fixed Lagrangian parameter that defines an operational point on the rate-distortion curve for the error measure $D_{WSSE}$ and if $R_k$ denotes the rate on the block $B_k$, then, under some simplifying assumptions, on each block $B_k$ the encoder needs to minimize the well-known term

$$D_{SSE,k}(s,s')+\lambda_k\cdot R_k,\quad\lambda_k=\frac{\lambda}{w_k}.$$
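One way to see where the block-wise parameter comes from: if the blocks are treated as independent, minimizing $D_{WSSE}+\lambda\sum_k R_k=\sum_k w_k\,(D_{SSE,k}+\tfrac{\lambda}{w_k}R_k)$ amounts to minimizing each bracketed term separately. The following sketch shows the resulting per-block decision. The candidate tuples, the function name `choose_mode`, and the numbers in the usage example are hypothetical and only serve to illustrate the effect of the weight.

```python
def choose_mode(candidates, lam, w_k):
    """Pick the candidate minimizing D_SSE,k + (lam / w_k) * R_k.

    candidates: iterable of (name, sse, rate) tuples for one block
                (hypothetical representation of coding options).
    lam:        global Lagrangian parameter (lambda).
    w_k:        perceptual weight of the block, must be positive.
    """
    lam_k = lam / w_k  # block-wise Lagrangian parameter
    return min(candidates, key=lambda c: c[1] + lam_k * c[2])

# Two illustrative coding options for the same block.
options = [("coarse", 100.0, 10.0), ("fine", 20.0, 50.0)]

# Neutral block (w_k = 1): the cheaper option is selected.
print(choose_mode(options, lam=3.0, w_k=1.0)[0])
# Perceptually sensitive block (w_k = 4): rate is penalized less,
# so the more accurate option is selected.
print(choose_mode(options, lam=3.0, w_k=4.0)[0])
```

A larger weight lowers $\lambda_k$, so the encoder spends more bits on blocks where errors are more visible, while the familiar unweighted cost function remains unchanged.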

The effect of our perceptually optimized encoder control is illustrated below. In the middle image, which results from a conventional encoder control, highly visible transmission errors can be observed in the water in the top right part. With our perceptually optimized encoder control, such errors are no longer visible, as can be seen in the bottom picture.

## References

- C. R. Helmrich, S. Bosse, M. Siekmann, H. Schwarz, D. Marpe, and T. Wiegand, “Perceptually Optimized Bit Allocation and Associated Distortion Measure for Block-Based Image or Video Coding,” in Proc. IEEE Data Compression Conf. (DCC), Snowbird, Mar. 2019.
- J. Erfurt, C. R. Helmrich, S. Bosse, H. Schwarz, D. Marpe, and T. Wiegand, “A Study of the Perceptually Weighted Peak Signal-to-Noise Ratio (WPSNR) for Image Compression,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Taipei, Sep. 2019.