Video compression takes advantage of your eyes

How video compression takes advantage of your eyes

The real world presents an inherently analog, continuous stream of visual information. Because modern electronics operate on discrete values, this analog information has to be encoded digitally. Since humans are the ultimate consumers of any image or video, the solutions in this field are shaped as much by the limitations and peculiarities of the Human Visual System as by the technological state of the art. The Human Visual System is our visual interface with the surrounding world, and a thorough knowledge of it lets us focus our effort on aspects the eye can actually perceive rather than on ones that go unnoticed.

An example of how technology has evolved in accordance with the human eye is the set of color spaces we use to represent colors digitally. The most common color spaces, sRGB, Rec. 709 and Rec. 2020, are all additive color models in which the primary colors red, green and blue are summed together to obtain any other color within the space's gamut. The choice of red, green and blue as primaries is not accidental but follows from the physical characteristics of the Human Visual System. As originally described by the Young–Helmholtz theory, human eyes are trichromatic, which means that they possess three independent channels for conveying color information. Three types of cone cells respond most strongly to yellow, green and violet light, and we use RGB because these cones can be stimulated efficiently with red, green and blue primaries. These cells are usually identified by the letters L, M and S respectively (for long, medium and short wavelengths). The figure below shows how the sensitivity of each cone type varies with wavelength.

Left: in the RGB color space, any color is created by combining different intensities of red, green and blue. Right: the sensitivity of human cone cells to different wavelengths of light. Image source: Wikipedia.

When one examines video and photo standards carefully, it becomes easy to appreciate how design decisions that seem arbitrary at first glance are in fact dictated by how the human eye and nervous system work. Video compression algorithms, video management systems and video formats are no exception: to understand the state of the art in these fields, we also need to understand the internals of the Human Visual System. A gap exists between human perception of reality and physical reality, and modern compression algorithms exploit this gap to dramatically reduce the amount of data that must be transmitted to convey a given piece of information. The following sections analyze the aspects of the Human Visual System that have most profoundly shaped how these fields evolved.

This post is adapted from the background chapters of my master's thesis, in which I used video compression metadata for efficient motion detection with machine learning.

Human Perception of Luminance and Chrominance

In the field of digital video and photography, the most widely used color space is YCbCr. Essentially, YCbCr is a way to encode an RGB color in a form that exploits the characteristics of the Human Visual System to represent it more efficiently.

As the name suggests, the YCbCr color space is composed of three distinct components:

Y: represents the luminance. This is a weighted sum of the R, G and B components and resembles a black-and-white version of the image being encoded. In ITU-R BT.601, Y is calculated with the formula Y = 0.299 × R + 0.587 × G + 0.114 × B. The weight assigned to each color reflects how sensitive the Human Visual System is to that color.

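Cb: represents the difference between the blue component and the luminance. When this difference is positive (Cb above 128), the color leans towards blue; when it is negative (Cb below 128), it leans towards yellow. Cb is computed as: Cb = 0.564 × (B - Y) + 128, where the offset of 128 centers the value in the 8-bit range 0–255.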

Cr: represents the difference between the red component and the luminance; values above 128 mean the color leans towards red. Cr is computed as: Cr = 0.713 × (R - Y) + 128.
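The three formulas translate directly into code. The following is a minimal Python sketch of a per-pixel conversion, assuming 8-bit inputs in the range 0–255 and rounding the results to integers with clamping; the function name `rgb_to_ycbcr` and the sample colors are hypothetical, and real pipelines typically process whole frames and may use limited-range variants of these equations.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Convert one 8-bit RGB pixel (components in 0-255) to 8-bit YCbCr."""
    # Luminance: weighted sum reflecting the eye's sensitivity to each primary.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma: scaled color differences, shifted by 128 so "no color" sits mid-range.
    cb = 0.564 * (b - y) + 128
    cr = 0.713 * (r - y) + 128

    def to_8bit(v: float) -> int:
        # Round to the nearest integer and clamp to the valid 8-bit range.
        return max(0, min(255, round(v)))

    return to_8bit(y), to_8bit(cb), to_8bit(cr)


if __name__ == "__main__":
    samples = {
        "white": (255, 255, 255),
        "red": (255, 0, 0),
        "blue": (0, 0, 255),
        "yellow": (255, 255, 0),
    }
    for name, rgb in samples.items():
        print(f"{name:>6}: RGB={rgb} -> YCbCr={rgb_to_ycbcr(*rgb)}")
```

Running it shows that white ends up at Cb = Cr = 128 (no color), pure blue pushes Cb to the top of its range and yellow pushes it to the bottom, matching the descriptions of the chroma channels above.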

Left: the Y, Cb and Cr channels of the top image. Right: the R, G and B channels of the same image. Image adapted from gnome.org.

The key difference between RGB and YCbCr is that in RGB, luminance and chrominance information are both distributed across all three channels, while YCbCr separates them: luminance is conveyed entirely in the Y channel and chrominance in the Cb and Cr channels. This, once again, is dictated by the way the Human Visual System perceives the physical world. In particular, chrominance and luminance are perceived by different kinds of photoreceptors, and those responsible for chrominance are scarcer than those responsible for luminance. As a consequence, humans are more sensitive to variations in brightness than to variations in color.
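A quick numerical check makes this separation concrete. The minimal sketch below (the helper names `rgb_to_ycbcr`, `brighten` and `change` and the sample color are hypothetical) brightens a color by adding the same amount of light to R, G and B, then compares how each representation changes, using the unrounded conversion formulas given earlier.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Unrounded RGB -> YCbCr conversion using the coefficients given earlier."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y) + 128
    cr = 0.713 * (r - y) + 128
    return y, cb, cr


def brighten(rgb: tuple[float, float, float], d: float) -> tuple[float, float, float]:
    """Brighten a color by adding the same amount of light to each primary."""
    r, g, b = rgb
    return r + d, g + d, b + d


def change(after, before) -> list[float]:
    """Per-channel difference, rounded for display ('+ 0.0' turns -0.0 into 0.0)."""
    return [round(a - b, 6) + 0.0 for a, b in zip(after, before)]


if __name__ == "__main__":
    orange = (200.0, 120.0, 40.0)  # hypothetical sample color
    lighter = brighten(orange, 30.0)

    print("RGB change:  ", change(lighter, orange))
    print("YCbCr change:", change(rgb_to_ycbcr(*lighter), rgb_to_ycbcr(*orange)))
```

Because the three Y weights sum to 1, Y rises by exactly the added amount, while B - Y and R - Y, and therefore Cb and Cr, stay the same. In RGB, the same change in brightness is spread across all three channels.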

Under these circumstances, there is little benefit in encoding as much information for chrominance as for luminance, as RGB does. Empirically, the image above shows how in RGB the visual information is spread evenly across all three channels, while in YCbCr,...
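A common way to act on this is chroma subsampling: storing Cb and Cr at a lower resolution than Y. The sketch below assumes the 4:2:0 scheme, in which one Cb and one Cr sample are kept for every 2×2 block of pixels, and uses plain averaging as the downsampling filter; the function names and sample values are hypothetical, and real pipelines may use different filters and chroma sample positions.

```python
def subsample_420(plane: list[list[float]]) -> list[list[float]]:
    """Average each 2x2 block of a chroma plane (width and height assumed even)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def samples_per_frame(width: int, height: int) -> dict[str, int]:
    """Count stored samples for full-resolution RGB versus YCbCr 4:2:0."""
    rgb = 3 * width * height
    # Full-resolution Y plus two quarter-resolution chroma planes.
    ycbcr_420 = width * height + 2 * (width // 2) * (height // 2)
    return {"RGB": rgb, "YCbCr 4:2:0": ycbcr_420}


if __name__ == "__main__":
    cb = [[100, 102, 130, 132],
          [104, 106, 134, 136]]
    print(subsample_420(cb))               # → [[103.0, 133.0]]
    print(samples_per_frame(1920, 1080))   # → {'RGB': 6220800, 'YCbCr 4:2:0': 3110400}
```

For a 1920×1080 frame, 4:2:0 stores half as many samples as full-resolution RGB, while the luminance channel, to which the eye is most sensitive, keeps its full resolution.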
