Understanding transform, quantisation and entropy encoding in H.264 video compression


Understanding the complexities of the transform matrix

In the concluding part of this 3-part review of the H.264 video compression standard, Kate Huber, Peter de Konink and Piet Nieuwets of Siqura discuss the transform, quantisation and entropy encoding - various block-encoding steps following motion estimation.

Describing data in the transform matrix

In contrast to the motion estimation step, the transform phase of the encoding process is relatively similar in H.264 and the earlier MPEG standards. At this point in encoding, all the residual data left over after motion estimation is described using the Discrete Cosine Transform (DCT) or, in the case of H.264, an integer approximation of it.

Initially, the information in each residual macroblock is represented as one 16x16 pixel brightness (luma) block and two 8x8 pixel colour (chroma) blocks. These image data blocks are analysed and replaced by a weighted set of DCT patterns, whose coefficients precisely represent the original information. The transform therefore produces a matrix of coefficients that reflects the amount of data to be encoded: the fewer non-zero values the matrix contains and the smaller they are, the less residual image data there is. Accordingly, this results in a better image with fewer bits.

How transform works is easy to understand if you think about it in terms of modern art. It's quite easy to describe a new painting that is just a canvas covered in solid blue paint. It is, however, much more difficult to tell someone in detail about the intricacies of a Jackson Pollock painting.

Similarly, the DCT is at its most compact when it describes a solid grey block, or a block with little or no residual data. The more coefficients that need to be used, the more residual detail there is.
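The painting analogy can be made concrete with a small sketch. The following Python fragment applies a textbook floating-point 2D DCT to a flat 4x4 block and to a busy one, and counts how many coefficients are needed in each case. The function names and sample values are illustrative assumptions, and the code is not the integer transform that an H.264 encoder actually uses.

```python
import math

def dct_2d(block):
    """Orthonormal 2D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def scale(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def count_significant(coeffs, eps=1e-9):
    """Number of coefficients that are not (numerically) zero."""
    return sum(1 for row in coeffs for c in row if abs(c) > eps)

# Hypothetical residual blocks: a "solid blue canvas" and a "Pollock".
flat_block = [[10] * 4 for _ in range(4)]
detailed_block = [[10, -3, 7, 0],
                  [-8, 12, 1, -5],
                  [4, -9, 6, 2],
                  [0, 5, -11, 3]]

flat_coeffs = dct_2d(flat_block)
detailed_coeffs = dct_2d(detailed_block)

print(count_significant(flat_coeffs))      # a single coefficient suffices
print(count_significant(detailed_coeffs))  # many coefficients are needed
```

The flat block collapses to one coefficient in the top-left corner of the matrix, while the detailed block spreads its information across many coefficients.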

Quantisation and the Q value: Controlling the bit rate

H.264 quantisation also doesn't differ all that much from MPEG-2/4 quantisation. This step in the video encoding process consists of first dividing the transform coefficients by a dynamic Q value, used to manage the size of the bit stream, and then discarding trivial coefficients in a specified value range by reducing them to zero.

The Q value varies in order to control the bit rate

The transform coefficients are initially divided by the Q value, which varies depending on how large the bit rate is allowed to be. A higher Q value results in smaller coefficients and fewer bits, but it also diminishes the quality of the image.

In scalar quantisation, values within a predetermined range around zero are deemed inconsequential and are therefore reduced to zero. This lowers the bit rate without necessarily impacting the perceived quality of the image.
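The two steps described above, division by Q followed by zeroing of small values, can be sketched in a few lines of Python. The function name, the `dead_zone` parameter and the sample coefficients are assumptions made for illustration. A real H.264 encoder derives its step size from a quantisation parameter and works in integer arithmetic, which this sketch leaves out.

```python
def quantise(coeffs, q, dead_zone=0.5):
    """Divide each coefficient by q and zero anything inside the dead zone.

    dead_zone is expressed in units of the scaled coefficient; it is a
    hypothetical parameter used here only to illustrate the idea.
    """
    levels = []
    for row in coeffs:
        quantised_row = []
        for c in row:
            scaled = c / q
            if abs(scaled) < dead_zone:
                quantised_row.append(0)        # deemed inconsequential
            else:
                quantised_row.append(int(round(scaled)))
        levels.append(quantised_row)
    return levels

def count_nonzero(levels):
    return sum(1 for row in levels for v in row if v != 0)

# Hypothetical transform coefficients for one 4x4 block.
coeffs = [[40.0, 9.0, -3.0, 1.0],
          [7.0, -2.5, 1.0, 0.0],
          [-3.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0]]

fine = quantise(coeffs, q=4)                   # low Q: more detail kept
coarse = quantise(coeffs, q=20)                # high Q: fewer bits, less detail
wide = quantise(coeffs, q=4, dead_zone=1.0)    # wider range reduced to zero

print(count_nonzero(fine), count_nonzero(coarse), count_nonzero(wide))
```

Raising Q or widening the dead zone both leave fewer non-zero values to encode, which is how the encoder trades image quality against bit rate.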

Both the transform and quantisation stages depend on an adept motion estimation process. The advancements H.264 encoding makes in motion estimation are what, in the end, lower the residual image data and allow high quality images to be transformed and quantised to coefficients nearing zero. Therefore, improvements in motion estimation are what ultimately allow better video quality at a lower bit rate. However, H.264 encoding includes additional developments that augment the effectiveness of this streaming standard.

Quantisation condenses a value range of insignificant transform coefficients by reducing them to zero

Recognising and reducing data repetition

The next step in block-based encoding is entropy encoding. At this point, data is prepared for transmission in such a way that it can be reconstructed in its entirety by the decoder. This is also known as lossless encoding. Entropy encoding is carried out with the help of a variable-length coder (VLC), which reduces the bit rate by recognising frequently recurring data patterns and replacing them with shorter representations, or codewords.

In MPEG-2/4, the VLC sends every value in the quantised transform matrix to the decoder. H.264 alternatively offers more varied and advanced options in the form of two entropy coders: Context-Adaptive Variable-Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). While CAVLC only compresses the quantised transform coefficients, CABAC also compresses the other data streamed to the decoder.

Variable-length encoders condense recurring data into codewords

The H.264 entropy coders ultimately make streaming redundant data more efficient, even though they increase the processing power requirements. CAVLC and CABAC reduce the bit rate by adapting to repeatedly received data sequences when that is statistically more efficient. So, knowing how and when to implement a particular entropy coder is just another challenge put to H.264 engineers.

A simple example may help to explain how CAVLC works. Suppose that every time you said, "I'd like a cup of coffee", you received one, and so, after a while, you started just saying "I'd like". While this is a very easy way to satiate your coffee craving, should you ever just want a glass of water, you would need to explain yourself without saying "I'd like".

CAVLC works similarly. If the entropy encoder receives recurring data patterns, it replaces them with a short codeword, such as "1". However, other sequences then need to be described without starting with a "1". This can sometimes lead to longer codewords in unique data streams.
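The trade-off in the coffee example, a short codeword for the common case at the cost of longer ones for rare cases, is the basic principle of any variable-length code. The sketch below builds a static Huffman code in Python to show that principle. It is not CAVLC itself, which selects among predefined code tables according to context, and the function names and sample values are assumptions made for illustration.

```python
import heapq
from collections import Counter

def build_prefix_code(symbols):
    """Build a Huffman prefix code: frequent symbols get shorter codewords."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial codeword}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f0, _, low = heapq.heappop(heap)
        f1, _, high = heapq.heappop(heap)
        # Prefix the two least frequent groups with 0 and 1 and merge them.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f0 + f1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:      # prefix-free, so the first match is right
            out.append(reverse[current])
            current = ""
    return out

# Hypothetical quantised values: mostly zeros, as after quantisation.
levels = [0, 0, 1, 0, 0, -1, 0, 5, 0, 0, 1, 0, -1, 0, 1, 0]

code = build_prefix_code(levels)
bits = encode(levels, code)

print(code)
print(len(bits), "bits instead of", 2 * len(levels), "with a fixed 2-bit code")
```

The frequent value 0 receives a one-bit codeword, while the rare value 5 needs a longer one; decoding restores the original sequence exactly, which is what makes the step lossless.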

Blurring block borders in the video encoding process

One problem plaguing MPEG-2/4 encoders is errors in the image data caused by macroblock edges that are incongruous with adjacent blocks. This is not only disruptive to the viewer but it can also hinder motion estimation.

While these block-edge blunders are easily evened out with a deblocking filter, MPEG-2/4 only applies this deblocking filter in the decoding process. Although the deblocking filter more or less erases blocky edges for the viewer, the original distortions still impede motion estimation during encoding because the reference frames retain the block-edge errors. This ultimately reduces the efficiency of residual data recognition and, therefore, it also diminishes the efficiency of the encoder in general.

H.264 provides a solution for both the visual effects of block edges and the implications they can have for motion estimation by applying a deblocking filter in the encoding process, also known as in-loop deblocking. This allows motion estimation to use reconstructed, deblocked frames when searching reference frames rather than the initial frames from the camera, thereby reducing the discovered residual differences. In-loop deblocking thus further facilitates adroit motion estimation in H.264 encoding.
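The idea behind a deblocking filter can be illustrated with a toy one-dimensional sketch in Python. It softens small steps at block boundaries, which are likely to be coding artefacts, and leaves large steps alone, since those are likely to be real edges in the scene. The function name, block size, threshold and smoothing rule are all assumptions; the actual H.264 filter is adaptive and considerably more elaborate.

```python
def deblock_row(row, block_size=4, threshold=10):
    """Soften small steps at block boundaries in one row of pixel values."""
    out = list(row)
    for edge in range(block_size, len(row), block_size):
        p0, q0 = row[edge - 1], row[edge]   # pixels either side of the boundary
        step = q0 - p0
        # Small step: treat as a blocking artefact and pull the pixels together.
        # Large step: assume a genuine image edge and leave it untouched.
        if 0 < abs(step) < threshold:
            delta = int(step / 4)
            out[edge - 1] = p0 + delta
            out[edge] = q0 - delta
    return out

# Two neighbouring 4-pixel blocks with a slight brightness mismatch.
blocky = [100, 100, 100, 100, 108, 108, 108, 108]
# Two neighbouring blocks separated by a genuine edge.
real_edge = [100, 100, 100, 100, 200, 200, 200, 200]

print(deblock_row(blocky))     # the step at the boundary is smoothed
print(deblock_row(real_edge))  # the real edge is preserved
```

In an in-loop arrangement, a filter of this kind would be applied to each reconstructed frame before it is stored as a reference, so the encoder and decoder both predict from the same smoothed picture.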

	Kate Huber Technical Writer Siqura
	Peter de Konink Product Line Manager, Codec/Analytics Siqura
	Piet Nieuwets Senior Hardware Engineer Siqura
