Lecture 16 - Introduction to Image and Video Coding [Lecture Notes]
This article contains the lecture notes for Lecture 16: Introduction to Image and Video Coding.
Intro
In previous lectures we have covered various speech coding techniques:
- Waveform-based approach: PCM, APCM, DPCM, etc.
- In which the speech signal is treated as a 1-dimensional signal that varies with respect to time
- Parametric approach: LPC Analysis
How does image coding work?
We treat an image as a 2-dimensional array, and the signal is represented as \(s(n_1, n_2)\), where \(n_1\) and \(n_2\) are pixel indices.
Instead of varying with respect to time, an image varies with respect to space.
\(s(n_1, n_2)\) is the sampled form of the analog 2D waveform.
Then how about videos?
A video can be viewed as a sequence of images, which adds a 3rd dimension: time.
It therefore also varies with respect to time.
We could denote this as \(s(n_1, n_2, k)\), where \(k\) indicates the frame number.
The fundamental bit rate in the case of images and video is much higher than that of a speech signal.
Therefore, image and video must be compressed to a very significant extent.
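To get a feel for the gap, here is a rough back-of-the-envelope comparison; the resolution, frame rate, and bit depths below are illustrative assumptions, not values from the lecture.

```python
# Rough comparison of raw (uncompressed) bit rates.
# All parameter values here are illustrative assumptions.

speech_rate = 8_000 * 8              # 8 kHz sampling, 8 bits/sample (PCM speech)
image_bits  = 1920 * 1080 * 3 * 8    # one RGB image, 8 bits per colour channel
video_rate  = image_bits * 30        # 30 frames per second, uncompressed video

print(f"speech:    {speech_rate / 1e3:.0f} kbit/s")   # 64 kbit/s
print(f"one image: {image_bits / 1e6:.1f} Mbit")      # ~49.8 Mbit
print(f"raw video: {video_rate / 1e9:.2f} Gbit/s")    # ~1.49 Gbit/s
```

Even with generous assumptions for speech, the raw video rate is several orders of magnitude larger, which is why aggressive compression is unavoidable.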
Image Coding
The redundancy in almost all types of natural images is very high.
Basically there are two types of redundancy in images:
- Statistical redundancy: The redundancy that exists in space between image samples in a close neighborhood
- Psychovisual redundancy: Information that is relatively unimportant to the human visual system
How do we make use of these types of redundancy?
First suppose we have a patch of an image, represented by numbers (pixel values).
We could scan these numbers from the top left to the bottom right, which is called row-scan ordering.
This gives us a one-dimensional sequence of pixels, in which it should be possible to exploit the redundancy.
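A minimal sketch of row-scan ordering on a small made-up 4x4 patch (the pixel values are purely illustrative):

```python
import numpy as np

# A small made-up image patch s(n1, n2); values are illustrative only.
patch = np.array([
    [52, 55, 61, 66],
    [53, 56, 62, 68],
    [54, 57, 63, 70],
    [55, 58, 64, 72],
])

# Row-scan ordering: read the samples row by row, from the top left to the
# bottom right, producing a 1-D sequence of pixels.
sequence = patch.flatten()   # row-major order is NumPy's default
print(sequence)
# [52 55 61 66 53 56 62 68 54 57 63 70 55 58 64 72]

# Neighbouring values are close to each other, which is exactly the
# statistical redundancy a coder can exploit.
```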
A Generic Block Diagram of Image Coding
              ---------------     -------------     -------------------
s(n1, n2) --> | Transformer | --> | Quantizer | --> | Entropy Encoder |
              ---------------     -------------     -------------------
                 Lossless            Lossy              Lossless
               [Statistical]     [Psychovisual]
Every symbol at the quantizer output will be associated with some probability of occurrence, and based on these probabilities we can compute the entropy.
According to Shannon's entropy coding theorem, the minimum attainable average bit rate is the entropy of the source, so we cannot compress below this entropy value.
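As a small worked sketch, the entropy \(H = -\sum_i p_i \log_2 p_i\) of the quantizer-output symbols gives this lower bound in bits per symbol; the probabilities below are made up for illustration.

```python
import math

# Made-up probabilities for the symbols at the quantizer output
# (illustrative values, not from the lecture).
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Entropy H = -sum(p * log2(p)): the minimum average number of bits per
# symbol that any lossless code can achieve for this source.
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"entropy = {entropy:.3f} bits/symbol")   # 1.750 bits/symbol
```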
The block diagram shown earlier is our encoder, and the decoder will do the exact reverse of it:
    -------------------     ---------------     -----------------------
--> | Entropy Decoder | --> | Dequantizer | --> | Inverse Transformer | --> s(n1, n2)
    -------------------     ---------------     -----------------------
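A minimal sketch of why the quantizer is the only lossy stage in this chain, assuming a simple uniform quantizer (the coefficient values and step size are made up for illustration):

```python
import numpy as np

# Illustrative transform coefficients (made-up values).
coeffs = np.array([31.7, -4.2, 2.9, 0.4, -0.1])

step = 4.0   # assumed quantizer step size

# Encoder side: uniform quantization maps each coefficient to an integer
# symbol index. This rounding is the only irreversible (lossy) operation.
symbols = np.round(coeffs / step).astype(int)   # [ 8 -1  1  0  0]

# Decoder side: dequantization maps the indices back to representative values.
reconstructed = symbols * step                  # [32. -4.  4.  0.  0.]

print("quantization error:", coeffs - reconstructed)
# The transform and the entropy coder, by contrast, are exactly invertible.
```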
Lossless Compression & Lossy Compression
Some examples of lossless compression:
- Huffman Coding: Choose a representation for each symbol, resulting in a prefix code
- Arithmetic Coding: Encode the entire sequence as a single number in the interval [0, 1)
- Lempel-Ziv Coding
An example of arithmetic coding:
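Below is a minimal sketch of the interval-narrowing idea, assuming a three-symbol alphabet with made-up probabilities; it omits bit-level renormalization and termination handling, so it is illustrative rather than a complete codec.

```python
def arithmetic_encode(sequence, probs):
    """Return a number inside the final interval that identifies `sequence`."""
    # Build cumulative ranges, e.g. {'a': (0.0, 0.6), 'b': (0.6, 0.9), ...}
    ranges, cumulative = {}, 0.0
    for symbol, p in probs.items():
        ranges[symbol] = (cumulative, cumulative + p)
        cumulative += p

    low, high = 0.0, 1.0
    for symbol in sequence:
        span = high - low
        sym_low, sym_high = ranges[symbol]
        # Narrow the current interval to the sub-range of this symbol.
        high = low + span * sym_high
        low = low + span * sym_low

    # Any number in [low, high) decodes back to the sequence (given its length).
    return (low + high) / 2


probs = {"a": 0.6, "b": 0.3, "c": 0.1}    # illustrative probabilities
print(arithmetic_encode("aab", probs))    # 0.27, inside the final interval [0.216, 0.324)
```

More probable symbols shrink the interval less, so sequences of likely symbols end up in wider intervals that need fewer digits (bits) to specify.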
Demerits of using only lossless compression:
- Lossless techniques can achieve only a limited amount of compression
- In practice, a modest degradation in image quality is hard for the human visual system to detect
- Therefore, accepting a lossy image compression scheme leads to much higher efficiency
Transformation Block
In order to understand what the lossy element is, we need to dig deeper into the transformation block.
What the transformation process does is exploit the correlation that exists in the signal; this is often called energy packing.
The transformer should offer energy compaction, meaning that most of the signal energy is concentrated in only a few transform (spectral) coefficients (see the sketch after the list of transforms below).
Two requirements for transformation:
- It must be energy preserving
- It must be an orthogonal transform
Some popular transformations:
- Karhunen-Loève Transform
- Discrete Fourier Transform
- Discrete Cosine Transform
- Discrete Wavelet Transform
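As a minimal sketch of these ideas using the DCT (one of the transforms listed above), the example below builds an orthonormal DCT-II matrix by hand and applies it to a synthetic, smoothly varying 8x8 block; the block values are made up, and the printed percentage is just what this toy example produces.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2)
    return C * np.sqrt(2.0 / n)

# A synthetic, smoothly varying 8x8 block (illustrative values only).
n1, n2 = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
block = (100 + 10 * n1 + 5 * n2).astype(float)

C = dct_matrix(8)
coeffs = C @ block @ C.T   # 2-D DCT of the block

# Orthogonal and energy preserving: total energy is unchanged by the transform.
print(np.allclose((block ** 2).sum(), (coeffs ** 2).sum()))   # True

# Energy compaction: a handful of low-frequency coefficients carry almost all
# of the energy, so the rest can be quantized coarsely or discarded.
energy = coeffs ** 2
top4 = np.sort(energy, axis=None)[-4:].sum()
print(f"top 4 of 64 coefficients hold {100 * top4 / energy.sum():.2f}% of the energy")
```

Because the block varies smoothly, nearly all of its energy collapses into the DC and first few low-frequency coefficients, which is exactly the energy compaction that makes coarse quantization of the remaining coefficients visually acceptable.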