
Data Compression

Source Coding

  • Shannon's source coding (信源編碼 / 符號源編碼)
  • maps information to symbols (bits, characters, ...)

Entropy

Self-Information

  • derived from three assumptions
    1. $I(p) \ge 0$
    2. $I(p_1 p_2) = I(p_1) + I(p_2)$
    3. $I(p)$ is continuous in $p$

thus, this gives $I(p) = -\log(p)$

  • proof at p.25

Entropy

Given,

$S = \{ s_i \mid p(s_i) = p_i \}$

then,

$$H_r(S) = \sum_i p_i I(s_i) = -\sum_i p_i \log_r(p_i)$$
  • note that $r$ is the base of the log

Gibbs' Inequality

$$-\sum_i p_i \log(p_i) \le -\sum_i p_i \log(q_i)$$
  • i.e., the lower bound of the cross entropy between $p$ and $q$ is $H(p)$.
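A quick way to see this (a standard argument, using $\ln x \le x - 1$):

$$\sum_i p_i \ln\frac{q_i}{p_i} \le \sum_i p_i\left(\frac{q_i}{p_i} - 1\right) = \sum_i q_i - \sum_i p_i \le 0$$

so $-\sum_i p_i \log q_i \ge -\sum_i p_i \log p_i$ in any base, with equality iff $q_i = p_i$ for all $i$.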

Unique Decodable & Instantaneous Code

  • p32

Unique Decodable

  • example of a code that is not uniquely decodable:
$(s_1, s_2, s_3, s_4) = (0, 01, 11, 00)$

then the message 0011 could be decoded as

$$0011 = \begin{cases} s_4\, s_3 \\ s_1\, s_1\, s_3 \end{cases}$$

Instantaneous Code

  • instantaneous iff no codeword is a prefix of another codeword
  • instantaneous ⇒ unique
  • the lengths of codes have to follow Kraft Inequality

Kraft Inequality

  • If a code is instantaneous, every codeword has to end at a leaf of the code tree.
    Thus, considering the binary case first, we cannot build a binary tree with too many short paths (short codeword lengths).
    Therefore, in order for such a binary tree to exist, the lengths must satisfy
$$\sum_i \left(\tfrac{1}{2}\right)^{L_i} \le 1$$
  • intuitive view – taking either branch of the binary tree has prob. 0.5, so the sum of the probabilities of all leaves is 1 (see the sketch below).

  • intuitively, the lower bound of the average code length is the source entropy (proof at p36):

$$H_r(S) \le L_{avg}$$
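A minimal sketch (my own, not from the slides) that checks the Kraft sum for the example codes in this note and compares the average length with the entropy:

```python
import math

def kraft_sum(lengths, r=2):
    """Left-hand side of the Kraft inequality for the given codeword lengths."""
    return sum(r ** -L for L in lengths)

def entropy(probs, r=2):
    return -sum(p * math.log(p, r) for p in probs if p > 0)

# the non-uniquely-decodable example code above: (0, 01, 11, 00)
print(kraft_sum([1, 2, 2, 2]))   # 1.25 > 1 -> no instantaneous code with these lengths
# a prefix code: (0, 10, 110, 111)
print(kraft_sum([1, 2, 3, 3]))   # 1.0 <= 1 -> instantaneous code exists

probs = [0.5, 0.25, 0.125, 0.125]
avg_len = sum(p * L for p, L in zip(probs, [1, 2, 3, 3]))
print(entropy(probs), avg_len)   # H(S) = 1.75 <= L_avg = 1.75
```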

Shannon-Fano Coding

  • Since we know the lower bound of the average code length is the entropy, we can choose each codeword length (known as the Shannon-Fano length) as
$$-\log(p_i) \le \ell_i < -\log(p_i) + 1 \;\Rightarrow\; \ell_i = \lceil -\log(p_i) \rceil$$
  • drawback example (see the sketch below)
    Given $S = \{s_1, s_2\}$, and $k$ is large:
$$p_1 = 2^{-k},\; p_2 = 1 - p_1 \;\Rightarrow\; \ell_1 = k,\; \ell_2 = 1$$
  • in Huffman coding, both $\ell_1 = \ell_2 = 1$; but since $p_1$ is small when $k$ is large, this drawback isn't very critical.
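A tiny sketch of the drawback above with concrete numbers (my own toy values; $k = 20$ assumed):

```python
import math

def shannon_fano_lengths(probs):
    """Shannon-Fano codeword lengths: l_i = ceil(-log2 p_i)."""
    return [math.ceil(-math.log2(p)) for p in probs]

k = 20
p1 = 2 ** -k
probs = [p1, 1 - p1]
lengths = shannon_fano_lengths(probs)            # [20, 1]
avg = sum(p * L for p, L in zip(probs, lengths))
H = -sum(p * math.log2(p) for p in probs)
print(lengths, avg, H)   # the average length barely exceeds H because p1 is tiny
```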

Extension Code

  • quick concept – if we have a source $S$, we can define a new source $T = S^n$ (then the # of symbols in $T$ is $|S|^n$)
  • $H(T) = nH(S)$; denoting $L_n$ as the average length of the $n$-th-order S.F. code, then
$$H(T) \le L_n < H(T) + 1 \;\Rightarrow\; H(S) \le \frac{L_n}{n} < H(S) + \frac{1}{n}$$

thus when $n \to \infty$, $\frac{L_n}{n} \to H(S)$,
aka Shannon's noiseless coding theorem.
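A brute-force sketch of this convergence (my own toy source with $p = \{0.9, 0.1\}$):

```python
import math
from itertools import product

def extension_rate(probs, n):
    """Average Shannon-Fano length per source symbol for the n-th extension T = S^n."""
    total = 0.0
    for block in product(probs, repeat=n):        # all |S|^n blocks
        p = math.prod(block)
        total += p * math.ceil(-math.log2(p))
    return total / n

probs = [0.9, 0.1]
H = -sum(p * math.log2(p) for p in probs)         # ~0.469 bits/symbol
for n in (1, 2, 4, 8):
    print(n, extension_rate(probs, n))            # approaches H(S), bounded by H(S) + 1/n
```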

Adaptive Huffman

  1. start with one EOF and one ESC node (there is always exactly one of each in the Huffman tree).
  2. EOF – sent at the end of the file.
  3. ESC – sent when a new symbol is added to the tree (followed by the ASCII code of that symbol so that the decoder knows what symbol to add).

JBIG & JBIG2

  • codes binary data directly ($S \in \{0, 1\}$)

  • JBIG – high-order adaptive arithmetic coding (models $P(s_t \mid s_{(t-n):(t-1)})$).

    • since the symbols are binary, the high-order coding table has only $2^n$ entries, much smaller than a high-order Huffman table.
  • JBIG2 – defines the decoding protocol only; the encoder side can be any algorithm, even a lossy one.

LZ

LZ77

  • sliding window and look-ahead buffer
  • find the phrase in the window that matches the text in the look-ahead buffer, and encode it
  • send (displacement, length, next_char)
  • in the no-match case, send (0, 0, next_char)
  • pros
    • fast decode
  • cons
    • slow encode
      • O(n) (n is the window size)
      • O(m) (m is the look-ahead buffer size)
    • worst case is when there is no match
    • a larger window size is preferred but costs too much
      • time efficiency ↓
      • compressed phrase size ↑
    • loses memory (after the window is full of phrases, some of them have to be removed)
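A brute-force sketch of the (displacement, length, next_char) scheme above (window and look-ahead sizes are arbitrary choices, not from the slides):

```python
def lz77_encode(data, window=255, lookahead=15):
    """Emit (displacement, length, next_char) triples; (0, 0, c) when there is no match."""
    out, i = [], 0
    while i < len(data):
        best_len, best_disp = 0, 0
        for j in range(max(0, i - window), i):     # search the sliding window
            length = 0
            while (length < lookahead and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_disp = length, i - j
        out.append((best_disp, best_len, data[i + best_len]))
        i += best_len + 1
    return out

print(lz77_encode("abcabcabcd"))   # [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'c'), (3, 6, 'd')]
```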

LZSS

  • improving version of LZ77
  • send a 1-bit flag indicating whether there is a match, instead of sending (0, 0, next_char)
  • circular queue with WINDOW_SIZE = $2^n$
  • binary search tree for storing phrases (actually storing pointers to particular positions of the window).
    • each node stores a fixed-length phrase
    • e.g. ("LZSS is better than LZ77") with fixed length 5 (search by comparing characters in order)
graph TD
root(LZSS^)
l(^is^b)
ll(^bett)
lr(is^be)
r(ZSS^i)
rl(SS^is)
rr( )
rll(S^is^)
rlr(tter^)
root --- l
root --- r
l --- ll
l --- lr
r --- rl
r --- rr
rl --- rll
rl --- rlr

LZ78

  • create a new phrase in the dictionary, on both the encoder side and the decoder side, whenever there is no match
    1. initially, there is only the null string in the dictionary.
    2. thus whenever a new phrase of length $n$ is created, a phrase of length $n-1$ must already exist.
    3. therefore, a multiway search tree is used to store the phrases

graph TD
r('\0' <br> 0)
p1("'D' <br> 1")
p2("'A' <br> 2")
p3("'^' <br> 3")
p4("'A' <br> 4")
p5("'^' <br> 5")
p6("'D' <br> 6")
p7("'Y' <br> 7")
p8("'^' <br> 8")
p9("'O' <br> 9")

r --- p8
r --- p2
r --- p1
p1 --- p3
p1 --- p4
p1 --- p7
p4 --- p5
p4 --- p6
p6 --- p9
  • in this case p1 = D, p3 = D^, p5 = DA^, and so on...

  • example at p119
    • thus the encoder only has to send the phrase code (phrase id), without the phrase length.
      • i.e. (phrase_id, next_char)
  • cons
    • slow decoding (the decoder has to maintain the dictionary tree)
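A minimal sketch of the (phrase_id, next_char) scheme (my own input string, not the p119 example):

```python
def lz78_encode(data):
    """Emit (phrase_id, next_char) pairs; phrase 0 is the null string."""
    phrases = {"": 0}                  # dictionary, rebuilt identically by the decoder
    out, buf = [], ""
    for ch in data:
        if buf + ch in phrases:        # keep extending the current match
            buf += ch
        else:
            out.append((phrases[buf], ch))
            phrases[buf + ch] = len(phrases)
            buf = ""
    if buf:                            # flush a trailing match (no next_char)
        out.append((phrases[buf], ""))
    return out

print(lz78_encode("DAD ADAY ADO"))     # [(0, 'D'), (0, 'A'), (1, ' '), (2, 'D'), ...]
```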

LZW

  • improved version of LZ78
    • every single character is defined as a phrase up front, so only a phrase_id has to be sent (no next_char)
  • examples: GIF, TIFF, ...
  • algorithm p122
    • encoder:
      # runnable Python version of the p122 pseudocode
      def lzw_encode(data):
          # every single character is already a phrase (here: 8-bit ASCII)
          phrases = {chr(c): c for c in range(256)}
          out, buf = [], data[0]
          for ch in data[1:]:
              if buf + ch in phrases:              # keep growing the current match
                  buf += ch
              else:
                  out.append(phrases[buf])         # send phrase_id only
                  phrases[buf + ch] = len(phrases) # new phrase = match + next char
                  buf = ch
          out.append(phrases[buf])                 # flush the last match
          return out
      
    • decoder
      # rebuilds the same dictionary, one step behind the encoder
      def lzw_decode(codes):
          phrases = {c: chr(c) for c in range(256)}
          last = phrases[codes[0]]
          out = [last]
          for code in codes[1:]:
              if code in phrases:
                  cur = phrases[code]
              else:                                  # code not in the dictionary yet:
                  cur = last + last[0]               # the KwKwK special case
              out.append(cur)
              phrases[len(phrases)] = last + cur[0]  # add the phrase the encoder added
              last = cur
          return "".join(out)
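    • a quick roundtrip check of the two sketches above (my own test string):

      codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
      assert lzw_decode(codes) == "TOBEORNOTTOBEORTOBEORNOT"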
      

Lossy

  • RMSE (Root Mean Square Error)
  • SNR (Signal-to-Noise Ratio)

    $$\mathrm{SNR} = \frac{E[S^2]}{E[(X-\mu)^2]} = \frac{E[S^2]}{\sigma_r^2}, \qquad \mathrm{SNR}_{dB} = 10 \log_{10} \mathrm{SNR}$$
    • in the 2D media case
$$\mathrm{SNR} = \frac{Q^2}{\sigma_r^2}$$

in which $Q = 255$ in the 8-bit case.
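A small numpy sketch of these metrics (helper names are mine; the mean squared error is used as $\sigma_r^2$):

```python
import numpy as np

def rmse(x, x_hat):
    return float(np.sqrt(np.mean((x - x_hat) ** 2)))

def snr_db(x, x_hat):
    """SNR = signal power over error power, in dB."""
    return 10 * np.log10(np.mean(x ** 2) / np.mean((x - x_hat) ** 2))

def snr_db_2d(x, x_hat, q=255):
    """2D-media form: Q^2 over the error power (Q = 255 for 8-bit images)."""
    return 10 * np.log10(q ** 2 / np.mean((x - x_hat) ** 2))
```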

DM

  • Delta Modulation
  • Adaptive DM (ADM)
    • adapts the magnitude of the step size $\Delta$

DPCM

  • Differential Pulse Code Modulation

Predictor Optimization

  • objective
$$\hat{x}_m = \arg\min_{\hat{x}_m} \sigma_e^2 = \arg\min_{\hat{x}_m} E[(x_m - \hat{x}_m)^2]$$

in which,

$$\hat{x}_m = \sum_{i \in [0,m)} \alpha_i x_i$$

then, solving by differentiation (setting $\partial \sigma_e^2 / \partial \alpha_i = 0$), we have

$$E[(x_m - \hat{x}_m)\,x_i] = 0$$

thus

$$E[x_m x_i] = E[\hat{x}_m x_i] \;\Rightarrow\; R_{mi} = E[\hat{x}_m x_i]$$

and further, multiplying the orthogonality condition by $\alpha_i$ and summing over $i$ gives $E[(x_m - \hat{x}_m)\hat{x}_m] = 0$, i.e. $E[x_m \hat{x}_m] = E[\hat{x}_m^2]$, thus

$$\sigma_e^2 = E[(x_m - \hat{x}_m)^2] = E[(x_m - \hat{x}_m)\,x_m] = E[x_m^2] - E[\hat{x}_m^2] < E[x_m^2]$$
  • p141
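A numpy sketch of solving the resulting normal equations $\sum_j \alpha_j E[x_i x_j] = E[x_m x_i]$ from sample data (toy AR(1) signal and helper names are mine):

```python
import numpy as np

def optimal_predictor(samples, order):
    """Least-squares coefficients for predicting x_m from the previous `order` samples."""
    X = np.array([samples[i:i + order] for i in range(len(samples) - order)])
    y = np.array(samples[order:])
    # sample version of the normal equations: (X^T X) alpha = X^T y
    alpha, *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha

rng = np.random.default_rng(0)
x = [0.0]
for _ in range(999):                    # AR(1)-like signal with coefficient 0.9
    x.append(0.9 * x[-1] + rng.normal(scale=0.1))
print(optimal_predictor(x, order=2))    # coefficient on the most recent sample ~ 0.9
```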

Quantizer Optimization

  • objective ($N$ is the number of quantizer levels)
$$D = \sum_{i \in [0,N)} \int_{d_i}^{d_{i+1}} p(e)\,(e - r_i)^2 \, de$$
  • p143
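A sketch of a Lloyd-Max style iteration that minimizes this $D$ from samples of $e$ (my own illustration; the derivation on p143 may differ in details):

```python
import numpy as np

def lloyd_max(samples, n_levels, iters=50):
    """Alternately refine decision boundaries d_i and reconstruction levels r_i."""
    r = np.quantile(samples, np.linspace(0, 1, n_levels + 2)[1:-1])   # initial levels
    for _ in range(iters):
        d = (r[:-1] + r[1:]) / 2                  # boundaries: midpoints between levels
        idx = np.digitize(samples, d)             # assign each sample to a cell
        r = np.array([samples[idx == i].mean() if np.any(idx == i) else r[i]
                      for i in range(n_levels)])  # levels: centroid of each cell
    return d, r

errors = np.random.default_rng(0).laplace(size=10_000)   # DPCM errors are Laplacian-like
print(lloyd_max(errors, n_levels=4))
```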

Adaptive DPCM (ADPCM)

  • mapping quantizer (映射量化器)
    • $e = x_m - \hat{x}_m$
    • $x_m \in [0, 2^n)$, thus normally $e \in (-2^n, 2^n)$
    • however, with $\hat{x}_m$ known, $e \in [-\hat{x}_m,\, 2^n - \hat{x}_m)$
  • switched quantizer (替換量化器)
    • use multiple quantizers, encode with the best one, and send which quantizer was used.

Lossless DPCM

  • without a quantizer, send $e$ directly (or compress it further with another lossless algorithm).
  • thus it can be used as a preprocessor for other (quite popular) coders like Huffman or arithmetic coding.
  • this is more efficient than using adaptive coders (AHuff, AArith) directly, while performing about as well.

Non-Redundant Sample Coding

  • aka adaptive sampling coding

Polynomial Predictor

$$x_t = x_{t-1} + \Delta x_{t-1} + \Delta^2 x_{t-1} + \cdots$$

in which

$$\Delta x_{t-1} = x_{t-1} - x_{t-2}, \qquad \Delta^2 x_{t-1} = \Delta x_{t-1} - \Delta x_{t-2}$$
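A tiny sketch of the zeroth-, first-, and second-order predictors (my own helper):

```python
def poly_predict(x, order):
    """Predict x[t] from previous samples using differences up to `order`."""
    if order == 0:
        return x[-1]                      # x_t ~ x_{t-1}
    d1 = x[-1] - x[-2]                    # delta x_{t-1}
    if order == 1:
        return x[-1] + d1                 # linear extrapolation
    d2 = d1 - (x[-2] - x[-3])             # delta^2 x_{t-1}
    return x[-1] + d1 + d2                # quadratic extrapolation

print(poly_predict([1, 4, 9], order=2))   # 16: the squares 1, 4, 9, 16 are predicted exactly
```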

Polynomial Interpolator

  • first-order interpolation (1次內插法) = fan algorithm (扇形演算法) = SAPA2

AZTEC

  • p173
  • rules
    1. use a horizontal line (plateau) if 3 successive samples satisfy $x_{max} - x_{min} < \lambda$
    2. otherwise, use a slope; further, if the next sample has the same sign of slope and $|x_m - x_{m-1}| > \lambda$, keep treating it as redundant (part of the slope).

CORNER

  • CORNER > SAPA2 > AZTEC
  • algorithm
    1. 2nd-order difference: $x''(i) = x(i+1) + x(i-1) - 2x(i)$
    2. for each $i$: if $x''(i) > \lambda_1$ and $x''(i)$ is a local maximum, mark $x(i)$ as a corner
    3. we now have the marked samples $x(m_1), x(m_2), x(m_3), \ldots$; the samples between them are redundant
    4. for each pair of neighboring marked samples, check whether $x\!\left(\frac{m_1+m_2}{2}\right) - \frac{x(m_1)+x(m_2)}{2} > \lambda_2$ (that is, whether $x$ is very concave (凹) between them)
    5. if so, add $x\!\left(\frac{m_1+m_2}{2}\right)$ as a new marked sample $x(m_{1.5})$, and repeat step 4 on $\{x(m_1), x(m_{1.5})\}$ and $\{x(m_{1.5}), x(m_2)\}$
    6. if not, continue with the next pair

BTC

  • Block Truncation Coding

Moment-Preserving Quantizer

  • target – find a quantizer that keeps the 1st and 2nd moments unchanged; since

    $$\sigma^2 = E[X^2] - E[X]^2$$

    the variance $\sigma^2$ is then also unchanged.

  • solution – given $X_{th}$ as the threshold (normally the mean of all pixels), $m$ as the number of pixels in the block, and $q$ as the # of pixels quantized to $b$:

$$\hat{x}_i = \begin{cases} a & \text{if } x_i < X_{th} \\ b & \text{otherwise} \end{cases}$$
$$E[X] = \frac{(m-q)\,a + q\,b}{m}, \qquad E[X^2] = \frac{(m-q)\,a^2 + q\,b^2}{m}$$

then,

$$a = E[X] - \sigma\sqrt{\frac{q}{m-q}}, \qquad b = E[X] + \sigma\sqrt{\frac{m-q}{q}}$$
  • solution 2 (Absolute Moment BTC, AMBTC)

    • since the square root is hard to compute, we preserve the absolute moment instead of the 2nd moment:
    $$\alpha = E[\,|X_i - \mu|\,]$$

    then, we can have

    $$a = E[X] - \frac{m\alpha}{2(m-q)}, \qquad b = E[X] + \frac{m\alpha}{2q}$$
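A per-block numpy sketch of the moment-preserving quantizer (solution 1) above; helper names are mine:

```python
import numpy as np

def btc_block(block):
    """Two-level moment-preserving quantizer for one block of pixels."""
    x = block.astype(float).ravel()
    m = x.size
    mean, sigma = x.mean(), x.std()
    bitmap = x >= mean                    # threshold X_th = mean of the block
    q = int(bitmap.sum())                 # number of pixels mapped to b
    if q in (0, m):                       # flat block: nothing to preserve
        return bitmap, mean, mean
    a = mean - sigma * np.sqrt(q / (m - q))
    b = mean + sigma * np.sqrt((m - q) / q)
    return bitmap, a, b                   # transmit the bitmap plus (a, b)

block = np.array([[2, 9, 12, 4], [3, 8, 11, 5], [2, 9, 12, 4], [3, 8, 11, 5]])
print(btc_block(block))
```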

Transform Coding

  • Rotation Matrix
$$M(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Zonal Sampling

  • zonal sampling (區域取樣)
    1. keep the low-frequency coefficients; high-frequency ones are usually small, so drop them
    2. give more bits to the low frequencies
    3. fixed bits per block, but coefficients whose variance (over the same position across blocks) is larger get more bits
  • cons
    • there may be very large coefficients that are hard to ignore (yet still get dropped)

Threshold Sampling

  • threshold sampling (臨界取樣)
    1. set a threshold; coefficients below it become 0, coefficients above it are sent as (position, value)

JPEG encode

  1. get $F(u,v)$ (8×8 block DCT)
  2. scan in zig-zag order
  3. encode the first coefficient (DC) with DPCM and Huffman.
  4. encode the remaining coefficients (AC) with the following rules
    1. omit zeros (count the run of zeros before each nonzero coefficient)
    2. look up the category $k$ at p220
    3. according to the category $k$ and the # of zeros before this coefficient, look up the table on p224.
    4. append $k$ bits which indicate the index of the AC coefficient within that category.
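A small sketch of the zig-zag scan order for an 8×8 block (indices only; my own helper, not the Huffman tables from the slides):

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):            # anti-diagonals satisfy row + col = s
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(zigzag_order()[:10])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]
```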

DCT

  • p209

Vector Quantization

  • cost of encoding is $O(n N_c)$, where $n$ is the vector dimension and $N_c$ is the codebook size (number of codewords).
  • needs $n N_c$ memory

LBG

The LBG algorithm was proposed by Linde, Buzo, and Gray in 1980. It works essentially like K-means: it computes the distortion of the current partition of the training set, and repeatedly adjusts the mapping regions and the quantization points (code vectors):

  1. Given the training samples and a distortion threshold.
  2. Choose the initial code vectors.
  3. Reset the iteration counter to zero.
  4. Compute the total distortion; if this is not the first pass, check whether the change from the previous distortion is smaller than the threshold.
  5. For each training sample, find the code vector with the minimum distance d; this defines the mapping function Q.
  6. Update the code vectors: average all training samples mapped to the same code vector to get its new value.

(Here i is the iteration counter, C is the representative of a cluster, x is a data point, and Q(x) is the cluster representative C that x is quantized to.)

  7. Increment the iteration counter.

  8. Return to step 4, until the distortion change is smaller than the threshold.

The LBG algorithm depends heavily on the initial codebook; there are several ways to generate one.
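A compact numpy sketch of the LBG loop above (training data, codebook size, and threshold are toy choices of mine):

```python
import numpy as np

def lbg(train, n_codewords, threshold=1e-4, iters=100):
    """K-means style LBG: alternate nearest-codeword mapping and centroid update."""
    rng = np.random.default_rng(0)
    code = train[rng.choice(len(train), n_codewords, replace=False)]   # initial codebook
    prev = np.inf
    for _ in range(iters):
        d = np.linalg.norm(train[:, None, :] - code[None, :, :], axis=2)
        q = d.argmin(axis=1)                          # mapping function Q
        dist = d[np.arange(len(train)), q].mean()     # current distortion
        if prev - dist < threshold:                   # stop when the improvement is small
            break
        prev = dist
        for i in range(n_codewords):                  # centroid update per cluster
            if np.any(q == i):
                code[i] = train[q == i].mean(axis=0)
    return code

train = np.random.default_rng(1).normal(size=(1000, 4))   # 1000 four-dimensional vectors
print(lbg(train, n_codewords=8).shape)                    # (8, 4)
```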

Tree-Structured Codebooks

  • m-way tree
  • cost decreases to $O(n\, m \log_m N_c)$
  • memory increases to $n \cdot \frac{m(N_c - 1)}{m - 1}$
    • because there are $N_n = m + m^2 + m^3 + \cdots + m^{\log_m N_c}$ nodes,
      and $N_n = \frac{m\,(m^{\log_m N_c} - 1)}{m - 1} = \frac{m(N_c - 1)}{m - 1}$
  • cons
    • performs worse than full search, since it commits to one branch at each level.

Product Code

  • use a codebook of size $N_1$ to represent the vector direction, and a codebook of size $N_2$ to represent the vector length.
    • in this case we can represent $N_1 N_2$ vectors with only $N_1 + N_2$ codewords
      • therefore, at the same time complexity and bit rate, it performs better than full search

M/RVQ

  • Mean/Residual VQ (平均/餘值 VQ)
    1. subtract the mean from each block (e.g. $n = 4 \times 4 = 16$)
    2. send the mean (with DPCM or something similar)
    3. do VQ on the residual and send it

I/RVQ

  • Interpolative/Residual VQ (內插/餘值 VQ)
    1. subsample the original image (assume $N \times N$) to get a sub-image ($N/k \times N/k$, normally $k = 8$)
    2. up-sample it back by interpolation, and subtract to get the residual image
    3. split the residual image into blocks and do VQ
  • pros
    • performs better (fewer blocking artifacts) than M/RVQ.

G/SVQ

  • Gain/Shape VQ
    • find and send the shape codeword that matches best (has the greatest dot-product value), together with the corresponding gain

CVQ

  • Classification VQ
    1. split the image into blocks
    2. classify the blocks into categories
    3. each category has its own specific codebook
  • normally uses many small codebooks, but can reach performance similar to normal VQ with one large codebook.

FSVQ

  • Finite State VQ

    1. Given
      • a codebook per state, $C(s_i)$
      • a transition function $s_i = f(s_{i-1}, Y_{i-1})$
    2. given $s_0$, find $Y_0$ in $C(s_0)$
    3. get $s_1 = f(s_0, Y_0)$, and find $Y_1$
    4. repeat until all vectors are encoded
  • cons

    • a single transmission error can cause serious consequences (the decoder's state diverges from the encoder's).