Tunstall Coding -- Examples

In [22]:
using PyPlot
In [2]:
using NBInclude
nbinclude("Tunstall Coding.ipynb");
nbinclude("Binary Huffman Coding.ipynb");
nbinclude("Non-binary Huffman Coding.ipynb");
nbinclude("Basic Functions.ipynb");
WARNING: Method definition entropy(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:2.
WARNING: Method definition entropy(Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:5 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:5.
WARNING: Method definition avgLength(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[2]:1 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[2]:1.
WARNING: Method definition blockPmf(Any, Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:5 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:5.
WARNING: Method definition blockPmf(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:16 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:16.
WARNING: Method definition minValues(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[4]:4 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[4]:4.
WARNING: Method definition testScheme(Any...) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[5]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[5]:2.
WARNING: Method definition testMultiSchemes(Any, Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[6]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[6]:2.
WARNING: Method definition daryHuffman(Any, Any) in module Main at /home/juser/ECE 154C/Binary Huffman Coding.ipynb:In[4]:5 overwritten at /home/juser/ECE 154C/Non-binary Huffman Coding.ipynb:In[2]:5.
WARNING: Method definition daryHuffman2(Any, Any) in module Main at /home/juser/ECE 154C/Binary Huffman Coding.ipynb:In[5]:5 overwritten at /home/juser/ECE 154C/Non-binary Huffman Coding.ipynb:In[3]:5.
WARNING: Method definition entropy(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:2.
WARNING: Method definition entropy(Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:5 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[1]:5.
WARNING: Method definition avgLength(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[2]:1 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[2]:1.
WARNING: Method definition blockPmf(Any, Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:5 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:5.
WARNING: Method definition blockPmf(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:16 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[3]:16.
WARNING: Method definition minValues(Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[4]:4 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[4]:4.
WARNING: Method definition testScheme(Any...) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[5]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[5]:2.
WARNING: Method definition testMultiSchemes(Any, Any, Any) in module Main at /home/juser/ECE 154C/Basic Functions.ipynb:In[6]:2 overwritten at /home/juser/ECE 154C/Basic Functions.ipynb:In[6]:2.

Examples

In [26]:
src_pmf = [0.9, 0.1]
# src_pmf = [0.5, 0.3, 0.1, 0.1]
avg_lens = zeros(500)
for k = 2:500
    phrases, pmf = Tunstall(src_pmf, k)
    avg_len_codewords = log2(length(phrases))
    avg_len_phrases = avgLength(phrases, pmf)
    avg_lens[k] = avg_len_codewords/avg_len_phrases
end
# [pmf phrases]
In [27]:
entropy(src_pmf)
Out[27]:
0.46899559358928133
In [28]:
plot(avg_lens)
Out[28]:
1-element Array{Any,1}:
 PyObject <matplotlib.lines.Line2D object at 0x7fc6e8db9390>
In [29]:
avg_lens[end]
Out[29]:
0.4908682815017492
In [10]:
phrases, pmf = Tunstall(src_pmf, 6)
[pmf phrases]
Out[10]:
6×2 Array{Any,2}:
 0.1      "2"    
 0.09     "12"   
 0.081    "112"  
 0.0729   "1112" 
 0.59049  "11111"
 0.06561  "11112"
In [11]:
phrases, pmf = Tunstall(src_pmf, 10)
[pmf phrases]
Out[11]:
10×2 Array{Any,2}:
 0.1        "2"        
 0.09       "12"       
 0.081      "112"      
 0.0729     "1112"     
 0.06561    "11112"    
 0.059049   "111112"   
 0.0531441  "1111112"  
 0.0478297  "11111112" 
 0.38742    "111111111"
 0.0430467  "111111112"
In [18]:
entropy(src_pmf)
Out[18]:
0.46899559358928133
In [12]:
# Average length of codewords
avg_len_codewords = log2(length(phrases))
Out[12]:
3.321928094887362
In [13]:
# Average length of source phrase
avg_len_phrases = avgLength(phrases, pmf)
Out[13]:
6.12579511
In [14]:
# Average # code symbols per average # source symbols
avg_len_codewords/avg_len_phrases
Out[14]:
0.5422852111825467
In [15]:
# Average # code symbols per sourse symbol
sum(pmf.*avg_len_codewords./map(length, phrases))
Out[15]:
0.9121646330627423

Tunstall + Hufman: Variable-to-varialbe encoding

In [9]:
src_pmf = [0.9, 0.1]
phrases, pmf = Tunstall(src_pmf, 5)
code = daryHuffman(pmf, 2)
[pmf phrases code]
Out[9]:
5×3 Array{Any,2}:
 0.1     "2"     "100"
 0.09    "12"    "101"
 0.081   "112"   "110"
 0.6561  "1111"  "0"  
 0.0729  "1112"  "111"
In [10]:
avg_len_codewords = avgLength(code, pmf)  # Average length of codewords
avg_len_phrases = avgLength(phrases, pmf)  # Average length of source phrase
avg_len_codewords/avg_len_phrases  # Average # code symbols per average # source symbols
sum(pmf.*map(length, code)./map(length, phrases))  # Average # code symbols per sourse symbol
Out[10]:
0.7347000000000001
In [11]:
src_pmf = [0.9, 0.1]
num_phrases = 1000
avg_lengths = zeros(2,num_phrases)
for k = 2:num_phrases
    phrases, pmf = Tunstall(src_pmf, k)
    code = daryHuffman(pmf, 2)
    avg_len_codewords = avgLength(code, pmf)  # Average length of codewords
    avg_len_phrases = avgLength(phrases, pmf)  # Average length of source phrase
    avg_lengths[1,k] = avg_len_codewords/avg_len_phrases  # Average # code symbols per average # source symbols
    avg_lengths[2,k] = sum(pmf.*map(length, code)./map(length, phrases))  # Average # code symbols per sourse symbol
end
In [12]:
plot(2:num_phrases, avg_lengths[1, 2:end])
plot(2:num_phrases, avg_lengths[2, 2:end])
axhline(entropy(src_pmf), color = "r", linestyle = "--")
Out[12]:
PyObject <matplotlib.lines.Line2D object at 0x7f8de7a0b8d0>
In [13]:
# Does it converge?
avg_lengths[2, end] - avg_lengths[2, end-1]
Out[13]:
-0.0001079600722981855

Tunstall+Huffman vs. the optimal tree

Find the optimal tree for the given number of phrases.

Suppose a Bernoulli source, say $\mathcal{X} = \{0,1\}$. How can we evaluate all possible trees? Here we assume a binary tree. Let $d$ be the number of phrases. we can enumerate all possibilites by induction.

What is the input? pmf = $[p, 1-p]$, num_phrases = number of phrases

What is the output? all possible pmf of size num_phrases, and corresponding phrases

Example: How can we generate all possiblities of trees (or pmfs)?

Let pmf1 = [9, 1].

k=1, [[9,1]]

k=2, we want [[81,9,1], [9,9,1]], 1 submatrix of size (2, 3).

k=3, we want [[729, 81, 9, 1], [81, 81, 9, 1], [81, 9, 9, 1]] + [[81, 9, 9, 1], [9, 81, 9, 1], [9, 9, 9, 1]], 2 submatrices of size (3, 4).

k=4, 6 submatrices of size (4, 5).

k=5, 24 submatrices of size (5, 6).

For general k, (k-1)! submatrices of size (k, k+1) is needed.

Suppose we have pmf_list from the previous iteration, which is a list of pmfs. What we want to do is to pick each pmf in the list, enumerate all extension, and add it.

Code

In [14]:
# Suppose we keep updating pmf_list through the iterations.
# We save newly extended pmfs in pmf_list_new.
pmf1 = [0.9, 0.1]
symbol1 = ["A", "B"]
phrases_list = [symbol1]
pmf_list = [pmf1]
num_phrases = 10
for k = 3:num_phrases
    pmf_list_new = Array{Float64,1}[]
    phrases_list_new = Array{String,1}[]
    for j = 1:length(pmf_list)
        for i = 1:length(pmf_list[j])
            pmf_new = copy(pmf_list[j])
            phrases_new = copy(phrases_list[j])
            splice!(pmf_new, i, pmf_new[i]*pmf1)
            splice!(phrases_new, i, map(x -> "$(phrases_new[i])$x", symbol1))
            # to check whether it generate a correct pmf
#             print("kji: ", k, j, i, " ")
#             print("newly generated ", pmf_new, " ")
#             print("newly generated phrases ", phrases_new, "\n")
            push!(pmf_list_new, pmf_new)
            push!(phrases_list_new, phrases_new)
        end
    end
    pmf_list = copy(pmf_list_new)
    phrases_list = copy(phrases_list_new)
    # if you want to check the pmfs generated
#     print(pmf_list, "\n")  

    # Below is to compare the performance of Tunstall+Huffman with the optimal tree.
    avg_lengths = zeros(length(pmf_list))
    for l in 1:length(pmf_list)
        phrases, pmf = phrases_list[l], pmf_list[l]
        avg_lengths[l] = avgLength(binaryHuffman(pmf), pmf)/avgLength(phrases, pmf)
    end
    phrases, pmf = Tunstall(pmf1, k)
    avg_length_Tunstall = avgLength(binaryHuffman(pmf), pmf)/avgLength(phrases, pmf)
    print("------------------------------\n")
    print("For the number of phrases = ", k, ", \n")
    print("pmf by Tunstall: ", pmf, "\n")
    print("Tunstall+Huffman average length: ", avg_length_Tunstall, "\n")
    print("The optimum average length: \t ", minimum(avg_lengths), "\n")
    if abs(avg_length_Tunstall-minimum(avg_lengths)) < 1e-7
        print("Tunstall+Huffman is optimal!\n")
    else
        print("There is a better tree when the number of phrases = ", k, "!\n")
        break
    end
end
------------------------------
For the number of phrases = 3, 
pmf by Tunstall: [0.1,0.81,0.09]
Tunstall+Huffman average length: 0.6263157894736842
The optimum average length: 	 0.6263157894736843
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 4, 
pmf by Tunstall: [0.1,0.09,0.729,0.081]
Tunstall+Huffman average length: 0.5321033210332103
The optimum average length: 	 0.5321033210332103
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 5, 
pmf by Tunstall: [0.1,0.09,0.081,0.6561,0.0729]
Tunstall+Huffman average length: 0.4907822041291073
The optimum average length: 	 0.4907822041291073
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 6, 
pmf by Tunstall: [0.1,0.09,0.081,0.0729,0.59049,0.06561]
Tunstall+Huffman average length: 0.478017630827086
The optimum average length: 	 0.4780176308270861
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 7, 
pmf by Tunstall: [0.1,0.09,0.081,0.0729,0.06561,0.531441,0.059049]
Tunstall+Huffman average length: 0.47287043894152053
The optimum average length: 	 0.47287043894152064
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 8, 
pmf by Tunstall: [0.1,0.09,0.081,0.0729,0.06561,0.059049,0.478297,0.0531441]
Tunstall+Huffman average length: 0.4725119133852185
The optimum average length: 	 0.4725119133852186
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 9, 
pmf by Tunstall: [0.1,0.09,0.081,0.0729,0.06561,0.059049,0.0531441,0.430467,0.0478297]
Tunstall+Huffman average length: 0.47558251562653664
The optimum average length: 	 0.47558251562653664
Tunstall+Huffman is optimal!
------------------------------
For the number of phrases = 10, 
pmf by Tunstall: [0.1,0.09,0.081,0.0729,0.06561,0.059049,0.0531441,0.0478297,0.38742,0.0430467]
Tunstall+Huffman average length: 0.47807915403817675
The optimum average length: 	 0.4737265617068646
There is a better tree when the number of phrases = 10!