Homework #2 Programming Assignment Solution

Most of the functions used in this homework are defined in the following notebooks.

Basic Functions.ipynb

Non-binary Huffman Coding.ipynb

Tunstall Coding.ipynb

In [1]:
using NBInclude
nbinclude("Basic Functions.ipynb");
nbinclude("Non-binary Huffman Coding.ipynb");
nbinclude("Tunstall Coding.ipynb");
using PyPlot

Problem 1

Write a program for a function $\texttt{daryHuffman(pmf, d)}$ that takes a probability vector $\texttt{pmf}$ and the alphabet size $d$ as inputs, and outputs a $d$-ary Huffman code. For example,

$$\tt ["0","1","20","21","22"] = daryHuffman([0.6,0.2,0.1,0.05,0.05],3)$$
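Internally, a $d$-ary Huffman construction first pads the pmf with zero-probability dummy symbols so that $(n-1)$ is a multiple of $(d-1)$, and then repeatedly merges the $d$ least probable nodes. A minimal sketch of this idea, using the illustrative name $\texttt{daryHuffmanSketch}$ (the actual $\texttt{daryHuffman}$ lives in Basic Functions.ipynb's companion notebook Non-binary Huffman Coding.ipynb; the digit labels may differ from the example above, but the codeword lengths agree):

```julia
# Sketch of d-ary Huffman coding; illustrative only, not the graded daryHuffman.
function daryHuffmanSketch(pmf::Vector{Float64}, d::Int)
    n = length(pmf)
    # Pad so that (total - 1) is divisible by (d - 1): every merge is d-way.
    pad = (d - 1 - (n - 1) % (d - 1)) % (d - 1)
    probs = vcat(copy(pmf), zeros(pad))
    groups = [[i] for i in 1:length(probs)]   # source symbols under each node
    codes = fill("", length(probs))
    while length(probs) > 1
        order = sortperm(probs)               # d least probable nodes first
        idx = order[1:d]
        for (digit, j) in enumerate(idx)      # prepend branch digit 0..d-1
            for s in groups[j]
                codes[s] = string(digit - 1) * codes[s]
            end
        end
        keep = setdiff(1:length(probs), idx)
        probs = vcat(probs[keep], sum(probs[idx]))
        groups = vcat(groups[keep], [vcat(groups[idx]...)])
    end
    return codes[1:n]                         # drop dummy codewords
end
```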

(a) Run the program for $\texttt{pmf} = \{0.6, 0.3, 0.1\}$ with $d=3$ and compute the average length of the resulting Huffman code.

In [2]:
pmf = [0.6, 0.3, 0.1]
code = daryHuffman(pmf, 3)
Out[2]:
3-element Array{String,1}:
 "0"
 "1"
 "2"
In [3]:
avgLength(code, pmf)
Out[3]:
1.0

(b) Consider block lengths $n=2,3,5$. For each block length, find the ternary Huffman code and the average length per symbol.

In [4]:
pmf1 = pmf
print("==========================================================================================\n")
print("pmf1: ", pmf1, "\n")
print("entropy of base 3: ", entropy(pmf1)/log2(3), "\n")
pmfn = copy(pmf1)
for n in [2,3,5]
    pmfn = blockPmf(pmf1, n, pmfn)
    code = daryHuffman(pmfn, 3)
    avg_length = avgLength(code, pmfn)
    print("------------------------------\n")
    print("For the block length = ", n, ", \n")
    print("The entropy of base 3 is                   ", n*entropy(pmf1)/log2(3), "\n")
    print("The average length of Huffman code is \t   ", avg_length, "\n")
end
==========================================================================================
pmf1: [0.6,0.3,0.1]
entropy of base 3: 0.8173454221465103
------------------------------
For the block length = 2, 
The entropy of base 3 is                   1.6346908442930206
The average length of Huffman code is 	   1.71
------------------------------
For the block length = 3, 
The entropy of base 3 is                   2.452036266439531
The average length of Huffman code is 	   2.511
------------------------------
For the block length = 5, 
The entropy of base 3 is                   4.086727110732551
The average length of Huffman code is 	   4.135109999999998
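Dividing each average length by the block length $n$ gives the per-symbol rate, which stays above the base-3 entropy and approaches it as $n$ grows. A quick sanity check, using the values printed above:

```julia
H3 = 0.8173454221465103                    # base-3 entropy of [0.6, 0.3, 0.1]
rates = [1.71 / 2, 2.511 / 3, 4.135109999999998 / 5]  # avg length / n
@assert all(r -> r > H3, rates)            # no block code beats the entropy
@assert issorted(rates, rev = true)        # the per-symbol rate improves with n
```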

Problem 2

Consider a binary source $\{A,B\}$, and suppose we have a probability vector $\texttt{src\_pmf}$ on this set.

Write a program for a function $\texttt{Tunstall(src\_pmf, num\_phrases)}$ that takes the probability vector $\texttt{src\_pmf}$ and the desired number of phrases $\texttt{num\_phrases}$ as inputs, and outputs the resulting Tunstall phrases and the corresponding probability vector. For example,

$$\tt ["B", "AB", "AAA", "AAB"], [0.1, 0.09, 0.729, 0.081] = Tunstall([0.9, 0.1], 4),$$

and

$$\tt ["B", "AB", "AAB", "AAAA", "AAAB"], [0.2, 0.16, 0.128, 0.4096, 0.1024] = Tunstall([0.8, 0.2], 5).$$
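Tunstall coding grows a parsing tree: starting from the single-letter phrases, it repeatedly replaces the most probable phrase with its one-letter extensions until the budget of $\texttt{num\_phrases}$ is reached. A minimal sketch for the binary alphabet $\{A, B\}$, using the illustrative name $\texttt{tunstallSketch}$ (the actual $\texttt{Tunstall}$ function lives in Tunstall Coding.ipynb):

```julia
# Sketch of Tunstall phrase construction; illustrative only.
function tunstallSketch(src_pmf::Vector{Float64}, num_phrases::Int)
    symbols = ["A", "B"]
    phrases = copy(symbols)
    probs = copy(src_pmf)
    # Each split removes one leaf and adds length(symbols) children.
    while length(phrases) + length(symbols) - 1 <= num_phrases
        i = argmax(probs)                     # expand the most probable phrase
        w, p = phrases[i], probs[i]
        deleteat!(phrases, i); deleteat!(probs, i)
        append!(phrases, [w * s for s in symbols])
        append!(probs, p .* src_pmf)
    end
    return phrases, probs
end
```

This reproduces both examples above, e.g. $\texttt{tunstallSketch([0.9, 0.1], 4)}$ yields the phrases $\tt ["B", "AB", "AAA", "AAB"]$.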

(a) For $\texttt{src\_pmf} = \{0.9,0.1\}$, find the codeword length per average number of source symbols (the code rate in bits per source symbol) for $\texttt{num\_phrases}=4,8,16, \ldots, 2^{10} = 1024$.

In [5]:
src_pmf = [0.9, 0.1]
avg_lengths = zeros(10)
for k = 2:10
    phrases, pmf = Tunstall(src_pmf, 2^k)
    len_codewords = log2(length(phrases))
    avg_len_phrases = avgLength(phrases, pmf)
    avg_lengths[k] = len_codewords/avg_len_phrases
    print("For num_phrases = ", 2^k, ", the length of codewords per average number of symbols is ", avg_lengths[k], ".\n")
end
plot(map(x -> 2^x, 2:10), avg_lengths[2:10])
For num_phrases = 4, the length of codewords per average number of symbols is 0.7380073800738006.
For num_phrases = 8, the length of codewords per average number of symbols is 0.5750397112840617.
For num_phrases = 16, the length of codewords per average number of symbols is 0.5037092723257128.
For num_phrases = 32, the length of codewords per average number of symbols is 0.509938600712527.
For num_phrases = 64, the length of codewords per average number of symbols is 0.5134824236903565.
For num_phrases = 128, the length of codewords per average number of symbols is 0.5025554332592242.
For num_phrases = 256, the length of codewords per average number of symbols is 0.4917191965145663.
For num_phrases = 512, the length of codewords per average number of symbols is 0.4908662679449177.
For num_phrases = 1024, the length of codewords per average number of symbols is 0.49127881871156076.
Out[5]:
1-element Array{Any,1}:
 PyObject <matplotlib.lines.Line2D object at 0x7f678ee31050>

Note that these rates remain above, and approach, the entropy of the source pmf:

In [6]:
entropy(src_pmf)
Out[6]:
0.46899559358928133
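The gap to this entropy is already small at $\texttt{num\_phrases} = 1024$. A quick check on the values printed above (constants copied from Out[6] and the In[5] output):

```julia
H = 0.46899559358928133          # binary entropy of [0.9, 0.1], from Out[6]
rate_1024 = 0.49127881871156076  # rate at num_phrases = 1024, from In[5]
@assert H < rate_1024 < 0.5      # above the entropy, below 0.5 bits/symbol
```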

(b) For $\texttt{src\_pmf} = \{p,1-p\}$, plot the codeword length per average number of source symbols for $\texttt{num\_phrases}=1024$ as a function of $p$. The plot should be evaluated at 100 values of $p$.

In [7]:
num_phrases = 1024
plist = collect(0.01:0.01:0.99)
avg_lengths_Tunstall = zeros(length(plist))
for i in 1:length(plist)
    phrases, pmf = Tunstall([plist[i], 1-plist[i]], num_phrases)
    len_codewords = log2(length(phrases))
    avg_len_phrases = avgLength(phrases, pmf)
    avg_lengths_Tunstall[i] = len_codewords/avg_len_phrases
end
In [8]:
plot(plist, avg_lengths_Tunstall, marker = "*", label = "Tunstall")
plot(plist, map(x->entropy([x,1-x]), plist), color = "r", label = "Binary entropy")  # binary entropy function
axhline(1, color = "g", linestyle = "--")
xlabel("Probability p")
ylabel("Average lengths")
legend(loc = "upper right", fancybox = true)
Out[8]:
PyObject <matplotlib.legend.Legend object at 0x7f678ed40a90>

Note that the Tunstall rates are lower bounded by the binary entropy function, as the source-coding converse guarantees.

(c) Compare the result in part (b) with the Huffman code rate in Q4 of Homework Set #1.

Recall the last homework:

In [9]:
n = 10
plist = collect(0.01:0.01:0.99)
avg_lengths_Huffman = zeros(length(plist))
for i in 1:length(plist)
    pmfn = blockPmf([plist[i], 1-plist[i]], n)
    avg_lengths_Huffman[i] = avgLength(daryHuffman(pmfn, 2), pmfn)/n
end

Plot the results in one plot:

In [10]:
plot(plist, avg_lengths_Tunstall, marker = "*", label = "Tunstall")
plot(plist, avg_lengths_Huffman, marker = "*", label = "Binary Huffman")
axhline(1, color = "g", linestyle = "--")
xlabel("Probability p")
ylabel("Average lengths")
legend(loc = "upper right", fancybox = true)
Out[10]:
PyObject <matplotlib.legend.Legend object at 0x7f678ec9a750>