Most of the functions used in this homework are defined in the following notebooks.
using NBInclude
nbinclude("Basic Functions.ipynb");
nbinclude("Binary Huffman Coding.ipynb");
nbinclude("Shannon-Fano Coding.ipynb");
using PyPlot
pmf = [0.9, 0.1]
code_Huffman = binaryHuffman(pmf)
avgLength(code_Huffman, pmf)
code_SF = ShannonFano(pmf)
avgLength(code_SF, pmf)
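For reference, here is a minimal sketch of what these two helpers compute (hypothetical stand-ins; the actual implementations live in the included notebooks): the entropy is $-\sum_i p_i \log_2 p_i$, and the average length weights each codeword length by its probability.

```julia
# Illustrative stand-ins, NOT the notebook implementations.
entropy_sketch(pmf) = -sum(p -> p > 0 ? p * log2(p) : 0.0, pmf)
avgLength_sketch(code, pmf) = sum(length.(code) .* pmf)

entropy_sketch([0.9, 0.1])                 # ≈ 0.469 bits
avgLength_sketch(["0", "1"], [0.9, 0.1])   # 1.0 bit: any binary code on two symbols
```

For a two-symbol source both Huffman and Shannon-Fano must use one bit per symbol, which is why the averages above both exceed the entropy; block coding below closes this gap.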
pmf1_list = [[0.9, 0.1],
             [0.5, 0.25, 0.25],
             [0.53, 0.28, 0.1, 0.05, 0.04]]
nlistlist = [[2, 5, 10],
             [1, 3, 5],
             [1, 2, 3]]
for i in 1:length(pmf1_list)
    pmf1 = pmf1_list[i]
    nlist = nlistlist[i]
    print("==========================================================================================\n")
    print("pmf1: ", pmf1, "\n")
    print("entropy: ", entropy(pmf1), " bits\n")
    pmfn = copy(pmf1)
    for n in nlist
        pmfn = blockPmf(pmf1, n, pmfn)
        code_H, code_SF = binaryHuffman2(pmfn), ShannonFano(pmfn)
        avg_length_H, avg_length_SF = avgLength(code_H, pmfn), avgLength(code_SF, pmfn)
        print("------------------------------\n")
        print("For the block length = ", n, ", \n")
        print("The entropy is ", n*entropy(pmf1), " bits\n")
        print("The average length of Huffman code is \t ", avg_length_H, " bits.\n")
        print("The average length of Shannon-Fano code is ", avg_length_SF, " bits.\n")
        # [pmfn code_H code_SF]
    end
end
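For context, `blockPmf` builds the distribution of length-$n$ blocks of i.i.d. symbols; the notebook's version takes the previous block pmf as a third argument, presumably so it can be extended one symbol at a time. A hedged one-liner with the same product-distribution semantics:

```julia
# Hypothetical sketch: the n-fold product distribution over blocks of n
# i.i.d. symbols (the notebook's blockPmf likely builds this incrementally).
blockPmf_sketch(pmf1, n) =
    n == 1 ? pmf1 : vec([p * q for p in blockPmf_sketch(pmf1, n - 1), q in pmf1])

blockPmf_sketch([0.9, 0.1], 2)   # ≈ [0.81, 0.09, 0.09, 0.01]
```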
function compareHandSF(p)
    pmf1 = [p, 1-p]
    print("==========================================================================================\n")
    print("pmf1: ", pmf1, "\n")
    print("entropy: ", entropy(pmf1), " bits\n")
    pmfn = copy(pmf1)
    n = 0
    while true
        n += 1
        pmfn = blockPmf(pmf1, n, pmfn)
        code_H, code_SF = binaryHuffman2(pmfn), ShannonFano(pmfn)
        avg_length_H, avg_length_SF = avgLength(code_H, pmfn), avgLength(code_SF, pmfn)
        print("------------------------------\n")
        print("For the block length = ", n, ", \n")
        print("The entropy is ", n*entropy(pmf1), " bits\n")
        print("The average length of Huffman code is \t ", avg_length_H, " bits.\n")
        print("The average length of Shannon-Fano code is ", avg_length_SF, " bits.\n")
        if (avg_length_SF - avg_length_H > 1e-10) # be careful when comparing float variables
            print("\nHuffman is strictly better than Shannon-Fano for the block length n=", n, "!\n")
            break
        end
    end
end
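Why this loop can be expected to terminate: Huffman coding is optimal among prefix codes, so $\bar L_H(n) \le \bar L_{SF}(n)$ for every block length $n$, where $\bar L_H(n)$ and $\bar L_{SF}(n)$ denote the expected codeword lengths per block. If the Shannon-Fano code assigns lengths $\ell_i = \lceil -\log_2 p_i \rceil$ (one standard construction; the notebook's may differ), then $$nH(p)\ \le\ \bar L_H(n)\ \le\ \bar L_{SF}(n)\ <\ nH(p)+1,$$ and unless the block pmf is dyadic (all probabilities powers of $1/2$), the Shannon-Fano lengths are suboptimal at some block length, at which point the middle inequality becomes strict and the loop stops.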
compareHandSF(0.9)
n = 10
plist = collect(0.01:0.01:0.99)
avg_lengths = zeros(length(plist))
for i in 1:length(plist)
    pmfn = blockPmf([plist[i], 1-plist[i]], n)
    avg_lengths[i] = avgLength(binaryHuffman(pmfn), pmfn)/n
end
plot(plist, avg_lengths, marker = "*", label = "Binary Huffman")
plot(plist, map(x->entropy([x,1-x]), plist), color = "r", label = "Binary entropy") # binary entropy function
axhline(1, color = "g", linestyle = "--")
xlabel("Probability p")
ylabel("Average lengths")
legend(loc = "upper right", fancybox = true)
Note that the average lengths are lower bounded by the binary entropy function.
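This is consistent with the source coding theorem: if $\bar L_n$ denotes the optimal (Huffman) expected length for blocks of $n$ symbols, then $$H(p) \le \frac{\bar L_n}{n} < H(p) + \frac{1}{n},$$ so with $n = 10$ the per-symbol average length can exceed the binary entropy $H(p)$ by at most $0.1$ bits, which is why the two curves nearly coincide.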
The average length per symbol, averaged over the grid of $p$, is
sum(avg_lengths)/length(avg_lengths)
This value approximates the optimal compression ratio of the random Bernoulli source, averaged over $p$.
To be more precise, consider the random Bernoulli source $X|\{P=p\} \sim \textrm{Bern}(p)$ where $P\sim\textrm{Unif}[0,1]$. Then, given $P=p$, the optimal compression ratio of the source $X|\{P=p\}$ is $H(X|P=p)=H(p)$, and therefore on average the optimal compression ratio is $$H(X|P) = \int_0^1 H(X|P=p)dp = \int_0^1 H(p)dp = \frac{1}{2\ln(2)}.$$
1/2*log2(exp(1))
You can check that our estimate is close to the true value.
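The closed-form value can also be sanity-checked numerically with a midpoint Riemann sum over a fine grid of $p$ (a quick sketch, independent of the code above):

```julia
# Midpoint-rule check of ∫₀¹ H(p) dp against the closed form 1/(2 ln 2).
H(p) = (p <= 0 || p >= 1) ? 0.0 : -p * log2(p) - (1 - p) * log2(1 - p)
grid = 0.005:0.01:0.995               # 100 midpoints of width-0.01 cells on [0, 1]
midpoint_sum = 0.01 * sum(H, grid)    # ≈ 0.7213
closed_form  = 1 / (2 * log(2))       # = log2(e)/2 ≈ 0.7213
```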