In this assignment, first you read a text file as input and obtain the frequencies (count) of each character in a given text file, including the blank spaces. Next, using the frequency information you obtained from the file you build a Huffman tree.
- By tracing the branches of the Huffman tree your program outputs each character and the corresponding binary string.
- Challenge (100 pts bonus):
- Your program reopens the text file and encode each character as you read from the file using the Huffman tree you constructed in (a). The encoded file should appear in a file called encoded.txt in the same directory as the original file. Recall that each character you read cannot have a value greater than 255.
- Using the encoded.txt binary file, your program decodes the encoded file and outputs the text data in decode.txt file. This file should be exactly the same as the original file.
Here is a brief discussion of Huffman encoding. Consider set of numbers and their frequency of use and want to create a Huffman encoding for them:
FREQUENCY VALUE
5 1
7 2
10 3
15 4
20 5
45 6
Creating a Huffman tree is simple. Sort this list by frequency and make the two-lowest elements into leaves, creating a parent node with a frequency that is the sum of the two lower element's frequencies:
12:*
/ \
5:1 7:2
The two elements are removed from the list and the new parent node, with frequency 12, is inserted into the list by frequency. So now the list, sorted by frequency, is:
10:3
12:*
15:4
20:5
45:6
You then repeat the loop, combining the two lowest elements.
This results in:
22:*
/ \
10:3 12:*
/ \
5:1 7:2
and the list is now:
20:5
22:*
45:6
You repeat until there is only one element left in the list.(放图片里了,格式有问题)
Now the list is just one element containing 102:*, you are done. This element becomes the root of your binary Huffman tree.
To generate a Huffman code you traverse the tree to the value you want, outputting a 0 every time you take a left hand branch, and a 1 every time you take a right hand branch. (normally you traverse the tree backwards from the code you want and build the binary huffman encoding string backwards as well, since the first bit must start from the top).