Round2: RLE Compression
This is the second round of the programming competition held by Lebgeeks.
Introduction
Data compression is a very important field in computer science research. It consists mainly of reducing disk space taken by a file. You probably heard of zip, rar, 7z or tar.gz (if you work on linux).
This next round of the competition will be about implementing a compression algorithm for text files. We will be using for this a simple algorithm called
Run-Length Encoding (RLE). RLE is usually used for images (like .bmp), but we're going to keep it simple by manipulating text.
Run_Length Encoding
Let's take a quick look at how RLE works. Imagine having this string:
AAAAAAAAAAAAAS33333333333#SSSSSSFF
Notice how some characters repeat themselves? It will be easy then to save space by indicating how many time each character repeats itself. For example:
13A 1S 113 1# 6S 2F
As you can see, we have already saved a lot of space. But there's a problem. Take a look at "113". How can we know if we mean "1 time 13" or "11 times 3"? This is why we chose to use a character as a special separator, we call it a
flag. We usually chose a flag that is not often used, I suggest we took #. Our compressed file would be:
13#A 1#S 11#3 1## 6#S 2#F
Another problem! What if the flag just happens to be part of the text? We will use the speical notation "flag flag" or "##". Notice also that for characters that are repeated once or twice, the compression algorithm is actually adding space instead of reducing it. This is why we do not transform characters that repeat less than 3 times. So:
13#A S 11#3 ## 6#S FF
Final problem: look at 13#A. How can we differentiate between "1 + 3*A" and "13*A"? This is why we chose to limit ourselves to 9 repetitions per block. Which would give us:
9#A 4#A S 9#3 2#3 ## 6#S FF
or in its final form:
9#A4#AS9#333##6#SFF
Notice how this string is smaller than the original and did not lose at all any information (some compression algorithms do, but that's another story). Decompression happens in the opposite order.
Your program
You are asked to develop a compression program that uses RLE to compress and decompress text files. I would suggest you start by practicing on the string I gave as an example before moving on to files.
Rules, deadlines, and notation systems
- You are allowed to submit your work in pairs. Each participant will then get 75% of their total grade.
- Languages permitted are: C, C++, C#, Java, PHP, Python, Perl, Ruby.
- Your answers should be sent in a zipped file to
lebgeeks.rahmu at gmail.com. Include in your zip file a README file including your name (and your partner's if any), the compiler/IDE you used, any instructions that could be helpful for me to run your program.
- Here's how the grading system goes:
- 10 points for the implementation of the algorithm.
- 10 points for developing a UI. It can be console-based, graphical or web-based.
- 10 points for creativity bonuses. It will be discussed below.
- As usual, I would ask you to send me a code that compiles, with lots of comments and explicit variable names. I will be reviewing your code mainly, more than your end program, so please help me out.
- 3 point bonus for the first project to be turned in.
- Deadline for submission is on
Sunday November 7th, 2010. It will give you two weeks to work on your project.
Creativity bonus
You should focus on developing your program flawlessly, and with a nice UI. However, if you finish beforehand, there is a way for you to score extra points. There is no definite rule on how these extra points will be given. It is basically anything "extra" to the program. Here are some example:
- Creative name for your program
- A logo
- A 10 sec video presenting your program
- Added functionalities like compression ratio calculation. (If the file was 2KB and now is 1KB, it's a 50% conversion), or user can chose their own flags.
- Any sorts of statistic modules to your program.
- Make your program a Firefox/Chrome extension
etc.
There you go, you should have all the information you need to start as soon as possible. However, I'll leave this thread open in case you have any questions (about the algorithm or the organization of the competition). You can also use this thread to post a request for partnership. We will try to match pairs according to skills.