Qpress: the ultimate compression program

* This article was written in a private capacity and is not related in any way to my current or previous jobs.

I'd like to introduce Qpress, a surprisingly powerful reversible compression program. The key point is that, by choosing an appropriate compression algorithm, Qpress can compress any file, including one that is already compressed. In other words, by compressing a file repeatedly, we can make it as small as we want. With Qpress, we no longer need to worry about file size.

Although many great compression algorithms have been proposed, we know that, for every algorithm, there is some data that it cannot compress any further. For example, it is generally considered impossible to reduce the size of an already-compressed file by compressing it again. Using basic concepts from information theory, it is easy to prove the theorem that no reversible compression algorithm can reduce the size of every possible input.
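
The counting behind that theorem is easy to check by hand. The short Ruby sketch below only illustrates the pigeonhole argument; the helper names `strings_of_length` and `shorter_strings` are made up for this example. For every length n there are 2^n distinct bit strings, but only 2^n - 1 strings that are strictly shorter, so a single reversible algorithm cannot map every n-bit input to a shorter output.

```ruby
# Number of distinct bit strings with exactly n bits.
def strings_of_length(n)
  2**n
end

# Number of distinct bit strings with fewer than n bits
# (lengths 0 through n-1): 2^0 + 2^1 + ... + 2^(n-1).
def shorter_strings(n)
  (0...n).sum { |k| 2**k }
end

# For every n, the inputs outnumber the possible shorter outputs by one.
(1..8).each do |n|
  puts "n=#{n}: inputs=#{strings_of_length(n)}, shorter outputs=#{shorter_strings(n)}"
end
```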

However, we'd like to point out that this theorem has a hidden precondition: that we use a single algorithm for every possible file. What happens if we have a collection of compression algorithms and select the best one for the given data? This breaks the precondition. If the collection is large enough, then for any given data we may be able to select an algorithm from the collection that reduces its size. And indeed, we can.
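
To make the "select the best one" step concrete, here is a minimal Ruby sketch of the selection idea only. It is not the Qpress implementation: the three codecs are stand-ins built on Ruby's standard `zlib` library, and the names `CODECS`, `compress_best`, and `decompress` are hypothetical. The sketch runs every codec, keeps the smallest output, and returns it together with the label of the codec that produced it.

```ruby
require "zlib"

# Hypothetical collection of reversible codecs: label => [encode, decode].
# These are stand-ins chosen for illustration, not Qpress's algorithms.
INFLATE = ->(bytes) { Zlib::Inflate.inflate(bytes) }
CODECS = {
  "store" => [->(bytes) { bytes }, ->(bytes) { bytes }],
  "fast"  => [->(bytes) { Zlib::Deflate.deflate(bytes, Zlib::BEST_SPEED) }, INFLATE],
  "best"  => [->(bytes) { Zlib::Deflate.deflate(bytes, Zlib::BEST_COMPRESSION) }, INFLATE],
}.freeze

# Run every codec and keep the smallest output together with its label.
def compress_best(data)
  CODECS.map { |label, (encode, _decode)| [label, encode.call(data)] }
        .min_by { |_label, output| output.bytesize }
end

# Undo compress_best; the label says which decoder to apply.
def decompress(label, payload)
  CODECS.fetch(label)[1].call(payload)
end

data = "abc" * 87
label, packed = compress_best(data)
puts "#{label}: #{data.bytesize} -> #{packed.bytesize} bytes"
puts decompress(label, packed) == data
```

Note the design choice: the decoder needs the label, so the label has to travel with the compressed data in some form, for example as part of the output file name.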

The Qpress compression program comes with a collection of as many as 257 compression algorithms and automatically selects the best one for the given file. The algorithms are carefully chosen to guarantee that, for any given file, at least one algorithm in the collection can reduce its size. Strictly speaking, there is exactly one exception: theoretically, a file of zero bytes cannot be compressed. However, for any other file, Qpress is able to reduce the size. Even a file that has already been compressed by Qpress can be compressed again.

You may feel this is crazy, but whether you believe it or not, Qpress works as described. I actually developed a prototype Ruby script that proves the concept. Let me demonstrate the real power of Qpress. First, we prepare a file to be compressed.

$ ruby -e 'print "abc"*87' > test
$ cat test
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc
$ cp test test-orig
$ ls -la test*
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig

This file is obviously easy to compress, but it is good enough for a first demonstration.

Now, run Qpress to compress the file.

$ ./Qpress.rb test
File written to test.gzQ (8 bytes)
$ ls -la test*
-rw-rw-r-- 1 t Users   8 Apr  1 00:02 test.gzQ
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig

The .gzQ compression algorithm is selected, and the file size is drastically reduced. However, as described above, this compressed file can be compressed again.

$ ./Qpress.rb test.gzQ
File written to test.gzQ.elQ (7 bytes)
$ ls -la test*
-rw-rw-r-- 1 t Users   7 Apr  1 00:02 test.gzQ.elQ
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig

This time the program selects the .elQ algorithm, which reduces the file size further. Only one byte is saved, but the re-compression does make the file smaller.

Of course, we are able to compress this file too.

$ ./Qpress.rb test.gzQ.elQ
File written to test.gzQ.elQ.ahQ (6 bytes)
$ ls -la test*
-rw-rw-r-- 1 t Users   6 Apr  1 00:02 test.gzQ.elQ.ahQ
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig

Repeating this process makes the file as small as we want.

$ ./Qpress.rb test.gzQ.elQ.ahQ
File written to test.gzQ.elQ.ahQ.agQ (5 bytes)
$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ (4 bytes)
$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ.aeQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ (3 bytes)
$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ (2 bytes)
$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ (1 bytes)
$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ.aiQ (0 bytes)
$ ls -la test*
-rw-rw-r-- 1 t Users   0 Apr  1 00:03 test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ.aiQ
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig

Eventually, the 1-byte file is compressed into a 0-byte file. A 0-byte file cannot be compressed any further, but I trust you can accept this limitation.

$ ./Qpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ.aiQ
Zero-size file cannot be compressed.

You may doubt whether we can really get the original file back by extracting this 0-byte file. Let me show the extraction process with the unQpress program.

$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ.aiQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ (1 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ.pjQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ (2 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ.onQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ.fcQ (3 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ.aeQ.fcQ
File written to test.gzQ.elQ.ahQ.agQ.aeQ (4 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ.aeQ
File written to test.gzQ.elQ.ahQ.agQ (5 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ.agQ
File written to test.gzQ.elQ.ahQ (6 bytes)
$ ./unQpress.rb test.gzQ.elQ.ahQ
File written to test.gzQ.elQ (7 bytes)
$ ./unQpress.rb test.gzQ.elQ
File written to test.gzQ (8 bytes)
$ ./unQpress.rb test.gzQ
File written to test (261 bytes)
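
Judging only from the transcript above, each unQpress run appears to strip one trailing suffix of the form `.xxQ` (two lowercase letters followed by `Q`) from the file name. Under that assumption, the hypothetical helper below (`extraction_steps` is a name invented for this sketch and is not part of the distributed scripts) lists the file names that repeated extraction would step through. It works on names only and does not touch file contents.

```ruby
# Hypothetical helper: list the file names that repeated extraction would
# step through, assuming each run strips exactly one trailing ".xxQ" suffix
# (two lowercase letters followed by "Q"), as in the transcript.
def extraction_steps(name)
  steps = []
  while name =~ /\A(.+)\.[a-z]{2}Q\z/
    name = Regexp.last_match(1) # file name with the last suffix removed
    steps << name
  end
  steps
end

puts extraction_steps("test.gzQ.elQ.ahQ")
```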

Surprisingly, by applying unQpress repeatedly to the 0-byte file, we recover the content of the original file. Let me check the result.

$ ls -la test*
-rw-rw-r-- 1 t Users 261 Apr  1 00:04 test
-rw-rw-r-- 1 t Users 261 Apr  1 00:01 test-orig
$ cat test
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc
$ cat test-orig
abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc
$ diff -s test test-orig
Files test and test-orig are identical

As you can see, the original file is reproduced as expected.

I don't want to keep this idea secret. So, to serve mankind, I'd like to distribute the Ruby scripts that implement the Qpress and unQpress programs. Please download them from the link below and try them to confirm the result. I recommend trying small files first: the scripts are still prototypes, and the compression algorithm requires a lot of computational resources such as memory.

Let me emphasize again that Qpress can compress any file as much as you want. I'm sure that this is a key technology for the coming era of data explosion. I don't intend to profit from this idea, so it is in the public domain and not patented.

I would very much appreciate it if you would share this great idea with your friends.