92. Magical shrinking files
By Andrew D. Wright
Whether you call it a jpeg, an mp3 or a DVD, what you are talking about is
a magic trick that works.
Data compression shrinks the size of information, saving money on its
storage and the time needed to transfer it. Less is more.
There are two kinds of data compression, lossless and lossy.
In lossless data compression the original file information can be
extracted intact from a smaller file. A ZIP file would be an example: the
file you unzip is identical to the original file that was compressed.
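To make the "identical" claim concrete, here is a minimal sketch using Python's built-in zlib module, which implements DEFLATE, the method most ZIP files use. The sample text is made up for illustration and is not from any real archive.

```python
import zlib

# Hypothetical sample data: a repetitive passage, which compresses well.
original = ("Data compression shrinks the size of information. " * 40).encode("utf-8")

compressed = zlib.compress(original)    # shrink the data
restored = zlib.decompress(compressed)  # expand it again

print(len(original), "bytes before,", len(compressed), "bytes after")
print("identical to original:", restored == original)
```

The comparison on the last line is the whole point of lossless compression: every byte comes back exactly as it went in.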
Lossless data compression uses a number of different techniques to shrink
information size. The most basic one is to remove redundant information.
Let's say this column you're now reading was being compressed. The words
"data compression" could be replaced by something smaller: a symbol or a
reference to where those characters first appeared in the text.
Most lossless compression programs use variations on this theme: they keep
indexes of the information that has been compressed, then remove the
redundant information in the indexes to compress those in turn. The end
user just needs a program that can interpret the particular index format in
order to restore the original data.
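The symbol-and-index idea can be sketched in a few lines of Python. This is a toy illustration only, not how ZIP or any real program works internally; the function names and the choice of private-use characters as symbols are assumptions made for the example.

```python
def compress(text, phrases):
    """Replace each phrase with a one-character symbol; return (packed, index)."""
    index = {}
    for n, phrase in enumerate(phrases):
        symbol = chr(0xE000 + n)  # private-use character, assumed absent from the text
        if symbol in text:
            raise ValueError("symbol already appears in the text")
        index[symbol] = phrase
        text = text.replace(phrase, symbol)
    return text, index


def decompress(packed, index):
    """Undo the substitutions in reverse order to restore the original text."""
    for symbol, phrase in reversed(list(index.items())):
        packed = packed.replace(symbol, phrase)
    return packed


# Hypothetical sample sentence with a repeated phrase.
text = ("Lossless data compression and lossy data compression "
        "are both forms of data compression.")
packed, index = compress(text, ["data compression"])

print(len(text), "characters before,", len(packed), "after")
print("restored correctly:", decompress(packed, index) == text)
```

Note that the index has to travel with the packed text, so this only pays off when a phrase repeats often enough to outweigh the cost of storing it once in the index.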
While ZIP files are the most popular lossless format, others such as RAR,
CAB, ACE, and JAR files are also in wide use. There are many programs
available that use one or more of these formats to create reduced-size
archives of multiple files and folders such as hard drive backups.
How much the data can be compressed, how long it takes to compress and
uncompress the data, and how well a program handles different kinds of
data all vary from program to program.
For the last two years the most efficient and fastest lossless compression
software has been a Windows-based shareware program from New Zealand
called WinRK, which uses its own proprietary .rk file extension.
Unfortunately, unlike the other lossless compression formats mentioned
above, there is no free program to open .rk compressed files. The
shareware program is free to try for thirty days.
Lossy data compression selectively deletes some information to make files
smaller. The popular mp3 audio format removes sound the human ear is
unlikely to miss so a lot less data is needed to make a reasonably good
copy of the original sound.
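Real mp3 encoders rely on an elaborate model of human hearing, which is well beyond a short example. The sketch below swaps in a much simpler technique, rounding each sample to a coarser scale, purely to show what "lossy" means: the numbers get smaller and cheaper to store, and the fine detail cannot be recovered. The sample values and step size are made up.

```python
def quantize(samples, step):
    """Round each sample to the nearest multiple of step, keeping only the multiplier."""
    return [round(s / step) for s in samples]


def dequantize(codes, step):
    """Rebuild approximate samples from the stored multipliers."""
    return [c * step for c in codes]


samples = [0, 13, 27, 38, 52, 61, 49, 30]  # hypothetical audio sample values
step = 16                                  # assumed step size; larger means more loss

codes = quantize(samples, step)    # small numbers that need fewer bits to store
approx = dequantize(codes, step)   # close to the original, but not identical

print("original:", samples)
print("stored:  ", codes)
print("restored:", approx)
```

Each restored value is within half a step of the original, which is the trade being made: a smaller file in exchange for a bounded amount of error.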
Lossy data compression is everywhere around us. The MPEG-2 format used to
encode satellite TV and the video on DVDs uses statistical analysis of the
building blocks of the image to remove redundant information.
Near-duplicate video frames can be merged and visual detail can be blurred,
so substantially less bandwidth is needed to send a close copy of the
original video signal.
Psychoacoustics is the science of understanding how humans hear sound and
is used to quantify the kind of audio information that can be removed from
a sound file without producing a discernible difference to most people. A
sound file can be made ten to twelve times smaller than its original size
after lossy compression. Your dog and cat will notice the difference right
away but you probably won't.
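The ten-to-twelve-times figure can be checked with some quick arithmetic. The sketch below assumes the original is CD-quality audio (44,100 samples per second, 16 bits per sample, two channels) and that the compressed file is encoded at 128,000 bits per second, a common mp3 setting. Neither assumption comes from the column itself.

```python
SAMPLE_RATE_HZ = 44_100   # CD audio samples per second
BITS_PER_SAMPLE = 16      # CD audio sample size
CHANNELS = 2              # stereo
MP3_BITRATE = 128_000     # assumed mp3 bitrate, in bits per second

cd_bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
ratio = cd_bitrate / MP3_BITRATE

print(f"Uncompressed: {cd_bitrate} bits per second")
print(f"Compressed:   {MP3_BITRATE} bits per second")
print(f"About {ratio:.1f} times smaller")
```

Under those assumptions the ratio works out to roughly eleven to one. A lower bitrate gives a smaller file and a more noticeable loss of quality.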
WinRK (shareware): http://www.msoftware.co.nz/
The Mousepad runs every two weeks. It's a service of Chebucto Community
Net, a community-owned Internet provider. If you have a question about
computing, email mousepad@chebucto.ns.ca. If we use your question
in a column, we'll send you a free mousepad.
Originally published 10 September 2006