Types of compression
As discussed in the previous section, compressing data as much as possible
is very important, and in some cases necessary, for data storage.
Compression is most commonly applied to three kinds of file:
Sound files (e.g. MP3, FLAC, WAV)
Image files (e.g. JPG, GIF, PNG)
Video files (e.g. MP4, MPEG, AVI)
Most of the file types mentioned above fall into one of two categories of
compression: ‘lossy’ or ‘lossless’. In this section you will learn about both
types of compression and the benefits and drawbacks of each. Some file
types apply no compression and are known as RAW files.
Lossy compression
Lossy compression permanently removes some data from the original source
file. For example, in an image file, some pixel data may be removed, and the
computer then uses an algorithm (based on the particular compression
technique) to ‘guess’ the removed content. In sound files, lossy compression
typically removes sounds that are outside the human hearing range.
Case study
Take a look at this code snippet:
sa ple = i t(i put("Input t e s mple ate in k z"))
le gt = in (inp t(" np t the length f t so d f
e"))
b t = t(in ut("In t the s nd bit de h"))
size = sa e * l th * bi
iz = st (size)
byt = tr(by s)
pri ("The so nd file size in bits is: " + (size))
Is lossy compression a good idea for computer programs? Think
about why or why not.
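For comparison, here is a sketch of what an intact version of the snippet might look like. This is a reconstruction, not the original code. It assumes the snippet multiplies sample rate, length and bit depth to get a size in bits. The function name and the example values (44,100 Hz, 10 seconds, 16 bits, one channel) are hypothetical, and the sample rate is taken in Hz rather than kHz so that the result really is in bits.

```python
def sound_file_size_bits(sample_rate_hz, length_seconds, bit_depth):
    """Size of an uncompressed, single-channel sound file in bits."""
    # samples per second * seconds * bits per sample
    return sample_rate_hz * length_seconds * bit_depth


# Hypothetical example values, chosen only for illustration.
size_bits = sound_file_size_bits(44_100, 10, 16)
size_bytes = size_bits // 8  # 8 bits per byte

print("The sound file size in bits is: " + str(size_bits))
print("The sound file size in bytes is: " + str(size_bytes))
```

Every character in a program carries meaning, so removing even one (as in the damaged snippet above) stops it from running.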
Lossy compression removes data permanently. This means smaller file
sizes and lower storage requirements. However, the trade-off is lower
quality. If the level of compression is too high, the reduced quality is very
noticeable, particularly in image files and video files.
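The permanent loss described above can be sketched with quantisation, one simple lossy step in which nearby values are merged into a single value. This is only an illustration under assumptions: the function name `quantise` and the sample values are hypothetical, and real formats such as JPG or MP3 use far more elaborate techniques.

```python
def quantise(samples, levels):
    """Map each 8-bit sample (0-255) onto one of `levels` evenly spaced values."""
    step = 256 // levels
    # Integer division throws away the remainder, so nearby values
    # collapse onto the same output and cannot be told apart afterwards.
    return [(s // step) * step + step // 2 for s in samples]


original = [12, 15, 200, 203, 90, 94]      # hypothetical pixel brightness values
compressed = quantise(original, levels=4)  # only 4 distinct values remain

print(compressed)              # [32, 32, 224, 224, 96, 96]
print(compressed == original)  # False: the exact originals are gone
```

With only 4 possible values, each sample needs 2 bits instead of 8, but there is no way to work back from 32 to the original 12 or 15.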
Interactive 1. Comparing an uncompressed image with a compressed image.
As you can see in the example above, the original uncompressed image
(visible when you move the slider to the right) is much higher quality than
the compressed version of the image (visible when you move the slider to
the left). But the compressed image file size is roughly 90% smaller than the
original. At what point does loss of quality become acceptable in favour of
saving storage space?
Lossless compression
The alternative, lossless compression, does not lose any data from the
original file. After the file has been decompressed, the result is exactly the
same as the original file. One way a lossless compression algorithm achieves
this is by finding groups of repeating data and recording this data only once,
along with the number of times it was repeated. This is demonstrated below
using an algorithm known as run-length encoding (RLE).
Figure 1. Lossless compression.
Instead of storing all 8 pixels, only the pattern is stored. When the data is
decompressed, the algorithm takes this pattern and recreates the original
file exactly.
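Run-length encoding can be sketched in a few lines. This is a minimal illustration, not the exact scheme used by any particular file format. The 8-pixel row is a hypothetical example (the pixels in Figure 1 may differ), and the function names are assumptions.

```python
def rle_encode(pixels):
    """Collapse consecutive repeats into (count, value) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1          # same value as the previous pixel: extend the run
        else:
            runs.append([1, pixel])   # different value: start a new run
    return [(count, value) for count, value in runs]


def rle_decode(runs):
    """Rebuild the original sequence from (count, value) pairs."""
    pixels = []
    for count, value in runs:
        pixels.extend([value] * count)
    return pixels


# Hypothetical row of 8 pixels: W = white, B = black.
row = ["W", "W", "W", "B", "B", "B", "B", "W"]
encoded = rle_encode(row)

print(encoded)                     # [(3, 'W'), (4, 'B'), (1, 'W')]
print(rle_decode(encoded) == row)  # True: nothing was lost
```

Decoding returns exactly the original row, which is what makes the method lossless.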
Pattern-based compression works well with some images, but particularly
well with text, where each pattern (word) is stored once in a dictionary and
replaced by a short code. Look at the example below, which stores the
phrase ‘it will be what it will be.’:
Word    Index   Binary code
it      0       000
will    1       001
be      2       010
what    3       011
.       4       100
This phrase has 27 characters, including spaces and punctuation, which
means it requires 27 bytes (216 bits) of storage (assuming an 8-bit ASCII
table is used). If lossless compression is applied, each word is replaced by
its 3-bit code, so the storage required is only 24 bits:
0 1 2 3 0 1 2 4
000 001 010 011 000 001 010 100
24 bits is equal to 3 bytes, so the phrase takes 3 bytes instead of 27, a
reduction of roughly 89%. This is a huge saving on storage requirement, and
the original file is kept intact. Note that this figure ignores the space
needed to store the dictionary itself.
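The calculation above can be reproduced with a short sketch. It assumes, as the worked example does, that each word or punctuation mark gets one dictionary entry, that spaces are not encoded, and that the dictionary's own storage is not counted. The function name `dictionary_encode` is hypothetical.

```python
import re


def dictionary_encode(text):
    """Replace each word or punctuation mark with a fixed-width binary code."""
    # Split into words and single punctuation marks; spaces are dropped.
    tokens = re.findall(r"\w+|[^\w\s]", text)

    # Give each new token the next free index, in order of first appearance.
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)

    # Smallest number of bits that can represent every index.
    bits_per_code = max(1, (len(dictionary) - 1).bit_length())
    codes = [format(dictionary[token], f"0{bits_per_code}b") for token in tokens]
    return dictionary, codes


phrase = "it will be what it will be."
dictionary, codes = dictionary_encode(phrase)

uncompressed_bits = len(phrase) * 8             # 27 characters * 8 bits
compressed_bits = sum(len(code) for code in codes)
saving = 1 - compressed_bits / uncompressed_bits

print(dictionary)                          # {'it': 0, 'will': 1, 'be': 2, 'what': 3, '.': 4}
print(" ".join(codes))                     # 000 001 010 011 000 001 010 100
print(uncompressed_bits, compressed_bits)  # 216 24
print(f"{saving:.0%}")                     # 89%
```

The same function can be used to check your answer to the activity that follows.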
Activity
Calculating storage requirements before and after using lossless
compression
Consider the opening lyrics to Taylor Swift’s song Blank Space:
‘Nice to meet you. Where you been?’
Using the same method described above, calculate the storage
requirements of this phrase before and after using lossless
compression.
Lossless compression tends to produce larger files than lossy compression,
but with the advantage of no reduction in file quality. It is also worth noting
that lossless compression relies on patterns of data existing in the original
file – if there is little or no repeated data, the ‘compressed’ file could
actually be larger than the original!
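The effect of missing patterns can be shown with run-length encoding. The sketch below assumes a simple scheme in which every run costs two bytes (one for the count, one for the value) and no run is longer than 255. Both inputs are hypothetical examples.

```python
from itertools import groupby


def rle_size_bytes(data):
    """Size of `data` after run-length encoding, at 2 bytes per run."""
    # groupby yields one group per run of identical consecutive bytes.
    runs = [(len(list(group)), value) for value, group in groupby(data)]
    return 2 * len(runs)  # one count byte + one value byte for each run


repetitive = b"AAAAAAAABBBBBBBB"   # 16 bytes, only 2 runs
varied = b"ABCDEFGHIJKLMNOP"       # 16 bytes, no repeats at all

print(len(repetitive), rle_size_bytes(repetitive))  # 16 4
print(len(varied), rle_size_bytes(varied))          # 16 32
```

The repetitive data shrinks to a quarter of its size, while the data with no repeats doubles, because every single byte now needs a count stored alongside it.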
Reflection
There are two types of compression: lossy and lossless. Lossy compression
can greatly reduce file sizes, but it permanently discards data, so the
quality of the file is reduced. Lossless compression stores patterns of data
and rebuilds a perfect replica of the original file, but the compressed file
will usually be larger than one produced by lossy compression.