File Management and Compression
FILE EXTENSIONS
A file extension is part of the file name and uniquely identifies the type of file,
also referred to as the format. When your instructor receives your email and sees
your file with a .doc extension, she knows that you are sending her a Word
document. But, not all file extensions are quite as intuitive.
File extensions are used so that the operating system, or OS, of a computer can
recognize the file type. When your OS sees a file with a .doc extension, it knows
that this file is the native format of Microsoft Word. So, when you double-click on
a file, your OS will automatically launch the correct software application and open
up the file in this application.
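The lookup described above can be sketched in a few lines of Python. This is only an illustration: the `ASSOCIATIONS` table and the `application_for` function are hypothetical names, and a real operating system keeps this mapping in its own settings rather than in a dictionary.

```python
from pathlib import Path

# Hypothetical association table, for illustration only.
# A real OS stores these associations in its own settings.
ASSOCIATIONS = {
    ".doc": "Microsoft Word",
    ".txt": "Text editor",
    ".py": "Python interpreter",
}


def application_for(filename: str) -> str:
    """Return the application associated with a file's extension."""
    # Path.suffix gives the last extension, including the leading dot.
    extension = Path(filename).suffix.lower()
    return ASSOCIATIONS.get(extension, "unknown: ask the user")


print(application_for("essay.doc"))
print(application_for("archive.xyz"))
```

Note that the lookup uses only the file name, not the file contents, which is why renaming a file's extension can make it open in the wrong application.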
Most file extensions consist of three characters, but the number of characters
can vary. For example, the file extension .py only has two characters. This file
extension is used for files in the widely used Python programming language.
Letters are most common, but some file extensions use numbers or special
characters. For example, the file extension .wp5 is used for files created using
version 5 of the word processing application, WordPerfect.
There are hundreds of file extensions - from .a to .z. And then there are special
characters and numbers, too. Some of this results from the fact that different
files can contain very different data. For example, a .txt file contains text, while a
.mus file contains music. So, file extensions can be data-contents specific.
However, if that were the only reason, you wouldn't need hundreds of different
ones. So, a second reason for so many different file extensions is that different
software applications organize similar data in different ways. For example,
documents created using word processing software use mostly text, but Microsoft
Word saves files as a .doc, while WordPerfect saves files as a .wp. Each application
is just a little different in how the contents of a document are stored in a file. File
extensions can therefore be software-application specific.
There are some very specific technical reasons for this. For example, digital
photographs require lots of storage. Clever algorithms have been developed to
compress the data, and there are a number of different algorithms - each
corresponding to a different file extension. So, you can store a digital image as a
.jpg, .png or .tif, and each one represents a different compression algorithm for
the same type of data. These formats are not specific to a particular software
program, and most photo editing software can open all these different formats.
File extensions can therefore be algorithm specific.
Some formats are also very specific to a single operating system. For example,
some multimedia files may only play on a computer with Windows and some only on a
Mac. Most formats, however, can be used on multiple platforms. Finally, different
companies may have developed proprietary formats. So, while some formats are
almost the same in terms of functionality, different file extensions are used to
recognize their proprietary nature.
There is no need to remember all the different file extensions. Remember that
your computer's OS recognizes the file extensions and knows which applications to
use for which format. However, it is good to be aware of some of the most widely
used ones.
File extensions like .exe represent executable files. In other words, they can run
by themselves and don't need to be opened by a software application. When you
download software, it often comes as a .exe file. When you double-click the file,
the software installation starts. You should be very careful with .exe files,
especially if you receive them by email. Operating systems have difficulty
determining exactly what is inside a .exe file, and computer hackers often hide
viruses inside these executable files. Unsuspecting users run the executable file
and end up with an infected computer.
One of the most widely used productivity suites is Microsoft Office. Each
software application has its own file extension, including .doc (Word), .xls
(Excel) and .ppt (PowerPoint). Starting with Office 2007, the formats were
updated and the extensions changed to .docx, .xlsx and .pptx, respectively.
Digital photographs can be stored in many different formats. These include .bmp
(bitmap picture), .gif (graphics interchange format), .jpg (joint photographic
experts group), .png (portable network graphics) and .tiff (tagged image file
format). Photo editing software applications can typically work with all these formats.
There are also a number of different audio and video formats. Commonly used
audio formats include .aiff (audio interchange file format), .mp3 (moving picture
experts group audio layer 3) and .wma (Windows Media Audio).
Commonly used video formats include .avi (audio video interleave), .mp4 (moving
picture experts group version 4 part 14), .mkv (Matroska) and .mov (QuickTime).
As with digital photographs, most software applications that play or edit
multimedia can handle a variety of formats.
● .csv stands for comma separated values. These files are commonly used to
store tabular data with a minimal amount of formatting. You can use spreadsheet
and database software applications to import these files.
● .html stands for hypertext markup language. This is the format used for
creating web pages that are displayed by an Internet browser.
● .txt is used for text documents that contain no formatting. They can be
opened by any software application that works with text.
● .zip is a file extension to indicate one or more files have been compressed
into a much smaller archive format. Operating systems have built-in utilities to
create and extract these types of files.
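As a rough illustration of the .zip entry above, the sketch below uses Python's standard `zipfile` module to compress some text into an in-memory archive and extract it again. The file name `notes.txt` and the sample data are made up for the example, and very repetitive data like this compresses far better than typical files.

```python
import io
import zipfile

# Made-up sample data: highly repetitive text compresses very well.
original = b"File Management and Compression\n" * 1000

# Create a .zip archive in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", original)

compressed_size = len(buffer.getvalue())

# Extract the file again and check nothing was lost.
with zipfile.ZipFile(buffer) as archive:
    restored = archive.read("notes.txt")

print("original bytes:", len(original))
print("archive bytes: ", compressed_size)
print("identical after extraction:", restored == original)
```

The extracted data matches the original byte for byte; the archive is smaller, but no information has been thrown away.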
MIME TYPES
Many of the extensions we've been talking about are specific to Windows. Other
operating systems can open Word documents, spreadsheets, etc. In today's
cloud-based and online world, we also need to be able to open these files on the
Internet.
MIME, or Multipurpose Internet Mail Extensions, is a standard that labels content
with a type, such as text/html or image/png, so that text, audio, video, images,
etc., can be opened via Web applications. This way it doesn't matter what
operating system you are using: MIME allows you to open images and video, and
even Word documents over the Internet.
As long as developers associate their files correctly, all file types should open.
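To make the association concrete, here is a minimal sketch using Python's standard `mimetypes` module, which guesses a MIME type from a file name. The helper name `mime_type_for` is made up for this example, and the fallback `application/octet-stream` is the conventional label for unknown binary data. Results for some extensions can vary between systems.

```python
import mimetypes


def mime_type_for(filename: str) -> str:
    """Guess the MIME type of a file from its extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to the generic "unknown binary data" type.
    return mime_type or "application/octet-stream"


print(mime_type_for("index.html"))
print(mime_type_for("mystery.zzzunknown"))
```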
COMPRESSION
Compression is reducing the size of a file. This is done to reduce the amount of
storage space it takes up or to reduce the bandwidth when sending a file. There
are 2 types of compression:
LOSSLESS COMPRESSION
LOSSY COMPRESSION
Lossy compression reduces the file size by permanently removing some data from
the file.
This method is often used for images and audio files, where minor details can be
removed without significantly impacting the quality.
Techniques used for lossy compression include downsampling, reducing image
resolution or colour depth, and reducing the audio sample rate or sample
resolution.
The amount of data removed depends on the level of compression selected and can
impact the quality of the final file.
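One of the techniques mentioned, reducing colour depth, can be sketched as follows. This is a simplified illustration and not how any particular image format works: it assumes 8-bit values for a single colour channel, and the function name `reduce_colour_depth` and the pixel values are invented for the example.

```python
def reduce_colour_depth(values, bits):
    """Keep only the top `bits` bits of each 8-bit value.

    The discarded low bits are set to zero and cannot be recovered.
    """
    shift = 8 - bits
    return [(v >> shift) << shift for v in values]


# Invented 8-bit values for one colour channel.
pixels = [200, 201, 202, 203, 17, 18]
reduced = reduce_colour_depth(pixels, 4)

print(pixels)   # [200, 201, 202, 203, 17, 18]
print(reduced)  # [192, 192, 192, 192, 16, 16]
```

Four slightly different shades have become the same value, so fewer bits are needed to store them, but the original differences are permanently lost.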
OVERALL
Image files can be very large. At the time of writing, a typical mid-range camera is
rated at 24 megapixels; this means that the maximum number of pixels used to
represent an image is approximately 24,000,000.
The number of bits used per pixel (colour depth) for a photographic image is
typically rated at 12 bits per colour channel for red (R), green (G), and blue (B),
(RGB), a total of 36 bits per pixel.
To reduce the storage and transfer time needed, images are commonly stored in
JPEG format, which is highly compressed. JPEG compression will reduce a file to
around 10% of its uncompressed size. You can learn more about JPEG compression
on the compression techniques page.
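The figures above can be combined into a quick estimate. The sketch below uses only the numbers given in the text (24,000,000 pixels, 36 bits per pixel, JPEG at around 10%) and assumes 1 MB = 1,000,000 bytes; real files also carry metadata, so treat the results as approximations.

```python
PIXELS = 24_000_000        # 24 megapixels
BITS_PER_PIXEL = 3 * 12    # 12 bits each for R, G and B

raw_bits = PIXELS * BITS_PER_PIXEL
raw_bytes = raw_bits // 8  # 8 bits per byte

# The text states JPEG reduces a file to around 10% of its size.
jpeg_bytes = raw_bytes // 10

print(f"uncompressed: {raw_bytes / 1_000_000:.1f} MB")
print(f"as JPEG:      {jpeg_bytes / 1_000_000:.1f} MB")
```

Under these assumptions a single uncompressed image takes about 108 MB, against roughly 10.8 MB as a JPEG.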
A keen photographer may well take hundreds of images in a single day. They could
use a larger memory card, or upload the images to cloud storage and delete them
from the memory card. Consider this scenario:
Video files have even more data as video is generally made up of 24 or more still
images per second. Imagine the storage requirements for a 2-hour movie stored in
an uncompressed format.
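A rough calculation shows the scale. The text does not give a resolution or colour depth for video, so the sketch below assumes a 1920 x 1080 frame at 24 bits per pixel and 24 frames per second; these values, and the function name, are assumptions chosen for illustration. Audio is ignored.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Estimate the size in bytes of uncompressed video (no audio)."""
    bits_per_frame = width * height * bits_per_pixel
    total_bits = bits_per_frame * fps * seconds
    return total_bits // 8


# Assumed values: 1920 x 1080 pixels, 24 bits per pixel, 24 fps, 2 hours.
two_hours = 2 * 60 * 60
size = uncompressed_video_bytes(1920, 1080, 24, 24, two_hours)

print(f"{size:,} bytes")
print(f"about {size / 1_000_000_000_000:.2f} TB")
```

With these assumptions the movie needs just over one terabyte, which is why video is almost always stored and streamed in a compressed format.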
Sound files can also be very large. Among consumers, the most popular audio file
format is MP3, which is a compressed format. A typical consumer would not notice
the difference in sound quality after compression and would benefit from the
reduction in size for streaming or download purposes.
However, music enthusiasts and professionals who work with sound files usually
work with uncompressed data to maintain the highest sound quality. WAV is an
uncompressed audio format.
ANALOGUE AND DIGITAL SOUND
Figure 3 shows how analogue sound, represented in the graph on the left, is
processed by taking samples at regular intervals. In this example, a low sample rate
has been used which results in an inaccurate representation of the sound as
represented in the graph on the right.
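The idea of sampling can also be sketched in code. The example below is not a reproduction of the figure: it assumes a simple 5 Hz sine wave as the "analogue" sound, and the function name and sample rates are chosen for illustration only.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, duration_s):
    """Measure a sine wave at regular intervals and return the samples."""
    count = int(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(count)
    ]


# The same one-second sound captured at a low and a high sample rate.
low = sample_wave(frequency_hz=5, sample_rate_hz=20, duration_s=1.0)
high = sample_wave(frequency_hz=5, sample_rate_hz=200, duration_s=1.0)

print(len(low), "samples at the low rate")
print(len(high), "samples at the high rate")
```

At the low rate each cycle of the wave is described by only four points, so the smooth curve is reduced to a jagged outline; the higher rate captures ten times as many points for the same sound.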
SAMPLE RESOLUTION
Each of the samples taken gives an analogue value at the particular point at which it
is measured. This analogue value can be of any level. To represent the level
digitally, it is converted into a binary number. The number of bits used for this is
called the sample resolution and dictates how many different values the sample can
take. The higher the sample resolution, the more accurate the representation of
the sound. Figure 4 illustrates how increasing the number of bits improves the
accuracy of the representation.
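The effect of sample resolution can be demonstrated with a small sketch. It assumes sample values between -1.0 and 1.0 and evenly spaced levels; the function names are invented for the example, and real audio hardware may map values to levels slightly differently.

```python
def quantise(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)          # gap between neighbouring levels
    index = round((value + 1.0) / step)
    return index * step - 1.0


def worst_error(bits):
    """Largest rounding error over a spread of test values."""
    test_values = [i / 50 - 1.0 for i in range(101)]
    return max(abs(v - quantise(v, bits)) for v in test_values)


for bits in (3, 8, 16):
    print(bits, "bits:", 2 ** bits, "levels, worst error", worst_error(bits))
```

Each extra bit doubles the number of available levels, so the gap between levels, and with it the worst-case error, shrinks as the sample resolution increases.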
The formula used to calculate the size of a sampled sound file is:
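A commonly taught form of this calculation multiplies the sample rate, the sample resolution and the duration of the recording. The sketch below assumes that form; the `channels` parameter (for example, 2 for stereo) and the example values are additions for illustration, and the result ignores any file header or metadata.

```python
def sound_file_size_bits(sample_rate_hz, sample_resolution_bits,
                         duration_s, channels=1):
    """Size in bits = sample rate x sample resolution x seconds x channels."""
    return sample_rate_hz * sample_resolution_bits * duration_s * channels


# Example values: 44,100 samples per second, 16 bits per sample,
# 60 seconds, 2 channels (stereo).
bits = sound_file_size_bits(44_100, 16, 60, channels=2)
size_in_bytes = bits // 8

print(f"{bits:,} bits")
print(f"{size_in_bytes:,} bytes")
```

For these example values, one minute of uncompressed stereo sound comes to 84,672,000 bits, or about 10.6 MB.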