File Management and Compression

Uploaded by

mohammed zayyad

INTRODUCTION

Criteria for classifying computer files:

● By nature of content: what kind of data the file holds, e.g. text, images,
audio or program code.
● By organization method: the way records are arranged within the file, e.g.
serial, sequential, random and so on.
● By storage medium: the type of storage device on which the file is stored,
such as a magnetic disk, an optical disk or magnetic tape.

FILE EXTENSIONS

A file extension is the part of the file name that follows the final dot. It
indicates the type of file, also referred to as the format. When your instructor
receives your email and sees your file with a .doc extension, she knows that you
are sending her a Word document. But not all file extensions are quite as intuitive.
File extensions are used so that the operating system, or OS, of a computer can
recognize the file type. When your OS sees a file with a .doc extension, it knows
that this file is the native format of Microsoft Word. So, when you double-click on
a file, your OS will automatically launch the correct software application and open
up the file in this application.
Most file extensions consist of three characters, but the number of characters
can vary. For example, the file extension .py only has two characters. This file
extension is used for files in the widely used Python programming language.
Letters are most common, but some file extensions use numbers or special
characters. For example, the file extension .wp5 is used for files created using
version 5 of the word processing application, WordPerfect.
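
As a rough illustration of how an extension can be read from a file name and matched to an application, here is a minimal Python sketch. The ASSOCIATIONS table and the two helper functions are hypothetical names made up for this example; a real operating system keeps its own, much larger registry of associations.

```python
from pathlib import PurePath

# Hypothetical association table; a real OS maintains its own registry.
ASSOCIATIONS = {
    ".doc": "Microsoft Word",
    ".py": "Python",
    ".wp5": "WordPerfect 5",
}

def extension_of(file_name):
    """Return the extension (text from the final dot onward), lower-cased."""
    return PurePath(file_name).suffix.lower()

def application_for(file_name):
    """Look up the application for a file name, or None if the extension is unknown."""
    return ASSOCIATIONS.get(extension_of(file_name))

for name in ["essay.doc", "script.py", "letter.WP5", "notes"]:
    print(name, "->", extension_of(name) or "(no extension)", "->", application_for(name))
```

Note that the lookup depends only on the name: renaming a file changes which application is chosen, not what the file actually contains.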

TYPES OF FILE EXTENSIONS

There are hundreds of file extensions - from .a to .z. And then there are special
characters and numbers, too. Some of this results from the fact that different
files can contain very different data. For example, a .txt file contains text, while a
.mus file contains music. So, file extensions can be data-contents specific.
However, if that were the only reason, you wouldn't need hundreds of different
ones. So, a second reason for so many different file extensions is that different
software applications organize similar data in different ways. For example,
documents created using word processing software use mostly text, but Microsoft
Word saves files as a .doc, while WordPerfect saves files as a .wp. Each application
is just a little different in how the contents of a document are stored in a file. File
extensions can therefore be software-application specific.
There are some very specific technical reasons for this. For example, digital
photographs require lots of storage. Clever algorithms have been developed to
compress the data, and there are a number of different algorithms, each
corresponding to a different file extension. So, you can store a digital image as a
.jpg, .png or .tif, and each one represents a different compression algorithm for
the same type of data. These formats are not specific to a particular software
program, and most photo editing software can open all these different formats.
File extensions can therefore be algorithm specific.

Some formats are also very specific to a single operating system. For example,
some multimedia files may only play on a computer with Windows and some only on a
Mac. Most formats, however, can be used on multiple platforms. Finally, different
companies may have developed proprietary formats. So, while some formats are
almost the same in terms of functionality, different file extensions are used to
recognize their proprietary nature.
There is no need to remember all the different file extensions. Remember that
your computer's OS recognizes the file extensions and knows which applications to
use for which format. However, it is good to be aware of some of the most widely
used ones.
File extensions like .exe represent executable files. In other words, they can run
by themselves and don't need to be opened by a software application. When you
download software, it often comes as a .exe file. When you double-click the file,
the software installation starts. You should be very careful with .exe files,
especially if you receive them by email. Operating systems have difficulty
determining exactly what is inside a .exe file, and computer hackers often hide
viruses inside these executable files. Unsuspecting users run the executable file
and end up with an infected computer.
One of the most widely used productivity suites is Microsoft Office. Each
software application has its own file extension, including .doc (Word), .xls
(Excel) and .ppt (PowerPoint). Starting with Office 2007, the formats were
updated and the extensions changed to .docx, .xlsx and .pptx, respectively.
Digital photographs can be stored in many different formats. These include .bmp
(bitmap picture), .gif (Graphics Interchange Format), .jpg (Joint Photographic
Experts Group), .png (Portable Network Graphics) and .tiff (Tagged Image File
Format). Photo editing software applications can typically work with all these formats.
There are also a number of different audio and video formats. Commonly used
audio formats include .aiff (Audio Interchange File Format), .mp3 (MPEG-1 Audio
Layer III, from the Moving Picture Experts Group) and .wma (Windows Media Audio).
Commonly used video formats include .avi (Audio Video Interleave), .mp4 (MPEG-4
Part 14), .mkv (Matroska) and .mov (QuickTime).
As with digital photographs, most software applications that play or edit
multimedia can handle a variety of formats.

A few more miscellaneous formats you are likely to encounter are:

● .csv stands for comma separated values. These files are commonly used to
store tabular data with a minimal amount of formatting. You can use spreadsheet
and database software applications to import these files.
● .html stands for hypertext markup language. This is the format used for
creating web pages that are displayed by an Internet browser.
● .txt is used for text documents that contain no formatting. They can be
opened by any software application that works with text.
● .zip indicates an archive in which one or more files have been compressed,
often to a much smaller size. Most operating systems have built-in utilities to
create and extract these archives.
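
To make the .csv entry in the list above more concrete, the following Python sketch reads a small piece of comma-separated text with the standard csv module. The sample data is made up for illustration and is held in a string so that no file on disk is needed.

```python
import csv
import io

# Made-up sample data, held in a string so the example needs no file on disk.
csv_text = "name,score\nAda,91\nLinus,84\n"

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["name"], row["score"])
```

Every value is read back as text; the format itself carries no information about data types or formatting.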

MIME TYPES

Many of the extensions we've been talking about are associated with Windows
applications. Other operating systems can open Word documents, spreadsheets, etc.
In today's cloud-based, online world, we also need to be able to open these files
over the Internet.
MIME, or Multipurpose Internet Mail Extensions, is a standard that labels content
with a type, such as text/html or image/png, so that email clients and Web
applications know how to handle text, audio, video, images, etc. This way it
doesn't matter what operating system you are using: MIME allows you to open
images and video, and even Word documents over the Internet.
As long as developers associate their files with the correct MIME type, all file
types should open.
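
One way to see the link between extensions and MIME types is Python's standard mimetypes module, which guesses a type from a file name. This is only a sketch: the file names are made up, and the result comes from a lookup table that can vary slightly between systems.

```python
import mimetypes

def mime_type_of(file_name):
    """Guess a MIME type from the file extension; returns None if unknown."""
    guessed_type, _encoding = mimetypes.guess_type(file_name)
    return guessed_type

for name in ["index.html", "photo.png", "notes.txt"]:
    print(name, "->", mime_type_of(name))
```

A web server sends a label of this kind along with the file so that the browser knows how to handle the content.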

COMPRESSION

Compression means reducing the size of a file. This is done to reduce the amount of
storage space it takes up or to reduce the bandwidth needed when sending a file.
There are two types of compression: lossless and lossy.

LOSSLESS COMPRESSION

● A compression algorithm reduces the file size without permanently removing any
data, so the original file can be restored exactly.
● Repeated patterns in the file are identified and indexed.
● Each occurrence of a pattern is replaced with its index, and the positions are
stored.
● The number of times each pattern appears is also stored.
● Techniques like run-length encoding (RLE) and Huffman encoding are used.
● RLE replaces sequences of repeated characters with a code that represents the
character and the number of times it is repeated.
● Huffman encoding replaces frequently used characters with shorter codes and less
frequently used characters with longer codes.
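
The two techniques named above can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular file format implements them: the function names are made up, the RLE output is a list of (character, count) pairs, and the Huffman function only builds the code table without packing the bits into a file.

```python
import heapq
from collections import Counter
from itertools import groupby

def rle_encode(text):
    """Run-length encode text as a list of (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text from (character, run length) pairs."""
    return "".join(ch * count for ch, count in pairs)

def huffman_codes(text):
    """Return a {character: bit string} table built from character frequencies."""
    if not text:
        return {}
    # Heap entries are (frequency, tie-breaker, partial code table).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a one-bit code.
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        freq_a, _, codes_a = heapq.heappop(heap)
        freq_b, _, codes_b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in codes_a.items()}
        merged.update({ch: "1" + code for ch, code in codes_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, next_id, merged))
        next_id += 1
    return heap[0][2]

pairs = rle_encode("AAAABBBCCD")
print(pairs)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(pairs))  # AAAABBBCCD

codes = huffman_codes("aaaaaaaabbbc")
print(codes)  # 'a', the most frequent character, gets the shortest code
```

Decoding the RLE pairs returns exactly the original text, which is what makes the method lossless.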

LOSSY COMPRESSION

● Lossy compression reduces the file size by permanently removing some data from
the file.
● This method is often used for images and audio files where minor details or data
can be removed without significantly impacting the quality.
● Techniques used include downsampling, reducing the resolution or colour depth of
an image, and reducing the sample rate or sample resolution of a sound.
● The amount of data removed depends on the level of compression selected and can
impact the quality of the final file.
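
Reducing colour depth, one of the techniques mentioned above, can be sketched as follows. The example assumes 8-bit channel values (0 to 255) and keeps only the top 4 bits; the function name and the pixel values are made up for illustration.

```python
def reduce_colour_depth(value, bits_kept):
    """Keep only the top bits_kept bits of an 8-bit channel value (0 to 255)."""
    shift = 8 - bits_kept
    return (value >> shift) << shift

original = [200, 201, 203, 207]  # four slightly different shades (made-up values)
reduced = [reduce_colour_depth(v, 4) for v in original]
print(reduced)  # [192, 192, 192, 192]
```

The four shades collapse into one value, which compresses well but cannot be reversed: the original differences are permanently lost.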

OVERALL

● Compression is necessary to reduce the size of large files for storage,
transmission, and faster processing.
● The choice between lossy and lossless compression methods depends on the type of
file and its intended use.
● Lossy compression is generally used for media files where minor data loss is
acceptable, while lossless compression is used for text, code, and archival purposes.

THE NEED TO COMPRESS IMAGES

Image files can be very large. At the time of writing, a typical mid-range camera is
rated at 24 megapixels; this means that the maximum number of pixels used to
represent an image is approximately 24,000,000.

The number of bits used per pixel (colour depth) for a photographic image is
typically 12 bits per colour channel for red (R), green (G), and blue (B),
giving a total of 36 bits per pixel.

Figure: A digital picture of a dog with a section highlighted and zoomed in to show
individual pixels.

The size of an uncompressed image taken using a typical mid-range camera would
be around 108 MB (24,000,000 pixels × 36 bits per pixel ÷ 8 bits per byte).
Often storage space is limited. For example, the number of images a digital camera
can hold is limited by the capacity of the memory card. Images may need to be
uploaded to:
● Cloud storage
● A web server
● A social media account
This can take a long time.

To reduce the storage and transfer time needed, images are commonly stored in
JPEG format, which is highly compressed. JPEG compression typically reduces a file
to around 10% of its uncompressed size, depending on the quality setting chosen.
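
The figures quoted above can be checked with a short calculation. The sketch below uses the 24,000,000 pixels, 36 bits per pixel and 10% JPEG ratio from the text; the 64 GB memory card is a hypothetical value chosen for illustration, and 1 MB is taken to be 1,000,000 bytes.

```python
def uncompressed_image_bytes(pixels, bits_per_pixel):
    """Size of an uncompressed image in bytes (8 bits per byte)."""
    return pixels * bits_per_pixel // 8

raw_bytes = uncompressed_image_bytes(24_000_000, 36)
jpeg_bytes = raw_bytes // 10          # roughly 10% of the uncompressed size

CARD_BYTES = 64_000_000_000           # hypothetical 64 GB memory card

print("Uncompressed:", raw_bytes // 1_000_000, "MB")   # 108 MB
print("Images per card, uncompressed:", CARD_BYTES // raw_bytes)
print("Images per card, JPEG:", CARD_BYTES // jpeg_bytes)
```

Under these assumptions the same card holds about ten times as many JPEG images as uncompressed ones.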
A keen photographer may well take hundreds of images in a single day. They could
use a larger memory card, or upload the images to cloud storage and delete them
from the memory card. Consider this scenario:

Video files have even more data, as video is generally made up of 24 or more still
images per second. Imagine the storage requirements for a 2-hour movie stored in
an uncompressed format.
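
A rough estimate shows the scale involved. The frame size (1920 × 1080 pixels) and colour depth (24 bits per pixel) below are assumptions chosen for illustration and do not come from the text; only the 24 frames per second and the 2-hour length do.

```python
def uncompressed_video_bytes(width, height, bits_per_pixel, frames_per_second, seconds):
    """Size in bytes of video stored as a sequence of uncompressed frames (no audio)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * frames_per_second * seconds

# Assumed: 1920 x 1080 frames at 24 bits per pixel; 24 frames per second for 2 hours.
movie_bytes = uncompressed_video_bytes(1920, 1080, 24, 24, 2 * 60 * 60)
print(round(movie_bytes / 1_000_000_000_000, 2), "TB")
```

Under these assumptions the movie needs just over one terabyte before any sound is added.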

THE NEED TO COMPRESS SOUND FILES

Sound files can also be very large. Among consumers, the most popular audio file
format is MP3, which is a compressed format. A typical consumer would not notice
the difference in sound quality after compression and would benefit from the
reduction in size for streaming or download purposes.

However, music enthusiasts and professionals who work with sound files usually
work with uncompressed data to maintain the highest sound quality. WAV is an
uncompressed audio format.

ANALOGUE AND DIGITAL SOUND

In order to be processed by a computer, analogue sound must be converted into a
digital format. Digitising the analogue signal produces a binary stream of 1s and
0s. This process is called analogue-to-digital conversion (ADC), and the most
common technique used is sampling.

Figure 3 shows how analogue sound, represented in the graph on the left, is
processed by taking samples at regular intervals. In this example, a low sample rate
has been used which results in an inaccurate representation of the sound as
represented in the graph on the right.

Figure 3: Sample rate of one sample per second


If the sample rate is increased, the accuracy of the representation improves. In
digital audio, 44,100 Hz is a common sampling rate. Analogue audio is often
recorded by sampling it 44,100 times per second, and then these samples are used
to reconstruct the audio signal when playing it back. Stereo sound uses two
channels, which doubles the number of samples generated per second.
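
Sampling at regular intervals can be sketched in Python as follows. The 5 Hz sine wave is a made-up stand-in for an analogue sound, and the function names are chosen for this example only.

```python
import math

def sample_signal(signal, duration_s, sample_rate_hz):
    """Measure signal(t) at regular intervals and return the list of samples."""
    count = int(duration_s * sample_rate_hz)
    return [signal(n / sample_rate_hz) for n in range(count)]

def tone(t):
    """A made-up 5 Hz sine wave standing in for an analogue sound."""
    return math.sin(2 * math.pi * 5 * t)

coarse = sample_signal(tone, 1, 20)        # low sample rate: 20 samples per second
fine = sample_signal(tone, 1, 44_100)      # common digital audio rate

print(len(coarse), len(fine))              # 20 44100
print("Stereo samples per second:", 2 * len(fine))
```

The higher rate records far more points along the same one-second wave, which is why the reconstructed sound is closer to the original.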

SAMPLE RESOLUTION

Each of the samples taken gives an analogue value at the particular point at which it
is measured. This analogue value can be of any level. To represent the level
digitally, it is converted into a binary number. The number of bits used for this is
called the sample resolution and dictates how many different values the sample can
take. The higher the sample resolution, the more accurate the representation of
the sound. Figure 4 illustrates how increasing the number of bits improves the
accuracy of the representation.

Figure 4: 2-bit and 4-bit sample resolution comparison
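
The effect of sample resolution can also be shown numerically. This sketch assumes the analogue level has been scaled to the range 0.0 to 1.0; the level 0.7 and the function names are made up for illustration.

```python
def quantise(level, bits):
    """Map a level in the range 0.0 to 1.0 to one of 2**bits integer steps."""
    steps = 2 ** bits
    return min(int(level * steps), steps - 1)

def reconstruct(index, bits):
    """Return the level at the middle of the given step."""
    return (index + 0.5) / (2 ** bits)

level = 0.7  # made-up analogue level
for bits in (2, 4, 16):
    index = quantise(level, bits)
    error = abs(reconstruct(index, bits) - level)
    print(bits, "bits:", 2 ** bits, "values, step", index, "error", round(error, 5))
```

With more bits there are more steps to choose from, so the stored value lands closer to the level that was measured.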


Digital audio typically uses one of two resolutions:

● 8-bit audio is used when lower-quality sound is acceptable, for example in
digital devices that need simple sound capabilities. With 8 bits, there are 256
possible values.
● 16-bit audio is the standard that is used on CDs and sound cards. With 16
bits, there are 65,536 possible values.

Calculating the size of a sampled sound file

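The formula used to calculate the size of a sampled sound file, in bits, is:

Sampling rate (Hz) × length of the sound (seconds) × sample resolution (bits)

This gives the size of a single channel; for stereo sound, multiply by two. Divide
the result by 8 to convert bits to bytes.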


MP3 compression typically reduces file size by 90%. This means that a file that is
15MB in its uncompressed form will be around 1.5MB in compressed form.
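
As a worked example of the formula above, the sketch below calculates the size of one minute of CD-quality stereo sound and estimates its MP3 size using the 90% reduction quoted in the text. The channels parameter is an addition made for this example, based on the earlier point that stereo doubles the number of samples.

```python
def sound_file_bytes(sample_rate_hz, seconds, bits_per_sample, channels=1):
    """Uncompressed size in bytes: rate x length x resolution x channels, divided by 8."""
    bits = sample_rate_hz * seconds * bits_per_sample * channels
    return bits // 8

raw = sound_file_bytes(44_100, 60, 16, channels=2)
mp3_estimate = raw // 10   # a 90% reduction leaves about 10% of the original size

print(raw, mp3_estimate)   # 10584000 1058400
```

One minute of uncompressed stereo audio is therefore about 10.6 MB, or roughly 1.06 MB after MP3 compression at the quoted ratio.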
