Data Representation through Compression

Uploaded by

MOHAMED HASSAN

How Data Compression Technique helps in Data Representation?
Ever since humans learned to communicate and exchange information, they have done their
best to reduce the length or size of that information. Prior to the digital age, techniques like
Morse code were used. Later, telephones came into being, and voice transmission
underwent innovations like cutting off high frequencies. Fast-forward to the present era: we
are now dealing with information in digital form, with its velocity, veracity, and volume
increasing rapidly. As a result, data compression has become essential for efficient
storage and transmission.

Table of Contents

 Why compress data?
 What is data compression technique?
 Types of data compression techniques
o Lossy compression
 Advantages and disadvantages of Lossy Compression
o Lossless Compression
 Advantages and disadvantages of Lossless Compression
 Data Compression Techniques: Advantages and Disadvantages
 Data Compression Technique Model
o Lossless compression technique models
o Lossy compression technique models-
o Neural network-based models

Why compress data?


Storing, managing, and transferring data is essential in data communication and other
data-driven solutions. No matter how far computer hardware (RAM, ROM, GPU) and forms
of communication (the internet) advance, these resources remain scarce.

To use these resources efficiently, data often needs to be compressed, i.e., reduced
to a smaller size while losing no information or only a minimal amount.

Many kinds of data can be compressed, including numbers, text, video, images, audio, and
even programs and software. These data types can be reduced in different ratios, such as 2:1,
which means a data file of 100 MB takes up only 50 MB of disk space after
compression. This compression, also known as compaction, is performed through various
compression techniques.

What is data compression technique?


Data compression techniques in digital communication refer to the specific formulas
and carefully designed algorithms that compression software uses to reduce the
size of various kinds of data. There are particular types of such techniques that we will get
into, but for an overall understanding we can first focus on the principles.

Data compression can be performed by using smaller strings of bits (0s and 1s) in place of the
original string and using a ‘dictionary’ to decompress the data if required. Other techniques
include the introduction of pointers (references) to a string of bits that the compression
program has become familiar with or removing redundant characters.
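
One of the simplest ways to remove redundant characters is run-length encoding (RLE), which appears later in this document's list of lossless models. The following is a minimal sketch, not a production encoder; the function names and the sample bit string are illustrative assumptions.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, run_length) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run_length) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)


# Illustrative bit string (an assumption, not taken from the source):
# eight 0s, six 1s, two 0s.
sample = "0" * 8 + "1" * 6 + "0" * 2
encoded = rle_encode(sample)

print(encoded)                        # [('0', 8), ('1', 6), ('0', 2)]
print(rle_decode(encoded) == sample)  # True
```

RLE only pays off when the data contains long runs; on data without repetition the list of pairs can be larger than the input.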

For a video, a crude form of compression is to skip every 3rd frame, as this will result (as
one can imagine) in roughly a 1/3 reduction in the size of the file. Such compression can
dramatically reduce data size (in some cases by 70% or more without losing any significant
data). Compression formats like ZIP and GZIP are commonly used when transferring data via the
internet.

The use of data compression techniques in digital communication greatly helps in reducing
the time for a file transfer, the cost of storage, and traffic in the network.

Types of data compression techniques


Many compression techniques are available, but the two main types that always stand
out are:

1. Lossy
2. Lossless

Lossy compression

To understand the lossy compression technique, we must first understand the difference
between data and information. Data is a raw, often unorganized collection of facts or values
and can mean numbers, text, symbols, etc. Information, on the other hand, brings context by
carefully organizing the facts.

To put this in context, a black and white image of 4×6 inches at 100 dpi (dots per inch) will
have 400 × 600 = 240,000 pixels. Each of these pixels contains data in the form of a number
between 0 and 255, representing pixel intensity (0 being black and 255 being white).

This image as a whole can carry some information, for example that it is a picture of the 16th
president of the USA, Abraham Lincoln. If we display the image at 50 dpi, i.e., in 60,000 pixels,
the data required to save the image will reduce, and perhaps the quality too, but the information
will remain intact. Only after a considerable loss of data do we lose the information. Below is an
explanation of how it works.

With the above understanding of the difference between data and information, we can now
comprehend lossy compression. As the name suggests, lossy compression loses data, i.e.,
discards some of it to reduce the size of the data.
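
The pixel counts in the image example can be checked with a few lines of Python, and the same sketch shows one simple way to discard data: halving the resolution by averaging each 2x2 block of pixels. The function names and the tiny 4x4 image are assumptions made for illustration; real image codecs use far more sophisticated methods.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in an image of the given size (inches) and resolution."""
    return int(width_in * dpi) * int(height_in * dpi)


def downsample_2x(image: list[list[int]]) -> list[list[int]]:
    """Halve the resolution by averaging each 2x2 block of grey levels (0-255)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


print(pixel_count(4, 6, 100))  # 240000
print(pixel_count(4, 6, 50))   # 60000

# Tiny made-up greyscale image: dark on the left, bright on the right.
tiny = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [10, 20, 200, 210],
    [30, 40, 220, 230],
]
small = downsample_2x(tiny)
print(small)  # [[0, 255], [25, 215]]
```

The smaller image still shows "dark left, bright right" (the information), but the sixteen original values cannot be recovered from the four that remain (the data).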

Advantages and disadvantages of Lossy Compression

 Advantage:

The advantage of lossy compression is that it is relatively quick, can reduce the file size
dramatically, and lets the user select the compression level. It is beneficial for compressing
data like images, video, and audio because it takes advantage of the limits of human
perception: up to a certain point, our eyes and ears cannot perceive a difference in
the quality of an image or a sound.

 Disadvantage:

The disadvantage of lossy compression is that decompressing the data will not
return the same data (in terms of quality, size, etc.). Still, it will hold similar information
(this, in fact, is useful in some instances, such as streaming or downloading content on the
internet). However, on the flip side, a file that is recompressed every time it is uploaded or
shared can be distorted beyond the point of recognition, causing permanent
information loss. Similarly, if the user chooses a severe level of compression, the
output file might not be anywhere close to the original input file.

Lossless Compression

Lossless compression, unlike lossy compression, doesn’t remove any data; instead, it
transforms it to reduce its size. To understand the concept, we can take a simple example.

There is a piece of text where the word ‘because’ is repeated quite often. The term is
comprised of seven letters, and by using a shorthand or abbreviated version of it like ‘bcz’,
we can transform the text. This information of replacing ‘because’ with ‘bcz’ can be stored in
a dictionary for later use (during decompression).
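
The 'because' to 'bcz' idea can be sketched in a few lines. This is only an illustration under stated assumptions: the table, the function names, and the sample sentence are made up, and the scheme works only if the short code never occurs in the original text.

```python
import re

# Hypothetical substitution table. 'bcz' must not already occur in the text,
# otherwise decompression could not tell it apart from a replaced 'because'.
DICTIONARY = {"because": "bcz"}


def compress(text: str, table: dict[str, str]) -> str:
    """Replace every whole-word occurrence of a long word with its short code."""
    for word, code in table.items():
        text = re.sub(rf"\b{re.escape(word)}\b", code, text)
    return text


def decompress(text: str, table: dict[str, str]) -> str:
    """Reverse the substitution using the same table."""
    for word, code in table.items():
        text = re.sub(rf"\b{re.escape(code)}\b", word, text)
    return text


original = "I stayed because it rained, and because the bus was late."
packed = compress(original, DICTIONARY)

print(packed)
print(len(original), "->", len(packed))

# Lossless: the round trip gives back exactly the original text.
assert decompress(packed, DICTIONARY) == original
```

Note that the dictionary itself has to be stored or sent along with the compressed text, so the saving is only worthwhile when the word is repeated often enough.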

 Methodology: While lossy compression removes redundant or unnoticeable pieces of
data to reduce the size, lossless compression transforms the data by encoding it with
some formula or logic. Here’s how lossless compression works.

Advantages and disadvantages of Lossless Compression

 Advantage:

There are types of data where lossy compression is not feasible. For example, in a
spreadsheet, software, program, or any data comprised of factual text or numbers, lossy
compression cannot work: every number might be essential, so nothing can be treated as
redundant, and any reduction immediately causes loss of information. Here lossless
compression becomes crucial because, upon decompression, the file is restored to its
original state without losing any data.

 Disadvantage:

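There is a limit to data compression. If data is already compressed, then compressing it again
will result in little to no reduction in its size. Also, lossless compression usually achieves
smaller size reductions than lossy compression, which matters most for large files such as
images, audio, and video.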
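
The limit on recompression is easy to observe with Python's standard `zlib` module (a Deflate implementation). The sketch below is illustrative only: the word list and the random seed are arbitrary assumptions used to generate sample text, and the exact sizes printed will depend on that data.

```python
import random
import zlib

# Build repetitive but non-trivial sample text (illustrative data only).
rng = random.Random(0)
words = ["data", "compression", "because", "lossless", "file", "size", "bits"]
text = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass has almost nothing left to remove

print(len(text), len(once), len(twice))

# Lossless: decompressing restores the exact original bytes.
assert zlib.decompress(once) == text
```

The first pass shrinks the text substantially, while the second pass typically leaves the size about the same or slightly larger, because compressed output looks close to random.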

Data Compression Techniques: Advantages and Disadvantages

There are several advantages of using the different data compression techniques discussed
above, such as-

1. Reduces the disk space occupied by the file.
2. Files can be read and written more quickly.
3. Increases the speed of transferring files through the internet and other networks.

Even with this range of advantages, there is a trade-off, as a cost is always associated
with compressing a file. This cost results in certain disadvantages, such as-

1. The processing time taken by complex data compression algorithms can be very high,
especially if the data in question is large.
2. Certain compression algorithms are resource-intensive and may cause the machine to
go out of memory.
3. There is a dependency on software that decompresses compressed files.
4. Compression can also carry a monetary cost, as certain software
requires you to pay a licensing fee.
5. Incompatibility issues can occur during decompression.
6. Any error that occurs during the transmission of compressed data can cause significant
information loss.

Data Compression Technique Model


Research papers and technical references on data compression describe numerous
compression models, each using different compression algorithms that fall under one of
the two compression techniques discussed above.

Following are the most common data compression models-

Lossless compression technique models

The most common models based on the lossless technique are-

1. RLE (Run Length Encoding)
2. Dictionary Coder (LZ77, LZ78, LZR, LZW, LZSS, LZMA, LZMA2)
3. Prediction by Partial Matching (PPM)
4. Deflate
5. Context Mixing
6. Huffman Encoding
7. Adaptive Huffman Coding
8. Shannon-Fano Encoding
9. Arithmetic Encoding
10. Lempel-Ziv-Welch (LZW) Encoding
11. Zstandard
12. Bzip2 (Burrows-Wheeler transform)

Lossy compression technique models-

The most common models based on the lossy technique are-

1. Transform coding
2. Discrete Cosine Transform
3. Discrete Wavelet Transform
4. Fractal Compression

Reference

https://www.analytixlabs.co.in/blog/data-compression-technique/

HUFFMAN CODING EXPLAINED

 Huffman Coding is a famous Greedy Algorithm.
 It is used for the lossless compression of data.
 It uses variable length encoding.
 It assigns a variable length code to each character.
 The code length of a character depends on how frequently it occurs in the given text.
 The character which occurs most frequently gets the shortest code.
 The character which occurs least frequently gets the longest code.
 It is also known as Huffman Encoding.

Prefix Rule
Huffman Coding implements a rule known as a prefix rule.
This is to prevent the ambiguities while decoding.
It ensures that the code assigned to any character is not a prefix of the code assigned to any
other character.
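
The prefix rule can be checked mechanically. The sketch below tests whether any code word is a prefix of a different code word; the function name and the two small code tables are hypothetical examples, not part of the source.

```python
from itertools import permutations


def is_prefix_free(codes: dict[str, str]) -> bool:
    """Return True if no code word is a prefix of a different code word."""
    return not any(
        b.startswith(a) for a, b in permutations(codes.values(), 2)
    )


good = {"x": "0", "y": "10", "z": "11"}   # hypothetical example codes
bad = {"x": "0", "y": "01", "z": "11"}    # "0" is a prefix of "01"

print(is_prefix_free(good))  # True
print(is_prefix_free(bad))   # False
```

With the second table, a decoder that reads a 0 cannot know whether it has seen a complete "x" or the start of "y", which is exactly the ambiguity the prefix rule prevents.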
Major Steps in Huffman Coding-
There are two major steps in Huffman Coding-
1. Building a Huffman Tree from the input characters.
2. Assigning code to the characters by traversing the Huffman Tree.

Huffman Tree-
The steps involved in the construction of Huffman Tree are as follows-
Step-01:
 Create a leaf node for each character of the text.
 Leaf node of a character contains the occurring frequency of that character.

Step-02:
 Arrange all the nodes in increasing order of their frequency value.

Step-03:
Considering the first two nodes having minimum frequency,
 Create a new internal node.
 The frequency of this new node is the sum of frequency of those two nodes.
 Make the first node as a left child and the other node as a right child of the newly
created node.
Step-04:
 Keep repeating Step-02 and Step-03 until all the nodes form a single tree.
 The tree finally obtained is the desired Huffman Tree.
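
The construction steps can be sketched in Python as follows. Instead of re-sorting all nodes on every round (Step-02), this sketch keeps them in a min-heap, which yields the two lowest-frequency nodes in the same way. The function name, the tuple-based tree representation, and the sample frequencies are assumptions made for illustration, and the input is assumed to be non-empty.

```python
import heapq
from itertools import count


def huffman_codes(frequencies: dict[str, int]) -> dict[str, str]:
    """Build a Huffman tree with a min-heap and return a code for each symbol."""
    tie = count()  # tie-breaker so the heap never compares tree nodes directly
    # A tree is either a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(freq, next(tie), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        f1, _, first = heapq.heappop(heap)    # lowest frequency -> left child
        f2, _, second = heapq.heappop(heap)   # next lowest -> right child
        heapq.heappush(heap, (f1 + f2, next(tie), (first, second)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")   # left edge labelled 0
            walk(node[1], prefix + "1")   # right edge labelled 1
        else:
            codes[node] = prefix or "0"   # a lone symbol still gets one bit

    walk(heap[0][2], "")
    return codes


# Hypothetical frequencies, chosen only to illustrate the procedure.
freqs = {"p": 5, "q": 2, "r": 1, "s": 1}
codes = huffman_codes(freqs)
print(codes)
```

When several nodes share the same frequency, different tie-breaking choices can produce different code words, but the total number of encoded bits stays the same.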

Important Formulas-
The following 2 formulas are important to solve the problems based on Huffman Coding-

Formula-01:
Average code length per character
= ∑ ( frequency_i x code length_i ) / ∑ ( frequency_i )

Formula-02:
Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= ∑ ( frequency_i x code length_i )

PRACTICE PROBLEM BASED ON HUFFMAN CODING-

Problem
A file contains the following characters with the frequencies as shown. If Huffman Coding is
used for data compression, determine-
1. Huffman Code for each character
2. Average code length
3. Length of Huffman encoded message (in bits)

Characters   Frequencies
A            10
E            15
I            12
O            3
U            4
S            13
T            1

Solution
First, let us construct the Huffman Tree.
Huffman Tree is constructed in the following steps:
Step-01 to Step-07: [tree construction figures not reproduced]

Now,
 We assign weight to all the edges of the constructed Huffman Tree.
 Let us assign weight ‘0’ to the left edges and weight ‘1’ to the right edges.
Rule
 If you assign weight ‘0’ to the left edges, then assign weight ‘1’ to the right
edges.
 If you assign weight ‘1’ to the left edges, then assign weight ‘0’ to the right
edges.
 Any of the above two conventions may be followed.
 But follow the same convention at the time of decoding that is adopted at
the time of encoding.

After assigning weight to all the edges, the modified Huffman Tree is-

Now, let us answer each part of the given problem one by one-

1. Huffman Code For Characters-

To write Huffman Code for any character, traverse the Huffman Tree from root node to the
leaf node of that character.
Following this rule, the Huffman Code for each character is-
 a= 111
 e= 10
 i= 00
 o= 11001
 u= 1101
 s= 01
 t= 11000

From here, we can observe-


 Characters occurring less frequently in the text are assigned the larger code.
 Characters occurring more frequently in the text are assigned the smaller code.

2. Average Code Length

Using formula-01, we have-

Average code length
= ∑ ( frequency_i x code length_i ) / ∑ ( frequency_i )
= { (10 x 3) + (15 x 2) + (12 x 2) + (3 x 5) + (4 x 4) + (13 x 2) + (1 x 5) } / (10 + 15 + 12 + 3 + 4 + 13 + 1)
= 146 / 58
≅ 2.52

3. Length of Huffman Encoded Message-

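Using formula-02, we have-

Total number of bits in Huffman encoded message
= Total number of characters in the message x Average code length per character
= 58 x (146 / 58)
= 146 bits

Note that multiplying by the rounded average (58 x 2.52 = 146.16) slightly overstates the
total; the exact sum ∑ ( frequency_i x code length_i ) is 146 bits.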

The above calculations can also be achieved as shown in Table 1.

Table 1: Huffman Encoding
Character   Code    Frequency   Total Bits
a           111     10          3*10=30
e           10      15          2*15=30
i           00      12          2*12=24
o           11001   3           5*3=15
u           1101    4           4*4=16
s           01      13          2*13=26
t           11000   1           5*1=5
total               58          146 bits
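
The totals in Table 1 can be reproduced with a short script. The frequencies and codes are copied from the worked problem; the variable names are illustrative.

```python
# Frequencies and codes copied from the worked problem above.
frequencies = {"a": 10, "e": 15, "i": 12, "o": 3, "u": 4, "s": 13, "t": 1}
codes = {"a": "111", "e": "10", "i": "00", "o": "11001",
         "u": "1101", "s": "01", "t": "11000"}

total_chars = sum(frequencies.values())
total_bits = sum(frequencies[ch] * len(codes[ch]) for ch in frequencies)
average_length = total_bits / total_chars

print(total_chars)               # 58
print(total_bits)                # 146
print(round(average_length, 2))  # 2.52
```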

How compression is achieved:

Plain text is commonly stored using the American Standard Code for Information Interchange
(ASCII), with each character occupying 8 bits (one byte). Therefore, for 58 characters we expect
58*8=464 bits without compression.

When Huffman Coding is used, the number of bits used includes the message and the encoding table
to allow the decoding to take place. Table 2 gives the details for the same.

Table 2: Huffman Encoding showing the total bits for decoding

Character     Code      Frequency   Total Bits
a             111       10          3*10=30
e             10        15          2*15=30
i             00        12          2*12=24
o             11001     3           5*3=15
u             1101      4           4*4=16
s             01        13          2*13=26
t             11000     1           5*1=5
total                   58          146
Total Bits    7*8=56 bits (characters), 23 bits (codes), 146 bits (message)

Total bits used will be 56+23+146 = 225

This implies that in this particular example encoding has reduced the number of bits from
464 to 225, a reduction of slightly more than 50%.

Compression Ratio = Number of bits BEFORE compression / Number of bits AFTER compression

Compression Ratio = 464 / 225 ≅ 2.1
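
The ratio calculation can be sketched as a small helper. The function name is an assumption; the numbers come from the worked example (58 characters at 8 bits each, against 56 + 23 + 146 bits after encoding).

```python
def compression_ratio(bits_before: int, bits_after: int) -> float:
    """Ratio of uncompressed size to compressed size; higher means more compression."""
    return bits_before / bits_after


bits_before = 58 * 8              # 58 characters stored as 8 bits each
bits_after = 7 * 8 + 23 + 146     # characters + code words + encoded message
ratio = compression_ratio(bits_before, bits_after)
saving = 1 - bits_after / bits_before

print(bits_before, bits_after)    # 464 225
print(round(ratio, 2))            # 2.06
print(f"{saving:.1%}")            # 51.5%
```

Rounded to one decimal place, the ratio is the 2.1 quoted in the handout.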

EXERCISE

How many bits may be required for encoding the message WAGAGAGIGIKOKO?
Follow the procedure in this handout to perform the following:

i) Calculate the frequency of characters
ii) Generate Huffman Tree
iii) Calculate number of bits using frequency of characters and number of bits required to
represent those characters.
iv) Use the table to calculate number of bits used
v) Calculate the compression ratio.

VIDEO FILE FORMATS

Best Video File Formats for Online Video and Streaming in 2023

With so many different options out there, choosing the best video file format for your clips
can be a daunting task. But don’t worry — we’ve got you. In this article, we bring you the
best video file formats in 2023 — along with what they’re actually best for. Whether you’re
planning to launch a VOD streaming service or just need a way to store a large number of
videos without taking up too much space, you will soon know exactly which video format to
use.

What Is a Video File Format and How Does It Work?

A video file format is a format used to store digital video data on a computer. Extensions
found after the name of the file show us which format the video is stored in. This might also
dictate which software will be able to open and play the video (and whether you will have to
transcode the video in order to play it). There are dozens of different formats out there, and
not all of them are suitable for all purposes. But to know which video file format is the best
one for your video, it is important to understand how video file formats work. In order to
do that, we need to get familiar with two key terms — codecs and containers.
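
As a small illustration of how a file extension hints at the format, the sketch below uses Python's standard `mimetypes` module to look up the media type registered for an extension. The file names and the `describe` helper are hypothetical; note that the extension only suggests the container, and says nothing about which codec was used inside it.

```python
import mimetypes
from pathlib import Path
from typing import Optional


def describe(filename: str) -> tuple[str, Optional[str]]:
    """Return the file extension and the media type Python guesses from it."""
    extension = Path(filename).suffix.lower()
    media_type, _encoding = mimetypes.guess_type(filename)
    return extension, media_type


# Hypothetical file names used only for illustration.
for name in ["holiday.mp4", "lecture.avi", "clip.mov"]:
    print(name, describe(name))
```

The lookup is based purely on the name, so a renamed file would be reported incorrectly; inspecting the actual contents requires a dedicated media tool.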

What Is a Container in a Video File Format?

A video file container is, quite literally, a container that carries all the data pertaining to
the video. This includes both the visual and the audio components, as well as some metadata,
such as the time and place the video was taken, the equipment used to film, video SEO
information, and the title of the video. In fact, when we talk about video file formats, we are
mostly talking about containers (although this is not always the case). An example of a video
file container you are likely familiar with is .mp4.

What Is a Codec in a Video File Format?

A codec is a piece of software used for encoding and decoding a video. In fact, the word
codec comes from coder and decoder. Simply put, the coder part of the codec compresses the
video file to make it easier to store or send. On the other hand, the decoder part is in charge of
making a compressed file usable again by decoding it. There are many different video codecs
out there, but some of the most commonly used ones are H.264 and H.265.

What Is the Difference Between a Codec and a Container in Video Files?

The difference between a codec and a container in video files is the role they play in file
storage and distribution. Although both codecs and containers are vital elements in a video
file, they serve completely different purposes. In short, the container is a box in which all
the data is stored, and the codec is a piece of software that compresses the data for storage
and decompresses it for playback.

If this sounds a little too confusing, try imagining that you are packing winter clothes in a
vacuum bag. The clothes themselves are the video data. The bag would be the container, and
the vacuum you use to suck the air out of the bag is the codec. In order to be able to store all
your clothes on a shelf, you need to make them as compact as possible (that is, encode the
video to store it on a computer). And once you decide to wear them again (that is, play the
video), you need to let the air back into the bag (you need to decode the video to play it).

Latest Best Video File Formats in 2023

There are many video file formats out there but the most commonly used ones include:

 mp4
 WebM
 mov
 avi
 mkv
 wmv
 avchd
 flv

1. MP4

MP4 is by far one of the most commonly used video file formats. It is highly versatile and
compatible with a wide array of players and devices (including TargetVideo’s HTML5
player). First released in 2001, MP4 is considered to be a global standard for video encoding
today. It provides a high level of compression (that is, it can make the video file much
smaller) without significantly affecting the quality of the video.

Pros:

 Compatibility with a long list of media players;


 Compatibility with a wide array of video-sharing websites, including YouTube;
 High compression without a lot of quality deterioration.

Cons:

 Encoding, playing, and editing an MP4 video require quite a bit of computing power;
 MP4 makes it quite easy to alter the metadata of a file and illegally distribute content;
 Repeat encoding can lead to significant quality deterioration, as the codecs typically
used in MP4 files (such as H.264) are lossy.

2. WebM

WebM is an open-source format developed by Google. It was mainly created for the
purpose of sharing video files online and is supported by all major browsers, from Google
Chrome to Microsoft Edge. Because it produces small video files, it allows for almost immediate
playback, making it great for websites with a lot of video content and one of the go-to options
for live streaming platforms such as Streamlab.

Pros:

 Small file sizes call for low computational power;


 It is an open-source format available to everyone;
 Offers great quality real-time video delivery, making it great for live streaming;
 Compatible with major online video platforms, such as YouTube.

Cons:

 Not the best compatibility with mobile devices;


 Might not be compatible with some players and browsers, especially older ones.

3. MOV

MOV is a video file format that is most compatible with iOS devices, although it also works
on Windows. It was developed by Apple with the main purpose of storing full-length
movies. It supports high video bitrate, which also enables decent video quality. MOV is
compatible with a long list of codecs and platforms. The best players to use to open MOV
files are QuickTime and VLC.

Pros:

 Offers great video quality;


 Can contain different multimedia elements, such as video, audio, or text, stored as
separate tracks;
 Compatible with a wide array of codecs and platforms, including YouTube,
Facebook, and Instagram.

Cons:

 Large file sizes;


 Poor compatibility with players other than QuickTime or VLC;
 Compatibility with Facebook and Instagram is limited to files of up to 4GB.

4. AVI

Microsoft developed the AVI format in 1992, making it one of the oldest video formats.
Along with MP4, it is also one of the most common video formats out there. It is compatible
with all devices that use Windows, Mac, or Linux and with all major internet browsers. It is
one of the most common formats for TV, which might account for its popularity slightly
dropping in recent years.

Pros:

 Compatible with most players, browsers, and platforms;


 Offers high-quality video and audio;
 Suitable for short videos, promos, advertisements, teasers, etc.

Cons:

 It requires more storage than most other video file formats;


 Not a good option for live streaming videos;
 Compression with quality retention is not its strongest suit.

5. MKV
MKV (Matroska) takes its name from matryoshkas, the Russian stacking dolls. And that is
exactly what MKV is: a video file format that can contain an unlimited number of video or audio
tracks within it. For example, if a clip has several audio options in different languages, MKV
will store each option as a separate track. It also supports elements such as chapters or
menus.

Pros:

 Free and open-source, meaning that it is constantly being updated and improved;
 It supports almost all codecs out there;
 It is a universal container that supports unlimited tracks, menus, chapters, and more.

Cons:

 Not compatible with many players and devices;


 Uses a more complicated compression process than most other formats;
 MKV file sizes are relatively large.

6. WMV

WMV format is perfect for storing large amounts of video and audio data without taking
up too much space. It has a high compression ratio with the ability to retain relatively good
video quality. WMV was developed by Microsoft. As such, it is compatible with Windows
Media Player and other Windows-based programs. Still, its compatibility with other
operating systems and programs is pretty low. Unlike many other formats, WMV can
serve both as a container and as a codec. Files stored in this format are often protected by
Digital Rights Management systems.

Pros:

 It can store a lot of data without taking up too much space;


 The compression ratio is twice as high as for MPEG-4;
 It is fully compatible with Windows, including older software such as Microsoft
PowerPoint.

Cons:

 Compatibility with other operating systems and programs is quite limited;


 The compression ratio is not manually adjustable;
 Due to limited compatibility, it is hardly a standard video file format.

7. AVCHD

AVCHD was a joint venture by Sony and Panasonic as a format for video production using
digital cameras and camcorders. It is the format that allows for the highest quality of
videos, and its newest edition also supports 3D videos. It includes highly efficient encoding
using the H.264 codec without significant quality loss.

Pros:
 Highest-quality video files;
 3D video support;
 Compatibility with Blu-ray and memory cards;
 Compatibility with Sony, Panasonic, and Canon cameras.

Cons:

 Files saved in this format are quite large;


 Compatibility with various devices and programs is relatively limited;
 Editing this format can be quite complicated and time-consuming.

8. FLV

FLV is a video file format designed for Adobe Flash Player. However, since Adobe
discontinued its support for the player in December 2020, many VOD providers have been
moving away from it in favor of the HTML5 video player. Along with the Flash Player, FLV
has been slowly dying, too. Adobe, for example, recommends replacing it with the H.264
codec.

How to Pick the Best Video File Format

So we’ve covered the basics of the top video file formats in 2023. But which one is the best?
Unfortunately, there is no clear-cut answer to this question. Are you launching an OTT
service, planning to add a couple of videos to your website, or just want to e-mail some clips
from your last vacation to a friend? The choice of the best video format will depend on what
exactly you’re planning to do with the videos, so here is a quick overview of what makes
each of these formats stand out:

 MP4 is a universal format that is supported by all major operating systems,
browsers, and players. It is the safest option for maintaining decent video quality
without sacrificing too much storage space.
 WebM is a great option for online libraries and live streaming, especially on
Windows devices.
 MOV is most compatible with iOS devices and is a great option for full-length
movies.
 AVI maintains a high quality of video and audio and is compatible with most
browsers, systems, and platforms. However, due to its high storage requirements, it is
most useful for short clips.
 MKV is a unique format in terms of its unlimited track-storing capabilities. It is
the right choice for videos with multiple audio options, chapters, menus, and similar
elements.
 WMV is the right choice if you want to save up on space, as it has a very high
compression ratio. However, its range of compatibility with operating systems other
than Windows is quite narrow.
 Lastly, AVCHD is the format used by professional recording equipment to store
data. It maintains a very high quality of video files, although it takes up quite a lot of
space.
Video File Formats: An Overview

As you can see, options are plentiful. To make the choice of the right format easier, here’s a
quick overview of what each of the formats above is best for.

File Format   Best Used For
MP4           Universal format, good for uploading decent-quality videos to streaming
              platforms such as YouTube or Facebook and storing video files on computers
              and phones
WebM          Online video libraries and live streaming
MOV           Full-length movies, especially on iOS
AVI           Short clips, promotional videos, advertisements
MKV           Videos with multiple audio options, a large number of separate tracks,
              selectable chapters, and menu options
WMV           Storing large amounts of video in limited space, mainly on Windows
AVCHD         High-quality videos recorded with professional equipment, 3D videos
FLV           Nostalgia

https://target-video.com/best-video-file-formats/
