Big Data and Hadoop

Uploaded by

qabiswajit
Copyright
© All Rights Reserved

Difference Between Big Data and Hadoop

Rashmi Karan
Manager - Content

Updated on Nov 26, 2021 18:01 IST


As predicted by IDC, global data volume grew from 4.4 zettabytes to 44 zettabytes
between 2013 and 2020. By 2025, IDC predicts that there will be 163 zettabytes of
data from mobile devices, Internet of Things devices with information sensing,
remote sensing, software logs, cameras, microphones, RFID readers, and wireless
sensor networks. When we talk about big data, Hadoop often comes into the
picture, and people use the two terms interchangeably. However, there is a
difference between big data and Hadoop. Let us check it out.

Disclaimer: This PDF is auto-generated based on the information available on Shiksha as
on 01-Nov-2023.

Big Data

The term Big Data refers to data sets so large that specific techniques and tools
become necessary to deal with them. Due to its size, speed of growth, and
variability, traditional technologies and methods are not enough to manage big data
efficiently.

The tools designed to handle large amounts of data include specialized software,
generally distributed and capable of scaling with the volume and speed at which the
data is generated. Current uses of big data include predictive analytics, user
behavior analytics, and other advanced data analytics methods that extract value
from the data. However, there is no specific size that a data set must reach to be
called Big Data.

Importance of Big Data

The generation of massive data and its storage, processing, and analysis have
become critical for many organizations, making this one of the sectors with the most
growth and the strongest career prospects today. The Big Data sector, including the
Internet of Things, cloud computing, artificial intelligence, and automation, is
expected to multiply its market valuation four times by 2025.

The value that organizations can extract from this data is focused on its use for
making better strategic decisions, developing mathematical models, artificial
intelligence, etc. In many cases, the analysis of the data obtained by an organization
can give clues and ideas about new problems, and answer questions based on
objective information, which increases security and confidence.

Hadoop

Hadoop is an open-source framework for storing and processing any type of massive
data. It can run a very large number of tasks concurrently with great processing
power and return quick responses to queries about the stored data. The main purpose
of the framework is to store large amounts of data and allow queries on that data
with a low response time. This is achieved through the distributed execution of code
on multiple nodes (machines), each of which is in charge of processing a part of the
work to be done.
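
The idea of each node processing a part of the work can be illustrated with a small Python sketch. This is not Hadoop code: threads stand in for machines, and the names `split_into_parts`, `node_task`, and `run_query` are hypothetical helpers introduced only for this illustration. The "query" simply counts log lines that contain the word ERROR.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, num_nodes):
    """Divide the records into num_nodes contiguous parts of near-equal size."""
    size, extra = divmod(len(records), num_nodes)
    parts, start = [], 0
    for i in range(num_nodes):
        end = start + size + (1 if i < extra else 0)
        parts.append(records[start:end])
        start = end
    return parts


def node_task(part):
    """Work done by one hypothetical node: count the ERROR lines in its part."""
    return sum(1 for line in part if "ERROR" in line)


def run_query(records, num_nodes=4):
    """Send one part to each worker, then combine the partial results."""
    parts = split_into_parts(records, num_nodes)
    # Assumption: a thread pool stands in for a cluster of machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_counts = list(pool.map(node_task, parts))
    return sum(partial_counts)


logs = ["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout", "WARN slow"] * 1000
print(run_query(logs))
```

Each worker sees only its own part of the data, and the final answer is obtained by combining the partial counts. In a real cluster the parts would live on different machines and the work would be shipped to where the data is stored.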

Apache Hadoop Components

The basic components of Apache Hadoop are –

Hadoop Distributed File System: The information is not stored on a single machine,
but is distributed among all the machines that make up the cluster.

MapReduce Framework: MapReduce is a systematic approach that uses the HDFS
distributed file system for the parallel processing of data. The system is structured
through a master-slave architecture where the master server of each Hadoop
cluster receives and queues user requests and assigns them to the slave servers for
processing.
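
The MapReduce model described above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using the classic word-count example; it does not use the Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names chosen for this sketch. In a real cluster, the map and reduce calls would run on different worker nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["big data and hadoop", "Hadoop stores big data"])
print(result)
```

Because each map call depends only on its own line and each reduce call depends only on its own key, both steps can be spread over many machines without the developer writing any explicit parallel code.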

Advantages of using Hadoop

Some notable benefits that Hadoop offers include:

Developers do not have to face the problems of parallel programming

Information can be distributed across multiple nodes and processes executed in
parallel

It has mechanisms for data monitoring

It allows data queries

It has multiple functionalities to facilitate the treatment, monitoring, and control of
the stored information
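
To make the point about distributing information across multiple nodes more concrete, the following Python sketch splits a file into fixed-size blocks and assigns copies of each block to several nodes. The 128 MB block size and replication factor of 3 match the HDFS defaults in recent Hadoop versions, but the round-robin placement, the node names, and the `place_blocks` function are simplifying assumptions made for this illustration. Real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size in recent Hadoop versions
REPLICATION = 3      # HDFS default replication factor


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Return {block_id: [nodes holding a copy]} using round-robin placement.

    Assumption: round-robin is a simplification, not the real HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(num_blocks):
        # Put each replica on a different node, starting one node further each time.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


cluster = ["node1", "node2", "node3", "node4"]  # hypothetical node names
layout = place_blocks(500, cluster)
for block_id, holders in layout.items():
    print(block_id, holders)
```

A 500 MB file becomes four blocks here, and every block is stored on three different nodes, so losing a single machine does not lose any data.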

Difference between Big Data and Hadoop

Parameter | Big Data | Hadoop
--------- | -------- | ------
Definition | Refers to a huge chunk of structured and non-structured data. It is raw data containing mainly user-generated content to be analyzed. | An open-source framework required to manage that data. Based on a distributed software framework to handle huge data set storage and processing across clustered servers.
Value | Has little or no value until processed. | One of the different tools to store, process, and analyze big data.
Accessibility | Difficult to access given its size. | Allows big data to be accessed and processed very fast.
Storage | Hard to store in traditional systems because of its raw and unstructured form. | Hadoop Distributed File System (HDFS) is the primary data storage system in Hadoop, storing big data.
Nature | Big data is considered an asset. | Just a tool to pull out value from the asset.
Type | Consists of multiple formats of data. | Clusters different formats of data, which can be stored as structured, semi-structured, and completely unstructured.
Applications | Used in fetching information from: social networking sites like Facebook, Instagram, and Twitter; public transportation; healthcare and education systems; agriculture. | Used in: fraud detection and prevention in finance; detecting and preventing cyber-attacks; understanding user behavior from huge data sets; real-time analysis of customer data; managing content on social media platforms.
Scalability | A complex set of data that is open to interpretation and can be unscalable. | Allows the system to scale as the volume of data received grows, in order to process more data.

Conclusion

Through the knowledge extracted from big data analysis using tools like Hadoop,
organizations are able to find new trends. This adds a lot of value and allows them
to come up with viable and effective solutions faster. We hope this article helped
clear up any doubts about the concepts of big data and Hadoop and the difference
between them. Keep reading and learning!
