Big Data and Hadoop
Rashmi Karan
Manager - Content
Disclaimer: This PDF is auto-generated based on the information available on Shiksha as
on 01-Nov-2023.
Big Data
The term Big Data refers to data sets so large that specific techniques and tools
are needed to deal with them. Because of their size, speed of growth, and
variability, traditional technologies and methods are not enough to manage them
efficiently.
The tools designed to handle such large amounts of data include specialized
software, generally distributed and capable of scaling with the volume and speed at
which the data is generated. Current uses of big data include predictive analytics,
user behavior analytics, and other advanced data analytics methods that extract
value from the data. However, there is no specific size threshold at which a data
set is called Big Data.
The generation of massive data, along with its storage, processing, and analysis,
has become critical for many organizations, and Big Data is one of the sectors with
the most growth and career prospects today. The Big Data sector, including the
Internet of Things, cloud computing, artificial intelligence, and automation, is
expected to quadruple its market valuation by 2025.
Organizations extract value from this data mainly by using it to make better
strategic decisions and to develop mathematical models, artificial intelligence,
and so on. In many cases, analyzing the data an organization has collected can give
clues and ideas about new problems and answer questions with objective
information, which increases security and confidence.
Hadoop
Hadoop is an open-source framework with which any type of massive data can be
stored and processed. It can run a very large number of tasks concurrently, with
great processing power, and respond quickly to queries about the stored data.
The main purpose of the framework is to store large amounts of
data and allow queries on said data, with a low response time. This is achieved
through the distributed execution of code in multiple nodes (machines), each of
which is in charge of processing a part of the work to be done.
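The idea of each node processing only its own part of the work can be illustrated with a minimal Python sketch. This is not Hadoop's API: threads stand in for the machines of a cluster, and the names `split_into_parts`, `process_part`, and `run_distributed` are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(records, num_nodes):
    """Divide the records into roughly equal parts, one per simulated node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def process_part(part):
    """Work done by one simulated node: count the words in its own part only."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def run_distributed(records, num_nodes=3):
    """Process every part in parallel, then merge the partial results."""
    parts = split_into_parts(records, num_nodes)
    # Threads are an assumption standing in for separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_counts = list(pool.map(process_part, parts))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


records = [
    "big data needs distributed tools",
    "hadoop stores big data",
    "hadoop processes data in parallel",
]
word_counts = run_distributed(records)
print(word_counts["data"])    # 3
print(word_counts["hadoop"])  # 2
```

No single worker sees the whole data set; the final answer comes from merging the partial results, which is the general pattern the paragraph above describes.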
Hadoop Distributed File System: The information is not stored on a single machine,
but is distributed among all the machines that make up the cluster.
Allows information to be distributed across multiple nodes and processes to be executed in parallel
Has multiple functionalities to facilitate the treatment, monitoring, and control of the stored information
It is an open-source framework
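How information can be spread across the machines of a cluster, rather than kept on one, can be sketched as follows. This is a simplified illustration and not HDFS's actual placement policy: the round-robin assignment, the tiny block size, the `replicas` parameter, and the node names are all assumptions made for the example.

```python
def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replicas=2):
    """Assign each block index to `replicas` different nodes, round-robin.

    Keeping more than one copy per block is an assumption of this sketch;
    it shows why losing one machine need not lose the data.
    """
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        for r in range(replicas):
            node = nodes[(index + r) % len(nodes)]
            placement[node].append(index)
    return placement


data = b"x" * 10                      # stand-in for a large file
blocks = split_into_blocks(data, block_size=4)
nodes = ["node-a", "node-b", "node-c"]  # hypothetical machine names
placement = place_blocks(blocks, nodes)

for node, block_ids in placement.items():
    print(node, block_ids)
# node-a [0, 2]
# node-b [0, 1]
# node-c [1, 2]
```

Each node holds only some of the blocks, and joining the blocks in order gives back the original data.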
required to manage that data.
Definition
Big Data: Refers to a huge chunk of structured and non-structured data. It is raw data containing mainly user-generated content to be analyzed.
Hadoop: Based on a distributed software framework to handle huge data set storage and processing across clustered servers.
Applications
Big Data: Used in fetching information from social networking sites like Facebook, Instagram, and Twitter; public transportation; healthcare and education systems; and agriculture.
Hadoop: Used in fraud detection and prevention in finance; detecting and preventing cyber-attacks; understanding user behavior from huge data sets; real-time analysis of customers' data; and managing content on social media platforms.
be unscalable since to process more data
Conclusion
Through the knowledge extracted from big data analysis using tools like Hadoop,
organizations are able to find new trends. This adds a lot of value and allows them
to come up with viable and effective solutions more quickly. We hope this article
helped clear up doubts about the concepts of big data and Hadoop and the
difference between them. Keep reading and learning!