Hadoop Components
Components Of Hadoop:
2. MapReduce:
What it does: MapReduce processes large datasets by breaking the work into smaller tasks that
run in parallel (at the same time) across multiple machines.
How it works: It has two steps (a short word-count sketch follows the key parts below):
o Map: the input is split into chunks, and each chunk is processed in parallel to produce
intermediate key-value pairs.
o Reduce: the framework groups the intermediate pairs by key, and the reduce step combines
each group into the final output.
Key parts (these daemons come from YARN, which schedules MapReduce jobs on the cluster):
o ResourceManager (RM): It's the boss of the cluster, deciding how resources are shared
between applications.
o NodeManager (NM): It manages the resources and runs tasks on each individual machine.
o ApplicationMaster (AM): It coordinates a single application/job, requesting resources from
the ResourceManager and tracking the job's tasks until they finish.
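Here is a minimal word-count sketch using Hadoop's Java MapReduce API, illustrating the two steps; the class names are arbitrary, and a separate driver (not shown) would be needed to configure the job and point it at HDFS input and output paths.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map step: read one line of input at a time and emit (word, 1) for every word in it.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // intermediate key-value pair
            }
        }
    }
}

// Reduce step: the framework groups the pairs by word; sum the counts for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));  // final output: (word, total count)
    }
}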
4. Hadoop Common:
What it does: This is a collection of libraries and utilities that Hadoop needs to run. It
provides support for different Hadoop tools.
5. Hive:
What it does: Hive is a tool that allows you to run SQL-like queries on your data in Hadoop. It
makes it easier to work with structured data (like tables).
How it works: You write queries in HiveQL (a SQL-like language), and Hive translates them into
jobs that run on Hadoop (see the JDBC sketch below).
Key part:
o Metastore: Stores information about your data (like the table names and structure).
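As an illustration, a Java client can submit a HiveQL query through HiveServer2's JDBC interface; the host, port, credentials, and the page_visits table below are placeholders, not part of Hive itself.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) HiveServer2 instance.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hadoop", "");
             Statement stmt = conn.createStatement();
             // SQL-like query; Hive compiles it into jobs that run on the cluster.
             ResultSet rs = stmt.executeQuery(
                     "SELECT page, COUNT(*) AS hits FROM page_visits GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString("page") + " -> " + rs.getLong("hits"));
            }
        }
    }
}

The hive-jdbc driver must be on the classpath, and it is the Metastore that tells Hive where the page_visits table's schema and data live.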
6. Pig:
What it does: Pig helps process data using a script-based language called Pig Latin, which is
easier than writing complex MapReduce code.
How it works: You write a short script in Pig Latin, and Pig compiles it into MapReduce jobs that
run on the cluster (a small sketch follows the key part below).
Key part:
o Pig Latin: The language used for writing scripts.
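As a rough sketch, here is the same word count expressed in a few lines of Pig Latin, embedded in Java through Pig's PigServer API; the input and output paths are placeholders, and local mode is assumed for simplicity.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigWordCount {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);  // run Pig locally for this sketch
        // Each registerQuery call adds one Pig Latin statement to the script.
        pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
        pig.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
        pig.registerQuery("grouped = GROUP words BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;");
        pig.store("counts", "word_counts");  // triggers execution and writes the result
        pig.shutdown();
    }
}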
7. HBase:
What it does: HBase is a NoSQL database for storing and managing large amounts of real-
time data in Hadoop.
How it works: It groups data into column families and looks up every row by a row key, which
keeps reads and writes fast even when the tables are very large.
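A minimal sketch of writing and reading one row with the HBase Java client; it assumes a table named users with a column family info already exists and that hbase-site.xml points at the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Write one cell: row key "user123", column info:name.
            Put put = new Put(Bytes.toBytes("user123"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read it back by row key -- lookups by key are HBase's fast path.
            Result result = table.get(new Get(Bytes.toBytes("user123")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}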
8. Zookeeper:
What it does: Zookeeper is a tool that helps different parts of Hadoop work together in a
coordinated way.
How it works: It keeps a small, highly available store of coordination data (configuration, naming,
locks, leader election) that distributed processes can read, write, and watch, which keeps them
synchronized and running smoothly.
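A small sketch with the ZooKeeper Java client: a process stores a shared value in a znode and reads it back; the connection string and the /demo-config path are placeholders.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkExample {
    public static void main(String[] args) throws Exception {
        // Connect to a (hypothetical) ZooKeeper ensemble; the lambda ignores session events.
        // A production client would wait for the connected event before issuing calls.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

        // Store a small piece of shared state in a znode at the root.
        String path = "/demo-config";
        if (zk.exists(path, false) == null) {
            zk.create(path, "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Any other process connected to the same ensemble sees the same value.
        byte[] data = zk.getData(path, false, null);
        System.out.println(new String(data));
        zk.close();
    }
}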
9. Sqoop:
What it does: Sqoop helps you move data between Hadoop and relational databases (like
MySQL, Oracle).
How it works: You can import data from a database into Hadoop or export data from
Hadoop to a database.
10. Flume:
What it does: Flume helps collect and transfer log data (like web server logs) into Hadoop.
How it works: It captures data from multiple sources and streams it into HDFS or other
storage systems.
Benefits of Hadoop
Uses of Hadoop
1. Big Data Processing: Handles vast amounts of structured and unstructured data.
2. Data Warehousing: Used for storing and managing large datasets.
3. Real-Time Analytics: Analyzes data in real-time for insights.
4. Log and Event Data Analysis: Processes and analyzes logs and events from systems.
5. Machine Learning: Used for training machine learning models on large datasets.
6. Data Mining: Extracts valuable insights from big data for decision-making.
Importance of Hadoop
Challenges of Hadoop