BDA_HADOOP_UNIT-2
Chapter 2
Introduction to HADOOP
_________________________________________
© 2024 FTVT Institute. All Rights Reserved.
PPT 1
Topics
§ What is Hadoop
§ A Brief History of Hadoop
§ Difference between RDBMS and Hadoop
§ Hadoop cluster
§ Hadoop ecosystem projects
§ Hadoop distributions
§ Hadoop deployment architecture
By combining these characteristics, Hadoop has become a foundational technology for big data storage and
analysis.
21/11/24
ICT Department TVTI
Hadoop Ecosystem / Architecture
§ HDFS, the Hadoop Distributed File System, is the storage layer: it stores files as blocks distributed across the nodes of the cluster.
§ On top of HDFS, the second core part of a Hadoop implementation is MapReduce, the data processing
framework.
§ YARN stands for Yet Another Resource Negotiator. MapReduce 2, which runs on YARN, is more commonly used, but it
builds on MapReduce 1, so MapReduce 1 remains a good way to learn the processing framework.
§ HBase is very commonly used to query data through a column-store abstraction on top of
the file system.
§ Hive provides HQL (HiveQL), an SQL-like query language used to query data stored in Hadoop, including HBase tables.
§ Pig is a scripting language used for ETL-like processes, that is, extracting, transforming, and loading data.
§ Oozie handles workflow scheduling and coordination of jobs, and works in combination with ZooKeeper.
§ Sqoop is for data exchange between Hadoop and other systems, particularly relational systems such as
SQL Server.
§ Flume is a log collector: because Hadoop jobs run in batches, they produce a large amount of log
information about job processing.
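The MapReduce processing model named in the list above can be sketched in plain Python. This is only an in-memory illustration of the map, shuffle, and reduce phases using the classic word-count task. It is not the Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    # In a real cluster each line (or block) would be mapped on a different node;
    # here everything runs in one process purely for illustration.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
    print(word_count(sample))
```

In Hadoop itself, the map and reduce steps run in parallel on many nodes and read their input from HDFS; the sketch keeps only the data flow between the three phases.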
APACHE PIG
???