Big Data Notes With Diagrams
- Big Data refers to datasets that are too large or complex to process with traditional data-processing tools, such as a single-server relational database.
- Diagram: A flowchart showing data collection, processing, analysis, and insight generation.
2. History of Hadoop:
- Hadoop was inspired by Google's MapReduce and GFS (Google File System).
3. Hadoop Ecosystem:
- Comprises tools that work together to process and analyze Big Data.
- Diagram: Hadoop Ecosystem Overview - showing HDFS, MapReduce, Pig, Hive, Sqoop, Flume, etc.
1. HDFS Concepts:
- Distributed storage system designed to store very large datasets across multiple nodes.
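To make "across multiple nodes" concrete, here is a minimal sketch of how a large file could be cut into fixed-size blocks with each block copied to several nodes. The 128 MB block size and replication factor of 3 match HDFS defaults; the function name plan_blocks, the node names, and the round-robin placement are assumptions for illustration only (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size
REPLICATION = 3                 # the HDFS default replication factor


def plan_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, [nodes holding a replica]).

    Hypothetical helper: replicas are placed round-robin, which is a
    simplification of the rack-aware policy that HDFS actually applies.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(num_blocks):
        # Pick `replication` distinct nodes, starting at a rotating offset.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, replicas))
    return plan


nodes = ["node1", "node2", "node3", "node4"]  # illustrative node names
plan = plan_blocks(300 * 1024 * 1024, nodes)  # a 300 MB file
for block, replicas in plan:
    print(block, replicas)
```

A 300 MB file needs three blocks here (two full 128 MB blocks and one partial block), and losing any single node still leaves two copies of every block.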
2. Data Ingestion:
3. Hadoop I/O:
- Compression: Reduces data size to save storage space and to cut disk and network I/O.
- Diagram: A pipeline representing data flow through compression and serialization stages.
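The compression and serialization stages can be sketched as a round trip in plain Python. This is only an illustration of the idea: JSON lines stand in for Hadoop's own serialization formats, gzip is used because it is one of the codecs Hadoop supports, and the sample records and function names are assumptions.

```python
import gzip
import json

# Illustrative records; the field names are made up for this sketch.
records = [{"user_id": i, "event": "click", "page": "/home"} for i in range(1000)]


def serialize(records):
    """Turn in-memory records into bytes, one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records).encode("utf-8")


def deserialize(data):
    """Rebuild the records from newline-delimited JSON bytes."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines()]


serialized = serialize(records)             # stage 1: serialization
compressed = gzip.compress(serialized)      # stage 2: compression
restored = deserialize(gzip.decompress(compressed))  # reverse both stages

print(len(serialized), "bytes serialized")
print(len(compressed), "bytes compressed")
```

Because the records are highly repetitive, the compressed form is much smaller than the serialized one, and the restored records are identical to the originals.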
- Diagram: MapReduce Workflow - showing split, map, shuffle, and reduce phases.
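The four phases in the workflow diagram can be imitated in a single Python process with a word count, the usual introductory example. This is a sketch under stated assumptions, not Hadoop's API: the function names and sample text are hypothetical, and real jobs run the map and reduce tasks in parallel on different nodes.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Split phase: divide the input lines into independent chunks."""
    lines = text.splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(lines):
    """Map phase: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped):
    """Shuffle phase: group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce phase: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


text = "big data is big\nhadoop processes big data\ndata is everywhere"
splits = split_input(text, 2)
mapped = [map_phase(chunk) for chunk in splits]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Each map call sees only its own chunk, so the chunks could be processed on separate machines; only the shuffle step needs to bring values for the same key together.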
3. Job Scheduling:
1. Pig:
2. Hive:
- Diagram: HBase Architecture - showing regions, Region Servers, and Master Node.
1. Supervised Learning:
2. Unsupervised Learning:
3. Collaborative Filtering: