Lesson 01 Course Introduction

Uploaded by niraj.karki5497
© All Rights Reserved

Big Data Hadoop and Spark Developer

Course Introduction
About Simplilearn

For over a decade, Simplilearn has focused on digital economy skills, and it has since become the World's #1 Online Bootcamp.

Simplilearn provides:

• Self-paced learning content
• Interactive labs
• Live virtual classes (LVCs)
• Real-time, scenario-based projects
What Is Big Data?

Big data refers to data sets that are too large or complex for traditional data-processing tools to handle. Apache Hadoop, in turn, is an open-source software framework for storing big data and executing applications on commodity hardware clusters.
Why Big Data?

01 Better career scope
02 Any data, at any time, and on any device
03 Ease of use
04 Exponential growth of data
05 High salaries
Apache Spark

Apache Spark is an open-source cluster computing framework for real-time data processing. It contains the following components: Spark Core, Spark SQL, Spark Streaming, Spark ML (machine learning), and GraphX (graph processing).
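Spark's core abstraction is a distributed collection on which transformations (map, filter) are chained lazily and only evaluated when an action (such as reduce) is called. As a rough single-machine illustration of that idea (plain Python standing in for the real Spark API, which would start from a SparkSession and `parallelize`):

```python
from functools import reduce

# Toy stand-in for a Spark transformation chain on one machine.
numbers = range(1, 11)

# "Transformations": lazily square each number, then keep the even squares.
squares = map(lambda x: x * x, numbers)
even_squares = filter(lambda x: x % 2 == 0, squares)

# "Action": reduce to a single sum, which forces evaluation.
total = reduce(lambda a, b: a + b, even_squares)
print(total)  # → 220
```

In real Spark the same chain runs partitioned across a cluster, which is where the performance gains discussed below come from.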
Why Apache Spark?

More than 91% of companies use Apache Spark because of its performance gains. It has:

• Huge demand
• Global standards
• Fading MapReduce
• Integration with Hadoop
• Developer community
Demand for Big Data and Apache Spark

• Globally recognized certificate
• Accelerated career growth
• Increased job selection probability
Demand for Big Data and Apache Spark

The demand for big data skills is increasing across various data science fields, and it is expected to continue growing significantly.

[Chart: global big data market volume in billion US dollars, rising steadily to roughly 103 billion.]
Source: https://appinventiv.com/blog/spark-vs-hadoop-big-data-frameworks/
Companies Hiring Data Engineers

Many companies around the world hire data engineers.
Career Opportunities

• Data Engineer
• Apache Spark Application Developer
• Big Data Developer
• Spark Developer
• Hadoop or Spark Developer
Prerequisites

Prior knowledge and understanding of the following languages:

• Java
• SQL
Simplilearn Program Features
Program Features

The blended learning program is a combination of:

• Self-paced learning content
• Live virtual classes (LVCs)
• Hands-on exercises
Program Features

The program contains the following features:

• Theoretical concepts
• Case studies
• Integrated labs
• Projects


Program Features

The class sizes are limited to foster maximum interaction.


Target Audience

• Students
• IT professionals
• Data engineers


Learning Path
Course Outline

The course outline maps the learning path of a Big Data Hadoop and Spark developer:

1. Course Introduction
2. Introduction to Big Data and Hadoop
3. HDFS: The Storage Layer
4. Distributed Processing: MapReduce Framework
5. MapReduce: Advanced Concepts
6. Apache Hive
7. Pig: Data Analysis Tool
8. NoSQL Databases: HBase
9. Data Ingestion into Big Data Systems and ETL
10. YARN Introduction
Course Outline

11. Introduction to Python for Apache Spark
12. Functions, OOPS, and Modules in Python
13. Big Data and the Need for Spark
14. Deep Dive into Apache Spark Framework
15. Working with Spark RDDs
16. Spark SQL and DataFrames
17. Machine Learning Using Spark ML
18. Stream Processing Frameworks and Spark Streaming
19. Spark Structured Streaming
20. Spark GraphX
Course Components
Course Components

E-books: All lessons are available as downloadable PDF files for quick reference.

Assisted practices: These will help you develop abilities that make you an asset to any business.
Course Components

Assessments: There are over 100 questions to assess your knowledge.

Projects: Lesson-end and course-end projects provide real-time, industry-based examples.
Course Completion Criteria

The learner needs to complete:

• 85% of OSL or 80% of LVC classes
• The course-end assessment
• At least one project
Course Outcomes

By the end of this course, you will be able to:

• Create an interaction between users and the Hadoop Distributed File System using Hive
• Create internal and external Hive table structures to read data from different formats
• Execute batch jobs using the MapReduce framework
• Work with real-time streaming data pipelines and applications using Kafka
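To preview the MapReduce batch model named above, its map, shuffle, and reduce phases can be sketched as a single-machine word count in plain Python (a toy illustration only, not the Hadoop API):

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data tools"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])   # → 3
print(counts["data"])  # → 2
```

In Hadoop, the same three phases run across a cluster, with the framework handling the shuffle between mapper and reducer nodes.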
Course Outcomes

By the end of this course, you will be able to:

• Create Spark applications using Spark 3.x in cluster and client mode
• Determine the components of Spark machine learning and GraphX
• Create and execute a real-time pipeline using Spark Streaming and Structured Streaming
• Analyze the appropriate tools based on data trends
Let’s get started!
