Azure Data Engineer + Databricks Content
Databricks
Spark
PySpark
Spark SQL
Delta Lake
Azure Data Factory
Azure Synapse DW (Dedicated SQL Pool)
Azure ADF & Databricks Projects
Azure Databricks Concepts
3) Data Management
A. Databricks File System (DBFS) - Using DBFS commands to copy and manage files.
B. Database - Creating and managing databases and tables.
C. Table - Creating tables, dropping tables, and loading data.
D. Metastore - Managing metadata; creating and managing Delta tables.
4) Computation Management
A. Cluster - Creating and managing clusters.
B. Pool - Creating pools and using them for autoscaling.
C. Databricks Runtime - Understanding and choosing Databricks runtimes based on requirements.
D. Jobs - Creating jobs from notebooks and assigning cluster types to jobs.
E. Workload - Monitoring jobs and managing workloads.
F. Execution Context - Understanding the execution context.
SPARK Concepts
PySpark Content
Introduction to PySpark
1) What is SparkSession?
2) How to create a SparkSession
3) What is SparkContext?
4) How to create a SparkContext
5) What is SQLContext?
How to use Jupyter notebooks and Databricks notebooks for Python development.
Installing and configuring PySpark on a local system for development.
Introduction to Big Data and Apache Spark
Apache Spark Framework & Execution Process.
Introduction To RDDs
1) Different Ways to Create RDDs in PySpark
2) RDD Transformations
YouTube Channel: techlake
3) RDD Actions
4) RDD Cache & Persist
Introduction to DataFrame.
1) Different Ways to Create DataFrames in PySpark
2) DataFrame Transformations
3) DataFrame Actions
4) DataFrame Cache & Persist
DELTA LAKE
SPARK SQL: