Azure Databricks
Read and write data in Azure Databricks
Join the chat at
https://aka.ms/LearnLiveTV
Learning objectives
• Use Azure Databricks to read multiple file types, both with and without a schema.
• Combine inputs from files and data stores, such as Azure SQL Database.
• Transform and store that data for advanced analytics.
Unit Prerequisites
Microsoft Azure Account: You will need a valid and active
Azure account for the Azure labs.
• If you are a Visual Studio Active Subscriber, you are entitled to Azure credits per month.
You can refer to this link to find out more including how to activate and start using your
monthly Azure credit.
• If you are not a Visual Studio Subscriber, you can sign up for the FREE
Visual Studio Dev Essentials program to create an Azure free account.
Create the required resources
To complete this lab, you will need to deploy an Azure
Databricks workspace in your Azure subscription.
Agenda
• Introduction
• Read data in CSV format
• Read data in JSON format
• Read data in Parquet format
• Read data stored in tables and views
• Write data
• Exercises: Read and write data
• Knowledge check
• Summary
Introduction
Suppose you're working for a data analytics startup that's
now expanding along with its increasing customer base.
Creating your Databricks workspace
Deploy an Azure Databricks workspace
• Click the following button to open the Azure Resource Manager (ARM) template in the Azure portal:
Deploy Databricks from the ARM Template
• Provide the required values to create your Azure Databricks workspace:
• Subscription: Choose the Azure Subscription in which to deploy the workspace.
• Resource Group: Leave at Create new and provide a name for the new resource group.
• Location: Select a location near you for deployment. For the list of regions supported by Azure
Databricks, see Azure services available by region.
• Workspace Name: Provide a name for your workspace.
• Pricing Tier: Ensure Premium is selected.
• Accept the terms and conditions.
• Select Purchase.
• The workspace creation takes a few minutes. During workspace creation, the portal displays the Submitting
deployment for Azure Databricks tile on the right side. You may need to scroll right on your dashboard to see the
tile. There is also a progress bar displayed near the top of the screen. You can watch either area for progress.
Create a cluster
When your Azure Databricks workspace creation is complete,
select the link to go to the resource.
Clone the Databricks archive
If you do not currently have your Azure Databricks workspace
open: in the Azure portal, navigate to your deployed Azure
Databricks workspace and select Launch Workspace.
• Select Import.
• Select the 03-Reading-and-writing-data-in-Azure-Databricks folder that appears.
Read data in CSV format
In this unit, you need to complete the exercises within a
Databricks Notebook.
Complete the following notebook
Open the 1.Reading Data - CSV notebook.
• Start working with the API documentation
• Introduce the class SparkSession and other entry points
• Introduce the class DataFrameReader
• Read data from:
• CSV without a schema
• CSV with a schema
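For reference, the two CSV reads might look like this minimal PySpark sketch; the file path and column names are hypothetical, and spark is the SparkSession that Databricks notebooks preconfigure for you.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Without a schema: read the header for names and, with inferSchema,
# make an extra pass over the data to guess column types.
df_inferred = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/training/example.csv"))  # hypothetical path

# With a user-defined schema: no inference pass, so the read is faster
# and the resulting column types are explicit and stable.
csv_schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("name", StringType(), True),
])
df_typed = (spark.read
    .option("header", "true")
    .schema(csv_schema)
    .csv("/mnt/training/example.csv"))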
Read data in JSON format
In your Azure Databricks workspace, open the 03-Reading-and-writing-data-in-Azure-Databricks folder that you imported within your user folder.
• Read data from:
• JSON without a schema
• JSON with a schema
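A sketch of both JSON reads, with a hypothetical path and fields:

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Without a schema: spark.read.json samples the file and infers the
# structure, including nested fields.
df = spark.read.json("/mnt/training/example.json")

# With a schema: the inference pass is skipped entirely.
json_schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])
df = spark.read.schema(json_schema).json("/mnt/training/example.json")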
Read data in Parquet format
In your Azure Databricks workspace, open the 03-Reading-and-writing-data-in-Azure-Databricks folder that you imported within your user folder.
• Introduce the Parquet file format
• Read data from:
• Parquet files without a schema
• Parquet files with a schema
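A sketch of the Parquet reads (hypothetical path). Parquet stores its schema in the file footer, so even a plain read returns typed columns:

from pyspark.sql.types import StructType, StructField, StringType

# Without a supplied schema: Spark picks up column names and types
# from the Parquet metadata.
df = spark.read.parquet("/mnt/training/example.parquet")

# Supplying a schema up front lets Spark skip sampling file footers,
# which helps when a directory contains many part-files.
parquet_schema = StructType([StructField("name", StringType(), True)])
df = spark.read.schema(parquet_schema).parquet("/mnt/training/example.parquet")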
Read data stored in tables and views
In your Azure Databricks workspace, open the 03-Reading-and-writing-data-in-Azure-Databricks folder that you imported within your user folder.
• Demonstrate how to pre-register data sources in Azure Databricks
• Introduce temporary views over files
• Read data from tables/views
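A sketch of registering a temporary view over a file and querying it; the path and view name are hypothetical.

# Load a file-backed DataFrame, then register it as a view scoped
# to the current SparkSession.
df = spark.read.parquet("/mnt/training/example.parquet")
df.createOrReplaceTempView("example_view")

# Query the view with SQL; the result is an ordinary DataFrame.
result = spark.sql("SELECT * FROM example_view LIMIT 10")
display(result)  # display() is the Databricks notebook rendering helper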
Write data
In your Azure Databricks workspace, open the 03-Reading-and-writing-data-in-Azure-Databricks folder that you imported within your user folder.
• Write data to a Parquet file
• Read the Parquet file back and display the results
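A sketch of that write-then-read round trip; the output path is hypothetical, and df stands in for any DataFrame produced earlier in the notebook.

# Write the DataFrame out as Parquet, replacing any previous output.
df.write.mode("overwrite").parquet("/mnt/training/output.parquet")

# Read the Parquet data back and display the results.
df_roundtrip = spark.read.parquet("/mnt/training/output.parquet")
display(df_roundtrip)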
Exercises: Read and write data
In your Azure Databricks workspace, open the 03-Reading-and-writing-data-in-Azure-Databricks folder that you imported within your user folder.
Knowledge check
Question 1
How do you list files in DBFS within a notebook?
A. ls /my-file-path
B. %fs dir /my-file-path
C. %fs ls /my-file-path
Question 1: Answer
How do you list files in DBFS within a notebook?
C. %fs ls /my-file-path
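For context, %fs is notebook shorthand for the dbutils.fs utility, so the correct option is equivalent to this Python call:

files = dbutils.fs.ls("/my-file-path")  # returns a list of FileInfo entries
display(files)                          # render the listing as a table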
Question 2
How do you infer the data types and column names when
you read a JSON file?
A. spark.read.option("inferSchema", "true").json(jsonFile)
B. spark.read.inferSchema("true").json(jsonFile)
C. spark.read.option("inferData", "true").json(jsonFile)
Question 2: Answer
How do you infer the data types and column names when you read a JSON file?
A. spark.read.option("inferSchema", "true").json(jsonFile)
Summary
In this module, you learned the basics of reading and writing data in Azure Databricks:
• Read data from CSV files into a Spark DataFrame
• Provide a schema when reading data into a Spark DataFrame
• Read data from JSON files into a Spark DataFrame
• Read data from Parquet files into a Spark DataFrame
• Create tables and views
• Write data from a Spark DataFrame
Clean up
If you plan on completing other Azure Databricks modules,
don't delete your Azure Databricks instance yet.
Otherwise, delete the Azure Databricks instance.