Data Analytics Course 3

Course 3 Module 3

Databases in data analytics

Databases enable analysts to store, manipulate, and process data. This helps them search through data far more efficiently to surface the best insights.

Relational databases
A relational database is a database that contains a series of tables that can be connected to show relationships. Basically, they allow data analysts to organize and link data based on what the data has in common.

In a non-relational table, all of the variables you might be interested in analyzing are grouped together in one place, which can make the data hard to sort through. This is one reason relational databases are so common in data analysis: they simplify many analysis processes and make data easier to find and use across an entire database.

Database Normalization
Normalization is a process of organizing data in a relational database, for example, by creating tables and establishing relationships between those tables. It is applied to eliminate data redundancy, increase data integrity, and reduce complexity in a database.
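As a sketch of what normalization accomplishes, the following example uses Python's built-in SQLite module (the table and column names are hypothetical) to split a repeated branch address out of an orders table, so the address is stored once and can be updated in one place:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Before normalization, the branch address would be repeated on every
# order row. Splitting it into its own table stores it exactly once.
cur.execute("CREATE TABLE branches (branch_id INTEGER PRIMARY KEY, address TEXT)")
cur.execute("""CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    branch_id INTEGER REFERENCES branches(branch_id),
    amount    REAL)""")
cur.execute("INSERT INTO branches VALUES (1, '12 Main St')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(101, 1, 19.99), (102, 1, 5.49)])

# One UPDATE now fixes the address for every related order.
cur.execute("UPDATE branches SET address = '99 Oak Ave' WHERE branch_id = 1")
rows = cur.execute("""SELECT o.order_id, b.address
                      FROM orders o
                      JOIN branches b USING (branch_id)
                      ORDER BY o.order_id""").fetchall()
print(rows)  # [(101, '99 Oak Ave'), (102, '99 Oak Ave')]
```

Notice that the redundancy (the address) is gone from the orders table, yet the JOIN still reconstructs the full picture on demand.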

The key to relational databases


Tables in a relational database are connected by the fields they have in common. You might remember learning
about primary and foreign keys before. As a quick refresher, a primary key is an identifier that references a
column in which each value is unique. In other words, it's a column of a table that is used to uniquely identify
each record within that table. The value assigned to the primary key in a particular row must be unique within the
entire table. For example, if customer_id is the primary key for the customer table, no two customers will ever
have the same customer_id.

By contrast, a foreign key is a field within a table that is a primary key in another table. A table can have only one
primary key, but it can have multiple foreign keys. These keys are what create the relationships between tables
in a relational database, which helps organize and connect data across multiple tables in the database.

Some tables don't require a primary key. For example, a revenue table can have multiple foreign keys and not have a primary key. A primary key may also be constructed using multiple columns of a table. This type of primary key is called a composite key. For example, if customer_id and location_id are two columns of a composite key for a customer table, the values assigned to those fields in any given row must be unique within the entire table.
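To make the composite-key rule concrete, here is a minimal SQLite sketch (the customer table below is hypothetical): the same customer_id may appear in several rows, but the pair (customer_id, location_id) must be unique.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# customer_id and location_id together form the composite primary key:
# the PAIR of values must be unique in every row.
cur.execute("""CREATE TABLE customer (
    customer_id INTEGER,
    location_id INTEGER,
    name        TEXT,
    PRIMARY KEY (customer_id, location_id))""")

cur.execute("INSERT INTO customer VALUES (1, 10, 'Ana')")
cur.execute("INSERT INTO customer VALUES (1, 20, 'Ana')")  # same customer, new location: allowed

try:
    cur.execute("INSERT INTO customer VALUES (1, 10, 'Ana')")  # duplicate pair: rejected
except sqlite3.IntegrityError as err:
    print("rejected:", err)

print(cur.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 2
```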
SQL? You’re speaking my language
Databases use a special language to communicate called a query language. Structured Query Language (SQL)
is a type of query language that lets data analysts communicate with a database. So, a data analyst will use
SQL to create a query to view the specific data that they want from within the larger set. In a relational database,
data analysts can write queries to get data from the related tables. SQL is a powerful tool for working with
databases — which is why you are going to learn more about it coming up!
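As a small preview, here is a hypothetical query run through Python's SQLite module (one flavor of SQL; syntax varies slightly across database platforms). It pulls only the specific data the analyst wants from a larger table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("west", 100.0), ("east", 250.0), ("west", 75.0)])

# The query views only the specific data wanted from the larger set:
# just the "west" rows, summed into a single total.
query = "SELECT region, SUM(amount) FROM sales WHERE region = ? GROUP BY region"
result = cur.execute(query, ("west",)).fetchall()
print(result)  # [('west', 175.0)]
```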

Metadata is as important as the data itself

Data analytics, by design, is a field that thrives on collecting and organizing data. In this reading, you are going to learn about how to analyze and thoroughly understand every aspect of your data.

Take a look at any data you find. What is it? Where did it come from? Is it useful? How do you know? This is where metadata comes in to provide a deeper understanding of the data. To put it simply, metadata is data about data. In database management, it provides information about other data and helps data analysts interpret the contents of the data within a database.

Regardless of whether you are working with a large or small quantity of data, metadata is the mark of a
knowledgeable analytics team, helping to communicate about data across the business and making it easier to
reuse data. In essence, metadata tells the who, what, when, where, which, how, and why of data.

Elements of metadata
Before looking at metadata examples, it is important to understand what type of information metadata typically
provides.

Title and description


What is the name of the file or website you are examining? What type of content does it contain?

Tags and categories


What is the general overview of the data that you have? Is the data indexed or described in a specific way?

Who created it and when


Where did the data come from, and when was it created? Is it recent, or has it existed for a long time?

Who last modified it and when


Were any changes made to the data? If yes, were the modifications recent?

Who can access or update it


Is this dataset public? Are special permissions needed to customize or modify the dataset?

Examples of metadata
In today’s digital world, metadata is everywhere, and it is becoming a more common practice to provide
metadata on a lot of media and information you interact with. Here are some real-world examples of where to
find metadata:

Photos
Whenever a photo is captured with a camera, metadata such as the filename, date, time, and geolocation is gathered and saved with it.

Emails
When an email is sent or received, there is a lot of visible metadata, such as the subject line, the sender, the recipient, and the date and time sent. There is also hidden metadata that includes server names, IP addresses, HTML format, and software details.
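You can see this visible metadata programmatically with Python's standard email module, which parses the headers of a raw message. The message below is made up for illustration:

```python
import email

# A made-up raw email message; the headers are its metadata.
raw_message = (
    "From: ana@example.com\n"
    "To: leo@example.com\n"
    "Subject: Q3 sales report\n"
    "Date: Mon, 01 Jul 2024 09:00:00 +0000\n"
    "\n"
    "Hi Leo, the report is attached.\n"
)

msg = email.message_from_string(raw_message)

# Each header describes the message without being part of its body.
for header in ("From", "To", "Subject", "Date"):
    print(f"{header}: {msg[header]}")
```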

Spreadsheets and documents


Spreadsheets and documents are already filled with a considerable amount of data, so it is no surprise that metadata accompanies them as well. Titles, author, creation date, number of pages, and user comments, as well as names of tabs, tables, and columns, are all metadata that one can find in spreadsheets and documents.

Websites
Every web page has a number of standard metadata fields, such as tags and categories, site creator’s name,
web page title and description, time of creation and any iconography.

Digital files
Usually, if you right click on any computer file, you will see its metadata. This could consist of file name, file size,
date of creation and modification, and type of file.
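The same information is available programmatically. This sketch uses Python's standard library to create a small file in a temporary folder and then read its metadata (name, size, modification time):

```python
import pathlib
import tempfile
import time

# Create a small file so there is something to inspect.
folder = pathlib.Path(tempfile.mkdtemp())
data_file = folder / "sales_2024.csv"
data_file.write_text("region,amount\nwest,100\n")

# stat() exposes the file's metadata: size, timestamps, and more.
info = data_file.stat()
print("name:    ", data_file.name)
print("size:    ", info.st_size, "bytes")
print("modified:", time.ctime(info.st_mtime))
```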

Books
Metadata is not only digital. Every book carries standard metadata on its covers and inside that will inform you of its title, author's name, table of contents, publisher information, copyright description, index, and a brief description of the book's contents.

Data as you know it


Knowing the content and context of your data, as well as how it is structured, is very valuable in your career as a
data analyst. When analyzing data, it is important to always understand the full picture. It is not just about the
data you are viewing, but how that data comes together. Metadata ensures that you are able to find, use,
preserve, and reuse data in the future. Remember, it will be your responsibility to manage and make use of data
in its entirety; metadata is as important as the data itself.

Metadata and metadata repositories


As you’re learning, metadata is data about data. It clearly describes how and when data was collected and how
it’s organized. Metadata puts data into context and makes the data more understandable. This helps data
analysts use data to solve problems and make informed business decisions.

In this reading, you’ll learn more about the benefits of metadata, metadata repositories, and metadata of external
databases.
The benefits of metadata
Reliability
Data analysts use reliable and high-quality data to identify the root causes of any problems that might occur
during analysis and to improve their results. If the data being used to solve a problem or to make a data-driven
decision is unreliable, there’s a good chance the results will be unreliable as well.

Metadata helps data analysts confirm their data is reliable by making sure it is:

 Accurate

 Precise

 Relevant

 Timely

It does this by helping analysts ensure that they’re working with the right data and that the data is described
correctly. For example, a data analyst completing a project with data from 2022 can use metadata to easily
determine if they should use data from a particular file.
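As an illustration (the records below are hypothetical), filtering on a creation-year field in the metadata quickly narrows down which files belong in a 2022 project:

```python
# Hypothetical metadata records describing files in a shared folder.
file_metadata = [
    {"name": "survey_a.csv", "created": 2022, "owner": "ops"},
    {"name": "survey_b.csv", "created": 2019, "owner": "ops"},
    {"name": "survey_c.csv", "created": 2022, "owner": "sales"},
]

# Keep only the files whose metadata says they were created in 2022.
usable = [f["name"] for f in file_metadata if f["created"] == 2022]
print(usable)  # ['survey_a.csv', 'survey_c.csv']
```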

Consistency
Data analysts thrive on consistency and aim for uniformity in their data and databases, and metadata helps
make this possible. For example, to use survey data from two different sources, data analysts use metadata to
make sure the same collection methods were applied in the survey so that both datasets can be compared
reliably.

When a database is consistent, it’s easier to discover relationships between the data inside the database and
data that exists elsewhere. When data is uniform, it is:

 Organized: Data analysts can easily find tables and files, monitor the creation and alteration of assets,
and store metadata.

 Classified: Data analysts can categorize data when it follows a consistent format, which is beneficial in
cleaning and processing data.

 Stored: Consistent and uniform data can be efficiently stored in various data repositories. This
streamlines storage management tasks such as managing a database.

 Accessed: Users, applications, and systems can efficiently locate and use data.

Together, these benefits empower data analysts to effectively analyze and interpret their data.

Metadata repositories
Metadata repositories help data analysts ensure their data is reliable and consistent.

Metadata repositories are specialized databases specifically created to store and manage metadata. They can
be kept in a physical location or a virtual environment—like data that exists in the cloud.

Metadata repositories describe where the metadata came from and store that data in an accessible form with a
common structure. This provides data analysts with quick and easy access to the data. If data analysts didn’t
use a metadata repository, they would have to select each file to look up its information and compare the data
manually, which would waste a lot of time and effort.
Data analysts also use metadata repositories to bring together multiple sources for data analysis. Metadata repositories do this by describing the state and location of the data, the structure of the tables inside the data, and, through user logs, who has accessed the data.

Metadata of external databases


Data analysts use both second-party and third-party data to gain valuable insights and make strategic, data-
driven decisions. Second-party data is data that’s collected by a group directly from the group’s audience and
then sold. Third-party data is provided by outside sources that didn’t collect it directly. The providers of this data
are not its original collectors and do not have a direct relationship with any individuals to whom the data belongs.
The outside providers get the data from websites or other programs that pull it from the various platforms where
it was originally generated.

Data analysts should understand the metadata of external databases to confirm that it is consistent and reliable.
In some cases, they should also contact the owner of the third-party data to confirm that it is accessible and
available for purchase. Confirming that the data is reliable and that the proper permissions to use it have been
obtained are best practices when using data that comes from another organization.

Key takeaways
Metadata helps data analysts make data-driven decisions more quickly and efficiently. It also ensures that data
and databases are reliable and consistent.

Metadata repositories are used to store metadata—including data from second-party and third-party companies.
These repositories describe the state and location of the metadata, the structure of the tables inside it, and who
has accessed the repository. Data analysts use metadata repositories to ensure that they use the right data
appropriately.

Import data dynamically


As you’ve learned, you can import data from some data sources, such as .csv files, into a Google spreadsheet from the File menu. Keep in mind that, when you use this method, data that is updated in the .csv will not automatically be updated in the Google Sheet; it must be updated manually, and continually, in the Google Sheet. In some situations, such as when you want to be able to keep track of changes you’ve made, this method is ideal. In other situations, you might need to keep the data the same in both places, and using data that doesn’t update automatically can be time-consuming and tedious. Further, trying to maintain the same dataset in multiple places can cause errors later on.

Fortunately, there are tools to help you automate data imports so you don’t need to continually update the data
in your current spreadsheet. Take a small general store as an example. The store has three cash registers
handled by three clerks. At the end of each day, the owner wants to determine the total sales and the amount of
cash in each register. Each clerk is responsible for counting their money and entering their sales total into a
spreadsheet. The owner has the spreadsheets set up to import each clerk’s data into another spreadsheet, which automatically calculates the total sales for all three registers. Without this automation, each clerk would have to take turns entering their data into the owner’s spreadsheet. This is an example of a dynamic
method of importing data, which saves the owner and clerks time and energy. When data is dynamic, it is
interactive and automatically changes and updates over time.

In the following sections you’ll learn how to import data into Google Sheets dynamically.

IMPORT functions in Google Sheets


The IMPORTRANGE function
In Google Sheets, the IMPORTRANGE function can import all or part of a dataset from another Google Sheet.
To use this function, you need two pieces of information:

1. The URL of the Google Sheet from which you’ll import data.
2. The name of the sheet and the range of cells you want to import into your Google Sheet.

Once you have this information, open the Google Sheet into which you want to import data and select the cell
into which the first cell of data should be copied. Enter = to indicate you will enter a function, then complete the
IMPORTRANGE function with the URL and range you identified in the following manner:
=IMPORTRANGE("URL", "sheet_name!cell_range"). Note that an exclamation point separates the sheet name
and the cell range in the second part of this function.

An example of this function is:

=IMPORTRANGE("https://docs.google.com/thisisatestabc123", "sheet1!A1:F13")

Note: This URL is for syntax purposes only. It is not meant to be entered into your own spreadsheet.

Once you’ve completed the function, a box will pop up to prompt you to allow access to the Google Sheet from which you’re importing data. You must allow access to the spreadsheet containing the data the first time you import it into Google Sheets. Replace the example URL with the URL of a spreadsheet you have created so you can grant access to it by selecting the Allow access button.

Refer to the Google Help Center's IMPORTRANGE page for more information about the syntax. You’ll also learn
more about this later in the program.

The IMPORTHTML function


Importing HTML tables is a basic method to extract data from public web pages. This process is often called
“scraping.” Web scraping made easy introduces how to do this with Google Sheets or Microsoft Excel.

In Google Sheets, you can use the IMPORTHTML function to import the data from an HTML table (or list) on a
web page. This function is similar to the IMPORTRANGE function. Refer to the Google Help Center's
IMPORTHTML page for more information about the syntax.

The IMPORTDATA function


Sometimes data displayed on the web is in the form of a comma- or tab-delimited file.

In Google Sheets, you can use the IMPORTDATA function to import data from a comma- or tab-delimited file into a spreadsheet using the file's URL. This function is similar to the IMPORTRANGE function. Refer to Google Help Center's IMPORTDATA page for more information and the syntax.

From external source to a spreadsheet


When you work with spreadsheets, there are a few different ways to import data. This reading covers how you
can import data from external sources, specifically:

 Other spreadsheets
 CSV files
 HTML tables (in web pages)
Importing data from other spreadsheets
In a lot of cases, you might have an existing spreadsheet open and need to add additional data from another
spreadsheet.

Google Sheets
In Google Sheets, you can use the IMPORTRANGE function. It enables you to specify a range of cells in the
other spreadsheet to duplicate in the spreadsheet you are working in. You must allow access to the spreadsheet
containing the data the first time you import the data. The URL shown below is for syntax purposes only. Don't
enter it in your own spreadsheet. Replace it with a URL to a spreadsheet you have created so you can control access
to it by clicking the Allow access button.

Refer to the Google Help Center's IMPORTRANGE page for more information about the syntax. There is also an
example of its use later in the program in Advanced functions for speedy data cleaning.

Microsoft Excel
To import data from another spreadsheet, do the following:

Step 1: Select Data from the main menu.

Step 2: Click Get Data, and then select From File within the toolbar. In the drop-down, choose From Excel Workbook.

Step 3: Browse for and select the spreadsheet file and then click Import.

Step 4: In the Navigator, select which worksheet to import.

Step 5: Click Load to import all the data in the worksheet; or click Transform Data to open the Power Query
Editor to adjust the columns and rows of data you want to import.

Step 6: If you clicked Transform Data, click Close & Load and then select one of the two options:

 Close & Load - import the data to a new worksheet


 Close & Load to... - import the data to an existing worksheet
If these directions do not work for the version of Excel that you have, visit the free online training center, Microsoft Excel for Windows Training, where you will find everything you need to know, all in one place.

If you are using Numbers, search the Numbers User Guide for directions.

Importing data from CSV files


Google Sheets
Step 1: Open the File menu in your spreadsheet and select Import to open the Import file window.

Step 2: Select Upload and then select the CSV file you want to import.
Step 3: From here, you will have a few options. For Import location, you can choose to replace the current spreadsheet, create a new spreadsheet, insert the CSV data as a new sheet, add the data to the current spreadsheet, or replace the data in a specific cell. The Convert text to numbers, dates, and formulas checkbox is checked by default; if you uncheck it, the data will be inserted as plain text only. Sometimes a CSV file uses a separator like a semicolon or even a blank space instead of a comma. For Separator type, you can select Tab or Comma, or select Custom to enter another character that is being used as the separator.

Step 4: Select Import data. The data in the CSV file will be loaded into your sheet, and you can begin using it!

Note: You can also use the IMPORTDATA function in a spreadsheet cell to import data using the URL to a CSV
file. Refer to Google Help Center's IMPORTDATA page for more information and the syntax.
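The separator issue described in Step 3 is easy to see with Python's standard csv module. The snippet below parses a made-up file that uses semicolons instead of commas, a common choice when the values themselves contain decimal commas:

```python
import csv
import io

# A made-up CSV that uses ';' as the separator because the values
# themselves contain commas (European-style decimal amounts).
raw = "item;amount\nPens;1,50\nTape;0,75\n"

# Telling the reader the delimiter is ';' splits the fields correctly.
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
print(rows)
# [['item', 'amount'], ['Pens', '1,50'], ['Tape', '0,75']]
```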

Microsoft Excel
Step 1: Open a new or existing spreadsheet

Step 2: Click Data in the main menu and select the From Text/CSV option.

Step 3: Browse for and select the CSV file and then click Import.

Step 4: From here, you will have a few options. You can change the delimiter from a comma to another character
such as a semicolon. You can also turn automatic data type detection on or off. And, finally, you can transform
your data by clicking Transform Data to open the Power Query Editor.

Step 5: In most cases, accept the default settings in the previous step and click Load to load the data in the CSV
file to the spreadsheet. The data in the CSV file will be loaded into the spreadsheet, and you can begin working
with the data.

If these directions do not work for the version of Excel that you have, visit the free online training center, Microsoft Excel for Windows Training, where you will find everything you need to know, all in one place.

If you are using Numbers, search the Numbers User Guide for directions.

Importing HTML tables from web pages


Importing HTML tables is a very basic method to extract or "scrape" data from public web pages. Web scraping
made easy introduces how to do this with Google Sheets or Microsoft Excel.

Google Sheets
In Google Sheets, you can use the IMPORTHTML function. It enables you to import the data from an HTML
table (or list) on a web page.
Refer to the Google Help Center's IMPORTHTML page for more information about the syntax, which takes the form =IMPORTHTML("url", "table", index). If you are importing a list, replace "table" with "list". The index refers to the order of the tables on a web page. It is like a pointer indicating which table on the page you want to import the data from.

You can try this yourself! In blank worksheets, copy and paste each of the following IMPORTHTML functions into
cell A1 and watch what happens. You will actually be importing the data from four different HTML tables in a
Wikipedia article: Demographics of India. You can compare your imported data with the tables in the article.

 =IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",1)
 =IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",2)
 =IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",3)
 =IMPORTHTML("http://en.wikipedia.org/wiki/Demographics_of_India","table",4)

Microsoft Excel
You can import data from web pages using the From Web option:

Step 1: Open a new or existing spreadsheet.

Step 2: Click Data in the main menu and select the From Web option.

Step 3: Enter the URL and click OK.

Step 4: In the Navigator, select which table to import.

Step 5: Click Load to load the data from the table into your spreadsheet.

If these directions do not work for the version of Excel that you have, visit the free online training center, Microsoft Excel for Windows Training, where you will find everything you need to know, all in one place.

If you are using Numbers, search the Numbers User Guide for directions.

Exploring public datasets


Open data helps create a lot of public datasets that you can access to make data-driven decisions. Here are
some resources you can use to start searching for public datasets on your own:

 The Google Cloud Public Datasets allow data analysts access to high-demand public datasets, and
make it easy to uncover insights in the cloud.
 Google Dataset Search can help you find available datasets online with keyword searches.
 Kaggle has an Open Data search function that can help you find datasets to practice with.
 Finally, BigQuery hosts 150+ public datasets you can access and use.

Public health datasets

1. Global Health Observatory data: You can search for datasets from this page or explore featured data
collections from the World Health Organization.
2. The Cancer Imaging Archive (TCIA) dataset: Just like the earlier dataset, this data is hosted by the
Google Cloud Public Datasets and can be uploaded to BigQuery.
3. 1000 Genomes: This is another dataset from the Google Cloud Public resources that can be uploaded
to BigQuery.

Public climate datasets

1. National Climatic Data Center: The NCDC Quick Links page has a selection of datasets you can
explore.
2. NOAA Public Dataset Gallery: The NOAA Public Dataset Gallery contains a searchable collection
of public datasets.

Public social-political datasets

1. UNICEF State of the World’s Children: This dataset from UNICEF includes a collection of tables
that can be downloaded.
2. CPS Labor Force Statistics: This page contains links to several available datasets that you can
explore.
3. The Stanford Open Policing Project: This dataset can be downloaded as a .CSV file for your own
use.

Cleaning data is an important part of the data analysis process. If data analysis is based on bad or dirty data, it
may be biased, erroneous, and uninformed. Sorting and filtering are essential skills for every data analyst, and are
also very useful for cleaning data.

Think about everything you’ve learned about spreadsheets and databases. In many ways, they are
similar. In other ways, they are different.

For example, both spreadsheets and databases store and organize data. However, databases can be
relational while spreadsheets cannot. This means that spreadsheets are better-suited to self-contained
data, where the data exists in one place. Meanwhile, you can use databases to store data from external
tables, allowing you to change data in several places by editing in only one place.

Take a moment to consider these examples and come up with a few of your own. Here are some areas
you may want to consider:

 How do they store data?


 How are they used to interact with data?
 How powerful is each?
 What are their pros and cons when sorting?
 What are their pros and cons when filtering?
As you consider each of these questions, compile them into a simple table. You can use pen and paper or your preferred spreadsheet software. Add the question on the left, and compare and contrast spreadsheets and databases on the right. Your table may look something like this:

Question | Spreadsheet
How do they store data? | Stores data in cells.
How are they used to interact with data? |
Using BigQuery
BigQuery is a data warehouse on Google Cloud that data analysts can use to query and filter large datasets, aggregate results, and perform complex operations.

An upcoming activity is performed in BigQuery. This reading provides instructions to create your own BigQuery
account, select public datasets, and upload CSV files. At the end of this reading, you can confirm your access to
the BigQuery console before you move on to the activity.

Note: Additional getting started resources for a few other SQL database platforms are also provided at the end of
this reading if you choose to work with them instead of BigQuery.

Types of BigQuery accounts


There are two different types of accounts: sandbox and free trial. A sandbox account allows you to practice
queries and explore public datasets for free, but has additional restrictions on top of the standard quotas and
limits. If you prefer to use BigQuery with the standard limits, you can set up a free trial account instead. More
details:

 A free sandbox account doesn’t ask for a method of payment. It does, however, limit you to 12 projects. It
also doesn't allow you to insert new records to a database or update the field values of existing
records. These data manipulation language (DML) operations aren't supported in the sandbox.
 A free trial account requires a method of payment to establish a billable account, but offers full
functionality during the trial period.
With either type of account, you can upgrade to a paid account at any time and retain all of your existing
projects. If you set up a free trial account but choose not to upgrade to a paid account when your trial period
ends, you can still set up a free sandbox account at that time. However, projects from your trial account won't
transfer to your sandbox account. It would be like starting from scratch again.

Set up a free sandbox account for use in this program


 Follow these step-by-step instructions or watch the video, Setting up BigQuery, including sandbox and
billing options.
 For more detailed information about using the sandbox, start with the documentation, Using the
BigQuery sandbox.
 After you set up your account, you will see the project name you created for the account in the banner
and SANDBOX at the top of your BigQuery console.

Set up a free trial account instead (if you prefer)


If you prefer not to have the sandbox limitations in BigQuery, you can set up a free trial account for use in this
program.

 Follow these step-by-step instructions or watch the video, Setting up BigQuery, including sandbox and billing options. The free trial offers $300 in credit over the next 90 days. You won’t get anywhere near that spending limit if you just use the BigQuery console to practice SQL queries. After you spend the $300 credit (or after 90 days), your free trial will expire, and you will need to choose to upgrade to a paid account to keep using Google Cloud Platform services, including BigQuery. Your method of payment will never be automatically charged after your free trial ends. If you choose to upgrade your account, you will begin to be billed for charges.
 After you set up your account, you will see My First Project in the banner and the status of your account
above the banner – your credit balance and the number of days remaining in your trial period.
How to get to the BigQuery console
In your browser, go to console.cloud.google.com/bigquery.

Note: Going to console.cloud.google.com in your browser takes you to the main dashboard for the Google Cloud
Platform. To navigate to BigQuery from the dashboard, do the following:

 Click the Navigation menu icon (Hamburger icon) in the banner.


 Scroll down to the BIG DATA section.
 Click BigQuery and select SQL workspace.
Watch the How to use BigQuery video for an introduction to each part of the BigQuery SQL workspace.

(Optional) Explore a BigQuery public dataset


You will be exploring a public dataset in an upcoming activity, so you can perform these steps later if you prefer.

 Refer to these step-by-step instructions.

(Optional) Upload a CSV file to BigQuery


These steps are provided so you can work with a dataset on your own at this time. You will upload CSV files to
BigQuery later in the program.

 Refer to these step-by-step instructions.

Getting started with other databases (if not using BigQuery)


It is easier to follow along with the course activities if you use BigQuery, but if you are connecting to and
practicing SQL queries on other database platforms instead of BigQuery, here are similar getting started
resources:

 Getting started with MySQL: This is a guide to setting up and using MySQL.
 Getting started with Microsoft SQL Server: This is a tutorial to get started using SQL Server.
 Getting started with PostgreSQL: This is a tutorial to get started using PostgreSQL.
 Getting started with SQLite: This is a quick start guide for using SQLite.
Set up your BigQuery account
As you’ve been learning, BigQuery is a database you can use to access, explore, and analyze data from many
sources. Now, you’ll begin using BigQuery, which will help you gain SQL knowledge by typing out commands
and troubleshooting errors. This reading will guide you through the process of setting up your very own BigQuery
account.

Note: Working with BigQuery is not a requirement of this program. Additional resources for other SQL database
platforms are also provided at the end of this reading if you choose to use them instead.

BigQuery account options


BigQuery offers a variety of account tiers to cater to various user needs and has two free-of-charge entry points,
a sandbox account and a free-of-charge trial account. These options allow you to explore the program before
selecting the best choice to suit your needs. A sandbox account allows you to practice writing queries and to
explore public datasets free of charge, but it has quotas and limits, as well as some additional restrictions. If you
prefer to use BigQuery with the standard limits, you can set up a free-of-charge trial account instead. The free-
of-charge trial is a trial period prior to paying for a subscription. In this instance, there is no automatic charge, but
you will be asked for payment information when you create the account.

This reading provides instructions for setting up either account type. An effective first step is to begin with a sandbox account and switch to a free-of-charge trial account when needed to run the SQL presented in upcoming courses.

Sandbox account
The sandbox account is available at no cost, and anyone with a Google account can use it. However, it does
have some limitations. For instance, you are limited to a maximum of 12 projects at a time. This means that, to
create a 13th project, you'll need to delete one of your existing 12 projects. Additionally, the sandbox account
doesn't support all operations you’ll do in this program. For example, there are limits on the amount of data you
can process and you can’t insert new records into a database or update the values of existing records. However,
a sandbox account is perfect for most program activities, including all of the activities in this course. Additionally,
you can convert your sandbox account into a free-of-charge trial account at any time.

Set up your sandbox account

To set up a sandbox account:

1. Visit the BigQuery sandbox documentation page.


2. Log in to your preferred Google account by selecting the profile icon in the BigQuery menu bar.
3. Select the Go to BigQuery button on the documentation page.
4. You'll be prompted to select your country and read the terms of service agreement.
5. This will bring you to the SQL Workspace, where you'll be conducting upcoming activities. By default,
BigQuery creates a project for you.
After you set up your account, the name of the project will be in the banner in your BigQuery console.

Free-of-charge trial
If you wish to explore more of BigQuery's capabilities with fewer limitations, consider the Google Cloud Free
Trial. It provides you with $300 in credit for Google Cloud usage during the first 90 days. If you're primarily using
BigQuery for SQL queries, you're unlikely to come close to this spending limit. After you've used up the $300
credit or after 90 days, your free trial will expire, and you will only be able to use this account if you pay to do so.
Google won't automatically charge your payment method when the trial ends. However, you'll need to set up a
payment option with Google Cloud. This means that you’ll need to enter your financial information. Rest assured,
it won't charge you unless you consciously opt to upgrade to a paid account. If you're uncomfortable providing
payment information, don't worry; you can use the BigQuery sandbox account instead.

Set up your free-of-charge trial

1. Go to the BigQuery page.


2. Select Try BigQuery free.
3. Log in using your Google email, or create a Google account free of charge if you don't have one.
4. Select your country, a description of your organization or needs, and the checkbox to accept the terms of
service. Then select CONTINUE.
5. Enter your billing information and select START MY FREE TRIAL.
After you set up your account, your first project, titled My First Project, will be in the banner.

Transferring between BigQuery accounts


With either a sandbox or free-of-charge trial account, you have the flexibility to upgrade to a paid account at any
time. If you upgrade, all your existing projects will be retained and transferred to your new account. If you started
with a free-of-charge trial, but choose not to upgrade when it ends, you can switch to a sandbox account.
However, note that projects from your trial won't transfer to your sandbox. Essentially, creating a sandbox is like
starting from scratch.

Get started with other databases (if not using BigQuery)


It’s easiest to follow along with the course activities if you use BigQuery, but you may use other SQL platforms, if
you prefer. If you decide to practice SQL queries on other database platforms, here are some resources to get
started:

 Getting Started with MySQL


 Getting Started with Microsoft SQL Server
 Getting Started with PostgreSQL
 Getting Started with SQLite

Key takeaways
BigQuery offers multiple account options. Keep the following in mind when you choose an account type:
 Account tiers: BigQuery provides various account tiers to cater to a wide range of user requirements.
Whether you're starting with a sandbox account or exploring a paid account with the free-of-charge trial
option, BigQuery offers flexibility to choose the option that aligns best with your needs and budget.
 Sandbox limitations: While a sandbox account is a great starting point, it comes with some limitations,
such as a cap on the number of projects and restrictions on data manipulation operations like inserting or
updating records, which you will encounter later in this program. Be aware of these limitations if you
choose to work through this course using a sandbox account.
 Easy setup and upgrades: Getting started with any BigQuery account type is quick and easy. And if your
needs evolve, you have the flexibility to modify your account status at any time. Additionally, projects can
be retained even when transitioning between account types.
Choose the right BigQuery account type to match your specific needs and adapt as your requirements change!

Get started with BigQuery


BigQuery is a data warehouse on the Google Cloud Platform used to query and filter large datasets,
aggregate results, and perform complex operations. Throughout this program, you’re going to use BigQuery
to practice your SQL skills and collect, prepare, and analyze data. At this point, you have set up your own
account. Now, explore some of the important elements of the SQL workspace. This will prepare you for the
upcoming activities in which you will use BigQuery. Note that BigQuery updates its interface frequently, so
your console might be slightly different from what is described in this reading. That’s okay; use your
troubleshooting skills to find what you need!

Log in to BigQuery
When you log in to BigQuery using the landing page, you will automatically open your project space. This
is a high-level overview of your project, including the project information and the current resources being
used. From here, you can check your recent activity.
Navigate to your project’s BigQuery Studio by selecting BigQuery from the navigation menu and BigQuery
Studio from the dropdown menu.

BigQuery Studio components


Once you have navigated to BigQuery from the project space, most of the major components of the
BigQuery console will be present: the Navigation pane, the Explorer pane, and the SQL Workspace.

The Navigation pane

On the console page, find the Navigation pane. This is how you navigate from the project space to the BigQuery tool. This menu also contains a list of other Google Cloud Platform (GCP) data tools. During this program, you will focus on BigQuery, but it’s useful to understand that the GCP has a collection of connected tools data professionals use every day.

The Explorer pane

The Explorer pane lists your current projects and any starred projects you have added to your console. It’s also where you’ll find the + ADD button, which you can use to add datasets. This button opens the Add dialog that allows you to open or import a variety of datasets.

Add Public Datasets

BigQuery offers a variety of public datasets from the Google Cloud Public Dataset Program. Scroll down the Add dialog to the Public Datasets option.

Select Public Datasets. This takes you to the Public Datasets Marketplace, where you can search for and
select public datasets to add to your BigQuery console. For example, search for the "noaa lightning" dataset
in the Marketplace search bar. When you search for this dataset, you will find NOAA’s Cloud-to-Ground
Lightning Strikes data.

Select the dataset to read its description. Select View dataset to create a tab of the dataset’s information within the SQL workspace. The Explorer pane will then list the noaa_lightning dataset along with the other public datasets.

Star and examine Public Datasets


You added the public noaa_lightning dataset to your BigQuery Workspace, so the Explorer pane displays
the noaa_lightning dataset, along with the list of other public datasets. These datasets are nested under
bigquery-public-data. Star bigquery-public-data by navigating to the top of the Explorer pane and selecting
the star next to bigquery-public-data.

Starring bigquery-public-data will enable you to search for and add public datasets by scrolling in the
Explorer pane or by searching for them in the Explorer search bar.

For example, you might want to select a different public dataset. If you select the second dataset,
"austin_311," it will expand to list the table stored in it, “311_service_requests.”

The Explorer pane with the “bigquery-public-data” and “austin_311” datasets expanded, revealing the “311_service_requests” table.

When you select a table, its information is displayed in the SQL Workspace. Select the 311_service_requests table to examine several tabs that describe it, including:

 Schema, which displays the column names in the dataset
 Details, which contains additional metadata, such as the creation date of the dataset
 Preview, which shows the first rows from the dataset

Additionally, you can select the Query button from the menu
bar in the SQL Workspace to query this table.

The SQL Workspace

The final menu pane in your console is the SQL Workspace. This is where you will actually write and
execute queries in BigQuery.
The SQL Workspace also gives you access to your personal and project history, which stores a record of the queries you’ve run. This can be useful if you want to return to a query to run it again or use part of it in another query.

Upload your data


In addition to offering access to public datasets, BigQuery also gives you the ability to upload your own data
directly into your workspace. Access this feature by opening the + ADD menu again or by clicking the three
vertical dots next to your project’s name in the Explorer pane. This will give you the option to create your
own dataset and upload your own tables. You will have the opportunity to upload your own data in an
upcoming activity to practice using this feature!

Key takeaways
BigQuery's SQL workspace allows you to search for public datasets, run SQL queries, and even upload your own data for analysis. These features support a wide range of data analysis tasks. Throughout this program, you will use BigQuery to practice your SQL skills, so being familiar with the major components of your BigQuery console will help you navigate it effectively in the future!

Step-by-Step: BigQuery in action


This reading provides you with the steps the instructor performs in the following video, BigQuery in action. The
video focuses on how to create a query to view a small section of data from a large dataset.

Keep this guide open as you watch the video. It can serve as a helpful reference if you need additional context
or clarification while following the video steps. This is not a graded activity, but you can complete these steps to
practice the skills demonstrated in the video.

What you'll need


To follow along with the examples in this video, log in to your BigQuery account and follow the instructions to
star bigquery-public-data in The Explorer pane section of the previous reading, Get Started with BigQuery.

Example 1: Preview a section from a table viewer


A database is a collection of data stored in a computer system. Query languages such as SQL enable
communication between databases and data analysts. You discovered earlier that a relational database is made
up of several tables that may be joined together to create relationships. Primary and foreign keys serve as
representations of these relationships. To extract data from these tables, data analysts use queries. To learn
more about that, explore BigQuery in action:
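As a quick illustration of how primary and foreign keys link tables, here is a hypothetical join. The table and column names (customers, orders, customer_id) are invented for this sketch and are not part of any BigQuery public dataset; the pattern is what matters: the orders table's foreign key matches the customers table's primary key.

```sql
-- Hypothetical tables: customers (primary key customer_id)
-- and orders (foreign key customer_id referencing customers)
SELECT
  customers.name,
  orders.order_date
FROM
  orders
JOIN
  customers
  ON orders.customer_id = customers.customer_id
```

Each row of the result pairs an order with the customer it belongs to, which is exactly the kind of relationship a relational database is designed to express.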

1. Log in to BigQuery and go to your console. You should find the Welcome to your SQL Workspace!
landing page open. Select COMPOSE A NEW QUERY in the BigQuery console. Make sure that no tabs
are open so that the entire workspace is displayed, including the Explorer pane.
2. Enter sunroof in the search bar. In the search results, expand sunroof_solar and then select the
solar_potential_by_postal_code dataset.
3. Observe the Schema tab to explore the table fields.
4. Select the Preview tab to view the regions, states, yearly sunlight, and more.

Example 2: Writing a query


In order to view the entire dataset, you will need to write a query.

1. The first step is finding out the complete, correct name of the dataset. Select the ellipses by the dataset
solar_potential_by_postal_code, then select Query. A new tab will populate on your screen. Select the
tab. The name of the dataset should be written inside the two backticks.
2. Select the dataset name by highlighting the text including the backticks and copy it.
3. Now, click on the plus sign to create a new query. Notice that BigQuery doesn’t automatically generate a
SELECT statement in this window. Enter SELECT and add a space after it.
4. Put an asterisk * after SELECT to indicate you want to return the entire dataset. The asterisk lets the
database know to include all columns. Without this shortcut, you would have to manually enter every
column name!
5. Next, press the Enter/Return key and Enter FROM on the second line. FROM indicates where the data is
coming from. After FROM, add another space.
6. Paste in the name of the dataset that you copied earlier. It will read `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`.
7. Execute the query by selecting the RUN button.
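Assembled, the steps above produce the following query. The asterisk returns every column, and the backticked path is the full table name you copied in step 2:

```sql
SELECT *
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
```

Running it with the RUN button returns the entire table in the query results pane.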
Example 3: Use SQL to view a piece of data
If the project doesn’t require every field to be completed, you can use SQL to see a particular piece, or pieces, of
data. To do this, specify a certain column name in the query.

1. For example, you might only need data from Pennsylvania. You’d begin your query the same way you
just did in the previous examples: Click on the plus sign, enter SELECT, add a space, an asterisk (*), and
then press Enter/Return.
2. Enter FROM and then paste `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`. Press
Enter/Return.
3. This time, add WHERE. It will be on the same line as the FROM statement. Then enter state_name with a
space before and after it.
4. Because you only want data from Pennsylvania, add = and 'Pennsylvania' on the same line as state_name. In
SQL, single quotes represent the beginning and ending of a string.
5. Execute the query with the RUN button.
6. Review the data on solar potential for Pennsylvania. Scroll through the query results.
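Put together, the steps above produce this query. Note the single quotes around 'Pennsylvania', which mark it as a string:

```sql
SELECT *
FROM `bigquery-public-data.sunroof_solar.solar_potential_by_postal_code`
WHERE state_name = 'Pennsylvania'
```

Only the rows whose state_name column equals 'Pennsylvania' are returned; all columns are still included because of the asterisk.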
Keep in mind that SQL queries can be written in a lot of different ways and still return the same results. You
might discover other ways to write these queries!
