Simple Data Infrastructure Using Google Cloud

Introduction.

The purpose of this book is to show you how to build a simple data analysis infrastructure using Google Cloud Platform.
In general, building a data analysis infrastructure requires a great deal of time and money. Beyond the actual system construction, many procedures and tasks need to be completed: aligning the business strategies and data utilization policies of the management strategy and IT departments, clarifying how the user departments intend to use the data, and then using those requirements as the basis for internal budget deliberations and other adjustments.
In a company that does not have a well-developed internal IT environment and is reluctant to invest in IT, it is easy to imagine that when you explain to executives that you plan to build a new data analysis infrastructure, they will ask, "A data analysis infrastructure? What is the point of building such a thing?" or "How much revenue will it generate?" (That is how it was at my company, too.) However, because a data analysis infrastructure deals with "data," which seems concrete at first glance but is actually abstract, it is very difficult to clarify in advance exactly how far it will be used when planning the construction. So what should you do if your organization does not invest enough in the use of data? My recommendation is to start small: actually build the data infrastructure with your own hands.
When you hear the words "build a data
infrastructure", you may have the image of
doing something very difficult. Of course,
data infrastructure has various functions, and
it is not realistic to create it from scratch by
yourself, but if you use cloud services such as
Amazon Web Service (AWS) or Google
Cloud Platform (GCP), it is actually not that
hard. In fact, it's rather easy to do.
Specifically, how to proceed is to assign one
or two people within the company who are
familiar with AWS, GCP, Azure, etc. as PoC
personnel for building the data infrastructure,
build the infrastructure once in about two to
three months, and then actually store the
company's data and perform tasks related to
data utilization (e.g., creating data marts for
BI connection, creating data marts for
advertising operations, etc.). If it works well,
then we can invest a little more to develop it
into a solid infrastructure, which would be
more efficient considering the cost of
building the system.
This book describes, as simply as possible, how to build a data infrastructure for organizations (or individuals) that are going to build a new data analysis infrastructure and want to do so easily without spending a lot of money.
First of all, with this book in hand, let's try to
build a data infrastructure easily using GCP!
October 23, 2021 Data Consulting Lab.
Kanazawa
Chapter1.What is a "data analytics
infrastructure"?
How to build a data analysis
infrastructure
About Data Lake and DWH
Chapter2.Building a Data Lake and DWH
with Cloud Storage and BigQuery
About the data analysis infrastructure
built in this book
Data Lake and DWH Construction
Procedures
Chapter3.Using Cloud Functions to
Automate BigQuery Load Processing
About the image of the loading process to
BigQuery
About the Python code that runs in Cloud
Functions
Steps to automate BigQuery load
processing using Cloud Functions
Chapter4.The cost of building a data
infrastructure
Chapter5.Other things that are necessary
for building a data infrastructure
ETL processing that occurs in data
infrastructure construction
Chapter 1. What is a "data analytics infrastructure"?

As the name suggests, a data analysis infrastructure is "an infrastructure for analyzing data." This on its own may be hard to picture, so let me explain it a little more carefully.
"We collect a variety of internal data into the
system, use that data to perform aggregation
and analysis according to our objectives, and
then use the results to create and distribute
reports."
A data analysis infrastructure is the system you need in order to do the above. Of course, this is just one example of how it can be used. If you acquire various data from external sources in addition to your own data, you can make new discoveries by combining those data with your own for analysis. If you operate your own e-commerce site or use CRM or MA tools, you can combine the member attribute information and purchase history held in those tools to conduct more accurate digital marketing.

How to build a data analysis infrastructure
In order to build a data analysis infrastructure, we first need to investigate what kind of data is being managed within the company and decide how to collect it and where to manage and aggregate it. It is also necessary to consider how the data will be utilized within the company. First, we create a system to store the data linked from various systems, such as the company's core systems, and then we create a system to analyze and aggregate that data. (This book does not deal with specific data utilization methods.)

About Data Lake and DWH
A place to store the data linked from various systems is generally called a "data lake," and a mechanism for performing aggregation and analysis based on that data is called a "data warehouse" (DWH). A typical data analysis infrastructure is built from these two components.
Nowadays, most data lakes and DWHs are built using cloud services such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Azure. Below are the services used for a data analysis infrastructure in AWS and GCP respectively.
In the case of AWS
Data Lake: Amazon S3
DWH: Amazon Redshift
In the case of GCP
Data Lake: Cloud Storage
DWH: BigQuery
There are many cloud services available for
building data analysis infrastructure, and you
can choose from a variety of services
depending on the scale of your data and your
usage. In addition, since the services are
updated quite frequently, it is recommended
to check the latest updates of the services you
are using. In this document, we will be using
the services of Google Cloud Platform
(GCP).

Chapter 2. Building a Data Lake and DWH with Cloud Storage and BigQuery

About the data analysis infrastructure built in this book
Now, let's start building a data analysis infrastructure using Google Cloud Platform. First, here is a simple picture of the data analysis infrastructure to be built in this book. It can be roughly divided into two parts: a "data lake" that stores various data, and a "DWH" that reads data from the data lake and processes and aggregates it into a form that can actually be used. We will build the data lake and the DWH with Google Cloud Platform's Cloud Storage and BigQuery services, respectively.
If you want to know more about Cloud
Storage and BigQuery, please refer to the
Google Cloud documentation.
■ About Cloud Storage
https://cloud.google.com/storage/docs/introduction
■ About BigQuery
https://cloud.google.com/bigquery/what-is-bigquery
Advance preparation
How to create a Google Cloud Platform
Account
The details of the account creation procedure are omitted in this book. If you have any questions, please refer to technical articles on the Internet.
https://cloud.google.com/free/
Creating an App Engine application
To use Cloud Scheduler for scheduling, which will be explained later, it is assumed that you already have an application running on App Engine. If you do not have an App Engine application, please create one in advance.
■ Setting up a Google Cloud project for App
Engine
https://cloud.google.com/appengine/docs/standard
About the contents of the sample data
handled in this manual
The following are the specifications of the
sample data to be uploaded to the Cloud
Storage bucket and the table definitions to be
created in BigQuery. Please check them if
necessary.
Files to be uploaded to the Cloud Storage
bucket
File name: pos_salesdata_daily.csv
Column Name
1 sales_date
2 store_code
3 store_name
4 product_code
5 product_name
6 sales_amount
7 sales_cnt
Tables to create in BigQuery

Table name: pos_salesdata
Field Name  Type  Contents
1 sales_date DATE Date the product was sold
2 store_code STRING Store code
3 store_name STRING Store name
4 product_code STRING Product code
5 product_name STRING Product name
6 sales_amount NUMERIC Sales amount
7 sales_cnt INTEGER Number of items sold

Image of data (only the first four columns are shown)
   sales_date   store_code  store_name  product_code
1  2020-10-21   10031       Narita      1000001
2  2020-10-21   10031       Narita      1000002
3  2020-05-01   10021       Suginami    5000301
4  2020-05-01   10021       Suginami    5000301
5  2020-08-21   10001       Tachikawa   8000001

We do not provide sample data for download, so please create appropriate data in the above format.
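As one way to do that, here is a minimal sketch that writes a small CSV file in this format using only the Python standard library. The store and product values are made-up placeholders; adjust them freely.

/// Sample code (generating a sample pos_salesdata_daily.csv)
import csv
import random
from datetime import date, timedelta

header = ["sales_date", "store_code", "store_name", "product_code",
          "product_name", "sales_amount", "sales_cnt"]
stores = [("10031", "Narita"), ("10021", "Suginami"), ("10001", "Tachikawa")]
products = [("1000001", "Product A", 500), ("5000301", "Product B", 1200)]

with open("pos_salesdata_daily.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for _ in range(20):
        sales_date = date(2020, 10, 1) + timedelta(days=random.randint(0, 30))
        store_code, store_name = random.choice(stores)
        product_code, product_name, unit_price = random.choice(products)
        cnt = random.randint(1, 5)
        writer.writerow([sales_date.isoformat(), store_code, store_name,
                         product_code, product_name, unit_price * cnt, cnt])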

Data Lake and DWH Construction Procedures
Now, let's create a data infrastructure using GCP's services. First, we will build a data lake and a DWH as shown in the following figure.

Image of data lake and DWH configuration

Building a Data Lake
We will use Cloud Storage, a storage service of GCP, to build the data lake.
(1) Log in to GCP.
(2) Open the GCP console from the following link.
https://console.cloud.google.com/
To operate the command line from Cloud Shell, start Cloud Shell with the Cloud Shell button at the upper right of the screen.
(3) Create a bucket.
Open the navigation menu (the hamburger button in the upper left corner of the screen) and select "Storage" -> "Browser". When the Storage browser screen opens, select "Create Bucket" at the top of the screen.

In the bucket creation screen:
・ Enter a name for the bucket. It must be globally unique.
・ Check "Region" for the location type.
・ Select "us-central1" for the location.
・ Check "Standard" for the storage class.
・ Check "Fine-grained" for the method of controlling access to objects.
For operations using the CLI on Cloud Shell
This is how to perform the above GUI procedure using the CLI (command-line operation).

$ gsutil mb -l us-central1 gs://[test-sample-gcp]/

The bucket name must be set to a globally unique name.
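As a reference, the same bucket can also be created from Python with the google-cloud-storage client library. This is only an optional sketch; replace the bucket name with your own.

/// Sample code (creating the bucket with the Python client library)
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("test-sample-gcp")  # replace with your own globally unique name
bucket.storage_class = "STANDARD"
client.create_bucket(bucket, location="us-central1")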
(4) Upload files to the bucket.
When you select the bucket you created, you will be taken to the "Bucket details" screen.
・ Click "UPLOAD FILES" and select the files you want to upload.
For operations using the CLI on Cloud Shell

$ gsutil cp [pos_salesdata_daily.csv] gs://test-sample-gcp/

In this example, we assume that the file in the Cloud Shell home directory is copied to the Cloud Storage bucket.
The above steps complete the building of the data lake.
"Well, you can't just create a bucket and
upload files to it."
But that's exactly what it is. All you have to
do is "create a bucket" and "upload files". In
reality, you probably store files in separate
buckets and folders within the buckets for
different business purposes (e.g. for each data
source such as mission-critical systems) or
for different periods of time of the target data.
However, it is important to understand that
the minimum configuration unit of a data lake
is one file stored in one bucket, as shown in
the previous step.
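For example, separating files by data source and by date could look like the following sketch with the google-cloud-storage client library; the prefix layout here is just an illustration, not something prescribed in this book.

/// Sample code (uploading a file under a per-source, per-date prefix)
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("test-sample-gcp")
# Store POS data under a prefix per data source and per date.
blob = bucket.blob("pos/2020-10-21/pos_salesdata_daily.csv")
blob.upload_from_filename("pos_salesdata_daily.csv")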
Building a DWH
Next, we will use BigQuery, GCP's DWH service, to build the DWH.
(1) Create a dataset in BigQuery.
Select "BigQuery" from the navigation menu. With an existing project selected in the resource column on the left of the screen, click "Create dataset" on the right of the screen.

The "Create dataset" screen will appear on


the right side of the screen. Fill in the input
fields as shown below, and then click "Create
dataset".
Data Set ID:bq_dataset
Data Location: us-central1
Default table expiration date
Encryption: Key managed by Google
For operations using the CLI on Cloud Shell
Creating a dataset

$ bq --location=us-central1 mk \
    --dataset \
    [project_id]:bq_dataset

Please replace [project_id] with the project ID of your own environment.
(2) Create a table in the dataset.
With the dataset you just created ("bq_dataset") selected in Resources, click "Create table" on the right side of the screen.
From the "Create table" screen on the right side of the screen, fill in the following items and click "Create table".
Source: Empty table
Destination:
Project name: Select an existing project
Dataset: bq_dataset
Table type: Native table
Table name: pos_salesdata
Schema: see the table below
Name Type Mode
sales_date DATE NULLABLE
store_code STRING NULLABLE
store_name STRING NULLABLE
product_code STRING NULLABLE
product_name STRING NULLABLE
sales_amount NUMERIC NULLABLE
sales_cnt INTEGER NULLABLE
For operations using the CLI on Cloud Shell
Creating a table

$ bq mk \
    --table \
    [project_id]:bq_dataset.pos_salesdata \
    sales_date:DATE,store_code:STRING,store_name:STRING,product_code:STRING,product_name:STRING,sales_amount:NUMERIC,sales_cnt:INTEGER
With the above, we have created a bucket in Cloud Storage, uploaded the sample data, and created a dataset and a table in BigQuery.
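If you also want to confirm the result from Python rather than the console, a quick check like the following works; replace the project ID placeholder with your own.

/// Sample code (checking that the dataset and table exist)
from google.cloud import bigquery

client = bigquery.Client(project="[your-project]")
for table in client.list_tables("bq_dataset"):
    print(table.table_id)  # should print "pos_salesdata"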

Chapter 3. Using Cloud Functions to Automate BigQuery Load Processing

About the image of the loading process to BigQuery
From here on, we will use GCP services to create a process that automatically loads files stored in a Cloud Storage bucket into a BigQuery table at a predetermined time. Specifically, we will use Cloud Functions, Cloud Pub/Sub, and Cloud Scheduler.

Image of the load process from Cloud Storage to BigQuery

The process of loading files stored in Cloud Storage into a BigQuery table can be run manually from the GCP console, but in actual operation it is more convenient to automate it. We will make good use of GCP's scheduler service, Cloud Scheduler, and its FaaS service, Cloud Functions, to automate the loading process.
The following services are required to automate the process of loading from Cloud Storage to BigQuery:
・Cloud Scheduler
・Cloud Pub/Sub
・Cloud Functions
The specific steps are as follows:
(1) Create a topic in Cloud Pub/Sub.
(2) Set up a trigger for Cloud Functions.
(3) Publish a message.
(4) Set up the function execution schedule of Cloud Functions with Cloud Scheduler.

About the Python code that runs in Cloud Functions

Before going into the steps, I will briefly explain the function (Python code) to be executed by Cloud Functions. There is sample code in the Google Cloud documentation, which will be used as the basis for the explanation. At the following URL, find the section called "Load CSV data into a table" and select the Python code.
■ Loading CSV data from Cloud Storage
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv

/// Here is the sample code

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))

Here is a brief explanation of how the sample code is processed.

Define the schema for loading into the BigQuery table:
schema=[
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
],

Set the number of header lines to skip in the source file (in the sample, the first line is skipped and loading starts from the second line):
skip_leading_rows=1,

Specify the URI of the Cloud Storage file you are loading from:
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

The actual load is performed by client.load_table_from_uri(); afterwards, the destination table (table_id) is retrieved to report how many rows were loaded:
destination_table = client.get_table(table_id)

Now, let's create the code to actually load data into the BigQuery table, based on the sample code above. Set the project name, dataset name, table name, and the URI of the Cloud Storage file you are loading from to match your environment, then run the code from Cloud Shell and check that the data is actually loaded into the BigQuery table. You can save the modified code in a file such as "bqload.py" on Cloud Shell and run it manually for easy confirmation.
/// Sample code (modified version)

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# Set table_id to the ID of the table to load into.
table_id = "[your-project].bq_dataset.pos_salesdata"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("sales_date", "DATE"),
        bigquery.SchemaField("store_code", "STRING"),
        bigquery.SchemaField("store_name", "STRING"),
        bigquery.SchemaField("product_code", "STRING"),
        bigquery.SchemaField("product_name", "STRING"),
        bigquery.SchemaField("sales_amount", "NUMERIC"),
        bigquery.SchemaField("sales_cnt", "INTEGER"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)

uri = "gs://[test-sample-gcp]/pos_salesdata_daily.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))

Now, let's check whether the above code can actually be executed.

$ python3 bqload.py

After running the code and confirming that the data has been loaded into the BigQuery table, we need to modify it so that it can be executed as a function in Cloud Functions. Because the function will be triggered by a Pub/Sub message, we need to decode the base64-encoded data in the message, so we modify the code as follows.

/// Sample code (to be executed as a Cloud Functions function)

import base64

from google.cloud import bigquery


def bqload_func(event, context):
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    print(pubsub_message)

    client = bigquery.Client()
    dataset_id = 'bq_dataset'
    dataset_ref = client.dataset(dataset_id)

    job_config = bigquery.LoadJobConfig()
    job_config.schema = [
        bigquery.SchemaField("sales_date", "DATE"),
        bigquery.SchemaField("store_code", "STRING"),
        bigquery.SchemaField("store_name", "STRING"),
        bigquery.SchemaField("product_code", "STRING"),
        bigquery.SchemaField("product_name", "STRING"),
        bigquery.SchemaField("sales_amount", "NUMERIC"),
        bigquery.SchemaField("sales_cnt", "INTEGER"),
    ]
    job_config.skip_leading_rows = 1
    # The source format defaults to CSV, so the line below is optional.
    job_config.source_format = bigquery.SourceFormat.CSV

    uri = "gs://test-sample-gcp/pos_salesdata_daily.csv"

    load_job = client.load_table_from_uri(
        uri,
        dataset_ref.table("pos_salesdata"),
        job_config=job_config
    )  # API request

    load_job.result()  # Waits for the table load to complete.

    destination_table = client.get_table(dataset_ref.table("pos_salesdata"))
    print("Loaded {} rows.".format(destination_table.num_rows))

The above code will be used later to configure Cloud Functions.

Steps to automate BigQuery load processing using Cloud Functions
Let's start with Cloud Pub/Sub and go
through the steps in order.
(1) Create a topic in Cloud Pub/Sub
・ Select "Pub/Sub" -> "Topic" from the
navigation menu.
・ Enter the following information on the
topic creation screen.
Topic ID: bqload-topic
Encryption: Unchecked
(2) Set up a trigger for Cloud Functions.
From the "Topic details" screen in Pub/Sub, click the "Trigger Cloud Function" link at the top of the screen to create a function.
Function name: bqload_func
Region: us-central1
Trigger type: Cloud Pub/Sub
Cloud Pub/Sub topic: bqload-topic (the topic you created in the previous step)
Retry on failure: unchecked
When you have completed the above settings, select "Save" and click "Next" at the bottom of the screen.
You will be taken to the function editing screen. Configure the settings as follows, edit main.py and requirements.txt, and deploy.
Runtime: Python 3.8
Entry point: bqload_func
Source code: Inline editor
Contents of main.py

import base64

from google.cloud import bigquery


def bqload_func(event, context):
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    print(pubsub_message)

    client = bigquery.Client()
    dataset_id = 'bq_dataset'
    dataset_ref = client.dataset(dataset_id)

    job_config = bigquery.LoadJobConfig()
    job_config.schema = [
        bigquery.SchemaField("sales_date", "DATE"),
        bigquery.SchemaField("store_code", "STRING"),
        bigquery.SchemaField("store_name", "STRING"),
        bigquery.SchemaField("product_code", "STRING"),
        bigquery.SchemaField("product_name", "STRING"),
        bigquery.SchemaField("sales_amount", "NUMERIC"),
        bigquery.SchemaField("sales_cnt", "INTEGER"),
    ]
    job_config.skip_leading_rows = 1
    # The source format defaults to CSV, so the line below is optional.
    job_config.source_format = bigquery.SourceFormat.CSV

    uri = "gs://test-sample-gcp/pos_salesdata_daily.csv"

    load_job = client.load_table_from_uri(
        uri,
        dataset_ref.table("pos_salesdata"),
        job_config=job_config
    )  # API request

    load_job.result()  # Waits for the table load to complete.

    destination_table = client.get_table(dataset_ref.table("pos_salesdata"))
    print("Loaded {} rows.".format(destination_table.num_rows))
Contents of requirements.txt
google-cloud-bigquery==1.25.0
Paste the above code and deploy it.
(3) Publish the message.
Return to the Cloud Pub/Sub screen, and from the topic details screen, select "Publish message" -> "Publish one message".
In the "Publish message" screen, enter the following information and click "Publish".
Type of publication: 1 time
Message body: bqload
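Instead of publishing from the console, you can also publish a test message from Python with the google-cloud-pubsub library. This is only an optional sketch; replace the project ID placeholder with your own.

/// Sample code (publishing a test message to the topic from Python)
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("[your-project]", "bqload-topic")
# The message body is delivered to bqload_func as base64-encoded event['data'].
future = publisher.publish(topic_path, b"bqload")
print(future.result())  # prints the ID of the published message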
(4) Set up the function execution schedule of Cloud Functions in Cloud Scheduler.
Click "Create Job" in Cloud Scheduler, and configure the following settings on the job creation screen.
Name: bqload_scd
Frequency: 0 */1 * * * (runs every hour)
Time zone: Japan Standard Time
Target: Pub/Sub
Topic: bqload-topic
Payload: test
After creating the job, click the "Run now" button on the job screen, and confirm that the data from the target file in Cloud Storage has been loaded into the BigQuery table. (It is easier to confirm if you delete the target table beforehand.)
Chapter 4. The cost of building a data infrastructure

For the data analysis infrastructure built with Cloud Storage, BigQuery, Cloud Functions, etc. as described in this book, the basic cost of construction and operation is almost zero. (This excludes cases such as storing extremely large files in Cloud Storage or querying very large tables in BigQuery.)
When building and operating a data analytics infrastructure on GCP, the main cost drivers are storing data in Cloud Storage, executing queries (SQL) in BigQuery, and using other services such as Cloud Composer and Dataflow. However, by using services such as Cloud Scheduler and Cloud Functions as described in this book, the operational costs can be reduced dramatically. (That said, if complex ETL processing is required, it is recommended to introduce a separate ETL tool.)
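Since query execution is one of the main cost drivers, it is also worth knowing that BigQuery can estimate how many bytes a query would scan without actually running it. The following is a small sketch using a dry run; the query itself is just an example against the table built in this book.

/// Sample code (estimating query cost with a dry run)
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
query = """
SELECT store_name, SUM(sales_amount) AS total_sales
FROM `[your-project].bq_dataset.pos_salesdata`
GROUP BY store_name
"""
query_job = client.query(query, job_config=job_config)  # dry run: nothing is executed or billed
print("This query would process {} bytes.".format(query_job.total_bytes_processed))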
Especially in the phase where you are building a new data analysis infrastructure, there is almost no charge, so feel free to give it a try.

Chapter 5. Other things that are necessary for building a data infrastructure

ETL processing that occurs in data infrastructure construction
This book has not dealt with ETL processing at all, because it covers only the minimum infrastructure required for a data analysis infrastructure. However, ETL processing is always necessary in a data analysis infrastructure. If you implement it on GCP, you need a service to execute ETL processing or workflow processing, specifically Dataflow or Cloud Composer. However, building and operating a data analysis infrastructure with these services requires a considerable amount of engineering resources, so you do not need to get involved with them at the stage of casually building a new data analysis infrastructure. If you want to do ETL processing on the simple data analysis infrastructure configuration discussed in this book, I recommend using BigQuery's "scheduled queries" feature. Scheduled queries allow you to schedule SQL to be executed in BigQuery. You can use this feature to process and transform the data loaded into a BigQuery table by the Cloud Functions function and store the result in another table using SQL.
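As a concrete image, the SQL you would register as a scheduled query could look like the following sketch, which aggregates the loaded POS data into a daily summary table. The destination table name and the aggregation itself are just examples, and the snippet runs the query ad hoc from Python; register the same SQL as a scheduled query to run it periodically.

/// Sample code (an example transformation you could register as a scheduled query)
from google.cloud import bigquery

client = bigquery.Client()
# Aggregate the loaded POS data into a daily sales summary table.
query = """
CREATE OR REPLACE TABLE `[your-project].bq_dataset.pos_sales_daily_summary` AS
SELECT
  sales_date,
  store_code,
  store_name,
  SUM(sales_amount) AS total_sales_amount,
  SUM(sales_cnt) AS total_sales_cnt
FROM `[your-project].bq_dataset.pos_salesdata`
GROUP BY sales_date, store_code, store_name
"""
client.query(query).result()  # waits for the query to finish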
