Episode 2 - Transcription

Power BI material

Uploaded by

Vitor Hugo Ferreira

Intro

Hey guys, and welcome back to the Data Analyst Portfolio Project Series! In this episode, we are diving into the
first technical part of our marketing analysis project. Today, we're going to look at extracting and cleaning
marketing data using SQL.

We'll go through different SQL transformations that will be applied to each table in our data model before we
bring the data into Power BI. We'll break down the SQL queries step by step, explaining each transformation and
its purpose.

If you haven't watched the previous episode where we introduced the project and discussed the business case,
make sure to check that out first. You can find the link, along with everything else you need, in the
description.

That being said, we are ready to take a look at the queries and the transformations that we're going to do in this
project.

Product Table
Okay, guys, so here we have some different SQL statements that we're going to walk through. We're going to
break them down, and I'm going to explain as we go. We're going to start on a little bit of the simpler side and then
get a little bit more advanced as we progress through the different statements.

What's important is that there is a database called Portfolio Project Marketing Analytics. You can find the link for
that in the description of this video. You need to restore that backup file onto this SQL Server Express server that
you have so that you can get these tables available for you locally.

That is what I show in my video on how to set up and create a practice environment. If you haven't done that,
make sure to go back and check that video; it's also linked in the description. So before you can follow along,
you need to make sure that you have this database restored on your SQL Server, and you can find all of the
instructions to do that there.

The first statement that we will walk through is the one that is on our product data. Now, you'll see in these
statements that there is a top part, and I've added something like a divider.

I will share these SQL queries on my GitHub—again, link in the description. Everything is there. This part above
the stars will not be there; that's just for me to go a little bit back and forth to show you guys the table before and
after. But just follow along.

If you have any questions, ask in the comments, and I think this is going to be quite useful. I've also tried to really
add comments on the different parts for you guys to understand what is going on here.

Let's just start by looking at the products data. What does the products table look like before anything has been
done to it? So we're going to highlight this part and execute it.

Here you can see we have a Product ID, we have a Product Name, we have a Category, and we have a Price.

Now, right away, this table is fine as is. One thing you're going to see is that I am going to comment out the
Category column because I think it's redundant: every product belongs to the same category.

So, I'm going to remove that in our final query. Then there's one more thing that I'm going to do: I'm going to
apply a CASE expression to this Price column. I want to make a category for when the price is above or below
something, or within a certain range. I want it to be Low, Medium, or High, and we're going to return that as a
Price Category column.

Now, the threshold values here I'm going to let you guys decide, however you want, but I recommend
that you just write this statement out and try to understand what is going on.

Because what we're doing, we're doing a comparison. We're doing a case, and when something is fulfilled, then
categorize it as this text. When it is between these bands, this text. Else, this text. And end it and return it as this
column, which becomes Price Category.
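To make the CASE logic concrete, here is a minimal sketch. It is not the query from the video: it runs the SQL through Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name, column names, sample rows, and the 50 and 200 thresholds are all assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (ProductID INTEGER, ProductName TEXT, Category TEXT, Price REAL)"
)
# Hypothetical sample rows, not the real project data.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Running Shoes", "Sports", 223.75),
        (2, "Fitness Tracker", "Sports", 196.68),
        (3, "Yoga Mat", "Sports", 41.26),
    ],
)

# The thresholds 50 and 200 are illustrative; choose your own bands.
query = """
SELECT
    ProductID,
    ProductName,
    Price,
    -- Category is left out because every product shares the same one
    CASE
        WHEN Price < 50 THEN 'Low'
        WHEN Price BETWEEN 50 AND 200 THEN 'Medium'
        ELSE 'High'
    END AS PriceCategory
FROM products
ORDER BY ProductID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The CASE expression itself is standard SQL, so the same WHEN / BETWEEN / ELSE / END AS structure should carry over to SQL Server; only the surrounding Python is specific to this sketch.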

So, it's fairly simple. You know, this is a simple start. We're going to start a little bit soft with a query here. There
isn't anything else that we're going to do on this table.

If I show you guys, if I run this now (and if you're new to SQL Server Management Studio): only the part that I
highlight gets executed, and its result is returned in the results window down here.

So, I highlight this query now, execute, and now you can see that we have this new column. We have this Price
Category column, and you can see that there's High, Medium, Low.

The point of doing that is just if a company wants to have an easy way to understand how many products are
selling within a certain category, it's easier to have some buckets than having these values and having to just try
and understand for yourself what is High, Medium, Low.

So, I've just decided to add that category there. You could also just press Ctrl + A to select everything and
execute. Then you will see both results: the first from the top statement and the second from the bottom statement.

Now, like I said, on my GitHub, I'm going to add these SQL statements there. It will be the exact same statements,
but it will be without anything that is above this star. The stars, we don't need that.

This is the statement that will produce this result, and the result of this SQL query is what is going to be used
in our dashboard later.

But I recommend that you guys go in, look at the video, try to write the statement yourself, look at the different
comments that I have added, and try to understand what is going on here.

You can see here, it's the part where I commented out the category because I thought that was a little bit
redundant. Now it's back, but let's take it out again.

If you get other ideas of things you want to do, then please go ahead. There's no right or wrong answer here.
What's important is that the steps we are applying in this SQL statement make our data better suited to analysis
and to the dashboards later on.

So, we're going to add and do some improvements here so that the output becomes better, cleaner, and more
structured in the Power BI dashboard at a later stage.

But this is the first one. This is the one that relates to product information.

Customer Table
The next one then is going to be about customers. Now, this one is a little bit different.

What we're doing here is we have one customers table and we have one geography table. I'm going to mark both of
them, and we'll get two different results.

So, you can see we have a Customer ID, we have a Name, we have an Email, Gender, Age, and Geography.
This Geography ID is going to correspond to an ID in the geography table.

What we're doing here is trying to create a combined table.

Like I said, the stars and above are just for me to show you guys. Below is the actual query that we're going to
focus on.

So, we can then just go back to, let's say, if we have customers—that's what that looks like by itself—or we have
geography—that’s what that looks like by itself.

Now, if we go down, you can see this is a join, a statement which is going to combine these two tables.

So, we have a SELECT, and we are selecting from the customers table, which we alias as "C". We prefix every
column that comes from this table with that alias in the statement.

Then we're selecting from the geography table as "G".

So, you can see customers over here—that's this table—and geography over here—that's that table.

We have a LEFT JOIN on this condition where Geography ID from the table, which has been aliased with "C", is
matched onto the Geography ID on the table, which has been aliased as "G", which is Geography.

We are going to then SELECT:

• Customer ID
• Customer Name
• Email
• Gender
• Age

...with this alias to kind of tell the SELECT statement which table does this come from.

Then we're going to add the Country and City from this other table on this condition.

This is a LEFT JOIN. If you want to, you can go in and change it to a RIGHT JOIN.

Actually, let's first look at the result of the LEFT JOIN before I start changing the join criteria.

So, we can mark this part off, execute, and now you can see we have a table which combines customer information
and geography information.
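Here is a minimal sketch of that LEFT JOIN. It runs the SQL through Python's sqlite3 module as a stand-in for SQL Server, and the table names, column names, and sample rows are assumptions for illustration, not the real project data. One customer deliberately has no matching geography row, to show what a LEFT JOIN does with it.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (CustomerID INTEGER, CustomerName TEXT, Email TEXT, "
    "Gender TEXT, Age INTEGER, GeographyID INTEGER)"
)
conn.execute("CREATE TABLE geography (GeographyID INTEGER, Country TEXT, City TEXT)")

# Hypothetical sample rows; customer 3 points at a GeographyID that does not exist.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Anna Berg", "anna@example.com", "Female", 34, 10),
        (2, "Luca Rossi", "luca@example.com", "Male", 51, 20),
        (3, "Sam Lee", "sam@example.com", "Male", 27, 99),
    ],
)
conn.executemany(
    "INSERT INTO geography VALUES (?, ?, ?)",
    [(10, "Sweden", "Stockholm"), (20, "Italy", "Rome")],
)

query = """
SELECT
    c.CustomerID,
    c.CustomerName,
    c.Email,
    c.Gender,
    c.Age,
    g.Country,   -- added from the geography table
    g.City       -- added from the geography table
FROM customers AS c
LEFT JOIN geography AS g
    ON c.GeographyID = g.GeographyID
ORDER BY c.CustomerID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every customer row is kept because customers is the left table; where no geography row matches, Country and City come back as NULL (None in Python).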

In this dimension, we are doing a join, and it's because we don't need two different tables. We can combine them
into one.

Having fewer tables in our data model is more efficient.

It makes sense. It's fine.

This is the combined result that we want to get into Power BI at a later stage when we're going to do some
analysis.

This is a LEFT JOIN, so these two columns have been added to the left table, which is the first one in the join,
which is customers.

Just for the sake of it, if I comment this out and I do this as a RIGHT JOIN...

Let me see if I can get this part here...

You can see now that it lists the cities first, because with a RIGHT JOIN the geography table is the one that is
kept in full, and then it adds whatever matches from the customers table.

So, the table on the right side of the join becomes the base table, and the other one is joined onto it.

Then there's an INNER and FULL OUTER JOIN—just some other types.

In this example, it doesn't really make any difference, but I've added those there in case you want to comment in
and out and just look at it.

But the purpose of this one is for you guys to just do a quick practice on selecting two different tables, combining
them.

And again, from a more sensible point of view, why are we doing this?

It's because we don't need two different tables.

We can combine them.

Fewer tables in a data model—it just makes sense.

Here you can see we have an Age, we have a Country, and a City.

If you want to, you can go in and use a CASE expression on the Age column to create, let's call it, different
categories of age like Young, Middle Age, Old, or some other way of putting it.

Or you can do countries—Northern Europe, Western Europe, Eastern Europe, Southern Europe.

If you want to, I'm just trying to come up with some ideas for you guys.

You could have cities.

You could categorize them into Large or Small based on the number of people that live there.

There are different things you can do here to categorize the data or put it into a context, so that you can use it
from more analytical angles.

That is what we're trying to do with the SQL here.

So far, we have two tables. They are going to become what we call Dimension Tables, or lookup tables, in a data
model.

If we have, let me see if I can draw this up for you guys...

Say we have a data model.

We'll come back to this later in Power BI, but let's say we have, you know...

Let me find...

I haven't done this forever, that's for sure.

Let's go.

Okay, so if we have Customers and Geography, and then we're going to have some tables down here—excuse my
paint drawing skills—

These are going to be Dimension Tables, and they're going to filter down onto these tables.

They're going to filter the Fact Tables, which we will just get to.

This will be a little more clear when we get into Power BI.

I'm just trying to get it out there already.

We have some tables down here which are Fact Tables, and some tables here which are Lookup Tables or
Dimension Tables.

So far, we have added categories, which is something you often do on Dimension Tables, and we have combined two
tables into one dimension table.

Customer Review Table

Now we're going to go into the tables that hold the values our calculations are based on, which we call Fact Tables.

The first one we can look at is a simple example. It is the one that has some information about customer reviews.

Let’s take a look at that one.

You can see we have:

• A Review ID
• The Customer who did the review
• The Product ID to identify what product was reviewed
• The Date
• The Rating
• The Text

Now, the issue with this one is that in some of these reviews, there is double spacing instead of single spacing.

So, I've added a simple REPLACE function to replace double spaces with single spaces.
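A minimal sketch of that REPLACE step, run through Python's sqlite3 module as a stand-in for SQL Server. The table name, column names, and review rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_reviews (ReviewID INTEGER, CustomerID INTEGER, "
    "ProductID INTEGER, ReviewDate TEXT, Rating INTEGER, ReviewText TEXT)"
)
# Hypothetical sample rows; the first review contains a double space.
conn.executemany(
    "INSERT INTO customer_reviews VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 77, 18, "2023-12-23", 3, "Average  experience, nothing special."),
        (2, 80, 19, "2024-12-25", 5, "The quality is top-notch."),
    ],
)

query = """
SELECT
    ReviewID,
    CustomerID,
    ProductID,
    ReviewDate,
    Rating,
    REPLACE(ReviewText, '  ', ' ') AS ReviewText  -- two spaces become one
FROM customer_reviews
ORDER BY ReviewID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that a single REPLACE pass only collapses each pair of spaces once, so a run of three or more spaces would still leave a double space behind and would need another pass.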

That’s the only thing we're going to do on this table for now.

We can run this query, and you can see that the text looks a little bit cleaner.

So, we've done a small fix there.

This is also the table that we're going to work on in Python later.

In Python, we’re going to do something called sentiment analysis.

I'll just very quickly explain what it is.

We will try to determine if the text in the review is positive or negative.

We’re going to combine that with our understanding of the review text and the rating that they gave to see if we
can make it even more accurate or extract some other types of information.

That’s in the next episode, but I thought I’d explain to you guys what we are going to do.

This table is a Fact Table, which means we can calculate metrics like:

• Average Rating

Later, when we add the sentiment analysis, we can create:

• A Rating Sentiment Score
• A Rating Sentiment Category

For now, the rating and the text are something that we want to understand based on the product or the customer
who wrote it.

The customer provides context:

• Who was the customer?
• Which city are they from?
• How old are they?
• What gender?

The product tells us:

• The category of the product
• The product type
• Whether it's a Low, Medium, or High priced product

These details will then be used to filter the rating to understand how this all comes together.

So, this table becomes one of the Fact Tables in our data model.

Engagement Table
The next Fact Table we’re going to look at is the engagement data table.

Here’s what it looks like before we’ve done anything to it.

We’re going to make some improvements here to address a few issues, including:

1. Different spellings in the Content Type column (e.g., "Blog", "blog", "Video", "video").
2. Inconsistent formatting in the Engagement Date column.
3. Combined Views & Clicks column, which needs to be split into two separate columns.

These are the steps we’ll take to clean up this table:

1. Standardize the content types using the UPPER function to make all values consistent in terms of case.
2. Replace variations like "social media" with a single, standardized version.
3. Format the Engagement Date column for consistency.
4. Split the Views & Clicks column into separate columns for easier analysis.

We’ve also decided to filter out rows where the Content Type is “Newsletter,” as we don’t need those records for
our analysis.
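The cleaning steps above can be sketched roughly as follows. This is not the query from the video: it runs through Python's sqlite3 module as a stand-in for SQL Server, so the string and date functions (substr, instr, strftime) are SQLite's and would need their SQL Server equivalents. The table and column names, the "views-clicks" format of the combined column, the 'socialmedia' spelling variant, and the dd.mm.yyyy target date format are all assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engagement_data (EngagementID INTEGER, ContentID INTEGER, "
    "CampaignID INTEGER, ProductID INTEGER, ContentType TEXT, "
    "EngagementDate TEXT, ViewsClicksCombined TEXT, Likes INTEGER)"
)
# Hypothetical sample rows with mixed casing and a combined "views-clicks" column.
conn.executemany(
    "INSERT INTO engagement_data VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 39, 1, 9, "Blog", "2023-08-30", "1883-671", 47),
        (2, 48, 2, 20, "socialmedia", "2023-03-28", "1749-95", 12),
        (3, 16, 3, 5, "video", "2023-12-08", "5280-532", 123),
        (4, 7, 4, 2, "Newsletter", "2024-01-09", "900-40", 5),
    ],
)

query = """
SELECT
    EngagementID,
    ContentID,
    CampaignID,
    ProductID,
    -- normalize the spelling variant, then make the casing consistent
    UPPER(REPLACE(LOWER(ContentType), 'socialmedia', 'social media')) AS ContentType,
    -- everything before the hyphen is the view count
    CAST(substr(ViewsClicksCombined, 1, instr(ViewsClicksCombined, '-') - 1) AS INTEGER) AS Views,
    -- everything after the hyphen is the click count
    CAST(substr(ViewsClicksCombined, instr(ViewsClicksCombined, '-') + 1) AS INTEGER) AS Clicks,
    Likes,
    strftime('%d.%m.%Y', EngagementDate) AS EngagementDate
FROM engagement_data
WHERE UPPER(ContentType) <> 'NEWSLETTER'  -- newsletters are not needed
ORDER BY EngagementID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Comparing on UPPER(ContentType) in the WHERE clause means the filter also catches "newsletter" or "NEWSLETTER", whatever the casing in the raw data.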

Let’s run the query.

After execution, this is what the cleaned table looks like:

• IDs are listed first for better clarity.
• Content types are standardized.
• Views and clicks are now in separate columns.
• Dates are formatted properly.

This cleaned data is now ready to be brought into Power BI for further analysis.

We’ve introduced a few more SQL concepts here, such as:

• Using UPPER to standardize text
• Splitting columns with string functions
• Filtering rows with specific conditions

If you want to see the query, like I mentioned earlier, check the description for links to my GitHub.

Customer Journey Table

The last table we’ll look at is the Customer Journey table.

This one is a little more advanced, so I’ll try to take you through it step by step.

If we start by just looking at the Customer Journey table by itself, it shows:

• Journey ID
• Customer ID
• Product ID
• The date they visited
• Different stages
• An action they performed

This table allows us to analyze customer journeys in a funnel.

For example:

• Did they view a product?
• Did they click on it?
• Did they buy something?

One issue with this table is the Duration column, which has missing values in some places.

Another issue is that there are duplicate rows—some rows have exactly the same information.

To fix these issues, we’ll do the following:

1. Use a subquery to calculate the average duration for each visit date and use that to fill in missing values.
2. Identify and remove duplicate rows using the ROW_NUMBER function.

I’ll show you the query to identify duplicates first.

This query uses a Common Table Expression (CTE) to assign a row number to each record based on its unique
combination of fields, such as:

• Customer ID
• Product ID
• Visit Date
• Stage
• Action

If the same combination appears more than once, the ROW_NUMBER function numbers those rows sequentially within
their group.

For example, the first occurrence gets a row number of 1, the second occurrence gets 2, and so on.

Let’s execute this query to verify duplicate records.

Here, you can see rows where the ROW_NUMBER is greater than 1—these are the duplicates.

Now that we’ve identified them, we can remove them by keeping only the rows with a ROW_NUMBER equal to 1.
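A minimal sketch of the duplicate check and removal, run through Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or newer). The table name, column names, CTE name, and sample rows are assumptions for illustration; JourneyID 3 is inserted twice on purpose.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_journey (JourneyID INTEGER, CustomerID INTEGER, "
    "ProductID INTEGER, VisitDate TEXT, Stage TEXT, Action TEXT, Duration REAL)"
)
# Hypothetical sample rows; the last two are exact duplicates of each other.
conn.executemany(
    "INSERT INTO customer_journey VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 64, 18, "2024-06-10", "Checkout", "Drop-off", 120.0),
        (2, 94, 11, "2024-06-11", "Homepage", "Click", 45.0),
        (3, 34, 8, "2024-06-14", "ProductPage", "View", 235.0),
        (3, 34, 8, "2024-06-14", "ProductPage", "View", 235.0),
    ],
)

# The CTE numbers rows within each group of identical field combinations.
cte = """
WITH DuplicateRecords AS (
    SELECT
        JourneyID, CustomerID, ProductID, VisitDate, Stage, Action, Duration,
        ROW_NUMBER() OVER (
            PARTITION BY CustomerID, ProductID, VisitDate, Stage, Action
            ORDER BY JourneyID
        ) AS row_num
    FROM customer_journey
)
"""

# Step 1: inspect the duplicates (second and later occurrences).
duplicates = conn.execute(
    cte + "SELECT * FROM DuplicateRecords WHERE row_num > 1"
).fetchall()
print("duplicates:", duplicates)

# Step 2: keep only the first occurrence of each combination.
deduped = conn.execute(
    cte + "SELECT * FROM DuplicateRecords WHERE row_num = 1 ORDER BY JourneyID"
).fetchall()
print("deduplicated:", deduped)
```

PARTITION BY defines what counts as "the same row", and ORDER BY decides which copy gets number 1; filtering on row_num = 1 then keeps exactly one row per combination.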

Next, we’ll handle the missing Duration values.

We’ll use another subquery to calculate the average duration for each visit date.

If a Duration value is null, we’ll replace it with the average duration for that specific date.
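A minimal sketch of filling the missing durations, run through Python's sqlite3 module as a stand-in for SQL Server. The table name, column names, and sample rows are assumptions for illustration, and joining to a grouped subquery with COALESCE is just one way to do it, not necessarily how the query in the video is written.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_journey (JourneyID INTEGER, CustomerID INTEGER, "
    "ProductID INTEGER, VisitDate TEXT, Stage TEXT, Action TEXT, Duration REAL)"
)
# Hypothetical sample rows; JourneyID 3 has a missing Duration.
conn.executemany(
    "INSERT INTO customer_journey VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 5, "2024-01-01", "Homepage", "View", 100.0),
        (2, 11, 5, "2024-01-01", "ProductPage", "Click", 200.0),
        (3, 12, 6, "2024-01-01", "Checkout", "Drop-off", None),
        (4, 13, 7, "2024-01-02", "Homepage", "View", 60.0),
    ],
)

query = """
SELECT
    j.JourneyID,
    j.VisitDate,
    -- use the row's own Duration, or the date's average when it is NULL
    COALESCE(j.Duration, a.avg_duration) AS Duration
FROM customer_journey AS j
JOIN (
    -- subquery: average duration per visit date (AVG skips NULLs)
    SELECT VisitDate, AVG(Duration) AS avg_duration
    FROM customer_journey
    GROUP BY VisitDate
) AS a
    ON a.VisitDate = j.VisitDate
ORDER BY j.JourneyID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One limitation worth knowing: if every Duration on a given date is NULL, the average for that date is NULL as well, so those rows would stay empty.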

When we execute the final query, the cleaned table will:

1. Have no duplicate rows.
2. Have missing durations replaced with average values.

After running the final query for the Customer Journey table, here’s what the cleaned data looks like:

• The Duration column now has no null values; all missing durations have been replaced with averages for
  their respective visit dates.
• Duplicate rows have been removed; only the first occurrence of each unique combination remains.

To verify, let’s look at a specific Journey ID that previously had duplicates. For example, Journey ID 23 initially
had two rows. Now, it has only one row in the cleaned table.

The query has successfully eliminated duplicates and replaced missing Duration values.

Combining Results
This cleaned table will now be brought into Power BI, where it will serve as one of our Fact Tables.

The advanced concepts we’ve covered here include:

1. Using the ROW_NUMBER function to identify and remove duplicates.
2. Implementing a subquery to calculate averages and fill missing values.
3. Leveraging Common Table Expressions (CTEs) for temporary result sets to aid in intermediate
   calculations.

If this process feels complex, I recommend revisiting the query and breaking it down step by step. Pay attention to:

• How the ROW_NUMBER function partitions and orders data.
• How the subquery calculates averages based on a specific column.

These are essential SQL techniques for cleaning and preparing data for analysis.

Data Model Overview

At this stage, we’ve worked on several tables:

1. Dimension Tables:
   o Product Table
   o Customer Table (combined with Geography)
2. Fact Tables:
   o Customer Reviews Table
   o Engagement Table
   o Customer Journey Table

In Power BI, these Dimension Tables will serve as filters for the Fact Tables.

For example:

• The Product Table can filter customer reviews by product type or price category.
• The Customer Table can filter engagement or journey data by customer demographics or location.

Next Episode
That’s it for this episode on extracting and cleaning marketing data using SQL.

You’ve learned how to:

1. Connect to a database in SQL Server Management Studio.
2. Extract relevant data.
3. Apply cleaning techniques to prepare data for analysis in Power BI.

In the next episode, we’ll enhance our data by performing sentiment analysis on customer reviews using Python.
This will add valuable insights to our marketing analytics project.

Make sure to like, subscribe, and hit the notification bell so you don’t miss any updates in this series.

If you have any questions or thoughts, leave a comment below. I'd love to hear from you.

Conclusion
Thank you for watching, and I’ll see you in the next episode!
