Google Coursera Data Analytics

This document provides an overview of the Google Data Analytics course offered through Coursera. The course covers the following topics over 6 weeks: 1) the data analytics process, which involves asking questions, preparing, processing, analyzing, sharing, and acting on data; 2) the importance of using data-driven decision making over relying on gut instinct; 3) how blending data with business knowledge can help "solve mysteries"; and 4) an introduction to the field of people analytics.
Course 1

Google Data Analytics (Coursera)

Week 1

Data analysis - the collection, transformation, and organization of data to draw conclusions, make predictions, and drive informed decision-making.

Data analytics - the science of data. It is a very broad concept that encompasses everything from the job of managing and using data to the tools and methods that data workers use every day.

Data - a collection of facts that can be used to draw conclusions, make predictions, and assist in decision-making.

Phases of the data analytics process

Analysts use data-driven decision-making and follow a step-by-step process.

1. Ask questions and define the problem. In this phase, we do two things. We define the

problem to be solved (by looking at the current state and identifying how it's different

from the ideal state) and we make sure that we fully understand stakeholders' (people

who have invested time and resources into a project and are interested in the outcome)

expectations.

2. Prepare data by collecting and storing the information. Here, you’ll learn about the

different types of data and how to identify which kinds of data are most useful for solving

a particular problem. It's also important that your data and results are objective and

unbiased. In other words, any decisions made from your analysis should always be based

on facts and be fair and impartial.


3. Process data by cleaning and checking the information. Here, data analysts find and

eliminate any errors and inaccuracies that can get in the way of results. This usually

means cleaning data, transforming it into a more useful format, combining two or more

datasets to make information more complete, and removing outliers, which are any data

points that could skew the information. After that, you'll learn how to check the data you

prepare to make sure it's complete and correct. This phase is all about getting the details

right. So you'll also fix typos, inconsistencies, or missing and inaccurate data.

4. Analyze data to find patterns, relationships, and trends. Analyzing the data you've

collected involves using tools to transform and organize that information so that you can

draw useful conclusions, make predictions, and drive informed decision-making. There

are lots of powerful tools data analysts use in their work, and in this course you'll learn about

two of them, spreadsheets and structured query language, or SQL, which is often

pronounced "sequel."

5. Share data with your audience. Here you'll learn how data analysts interpret results and

share them with others to help stakeholders make effective data-driven decisions. In the

sharing phase, visualization is a data analyst's best friend. So this course will highlight

why visualization is essential to getting others to understand what your data is telling

you. With the right visuals, facts and figures become so much easier to see and complex

concepts become easier to understand. We'll explore different kinds of visuals and some

great data visualization tools. You'll also practice your presentation skills by creating

compelling slideshows and learning how to be fully prepared to answer questions.

6. Act on the data and use the analysis results. This is the exciting moment when the business takes all of the insights you, the data analyst, have provided and puts them to work to solve the original business problem. In this program, it is also when you act on what you've learned: you prepare for your job search and have the chance to complete a case study project. In the act phase of the data analysis process, a company may need to validate the insights of the data analysis team.

But other factors influence the decision-making process. You may have read mysteries where the

detective used their gut instinct and followed a hunch that helped them solve the case. Gut

instinct is an intuitive understanding of something with little or no explanation. This isn’t always

something conscious; we often pick up on signals without even realizing it. You just have a

“feeling” it’s right.

Why gut instinct can be a problem

At the heart of data-driven decision-making is data. Therefore, data analysts must focus on the

data to ensure they make informed decisions. If you ignore data by preferring to make decisions

based on your own experience, your decisions may be biased. But even worse, decisions based

on gut instinct without any data to back them up can cause mistakes.

Consider an example of a restaurant entrepreneur, partnering with a well-known chef to develop

a new restaurant in a bustling part of the city’s central shopping district. The well-known chef

has several restaurants across the city. Banking on their reputation, the restaurant entrepreneur

and chef followed gut instinct and created another uniquely themed restaurant. However, after months of planning and preparation, fundraising fell short of what was needed to open the restaurant, and the property will go back on the market to be sold at a loss. Had the entrepreneur done more research, they would have found data showing that prospective customers near this new location were very different from the customers of the chef's other restaurants.

The more you understand the data related to a project, the easier it will be to figure out what is

required. These efforts will also help you identify errors and gaps in your data so you can

communicate your findings more effectively. Sometimes experience helps you make a

connection that no one else would notice. For example, a detective might be able to crack open a

case because they remember an old case just like the one they’re solving today. It's not just gut

instinct.

Data + business knowledge = mystery solved

Blending data with business knowledge, plus maybe a touch of gut instinct, will be a common

part of your process as a junior data analyst. The key is figuring out the exact mix for each

particular project. A lot of times, it will depend on the goals of your analysis. That is why

analysts often ask, “How do I define success for this project?”

In addition, try asking yourself these questions about a project to help find the perfect balance:

• What kind of results are needed?

• Who will be informed?

• Am I answering the question being asked?

• How quickly does a decision need to be made?

For instance, if you are working on a rush project, you might need to rely on your knowledge and

experience more than usual. There just isn’t enough time to thoroughly analyze all of the
available data. But if you get a project that involves plenty of time and resources, then the best

strategy is to be more data-driven. It’s up to you, the data analyst, to make the best possible

choice. You will probably blend data and knowledge in a million different ways throughout your

data analytics career. And the more you practice, the better you will get at finding that perfect

blend.

Meet and Greet

Hi. I am Salma and am based in Lagos, Nigeria. I have a B.Sc (Hons) in Computer Science and

am currently undergoing my National Youth Service. I first started thinking about data analytics

when I joined a Community Development Service during my 3-week orientation stay at camp.

I was introduced to the basics of data analytics and became somewhat fascinated with how much

data we produce, and the possibilities of what that data could be used for. I am quite excited

about this course and hope to learn a lot from it.

I am hoping to acquire the knowledge and qualifications to achieve my career goal of becoming

a UI/UX designer and a front-end developer. I am inspired by the fact that computer science has

become a fundamental element in the development of a better, smarter future for our world, and

my goal is to be part of that development process.

People analytics — also known as human resources analytics or workforce analytics. People

analytics is the practice of collecting and analyzing data on the people who make up a company’s

workforce to gain insights to improve how the company operates. Being a people analyst

involves using data analysis to gain insights about employees and how they experience their

work lives. The insights are used to define and create a more productive and empowering
workplace. This can unlock employee potential, motivate people to perform at their best, and

ensure a fair and inclusive company culture.

Data science encompasses machine learning, statistics, and analytics.

An ecosystem is a group of elements that interact with one another.

Dataset - a collection of data that can be manipulated or analyzed as one unit.

Data ecosystems are made up of various elements that interact with one another to produce,

manage, store, organize, analyze, and share data. These elements include hardware and software

tools, and the people who use them. People like you. Data can also be found in something called

the cloud. The cloud is a place to keep data online, rather than on a computer hard drive. So

instead of storing data somewhere inside your organization's network, that data is accessed over

the internet. So the cloud is just a term we use to describe the virtual location.

Data science is defined as the creation of new ways of modeling and understanding the unknown

by using raw data. Data scientists create new questions using data while analysts find answers to

existing questions by creating insights from a data source.

Data-driven decision-making is defined as using facts to guide business strategy.

Subject matter experts - people who can look at the results of data analysis and identify any inconsistencies, make sense of grey areas, and eventually validate the choices being made.

Data analysis life cycle - the process of going from data to decision. Data goes through several

phases as it gets created, consumed, tested, processed, and reused. With a life cycle model, all

key team members can drive success by planning work both upfront and at the end of the data

analysis process. While the data analysis life cycle is well known among experts, there isn't a

single defined structure of those phases. There might not be one single architecture that’s

uniformly followed by every data analysis expert, but there are some shared fundamentals in

every data analysis process.

This reading provides an overview of several of these processes, starting with the process that forms the foundation

of the Google Data Analytics Certificate.

The process presented as part of the Google Data Analytics Certificate will be valuable to you as

you keep moving forward in your career:

● Ask: Business Challenge/Objective/Question

● Prepare: Data generation, collection, storage, and data management

● Process: Data cleaning/data integrity

● Analyze: Data exploration, visualization, and analysis

● Share: Communicating and interpreting results

● Act: Putting your insights to work to solve the problem

Understanding this process—and all of the iterations that helped make it popular—will be a big

part of guiding your analysis and your work in this program.

Week 2

Analytical skills - qualities and characteristics associated with solving problems using facts.

Aspects of analytical skills are:


● Curiosity: a desire to know more about something, and ask the right questions

● Understanding context: understanding where information fits into the “big picture”.

Context is the condition in which something exists or happens

● Having a technical mindset: breaking big things into smaller steps

● Data design: thinking about how to organize data and information

● Data strategy: the management of the people, processes, and tools used in data analysis

Analytical thinking involves identifying and defining a problem and then solving it by using data

in an organized step-by-step manner. Aspects of analytical thinking are:

● Visualization - the graphical representation of information. Examples include graphs, maps, or other design elements

● Strategy

● Problem-orientation

● Correlation (correlation does not equal causation)

● Big-picture and detail-oriented thinking

Some of the questions data analysts ask when they’re on the hunt for a solution:

● What is the root cause of the problem? A root cause is a reason why a problem occurs. If

we can identify and get rid of a root cause, we can prevent that problem from happening

again
● Where are the gaps in our process? For this, gap analysis is used. Gap analysis is a

method for examining and evaluating how a process works currently to get to where you

want to be in the future

● What did we not consider before? This is a great way to think about what information or

procedure might be missing from a process, so you can identify ways to make better

decisions and strategies moving forward

Nonprofits are organizations dedicated to advancing a social cause or advocating for a particular effort.

Week 3

The life cycle of data is:

● Planning - during planning, a business decides what kind of data it needs, how it will be

managed throughout its life cycle, who will be responsible for it, and the optimal

outcomes.

● Capture - this is where data is collected from a variety of different sources and brought

into the organization. One common method is getting data from outside resources.

Another way to get data is from a company’s documents and files, which are usually

stored inside a database (a collection of data stored in a computer system)

● Manage - here we’re talking about how we care for our data, how and where it's stored, the tools used to keep it safe and secure, and the actions taken to make sure it's maintained properly
● Analyze - this is where data analysis shines. In this phase, the data is used to solve

problems, make great decisions, and support business goals

● Archive - archiving means storing data in a place where it's still available, but may not be

used again

● Destroy - to destroy data stored on multiple hard drives, secure data erasure software is

used. If there were any paper files, they would be shredded too. This is important for

protecting a company’s private information, as well as private data about its customers

Note: Be careful not to mix up or confuse the six stages of the data life cycle (Plan, Capture,

Manage, Analyze, Archive, and Destroy) with the six phases of the data analysis life cycle (Ask,

Prepare, Process, Analyze, Share, and Act). They shouldn't be used or referred to

interchangeably.

How the data analysis process guides this program

Learn about the process through the program:

1. Learn more about the Ask phase of the process in the Ask Questions to Make Data-Driven Decisions course.

2. Learn more about the Prepare phase of the process in the Prepare Data for Exploration course.

3. Learn more about the Process phase of the process in the Process Data from Dirty to Clean course.

4. Learn more about the Analyze phase of the process in the Analyze Data to Answer Questions and Data Analysis with R Programming courses.

5. Learn more about the Share phase of the process in the Share Data Through the Art of Visualization and Data Analysis with R Programming courses.

6. Learn more about the Act phase of the process in the Google Data Analytics Capstone: Complete a Case Study course.

Common tools data analysts use

1. Spreadsheets - there are lots of different spreadsheet solutions, but two popular options are

Microsoft Excel and Google Sheets. To put it simply, a spreadsheet is a digital worksheet. It

stores, organizes, and sorts data. This is important because the usefulness of your data depends

on how well it's structured. When you put your data into a spreadsheet, you can see patterns,

group information, and easily find the information you need. Spreadsheets also have some really

useful features called formulas (a set of instructions that performs a specific calculation using the

data in a spreadsheet) and functions (a preset command that automatically performs a specific

process or task using the data in a spreadsheet).


Spreadsheets structure data in a meaningful way by letting you

• Collect, store, organize, and sort information

• Identify patterns and piece the data together in a way that works for each specific

data project

• Create excellent data visualizations, like graphs and charts.

2. Query languages - computer programming languages that allow you to retrieve and

manipulate data from a database. You'll learn something called structured query language, more

commonly known as SQL. SQL is a language that lets data analysts communicate with a

database. SQL is the most widely used structured query language for a couple of reasons. It's

easy to understand and works very well with all kinds of databases. With SQL, data analysts can

access the data they need by making a query. Although query means to question, I like to think of

it as more of a request. So you're requesting that the database do something for you. You can ask

it to do a lot of different things such as insert, delete, select, or update data.

Some popular Structured Query Language (SQL) programs include MySQL, Microsoft SQL

Server, and BigQuery.

Query languages

• Allow analysts to isolate specific information from a database(s)

• Make it easier for you to learn and understand the requests made to databases

• Allow analysts to select, create, add, or download data from a database for

analysis
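The four requests mentioned above (insert, update, delete, select) can be tried without installing a database server. The following is only a minimal sketch using Python's built-in sqlite3 module; the employees table and its rows are made up for illustration and are not from the course.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table, invented for this sketch.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# INSERT: add two rows.
conn.execute("INSERT INTO employees (id, name, dept) VALUES (1, 'Ada', 'Finance')")
conn.execute("INSERT INTO employees (id, name, dept) VALUES (2, 'Grace', 'Sales')")

# UPDATE: change a value in an existing row.
conn.execute("UPDATE employees SET dept = 'Marketing' WHERE id = 2")

# DELETE: remove a row.
conn.execute("DELETE FROM employees WHERE id = 1")

# SELECT: ask the database what is left.
remaining = conn.execute("SELECT id, name, dept FROM employees").fetchall()
print(remaining)
```

Each execute call sends one request to the database, which matches the idea of a query as a request rather than a question.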

3. Visualization tools - data visualization is the graphical representation of information. Some

examples include graphs, maps, and tables. Most people process visuals more easily than words

alone. That's why visualizations are so important. They help data analysts communicate their
insights to others in a compelling way. When you think about the data analysis process, after

data is prepared, processed, and analyzed, the insights are visualized so they can be understood

and shared. Some popular visualization tools are Tableau and Looker. Data analysts like using

Tableau because it helps them create visuals that are very easy to understand. This means that

even non-technical users can get the information they need. Looker is also popular with data

analysts because it gives them an easy way to create visuals based on the results of a query. With

Looker, you can give stakeholders a complete picture of your work by showing them

visualization data and the actual data related to it.

These tools

• Turn complex numbers into a story that people can understand

• Help stakeholders come up with conclusions that lead to informed decisions and

effective business strategies

• Have multiple features

- Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets

- Looker communicates directly with a database, allowing you to connect your data right to the visual tool you choose

A career as a data analyst also involves using programming languages, like R and Python, which

are used a lot for statistical analysis, visualization, and other data analysis.

Choosing the right tool for the job


As a data analyst, you will usually have to decide which program or solution is right for the

particular project you are working on. In this reading, you will learn more about how to choose

which tool you need and when.

Depending on which phase of the data analysis process you’re in, you will need to use different

tools. For example, if you are focusing on creating complex and eye-catching visualizations, then

the visualization tools we discussed earlier are the best choice. But if you are focusing on

organizing, cleaning, and analyzing data, then you will probably be choosing between

spreadsheets and databases using queries.

Differences between spreadsheets and databases

Spreadsheets                               | Databases
-------------------------------------------|--------------------------------------------------------
Software applications                      | Data stores, accessed using a query language (e.g. SQL)
Structure data in a row and column format  | Structure data using rules and relationships
Organize information in cells              | Organize information in complex collections
Provide access to a limited amount of data | Provide access to huge amounts of data
Manual data entry                          | Strict and consistent data entry
Generally one user at a time               | Multiple users
Controlled by the user                     | Controlled by a database management system

You don’t have to choose one or the other because each serves its purpose. Generally, data

analysts work with a combination of the two, as both tools are very useful in data analytics. For

example, you can store data in a database, then export it to a spreadsheet for analysis. Or, if you

are collecting information in a spreadsheet, and it becomes too much for that particular platform,

you can import it into a database. And, later in this course, you will learn about programming

languages like R that give you even greater control of your data, its analysis, and the

visualizations you create.
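As a rough illustration of storing data in a database and then exporting it for a spreadsheet, the sketch below queries a small SQLite table and writes the result as CSV text, which Google Sheets or Excel can open. The sales table, its rows, and the use of an in-memory buffer instead of a real file are all assumptions made for this example.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this sketch.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 120), ("South", 95)])

cursor = conn.execute("SELECT region, amount FROM sales ORDER BY region")
header = [col[0] for col in cursor.description]  # column names from the query

buffer = io.StringIO()  # stands in for a .csv file on disk
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(cursor)  # one CSV line per result row

csv_text = buffer.getvalue()
print(csv_text)
```

Going the other way, a spreadsheet saved as CSV could be read with csv.reader and loaded with executemany.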

Week 4

Mastering Spreadsheets basics

More spreadsheet resources

In the spirit of lifelong learning, it is good to have resources to turn to when you want to know

more about using spreadsheets. Two of the most well-known and used spreadsheet platforms are

Google Sheets and Microsoft Excel. Both provide free online training resources that you can

access anytime you need them. Bookmark these links if you want to access them later.

Google Sheets Training and Help

Learn even more ways to move, store, and analyze your data with the Google Sheets Training

and Help page, located in the Google Workspace Learning Center. This hub offers an expanded

list of tips, from beginner to advanced, along with cheat sheets, templates, guides, and tutorials.

Google Sheets Cheat Sheet

Want to learn more about Google Sheets? This online help article features a short list of the most

important functions you will use, including rows, columns, cells, and functions.
Microsoft Excel for Windows Training

Get to know Excel spreadsheets a little better by visiting this free online training center. Offering

everything from a quick-start guide and introduction to tutorials and templates, you will find

everything you need to know, all in one place.

SQL

Remember that SQL can do lots of the same things with data that spreadsheets can do. You can use it to store, organize, and analyze your data, among other things. Databases, however, work on a larger scale; think of them as supersized spreadsheets. For example, you can use a spreadsheet when you have a small dataset, say one with just 100 rows, and SQL when you have a larger dataset. To use SQL, you need a place where the SQL language is understood. Several databases use SQL, including Oracle, MySQL, PostgreSQL, and Microsoft SQL Server. No matter which database you use, the core of SQL works in much the same way in each. In SQL, a query is a request for data or information from a database.

With a query, we can select specific data from a table, and by adding WHERE we can filter the data based on certain conditions.

SQL Guide: Getting started

Just as humans use different languages to communicate with others, so do computers. Structured

Query Language (or SQL, often pronounced “sequel”) enables data analysts to talk to their

databases. SQL is one of the most useful data analyst tools, especially when working with large

datasets in tables. It can help you investigate huge databases, track down text (referred to as

strings) and numbers, and filter for the exact kind of data you need—much faster than a

spreadsheet can.
If you haven’t used SQL before, this reading will help you learn the basics so you can appreciate

how useful SQL is and how useful SQL queries are in particular. You will be writing SQL

queries in no time at all.

What is a query?

A query is a request for data or information from a database. When you query databases, you use

SQL to communicate your question or request. You and the database can always exchange

information as long as you speak the same language.

Every programming language, including SQL, follows a unique set of guidelines known as

syntax. Syntax is the predetermined structure of a language that includes all required words,

symbols, and punctuation, as well as their proper placement. As soon as you enter your search

criteria using the correct syntax, the query starts working to pull the data you’ve requested from

the target database.

Basic SQL queries follow the same structure:

• Use SELECT to choose the columns you want to return.

• Use FROM to choose the tables where the columns you want are located.

• Use WHERE to filter for certain information.

A SQL query is like filling in a template. If you are writing a SQL query from scratch, it is helpful to start by writing the SELECT, FROM, and WHERE keywords, each on its own line.

Next, enter the table name after FROM; the table columns you want after SELECT; and, finally, the conditions you want to place on your query after WHERE. Put each of these on a new, indented line beneath its keyword.

Following this method each time makes it easier to write SQL queries. It can also help you make fewer syntax errors.
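The template approach can be sketched as follows. This is only an illustration run with Python's built-in sqlite3 module; the orders table, its columns, and its rows are hypothetical and were invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this sketch.
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "shipped"), (2, "pending"), (3, "shipped")],
)

# Step 1: start from the bare keywords, one per line:
#   SELECT
#   FROM
#   WHERE
# Step 2: fill in the columns, the table, and the condition,
# each on a new indented line under its keyword.
query = """
SELECT
    order_id
FROM
    orders
WHERE
    status = 'shipped'
"""

shipped = conn.execute(query).fetchall()
print(shipped)
```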

Example of a query

Here is how a simple query would appear in BigQuery, a data warehouse on the Google Cloud

Platform.
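The query itself did not survive in these notes. As a minimal sketch, the block below reconstructs it from the description that follows and runs it with Python's built-in sqlite3 module rather than BigQuery. The ATTACH step only imitates BigQuery's dataset.table naming, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: an attached in-memory schema stands in for the customer_data dataset.
conn.execute("ATTACH DATABASE ':memory:' AS customer_data")
conn.execute(
    "CREATE TABLE customer_data.customer_name "
    "(customer_id INTEGER, first_name TEXT, last_name TEXT)"
)

# Made-up sample rows.
conn.executemany(
    "INSERT INTO customer_data.customer_name VALUES (?, ?, ?)",
    [
        (1967, "Tony", "Magnolia"),
        (7689, "Tony", "Magnolia"),
        (3021, "Tony", "Rivera"),
        (4410, "Ana", "Silva"),
    ],
)

# Reconstructed query: one column, one table, one condition.
query = """
SELECT first_name
FROM customer_data.customer_name
WHERE first_name = 'Tony'
"""

tony_rows = conn.execute(query).fetchall()
for (first_name,) in tony_rows:
    print(first_name)
```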

The above query uses three commands to locate customers with the first name Tony:

1. SELECT the column named first_name

2. FROM a table named customer_name (in a dataset named customer_data). (The dataset name is always followed by a dot, and then the table name.)

3. But only return the data WHERE the first_name is Tony

The results from the query might be similar to the following:

first_name

Tony

Tony

Tony

As you can conclude, this query had the correct syntax but wasn't very useful after the data was

returned.

Multiple columns in a query

In real life, you will need to work with more data beyond customers named Tony. Multiple

columns that are chosen by the same SELECT command can be indented and grouped.

If you are requesting multiple data fields from a table, you need to include these columns in your

SELECT command. Each column is separated by a comma as shown below:

Here is an example of how it would appear in BigQuery:
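The multi-column query is also missing from these notes. The sketch below rebuilds it from the description that follows, again using Python's sqlite3 module in place of BigQuery, with an attached schema imitating the customer_data dataset and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: an attached in-memory schema stands in for the customer_data dataset.
conn.execute("ATTACH DATABASE ':memory:' AS customer_data")
conn.execute(
    "CREATE TABLE customer_data.customer_name "
    "(customer_id INTEGER, first_name TEXT, last_name TEXT)"
)

# Made-up sample rows.
conn.executemany(
    "INSERT INTO customer_data.customer_name VALUES (?, ?, ?)",
    [
        (1967, "Tony", "Magnolia"),
        (7689, "Tony", "Magnolia"),
        (3021, "Tony", "Rivera"),
        (4410, "Ana", "Silva"),
    ],
)

# Columns are listed under SELECT, indented and separated by commas.
query = """
SELECT
    customer_id,
    first_name,
    last_name
FROM customer_data.customer_name
WHERE first_name = 'Tony'
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```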


The above query uses three commands to locate customers with the first name Tony.

1. SELECT the columns named customer_id, first_name, and last_name

2. FROM a table named customer_name (in a dataset named customer_data). (The dataset name is always followed by a dot, and then the table name.)

3. But only return the data WHERE the first_name is Tony

The only difference between this query and the previous one is that more data columns are

selected. The previous query selected first_name only while this query selects customer_id and

last_name in addition to first_name. In general, it is a more efficient use of resources to select

only the columns that you need. For example, it makes sense to select more columns if you will

use the additional fields in your WHERE clause. If you have multiple conditions in your

WHERE clause, they may be written like this:

Notice that, unlike the SELECT command which uses a comma to separate

fields/variables/parameters, the WHERE command uses the AND statement to connect

conditions. As you become a more advanced writer of queries, you will make use of other

connectors/operators such as OR and NOT.

Here is a BigQuery example with multiple fields used in a WHERE clause:
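This example is missing from the notes as well. A minimal reconstruction based on the description that follows is sketched below, run with Python's sqlite3 module instead of BigQuery. The attached schema and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: an attached in-memory schema stands in for the customer_data dataset.
conn.execute("ATTACH DATABASE ':memory:' AS customer_data")
conn.execute(
    "CREATE TABLE customer_data.customer_name "
    "(customer_id INTEGER, first_name TEXT, last_name TEXT)"
)

# Made-up sample rows.
conn.executemany(
    "INSERT INTO customer_data.customer_name VALUES (?, ?, ?)",
    [
        (1967, "Tony", "Magnolia"),
        (7689, "Tony", "Magnolia"),
        (3021, "Tony", "Rivera"),
        (4410, "Ana", "Silva"),
    ],
)

# Conditions in WHERE are joined with AND, one per line.
query = """
SELECT
    customer_id,
    first_name,
    last_name
FROM customer_data.customer_name
WHERE
    customer_id > 0
    AND first_name = 'Tony'
    AND last_name = 'Magnolia'
"""

matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```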

The above query uses three commands to locate customers with a valid (greater than 0) customer

ID whose first name is Tony and whose last name is Magnolia.

1. SELECT the columns named customer_id, first_name, and last_name

2. FROM a table named customer_name (in a dataset named customer_data). (The dataset name is always followed by a dot, and then the table name.)

3. But only return the data WHERE customer_id is greater than 0, first_name is Tony, and last_name is Magnolia.


Note that one of the conditions is a logical condition that checks to see if customer_id is greater

than zero.

If only one customer is named Tony Magnolia, the results from the query could be:

customer_id | first_name | last_name
------------|------------|----------
1967        | Tony       | Magnolia

If more than one customer has the same name, the results from the query could be:

customer_id | first_name | last_name
------------|------------|----------
1967        | Tony       | Magnolia
7689        | Tony       | Magnolia

Key takeaway

The most important thing to remember is how to use SELECT, FROM, and WHERE in a query.

Queries with multiple fields will become simpler after you practice writing your own SQL

queries later in the program.


Endless SQL possibilities

You have learned that a SQL query uses SELECT, FROM, and WHERE to specify the data to be

returned from the query. This reading provides more detailed information about formatting

queries, using WHERE conditions, selecting all columns in a table, adding comments, and using

aliases. All of these make it easier for you to understand (and write) queries to put SQL in action.

The last section of this reading provides an example of what a data analyst would do to pull

employee data for a project.

Capitalization, indentation, and semicolons

You can write your SQL queries in all lowercase and don’t have to worry about extra spaces

between words. However, using capitalization and indentation can help you read the information

more easily. Keep your queries neat, and they will be easier to review or troubleshoot if you need

to check them later on.
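The statement this section refers to is not reproduced in these notes. As a sketch, the block below uses the column name field1 and the table name table mentioned later in the reading and runs the statement with Python's sqlite3 module. The condition value and the sample rows are assumptions, and because table is a reserved word in SQLite it is wrapped in double quotes here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "table" is a reserved word, so it is quoted; the rows are made up.
conn.execute('CREATE TABLE "table" (field1 TEXT)')
conn.executemany(
    'INSERT INTO "table" VALUES (?)',
    [("Chavez",), ("Chen",), ("Lopez",)],
)

# Capitalized keywords, one clause per line, ending with a semicolon.
neat = """
SELECT field1
FROM "table"
WHERE field1 = 'Chavez';
"""

# The same statement in lowercase on one line, without the semicolon.
compact = "select field1 from \"table\" where field1 = 'Chavez'"

neat_rows = conn.execute(neat).fetchall()
compact_rows = conn.execute(compact).fetchall()
print(neat_rows == compact_rows)
```

SQLite accepts both forms, so the choice between them is about readability, not correctness.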

Notice that the SQL statement shown above has a semicolon at the end. The semicolon is a

statement terminator and is part of the American National Standards Institute (ANSI) SQL-92

standard, which is a recommended common syntax for adoption by all SQL databases. However,

not all SQL databases have adopted or enforced the semicolon, so you may come across some

SQL statements that aren’t terminated with a semicolon. If a statement works without a

semicolon, it’s fine.

WHERE conditions

In the query shown above, the SELECT clause identifies the column you want to pull data from

by name, field1, and the FROM clause identifies the table where the column is located by name,

table. Finally, the WHERE clause narrows your query so that the database returns only the data

with an exact value match or the data that match a certain condition that you want to satisfy.
For example, if you are looking for a specific customer with the last name Chavez, the WHERE

clause would be:

WHERE field1 = 'Chavez'

However, if you are looking for all customers with a last name that begins with the letters "Ch",

the WHERE clause would be:

WHERE field1 LIKE 'Ch%'

You can conclude that the LIKE operator is very powerful because it allows you to tell the database

to look for a certain pattern! The percent sign (%) is used as a wildcard to match zero or more

characters. In the example above, both Chavez and Chen would be returned. Note that in some

databases an asterisk (*) is used as the wildcard instead of a percent sign (%).
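
The two WHERE conditions above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as the database, and the table name customer_table and the sample last names are hypothetical:

```python
import sqlite3

# In-memory database; customer_table and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (field1 TEXT)")
conn.executemany(
    "INSERT INTO customer_table (field1) VALUES (?)",
    [("Chavez",), ("Chen",), ("Lopez",)],
)

# Exact match: only rows whose field1 equals 'Chavez'.
exact = conn.execute(
    "SELECT field1 FROM customer_table WHERE field1 = 'Chavez'"
).fetchall()

# Pattern match: % stands for whatever characters follow 'Ch'.
pattern = conn.execute(
    "SELECT field1 FROM customer_table WHERE field1 LIKE 'Ch%' ORDER BY field1"
).fetchall()

print(exact)    # [('Chavez',)]
print(pattern)  # [('Chavez',), ('Chen',)]
```

The ORDER BY is added only so the output order is predictable. It is not part of the WHERE examples in the reading.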

SELECT all columns

Can you use SELECT *?

In the example, if you replace SELECT field1 with SELECT *, you would be selecting all of the

columns in the table instead of the field1 column only. From a syntax point of view, it is a correct

SQL statement, but you should use the asterisk (*) sparingly and with caution. Depending on

how many columns a table has, you could be selecting a tremendous amount of data. Selecting

too much data can cause a query to run slowly.
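
A small sketch of the difference between naming a column and using the asterisk, again assuming Python's sqlite3 module. The extra columns city and phone, and the sample row, are hypothetical:

```python
import sqlite3

# Hypothetical three-column table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (field1 TEXT, city TEXT, phone TEXT)")
conn.execute("INSERT INTO customer_table VALUES ('Chavez', 'Austin', '555-0100')")

# Naming the column returns one value per row.
one_column = conn.execute("SELECT field1 FROM customer_table").fetchall()

# The asterisk returns every column in the table.
all_columns = conn.execute("SELECT * FROM customer_table").fetchall()

print(len(one_column[0]))   # 1
print(len(all_columns[0]))  # 3
```

With three columns the difference is negligible. The caution in the text applies to wide tables with many rows.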

Comments

Some tables aren’t designed with descriptive enough naming conventions. In the example, field1

was the column for a customer’s last name, but you wouldn’t know it by the name. A better

name would have been something such as last_name. In these cases, you can place comments

alongside your SQL to help you remember what the name represents. Comments are text placed

between certain characters, /* and */, or after two dashes (--) as shown below.

Comments can also be added outside of a statement as well as within a statement. You can use

this flexibility to provide an overall description of what you are going to do, step-by-step notes

about how you achieve it, and why you set different parameters/conditions.

The more comfortable you get with SQL, the easier it will be to read and understand queries at a

glance. Still, it never hurts to have comments in a query to remind yourself of what you’re trying

to do. This also makes it easier for others to understand your query if your query is shared. As

your queries become more and more complex, this practice will save you a lot of time and

energy when you revisit queries you wrote months or years ago.

Example of a query with comments

Here is an example of how comments could be written in BigQuery:

In the above example, a comment has been added before the SQL statement to explain what the

query does. Additionally, a comment has been added next to each of the column names to

describe the column and its use. Two dashes (--) are generally supported, so it is best to use --

and be consistent with it. You can use # in place of -- in the above query, but # is not recognized

in all SQL versions; for example, PostgreSQL doesn’t recognize #. You can also place comments

between /* and */ if the database you are using supports it.
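
A commented query along the lines described above can be sketched as follows. This is an illustration only: it runs against SQLite through Python's sqlite3 module rather than BigQuery, and the table customer_table, the column field2, and the sample row are hypothetical:

```python
import sqlite3

# Hypothetical two-column table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (field1 TEXT, field2 TEXT)")
conn.execute("INSERT INTO customer_table VALUES ('Chavez', 'Maria')")

query = """
-- Pull the last and first names of all customers
SELECT
    field1,  -- last name
    field2   /* first name */
FROM
    customer_table;
"""

# The database ignores the comments and runs the statement as usual.
rows = conn.execute(query).fetchall()
print(rows)  # [('Chavez', 'Maria')]
```

Both comment styles appear here only to show that they can be mixed. In practice, picking one style and sticking with it reads more cleanly.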

As you develop your skills professionally, depending on the SQL database you use, you can pick

the appropriate comment delimiting symbols you prefer and stick with those as a consistent style.

As your queries become more and more complex, the practice of adding helpful comments will

save you a lot of time and energy to understand queries that you may have written months or

years prior.

Aliases

You can also make it easier on yourself by assigning a new name or alias to the column or table

names to make them easier to work with (and avoid the need for comments). This is done with a

SQL AS clause. In the example below, the alias last_name has been assigned to field1 and the

alias customers has been assigned to the table. These aliases are good for the duration of the query only.

An alias doesn’t change the actual name of a column or table in the database.

Example of a query with aliases
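
A minimal sketch of such a query, assuming Python's sqlite3 module as the database and a hypothetical table named customer_table. It also checks that the alias applies to the query result only and leaves the stored column name unchanged:

```python
import sqlite3

# Hypothetical table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_table (field1 TEXT)")
conn.execute("INSERT INTO customer_table VALUES ('Chavez')")

# field1 is aliased as last_name and the table is aliased as customers.
cursor = conn.execute(
    """
    SELECT customers.field1 AS last_name
    FROM customer_table AS customers
    """
)
result_columns = [col[0] for col in cursor.description]

# Column names as stored in the database (second field of each PRAGMA row).
stored_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer_table)")]

print(result_columns)  # ['last_name']
print(stored_columns)  # ['field1']
```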

Putting SQL to work as a data analyst

Imagine you are a data analyst for a small business and your manager asks you for some

employee data. You decide to write a query with SQL to get what you need from the database.

You want to pull all the columns: empID, firstName, lastName, jobCode, and salary. Because

you know the database isn’t that big, instead of entering each column name in the SELECT

clause, you use SELECT *. This will select all the columns from the Employee table in the

FROM clause.

Now, you can get more specific about the data you want from the Employee table. If you want all

the data about employees working in the SFI job code, you can use a WHERE clause to filter out

the data based on this additional requirement.

Here, you use:
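
A sketch of the query this scenario describes, using SELECT *, the Employee table, and a WHERE filter on the SFI job code. It assumes Python's sqlite3 module as the database. The rows are taken from the result tables in this reading, and the column types are an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the reading; the TEXT/INTEGER types are assumed.
conn.execute(
    "CREATE TABLE Employee "
    "(empID TEXT, firstName TEXT, lastName TEXT, jobCode TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("0002", "Homer", "Simpson", "SFI", 15000),
        ("0003", "Marge", "Simpson", "SFI", 30000),
        ("0034", "Bart", "Simpson", "SFI", 25000),
        ("0067", "Lisa", "Simpson", "SFI", 38000),
        ("0088", "Ned", "Flanders", "SFI", 42000),
        ("0076", "Barney", "Gumble", "SFI", 32000),
        ("0108", "Edna", "Krabappel", "TUL", 18000),
        ("0099", "Moe", "Szyslak", "ANA", 28000),
    ],
)

# All columns, but only the rows with the SFI job code.
sfi_rows = conn.execute("SELECT * FROM Employee WHERE jobCode = 'SFI'").fetchall()

for row in sfi_rows:
    print(row)
```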

A portion of the resulting data returned from the SQL query might look like this:

empID    firstName    lastName    jobCode    salary
0002     Homer        Simpson     SFI        15000
0003     Marge        Simpson     SFI        30000
0034     Bart         Simpson     SFI        25000
0067     Lisa         Simpson     SFI        38000
0088     Ned          Flanders    SFI        42000
0076     Barney       Gumble      SFI        32000

Suppose you notice a large salary range for the SFI job code. You might like to flag all

employees in all departments with lower salaries for your manager. Because interns are also

included in the table and they have salaries less than $30,000, you want to make sure your results

give you only the full-time employees with salaries that are $30,000 or less. In other words, you

want to exclude interns with the INT job code who also earn less than $30,000. The AND operator

enables you to test for both conditions.

You create a SQL query similar to below, where <> means "does not equal":
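
A sketch of a query that combines the two conditions with AND, assuming Python's sqlite3 module as the database. The non-intern rows come from the tables in this reading. The intern row (empID 9001) is hypothetical and is included only so there is something for the <> condition to exclude:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the reading; the TEXT/INTEGER types are assumed.
conn.execute(
    "CREATE TABLE Employee "
    "(empID TEXT, firstName TEXT, lastName TEXT, jobCode TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("0002", "Homer", "Simpson", "SFI", 15000),
        ("0003", "Marge", "Simpson", "SFI", 30000),
        ("0034", "Bart", "Simpson", "SFI", 25000),
        ("0067", "Lisa", "Simpson", "SFI", 38000),
        ("0088", "Ned", "Flanders", "SFI", 42000),
        ("0076", "Barney", "Gumble", "SFI", 32000),
        ("0108", "Edna", "Krabappel", "TUL", 18000),
        ("0099", "Moe", "Szyslak", "ANA", 28000),
        ("9001", "Example", "Intern", "INT", 12000),  # hypothetical intern row
    ],
)

# Keep a row only if it is not an intern AND the salary is $30,000 or less.
low_paid = conn.execute(
    "SELECT * FROM Employee WHERE jobCode <> 'INT' AND salary <= 30000"
).fetchall()

for row in low_paid:
    print(row)
```

Both conditions must be true for a row to be returned, so the intern is dropped despite the low salary, and higher-paid employees are dropped despite not being interns.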

The resulting data from the SQL query might look like the following (interns with the job code

INT aren't returned):

empID    firstName    lastName     jobCode    salary
0002     Homer        Simpson      SFI        15000
0003     Marge        Simpson      SFI        30000
0034     Bart         Simpson      SFI        25000
0108     Edna         Krabappel    TUL        18000
0099     Moe          Szyslak      ANA        28000

With quick access to this kind of data using SQL, you can provide your manager with tons of

different insights about employee data, including whether employee salaries across the business

are equitable. Fortunately, the query shows only an additional two employees might need a salary

adjustment and you share the results with your manager.

Pulling the data, analyzing it, and implementing a solution might ultimately help improve

employee satisfaction and loyalty. That makes SQL a pretty powerful tool.

Resources to learn more

Nonsubscribers may access these resources for free, but if a site limits the number of free articles

per month and you already reached your limit, bookmark the resource and come back to it later.

• W3Schools SQL Tutorial: If you would like to explore a detailed tutorial of SQL,

this is the perfect place to start. This tutorial includes interactive examples you can edit, test, and

recreate. Use it as a reference or complete the whole tutorial to practice using SQL. Click the

green Start learning SQL now button or the Next button to begin the tutorial.

• SQL Cheat Sheet: For more advanced learners, go through this article for standard

SQL syntax used in PostgreSQL. By the time you are finished, you will know a lot more about

SQL and will be prepared to use it for business analysis and other tasks.

Planning a data visualization

Earlier, you learned that data visualization is the graphical representation of information. As a

data analyst, you will want to create visualizations that make your data easy to understand and

interesting to look at. Because of the importance of data visualization, most data analytics tools

(such as spreadsheets and databases) have a built-in visualization component while others (such

as Tableau) specialize in visualization as their primary value-add. In this reading, you will
explore the steps involved in the data visualization process and a few of the most common data

visualization tools available.

Steps to planning a data visualization

Let’s go through an example of a real-life situation where a data analyst might need to create a

data visualization to share with stakeholders. Imagine you’re a data analyst for a clothing

distributor. The company helps small clothing stores manage their inventory, and sales are

booming. One day, you learn that your company is getting ready to make a major update to its

website. To guide decisions for the website update, you’re asked to analyze data from the

existing website and sales records. Let’s go through the steps you might follow.

Step 1: Explore the data for patterns

First, you ask your manager or the data owner for access to the current sales records and website

analytics reports. This includes information about how customers behave on the company’s

existing website, basic information about who visited, who bought from the company, and how

much they bought.

While reviewing the data you notice a pattern among those who visit the company’s website

most frequently: geography and larger amounts spent on purchases. With further analysis, this

information might explain why sales are so strong right now in the northeast—and help your

company find ways to make them even stronger through the new website.

Step 2: Plan your visuals

Next, it is time to refine the data and present the results of your analysis. Right now, you have a

lot of data spread across several different tables, which isn’t an ideal way to share your results

with management and the marketing team. You will want to create a data visualization that
explains your findings quickly and effectively to your target audience. Since you know your

audience is sales oriented, you already know that the data visualization you use should:

• Show sales numbers over time

• Connect sales to location

• Show the relationship between sales and website use

• Show which customers fuel growth

Step 3: Create your visuals

Now that you have decided what kind of information and insights you want to display, it is time

to start creating the actual visualizations. Keep in mind that creating the right visualization for a

presentation or sharing it with stakeholders is a process. It involves trying different visualization

formats and making adjustments until you get what you are looking for. In this case, a mix of

different visuals will best communicate your findings and turn your analysis into the most

compelling story for stakeholders. So, you can use the built-in chart capabilities in your

spreadsheets to organize the data and create your visuals.

1) line charts can track sales over time

2) maps can connect sales to locations

3) donut charts can show customer segments

4) bar charts can compare the total visitors that make a purchase

Build your data visualization toolkit

There are many different tools you can use for data visualization.

• You can use the visualization tools in your spreadsheet to create simple

visualizations such as line and bar charts.


• You can use more advanced tools such as Tableau that allow you to integrate data

into dashboard-style visualizations.

• If you’re working with the programming language R you can use the visualization

tools in RStudio.

Your choice of visualization will be driven by a variety of factors including the size of your data,

and the process you used for analyzing your data (spreadsheet, databases/queries, or

programming languages). For now, just consider the basics.

Spreadsheets (Microsoft Excel or Google Sheets)

In our example, the built-in charts and graphs in spreadsheets made the process of creating

visuals quick and easy. Spreadsheets are great for creating simple visualizations like bar graphs

and pie charts, and even provide some advanced visualizations like maps, and waterfall and

funnel diagrams (shown in the following figures).

But sometimes you need a more powerful tool to truly bring your data to life. Tableau and

RStudio are two examples of widely used platforms that can help you plan, create, and present

compelling data visualizations.

Visualization software (Tableau)

Tableau is a popular data visualization tool that lets you pull data from nearly any system and

turn it into compelling visuals or actionable insights. The platform offers built-in visual best

practices, which makes analyzing and sharing data fast, easy, and (most importantly) useful.

Tableau works well with a wide variety of data and includes an interactive dashboard that lets

you and your stakeholders click to explore the data interactively.

You can start exploring Tableau from the How-to Video resources. Tableau Public is free, easy to

use, and full of helpful information. The Resources page is a one-stop shop for how-to videos,
examples, and datasets for you to practice with. To explore what other data analysts are sharing

on Tableau, visit the Viz of the Day page where you will find beautiful visuals ranging from the

Hunt for (Habitable) Planets to Who’s Talking in Popular Films.

Programming language (R with RStudio)

A lot of data analysts work with a programming language called R. Most people who work with

R end up also using RStudio, an integrated development environment (IDE), for their data

visualization needs. As with Tableau, you can create dashboard-style data visualizations using

RStudio.

Check out their website to learn more about RStudio.

You could easily spend days exploring all the resources provided at RStudio.com, but the

RStudio Cheatsheets and the RStudio Visualize Data Primer are great places to start. When you

have more time, check out the webinars and videos which offer advice and helpful perspectives

for both beginners and advanced users.

A pie chart shows how a whole is broken down into parts (e.g., a class broken down by age).

Week 5

An issue is a topic or subject to investigate. A question is designed to discover information. A

problem is an obstacle or complication that needs to be worked out.

Business task - the question or problem that data analysis answers for a business, e.g., analyzing

weather data from the last decade to identify predictable patterns.

Fairness - means ensuring that your analysis doesn’t create or reinforce bias. For example, a
small company with 9 male employees and 2 female employees wants to better understand the
performance of its employees and calls in a data analyst. The analyst reviews the company’s
data, concludes that the men are performing better, and recommends hiring more men. The men
may indeed be performing well, but with a gender ratio this lopsided the comparison is not a
fair one, and the recommendation would reinforce the existing imbalance.

Data ethics - when a choice is made between good, bad, or a combination of consequences based

on facts

Important factors to think about when searching for your dream job

Industry - If you're just starting, a great way to guide your search is to think first about what

you're interested in. The key is to think about your interests early in your job search. That'll lead

you in the right direction, and it will help you in interviews too. Potential employers will want to

know why you're interested in their company, and how you can address their needs, so if you can

speak about your motivation to work in data analytics during interviews, you'll make yourself

stand out in a great way.

Location and Travel - When you start your job search, you need to make some decisions about
where you want to live, so it helps to ask yourself some questions. Does your preferred industry
have opportunities in your area? Are you trying to stay local or would you be happy relocating?
How long are you willing to commute to work every day? Will you drive to work, walk, or take
public transport? Is that possible year-round? How do you feel about working remotely? Does
working from home excite you or bore you? Of course, you'll want to consider the cost of living,
and whether you want the convenience of city living or a quiet suburban home. It's not just
about where you'll be based, either: some jobs may ask you to travel, which could be an exciting
chance to see the world or a deal-breaker. It's all about what you want out of this job, so start
asking yourself some of these questions. Figuring out the answers can help you narrow down
your search even further, so you're only looking at jobs you'd accept.

Culture - At this point, it's a good time to think about your values and what company culture is a
good fit for you. Here come some more questions. Do you work best in a team or by yourself?
Do you like to have a set routine or do you enjoy taking on a new project and trying new things?
Do your values match the company's values? You'll want to pay attention to these things during
your job search and interview process, so you can be sure you are fully invested in the company
you work for.

Data analyst roles and job descriptions

As technology continues to advance, being able to collect and analyze the data from that new

technology has become a huge competitive advantage for a lot of businesses. Everything from

websites to social media feeds is filled with fascinating data that, when analyzed and used

correctly, can help inform business decisions. A company’s ability to thrive now often depends

on how well it can leverage data, apply analytics, and implement new technologies.

This is why skilled data analysts are some of the most sought-after professionals in the world. A

study conducted by IBM estimates that there are over 380,000 job openings in the Data Analytics

field in the United States*. Because the demand is so strong, you’ll be able to find job

opportunities in virtually any industry. Do a quick search on any major job site and you’ll notice

that every type of business from zoos to health clinics, to banks, is seeking talented data

professionals. Even if the job title doesn’t use the exact term “data analyst,” the job description

for most roles involving data analysis will likely include a lot of the skills and qualifications

you’ll gain by the end of this program. In this reading, we’ll explore some of the data

analyst-related roles you might find in different companies and industries.

* Burning Glass data, Feb 1, 2021 - Jan 31, 2022, US

Decoding the job description


The data analyst role is one of many job titles that contain the word “analyst.”

To name a few others that sound similar but may not be the same role:

• Business analyst — analyzes data to help businesses improve processes, products,

or services

• Data analytics consultant — analyzes the systems and models for using data

• Data engineer — prepares and integrates data from different sources for analytical

use

• Data scientist — uses expert skills in technology and social science to find trends

through data analysis

• Data specialist — organizes or converts data for use in databases or software

systems

• Operations analyst — analyzes data to assess the performance of business

operations and workflows

Data analysts, data scientists, and data specialists sound very similar but focus on different tasks.

As you start to browse job listings online, you might notice that companies’ job descriptions

seem to combine these roles or look for candidates who may have overlapping skills. The fact

that companies often blur the lines between them means that you should take special care when

reading the job descriptions and the skills required.

The table below illustrates some of the overlaps and distinctions between them:

We used the role of data specialist as one example of many specializations within data analytics,

but you don’t have to become a data specialist! Specializations can take several different turns.

For example, you could specialize in developing data visualizations and likewise go very deep

into that area.


Job specializations by industry

We learned that the data specialist role concentrates on in-depth knowledge of databases.

Similarly, other specialist roles for data analysts can focus on in-depth knowledge of specific

industries. For example, in a job as a business analyst, you might wear some different hats than

in a more general position as a data analyst. As a business analyst, you would likely collaborate

with managers, share your data findings, and maybe explain how a small change in the

company’s project management system could save the company 3% each quarter. Although you

would still be working with data all the time, you would focus on using the data to improve

business operations, efficiencies, or the bottom line.

Other industry-specific specialist positions that you might come across in your data analyst job

search include:

• Marketing analyst — analyzes market conditions to assess the potential sales of

products and services

• HR/payroll analyst — analyzes payroll data for inefficiencies and errors

• Financial analyst — analyzes financial status by collecting, monitoring, and

reviewing data

• Risk analyst — analyzes financial documents, economic conditions, and client

data to help companies determine the level of risk involved in making a particular business

decision

• Healthcare analyst — analyzes medical data to improve the business aspect of

hospitals and medical facilities

Beyond the Numbers: A Data Analyst Journey


Rather than reading, we invite you to watch Anna Leach's TEDx talk on YouTube or the TED

platform to learn about another interesting journey as a data analyst.

The simplest way to think about decision-making is that it's a choice between consequences:

good, bad, or a combination of both.
