ETL Prepare The Data

The document outlines the comprehensive process of data analysis using Microsoft Power BI, covering data preparation, cleaning, transformation, modeling, visualization, and deployment. It also delves into artificial intelligence concepts, including machine learning, deep learning, and natural language processing, highlighting their applications and advancements. Additionally, it emphasizes the importance of identifying key stakeholders in data analysis projects and provides a framework for engaging them effectively.


Prepare the data

Get data from data sources

 Identify and connect to a data source


 Change data source settings, including credentials, privacy levels, and data source
locations
 Select a shared dataset, or create a local dataset
 Choose between DirectQuery, Import, and Dual mode
 Change the value in a parameter

Clean the data

 Evaluate data, including data statistics and column properties


 Resolve inconsistencies, unexpected or null values, and data quality issues
 Resolve data import errors

Transform and load the data

 Select appropriate column data types


 Create and transform columns
 Transform a query
 Design a star schema that contains facts and dimensions
 Identify when to use reference or duplicate queries and the resulting impact
 Merge and append queries
 Identify and create appropriate keys for relationships
 Configure data loading for queries
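Two of the transformations above are easy to confuse: a merge matches rows across queries on a key (like a join), while an append stacks rows from queries that share the same columns. A minimal pure-Python sketch of the two shapes, using invented table and column names:

```python
# Merge (join on a key) vs. append (stack rows), sketched with plain
# Python structures. Table and column names are invented examples.

sales = [
    {"product_id": 1, "amount": 100},
    {"product_id": 2, "amount": 250},
]
products = {1: "Bike", 2: "Helmet"}  # dimension table keyed on product_id

# Merge: enrich each sales row with the matching product name.
merged = [
    {**row, "product_name": products.get(row["product_id"])}
    for row in sales
]

# Append: stack a second table with the same columns onto the first.
more_sales = [{"product_id": 1, "amount": 75}]
appended = sales + more_sales

print(merged[0]["product_name"])  # Bike
print(len(appended))              # 3
```

In Power Query the same distinction applies: merging widens a table by adding columns from another query, while appending lengthens it by adding rows.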

Model the data

Design and implement a data model

 Configure table and column properties


 Implement role-playing dimensions
 Define a relationship's cardinality and cross-filter direction
 Create a common date table
 Implement row-level security roles

Create model calculations by using DAX

 Create single aggregation measures


 Use CALCULATE to manipulate filters
 Implement time intelligence measures
 Identify implicit measures and replace with explicit measures
 Use basic statistical functions
 Create semi-additive measures
 Create a measure by using quick measures
 Create calculated tables
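To illustrate one of the trickier items above: semi-additive measures (such as account balances) sum across some dimensions but not across time. The Python sketch below mimics that behavior on an invented balances table; in DAX itself this would typically be written with CALCULATE and a time-intelligence filter such as LASTDATE.

```python
# Semi-additive sketch: balances sum across accounts (additive) but
# not across time -- for a period you take the closing balance, not a
# sum over months. The table below is invented for illustration.

balances = [
    {"month": "Jan", "account": "A", "balance": 100},
    {"month": "Feb", "account": "A", "balance": 120},
    {"month": "Jan", "account": "B", "balance": 50},
    {"month": "Feb", "account": "B", "balance": 70},
]

# Additive across accounts: total closing balance for February.
feb_total = sum(r["balance"] for r in balances if r["month"] == "Feb")

# Wrong for time: summing every row (Jan + Feb) double-counts money
# that was simply still in the account the next month.
naive_total = sum(r["balance"] for r in balances)

print(feb_total)    # 190 -- the meaningful period-end figure
print(naive_total)  # 340 -- why the measure must not add across time
```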

Optimize model performance


 Improve performance by identifying and removing unnecessary rows and columns
 Identify poorly performing measures, relationships, and visuals by using Performance
Analyzer
 Improve performance by choosing optimal data types
 Improve performance by summarizing data

Visualize and analyze the data

Create reports

 Identify and implement appropriate visualizations


 Format and configure visualizations
 Use a custom visual
 Apply and customize a theme
 Configure conditional formatting
 Apply slicing and filtering
 Configure the report page
 Use the Analyze in Excel feature
 Choose when to use a paginated report

Enhance reports for usability and storytelling

 Configure bookmarks
 Create custom tooltips
 Edit and configure interactions between visuals
 Configure navigation for a report
 Apply sorting
 Configure sync slicers
 Group and layer visuals by using the Selection pane
 Drill down into data using interactive visuals
 Configure export of report content, and perform an export
 Design reports for mobile devices
 Incorporate the Q&A feature in a report

Identify patterns and trends

 Use the Analyze feature in Power BI


 Use grouping, binning, and clustering
 Use AI visuals
 Use reference lines, error bars, and forecasting
 Detect outliers and anomalies
 Create and share scorecards and metrics
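Of the techniques above, binning is the simplest to sketch: numeric values are grouped into fixed-width ranges. A pure-Python illustration with invented values (Power BI handles this through its built-in grouping and binning options):

```python
# Fixed-width binning: group numeric values into ranges of width 10.
# The values and bin size are arbitrary illustrations.

values = [3, 7, 12, 18, 25, 31]
bin_size = 10

bins = {}
for v in values:
    start = (v // bin_size) * bin_size   # bin lower edge: 0, 10, 20, ...
    bins.setdefault(start, []).append(v)

print(sorted(bins))  # [0, 10, 20, 30]
print(bins[10])      # [12, 18]
```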

Deploy and maintain assets

Create and manage workspaces and assets

 Create and configure a workspace


 Assign workspace roles
 Configure and update a workspace app
 Publish, import, or update assets in a workspace
 Create dashboards
 Choose a distribution method
 Apply sensitivity labels to workspace content
 Configure subscriptions and data alerts
 Promote or certify Power BI content
 Manage global options for files

Manage datasets

 Identify when a gateway is required


 Configure a dataset scheduled refresh
 Configure row-level security group membership
 Provide access to datasets

Microsoft Power BI Analyst Professional Certificate


At this point, you may be wondering if you’ll have access to the information that you need to
successfully pass Exam PL-300: Microsoft Power BI Data Analyst.

The answer is yes!

By the end of this professional certificate, you will have covered the following topics:

 Prepare the data


 Model the data
 Visualize and analyze the data
 Deploy and maintain assets

Artificial intelligence
Artificial intelligence (AI) is the field of computing focused on creating systems capable of
performing tasks that would typically require human intelligence. These tasks include
reasoning, learning, problem-solving, perception, language understanding, and even the
ability to move and manipulate objects. AI technologies leverage algorithms and dynamic
computing environments to enable machines to solve complex problems, adapt to new
situations, and learn from past experiences. Central to AI is machine learning (ML), where
algorithms detect patterns and infer probabilities from data, allowing the machine to improve
its performance over time. AI systems can range from simple, rule-based algorithms to
complex neural networks modeled on the human brain.

Machine learning
Machine learning (ML) is a critical domain within artificial intelligence that emphasizes the
development of algorithms and statistical models that enable computers to perform specific
tasks without explicit instructions. Instead, these systems learn and make predictions or
decisions based on data. Here's a more technical breakdown:
1. Types of learning:
 Supervised learning: Algorithms learn from labeled training data,
aiming to predict outcomes for new inputs.
 Unsupervised learning: Algorithms identify patterns in data without
needing labeled responses, often used for clustering and association.
 Reinforcement learning: Models learn to make sequences of
decisions by receiving feedback on the actions' effectiveness.
2. Algorithms and techniques:
 Common algorithms include linear regression, decision trees, and neural
networks.
 Advanced techniques involve deep learning, which uses layered neural
networks to analyze various levels of data features.
3. Data handling and processing:
 Effective machine learning requires robust data preprocessing, including
normalization, handling missing values, and feature selection to improve
model accuracy.
4. Performance evaluation:
 ML models are evaluated based on metrics such as accuracy, precision, recall,
and the area under the receiver operating characteristic (ROC) curve, ensuring
that they perform well on unseen data.
5. Application areas:
 ML is applied in various fields such as finance for algorithmic trading,
healthcare for predictive diagnostics, and autonomous vehicles for navigation
systems.
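As a concrete (if tiny) illustration of supervised learning, the sketch below fits a line to labeled example data using the closed-form least-squares estimates, then predicts an outcome for a new input. The data values are invented:

```python
# Fit y = w*x + b to labeled (x, y) pairs with closed-form ordinary
# least squares, then predict an unseen input. Data is invented and
# roughly follows y = 2x.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares slope and intercept.
w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - w * mean_x

prediction = w * 5.0 + b  # predicted label for an unseen input
print(round(w, 2))  # 1.96 -- close to the true slope of 2
```

The "learning from labeled data" here is the estimation of w and b; evaluation on unseen data (point 4 above) would compare such predictions against held-out labels.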

Deep learning
Deep learning (DL) is an advanced branch of ML that uses artificial neural networks with
multiple layers, known as deep neural networks. These networks are capable of learning from
large amounts of unstructured data. DL models automatically extract and learn features at
multiple levels of abstraction, enabling the system to learn complex patterns in large datasets.
The learning process can be:

 Supervised - where the model is trained with labeled data


 Semi-supervised - which uses a mix of labeled and unlabeled data
 Unsupervised - which relies solely on unlabeled data

This technique is particularly effective in areas such as image recognition, natural language
processing (NLP), and speech recognition, where conventional machine-learning techniques
may fall short due to the data structures' complexity. DL has propelled advancements in
generative AI, enabling the creation of sophisticated models like generative adversarial
networks (GANs) that can generate new data instances that mimic real data.

Neural networks
Neural networks (NN) are a cornerstone of AI. They are particularly effective in pattern
recognition and data interpretation tasks, which they achieve through a structure inspired by
the human brain. A neural network comprises layers of interconnected nodes, or neurons,
each with its own weights and biases, and input data is processed through these nodes. The
connections between nodes represent synapses and are weighted according to their
importance. As data passes
through each layer, the network adjusts the weights, which is how learning occurs. This
structure enables neural networks to learn from vast amounts of data to make decisions,
classify data, or predict outcomes with high accuracy. NN are particularly crucial in fields
such as computer vision, speech recognition, and NLP where they can recognize complex
patterns and nuances better than traditional algorithms. The training process involves
techniques such as backpropagation, where the model learns to minimize errors by adjusting
weights to produce the most accurate outputs possible.
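The weight-adjustment idea behind backpropagation can be caricatured with a single sigmoid neuron trained by gradient descent. The initial weights, learning rate, and target below are arbitrary toy choices:

```python
import math

# One sigmoid neuron trained by gradient descent to map input 1.0 to
# target 0.0. Initial weights, learning rate, and data are toy values.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.5, 0.5        # initial weight and bias
x, target = 1.0, 0.0
lr = 0.5               # learning rate

for _ in range(200):
    out = sigmoid(w * x + b)        # forward pass
    error = out - target            # derivative of squared error / 2
    grad = error * out * (1 - out)  # chain rule through the sigmoid
    w -= lr * grad * x              # backpropagation: adjust weight
    b -= lr * grad                  # ... and bias to reduce the error

print(sigmoid(w * x + b) < 0.2)  # True: the output moved toward 0.0
```

A real network repeats this chain-rule step backward through many layers of neurons, but the principle, adjusting each weight in proportion to its contribution to the error, is the same.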

Generative adversarial networks (GAN)


GANs are a sophisticated class of AI algorithms used in ML, characterized by their unique
structure of two competing NNs: the generator and the discriminator. The generator is tasked
with creating data that is indistinguishable from genuine data, while the discriminator
evaluates whether the generated data is real or fake. This adversarial process, much like a
teacher-student dynamic, continuously improves the accuracy of the generated outputs. The
training involves the discriminator learning to better distinguish between real and generated
data, while the generator strives to produce increasingly convincing data, enhancing its ability
to deceive the discriminator. This setup not only helps in generating new data samples but is
also useful in unsupervised learning, semi-supervised learning, and reinforcement learning.
GANs are particularly renowned for their applications in image generation, video creation,
and voice synthesis, where they can produce highly realistic outputs.

Natural language processing (NLP)


NLP is an advanced area of AI that focuses on the interaction between computers and
humans through natural language. The goal of NLP is to read, decipher, understand, and
make sense of human languages in a manner that is valuable. It involves several disciplines,
including computer science and computational linguistics, in an effort to bridge the gap
between human communication and computer understanding. Key techniques in NLP include
syntax tree parsing, entity recognition, and sentiment analysis, among others. These
techniques help computers to process and analyze large amounts of natural language data.
NLP is used in a variety of applications, such as automated chatbots, translation services,
email filtering, and voice-activated global positioning system (GPS) devices. Each application requires
the computer to understand the input provided by humans, process that data in a meaningful
way, and if necessary, respond in a language that humans understand.
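As a toy illustration of sentiment analysis, one of the techniques named above, the sketch below scores text against small invented word lists; production NLP systems use learned models rather than hand-written lexicons:

```python
# Naive lexicon-based sentiment sketch. The word lists are invented
# for illustration only; real systems learn sentiment from data.

POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great coffee"))  # positive
print(sentiment("terrible service"))          # negative
```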

Transformers
Transformers represent a significant advancement in deep learning, particularly in the field of
NLP. Introduced by Google researchers in the seminal 2017 paper "Attention Is All You
Need", transformers use a mechanism known as self-attention to weigh the importance of
each word in a sentence, regardless of its position. Unlike previous models that processed
data sequentially, transformers process all words or tokens in parallel, which significantly
increases efficiency and performance on tasks that require understanding context over long
distances within text. This architecture avoids recurrence and convolutions entirely, relying
instead on stacked self-attention and point-wise, fully connected layers for both the encoder
and the decoder components. This design allows for more scalable learning and has been
fundamental in developing models that achieve state-of-the-art results on a variety of NLP
tasks, including machine translation, text summarization, and sentiment analysis. The
transformer's ability to handle sequential data extends beyond text, making it versatile in
other domains like image processing and even music generation.
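The self-attention computation described above can be sketched in a few lines: each token's query is scored against every token's key, the scores are softmax-normalized, and the output is a weighted average of the value vectors. The sketch below omits the learned projections and multiple heads of a real transformer and uses invented 2-dimensional token vectors:

```python
import math

# Scaled dot-product self-attention over a tiny sequence of invented
# 2-dimensional token vectors. Real transformers add learned
# query/key/value projections and multiple attention heads.

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d = len(tokens[0])

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

outputs = []
for q in tokens:  # here queries = keys = values = the raw tokens
    # Score this token against every token -- all positions at once,
    # with no recurrence over the sequence.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in tokens]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    outputs.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])

print(len(outputs))  # 3: one contextualized vector per input token
```

Because every token attends to every other token in one step, distance within the sequence costs nothing, which is what lets transformers track long-range context that sequential models struggle with.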

Generative pre-trained transformers


Generative pre-trained transformers (GPT) are state-of-the-art language models developed by
OpenAI that use DL techniques, specifically the transformer architecture, for natural
language understanding and generation. These models are first pre-trained on a diverse range
of internet text to develop a broad understanding of language structure and context. The pre-
training involves unsupervised learning, where the model predicts the next word in a sentence
without human-labeled corrections. This allows GPT models to generate coherent and
contextually appropriate text sequences based on the prompts they are given. Once pre-
trained, GPT models can be fine-tuned on specific tasks such as translation, question-
answering, and summarization, enhancing their applicability across various domains. Their
ability to generate human-like text and perform language-based tasks has implications across
fields such as AI-assisted writing, conversational agents, and automated content creation.
Each successive version of GPT has been larger and more complex: GPT-3, for example,
contains 175 billion parameters, and later iterations such as GPT-4 have advanced these
learning and generative capabilities further.
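The pre-training objective described above, predicting the next word without human-labeled corrections, can be caricatured with a bigram count model over a tiny invented corpus (real GPT models learn this objective with deep transformer networks over internet-scale text):

```python
from collections import Counter, defaultdict

# Next-word prediction caricatured as a bigram count model over a
# tiny invented corpus. The "training" is simply counting which word
# follows which; prediction picks the most frequent continuation.

corpus = "the cat sat on the mat the cat ran".split()

follow = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow[prev][nxt] += 1  # count observed continuations

def predict_next(word):
    # Most frequent continuation seen during "pre-training".
    return follow[word].most_common(1)[0][0]

print(predict_next("the"))  # cat
```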

Tokenization, Word2vec, and BERT


Tokenization in NLP involves splitting text into smaller units known as tokens, which can be
words, characters, or subwords. This step is crucial for preparing text for processing with
various NLP models, as it standardizes the initial input into manageable pieces for algorithms
to process. Word2vec, developed by researchers at Google, is a technique that embeds words
into numerical vectors using shallow, two-layer NNs. The models are trained to reconstruct
the linguistic contexts of words, thereby capturing the relationships and multiple degrees of
similarity among them. Meanwhile, Bidirectional Encoder Representations from
Transformers (BERT) represents a significant advancement in pre-training language
representations. Developed also by Google, BERT incorporates a transformer architecture
that processes words in relation to all the other words in a sentence, rather than one-by-one in
order. This allows BERT to capture the full context of a word based on all its surroundings,
leading to a deeper understanding of language nuances. BERT's ability to handle context
from both directions makes it exceptionally powerful for tasks where context is crucial, such
as question answering and sentiment analysis.
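A minimal sketch of the tokenization step described above, contrasting word-level tokens with a crude character-bigram "subword" split; real subword tokenizers such as BERT's WordPiece learn their vocabulary from data rather than using a fixed rule like this:

```python
# Word-level tokens vs. a crude character-bigram "subword" split.
# Real subword tokenizers (e.g. WordPiece) learn vocabularies from
# data; this only shows the idea of breaking text into smaller units.

text = "Transformers process tokens"

word_tokens = text.lower().split()

def char_bigrams(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

subword_tokens = [bg for w in word_tokens for bg in char_bigrams(w)]

print(word_tokens)         # ['transformers', 'process', 'tokens']
print(subword_tokens[:3])  # ['tr', 'ra', 'an']
```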

Conclusion
In this reading, you examined the foundational concepts of generative AI. You learned about
ML, DL, and NLP, and unraveled their roles and applications in various industries.
Additionally, you delved into emerging advancements like GANs, transformers, and GPT,
recognizing their pivotal role in generating innovative content.

Understanding these foundational terms in generative AI not only enriches the conversation
among tech enthusiasts but also empowers professionals to leverage this technology in
various industries effectively. As AI continues to advance, keeping abreast of terminologies
and concepts will provide the necessary tools to navigate this dynamic field successfully.

Exemplar: Identifying stakeholders


Introduction
In the exercise Identifying stakeholders, you were tasked with identifying the key stakeholders to
interview for the sales data analysis project.

More specifically, you were asked to:

 Identify and list the key stakeholders that you will interview as part of your data analysis
project, their level of influence and/or their interest in the project, and the reason you
selected them.
 Write a list of three questions that you will ask each stakeholder that could help direct the
data analysis in achieving the project goals.

This reading presents one version of the expected outcome. Your answer may differ but still be
correct.

Step 1: Understand the context


In the first part of the exercise, your task was to read the case study carefully to understand the
context and goals. An example of an acceptable answer for this is:

The primary objective of the analysis is to analyze the sales data to drive new marketing
campaigns and improve the company's market share. To do this effectively, a full understanding
of the business context and the challenges the company faces is required. In addition, familiarity
with the stakeholder’s role and duties is essential.

Step 2: Identify the key stakeholders


In this step, you were asked to identify and list the key stakeholders that you think should be
interviewed and give the reason you selected them. Then you were asked to classify the chosen
stakeholders by the level of influence and interest each might have in the sales data analysis
project. 3 examples of the type of work that you could have submitted for this step of the exercise
are presented below.

Key stakeholder: Sales Manager - Kane


 Level of influence: High—Kane's high level of influence comes from his role as
Sales Manager and his responsibility to ensure that the sales team meets their targets.
Kane's deep understanding of the company's sales processes and performance directly
impacts the company's bottom line. As a result, Kane's involvement in the sales data
analysis project is essential to ensure that the project aligns with the company's overall
sales goals.
 Level of interest: High—Kane's high level of interest in the project comes from his
responsibility to improve the sales team's performance. The insights gained from the
sales data analysis will help Kane identify areas where the sales team can improve,
allowing data-driven decisions that will enhance the sales process and increase the
company's market share.
 Reason for selection: Directly involved in sales performance and understanding
customer preferences. Kane plays a critical role in the success of the sales data analysis
project and, as Sales Manager, is directly responsible for the company's sales
performance. Kane's expertise in analyzing sales data, understanding customer
preferences, and identifying trends make him an important asset.

Key stakeholder: Marketing Director - Renee


 Level of influence: High—Renee's high level of influence comes from her role as
Marketing Director, where her responsibility is to drive marketing campaigns that boost
the company's sales. Renee's insights into the effectiveness of existing marketing
campaigns, customer preferences, and trends will help create a targeted analysis that
can inform future marketing strategies.
 Level of interest: High—Renee's high level of interest in the project is rooted in her
responsibility to improve the company's market share and attract new customers. The
sales data analysis will help her identify trends and customer preferences, allowing her to
make data-driven decisions to create more effective marketing campaigns that resonate
with the target audience.
 Reason for selection: Directly involved in creating and implementing marketing
campaigns. Renee is a key stakeholder in the sales data analysis project because she
creates and implements marketing campaigns that promote the company's products and
services. Renee's extensive experience in marketing, deep understanding of the
company's marketing strategy, and ability to identify areas for improvement make her a
vital source for the project.

Key stakeholder: Customer Service Manager - Ricci


 Level of influence: Medium—Ricci's medium level of influence is due to her role in
customer service, which, while not directly responsible for sales or marketing, plays a
crucial role in retaining existing customers and maintaining the company's reputation.
Ricci's insights into customer satisfaction can help the data analyst understand the
factors that impact the company's sales and inform recommendations for improving the
customer experience.
 Level of interest: High—Ricci's high level of interest in the project comes from her
responsibility to ensure that customers are satisfied with the company's products and
services. The sales data analysis will help her identify trends and areas of improvement,
allowing Ricci to make data-driven decisions to enhance customer satisfaction and
contribute to the company's overall sales and market share.
 Reason for selection: Directly involved in customer satisfaction and feedback, Ricci
is an important stakeholder in the sales data analysis project because of her role in
ensuring customer satisfaction. Ricci’s understanding of customer complaints, feedback,
and preferences can provide valuable insights into areas where the company can
improve its products and services. These insights can help the data analyst identify
growth opportunities and inform future sales and marketing strategies.

Step 3: Prepare interview questions


Based on the stakeholders selected, you needed to write a list of three questions you would prepare
to ask each stakeholder. The questions should steer the data analysis toward achieving the
project goals. Below are some examples of questions you could create for this part of the
exercise.
Renee (Marketing Director)
 Can you provide an overview of the current marketing strategy and how it has evolved
over the years?
 How do you measure the effectiveness of marketing campaigns, and what key
performance indicators (KPIs) do you track?
 What have been the most successful marketing campaigns in the past, and what factors
contributed to their success?
 Are there any marketing campaigns or strategies that have underperformed or not met
expectations? If so, what were the challenges and learnings from those experiences?
 How do you segment the target audience for marketing campaigns, and what are the key
demographics and preferences of our customers?

Kane (Sales Manager)


 Can you describe the current sales strategy and how it has evolved over time to address
changing market dynamics?
 What are the key performance indicators (KPIs) that you use to track the sales team's
performance, and how do you ensure that targets are met?
 How do you identify trends in customer buying patterns, preferences, and behaviors that
impact the company's sales performance?
 What have been the most significant challenges the sales team has faced, and how have
you addressed those challenges?
 What types of data and insights would be most valuable to you for improving the sales
team's performance and meeting sales targets?

Ricci (Customer Service Manager)


 How do you measure customer satisfaction, and what key performance indicators (KPIs)
do you track to ensure that customers are happy with the company's products and
services?
 What are the most common customer complaints or feedback that you receive, and how
have they informed changes in the company's products or services?
 Can you provide any examples of how customer feedback has directly impacted the
company's sales or marketing strategies?
 How do you segment customers in terms of their preferences and needs, and what are
the key demographics and preferences of our customer base?
 What types of data and insights would be most valuable to you for improving customer
satisfaction and maintaining a strong relationship with the company's customer base?

Conclusion
You should now have a better understanding of identifying stakeholders to gather relevant
insights. By understanding the high-level skill of stakeholder analysis, you can more effectively
navigate complex business environments. Preparing targeted interview questions allows you to
gather specific information that is actionable and aligned with a project’s goals. Stakeholder
analysis can help you gather valuable insights that can inform your data analysis, leading to
relevant insights that can be used by businesses to make strategic decisions.

Exercise: Stakeholder experience


Introduction
Previously, you learned that analysis insights and visualizations in reports are more likely to be
impactful and useful when they are tailored to stakeholder experience (the needs, preferences,
and expectations of the stakeholders who will engage with the visualizations in your data
analysis report). In this exercise, you’ll work through a case study in which a data analyst
undertakes a six-step process to inform data visualization based on stakeholder experience.
You’ll then apply the insights you’ve gained to answer questions related to this process.

Case study

To help boost sales, Adio, the data analyst at Adventure Works, is tasked with investigating
sales, marketing, and customer data. Adio is instructed to create and share a data report with
visualizations, based on the insights, patterns, and trends he uncovers during the analysis. Adio
knows that creating visualizations with stakeholder experience in mind contributes to improved
stakeholder understanding and decision-making. To understand the needs, preferences, and
expectations of the stakeholders that form his audience (stakeholder experience), Adio engages
in the following process:

Step 1: Identifying stakeholders


The first step in the stakeholder experience process is to determine which stakeholders have an
interest in the data analysis and visualizations. Adio identifies the executives, marketing team,
and product managers as stakeholder groups for his current project.

Step 2: Defining stakeholder goals


Aware that these stakeholder groups will have diverse needs, preferences, and expectations,
Adio moves on to the next step—understanding the different stakeholders' goals, priorities, and
requirements. Through consultation with each group of stakeholders, Adio determines their
respective interests in the sales data analysis:

 The group of executives is focused on improving the overall performance of the company
and is interested in high-level insights that can drive strategic decisions related to their
competitors, products, and customer marketing.
 The marketing team is interested in insights into marketing campaign effectiveness and
how to improve conversion rates from their website and social media sites to increase
sales.
 The product managers want to understand customer behavior, such as product
popularity, preferences of different customer segments or groups, and the profitability of
each of these segments.
Step 3: Choosing the right visualization type
Adio can now choose visualization types that are tailored to the stakeholders' goals and needs.
Bar charts, line charts, and pie charts are common visualization types, but there are many others
to consider. The visualizations you choose will depend on the type of data you're working with.
You also have to think about which visualization types will best communicate the insights your
stakeholders need while being visually appealing and easy to understand. For example:

 For executives, a dashboard with high-level metrics and key performance indicators (or
KPIs) such as revenue and profit margins may be the right choice.
 The marketing professionals may find a conversion funnel that tracks the progress of
customers from the stage of visiting marketing channels to the final stage of product
purchases more useful. A bar chart to compare the effectiveness of different marketing
channels may also be an appropriate choice.
 For the product managers, a map that visualizes the distribution of customer segments
may be suitable.

Step 4: Designing with stakeholder experience in mind


Next, you need to design the visualizations with stakeholder experience in mind, ensuring they
are meaningful to each group. This means designing visualizations that answer stakeholder
questions, are visually appealing, and are easy to read, navigate, and understand. When
designing his visualizations, Adio keeps in mind that:

 The executives prefer visualizations that are concise, easy to understand, and convey
key takeaways quickly.
 The marketing professionals are interested in visualizations that can help them identify
trends, patterns, and opportunities for improvement.
 The product managers value visualizations that can help them identify gaps,
opportunities for growth, and potential issues with specific products.

Step 5: Making visualizations interactive


You can enhance stakeholder engagement, exploration, and understanding of data insights by
adding interactive features to your visualizations. For example, you can add filters and sorting
options, or explanations that appear when stakeholders hover over different parts of a
visualization. Adio uses Power BI to create interactive data visualizations that the stakeholders
can explore and interact with. For example, he adds filtering and sorting options. That way, when
Adio shares the report, the marketing team can view data by marketing channels or campaigns.
Similarly, the product team can view sales distribution data by customer segment or product
category.

Step 6: Testing and iterating


Before sharing the visualizations with stakeholders, Adio tests them with a focus group: a small
set of users that represents his target audience. Running a focus group involves selecting and
recruiting a diverse, representative sample of individuals from the different stakeholder groups
and then conducting the focus group sessions. In this final step, you should gather and analyze feedback and make
any necessary adjustments to ensure that your visualizations effectively communicate the
intended insights. In doing so, you ensure that the stakeholders’ needs are met.

Instructions
Create a document
Create a new Word document called Stakeholder experience. Use this document to record your
answers to the exercise questions.

Answer questions about stakeholder experience


1. What is the primary goal of data visualization in the data analysis process?
2. Briefly define stakeholder experience in the context of data analysis and visualization.
3. Explain two reasons why stakeholder experience is important when creating data
visualizations.
4. List the six steps in the stakeholder experience process.
5. How can you identify the goals and preferences of different stakeholder groups when
designing visualizations?
6. What is one challenge you think Adio may face in designing data visualizations that meet
the needs of the different stakeholders?
7. Briefly discuss what you need to consider when choosing the right visualization type for
stakeholders.
8. What is the purpose of making visualizations interactive?
9. A focus group finds a visualization Adio designed difficult to understand. Adio adds more
interactive features and updates the design. Which step in the process is Adio engaging
in?
10. In no more than two sentences, discuss how you think stakeholder experience can
contribute to improving business outcomes through data-driven decision-making.

Conclusion
In this exercise, you were introduced to tailoring visualizations based on stakeholder experience.
You discovered how these tailored visualizations are essential to communicating data insights
and can impact stakeholder decision-making.

Exercise: Evaluating an analysis process
Introduction
Imagine you're sipping a delicious cup of coffee at your favorite neighborhood café. The opening
of more coffee chains and independent shops in the area is making it increasingly difficult for the
café to stand out and attract new customers. The owner, Taylor, realizes that she needs a data-
driven approach to help her café regain its momentum and adapt to the changing market. She
hires a data analyst to guide her through this process.

In this exercise, you’ll evaluate the data analysis process undertaken by the data analyst for the
café. By working through this case study, you will consolidate your learning regarding the steps
involved in the data analysis process, understanding the importance of each step in the process
—from data collection to fostering a data-driven culture within an organization. You’ll also have
the opportunity to explore how the process can be tailored to a business context. Additionally,
you’ll discover the role of data analysis in helping business owners like Taylor to make well-
informed, data-driven decisions to remain competitive and regain momentum.
Note: To help you understand the concepts of the data analysis process, the familiar context of
a small, local coffee store is used here as an example. As a data analyst, you are more likely to
encounter these concepts within a larger organization, where the requirement for an analysis
process is the same, but at a larger scale.

Case study
The café operates as a charming coffee shop known for its warm ambiance, friendly staff, and
delicious coffee. The café offers a wide array of beverages, from classic espressos to specialty
lattes, as well as a selection of fresh pastries and sandwiches to cater to the diverse tastes of its
patrons. The coffee shop has established a loyal customer base, and people enjoy spending
time there to socialize, work, or simply relax with a great cup of coffee. However, as more coffee
chains and independent shops have opened in the area, the café has found it increasingly
difficult to stand out and attract new customers. The coffee market has become saturated, and
the local competition is fierce. The café's owner, Taylor, has noticed a decline in foot traffic and
sales, and she's concerned about the future of her beloved establishment.

Taylor has decided to take a data-driven approach to address her business challenges. She's
hired a data analyst to help her better understand the café's performance and uncover potential
opportunities for growth. In this exercise, you'll need to apply the knowledge you’ve gained
regarding the data analysis process and the best practices for each step, evaluating whether the
data analyst has conducted a thorough and accurate data analysis process.

Instructions
Create a document
Create a new Word document called Stages in data analysis – Evaluating an
analysis process. In this document, you will answer questions about the data analysis
process you’ll examine below.

Examine the steps performed in the data analysis process

Stage 1: Data collection

The data analyst started the data analysis process by gathering data from various sources, such
as point-of-sale (POS) systems, customer feedback forms, online reviews, social media, and
website analytics. They aimed to gather information on sales trends, customer demographics,
preferences, and behavioral patterns. This data could, for example, allow the analyst to extract
insights about the most popular beverages and food items, peak hours, and seasonal
fluctuations.
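To make the consolidation step concrete, it can be sketched in plain Python. The records, field names, and values below are hypothetical; in practice the analyst would work with exports from the POS system, feedback forms, and analytics tools (in Power BI, Power Query would typically handle this step).

```python
# Hypothetical records from two of the café's data sources.
pos_sales = [
    {"item": "Latte", "qty": 2, "hour": 9},
    {"item": "Espresso", "qty": 1, "hour": 8},
]
feedback = [
    {"rating": 4, "comment": "Great coffee, but oat milk was sold out"},
]

# Tag each record with its origin so that later insights can be traced
# back to the source they came from.
combined = (
    [{"source": "pos", **r} for r in pos_sales]
    + [{"source": "feedback", **r} for r in feedback]
)
print(len(combined))  # 3 consolidated records
```

Tagging each record with its source is a small habit that pays off later: when an insight looks surprising, the analyst can check which system it came from.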

Stage 2: Data organization and cleaning

After gathering the data from multiple sources, the data analyst carefully organized and cleaned
the data in preparation for data analysis.
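As an illustration, a few common cleaning operations can be sketched in Python. The rows below are hypothetical, and in Power BI this work would usually be done in Power Query, but the logic is the same: standardize formatting, drop rows missing key fields, and remove duplicates.

```python
# Hypothetical raw rows with typical quality problems: inconsistent
# casing/whitespace, a duplicate entry, and a missing price.
raw = [
    {"item": "latte ", "price": "4.50"},
    {"item": "Latte", "price": "4.50"},
    {"item": "Mocha", "price": None},
]

cleaned, seen = [], set()
for row in raw:
    item = row["item"].strip().title()  # standardize name formatting
    if row["price"] is None:            # drop rows missing a key field
        continue
    if item in seen:                    # drop duplicates
        continue
    seen.add(item)
    cleaned.append({"item": item, "price": float(row["price"])})

print(cleaned)  # [{'item': 'Latte', 'price': 4.5}]
```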

Stage 3: Data analysis

With clean datasets in hand, the data analyst began analyzing the data to uncover trends,
patterns, and opportunities. The analyst aimed to identify the most profitable menu items,
discover the preferences of specific customer segments, and pinpoint the most effective
marketing channels. They used statistical techniques to explore relationships between
variables and gain valuable insights.
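For example, a simple aggregation over hypothetical sales records can surface the kind of patterns described above, such as which items sell best and which days are strongest:

```python
from collections import defaultdict

# Hypothetical cleaned sales records.
sales = [
    {"item": "Latte", "qty": 2, "day": "Mon"},
    {"item": "Latte", "qty": 1, "day": "Sat"},
    {"item": "Mocha", "qty": 1, "day": "Mon"},
]

# Aggregate units sold per menu item and per day of the week.
units_per_item = defaultdict(int)
units_per_day = defaultdict(int)
for s in sales:
    units_per_item[s["item"]] += s["qty"]
    units_per_day[s["day"]] += s["qty"]

top_item = max(units_per_item, key=units_per_item.get)
print(top_item, units_per_day["Mon"])  # Latte 3
```

The same grouping logic, scaled up to a full year of POS exports, is what lets an analyst spot weekday or time-of-day dips in sales.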

Here is a sample of the data insights gained through data analysis:

Data type: Customer data
 The primary customer demographic in the area has changed, with the café serving only a
segment of the possible customer audience.
 There is a demand for more plant-based milk options.

Data type: Sales data
 Certain menu items are not selling well.
 Plant-based milk options are limited and often out of stock.
 There are patterns in the decline of sales, with sales dropping on weekdays and at various
times of the day.

Data type: Competitor data
 Certain menu items are being sold at significantly higher price points by competitors.
 Competitors focus more on short waiting times and takeaway offers. They also have a strong
social media presence and offer electronic rewards systems.

Stage 4: Data visualization

The analyst then went on to create charts, graphs, and dashboards based on their findings from
the data analysis. For example, they created a bar chart comparing the sales performance of
different menu items.
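The bar chart itself would be built in Power BI, but the underlying idea, that bar length is proportional to the value it represents, can be sketched with a quick text chart over hypothetical figures:

```python
# Hypothetical monthly unit sales per menu item.
sales = {"Latte": 120, "Espresso": 80, "Mocha": 45}

scale = 60 / max(sales.values())  # fit the longest bar in 60 characters
widths = {item: round(qty * scale) for item, qty in sales.items()}

# Print bars sorted from best to worst seller.
for item, qty in sorted(sales.items(), key=lambda kv: -kv[1]):
    print(f"{item:<10}{'#' * widths[item]} {qty}")
```

Sorting the bars before plotting is a small choice that matters: ranked bars let stakeholders compare menu items at a glance instead of scanning back and forth.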

Stage 5: Generating data-driven recommendations

Based on the analysis, the data analyst then developed actionable recommendations to help the
café improve its performance. The recommendations were supported by the data insights they
gathered and tailored to address the café's unique challenges and opportunities.

Stage 6: Implementing recommendations and monitoring results

After making data-driven recommendations and giving Taylor the final report, the data analyst left
the process of implementation to Taylor and her team, concluding the data analysis process.

Evaluate the data analysis process


Once you have read through the data analysis process undertaken by the data analyst for the
café, answer the questions that follow to evaluate the process:

Data collection
1. The data analyst began the data analysis process by gathering data. What should data
analysts do in preparation for data collection to ensure the effectiveness of the data
analysis process?
2. As a part of data collection, the data analyst gathered data from various sources. Why is
this an important best practice?

Data organization and cleaning


1. Before proceeding with data analysis, the data analyst organized and cleaned the data.
What is the purpose of this step in the data analysis process?
2. What are two common issues the data analyst may have encountered during the data
organization and cleaning step?

Data analysis
1. Briefly discuss two data sources that the data analyst may have analyzed to generate the
sample of insights.

Data visualization
1. What is the role of visualizations in the data analysis process?

Generating data-driven recommendations


1. Data analysts make recommendations based on the insights gained during data analysis.
Why are data-driven recommendations important for businesses like the café?
2. Based on the data insights gained, list two actionable data-driven recommendations you
could make to help the café improve its foot traffic and sales.

Implementing the recommendations and monitoring the results


1. What should the data analyst have done during the implementing recommendations and
monitoring results step?
2. Why is the step of implementing recommendations and monitoring results important?

Additional steps
1. An additional step is fostering a data-driven culture. How could the data analyst work with
Taylor to promote a data-driven culture throughout the process? Why do you think this is
important?
2. It is also important to monitor and evaluate the data analysis process itself. This can be
done as a part of the overall process or as a separate step once it has ended. Why do
you think it is important to evaluate whether a data analysis process is done correctly?

Conclusion
By completing this exercise, you have gained a deeper understanding of the data analysis
process and its application to business challenges. By embracing a data-driven approach, data
analysts can empower organizations like Taylor's to thrive in the face of adversity and adapt to
an ever-evolving competitive landscape. You can apply the knowledge and skills acquired in this
exercise to a variety of business contexts and industries. As you continue to hone your skills and
embrace the power of data, you will be well-positioned to help organizations overcome
challenges, identify opportunities, and achieve lasting success.

You might also like