BAUE

Uploaded by Vijay M
MODULE 1

What is Business Analytics?
According to Wikipedia, “Business analytics refers to the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.”

Business analytics begins with a data set or, commonly, with a database.


Business analytics began attracting attention with the explosion of data, the evolution of data processing and data mining methods, and the growing affordability of data processing software.
SCOPE OF BUSINESS ANALYTICS
Business analytics has a wide range of applications and uses.
Business analytics has been in existence for a very long time and has evolved with the availability of newer and better technologies.

Evolution of Business Analytics
It has its roots in operations research, which was used extensively during World War II.

Operations research was an analytical way of looking at data to conduct military operations.
Components of Business Analytics

Data Storage: Data is stored by computers in a way that allows it to be used again in the future.

Data Visualization: The process of graphically representing the information or insights drawn through the analysis of data.

Insights: The outputs and inferences drawn from the analysis of data by implementing business analytics techniques and tools.

Data Security: One of the most important components of business analytics is data security.
The Business Analytics Process

Business Problem Framing: In this step, we find out what business problem we are trying to solve, e.g., why the supply chain isn’t as effective as it should be or why we are losing sales.

Analytics Problem Framing: Once we have the problem statement, we next think about how analytics can be applied to that business problem.

Data: Once we have identified what needs to be analyzed, the next thing we need is the data to analyze.

Methodology selection and model building: Once the data is ready, the tricky part begins.

Deployment: After selecting the model and the statistical methods for analyzing the data, the next step is to test the solution in a real-world scenario.
Use Cases of Business Analytics
A) Finance: Business analytics can help financial organizations optimize budgeting, determine creditworthiness for a loan, and estimate the chances of a customer defaulting on a loan.

B) Business analytics helps extract crucial information hidden behind credit and debit transactions and lets the business know customers’ spending habits, lifestyle preferences, and financial standing, raising red flags wherever there is a probability of loss of business.

C) Business analytics built into today’s CRM systems enables businesses to gain deep insights into the demographics, socio-economic information, and lifestyle of their customer groups, and into the best-fit strategy to retain and increase the customer base.
D) Manufacturing

It is crucial to stay on top of things so that you are protected against equipment downtime and delays in raw material supply, and so that you know the inventory levels to maintain and the maintenance expense of machines, among other factors.

Business analytics helps you decide on the optimum levels of inventory to maintain, how much to allow for equipment downtime, how to keep production at optimum levels, and much more.

Business analytics also encompasses continuous improvement: identifying corners to cut and helping make the business streamlined and nimble.
E) Marketing
Business analytics plays an important role in determining the effectiveness of marketing campaigns by generating insights into which kind of campaign is most effective and which one penetrates the market most deeply.
F) E-Retailing

Today the e-retailing business is expanding like never before, with more and more people preferring to order online rather than visit brick-and-mortar stores, a trend the COVID pandemic accentuated further.

There are many players in the market, and it becomes necessary for the e-retailer to keep a hawk’s eye on inventories and suppliers and to keep pricing competitive while cutting losses.
Importance of Business Analytics

• Business analytics is a methodology or tool for making sound commercial decisions.
• It facilitates a better understanding of the available primary and secondary data, which in turn affects the operational efficiency of several departments.
• It provides a competitive advantage to companies.
• It converts available data into valuable information.

Data for Analytics

Business analytics uses data from three sources for construction of the business model:
• It uses business data such as annual reports, financial ratios, marketing research, etc.
• It uses the database, which contains various computer files and information coming from data analysis.
TYPES OF BUSINESS ANALYTICS
A) Descriptive Analytics

Descriptive analytics looks at data statistically to tell you what happened in the past. Examples:
• Analyzing assessment grades and assignments
• Tracking the use of learning resources
• Analyzing the time taken by the learner to complete the course
• Comparing the test results of learners
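As a minimal sketch of the descriptive examples above, the Python snippet below summarizes a handful of learner test results. The learner names and scores are invented for illustration; the snippet only reports what already happened (count, average, spread, best and lowest result) and makes no prediction.

```python
from statistics import mean, median, pstdev

# Hypothetical test results for a handful of learners (made-up sample data)
scores = {"Asha": 72, "Ben": 85, "Chen": 91, "Dina": 64, "Eli": 85}

def describe(results):
    """Return simple descriptive statistics about past test results."""
    values = list(results.values())
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": round(pstdev(values), 2),  # population standard deviation
        "top_learner": max(results, key=results.get),
        "lowest_learner": min(results, key=results.get),
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these sample scores the mean (79.4) sits below the median (85) because the lower scores pull the average down, which is the kind of observation diagnostic analytics would then try to explain.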
B) Diagnostic Analytics

Diagnostic analytics takes descriptive data a step further and provides deeper analysis to answer the question: Why did this happen?
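One common diagnostic technique is drilling down: breaking a change in a total into the contributions of its segments to see where it came from. The sketch below assumes hypothetical quarterly sales for four regions; the region names and figures are made up for illustration.

```python
# Hypothetical quarterly sales by region (made-up numbers for illustration)
previous = {"North": 120_000, "South": 95_000, "East": 110_000, "West": 80_000}
current = {"North": 118_000, "South": 70_000, "East": 112_000, "West": 79_000}

def explain_change(before, after):
    """Break a total change into per-segment contributions, largest drop first."""
    total_change = sum(after.values()) - sum(before.values())
    contributions = {seg: after[seg] - before[seg] for seg in before}
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return total_change, ranked

total, ranked = explain_change(previous, current)
print(f"Total change: {total:+,}")
for segment, delta in ranked:
    share = delta / total if total else 0.0
    print(f"{segment:>5}: {delta:+,} ({share:.0%} of total change)")
```

Here total sales fall by 26,000 and the South region alone accounts for a 25,000 drop, so that is where a follow-up investigation into why would start.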
C) Predictive Analytics
Predictive analytics takes historical data and feeds it into a machine learning model that considers key trends and patterns in order to predict what is likely to happen.

• Predictive Modelling – What will happen next if...?
• Root Cause Analysis – Why did this happen?
• Data Mining – Identifying correlated data
• Forecasting – What if the existing trends continue?
• Monte-Carlo Simulation – What could happen?
• Pattern Identification and Alerts – When should action be invoked to correct a process?
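To illustrate the Monte-Carlo Simulation question ("What could happen?"), the sketch below draws thousands of random scenarios for next month's revenue and summarizes the range of outcomes. The demand and price distributions (normal demand with mean 1,000 units and standard deviation 150; price uniform between 9 and 11) are assumptions chosen only for illustration.

```python
import random
from statistics import mean

def simulate_revenue(trials=10_000, seed=42):
    """Monte Carlo estimate of next month's revenue under uncertain demand.

    Hypothetical assumptions: demand ~ normal(mean=1,000, sd=150) units,
    unit price ~ uniform(9, 11).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    outcomes = []
    for _ in range(trials):
        demand = max(0.0, rng.gauss(1_000, 150))  # negative demand makes no sense
        price = rng.uniform(9, 11)
        outcomes.append(demand * price)
    return outcomes

revenues = simulate_revenue()
target = 9_000
print(f"Expected revenue: {mean(revenues):,.0f}")
print(f"P(revenue < {target:,}): {sum(r < target for r in revenues) / len(revenues):.1%}")
```

Because the generator is seeded, repeated runs give the same answer; raising `trials` narrows the sampling error of the estimates.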
D) Prescriptive Analytics

Prescriptive analytics takes predictive data to the next level by recommending what to do.
• Optimization helps achieve the best outcomes.
• Stochastic optimization helps understand how to achieve the best outcome and identify data uncertainties in order to make better decisions.
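A small example of stochastic optimization is choosing how much inventory to order when demand is uncertain (the classic newsvendor problem). The sketch below evaluates candidate order quantities against simulated demand scenarios and picks the one with the highest average profit, an approach known as sample average approximation. The price, cost, salvage value, and demand distribution are hypothetical.

```python
import random

def expected_profit(order_qty, demand_scenarios, unit_cost=6.0, price=10.0, salvage=2.0):
    """Average profit of ordering `order_qty` units across simulated demand scenarios."""
    total = 0.0
    for demand in demand_scenarios:
        sold = min(order_qty, demand)
        leftover = order_qty - sold  # unsold units are salvaged at a loss
        total += sold * price + leftover * salvage - order_qty * unit_cost
    return total / len(demand_scenarios)

def best_order_quantity(candidates, demand_scenarios):
    """Prescriptive step: recommend the candidate with the highest expected profit."""
    return max(candidates, key=lambda q: expected_profit(q, demand_scenarios))

rng = random.Random(7)
# Hypothetical uncertain demand: roughly normal, mean 500 units, std dev 80
scenarios = [max(0, round(rng.gauss(500, 80))) for _ in range(5_000)]
best = best_order_quantity(range(300, 701, 10), scenarios)
print(f"Order {best} units; expected profit = {expected_profit(best, scenarios):,.2f}")
```

With these numbers the profit lost by ordering one unit too few (10 - 6 = 4) equals the loss from ordering one too many (6 - 2 = 4), so the recommended quantity should land near the median demand of about 500 units.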
Business Analytics for Problem Solving and Decision Making
• Provides a Better Customer Experience: By analyzing consumer trends, we can provide unique customer experiences.
• Improves Overall Performance: Significantly increases performance on service metrics.
• Conducts Better Risk Assessment & Management: Provides insight into risk management and how to improve overall management.
• Simplifies Accounting Processes
• Enhances the Supply Chain


MODULE 2
STORYTELLING IN A DIGITAL ERA

This chapter sets the context for the rest of this book with an introductory discussion on data visualization and visual data storytelling.

It explores how these two concepts are similar and different, and how both practices have been transformed in the digital era by new technologies and bigger, more diverse, and more dynamic data.

Lastly, the chapter explores the value of visual data storytelling for data communication and establishes how data storytelling is the perfect skill to bridge the very broad and expansive business-IT gap.
A VISUAL REVOLUTION
Figure: An example of “old” data visualization (raw data in Excel) compared to its modern equivalent
FROM VISUALIZATION TO VISUAL DATA STORYTELLING: AN EVOLUTION

With all the current focus on data visualization as the best way to see and understand today’s biggest and most diverse data, it’s easy to think of the practice as a relatively new way of representing data and other statistical information.

In reality, the practice of graphing information, and communicating visually, reaches back all the way to some of our earliest prehistoric cave drawings, where we charted minutiae of early human life, through initial mapmaking, and into more modern advances in graphic design and statistical graphics.

Along the way, the practice of data visualization has been aided by advancements in visual design and cognitive science as well as in technology and business intelligence, and these have given rise to the advancements that have led to our current state of data visualization.

In today’s data-driven business environment, an emerging new approach to storytelling attempts to combine data with graphics and tell the world’s stories through the power of information visualization.

American author Kurt Vonnegut is quoted as having famously said, “There is no reason that the simple shapes of stories can’t be fed into a computer; they have beautiful shapes.”

Just as today’s approach to data visualization has changed the way we see and understand our data, data storytelling has equally, if not more so, been the catalyst that has radically changed the way we talk about our data.
The importance of context
• Ask questions.
• Understand the situational context, including the audience, communication mechanism, and desired tone.
• Ensure that the context is fully understood.

Choose an effective visual
Options include simple text, table, heatmap, line graph, slopegraph, vertical bar chart, vertical stacked bar chart, waterfall chart, horizontal bar chart, horizontal stacked bar chart, and square area graph.

Focus your audience’s attention
• Continue to examine how people see and how you can use that to your advantage when crafting visuals.
• This includes a brief discussion on sight and memory that frames the importance of pre-attentive attributes like size, colour, and position on the page.
FROM VISUAL TO STORY: BRIDGING THE GAP

According to the BIC3 Survey published in 2014, communication skills outrank technical skills for getting a business analysis job.
1. The Power of Visuals
• Visualizations distill complex data into digestible forms, aiding comprehension and pattern recognition.
• They offer a snapshot of trends, correlations, and anomalies, empowering stakeholders to grasp insights quickly.
• From bar charts to heat maps, visuals provide a common language for conveying information across diverse audiences.

2. The Essence of Storytelling
• Storytelling contextualizes data, weaving a narrative that connects insights to real-world implications.
• Through stories, data becomes relatable, resonating with stakeholders on an emotional level and driving action.
• Narratives provide coherence, guiding audiences through the data journey and highlighting key takeaways.
3. Bridging the Gap
• Identifying Insights: Begin by mining the data for compelling insights that align with organizational objectives.
• Crafting the Narrative: Develop a storyline that conveys the insights, addressing the "So what?" and "Why does it matter?" questions.
• Integrating Visuals: Seamlessly weave visualizations into the narrative, using them as supporting evidence to bolster key points.
• Engagement through Emotion: Inject emotion into the story, leveraging anecdotes, metaphors, and human-centric elements to captivate audiences.
• Iterative Refinement: Continuously refine the narrative based on feedback, ensuring clarity, relevance, and impact.
Strategies for Bridging the Gap
1. Identifying Key Insights
2. Creating Narrative Arcs
3. Visual Storytelling
4. Contextualization
5. Visual Consistency
6. Interactive Storytelling
Conclusion (Strategies for Bridging the Gap)
By bridging the gap between visualizations and storytelling, organizations can unlock the full potential of their business analytics efforts, driving informed decision-making and achieving tangible business outcomes.


Lessons in storytelling
• Cover strategies for effective storytelling, including the power of repetition, narrative flow, and considerations for spoken and written narratives.
• Use various tactics to ensure that our story comes across clearly in our communications.
SUMMARY
This chapter focused on providing an introductory discussion on data visualization and visual data storytelling by taking a look at how these concepts are similar and different, and how both have been transformed in the digital era.

The next chapter takes a closer look at the power of visual data stories to help us understand what makes them so powerful and important in today’s data deluge.
THE SCIENCE OF STORYTELLING

In a September 2016 interview with NPR Marketplace, National Geographic editor-in-chief Susan Goldberg spoke to host Kai Ryssdal about the power of visual storytelling, which has provided a transformative conduit for the publication in the new digital era.

It’s worth noting that National Geographic is dominating visual storytelling online, using powerful imagery to captivate and educate 19 million Snapchat users, 60 million Instagram followers, and 50 million Facebook followers.

Media and journalists are not the only ones putting emphasis on data storytelling, although they arguably have been a particularly imaginative bunch of communicators.
Visualizing versus presenting
• Consider the difference between reading a novel and watching a film.
• When reading, you are tasked with using your imagination: you’re reading the raw data of words and building the story in your own mind.
• Conversely, when watching a film, your imagination is off the hook.
The Brain on Stories
• The visual cortex (colors and shapes)
• The olfactory cortex (scents)
• The auditory cortex (sounds)
• The motor cortex (movement)
• The sensory cortex/cerebellum (language comprehension)
The Human on Stories
Storytelling has been an integral part of human expression and culture throughout time.

All human cultures tell stories, and most people derive a great deal of pleasure from them, even if they are untrue.

Stories also have the ability to transport us; we give the author license to stretch the truth, although, in data storytelling, this license extends only as far as it can before the data loses its elasticity and begins to break down.
Fitness

These concepts are at the heart of the Darwinian theory of natural selection: survival of the fittest as the mechanism, and fitness as our ability to overcome.

Human biology aside, to survive in competitive and often unstable environments, whether wilderness or business, one thing we’ve always had to do is understand other people.

In fact, one of our most expensive cognitive tasks, where we exert an impressive amount of energy, is trying to figure out other people: predicting what they’re going to do, understanding motivations, assessing relationships, and so forth.
Closure
The Zeigarnik effect was named for Soviet psychologist Bluma Zeigarnik, who demonstrated that people have a better memory for unfinished tasks than they do for finished ones.

Today, the Zeigarnik effect is known formally as a “psychological device that creates dissonance and uneasiness in the target audience.”

No matter the story’s goal (to focus, align, teach, or inspire), we build narratives to foster imagination, excitement, even speculation.
THE POWER OF STORIES

• Sometimes the only way to see the story in data is visually.
• A good story should meet its goals, and it should be actionable.
• A story should change, challenge, or confirm the way you think.
• Storytelling evolves: don’t be afraid to try something new.
The Classic Visualization Example
One of the core tenets of a visual data story is that it uses different forms of data visualization (charts, graphs, infographics, and so on) to bring data to life.

Constructed in 1973 by statistician Francis Anscombe, the four data sets known as Anscombe’s quartet appear nearly identical when compared by their summary statistics, yet look very different when graphed.
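The claim that the four data sets share their summary statistics is easy to check. The Python sketch below uses the values of Anscombe’s quartet as they are commonly reproduced and computes, for each set, the means, sample variances, Pearson correlation, and least-squares regression line.

```python
from statistics import mean, variance

# Anscombe's quartet (1973): sets I-III share the same x values
x_common = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_common, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_common, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_common, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Means, sample variances, Pearson correlation, and least-squares line."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": variance(x), "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }

stats = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

For all four sets the x mean is 9, the x variance is 11, the y mean is about 7.50, the y variance is about 4.12 to 4.13, the correlation is about 0.816, and the fitted line is approximately y = 3.00 + 0.500x. Only plotting the points reveals that one set is roughly linear, one curved, one thrown off by a single outlier, and one driven entirely by a single extreme x value.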

Anscombe’s example might be classic in terms of putting some support behind visual horsepower, but it only scratches the surface of visual data storytelling.
Using Small Personal Data for Big Stories

Graphic designer Chelsea Carlson decided to take this approach to a personal level.

In a 2016 experiment, Chelsea focused on analyzing her personal Netflix viewing habits to see what story her own data might tell about her television binging habits, tastes, and preferences and, perhaps more important in a streaming TV market saturated with more new shows every day, possibly even help her predict a new favorite by telling her exactly what to look for.

As a visual storyteller, Chelsea worked through visual discovery and a variety of graph types that included scatter plots, packed bubble charts, timelines, and even pie charts to build her data story.

As a result, Chelsea was able to come away with a rich visual data story encapsulated within a series of very deliberately crafted visualizations.
Napoleon’s March
As I’ve mentioned, using visualizations to tell stories about data is not a new technique.

French civil engineer Charles Joseph Minard has been credited with several significant contributions in the field of information graphics, among them his unique visualizations of two military campaigns: Hannibal’s march from Spain to Italy some 2,200 years ago and Napoleon’s invasion of Russia.

Minard’s flow map of Napoleon’s invasion of Russia, unofficially titled “Napoleon’s March by Minard,” tells the story of Napoleon’s army, particularly its size as it made its way from France to Russia and home again.

As a visual story of human drama, it has earned the distinction of becoming known as one of the best storytelling examples in history.

Minard’s second military visualization, Hannibal’s journey through the Alps, is similar in concept to Napoleon’s march, although it didn’t quite pull off the same memorable story.

Most stories have an inherent amount of entropy: we need to tell them quickly and succinctly, and many times this means we only get one chance.
MODULE 3
• The human brain is wired in such a way that it responds much faster to images than to words.
• It is often claimed that we process visual information up to 60,000 times faster than textual information.
• Hence, if you want your audience to comprehend the data, it is always better to visualize it.
• So data visualization is a key skill in a business analyst’s toolbox.
Bad Data Visualization
What is Tableau?
Tableau:
• Is a database,
• Is a business intelligence tool,
• Is graphical presentation software,
• Can be used to monitor existing operations,
• Can be used to discover new business opportunities, and
• Is relatively non-technical.
Tableau vs Excel
Getting Started with Tableau
The goal of this chapter is to help you get your footing with the Tableau product ecosystem and use the basic Tableau interface so that you are familiar enough with the tool to begin working hands-on with data.

This chapter covers how to get started with Tableau, reviews the tool’s basic functionality, discusses how to connect to data, and provides an overview of data types in Tableau.

From here, you will be able to move on to the visual analysis process, curating visuals, and building stories.
USING TABLEAU
Standing out against many other data visualization tools on the market, Tableau is an industry-leading, best-of-breed tool that delivers an approachable, intuitive environment for self-service users of all levels to help them prepare, analyze, and visualize their data.

Tableau’s stated mission is to help everyone “see and understand” their data. To facilitate this, the company offers a suite of software products designed to suit the needs of a diverse group of clients, from enterprise-level organizations to academic users and visualization hobbyists, including a free mobile app called Vizable for those who want to visualize data in a mobile-first format.

Tableau Desktop can connect to a wide variety of data, stored in a variety of places (from local spreadsheets, to multidimensional databases, and even some cloud database sources like Google Analytics, Amazon Redshift, or Salesforce), and the number of connections is always increasing.
WHY TABLEAU?

Many impressive data visualization and storytelling tools are available, but Tableau was, at least in my opinion, always at least one step ahead of the pack with its intuitive user interface.

Today, much like Google has outgrown its noun-based role of search engine and data collection superpower and become a common-use verb that encompasses all Internet searching, Tableau has expanded beyond the boundaries of a software package and become a required job skill, and one that is at the top of the list for employers.
We searched Labor Insight, an analytics software company powered by the largest and most sophisticated database of labor market data, to analyze data visualization-related IT job descriptions posted across the nation between March 2017 and February 2018. Can you guess what popped up as the second most in-demand skill of applicants, right behind data visualization and SQL itself?
THE TABLEAU PRODUCT PORTFOLIO
1. Tableau Desktop: Tableau Desktop is the company's flagship product, used for creating interactive data visualizations, dashboards, and reports.

2. Tableau Prep: Tableau Prep is a data preparation tool that helps users clean, reshape, and combine data from different sources.

3. Tableau Server: Tableau Server is a self-service analytics platform that allows users to share interactive dashboards and visualizations with others within their organization.

4. Tableau Online: Tableau Online is a cloud-based version of Tableau Server, which allows users to share dashboards and visualizations securely over the internet.

5. Tableau Mobile: Tableau Mobile is a mobile app that allows users to access and interact with Tableau dashboards and visualizations on their smartphones and tablets.

6. Tableau Public: Tableau Public is a free version of Tableau that allows users to create and share interactive visualizations publicly on the web.

7. Tableau Reader: Tableau Reader is a free desktop application that allows users to view and interact with Tableau visualizations created by others.
GETTING STARTED
The first thing you need to do to get started with Tableau is to get your hands on a license.

If you have not done so already, refer to the Introduction for guidance on how to get a free trial of Tableau Desktop.

You can also visit the Tableau website to explore trial and purchase options.
CONNECTING TO DATA

Connect: A long list of native connections to various data sources

Open: As you create your own workbooks, recently opened workbooks appear here for quick access

Sample Workbooks: These are default samples provided by Tableau

Discover: This pane connects you to various Tableau training, visualization, and other resources
Connecting to Tables
Connect to a file: Tableau allows you to connect to files such as Excel spreadsheets, CSV files, and text files.

Connect to a database: Tableau supports connecting to a variety of databases, including SQL Server, Oracle, MySQL, and PostgreSQL.

Connect to a cloud data source: Tableau also allows you to connect to cloud data sources such as Salesforce, Google Analytics, and Amazon Redshift.

Connect to a web data source: Tableau can also connect to web data sources such as HTML tables, XML data, and JSON data.

Connections: You can add additional data sources by clicking Add.

Sheets: This pane displays all the sheets in the Excel file, corresponding to the names of individual worksheet tabs.
The data displays in the preview pane below the data connection canvas.

A “Go To Worksheet” icon displays


Live vs. Extract in Tableau
A live connection means that Tableau queries the data source directly and retrieves the data in real time.

An extract connection means that Tableau retrieves some or all of the data from the data source as a snapshot and stores it in a Tableau-specific format called an extract.
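The difference can be illustrated outside Tableau. The Python sketch below uses an in-memory SQLite table as a stand-in data source: a live-style query asks the source every time, while an extract-style object copies the rows once and answers from that snapshot until it is refreshed. This is only a conceptual model with hypothetical data (the `orders` table, `live_total`, and `Extract` are illustrative names); it is not how Tableau implements connections, and real Tableau extracts use Tableau’s own file format.

```python
import sqlite3

# A stand-in "data source": an in-memory SQLite table of orders (hypothetical data)
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                   [("North", 250.0), ("South", 120.0), ("North", 90.0)])

def live_total(conn, region):
    """Live-style query: ask the source every time, so results are always current."""
    row = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders WHERE region = ?",
                       (region,)).fetchone()
    return row[0]

class Extract:
    """Extract-style snapshot: copy (optionally a subset of) rows, then query the copy."""

    def __init__(self, conn, region_filter=None):
        self.conn, self.region_filter = conn, region_filter
        self.refresh()

    def refresh(self):
        sql, params = "SELECT region, amount FROM orders", ()
        if self.region_filter:  # extracting only a subset of the data
            sql, params = sql + " WHERE region = ?", (self.region_filter,)
        self.rows = self.conn.execute(sql, params).fetchall()

    def total(self, region):
        return sum(amount for r, amount in self.rows if r == region)

snapshot = Extract(source)
source.execute("INSERT INTO orders (region, amount) VALUES ('North', 60.0)")  # new data arrives

print(live_total(source, "North"))  # 400.0: the live query sees the new row
print(snapshot.total("North"))      # 340.0: the snapshot is stale until refreshed
snapshot.refresh()
print(snapshot.total("North"))      # 400.0
```

The trade-off mirrors the one described above: the snapshot avoids querying the source on every request but goes stale until refreshed, whereas the live query is always current but depends on the source being available and responsive.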
1. Create a new connection to the data source in Tableau Desktop.
2. Choose the data you want to bring into Tableau.
3. Sign into Tableau Online in the Server menu.
Connecting to Multiple Tables with Joins
• Open Tableau and select "Connect to Data" from the start page.

• Select the database you want to connect to and enter your credentials.

• In the "Connection" tab, select the tables you want to join.

• Drag and drop the tables onto the "Join area" in the bottom left corner of the screen.

• Select the type of join you want to use.

• Specify the join conditions by dragging and dropping the fields from each table onto the appropriate join fields.

• Click "Update Now" to preview the joined data.

• Once you have previewed the data, click "Sheet" to start building your visualizations.
• It is often necessary to combine data from multiple places (different tables or even different data sources) to perform a desired analysis.

• Depending on the structure of the data and the needs of the analysis, there are several ways to combine the tables.
Create a join
Types of Joins in Tableau
Inner Join: An inner join returns only the matching records from both tables.

Left Join: A left join returns all records from the left table and only matching records from the right table.

Right Join: A right join returns all records from the right table and only matching records from the left table.

Full Outer Join: A full outer join returns all records from both tables, including records that do not have a matching value in the other table.
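The four join types can be sketched in plain Python. The `join` function and the two small tables below are hypothetical illustrations, not Tableau code; fields with no match are filled with `None`, playing the role of the null values Tableau shows for records without a match.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is 'inner', 'left', 'right', or 'full'."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unsupported join type: {how}")
    # Index the right table by key so each left row can find its matches quickly
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)
    left_keys = {lrow[key] for lrow in left}
    left_nulls = {f: None for lrow in left for f in lrow if f != key}
    right_nulls = {f: None for rrow in right for f in rrow if f != key}

    result = []
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        for rrow in matches:
            result.append({**lrow, **rrow})          # matching pair (every join type)
        if not matches and how in ("left", "full"):
            result.append({**lrow, **right_nulls})   # keep the unmatched left row
    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in left_keys:
                result.append({**left_nulls, **rrow})  # keep the unmatched right row
    return result

# Two small hypothetical tables keyed on customer_id
customers = [{"customer_id": 1, "name": "Asha"},
             {"customer_id": 2, "name": "Ben"},
             {"customer_id": 3, "name": "Chen"}]
orders = [{"customer_id": 1, "amount": 250},
          {"customer_id": 3, "amount": 90},
          {"customer_id": 4, "amount": 40}]  # no matching customer

for how in ("inner", "left", "right", "full"):
    rows = join(customers, orders, "customer_id", how)
    print(how, [(r["customer_id"], r["name"], r["amount"]) for r in rows])
```

Running it prints two rows for the inner join, three each for the left and right joins, and four for the full outer join, with `None` wherever one side had no match.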
BASIC DATA PREP WITH DATA INTERPRETER

Tableau Desktop delivers some features to help automatically reshape files to get them ready for analysis in Tableau.

Primary among them is Data Interpreter, Tableau’s built-in tool for preparing data for analysis.
Transform Data Format with Data Interpreter
The original data is provided in the table format of the following Excel sheet. We can easily identify the recorded number of offenses for males and females in different years. However, the data is not yet in an ideal format for Tableau to interpret, because not all cells in the Excel sheet are data inputs and the rows are not at the same level of analysis. There are titles, footnotes, empty cells, merged cells, and pre-aggregated data mixed in among the actual data fields. Unfortunately for anyone doing secondary analysis, many data-based reports are available only with this kind of mixed-up formatting. It must be detangled before we can use it.

Therefore, if we import the data as is, we will see that Tableau cannot distinguish the non-data cells from the actual data fields, and thus titles are included along with lots of empty cells. This is not what we want to be working with.

However, Tableau is actually smart enough to distinguish the actual data fields. On the left-hand side, there is a checkbox for Use Data Interpreter. Check the box in front of Use Data Interpreter.

As we can see, with Data Interpreter turned on, Tableau is now able to distinguish the actual data from the header and the footnotes, and to pick up column names correctly.

Data Interpreter also allows us to manually inspect its interpretation process to ensure accuracy. If you are having second thoughts about the reliability of Data Interpreter, click the blue underlined text Review the results. Tableau will prompt you to open an Excel workbook that lets you see how Tableau is now seeing the data with Data Interpreter.

As we can see, the green area indicates what Tableau interprets as the main data and the red area shows what Tableau interprets as column names. The uncolored cells indicate the cells that Tableau has excluded.
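To make the idea concrete, the Python sketch below applies a deliberately simple heuristic to a messy report laid out like the one described above: it skips the title and blank rows, finds the header row, and keeps only rows whose values are all numeric, dropping a pre-aggregated total row and a footnote. The sheet contents and the rules are hypothetical; this is not Tableau’s Data Interpreter algorithm, which handles far more layouts.

```python
# Hypothetical messy report: title, blank row, header, data, a total row, and a footnote
raw_sheet = [
    ["Recorded offenses by sex, 2019-2021", "", "", ""],
    ["", "", "", ""],
    ["Year", "Male", "Female", "Total"],
    ["2019", "1520", "610", "2130"],
    ["2020", "1480", "655", "2135"],
    ["2021", "1395", "700", "2095"],
    ["Total", "4395", "1965", "6360"],
    ["Source: hypothetical data for illustration.", "", "", ""],
]

def is_number(text):
    """True if the cell text parses as a number (blank cells do not)."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def interpret(sheet):
    """Toy 'find the real table' heuristic (an illustrative assumption, not Tableau's).

    1. The header is the first row in which every cell is filled in, so a title
       with empty neighbouring cells and blank rows above it are skipped.
    2. Data rows are rows below the header whose value cells are all numeric;
       blank rows and footnotes fail this test and are excluded.
    3. Rows labelled 'Total' are treated as pre-aggregated and excluded.
    """
    header_idx = next(i for i, row in enumerate(sheet) if all(cell.strip() for cell in row))
    header = sheet[header_idx]
    records = []
    for row in sheet[header_idx + 1:]:
        label, values = row[0].strip(), row[1:]
        if not label or not all(is_number(v) for v in values):
            continue
        if label.lower().startswith("total"):
            continue
        records.append(dict(zip(header, [label] + [float(v) for v in values])))
    return header, records

header, records = interpret(raw_sheet)
print(header)
for record in records:
    print(record)
```

A real report with merged cells or a title that spans every column would defeat this heuristic, which is why reviewing the interpreted results is worthwhile.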
NAVIGATING THE TABLEAU
INTERFACE
Sheets: For creating individual visualizations

Dashboards: For combining multiple sheets as well as other objects like images, text, and web pages, and adding interactions between them like filtering and highlighting

Stories: These frameworks can be based on visualizations or dashboards, or on different views and explorations of a single visualization, seen at different stages, with different marks filtered and annotations added, in whatever way best narrates the story in your data

Menus and toolbar

Data window

Shelves and cards

Legends
Menus and Toolbar

Logo: The Tableau logo button brings you back to the original Connect to Data screen

Undo: There is no limit to how much you can undo in Tableau, which is an important feature for exploration and discovery

Save: There is no automatic save in Tableau
Data Window

Data: At the top of the Data tab is a list of all open data connections and the fields from that data source, categorized as either dimensions or measures

Analytics: The Analytics tab enables you to bring out pieces of your analysis (summaries, models, and more) as drag-and-drop elements
Shelves and Cards
Columns and Rows shelves: Control grouping headers and axes

Pages shelf: Lets you break a view into a series of pages so you can better analyze how a specific field affects the rest of the data

Filters shelf: Filters visualizations by dimensions or measures

Marks card: Controls the visual characteristics of a visualization, including encoding of color, size, labels, tooltip text, and shape

“Show Me” card: A collapsible card that shows the visualization types applicable to the selected fields
Legends

Legends are created automatically and appear when you place a field on Color, Size, or Shape in the Marks card

To change the order of items in a visualization, drag them around in the legend

Hide a legend by clicking on its menu and selecting Hide Card
UNDERSTANDING
DIMENSIONS AND MEASURES

When you bring a data source into Tableau, Tableau automatically classifies each field as a dimension or a measure

The differences between these two are important, though they can be tricky for those new to analysis

Perhaps the best way to differentiate these two classifications is this: dimensions are categories, whereas measures are fields you can do math with
Dimension

Dimensions are things that you can group data by or drill down by

They are usually, but not always, categories, and they can be grouped into strings, dates, or geographic fields
Measures

Measures are generally numerical data on which you want to perform calculations, such as summing and averaging

Remember, whether a field is treated as a measure or a dimension can be adjusted in the Data Source screen by clicking on the data type icon

You can also change this directly in the sheet, either by dragging and dropping a dimension to the measures (or vice versa), or by clicking the drop-down menu on any field and selecting the Convert to Measure option
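The distinction maps directly onto a group-by. The minimal Python sketch below (the rows and the field names `Region` and `Sales` are hypothetical) groups by a dimension and aggregates a measure within each group, which is roughly what Tableau does when a dimension and a measure are placed on a view together, with SUM as the default aggregation.

```python
from collections import defaultdict

# Hypothetical rows: "Region" is a dimension (a category),
# "Sales" is a measure (a number we can do math with).
rows = [
    {"Region": "East", "Sales": 120.0},
    {"Region": "West", "Sales": 80.0},
    {"Region": "East", "Sales": 30.0},
    {"Region": "West", "Sales": 70.0},
]

def aggregate(rows, dimension, measure, func=sum):
    """Group rows by a dimension, then aggregate the measure within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: func(values) for key, values in groups.items()}

print(aggregate(rows, "Region", "Sales"))  # {'East': 150.0, 'West': 150.0}
```

Swapping `func` (for example to `max`) changes the aggregation applied to the measure, while the dimension still decides how the rows are grouped.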
MODULE 4
DESCRIPTIVE
ANALYTICS
Visualizing and Exploring Data
Import your data: Start by importing your data into Excel

Create a pivot table: Pivot tables are a powerful tool for summarizing and analyzing large amounts of data

Create charts and graphs: Once you have created a pivot table, you can use it to create charts and graphs that help you visualize your data

Apply filters: Filters allow you to focus on specific aspects of your data

Use conditional formatting: Conditional formatting allows you to highlight specific values or ranges of values in your data
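To show what a pivot table, a filter, and conditional formatting do to the underlying numbers, here is a small Python sketch under stated assumptions: the records are hypothetical, the pivot puts Region in Rows, Quarter in Columns, and Sum of Amount in Values, and "highlighting" simply returns the cells at or above a threshold.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, amount)
records = [
    ("East", "Q1", 100), ("East", "Q2", 150),
    ("West", "Q1", 90), ("West", "Q2", 60), ("East", "Q1", 50),
]

def pivot(records):
    """Sum amounts into a region x quarter table, like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in records:
        table[region][quarter] += amount
    return {region: dict(cols) for region, cols in table.items()}

def filter_rows(table, keep):
    """Keep only the row labels in `keep`, similar to applying a filter."""
    return {label: cols for label, cols in table.items() if label in keep}

def highlight(table, threshold):
    """Cells at or above a threshold, like a conditional-formatting rule."""
    return [(r, c) for r, cols in table.items() for c, v in cols.items() if v >= threshold]

summary = pivot(records)
print(summary)  # {'East': {'Q1': 150, 'Q2': 150}, 'West': {'Q1': 90, 'Q2': 60}}
print(filter_rows(summary, {"West"}))
print(highlight(summary, 150))
```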
Dashboards
A dashboard is a visual representation of a set of key business measures

It is derived from the analogy of an automobile’s control panel, which displays speed, gasoline level, temperature, and so on

Dashboards provide important summaries of key business information to help manage a business process or function
Creating Charts
in Microsoft Excel
Microsoft Excel provides a comprehensive charting capability with many features

Certain charts work better for certain types of data, and using the wrong chart can make the data difficult for the user to interpret and understand

While Excel offers many ways to make charts unique and fancy, naive users often focus more on the attention-grabbing aspects of charts rather than on their effectiveness
Types of Charts in Excel
Column chart: A column chart is used to compare values across categories

Bar chart: A bar chart is similar to a column chart, but the categories are displayed horizontally

Line chart: A line chart is used to show trends in data over time

Pie chart: A pie chart is used to show the proportion of each category in a data set

Scatter chart: A scatter chart is used to show the relationship between two sets of data

Area chart: An area chart is similar to a line chart, but the area under the line is filled with color

Stock chart: A stock chart is used to show the price trend of a stock over time
Column chart
A column chart is a type of graph that visualizes data using vertical bars. Each bar represents a category, and the length of the bar corresponds to the value it represents.

Bar chart
A bar chart is similar to a column chart, but the categories are displayed horizontally

Line chart
A line chart is used to show trends in data over time

Pie chart
A pie chart is used to show the proportion of each category in a data set

Area chart
An area chart is similar to a line chart, but the area under the line is filled with color

Scatter chart
A scatter chart is used to show the relationship between two sets of data

Stock chart
A stock chart is used to show the price trend of a stock over time
Descriptive Measures
to Summarize the
Data

Mean: The mean is the average value of a data set

Median: The median is the middle value in a data set when the values are sorted

Mode: The mode is the value that occurs most frequently in a data set

Range: The range is the difference between the maximum and minimum values in a data set

Variance: The variance is a measure of the variability of a data set, based on the squared deviations of the values from the mean

Standard deviation: The standard deviation is the square root of the variance

Interquartile range (IQR): The interquartile range is the range of the middle 50% of the data, that is, the third quartile minus the first quartile
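Each of these measures can be checked against Python's standard `statistics` module (Python 3.8+). The sample below is hypothetical. `variance` and `stdev` are the sample (n − 1) versions, comparable to Excel's VAR.S and STDEV.S, and the "inclusive" quartile method should correspond to Excel's QUARTILE.INC; treat those pairings as a guide rather than a guarantee.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 7]   # hypothetical sample

mean = statistics.mean(data)          # 6.25
median = statistics.median(data)      # 6.5 (average of the two middle values)
mode = statistics.mode(data)          # 8
data_range = max(data) - min(data)    # 6
variance = statistics.variance(data)  # sample variance: 31.5 / 7 = 4.5
std_dev = statistics.stdev(data)      # square root of the variance

# Quartiles by linear interpolation over the sorted data, then IQR = Q3 - Q1
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                         # 8.0 - 4.75 = 3.25
```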
Application of Excel Descriptive
Statistics Tool
Identifying the central tendency of the data: The mean, median, and mode functions in Excel can be used to find the typical or central value of the data set

Analyzing the variability of the data: The range, variance, and standard deviation functions in Excel can be used to analyze the spread of the data

Detecting outliers: Excel's descriptive statistics tool can be used to identify outliers in the data

Comparing data sets: Excel's descriptive statistics tool can be used to compare two or more data sets

Making data-driven decisions: By using Excel's descriptive statistics tool, you can gain insights into the characteristics of the data set and make data-driven decisions
Excel Correlation
Tool
The Data Analysis Correlation tool computes correlation coefficients for more than two arrays

The output of this tool is a matrix giving the correlation between each pair of variables

This tool provides the same output as the CORREL function for each pair of variables
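The idea of "CORREL for each pair" can be sketched in a few lines of Python. The column names and values below are hypothetical; the sketch fills the full square matrix, whereas Excel's tool shows only one triangle, since the matrix is symmetric.

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson correlation coefficient, the quantity CORREL(x, y) returns."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlation between each pair of variables, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

columns = {  # hypothetical variables
    "ad_spend": [1, 2, 3, 4, 5],
    "sales": [2, 4, 6, 8, 10],
    "returns": [5, 4, 3, 2, 1],
}
for name, row in correlation_matrix(columns).items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```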
Probability Distributions and
Data Modelling
Probability Distributions: Excel has several built-in functions to calculate the probabilities of various probability distributions, including

Normal Distribution: Excel has the functions NORM.DIST and NORM.INV to calculate probabilities and inverse probabilities of the normal distribution, respectively

Binomial Distribution: Excel has the functions BINOM.DIST and BINOM.INV to calculate probabilities and inverse probabilities of the binomial distribution, respectively

Poisson Distribution: Excel has the function POISSON.DIST to calculate probabilities of the Poisson distribution; unlike the normal and binomial cases, there is no built-in POISSON.INV function

Data Modeling: Excel can also be used to model data using various statistical techniques
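For readers who want to check these spreadsheet results, the sketch below reproduces the same quantities with Python's standard library (Python 3.8+). The parameters (mean 100, standard deviation 15, and so on) are hypothetical; the comments name the Excel function each helper is meant to mirror, with cumulative = TRUE or FALSE as noted.

```python
import math
from statistics import NormalDist

# NORM.DIST(x, 100, 15, TRUE) and NORM.INV(p, 100, 15)
norm = NormalDist(mu=100, sigma=15)
p_below_130 = norm.cdf(130)        # P(X <= 130), about 0.9772
x_at_95th = norm.inv_cdf(0.95)     # value with 95% of the distribution below it

def binom_pmf(k, n, p):
    """P(X = k), like BINOM.DIST(k, n, p, FALSE)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k), like BINOM.DIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def binom_inv(n, p, alpha):
    """Smallest k with P(X <= k) >= alpha, like BINOM.INV(n, p, alpha)."""
    return next(k for k in range(n + 1) if binom_cdf(k, n, p) >= alpha)

def poisson_pmf(k, lam):
    """P(X = k), like POISSON.DIST(k, lam, FALSE)."""
    return math.exp(-lam) * lam**k / math.factorial(k)
```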
Probability Distributions and
Data Modelling

Linear Regression: Excel's built-in LINEST function can be used to perform linear regression analysis on data sets

Exponential Regression: Excel's built-in GROWTH function can be used to perform exponential regression analysis on data sets

Time Series Analysis: Excel has several built-in functions for time series analysis, including AVERAGEIF, AVERAGEIFS, and FORECAST
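The arithmetic behind LINEST, FORECAST, and GROWTH for a single independent variable can be sketched as follows. This is an illustration with hypothetical data, not Excel's implementation: the linear fit is ordinary least squares, and the exponential fit applies the same least squares to ln(y), which is how an exponential curve y = b·m^x is usually fitted.

```python
import math

def least_squares_line(x, y):
    """(slope, intercept) of the least-squares line, in LINEST's order."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast(new_x, x, y):
    """Point on the fitted line at new_x, like FORECAST(new_x, known_y's, known_x's)."""
    slope, intercept = least_squares_line(x, y)
    return intercept + slope * new_x

def growth(new_x, x, y):
    """Exponential-trend prediction: fit a line to ln(y), then exponentiate (y must be > 0)."""
    slope, intercept = least_squares_line(x, [math.log(v) for v in y])
    return math.exp(intercept + slope * new_x)

months = [1, 2, 3, 4, 5]        # hypothetical periods
sales = [3, 5, 7, 9, 11]
print(least_squares_line(months, sales))  # (2.0, 1.0)
print(forecast(6, months, sales))          # 13.0
```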
Sampling and Inferential statistical methods

Sampling: Sampling is the process of selecting a subset of individuals from a larger population to study

RAND: This function generates a random number greater than or equal to 0 and less than 1

RANDBETWEEN: This function generates a random integer between two specified values (inclusive)

Inferential Statistical Methods: Inferential statistical methods are used to make inferences about a population based on a sample

T.TEST: The T.TEST function performs a t-test and returns the p-value used to determine whether there is a significant difference between the means of two samples

Array1: The first set of data

Array2: The second set of data

Tails: The number of tails for the test (1 or 2)

Type: The type of t-test to perform (1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance)

CONFIDENCE: The CONFIDENCE function returns the margin of error for a confidence interval for a population mean; the interval is the sample mean plus or minus this value
Alpha: The significance level

Standard_dev: The standard deviation of the population

Size: The sample size

Z.TEST: The Z.TEST function performs a z-test and returns the one-tailed p-value for testing whether the mean of a sample is significantly greater than a hypothesized population mean

Array: The set of data to test

X: The hypothesized population mean

Sigma: The standard deviation of the population (optional; if omitted, the sample standard deviation is used)
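A minimal Python sketch of the same ideas, assuming a hypothetical population of 100 IDs: `random` stands in for RAND and RANDBETWEEN, and two helper functions mirror the documented formulas behind CONFIDENCE.NORM and Z.TEST. The helper names are our own, not Excel's.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(42)               # seeded so results are reproducible
population = list(range(1, 101))      # hypothetical population of customer IDs
u = rng.random()                      # like RAND(): 0 <= u < 1
k = rng.randint(1, 6)                 # like RANDBETWEEN(1, 6): both bounds inclusive
sample = rng.sample(population, 10)   # simple random sample without replacement

def confidence_norm(alpha, standard_dev, size):
    """Margin of error like CONFIDENCE.NORM(alpha, standard_dev, size);
    the confidence interval is the sample mean +/- this value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / math.sqrt(size)

def z_test(array, x, sigma=None):
    """One-tailed p-value like Z.TEST(array, x, [sigma]):
    1 - NORM.S.DIST((mean - x) / (sigma / sqrt(n)), TRUE)."""
    if sigma is None:
        sigma = stdev(array)          # sample standard deviation when sigma is omitted
    z = (fmean(array) - x) / (sigma / math.sqrt(len(array)))
    return 1 - NormalDist().cdf(z)

print(round(confidence_norm(0.05, 2.5, 50), 6))  # 0.692952
```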


Using Excel Data Analysis add in for
estimation and hypothesis testing

Installing the Data Analysis Add-In: If the Data Analysis add-in (part of the Analysis ToolPak) is not already installed in your version of Excel, you will need to install it

Estimation: The Data Analysis add-in provides several tools for estimation, including

Descriptive Statistics: This tool provides a summary of the data, including measures of central tendency and measures of dispersion

Regression: This tool is used to estimate the relationship between two or more variables

Moving Average: This tool is used to estimate the trend in a time series data set by calculating the average of a fixed number of the most recent observations
Exponential Smoothing: This tool is used to estimate the trend in a time series data set by weighting the observations so that more recent observations have a greater impact on the estimate

Hypothesis Testing: The Data Analysis add-in also provides several tools for hypothesis testing, including

Two-Sample Assuming Equal Variances: This test is used when the variances of the two populations are assumed to be equal

Two-Sample Assuming Unequal Variances: This test is used when the variances of the two populations are assumed to be unequal

Paired Two-Sample for Means: This test is used when the two samples are related or paired, such as in a before-and-after study

One-Way ANOVA: This test is used to compare the means of several groups when there is only one factor or independent variable

Two-Way ANOVA: This test is used when there are two factors or independent variables

Chi-Square Test for Independence: This test is used when there are two categorical variables and the researcher wants to determine whether they are related or not (in Excel this test is carried out with the CHISQ.TEST function rather than a Data Analysis tool)

Chi-Square Test for Goodness of Fit: This test is used when the researcher wants to determine whether a sample of data fits a specific distribution
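The three t-test tools differ mainly in how the standard error is built. The sketch below (hypothetical data, Python 3.8+ standard library) computes the t statistic and degrees of freedom for each. It deliberately stops there: the add-in also reports p-values, which require the t distribution (for example Excel's T.DIST functions) and are omitted to stay within the standard library.

```python
import math
from statistics import fmean, variance

def t_equal_var(a, b):
    """'Two-Sample Assuming Equal Variances': pooled-variance t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def t_unequal_var(a, b):
    """'Two-Sample Assuming Unequal Variances': Welch t statistic and df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

def t_paired(a, b):
    """'Paired Two-Sample for Means': one-sample t test on the differences a - b."""
    d = [x - y for x, y in zip(a, b)]
    return fmean(d) / math.sqrt(variance(d) / len(d)), len(d) - 1
```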
MODULE 5
Predictive Analytics
Identify trends and patterns: Predictive analytics can be used to analyze large amounts of data and identify trends and patterns that might not be apparent through manual analysis

Make accurate forecasts: Predictive analytics can be used to make forecasts about future events, such as sales or customer behaviour, which can help businesses plan and prepare accordingly

Reduce risk: Predictive analytics can be used to identify potential risks and take steps to mitigate them, such as identifying fraudulent transactions or predicting equipment failures

Improve decision-making: Predictive analytics can provide businesses with the information they need to make better decisions, such as which products to develop, which customers to target, and which marketing campaigns to run

Data collection: The first step in the process is to collect and prepare the data that will be used in the analysis

Data cleaning: Once the data has been collected, it must be cleaned and preprocessed to remove errors and to handle outliers and missing values

Data exploration: After the data has been cleaned, it can be explored to identify patterns, trends, and relationships between variables

Model building: Once the data has been explored, a predictive model can be built

Model evaluation: The model must then be evaluated to ensure that it is accurate and reliable

Deployment: Once the model has been evaluated, it can be deployed in a production environment and used to make predictions
Statistical Model
1. Linear regression models: These models are used to predict the value of a continuous dependent variable based on one or more independent variables
2. Logistic regression models: These models are used to predict the probability of an event occurring based on one or more independent variables
3. Time series models: These models are used to predict future values of a variable based on its past values
4. Multilevel models: These models are used to analyze data that have a hierarchical structure, such as data from individuals within groups
5. Bayesian models: These models are used to incorporate prior knowledge or beliefs into the statistical analysis

Statistical Model Purposes
Explaining relationships between variables: Statistical models can help identify the relationships between variables and explain how they are related

Predicting outcomes: Statistical models can be used to predict future outcomes based on historical data

Evaluating the effectiveness of interventions: Statistical models can be used to evaluate the effectiveness of interventions or treatments by comparing outcomes in different groups

Testing hypotheses: Statistical models can be used to test hypotheses about the relationships between variables
Inference about Regression
Coefficient
In a linear regression model, the regression coefficient measures the direction and size of the relationship between an independent variable and the dependent variable: the expected change in the dependent variable for a one-unit change in the independent variable

Inference about the regression coefficient is based on the hypothesis testing framework, where the null hypothesis is that the regression coefficient is equal to zero, and the alternative hypothesis is that it is not equal to zero

The t-test calculates the t-statistic, which measures the difference between the estimated regression coefficient and the hypothesized value, relative to the standard error of the estimate. With a hypothesized value of zero:

$$t = \frac{b_1 - 0}{SE(b_1)} = \frac{b_1}{SE(b_1)}$$

which is compared with a t distribution with n − k − 1 degrees of freedom (n observations, k independent variables)
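For a single independent variable the whole calculation fits in one short function. This is a sketch with hypothetical data; it returns the slope, its standard error (the standard error of estimate divided by the square root of the sum of squared deviations of x), the t-statistic, and the degrees of freedom n − 2. Turning t into a p-value needs the t distribution, which is not in Python's standard library.

```python
import math

def slope_t_statistic(x, y, hypothesized=0.0):
    """t = (b1 - hypothesized) / SE(b1) for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, se_b1, (b1 - hypothesized) / se_b1, n - 2

# Hypothetical data: slope about 0.6, SE about 0.283, t about 2.121 with 3 df
print(slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```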
Multicollinearity
Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other. Its main consequences are:

• Difficulty in interpreting the coefficients: When independent variables are highly correlated, it becomes difficult to interpret the coefficients of the regression model, as the effects of one independent variable cannot be distinguished from the effects of the other independent variables

• Instability of coefficients: When there is multicollinearity, the coefficients in the regression model can be highly unstable and can change drastically when the data set is modified

• Reduced precision of coefficients: Multicollinearity can lead to imprecise estimates of the regression coefficients, which makes predictions less reliable for new data whose predictor values do not follow the same correlation pattern

• Reduced statistical significance: When there is multicollinearity, the standard errors of the regression coefficients tend to be high, which can lead to reduced statistical significance

Common remedies include:

• Remove one of the highly correlated independent variables from the model

• Combine the highly correlated independent variables into a single variable

• Use regularization techniques, such as ridge regression or lasso regression, which can help mitigate the effects of multicollinearity
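A standard way to quantify multicollinearity is the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one predictor on the others. This sketch covers only the two-predictor case, where that R² is simply the squared correlation between the two predictors; the data are hypothetical. A VIF above about 5 or 10 is a commonly quoted rule of thumb for concern, though conventions vary.

```python
from statistics import fmean

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is their squared correlation."""
    m1, m2 = fmean(x1), fmean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12**2 / (s11 * s22)
    return 1 / (1 - r_squared)

print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))   # r^2 = 0.6, VIF = 2.5
print(vif_two_predictors([1, 2, 3, 4, 5], [1, -1, 0, -1, 1]))  # uncorrelated, VIF = 1.0
```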
Stepwise Regressions
Stepwise regression is a statistical method used to identify the most important predictors of a dependent variable by sequentially adding or removing independent variables from a regression model

The goal is to create a model that best explains the variance in the dependent variable using the fewest possible independent variables

For example, a researcher predicting house prices from several candidate variables can use stepwise regression to identify the most important predictors of house price and create a model that best predicts the price of a house based on these variables
The Partial F-test
• The Partial F test is a statistical test used in regression analysis to determine the significance of individual regression coefficients in a multiple regression model

• It is also known as the Type III sum of squares test

• The null hypothesis of the Partial F test is that the coefficient of a particular independent variable in the multiple regression model is equal to zero, implying that this variable does not have a significant impact on the dependent variable

• For example, if the Partial F test for a variable X2 is not significant, we conclude that independent variable X2 does not have a significant impact on the dependent variable Y in this multiple regression model
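The test compares the error sum of squares (SSE) of the full model with that of a reduced model that omits the variable(s) being tested:

$$F = \frac{(SSE_{reduced} - SSE_{full})/q}{SSE_{full}/(n - k - 1)}$$

where q is the number of variables dropped, n the number of observations, and k the number of independent variables in the full model. The Python sketch below evaluates this formula; the SSE values are hypothetical. For a single dropped variable (q = 1), F equals the square of that coefficient's t-statistic.

```python
def partial_f(sse_reduced, sse_full, q, n, k_full):
    """Partial F statistic and its (numerator, denominator) degrees of freedom."""
    df_num, df_den = q, n - k_full - 1
    f = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f, (df_num, df_den)

# Hypothetical: dropping X2 raises SSE from 100 to 120 (n = 30, k = 3 in the full model)
print(partial_f(120.0, 100.0, 1, 30, 3))  # F is about 5.2 with (1, 26) df
```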
Outliers

In statistics, an outlier is a data point that is significantly different from other data points in a data set

Outliers can occur due to measurement errors, data entry errors, or natural variation in the data

Outliers can be identified using various methods, such as visual inspection of a boxplot, calculating z-scores, or using statistical tests such as Grubbs' test or Dixon's test
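Two of these methods are easy to sketch in Python. The data are hypothetical; the z-score version uses the population standard deviation, and the boxplot version uses the common 1.5 × IQR fences. Both thresholds are conventions rather than fixed rules. In very small samples the largest attainable z-score is limited, so the example uses a threshold of 2 instead of the often-quoted 3.

```python
from statistics import fmean, pstdev, quantiles

def zscore_outliers(data, threshold=3.0):
    """Points whose |z-score| exceeds the threshold."""
    mu, sigma = fmean(data), pstdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

def iqr_outliers(data, k=1.5):
    """Points outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x for x in data if x < low or x > high]

data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]  # hypothetical; 95 looks like an entry error
print(zscore_outliers(data, threshold=2.0))       # [95]
print(iqr_outliers(data))                         # [95]
```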
Violation of Regression Assumptions
Non-linearity: Regression assumes a linear relationship between the dependent variable and the independent variables

Heteroscedasticity: Regression assumes that the variance of the errors is constant across all levels of the independent variables

Autocorrelation: Regression assumes that the errors are independent of each other

Multicollinearity: Regression assumes that the independent variables are not highly correlated with each other

Outliers: Regression assumes that the errors are normally distributed with constant variance; outliers can violate this assumption and strongly distort the estimated coefficients
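One common check for the autocorrelation assumption is the Durbin-Watson statistic computed from the residuals of a fitted model. The sketch below uses hypothetical residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e**2 for e in residuals)
    return num / den

print(durbin_watson([1, -1, 1, -1]))  # 3.0: residuals alternate in sign
print(durbin_watson([1, 1, 1, 1]))    # 0.0: residuals move together
```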
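Multiple Regression
Select the independent variables: For example, suppose we have three independent variables: height, age, and gender (gender coded as a 0/1 dummy variable)

Formulate the multiple regression equation: The multiple regression equation is of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$

where $X_1$, $X_2$, and $X_3$ are the independent variables, $\beta_0$ is the intercept, $\beta_1$ to $\beta_3$ are the regression coefficients, and $\varepsilon$ is the error term

Estimate the regression coefficients: We use a method called least squares to estimate the regression coefficients

Evaluate the model: We can evaluate the model using measures such as R-squared, adjusted R-squared, and the F-test

Make predictions: Once we have a satisfactory model, we can use it to make predictions on new data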
Interpretation of
Standard error of
estimate and R Square

Standard error of estimate: The standard error of estimate is a measure of the variability or scatter of the actual values around the predicted values

R-squared: R-squared is a measure of the proportion of variability in the dependent variable that is explained by the independent variables in the regression model
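The multiple regression workflow (least-squares estimation, then R-squared and the standard error of estimate) can be sketched in plain Python. This is a teaching sketch under simple assumptions: it solves the normal equations (X′X)b = X′y by Gaussian elimination, which is fine for small, well-conditioned problems but numerically weaker than the QR-based routines statistics packages use, especially under multicollinearity. The data are hypothetical. R² = 1 − SSE/SST, and the standard error of estimate is √(SSE/(n − k − 1)).

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared_and_see(rows, y, coef):
    """R-squared and standard error of estimate for fitted coefficients."""
    preds = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    n, k = len(y), len(coef) - 1
    return 1 - sse / sst, math.sqrt(sse / (n - k - 1))

rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]  # hypothetical (X1, X2) values
y = [1, 3, 4, 6, 8]                              # generated as y = 1 + 2*X1 + 3*X2
coef = fit_ols(rows, y)
print(coef, r_squared_and_see(rows, y, coef))
```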
Multiple
Regression
Modelling Possibilities
Linear regression: This is a common modeling technique used to predict a continuous dependent variable based on one or more independent variables

Logistic regression: This is a type of regression used to model the probability of a binary outcome based on one or more independent variables

Time series analysis: This is a modeling technique used to analyze data that changes over time

Decision trees: This is a type of modeling technique used to identify patterns in data and make predictions

Clustering: This is a modeling technique used to group together similar data points based on their characteristics

Neural networks: This is a type of modeling technique that uses a set of interconnected nodes, loosely inspired by the function of a biological brain

Bayesian modeling: This is a statistical modeling technique that involves updating prior knowledge about a system with new data to make predictions about future outcomes
Validation of Fit
Validation of fit is an important step in any modeling process

It involves checking whether the model is a good fit for the data and whether it is able to generalize well to new data

The first step in validation of fit is to split the data into training and testing sets: the model is fitted on the training set and its accuracy is measured on the testing set

However, the result of a single split can depend on which observations happen to land in each set. Therefore, it's important to perform further validation of fit using techniques like cross-validation, which involves splitting the data into multiple training and testing sets and evaluating the model's performance on each set

In summary, validation of fit is an important step in modeling that involves checking whether the model is a good fit for the data and whether it is able to generalize well to new data
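K-fold cross-validation is one common form of this. The sketch below assumes a simple straight-line model and hypothetical data. It shuffles the row indices once, deals them into k folds, fits on k − 1 folds, and measures mean squared error on the held-out fold, so every observation is tested exactly once. It assumes n ≥ k so that no fold is empty.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; each index appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, folds[i]

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx

def cross_val_mse(x, y, k=5, seed=0):
    """Average held-out mean squared error over the k folds."""
    errors = []
    for train, test in k_fold_indices(len(x), k, seed):
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        errors.append(sum((y[i] - (intercept + slope * x[i])) ** 2 for i in test) / len(test))
    return sum(errors) / len(errors)

x = list(range(10))
y = [2 * v + 1 for v in x]   # hypothetical, perfectly linear data
print(cross_val_mse(x, y))   # essentially 0 for a perfectly linear relationship
```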
Binomial Logistic Regression and Multinomial Logistic
Regression

Binomial logistic regression: Used when the dependent variable has two categories. For example, suppose we have data on the admission status of students to a particular university: YES or NO

Multinomial logistic regression: Used when the dependent variable has more than two unordered categories. For example, suppose we have data on the type of car purchased by customers at a car dealership
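Once a logistic model has been fitted, turning its coefficients into probabilities is simple. The sketch below shows only that prediction step: estimating the coefficients themselves requires maximum-likelihood fitting, which is not shown. All coefficients, predictors (an exam score, an income in thousands), and car categories are hypothetical. Binomial models pass one linear score through the logistic (sigmoid) function; multinomial models compute one score per category and normalize them with softmax.

```python
import math

def logistic_probability(coefs, x):
    """P(Y = 1 | x) = 1 / (1 + e^-(b0 + b1*x1 + ...)) for binomial logistic regression."""
    z = coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))
    return 1 / (1 + math.exp(-z))

def multinomial_probabilities(class_coefs, x):
    """Softmax over one linear score per category (multinomial logistic regression)."""
    scores = {c: b[0] + sum(w * v for w, v in zip(b[1:], x)) for c, b in class_coefs.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical admission model: intercept -4.0, coefficient 0.05 per exam point
print(logistic_probability([-4.0, 0.05], [80]))  # 0.5 at a score of 80

# Hypothetical car-type model; "hatchback" is the reference category (all zeros)
car_coefs = {"hatchback": [0.0, 0.0], "sedan": [-1.0, 0.02], "SUV": [-3.0, 0.05]}
print(multinomial_probabilities(car_coefs, [60]))
```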
MODULE 6
Time Series Analysis
• Time series analysis is a statistical technique that deals with data collected over time

• Time series data can be found in various fields, including finance, economics, engineering, and the natural sciences

• Trend analysis involves identifying long-term patterns in the data, while seasonal analysis looks for recurring patterns that occur at fixed intervals, such as every year or every month
Time Series Vs Regression
• Time series analysis and regression analysis are both statistical techniques used to analyze data, but they differ in several ways

• Time series analysis is used when the data being analyzed is collected over time, whereas regression analysis is used when the data consists of variables that are not necessarily related to time

• Time series analysis is suited for data collected over time, where identifying patterns and trends is important, while regression analysis is useful for understanding relationships between variables, regardless of whether they are related to time
Components – Predictable, Unpredictable, Local, Global, Trend, Seasonality
• Predictable: The local predictable part consists of autoregressive behaviour. It shows how the time series is influenced by its immediate past

If we know the values at a few time stamps immediately preceding a given time, we can predict the value of the time series at that time.
Example: Given yesterday’s price, we can predict today’s price, but not the price after a year.

• Local: The local component represents the short-term fluctuations in the data that are unpredictable and are not related to the trend or seasonality
• Global: In the global predictable part, the present value of the time series does not depend on the immediate past.

• Example: Temperature in January does not depend on the December or November temperature.

This global predictable part consists of two parts: Trend and Seasonality.

1. Trend: Trend refers to any pattern that describes the overall increase or decrease in the values.

2. Seasonality: Seasonality refers to a repeating pattern of values seen in the data.
Note: Trend and Seasonality can both appear in the same time series.
Additive & Multiplicative
models
Additive and multiplicative models are two common approaches to modeling the trend and seasonality components of time series data

Both additive and multiplicative models have their strengths and weaknesses, and the choice between them depends on the specific characteristics of the data and the goals of the analysis

Additive models are often preferred when the magnitude of the seasonal fluctuations is relatively constant over time, while multiplicative models are more appropriate when the seasonal fluctuations are proportional to the overall level of the time series
Now, clearly this data has both types of patterns: it has a trend, as the overall pattern suggests an increase in sales. However, the sales are also seasonal, as a similar up-and-down pattern repeats itself every 12 months.
The different ways in which these components can be related to each other are:
Additive model: Used when the magnitude of the seasonal pattern in the data does not directly correlate with the value of the series. In the additive model, all the components are added:

$$Y_t = T_t + S_t + R_t$$

where $T_t$ is the trend, $S_t$ the seasonal component, and $R_t$ the remaining irregular component

Characteristics:

Suitable when the seasonal variations are roughly constant over time.

Components are added together.

Example: Suppose monthly sales data shows a steady upward trend and a seasonal effect that adds a fixed increase every December.
Multiplicative model: Used when the magnitude of the seasonal pattern in the data increases with an increase in data values and decreases with a decrease in the data values. In the multiplicative model, all the components (trend, seasonal, and irregular) are multiplied:

$$Y_t = T_t \times S_t \times R_t$$

Characteristics:

Suitable when the seasonal variations increase or decrease proportionally with the level of the time series.

Components are multiplied together.

Example: Suppose monthly sales data shows an increasing trend, and the seasonal effect is proportional, meaning sales double every December compared to other months.
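The practical difference shows up clearly when the same trend is combined with a seasonal pattern in both ways. The sketch below uses hypothetical quarterly numbers and assumes the trend is already known; a real decomposition would first estimate it, for example with a centered moving average. Under the additive model the fourth-quarter bump stays a fixed +20. Under the multiplicative model it is a fixed 25%, so the bump grows with the level of the series. Removing the trend (Y − T or Y / T) recovers the seasonal pattern in each case.

```python
def compose(trend, seasonal, model):
    """Combine trend and seasonal components additively or multiplicatively."""
    if model == "additive":
        return [t + s for t, s in zip(trend, seasonal)]
    return [t * s for t, s in zip(trend, seasonal)]

def seasonal_component(series, trend, model):
    """Remove a known trend: Y - T (additive) or Y / T (multiplicative)."""
    if model == "additive":
        return [y - t for y, t in zip(series, trend)]
    return [y / t for y, t in zip(series, trend)]

trend = [100, 110, 120, 130, 140, 150, 160, 170]   # two hypothetical years, quarterly
additive_season = [-20, 0, 0, 20] * 2              # fixed effect each quarter
multiplicative_season = [0.75, 1.0, 1.0, 1.25] * 2  # proportional effect each quarter

add_series = compose(trend, additive_season, "additive")
mult_series = compose(trend, multiplicative_season, "multiplicative")
print(add_series)   # Q4 values 150 and 190: the bump over trend is always 20
print(mult_series)  # Q4 values 162.5 and 212.5: the bump grows with the trend
```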
Cyclicity, Seasonality, Stationarity, Noise
Cyclicity refers to fluctuations in a time series that occur over a longer period than the seasonal fluctuations

Seasonality refers to fluctuations in a time series that occur at regular intervals

Stationarity is an important property of many time series models. A stationary time series has no long-term predictable patterns such as trends or seasonality. Its time plot shows the series roughly following a horizontal line with constant variance.
If we take a time series, we first model the global trend and seasonality and then remove them from the series; the result is a weakly stationary time series.

Noise: It refers to the random, unpredictable fluctuations in a time series that cannot be explained by any known factors
Single and Double Exponential Smoothing

Single and double exponential smoothing are time series forecasting techniques that use an exponentially weighted moving average of past observations to predict future values of a time series

Double exponential smoothing is also known as Holt's method (Holt's linear trend method), named after Charles Holt, who developed the technique in the 1950s
Single Exponential Smoothing
Single Exponential Smoothing is a forecasting method that uses a weighted average of past observations to make predictions about future observations

The method is called 'single' because it uses only one smoothing parameter, alpha, to calculate the weighted average of past observations

The formula for single exponential smoothing is given below:

$$F_{t+1} = \alpha Y_t + (1 - \alpha) F_t$$

where $F_{t+1}$ = forecast for the next time period, $Y_t$ = actual value for the current time period, $F_t$ = forecast for the current time period, and $\alpha$ = smoothing parameter ($0 \le \alpha \le 1$). The smoothing parameter determines the weight given to the most recent observation
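The recursion translates line for line into Python. The series is hypothetical, and starting with the first actual value as the first forecast is an assumption; other initializations, such as the average of the first few values, are also common.

```python
def simple_exponential_smoothing(y, alpha, initial_forecast=None):
    """Return [F_1, F_2, ..., F_{n+1}] using F_{t+1} = alpha*Y_t + (1 - alpha)*F_t."""
    forecasts = [y[0] if initial_forecast is None else initial_forecast]
    for actual in y:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

print(simple_exponential_smoothing([10, 12, 11], alpha=0.5))  # [10, 10.0, 11.0, 11.0]
```

The last element is the forecast for the first period after the data ends. A larger alpha reacts faster to recent changes; a smaller alpha produces a smoother, slower-moving forecast.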
Double Exponential Smoothing
Double Exponential Smoothing is an extension of Single Exponential Smoothing that can handle trends in the data

This method uses two smoothing parameters, alpha and beta, to calculate the forecast

The formulas for Double Exponential Smoothing are given below:

$$F_{t+1} = \alpha Y_t + (1 - \alpha)(F_t + T_t)$$
$$T_{t+1} = \beta (F_{t+1} - F_t) + (1 - \beta) T_t$$

where $F_{t+1}$ = forecast for the next time period, $Y_t$ = actual value for the current time period, $F_t$ = forecast for the current time period, $T_t$ = trend component for the current time period, $\alpha$ = smoothing parameter for the level component, and $\beta$ = smoothing parameter for the trend component. The first equation calculates the forecast using the same approach as Single Exponential Smoothing, except that the trend estimate $T_t$ is added to the previous forecast; the second equation updates the trend estimate
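The sketch below implements this recursion in the level/trend form commonly used for Holt's method. `level` plays the role of F and `trend` the role of T. In the standard formulation, the forecast h periods beyond the last observation is level + h × trend, so the function adds the trend once more even for a one-step forecast. The initialization (level = first observation, trend = difference of the first two observations) is an assumption, and the data are hypothetical.

```python
def holt(y, alpha, beta, horizon=1):
    """Double exponential smoothing; returns forecasts for 1..horizon periods ahead."""
    level, trend = y[0], y[1] - y[0]  # assumed initialization
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)          # level update
        trend = beta * (level - previous_level) + (1 - beta) * trend   # trend update
    return [level + h * trend for h in range(1, horizon + 1)]

print(holt([1, 2, 3, 4, 5], alpha=0.5, beta=0.3, horizon=2))  # approximately [6.0, 7.0]
```

On a perfectly linear series the method tracks the line exactly, whereas single exponential smoothing would lag behind it.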
Autocorrelation Function

The autocorrelation function (ACF) is a mathematical tool used to measure the correlation between a time series and its lagged versions

The ACF is often plotted as a function of the lag k, with the ACF values on the y-axis and the lag values on the x-axis

The ACF is useful for identifying patterns and dependencies in time series data
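A minimal sketch of the sample ACF, using the usual estimator: at each lag k, the sum of the products of deviations from the mean k steps apart, divided by the total sum of squared deviations. The series is hypothetical. Lag 0 is always 1, and an alternating series produces large negative values at odd lags.

```python
def acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    return [
        sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

print(acf([1, -1, 1, -1], 2))  # [1.0, -0.75, 0.5]
```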
AR Model
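An autoregressive (AR) model predicts the current value from a linear combination of the series' own past values. The simplest case, AR(1), uses only the immediately preceding value, which is the local predictable, autoregressive behaviour described under Components. The sketch below fits y_t = c + φ·y_{t−1} by ordinary least squares on consecutive pairs. This is a simple estimation choice; dedicated time-series libraries typically use maximum likelihood instead. The series is hypothetical.

```python
def fit_ar1(y):
    """Fit y_t = c + phi * y_{t-1} by least squares on (y_{t-1}, y_t) pairs."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    phi = sum((a - mx) * (b - mz) for a, b in zip(x, z)) / sum((a - mx) ** 2 for a in x)
    return mz - phi * mx, phi

def predict_next(y, c, phi):
    """One-step-ahead prediction from the last observed value."""
    return c + phi * y[-1]

series = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]  # hypothetical, generated with c = 2, phi = 0.5
c, phi = fit_ar1(series)
print(c, phi, predict_next(series, c, phi))  # about 2.0, 0.5, 3.9375
```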
Moving Averages model
A moving averages model is a type of time series model used for forecasting

It assumes that the value of a variable at a given time point is a linear combination of past error terms

The error terms represent the difference between the actual value of the time series and its predicted value
ARMA Model
An ARMA model is a type of time series model used for forecasting

An ARMA model combines the autoregressive (AR) and moving average (MA) approaches by modeling the relationship between a variable and its past values, as well as the relationship between the variable and the error terms of a moving average model

The general form of an ARMA(p, q) model is:

$$Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$

where $\phi_i$ are the autoregressive coefficients, $\theta_j$ the moving average coefficients, and $\varepsilon_t$ the error term. The ARMA model is a popular tool in time series analysis because it captures both kinds of dependence in a stationary series, and it can be used to make forecasts for future time periods; trend and seasonal patterns are usually removed first, or handled by extensions such as ARIMA
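ARIMA Model
ARIMA (AutoRegressive Integrated Moving Average) is a type of time series model used for forecasting; the "integrated" part differences the series to remove trends before the ARMA part is applied

ARIMA models are widely used in time series analysis because they can capture the complex patterns and trends in time series data, including seasonality (through the seasonal extension, SARIMA) and irregularities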
GARCH Model
GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) is a type of time series model used for forecasting volatility in financial markets

It is an extension of the ARCH model, which was developed to account for the fact that the variance of financial returns is not constant over time, but rather varies with the size of recent returns

The GARCH model is designed to capture the persistence of volatility in financial returns by modeling the conditional variance of the returns as a function of past squared returns and past variances
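In its most common form, GARCH(1,1), the conditional variance follows σ²_t = ω + α·r²_{t−1} + β·σ²_{t−1}. The sketch below only runs that recursion for given parameters. Estimating ω, α, and β (usually by maximum likelihood) is not shown, and the parameter values, the starting variance, and the assumption that returns are already demeaned are all hypothetical. When α + β < 1, the variance reverts toward the long-run level ω / (1 − α − β); a large return raises the next period's variance, which is the volatility clustering the model is built to capture.

```python
def garch11_variance(returns, omega, alpha, beta, initial_variance):
    """Conditional variances: sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    variances = [initial_variance]
    for r in returns:
        variances.append(omega + alpha * r**2 + beta * variances[-1])
    return variances

# Hypothetical parameters: long-run variance = 0.1 / (1 - 0.1 - 0.8) = 1.0
print(garch11_variance([0.0, 0.0], omega=0.1, alpha=0.1, beta=0.8, initial_variance=1.0))
print(garch11_variance([2.0], omega=0.1, alpha=0.1, beta=0.8, initial_variance=1.0))
```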
