
Business Analytics DGV

The document provides an overview of business analytics, focusing on statistics and various types of data analysis including descriptive, diagnostic, predictive, and prescriptive analytics. It categorizes data into quantitative (discrete and continuous) and qualitative (nominal and rank) types, and discusses different sampling techniques and potential errors in sampling. Additionally, it covers measures of central tendency and dispersion, essential for understanding data distributions.

Uploaded by

priya24laasya
Copyright
© © All Rights Reserved

Business Analytics

Statistics and Analytics 2

Statistics is the mathematics of estimating parameters of populations
based on data from representative samples of those populations.

Analytics is a broad term that may refer to almost any type of data
analysis, especially statistical analysis, data mining and machine
learning.
Analytics 3

Analytics is a generic word without a specific meaning; it can apply to
virtually any form of data analysis, especially statistical analysis, data
mining, and artificial intelligence.
Analytics is divided into four levels 5

▪ Descriptive
▪ Diagnostic
▪ Predictive
▪ Prescriptive
Scales of measurement 6

• The scales of measurement (conventionally nominal, ordinal, interval, and
ratio) are ordered by increasing:
• Accuracy
• Power of measurement
• Precision
• Range of applicable statistical techniques
Scales of measurement 7
Different types of data 8

In statistics, data are classified into two broad categories:

• Quantitative / Numerical
– Discrete
– Continuous

• Qualitative / Categorical
– Nominal
– Rank
Quantitative / Numerical data 9

Continuous data represent the numerical values of a continuous variable. A continuous
variable is one that can assume any value between any two points on a line
segment, thus representing an interval of values.

Characteristics such as weight, length, height, thickness, temperature, time of arrival,
marks scored, etc., represent continuous variables.

It may be noted that a continuous variable assumes the finest unit of measurement,
finest in the sense that it enables measurements to the maximum degree of precision.
Quantitative / Numerical data 10

Discrete data are the values assumed by a discrete variable. A discrete variable is
one whose outcomes are measured in fixed numbers. Such data are essentially count
data.

These are derived from a process of counting, such as the number of items possessing
or not possessing a certain characteristic.

The number of orders received in a year, the number of defective products produced,
the number of employees who left the organization last year, and the number of people
who passed the exam are all examples of discrete data.
Qualitative / Categorical 11

Nominal data are the outcome of classifying the items or units comprising a sample
or a population into two or more categories according to some quality characteristic.

Classification of customers according to geography, students according to sex,
workers according to skill (as skilled, semi-skilled, and unskilled), and employees
according to the level of education (as undergraduates and post-graduates) all result
in nominal data.

Given any such basis of classification, it is always possible to assign each item to a
particular class and count the items belonging to each class. The count
data so obtained are called nominal data.
Qualitative / Categorical 12

Rank data, on the other hand, are the result of assigning ranks to specify order in terms
of the integers 1, 2, 3, ..., n. Ranks may be assigned according to the level of
performance in a test, a contest, a competition, an interview, or a show.

The candidates appearing in an interview, for example, may be assigned ranks in
integers ranging from 1 to n, depending on their performance in the interview. Ranks so
assigned can be viewed as the values of a variable involving performance as the
quality characteristic.
Classification of data 13

• Univariate

• Bivariate

• Multivariate
Univariate data 14

Univariate Data is data that concerns only one variable. The data
concerning the weights of a Finance class of 30 students presented in the
following table is an example of Univariate Data.

Individual   Weight (kg)
1            75
2            71
3            82
-
30           78
Bivariate data 15

Bivariate Data is data concerning exactly two variables. Continuing with our
earlier example, if we add the height of each student along with his/her
weight, the result is bivariate data.

Individual   Weight (kg)   Height (cm)
1            75            172
2            71            169
3            82            174
-
-
30           78            176
Multivariate data 16

Multivariate Data is data concerning more than two variables.

Individual   Weight (kg)   Height (cm)   Marks in an Exam (max 100)   Gender
1            75            172           80                           Male
2            71            169           75                           Male
3            82            174           82                           Female
-
30           78            176           69                           Male
Data Sources 17

Data sources are of two types: secondary and primary.

1. Secondary data

• They already exist in some form, published or unpublished, in an
identifiable secondary source. They are generally available from
published source(s), though not necessarily in the form actually required.

• Examples: Customer data available in an ERP system, satisfaction scores of previous
years, consumer price index data for the last 10 years available on a government
website.
Data Sources 18

Data sources are of two types: secondary and primary (continued).

2. Primary data

• Those data which do not already exist in any form, and thus have to be
collected for the first time from the primary source(s). By their very nature,
these data require fresh and first-time collection covering the whole
population or a sample drawn from it.

• Examples: Customer satisfaction survey, market research for a new
product, real-time performance data of a sales team, etc.
Data Collection methods 19

• Surveys

• Focus Group Discussions

• Interviews

• Experiments

• Observations
Classification of Sampling techniques 20

Sampling Techniques

• Nonprobability Sampling Techniques
– Convenience Sampling
– Judgmental Sampling
– Quota Sampling
– Snowball Sampling

• Probability Sampling Techniques
– Simple Random Sampling
– Systematic Sampling
– Stratified Sampling
– Cluster Sampling
– Other Sampling Techniques
Non Probability sampling 21

Convenience sampling attempts to obtain a sample of convenient
elements. Interviews are conducted at locations/places where our target
population is likely to be.

• Interviews at a shop front

• Street corner interviews

• Interviews in one or two friendly neighborhoods

Non Probability sampling 22

Judgmental sampling is a form of convenience sampling in which the
population elements are selected based on the judgment of an expert.

The judgment of the expert is about the appropriateness of the
respondent unit for the purpose of the study.
Non Probability sampling 23

Quota sampling is judgmental sampling with the constraint that the sample
includes a minimum number of elements from each specified sub-group.
Non Probability sampling 24

In snowball sampling, an initial group of respondents is selected, usually at
random.

After being interviewed, these respondents are asked to identify others
who belong to the target population of interest.

Subsequent respondents are selected based on the referrals.

• Commonly used for low penetration products and high value products
(e.g. club members, Audi owners)
Probability sampling – Simple Random 25

• Each element in the population has a known and equal probability of
selection.

• Each possible sample of a given size (n) has a known and equal probability
of being the sample actually selected.

Random number generation is a common method to achieve Simple Random Sampling.
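
As a rough illustration of random number generation for simple random sampling, the sketch below draws a sample without replacement using Python's standard library. The sampling frame of 100 customer IDs, the function name and the seed are assumptions made for the example; they do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: customer IDs 1..100
customers = list(range(1, 101))
sample = simple_random_sample(customers, 10, seed=42)
print(sample)
```

Passing a seed is only a convenience for reproducing the same draw; leaving it out gives a fresh sample each run.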
Probability sampling – Systematic sampling 26

• The sample is chosen by selecting a random starting point and then picking
every i-th element in succession from the sampling frame.

The sampling interval, i, is determined by dividing the population size N by
the sample size n and rounding to the nearest integer.

• For example, there are 100,000 elements in the population and a sample of
1,000 is desired. In this case the sampling interval, i, is 100. A random number
between 1 and 100 is selected. If, for example, this number is 64, the sample
consists of elements 64, 164, 264, 364, 464, 564, and so on.
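
The worked example can be reproduced with a short sketch. The function name and its parameters are illustrative assumptions; elements are identified by their 1-based position in the sampling frame, as on the slide.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based positions of a systematic sample of size n from N elements."""
    i = round(N / n)                                # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, i)   # random start between 1 and i
    return list(range(start, N + 1, i))[:n]

# Slide example: N = 100,000, n = 1,000, random start 64
sample = systematic_sample(100_000, 1_000, start=64)
print(sample[:6])   # [64, 164, 264, 364, 464, 564]
```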
Probability sampling – Stratified sampling 27

• A two-step process in which the population is first partitioned into subpopulations, or
strata, and then a sample is drawn from each stratum.

Rules Of Strata

The strata should be mutually exclusive and collectively exhaustive, in that
every population element should be assigned to one and only one stratum and no
population elements should be omitted.

The elements within a stratum should be as homogeneous as possible, but the
elements in different strata should be as heterogeneous as possible.

The stratification variables should also be closely related to the
characteristic under study.
Probability sampling – Stratified sampling 28

Stratified sampling is a two-step process. In the first step the population is
partitioned into subpopulations, or strata. The second step involves selecting
elements from each stratum by a random procedure, usually SRS or
systematic sampling.

In proportionate stratified sampling, the size of the sample drawn from each
stratum is proportionate to the relative size of that stratum in the total population.

In disproportionate stratified sampling, the size of the sample from each
stratum is proportionate to the standard deviation of the distribution of the
characteristic of interest among all the elements in that stratum. The more
homogeneous a stratum is, the smaller the sample size required in that stratum.
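
The two allocation rules can be sketched with one helper that splits a total sample size in proportion to a weight per stratum. The stratum names, sizes and standard deviations below are hypothetical, and the allocation by standard deviation follows the slide's description only; it is not a full optimal-allocation formula.

```python
def allocate(weights, n):
    """Split a total sample size n across strata in proportion to the given weights.

    Rounding may make the parts differ slightly from n in general.
    """
    total = sum(weights.values())
    return {name: round(n * w / total) for name, w in weights.items()}

# Hypothetical strata
stratum_sizes = {"North": 5000, "South": 3000, "East": 2000}   # population counts
stratum_stdevs = {"North": 10, "South": 30, "East": 60}        # spread of the characteristic

proportionate = allocate(stratum_sizes, 200)      # weights = stratum sizes
disproportionate = allocate(stratum_stdevs, 200)  # weights = standard deviations
print(proportionate)
print(disproportionate)
```

With these made-up figures the most variable stratum (East) receives the largest share under the second rule even though it is the smallest stratum.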
Probability sampling – Cluster Sampling 29

• The target population is first divided into mutually exclusive and collectively
exhaustive subpopulations, or clusters.

• In stage 2, a random sample of clusters is selected, based on a probability
sampling technique such as SRS.

• In stage 3, for each selected cluster, either all the elements are included in
the sample (one-stage) or a sample of elements is drawn probabilistically
(two-stage).
Probability sampling – Cluster Sampling 30

The target population is first divided into mutually exclusive and collectively
exhaustive subpopulations, or clusters.

Cluster Rules

Population elements within a cluster should be as heterogeneous as possible, but
clusters themselves should be as homogeneous as possible. Ideally, each cluster
should be a small-scale representation of the population.

In stage 2, a random sample of clusters is selected, based on a probability
sampling technique such as SRS.

In stage 3, for each selected cluster, either all the elements are included in
the sample (one-stage) or a sample of elements is drawn probabilistically
(two-stage).
Sampling Error 31

◼ Sampling errors:
❖ Faulty selection of sample – This may be due to a defective sampling technique,
such as purposive or judgment sampling, in which the researcher deliberately
selects a non-representative sample

❖ Substitution – Sometimes an investigator substitutes a convenient member of the
population for the one originally selected

❖ Faulty demarcation of sampling units

❖ Variability of the population

Sampling Error 32

◼ Non-sampling errors or bias:

❖ These may be due to human factors, which vary from one investigator to another

❖ Negligence and carelessness

❖ Faulty planning of sampling

❖ Faulty selection of sample units

❖ Error in compilation

❖ Wrong statistical measure

Sampling Error 33
The error arising from drawing inferences on the basis of observations
on a part (sample) of the population is termed sampling error. It decreases
as the sample size increases. Normally, beyond a certain point, a further increase
in sample size does not substantially reduce the error. The
optimum sample size is worked out from this behaviour, taking
into account the required precision and cost considerations.

[Figure: error plotted against sample size]
Branches of Statistics 34
Statistics 36

Descriptive Statistics
Mean 490.8
Standard Error 6.542348114
Median 475
Mode 450
Standard Deviation 54.73721146
Sample Variance 2996.162319
Kurtosis -0.334093298
Skewness 0.924330473
Range 190
Minimum 425
Maximum 615
Sum 34356
Count 70
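
The summary table can be reproduced with Python's standard library. This sketch assumes the table describes the 70 apartment rents listed in the percentile example later in the deck; the count, minimum, maximum and sum all agree with that data set. The variable names are illustrative.

```python
import math
import statistics

# The 70 apartment rents from the percentile example (ascending order)
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = statistics.mean(rents)
median = statistics.median(rents)
mode = statistics.mode(rents)
stdev = statistics.stdev(rents)        # sample standard deviation (divides by n - 1)
variance = statistics.variance(rents)  # sample variance
std_error = stdev / math.sqrt(n)       # standard error of the mean
data_range = max(rents) - min(rents)

print(n, sum(rents), mean, median, mode, data_range)
print(round(stdev, 4), round(variance, 2), round(std_error, 4))
```

The standard error line shows where that row of the table comes from: it is the sample standard deviation divided by the square root of the count.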
Types of measures 37

◼ Measures of Central Tendency

◼ Measures of Dispersion

◼ Measures of asymmetry (skewness)

◼ Measures of relationship
Measures of Central tendency 38

◼ Measures of Central Tendency


◼ Describes the center position of the data

◼ Mean, Median, Mode


Measures of Central tendency 39
Data: 65 100 69 91 72 85 72 84 75

Arranged in descending order: 100 91 85 84 75 72 72 69 65

Mode = 72 (the only value that occurs twice)
Median = 75 (the middle, 5th, value of the 9)
Mean = 713 / 9 = 79.22
What measure to use: Mean, Median, Mode 40

* Mode may not be a good representation if the data set is not normal
Measures of Dispersion 41

◼ Measures of Dispersion
◼ Describes the spread of the data (how
scores are scattered or dispersed)

◼ Range, Variance, Standard deviation, Interquartile Range
Measures of Dispersion 42

Range
◼ The range is calculated by taking the maximum value and
subtracting the minimum value.

Data: 2 4 6 8 10 12 14
Range = 14 - 2 = 12

◼ If we divide the ordered scores into four equal parts, the dividing values are called
“quartiles”

◼ When we divide them into 10 equal parts, the dividing values are called “deciles”

◼ When we divide them into 100 equal parts, the dividing values are called “percentiles”
Measures of Dispersion 43

Percentiles
Arrange the data in ascending order.

Compute index i, the position of the p-th percentile.

i = (p/100)n

If i is not an integer, round up. The p-th percentile
is the value in the i-th position.

If i is an integer, the p-th percentile is the average
of the values in positions i and i + 1.
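
The rule above can be sketched as a small function. It is applied here to the 70 apartment rents used in the examples that follow; the function name and the use of integer arithmetic (to avoid floating-point rounding when testing whether i is a whole number) are choices made for this illustration.

```python
def percentile(sorted_values, p):
    """p-th percentile (whole number p, 0 < p < 100) by the round-up / average rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    # i = (p/100) * n, split into its whole part and a remainder
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (index whole)
        return sorted_values[whole]
    # i is an integer: average the values in positions i and i + 1
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p80 = percentile(rents, 80)   # i = 56, an integer: average of 56th and 57th values
q3 = percentile(rents, 75)    # i = 52.5: round up to the 53rd value
q1 = percentile(rents, 25)    # i = 17.5: round up to the 18th value
print(p80, q3, q1, q3 - q1)   # 542.0 525 445 80
```

Other software (spreadsheets, statistical packages) may use interpolation rules that give slightly different percentile values for the same data.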
Measures of Dispersion 44

80th Percentile
◼ Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56

Because i is an integer, the 80th percentile is the average of the 56th and 57th values:
80th Percentile = (535 + 549)/2 = 542

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.


Measures of Dispersion 45

Quartiles

◼ Quartiles are specific percentiles.


◼ First Quartile = 25th Percentile
◼ Second Quartile = 50th Percentile = Median
◼ Third Quartile = 75th Percentile
Measures of Dispersion 46

Third Quartile
◼ Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 53rd value = 525

Measures of Dispersion 47

IQR
◼ Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80



Identifying Outliers 48

The lower limit is located 1.5 (IQR) below Q1.

Lower limit: Q1 – 1.5 (IQR) = 445 – 1.5 (80)
= 445 – 120
= 325

The upper limit is located 1.5 (IQR) above Q3.

Upper limit: Q3 + 1.5 (IQR) = 525 + 1.5 (80)
= 525 + 120
= 645

There are no outliers (values less than 325 or above 645) in the data given.
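
The limits can be checked with a few lines of Python. The function name is an assumption, and the value 700 in the test list is a hypothetical extra rent added only to show what an outlier would look like; 425 and 615 are the smallest and largest rents in the data.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits located k * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = outlier_limits(445, 525)
print(lower, upper)   # 325.0 645.0

# 425 and 615 are the extremes of the rent data; 700 is a hypothetical value
values = [425, 475, 615, 700]
outliers = [v for v in values if v < lower or v > upper]
print(outliers)       # [700]
```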
Identifying Outliers & Box whisker plot 49

Activity: Excel, demo

Construct a Box Whisker plot for the data given in the Excel
file and identify the outliers.
IQR 50

The interquartile range is a better measure of variation in a distribution
than the range. It ignores the lowest 25 percent and the highest 25 percent
of the distribution and uses only the middle 50 percent.

The interquartile range is often halved to give the semi-interquartile
range, or quartile deviation, as shown below:
Semi-interquartile range or quartile deviation = (Q3 – Q1)/2
IQR 51

When the quartile deviation is small, it means that there is a small deviation
in the central 50 percent of items. In contrast, if the quartile deviation is
high, it shows that the central 50 percent of items have a large variation. It
may be noted that in a symmetrical distribution, the two quartiles, that
is, Q3 and Q1, are equidistant from the median.

Since it is not influenced by the extreme values in a distribution, it is
particularly suitable for highly skewed or erratic distributions.
Measures of Dispersion 52

Standard Deviation
A measure of how widely the data points tend to diverge from the mean. A small
standard deviation indicates most values are close to the mean, and a large
standard deviation indicates they are much more or much less than the mean. The
basic idea is that you’d like to sum up how different the individual data points are
from the average. You could just sum up the individual differences, but what about
the fact that some are less than the mean and others are greater? That would tend
to make them cancel out. The way to get around that is to square the differences,
because any time you square a number, the result is positive. Later, after we have
added them together, we take a square root, to reduce the value down to
something more manageable and reasonable
Measures of Dispersion 53
Measures of Dispersion 54

If the n observations in a sample are denoted by $x_1, x_2, x_3, \ldots, x_n$,
then the variance is given by

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean.
The standard deviation, S, is the positive square root of the variance.
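
A direct transcription of the formula into Python may help. The nine scores are the ones used in the central tendency example earlier in the deck; the function name is illustrative, and the result is compared with the standard library's own sample variance as a sanity check.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

scores = [65, 100, 69, 91, 72, 85, 72, 84, 75]
s2 = sample_variance(scores)
s = math.sqrt(s2)          # standard deviation: positive square root of the variance
print(round(s2, 2), round(s, 2))

# statistics.variance uses the same n - 1 definition
assert math.isclose(s2, statistics.variance(scores))
```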
Measures of Dispersion 55

1) If the standard deviation is a relatively small number, the
variability in the data about the average is small, i.e. the data will
be clustered more closely around the average.

2) If the standard deviation is a relatively large number, there is
more variability in the data and the data will be more spread out,
i.e. the data will lie further from the average.
Types of Distributions 56

◼ Discrete theoretical distributions
◼ Binomial distribution
◼ Poisson distribution
◼ Rectangular distribution
◼ Multinomial distribution, etc.

◼ Continuous theoretical distributions
◼ Normal distribution
◼ Student's t-distribution
◼ Chi-square distribution
◼ F-distribution
Histogram 57
Normal Distribution 58
Properties of Normal Distribution:

[Figure: normal curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ]

1) Normal Distribution is completely designated by two
parameters (μ and σ)
2) μ is used for location and σ for spread.
3) Normal curve is bell shaped.
Normal Distribution 59

Properties of Normal Distribution (Contd):

4) Normal distribution is symmetric around μ.
5) Normal distribution extends from -∞ to ∞. But for all practical
purposes the region μ - 3σ to μ + 3σ covers most of the distribution.
6) The mean, mode and median will be the same for this curve.
7) Area under the normal curve is equal to 1.
8) About 68.27% of values lie within 1σ limits (μ ± σ).
9) About 95.45% of values lie within 2σ limits (μ ± 2σ).
10) About 99.73% of values lie within 3σ limits (μ ± 3σ).
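
The three coverage figures can be checked numerically. For a normal distribution the probability of falling within k standard deviations of the mean equals erf(k/√2); this is a standard identity rather than something stated on the slides, and the function name is illustrative. The exact values round to 68.27%, 95.45% and 99.73%, which agree with the percentages listed above to within rounding.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k) * 100:.2f}%")
```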
Normal Distribution 60
Non Normal Distribution 61
Asymmetry 62

◼ Measures of asymmetry
◼ When the distribution of items in a series is perfectly symmetrical and
bell shaped, as in a normal distribution, there is no skewness. But if the
curve is distorted (whether on the right side or on the left side), we have
an asymmetrical distribution, which indicates that there is skewness.

◼ If the curve is distorted towards the left, we have negative skewness,
and vice versa.
Asymmetry 63

◼ Skewness is, thus, a measure of asymmetry and shows the manner in
which the items are clustered around the average

◼ The differences among the mean, median and mode provide an
easy way of expressing skewness in a series
Asymmetry 64

An important measure of the shape of a distribution is called skewness.

The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Skewness can be easily computed using statistical software.
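
For readers without statistical software, the sample skewness formula can be written out directly. The two small data sets below are made up for illustration: one is symmetric, the other has a single large value that stretches the right tail.

```python
import statistics

def sample_skewness(xs):
    """n / ((n-1)(n-2)) times the sum of cubed standardized deviations."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    s = statistics.stdev(xs)   # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in xs)

symmetric = [1, 2, 3, 4, 5]              # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 3, 3, 20]    # hypothetical data with a long right tail

print(sample_skewness(symmetric))        # essentially zero
print(sample_skewness(right_skewed))     # positive
```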


Asymmetry 65

Symmetric (not skewed)

• Skewness is zero.
• Mean and median are equal.

[Figure: relative frequency histogram of a symmetric distribution, Skewness = 0]
Parametric Test 66

Parametric statistics make assumptions (such as normality) about the population
values (called parameters).

For example, one assumption for the one-way ANOVA is that the data come from a
normal distribution. If your data aren't normally distributed, you can't run an ANOVA,
but you can run the nonparametric alternative, the Kruskal-Wallis test.
Non parametric Test 67

A non parametric test (sometimes called a distribution free test) does not assume
anything about the underlying distribution (for example, that the data come from
a normal distribution). Compare this with a parametric test, which makes assumptions
about a population’s parameters (for example, the mean or standard deviation).
When the word “non parametric” is used in stats, it doesn’t quite mean that you
know nothing about the population. It usually means that you know the population
data do not have a normal distribution.
Non parametric vs Parametric Test 68

NONPARAMETRIC TEST                    PARAMETRIC ALTERNATIVE
1-sample sign test                    One-sample Z-test, One-sample t-test
1-sample Wilcoxon Signed Rank test    One-sample Z-test, One-sample t-test
Friedman test                         Two-way ANOVA
Kruskal-Wallis test                   One-way ANOVA
Mann-Whitney test                     Independent samples t-test
Mood’s Median test                    One-way ANOVA
Spearman Rank Correlation             Correlation Coefficient


Test of Hypothesis 69

What is a hypothesis:
Ordinarily, when one talks about a hypothesis, one simply means a mere
assumption or some supposition to be proved or disproved.

For a researcher, however, a hypothesis is a formal question that he or she intends to resolve.


Basic concepts on testing hypothesis 70

1. Null and alternative hypothesis:

If we are to compare method A with method B for superiority, and we
proceed on the assumption that both are equally good, then this
assumption is termed the null hypothesis. If, against this, we think
that method A is superior or that method B is inferior, we are stating what is
termed the alternative hypothesis.

The null hypothesis is generally symbolized as H0 and the
alternative hypothesis as Ha.
Basic concepts on testing hypothesis 71

The null hypothesis states that there is no difference between
groups or no relationship between variables.

When your sample contains sufficient evidence, you can reject the null
and conclude that the effect is statistically significant. Statisticians often
denote the null hypothesis as H0 and the alternative hypothesis as HA.

• Null Hypothesis H0: No effect exists in the population.
• Alternative Hypothesis HA: The effect exists in the population.
Understanding variables 72

Variables can be thought of as ‘‘fields’’ of data, or individual pieces of data;
typically, they are the columns on a spreadsheet. All of those columns (age;
gender; years of service; education; performance score A, B, or C; and so on)
are the variables. Why call them variables? Because they are not constants: the
data for each of these variables vary for each case (think of a case as a person).
Breaking the mass of data down into variables is a first step in getting a handle on
the information that likely sits before you.


Categorical Vs Continuous variables 73

Variables come in different types. The simplest way to break them down is to
determine whether they are categorical or continuous. As the name implies,
categorical variables are made up of types, classes, or categories. Think about
the variable ‘‘gender.’’ It has two categories: male and female.

Continuous variables, on the other hand, are numbers or numerical. Anything you
would report as a number is a continuous variable. Years of service could be one
continuous variable; numeric age would be another.


Dependent & Independent variables 74
Dependent & Independent variables 75

Dependent variable (Y): Company performance
Independent variables (X):
X1. Top management support
X2. Cross functional team work
X3. NPD process
X4. NPD strategies
X5. Market research activities

Dependent variable (Y): Sales
Independent variables (X):
X1. Brand perception
X2. Promotional activities
X3. Competition
X4. Price
X5. Quality
X6. Salesperson competency
How Variables relate to each other? 76

Well, that’s the central question. If you can figure out how variables
relate to each other, you can gain greater understanding of the
way things in the system under examination work. There are two
major ways to think about how variables relate to each other:
interdependently and dependently.


Measures of Relationship 77

◼ Measures of relationship
◼ Describes the relationship of the data

◼ Karl Pearson’s coefficient of correlation

◼ Spearman’s Rank order correlation


Measures of Relationship 78

Year   Program                                     Training (man-days)   Productivity (units / labour hr)
2010   -                                           0                     19
2011   Awareness training                          50                    19
2012   LPS training to first batch                 100                   20
2013   LPS training to second batch                100                   21
2014   LPS training to third batch                 150                   22
2015   LPS training to fourth batch                150                   23
2016   Advanced program                            200                   26
2017   Advanced program                            200                   28
2018   Coaching, handholding & Refresher program   250                   31
2019   Coaching, handholding & Refresher program   250                   35
Measures of Relationship 79

Karl Pearson’s Correlation coefficient

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

X: Training man-days on lean production system
Y: Productivity in units / labour hour

X     Y     x = X - X̄   y = Y - Ȳ   x²   y²   xy
0     20
50    20
100   21
150   22
150   23
150   24
200   26
200   28
250   31
250   35

Σx² =        Σy² =        Σxy =


Measures of Relationship 80

X     Y     x = X - X̄   y = Y - Ȳ   x²      y²    xy
0     20    -150        -5          22500   25    750
50    20    -100        -5          10000   25    500
100   21    -50         -4          2500    16    200
150   22    0           -3          0       9     0
150   23    0           -2          0       4     0
150   24    0           -1          0       1     0
200   26    50          1           2500    1     50
200   28    50          3           2500    9     150
250   31    100         6           10000   36    600
250   35    100         10          10000   100   1000

X̄ = 150, Ȳ = 25, Σx² = 60000, Σy² = 226, Σxy = 3250

$$r = \frac{3250}{\sqrt{60000 \times 226}} = \frac{3250}{\sqrt{13560000}} = \frac{3250}{3682.39} = 0.88$$
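
The same calculation can be sketched in Python as a check on the arithmetic. The function name is illustrative; the X and Y values are the ten pairs from the worked table.

```python
import math

def pearson_r(X, Y):
    """Karl Pearson's r: sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sum_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(X, Y))
    sum_x2 = sum((a - x_bar) ** 2 for a in X)
    sum_y2 = sum((b - y_bar) ** 2 for b in Y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

X = [0, 50, 100, 150, 150, 150, 200, 200, 250, 250]   # training man-days
Y = [20, 20, 21, 22, 23, 24, 26, 28, 31, 35]          # productivity, units / labour hour

r = pearson_r(X, Y)
print(round(r, 2))   # 0.88
```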
Measures of Relationship 81

Training Vs Productivity (units / labour hr)

[Figure: scatter plot of productivity (units / labour hour, vertical axis) against training man-days (horizontal axis, 0 to 300)]
Measures of Relationship 82

Correlation

◼ Correlation measures how closely two variables are related.

◼ Correlation coefficients vary from +1 to -1

◼ A value close to +1 indicates that a high value in one variable
will be reflected by a high value in the other

◼ A value close to -1 indicates that a high value in one variable
will be reflected by a low value in the other

◼ A value near zero indicates little or no linear correlation


Measures of Relationship 83

Types of simple correlation

1. Perfect positive correlation (r= +1.00)

2. High degree of positive correlation (r=+0.85)

3. Low degree of positive correlation (r=+0.35)

4. Perfect negative correlation (r=-1.00)

5. High degree of negative correlation (r=-0.85)

6. Low degree of negative correlation (r=-0.35)

7. Zero correlation (r= 0)


Measures of Relationship 84
Measures of Relationship 85

Scatter diagram
Measures of Relationship 86

Scatter diagram
Correlation 87

▪ Correlating market data and business data is definitely a step in the right
direction. It shows the organization that we are pulling information together
and making important connections.
▪ Correlations are used to understand how data sets are related. In other
words: if variable “X” changes, does variable “Y” change?
Correlation 88
Predictive Analytics 89
What can Predictive Analytics do in Business? 90

◼ Predictive modelling in business focuses mostly on finding predictive patterns in
sales revenue, forecasting sales, and workforce planning

◼ Forward looking – It combines algorithms, historical information and data mining
to solve problems, realize an outcome or answer a question. For example:
◼ How likely is a customer to stay with the business?

◼ What mixture of skills, experience, and competencies would most likely lead to
high performance?

◼ With this information, analysis can be applied to predict how successful different
courses of action will be
Simple linear Regression 91

◼ Simple linear regression involves one independent variable and
one dependent variable.

◼ The relationship between the two variables is approximated by a
straight line.

◼ Regression analysis involving two or more independent variables
is called multiple regression.
Simple linear Regression model 92

◼ The equation that describes how y is related to x and an error
term is called the regression model.

◼ The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $\beta_0$ and $\beta_1$ are called parameters of the model, and
$\varepsilon$ is a random variable called the error term.
Simple linear Regression Equation 93

Positive Linear Relationship

[Figure: regression line of E(y) against x, with intercept β0; slope β1 is positive]
Simple linear Regression Equation 94

Negative Linear Relationship

[Figure: regression line of E(y) against x, with intercept β0; slope β1 is negative]
Simple linear Regression Equation 95

No Relationship

[Figure: horizontal regression line of E(y) against x, with intercept β0; slope β1 is 0]
Estimated Simple linear Regression Equation 96

The estimated simple linear regression equation

ŷ = b0 + b1 x

• The graph is called the estimated regression line.


• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.
Least Squares method 97

• Slope for the Estimated Regression Equation

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

where:
$x_i$ = value of independent variable for ith observation
$y_i$ = value of dependent variable for ith observation
$\bar{x}$ = mean value for independent variable
$\bar{y}$ = mean value for dependent variable
Least Squares method 98

y-Intercept for the Estimated Regression Equation

$$b_0 = \bar{y} - b_1 \bar{x}$$
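
The slope and intercept formulas can be sketched in a few lines of Python. The data are borrowed from the correlation example earlier in the deck (training man-days and productivity); fitting a regression line to them is an illustration added here, not a result taken from the slides, and the function names are assumptions.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the estimated regression line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))   # slope
    b0 = y_bar - b1 * x_bar                      # y-intercept
    return b0, b1

X = [0, 50, 100, 150, 150, 150, 200, 200, 250, 250]   # training man-days
Y = [20, 20, 21, 22, 23, 24, 26, 28, 31, 35]          # productivity, units / labour hour

b0, b1 = least_squares(X, Y)
print(round(b0, 3), round(b1, 4))   # 16.875 0.0542

def predict(x):
    """Estimated productivity for x training man-days."""
    return b0 + b1 * x

print(round(predict(100), 2))
```

On these figures the slope is Σxy/Σx² = 3250/60000, so each additional training man-day is associated with roughly 0.054 more units per labour hour.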
What is Linear Regression 99
What is Linear Regression 100
What is Logistic Regression 101
What is Logistic Regression 102
Contact 103

Phone : 9600066166

Web : www.transbizconsulting.com

Email : [email protected] /
[email protected]

Twitter : Transbiz1

LinkedIn : linkedin.com/company/transbizconsulting

Facebook : facebook.com/transbizconsulting
Thank You
