Business Analytics
Unit 1
Introduction to Basic Statistics
Statistics is the science of collecting, analyzing, interpreting, presenting, and organizing data.
It is a branch of mathematics that helps us understand and make decisions based on data.
Statistics is widely used in many fields, including economics, business, healthcare, education,
and social sciences.
Here is an overview of some fundamental concepts in statistics:
1. Types of Data
Data can be categorized in various ways:
Qualitative (Categorical) Data: Data that represents categories or labels. For example,
gender, color, and nationality.
o Nominal: Categories without a natural order (e.g., red, blue, green).
o Ordinal: Categories with a natural order (e.g., low, medium, high).
Quantitative (Numerical) Data: Data that represents amounts or quantities.
o Discrete: Countable data (e.g., number of students in a class).
o Continuous: Measurable data that can take any value within a range (e.g.,
height, weight).
2. Descriptive Statistics
Descriptive statistics is used to summarize and describe the features of a dataset. Key
measures include:
Measures of Central Tendency: These give us an idea of the "center" or typical value
in the dataset.
o Mean: The average of all data points.
o Median: The middle value when data points are arranged in order.
o Mode: The value that appears most frequently.
Measures of Dispersion (Spread): These describe the spread or variability of the data.
o Range: The difference between the highest and lowest values.
o Variance: The average of squared differences from the mean, indicating how
spread out the data is.
o Standard Deviation: The square root of the variance, giving a measure of how
much the data deviates from the mean.
3. Probability
Probability is the study of uncertainty. It helps in understanding the likelihood of an event
occurring. Basic concepts include:
Probability of an Event: A number between 0 and 1, representing how likely an event
is to occur.
Independent and Dependent Events: Independent events are those where the
occurrence of one does not affect the other, while dependent events are related.
Probability Distributions: These describe how the values of a random variable are
distributed (e.g., normal distribution).
4. Inferential Statistics
Inferential statistics allows us to make predictions or inferences about a population based on
a sample.
Sampling: A subset of the population is selected for analysis.
o Random Sampling: Every individual has an equal chance of being selected.
o Stratified Sampling: The population is divided into groups, and samples are
taken from each group.
Hypothesis Testing: A method of making inferences about a population by testing
assumptions or claims. Key components include:
o Null Hypothesis (H₀): The hypothesis that there is no effect or no difference.
o Alternative Hypothesis (H₁): The hypothesis that there is an effect or a
difference.
o P-value: The probability of obtaining a result at least as extreme as the one
observed, assuming the null hypothesis is true.
Confidence Intervals: A range of values used to estimate the true value of a
population parameter, with a certain level of confidence (e.g., 95%).
5. Common Statistical Graphs
Visualizing data is important for understanding trends and patterns. Common types of
graphs include:
Histograms: Show the frequency distribution of data.
Bar Charts: Represent categorical data.
Box Plots: Show the distribution of data and highlight outliers.
Scatter Plots: Show relationships between two variables.
6. Correlation and Regression
Correlation: A measure of the relationship between two variables. A correlation
coefficient close to +1 or -1 indicates a strong relationship, while 0 indicates no
relationship.
Regression: A statistical method used to understand the relationship between
variables and make predictions. Linear regression is commonly used to model the
relationship between a dependent variable and one or more independent variables.
Conclusion
Basic statistics provide essential tools for analyzing data, making informed decisions, and
drawing conclusions from data. Whether you are conducting research, making business
decisions, or interpreting data in daily life, an understanding of basic statistics is crucial for
dealing with uncertainty and variability.
Measures of Central Tendency
Measures of central tendency are statistical measures that describe the center or typical
value of a dataset. These measures provide a single value that represents the data as a
whole. The three most common measures of central tendency are mean, median, and
mode.
1. Mean (Arithmetic Average)
The mean is the sum of all data points divided by the number of data points. It is the most
commonly used measure of central tendency.
Formula for Mean:
Mean = ∑X / n
Where:
∑X is the sum of all data points.
n is the number of data points.
Example:
Consider the dataset: 2, 4, 6, 8, 10.
Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6
2. Median
The median is the middle value of a dataset when the data points are arranged in ascending
or descending order. If there is an even number of data points, the median is the average of
the two middle values.
Steps to Find the Median:
1. Arrange the data in order (ascending or descending).
2. If the number of data points is odd, the median is the middle value.
3. If the number of data points is even, the median is the average of the two middle
values.
Example 1 (Odd number of data points):
Consider the dataset: 3, 5, 7, 9, 11.
Arrange: 3, 5, 7, 9, 11.
The median is the middle value: 7.
Example 2 (Even number of data points):
Consider the dataset: 2, 4, 6, 8.
Arrange: 2, 4, 6, 8.
The median is the average of the two middle values: (4 + 6) / 2 = 5.
3. Mode
The mode is the value that appears most frequently in the dataset. A dataset may have:
No mode: If no value repeats.
One mode (unimodal): If one value appears most frequently.
Two modes (bimodal): If two values appear with the same highest frequency.
Multiple modes (multimodal): If more than two values have the highest frequency.
Example 1 (Unimodal):
Consider the dataset: 1, 2, 2, 3, 4, 5.
The mode is 2 (because it appears more frequently than other values).
Example 2 (Bimodal):
Consider the dataset: 1, 2, 2, 3, 3, 4.
The modes are 2 and 3.
Example 3 (No mode):
Consider the dataset: 1, 2, 3, 4, 5.
There is no mode because no value repeats.
When to Use Each Measure:
Mean: Best used for datasets without outliers or extreme values, as it takes all data
points into account.
Median: Preferred when the dataset has outliers or skewed data, as it is less affected
by extreme values.
Mode: Useful when you want to know the most frequent value in a dataset,
especially for categorical data.
Summary:
Mean gives the average of all values.
Median provides the middle value when data is ordered.
Mode identifies the most frequent value in the dataset.
Choosing the appropriate measure of central tendency depends on the nature of the data
and the context of the analysis.
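To make these measures concrete, here is a minimal Python sketch using only the standard library's statistics module (the datasets reuse the examples above):

import statistics

data = [2, 4, 6, 8, 10]
print(statistics.mean(data))    # 6 -> sum of the values divided by their count
print(statistics.median(data))  # 6 -> middle value of the ordered data
print(statistics.multimode([1, 2, 2, 3, 3, 4]))  # [2, 3] -> the bimodal example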
Measures of Dispersion
Measures of dispersion (also known as measures of variability or spread) describe the extent
to which data points in a dataset differ from the central value (such as the mean or median).
These measures give an idea of how spread out the data is. The most commonly used
measures of dispersion are range, variance, and standard deviation.
1. Range
The range is the simplest measure of dispersion. It represents the difference between the
maximum and minimum values in a dataset.
Formula for Range:
Range=Maximum Value−Minimum Value
Example:
Consider the dataset: 3, 5, 7, 9, 12.
Range=12−3=9
Pros: Simple and easy to calculate.
Cons: Sensitive to outliers. A single extreme value can greatly affect the range.
2. Variance
Variance measures the average squared deviation of each data point from the mean. It
provides a more accurate measure of dispersion because it considers the differences
between all data points, not just the extreme values.
Formula for Variance (Population Variance):
Variance = ∑(Xi − μ)² / N
Where:
Xi= Each individual data point.
μ = Mean of the data.
N = Total number of data points.
Formula for Sample Variance:
Variance (sample) = ∑(Xi − X̄)² / (n − 1)
Where:
Xi = Each individual data point.
X̄ = Sample mean.
n = Number of data points in the sample.
Pros: Provides a detailed measure of spread.
Cons: The units of variance are squared, which can make interpretation difficult.
3. Standard Deviation
The standard deviation is the square root of the variance. It is a more interpretable measure
of dispersion because it is in the same units as the original data. A higher standard deviation
indicates more variability in the data, while a lower standard deviation indicates that the
data points are closer to the mean.
Formula for Standard Deviation (Population Standard Deviation):
Standard Deviation = √[ ∑(Xi − μ)² / N ]
Formula for Sample Standard Deviation:
Standard Deviation (sample) = √[ ∑(Xi − X̄)² / (n − 1) ]
Example:
Consider the dataset: 2, 4, 6, 8.
The mean is 5, and the squared deviations from the mean are 9, 1, 1, and 9, so the
population variance is (9 + 1 + 1 + 9) / 4 = 5.
Standard deviation = √5 ≈ 2.24
Pros: More interpretable because it is in the same units as the data.
Cons: Like variance, it can be influenced by extreme values.
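The variance and standard deviation above can be checked with a short Python sketch (pvariance and pstdev divide by N, the population formulas, while variance and stdev divide by n − 1, the sample formulas; the dataset is the one from the example):

import statistics

data = [2, 4, 6, 8]
print(statistics.pvariance(data))  # 5     -> population variance, divides by N
print(statistics.pstdev(data))     # ~2.24 -> square root of the population variance
print(statistics.variance(data))   # ~6.67 -> sample variance, divides by n - 1
print(statistics.stdev(data))      # ~2.58 -> sample standard deviation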
4. Interquartile Range (IQR)
The interquartile range is another measure of dispersion that focuses on the spread of the
middle 50% of the data. It is the difference between the third quartile (Q3) and the first
quartile (Q1), which represent the 75th and 25th percentiles, respectively.
Formula for IQR:
IQR=Q3−Q1
Where:
Q1 = First quartile (25th percentile).
Q3 = Third quartile (75th percentile).
Example:
Consider the dataset: 1, 3, 5, 7, 9, 11, 13.
Q1 = 3 (the median of the lower half of the data).
Q3 = 11 (the median of the upper half of the data).
IQR = 11−3=8
Pros: Not affected by outliers or extreme values, as it only looks at the middle 50% of the
data.
Cons: Less precise than variance or standard deviation for understanding overall spread.
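As a quick check of the example above, Python's statistics.quantiles computes the quartiles; its default "exclusive" method reproduces Q1 = 3 and Q3 = 11 for this dataset:

import statistics

data = [1, 3, 5, 7, 9, 11, 13]
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles: [3.0, 7.0, 11.0]
print(q3 - q1)                                # 8.0 -> the interquartile range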
5. Coefficient of Variation (CV)
The coefficient of variation is a relative measure of dispersion that expresses the standard
deviation as a percentage of the mean. It is useful for comparing the dispersion of datasets
with different units or scales.
Formula for Coefficient of Variation:
CV = (σ / μ) × 100
Where:
σ = Standard deviation.
μ = Mean.
Pros: Useful for comparing variability between datasets with different units or scales.
Cons: Can be misleading for datasets with a mean of 0 or values close to zero.
Summary of Measures of Dispersion:
Range: The simplest measure, indicating the difference between the highest and
lowest values in a dataset. However, it's sensitive to outliers.
Variance: The average of squared deviations from the mean. Provides a detailed
measure of spread but is in squared units, making interpretation less straightforward.
Standard Deviation: The square root of variance. It is more interpretable as it is in the
same units as the original data and is widely used to describe variability.
Interquartile Range (IQR): Measures the spread of the middle 50% of the data and is
resistant to outliers.
Coefficient of Variation (CV): Provides a relative measure of variability, useful for
comparing datasets with different units or scales.
Each measure of dispersion provides different insights into how data varies. The choice of
which to use depends on the nature of the data and the specific goals of analysis.
Measures of Shape and Relative Location
Measures of shape and relative location are concepts often used in various fields like
geography, mathematics, design, and data analysis. Here's an explanation of both terms:
Measure of Shape
The "measure of shape" refers to the ways we can quantify or describe the characteristics of
a shape or object. In mathematics and geometry, this can involve:
1. Geometrical Properties: Such as the perimeter, area, volume, and angles of the
shape.
o For 2D shapes (like triangles, squares, or circles), we might measure the
perimeter (the boundary length) and area (the space enclosed).
o For 3D shapes (like spheres, cubes, or pyramids), we look at surface area and
volume.
2. Shape Analysis: In fields like computer vision or data analysis, shape may be
measured in more abstract terms, such as:
o Compactness: How round or compact a shape is, often measured as a ratio of
area to perimeter (for example, a circle has the highest compactness).
o Symmetry: How much a shape can be divided into similar halves or how
symmetric it is in one or more directions.
o Aspect Ratio: The ratio of the shape's width to height, often used in image
processing.
o Convexity: Whether a shape's boundary is convex (no indentations) or
concave (has indentations).
3. Fractal Dimension: In more complex shapes, such as those found in nature (e.g.,
coastlines), the measure of shape might involve fractals, where the concept of
dimensionality is used to describe irregular or self-similar patterns.
Relative Location
Relative location refers to the position of a point or place in relation to another, typically
using directions or distances instead of absolute coordinates.
1. In Geography:
o It is often described in terms of nearby landmarks, regions, or coordinates.
For example, "New York is located north of Washington D.C."
o Relative location might also include descriptions of proximity like "next to,"
"east of," or "adjacent to."
2. In Mathematics and Coordinate Systems:
o It can refer to the position of one point relative to another within a
coordinate system (such as Cartesian or polar coordinates). For example, in a
2D Cartesian coordinate system, a point might be described relative to the
origin (0,0) as (3, 4), meaning it's 3 units along the x-axis and 4 units along the
y-axis.
3. In Data Science and Spatial Analysis:
o In datasets involving geographical locations or spatial distributions (such as in
GIS), relative location could describe the positioning of data points in relation
to each other, like clusters of points, distances between them, or regions of
influence.
In summary:
Measure of shape involves quantifying the properties or features of a shape, either
geometrically or through other descriptors like symmetry or compactness.
Relative location describes where something is positioned in relation to something
else, often using terms of direction, distance, or proximity.
Skewness and Kurtosis
Skewness and kurtosis are statistical measures used to describe the shape of a data
distribution. These concepts help us understand the symmetry and the "tailedness" of a
distribution, respectively.
Skewness
Skewness measures the asymmetry of the probability distribution of a real-valued random
variable. In simpler terms, it tells us whether the data is skewed or tilted to one side.
Positive skew (right skew): The right tail (larger values) of the distribution is longer or
fatter than the left tail (smaller values).
Negative skew (left skew): The left tail of the distribution is longer or fatter than the
right tail.
Zero skew: The distribution is perfectly symmetrical (e.g., a normal distribution has
zero skewness).
A simple practical measure is Pearson's second coefficient of skewness:
Skewness = 3 × (Mean − Median) / Standard Deviation
Interpretation:
o Skewness > 0: Distribution is positively skewed (right tail is longer).
o Skewness < 0: Distribution is negatively skewed (left tail is longer).
o Skewness = 0: Distribution is symmetric (like a normal distribution).
Kurtosis
Kurtosis measures the tailedness or the sharpness of the peak of a data distribution. In
simple terms, kurtosis tells us whether the distribution has heavy or light tails compared to a
normal distribution.
Leptokurtic (kurtosis > 3): The distribution has a sharper peak and heavier tails than
the normal distribution (more outliers).
Platykurtic (kurtosis < 3): The distribution is flatter than the normal distribution with
lighter tails.
Mesokurtic (kurtosis = 3): The distribution has the same shape as the normal
distribution.
Interpretation:
o Kurtosis > 3: Leptokurtic distribution (heavy tails and sharp peak).
o Kurtosis < 3: Platykurtic distribution (light tails and flatter peak).
o Kurtosis = 3: Mesokurtic distribution, which is typical of a normal distribution.
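A small Python sketch showing how both shape measures can be computed from their moment definitions (skewness as the third standardized moment and kurtosis as the fourth, which is one common convention among several):

def shape_measures(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5   # > 0: right skew; < 0: left skew; 0: symmetric
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution (mesokurtic)
    return skewness, kurtosis

print(shape_measures([1, 2, 2, 3, 3, 3, 4, 10]))  # positive skew: long right tail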
Theorem and Relationship
1. Skewness Theorem:
o If the skewness γ1=0, the distribution is symmetrical.
o A positive skew indicates that the data is more concentrated on the left side
of the mean, with a longer tail on the right.
o A negative skew indicates that the data is more concentrated on the right side
of the mean, with a longer tail on the left.
2. Kurtosis Theorem:
o For a normal distribution, the kurtosis is exactly 3, which is referred to as
mesokurtic.
o Leptokurtic distributions (kurtosis > 3) have more extreme outliers.
o Platykurtic distributions (kurtosis < 3) have fewer extreme outliers and a
flatter peak.
Chebyshev's Theorem (also known as Chebyshev's Inequality) is a fundamental result in
probability theory and statistics. It provides a bound on how much of the data in any
distribution (regardless of its shape) lies within a certain number of standard deviations from
the mean.
Statement of Chebyshev's Theorem
Chebyshev's Theorem: at least (1 − 1/k²) × 100% of the data lies within k standard
deviations of the mean, where k is the number of standard deviations; k must be > 1.
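The bound itself is a one-line computation; a minimal sketch for a few illustrative values of k:

def chebyshev_bound(k):
    # At least (1 - 1/k**2) * 100% of the data lies within k standard deviations
    return (1 - 1 / k ** 2) * 100

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 1))  # 1.5 -> 55.6, 2 -> 75.0, 3 -> 88.9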
Key Points of Chebyshev's Theorem
Generality: Chebyshev’s inequality applies to any distribution, not just normal
distributions. This is one of its key advantages because it doesn't require assumptions
about the shape or normality of the data.
Conservative Bound: The inequality gives a conservative estimate; it does not provide
the exact percentage of data within a certain range, but rather a lower bound. In
other words, the actual proportion of data within k standard deviations could be
higher, but it will never be less than the value given by the inequality.
No Assumption of Distribution: Unlike many other statistical results (e.g., the
empirical rule for normal distributions), Chebyshev’s theorem doesn’t assume the
data follows a specific distribution.
Why is Chebyshev's Theorem Important?
1. Robustness: It’s particularly useful when you don’t know the exact shape of the
distribution (i.e., it's not necessarily normal).
2. Worst-case Scenario: It helps in understanding the worst-case scenario for how
spread out data can be. If you don’t know the distribution of data but only have
information about its mean and variance, Chebyshev’s theorem can give you a
reliable bound on the spread.
3. Non-normal Distributions: While many statistical techniques assume normality (e.g.,
in the Central Limit Theorem or Z-scores), Chebyshev’s inequality is a valuable tool
when working with non-normal data.
Limitations of Chebyshev's Theorem
The bounds provided by Chebyshev’s inequality are not tight. In other words, the
actual proportion of data within k standard deviations can often be much larger than
what Chebyshev’s inequality predicts.
For normal distributions, the empirical rule (68-95-99.7 rule) is more accurate and
efficient than Chebyshev’s theorem because it provides a much tighter estimate of
where the majority of the data lies.
Summary
Chebyshev's Theorem provides a powerful tool for understanding the spread of data,
especially when we don't know the underlying distribution. It tells us that, regardless of the
shape of the distribution, a certain percentage of the data will always lie within a specific
number of standard deviations from the mean, offering useful insight for non-normal
datasets.
UNIT 2
Introduction to Probability
Probability is the branch of mathematics that deals with the likelihood or chance of an event
occurring. It is used to quantify uncertainty and is applied in various fields such as statistics,
finance, science, engineering, and everyday life.
Key Concepts in Probability
1. Experiment: An action or process that leads to an outcome. For example, tossing a
coin or rolling a die.
2. Outcome: The result of an experiment. In the case of a coin toss, the possible
outcomes are "heads" and "tails."
3. Sample Space: The set of all possible outcomes of an experiment. For a coin toss, the
sample space is {heads, tails}. For a die roll, it is {1, 2, 3, 4, 5, 6}.
4. Event: A specific outcome or a group of outcomes that we are interested in. For
example, an event could be "the coin shows heads," or "the die roll is an even
number."
Probability of an Event
The probability of an event A, denoted P(A), is a number between 0 and 1 that indicates the
likelihood of the event happening. The probability is calculated as:
P(A) = Number of favorable outcomes / Total number of possible outcomes in the sample space
If P(A)=0, the event will not occur.
If P(A)=1, the event will certainly occur.
If 0<P(A)<1, the event has some chance of occurring.
Types of Events
1. Independent Events: Two events are independent if the occurrence of one event
does not affect the probability of the other event. For example, tossing a coin and
rolling a die are independent events.
2. Dependent Events: Two events are dependent if the occurrence of one event affects
the probability of the other event. For example, drawing two cards from a deck
without replacement.
3. Mutually Exclusive Events: Two events are mutually exclusive if they cannot both
happen at the same time. For example, getting heads and tails in a single coin toss
are mutually exclusive.
4. Complementary Events: The complement of an event A is the event that A does not
occur. The probability of the complement of A, denoted A′, is given by:
P(A′)=1−P(A)
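These definitions can be illustrated with a short die-roll sketch (the event chosen here is illustrative):

sample_space = {1, 2, 3, 4, 5, 6}
event_even = {2, 4, 6}  # event A: "the roll is an even number"

p_a = len(event_even) / len(sample_space)  # favorable / total = 3/6 = 0.5
p_a_complement = 1 - p_a                   # P(A') = 1 - P(A) = 0.5
print(p_a, p_a_complement)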
Theory of Probability
The Theory of Probability is a mathematical framework that deals with the analysis of
random events. It provides the tools and principles for calculating the likelihood of different
outcomes, understanding the behavior of random phenomena, and making decisions based
on uncertain information. Probability theory is foundational to fields such as statistics,
machine learning, economics, physics, and many other disciplines that involve uncertainty or
randomness.
Fundamental Principles of Probability
1. Random Experiments:
o A random experiment is an action or process that leads to one of several
possible outcomes, but the exact outcome cannot be predicted in advance.
For example, rolling a die, drawing a card from a deck, or measuring the
temperature on a given day.
2. Sample Space:
o The sample space (S) is the set of all possible outcomes of a random
experiment. For example, if you flip a coin, the sample space is
S={Heads, Tails}
o If a die is rolled, the sample space is S={1,2,3,4,5,6}
3. Events:
o An event is any subset of the sample space. It represents the outcomes of
interest. For example:
If rolling a die, an event might be "rolling an even number,"
represented by the subset {2, 4, 6}.
An event can be as simple as a single outcome (e.g., "rolling a 3") or
more complex, involving multiple outcomes.
4. Probability Function:
o A probability function assigns a probability to each event in the sample space.
The probability of an event A, denoted by P(A), is a number between 0 and 1,
which measures the likelihood of the event occurring.
o The probability of the sample space is always 1, and the probability of the
empty set (no outcome) is 0: P(S)=1,P(∅)=0
o The probability of any event must satisfy two key conditions:
1. 0 ≤ P(A) ≤ 1 for any event A
2. P(S) = 1
Types of Probability
1. Classical Probability:
o This is used when all outcomes of an experiment are equally likely. If there
are n equally likely outcomes, and event A contains m outcomes, the
probability of A is given by: P(A) = m / n
o Example: When rolling a fair die, there are 6 equally likely outcomes. The
probability of rolling a 4 is: P(rolling a 4) = 1/6
2. Empirical Probability (or Frequentist Probability):
o This type of probability is based on observed data or experiments. It is the
ratio of the number of favorable outcomes to the total number of trials:
P(A) = Number of times event A occurs / Total number of trials
o Example: If you flip a coin 100 times and get heads 55 times, the empirical
probability of getting heads is: P(Heads) = 55/100 = 0.55
3. Subjective Probability:
o This is based on personal belief or judgment about how likely an event is to
occur, often used in situations where there is no clear empirical or classical
data.
o For example, an economist might estimate the probability of a market crash
based on personal experience or expert opinions.
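The empirical (frequentist) notion above can be demonstrated with a quick coin-flip simulation (a sketch using the standard library's random module; the exact result varies from run to run):

import random

trials = 10_000
heads = sum(random.choice("HT") == "H" for _ in range(trials))
print(heads / trials)  # empirical P(Heads); tends toward 0.5 as trials grow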
Addition and Multiplication Laws of Probability
The Addition Law and Multiplication Law are two fundamental rules in probability theory
that help in calculating the probability of combined events. These laws apply to different
types of events (e.g., independent, dependent, mutually exclusive, etc.) and are essential in
solving complex probability problems.
Addition Law: For two mutually exclusive events A and B:
P(A∪B) = P(A) + P(B)
For events that are not mutually exclusive, the general form is:
P(A∪B) = P(A) + P(B) − P(A∩B)
where:
o P(A∩B) is the probability that both events A and B happen simultaneously.
Multiplication Law: For two events A and B:
P(A∩B) = P(A) × P(B∣A)
Where:
P(A∩B) is the probability that both events A and B occur.
P(A) is the probability of event A.
P(B∣A) is the conditional probability of event B occurring given that event A has
already occurred.
Special Case: Independent Events
If events A and B are independent, the occurrence of A does not affect the
occurrence of B, so the multiplication rule becomes:
P(A∩B) = P(A) × P(B)
Dependent Events:
If events A and B are dependent, the occurrence of event A affects the probability of
event B. In this case, the multiplication rule becomes:
P(A∩B) = P(A) × P(B∣A)
Here, P(B∣A) represents the probability of B occurring given that A has occurred.
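As a worked instance of the rule for dependent events, consider drawing two aces in a row from a standard 52-card deck without replacement (the numbers follow directly from the formula):

p_first_ace = 4 / 52            # P(A): 4 aces among 52 cards
p_second_given_first = 3 / 51   # P(B|A): 3 aces left among 51 remaining cards
p_both_aces = p_first_ace * p_second_given_first
print(p_both_aces)              # ~0.0045 -> P(A and B) = P(A) x P(B|A)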
Bayes' Theorem
Bayes' Theorem builds on conditional probability to update the probability of an event in
light of new evidence:
P(A∣B) = P(B∣A) ⋅ P(A) / P(B)
Where:
P(A∣B): The posterior probability – the probability of event A occurring given that
B has occurred (what we want to find).
P(B∣A): The likelihood – the probability of event B occurring given that A has
occurred.
P(A): The prior probability – the initial probability of event A occurring before any
evidence (i.e., event B) is considered.
P(B): The marginal likelihood – the total probability of event B occurring,
regardless of whether A occurs or not.
Interpretation
Consider the classic medical-testing example: a person tests positive for a rare disease.
Bayes' Theorem shows us that even with a highly accurate test, the rarity of the disease
(a low prior probability) can lead to a relatively low probability that the person actually
has the disease after testing positive. This is because there are still a significant number
of false positives in the general population (those who do not have the disease but still
test positive).
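A short numerical sketch of this effect (the prevalence, sensitivity, and false-positive rate below are assumed illustrative figures, not values from the text):

p_disease = 0.01            # prior P(A): 1% prevalence (assumed)
p_pos_given_disease = 0.99  # likelihood P(B|A): test sensitivity (assumed)
p_pos_given_healthy = 0.05  # false-positive rate (assumed)

# Marginal likelihood P(B) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: posterior P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.17 -> only about 17% despite a 99%-sensitive test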
Summary
Bayes' Theorem provides a way to update the probability of an event based on new
information or evidence.
It is used for calculating conditional probabilities and plays a crucial role in many
fields such as medical diagnosis, machine learning, and decision-making.
Bayes' Theorem is particularly powerful because it allows us to incorporate both
prior knowledge and new data to make better-informed decisions.
Probability Theoretical Distributions
A probability distribution is a mathematical function that provides the probabilities of
occurrence of different possible outcomes in an experiment. It describes how probabilities
are distributed over the values of the random variable. There are two main types of
probability distributions:
1. Discrete Probability Distributions: These are used when the random variable can
take on a finite or countable number of possible outcomes. In a discrete distribution,
each outcome has a specific probability. Examples of discrete probability
distributions include:
o Binomial distribution: Describes the number of successes in a fixed number
of independent Bernoulli trials.
o Poisson distribution: Models the number of events occurring in a fixed
interval of time or space.
2. Continuous Probability Distributions: These are used when the random variable can
take on an infinite number of possible values within a given range. In continuous
distributions, the probability of the variable taking any exact value is zero, but
probabilities are described over intervals. Examples of continuous probability
distributions include:
o Normal distribution: A bell-shaped curve that describes many natural
phenomena, such as heights or test scores.
o Exponential distribution: Describes the time between events in a Poisson
process.
These two types of distributions help to model different types of random processes in
statistics and probability theory.
Binomial Distribution
Concept:
The binomial distribution models the number of successes in a fixed number of independent
trials of a binary (success/failure) experiment. It is used when:
The trials are independent.
Each trial has two possible outcomes (success or failure).
The probability of success is the same for every trial.
The number of trials is fixed.
Formula: The probability of observing exactly k successes in n trials is given by:
P(X = k) = C(n, k) p^k q^(n−k)
Where:
n = number of trials,
k = number of successes,
p = probability of success on a single trial,
q = 1 − p is the probability of failure,
C(n, k) = n! / (k!(n − k)!) is the binomial coefficient, which represents the number of
ways to choose k successes from n trials.
Application:
Coin tosses: For example, if you flip a fair coin 10 times, you can use the binomial
distribution to find the probability of getting exactly 6 heads.
Quality control: In a factory, if 95% of the products pass a quality test, you can use
the binomial distribution to calculate the likelihood that, out of 20 products, 18 pass
the test.
Survey analysis: If 70% of people in a population support a candidate, you can
calculate the probability that, in a sample of 100 people, 75 or more support the
candidate.
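The coin-toss application can be computed directly from the formula (a minimal sketch using math.comb for the binomial coefficient):

from math import comb

n, k, p = 10, 6, 0.5  # 10 fair-coin flips, exactly 6 heads
prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
print(prob)           # ~0.205 -> P(X = 6)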
2. Poisson Distribution
Concept:
The Poisson distribution models the number of events occurring in a fixed interval of time or
space, given the average number of events in that interval. These events must occur
independently, and the average rate at which they happen is constant. It is particularly
useful for modeling rare events that occur randomly over time or space.
Formula:
P(X = k) = (λ^k e^(−λ)) / k!
Where:
λ = the average number of events in the interval,
k = the number of events (k = 0, 1, 2, …),
e ≈ 2.71828 is Euler's number.
Application:
Traffic accidents: The Poisson distribution can model the number of traffic accidents
at an intersection over a month, given an average number of accidents per month.
Call centers: It can be used to model the number of calls received by a call center in a
given hour.
Web page hits: The number of times a website receives hits during a day can be
modeled by a Poisson distribution if the average number of hits is known.
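A minimal sketch of the Poisson formula applied to the call-center example (the rate of 3 calls per hour is an assumed illustrative figure):

from math import exp, factorial

lam = 3  # average events per interval (assumed: 3 calls per hour)
k = 5    # number of events whose probability we want
prob = lam ** k * exp(-lam) / factorial(k)
print(prob)  # ~0.101 -> P(exactly 5 calls in the hour)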
3. Normal Distribution
Concept:
The normal distribution, also known as the Gaussian distribution, is a continuous probability
distribution that is symmetric around its mean. The normal distribution is characterized by
two parameters:
The mean (μ) represents the center of the distribution.
The standard deviation (σ) controls the spread of the distribution (larger σ means
wider distribution).
The bell-shaped curve is symmetric, with most of the values clustering around the mean,
and the probability of extreme values (far from the mean) decreases rapidly.
Formula:
f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
The standard normal distribution (z distribution) is a normal distribution with a mean of 0
and a standard deviation of 1. Any point (x) from a normal distribution can be converted to
the standard normal distribution (z) with the formula z = (x-mean) / standard deviation.
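The z conversion, and the cumulative probability it unlocks, can be sketched with the standard library (math.erf gives the normal CDF without external packages; the IQ parameters match the application below):

from math import erf, sqrt

mu, sigma = 100, 15  # IQ scores: mean 100, standard deviation 15
x = 130
z = (x - mu) / sigma                # z = (x - mean) / standard deviation = 2.0
cdf = 0.5 * (1 + erf(z / sqrt(2)))  # P(X < x) under the normal curve
print(z, cdf)                       # 2.0, ~0.977 -> about 97.7% score below 130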
Application:
Heights of individuals: The heights of people in a population are often normally
distributed, with most people clustering around the mean height, and fewer
individuals being extremely short or tall.
Measurement errors: In scientific experiments, measurement errors often follow a
normal distribution, meaning the errors are likely to be small and centered around
zero.
IQ scores: IQ scores are typically modeled by a normal distribution, with a mean of
100 and a standard deviation of 15, with most people scoring close to the average.
UNIT 3
Correlation Analysis in Business Analytics
Correlation analysis is a statistical technique used to measure and analyze the relationship
between two or more variables. In business analytics, correlation analysis helps
organizations understand how different factors are related to each other, which can aid in
making informed decisions, predictions, and strategies.
Here’s how correlation analysis can be applied in business analytics:
1. Understanding Correlation
Definition: Correlation quantifies the degree to which two variables are related. If
one variable changes, how does the other change?
Correlation Coefficient: The most common way to measure correlation is through the
correlation coefficient, typically denoted as r.
o r = 1: Perfect positive correlation (both variables move in the same direction).
o r = -1: Perfect negative correlation (variables move in opposite directions).
o r = 0: No correlation (no predictable relationship between variables).
o 0 < r < 1: Positive correlation (as one increases, so does the other).
o -1 < r < 0: Negative correlation (as one increases, the other decreases).
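A minimal sketch computing r from its definition (the advertising/sales figures are assumed illustrative data, echoing the sales-forecasting application below):

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-movement of x and y
    sx = sum((a - mx) ** 2 for a in x) ** 0.5             # spread of x
    sy = sum((b - my) ** 2 for b in y) ** 0.5             # spread of y
    return cov / (sx * sy)  # scaled to lie between -1 and +1

ad_spend = [10, 20, 30, 40, 50]    # assumed data
sales = [12, 24, 33, 46, 52]       # assumed data
print(pearson_r(ad_spend, sales))  # ~0.995 -> strong positive correlation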
2. Applications of Correlation in Business Analytics
Sales Forecasting: Businesses can use correlation analysis to examine relationships
between sales and other variables such as advertising spend, seasonal trends, or
economic indicators.
o Example: Analyzing the correlation between advertising spend and sales
growth to understand if increased advertising leads to higher sales.
Customer Behavior: Businesses can study the correlation between customer
demographic factors and purchasing behavior.
o Example: Correlating customer age and income with product preference,
helping to tailor marketing strategies.
Product Performance: Analyzing correlations between product attributes (price,
quality, marketing) and customer satisfaction.
o Example: Investigating the correlation between product pricing and customer
satisfaction scores.
Operational Efficiency: By analyzing correlations between operational factors (like
employee training hours and performance or inventory levels and sales), businesses
can streamline operations.
o Example: A company might analyze the correlation between stock-out rates
and customer satisfaction.
Financial Analysis: Investors and financial analysts often use correlation to analyze
how various stocks or assets perform in relation to one another. A correlation matrix
helps assess portfolio diversification.
o Example: Analyzing the correlation between stock returns and economic
indicators like GDP growth or interest rates.
3. Types of Correlation
Pearson Correlation: Measures the linear relationship between two continuous
variables. It assumes normal distribution.
Spearman Rank Correlation: Used when data does not meet the assumptions of
normality or when dealing with ordinal data.
Kendall’s Tau: Another non-parametric correlation measure, often used for small data
sets or when dealing with ordinal data.
Interpretation (for Spearman's rank correlation coefficient ρ):
ρ=1: Perfect positive correlation (the ranks of X and Y match perfectly).
ρ=−1: Perfect negative correlation (the ranks are exactly opposite).
ρ=0: No correlation.
Spearman’s rank correlation is best suited for:
Ordinal data (data that can be ranked but not necessarily measured numerically, e.g.,
customer satisfaction on a scale of 1 to 5).
Non-linear relationships between variables.
Sensitivity to outliers: Spearman's rank correlation is less sensitive to outliers,
whereas Pearson's coefficient is sensitive to outliers.
Conclusion:
Pearson’s Coefficient is ideal for measuring linear relationships between continuous
variables when data is normally distributed.
Spearman’s Rank Correlation is more appropriate for ordinal data or when you
suspect a non-linear relationship or when data contains outliers that might affect
Pearson's calculation.
Both methods are essential in business analytics, with the choice depending on the nature
of the data and the type of relationship you're investigating.
Properties of Correlation
Correlation analysis is fundamental in statistics and business analytics. It helps us understand
the strength and direction of the relationship between two variables. The correlation
coefficient (typically denoted as r) is used to quantify this relationship. The following are the
key properties of correlation that define its behavior and usage:
Correlation Coefficient Properties
The correlation coefficient is all about establishing relationships between two variables.
Some properties of the correlation coefficient are as follows:
1) The correlation coefficient is the same regardless of the units in which the two
variables are measured.
2) The sign of the correlation coefficient is always the same as the sign of the covariance.
3) The numerical value of the correlation coefficient is a real number between −1 and +1.
4) A negative value of the coefficient indicates a negative correlation; as ‘r’ approaches
−1, the negative relationship becomes stronger.
When ‘r’ approaches +1, the relationship is strong and positive. A correlation of +1
therefore indicates a perfect positive relationship.
5) A weak correlation is signalled when the coefficient of correlation approaches zero.
When ‘r’ is near zero, we can deduce that the linear relationship is weak.
6) The correlation coefficient is only as reliable as the data behind it; if responses are
untruthful or measurements are poor, ‘r’ can be misleading.
The coefficient of correlation is not affected when we interchange the two variables.
7) The coefficient of correlation is a pure number without the effect of any units on it. It
also does not get affected when we add the same number to all the values of one
variable, or multiply all the values by the same positive number; ‘r’ is scale-invariant.
8) We use correlation for measuring association, but that does not imply causation.
When two variables are correlated, it is possible that a third variable is influencing both
of them.
Regression Analysis:
Regression analysis is a statistical technique used to model and analyze the relationship
between a dependent variable (also called the outcome or response) and one or more
independent variables (also called predictors or features). The goal of regression analysis is
to understand how the dependent variable changes when one or more independent
variables are varied and to make predictions based on this relationship.
In simple terms, regression analysis is used to predict the value of the dependent variable
based on the values of the independent variables.
Key Components in Regression Analysis:
1. Dependent Variable (Y): The variable that you are trying to predict or explain. It is the
outcome or the response variable. For example, in a business context, it could be
sales revenue.
2. Independent Variables (X): The variables that explain the changes in the dependent
variable. These are also called predictor variables. For example, in a business context,
independent variables could be advertising budget, product price, and number of
salespeople.
Example:
Suppose we run a regression analysis to examine the relationship between the amount of
exercise (independent variable) and weight loss (dependent variable). The results are:
- R² = 0.7
- Adjusted R² = 0.65
- F-statistic = 10.2 (p-value < 0.01)
- Slope (b1) = -2.5 (p-value < 0.01)
- Intercept (b0) = 10.2
Interpretation:
- The model explains about 65% of the variance in weight loss (using the adjusted R²), and the significant F-statistic (p < 0.01) indicates the model as a whole is statistically significant.
- The slope of −2.5 means that each additional hour of exercise is associated with a 2.5-pound drop in weight; in other words, weight loss increases by 2.5 pounds per additional hour of exercise.
- The intercept (10.2) is the model's predicted value when the amount of exercise is zero.
Note: This is a simplified example and actual regression results may require more nuanced
interpretation.
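A sketch of fitting such a line by least squares (the standard formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄; the exercise/weight figures are assumed illustrative data chosen to echo the example):

def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx  # the fitted line passes through (x-bar, y-bar)
    return slope, intercept

hours = [0, 1, 2, 3, 4]              # hours of exercise per week (assumed)
weight = [10.0, 8.0, 5.0, 2.9, 0.1]  # observed change in weight (assumed)
b1, b0 = fit_line(hours, weight)
print(b1, b0)  # ~-2.49 and ~10.18, close to the example's -2.5 and 10.2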
Conclusion:
Fitting a regression line involves calculating the regression equation and interpreting key
results such as the slope, intercept, R², p-values, and residuals. These results help in
understanding the strength, direction, and statistical significance of the relationship
between variables, and they allow businesses and researchers to make informed predictions
and decisions based on the data.
Properties of Regression Coefficient
1. The regression coefficient is denoted by b.
2. We express it in the form of an original unit of data.
3. The regression coefficient of y on x is denoted by byx. The regression coefficient of x on y is
denoted by bxy.
4. If one regression coefficient is greater than 1, then the other will be less than 1.
5. They are not independent of a change of scale: the regression coefficients change if the
values of x and y are multiplied by any constant.
6. AM of both regression coefficients is greater than or equal to the coefficient of
correlation.
7. GM between the two regression coefficients is equal to the correlation coefficient.
8. If bxy is positive, then byx is also positive and vice versa.
Relationship Between Regression and Correlation
Regression and correlation are both statistical techniques used to examine the relationship
between two or more variables. While they are related and often used together, they serve
different purposes and provide different kinds of information about the data.
Here’s a breakdown of the relationship between regression and correlation:
Summary of Key Differences and Relationships:
Purpose: Regression is used to predict the value of the dependent variable based on
independent variables, whereas correlation measures the strength and direction of a
linear relationship between two variables.
Conclusion:
Regression and correlation are both valuable tools for analyzing relationships
between variables, but they serve different purposes.
o Regression is focused on prediction and on modeling how a dependent
variable changes with one or more independent variables.
o Correlation is focused on measuring the strength and direction of the
relationship between two variables, without implying causality.
The correlation coefficient is closely related to the slope of the regression line, and in
simple linear regression, they are connected mathematically, but the two tools offer
different insights into the data.
UNIT 4
Linear Programming (LP) is a mathematical optimization technique used to find the best
possible outcome in a model with linear relationships, subject to a set of linear constraints.
The goal of linear programming is to maximize or minimize a linear objective function while
satisfying certain conditions (or constraints).
The key components of linear programming and its various aspects are outlined below:
1. Linear Programming:
Objective: Linear programming aims to find the best outcome (such as maximum
profit or minimum cost) given a set of constraints, where both the objective function
and the constraints are linear.
Formulation: In standard form, a linear program chooses decision variables x1, …, xn to
maximize (or minimize) a linear objective Z = c1x1 + c2x2 + … + cnxn, subject to linear
constraints such as a1x1 + a2x2 + … + anxn ≤ b, with x1, …, xn ≥ 0.
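A minimal sketch of solving such a model with SciPy's linprog (the product-mix profits and resource limits are assumed illustrative numbers; linprog minimizes, so the profit coefficients are negated to perform a maximization):

from scipy.optimize import linprog

# Maximize profit Z = 3x + 5y subject to (assumed data):
#   2x + 4y <= 40   (machine hours)
#   3x + 2y <= 30   (labour hours)
#   x, y >= 0
c = [-3, -5]             # negated profits: minimizing -Z maximizes Z
A_ub = [[2, 4], [3, 2]]  # constraint coefficients
b_ub = [40, 30]          # resource limits
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal plan [5. 7.5] with profit 52.5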
Number of variables: Univariate predictive analytics focuses on one variable only, while
multivariate predictive analytics involves multiple variables.
Summary
Descriptive Analytics focuses on analyzing historical data to understand past
behaviors and trends. It uses techniques like aggregation, data visualization, and
summary statistics to provide insights into what has happened.
Predictive Analytics aims to predict future outcomes using historical data and
statistical methods.
o Univariate Predictive Analytics focuses on predicting an outcome based on a
single variable.
o Multivariate Predictive Analytics predicts outcomes by analyzing the
relationship between multiple variables.
Both descriptive and predictive analytics are crucial in data-driven decision-making.
Descriptive analytics helps in understanding past performance, while predictive analytics
helps in forecasting future trends and making proactive decisions.