DAV Practical 4
Practical Number 4
Title of Practical: To implement Correlation and Covariance.
Prior Concept:
Covariance
Covariance is a statistical measure of the systematic association between two random variables:
it describes how a change in one variable is accompanied by a change in the other.
Definition and Calculation of Covariance
Covariance indicates whether two variables are directly or inversely related.
The covariance formula measures how pairs of data points deviate jointly from their average
values. The covariance between two random variables, X and Y, is computed as:

Cov(X, Y) = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)

Where,
Xᵢ and Yᵢ are the individual observations of X and Y, X̄ and Ȳ are the means of X and Y, and
n is the number of data points.
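A minimal Python sketch of this formula (the arrays X and Y below are illustrative values, not data from the practical), checked against NumPy's built-in np.cov:

import numpy as np

# illustrative sample data (not from the practical)
X = np.array([2.1, 2.5, 3.6, 4.0])
Y = np.array([8.0, 10.0, 12.0, 14.0])

n = len(X)
# sample covariance: sum of joint deviations from the means, divided by (n - 1)
cov_xy = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)

print("Manual covariance:", cov_xy)
print("NumPy covariance:", np.cov(X, Y)[0, 1])  # should agree with the manual value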
Positive Covariance
If the covariance between two variables is positive, they move in the same direction. It
represents a direct relationship between the variables, so the variables tend to behave
similarly.
The covariance is positive when larger values of one variable tend to occur together with
larger values of the other, and smaller values with smaller values.
Negative Covariance
A negative number represents negative Covariance between two random variables. It
implies that the variables will share an inverse relationship. In negative Covariance, the
variables move in the opposite direction.
In contrast to positive covariance, larger values of one variable correspond to smaller values
of the other variable, and vice versa.
Zero Covariance
A covariance of zero indicates no linear relationship between the two variables.
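A small sketch with made-up data showing the sign of the covariance in each of the three cases:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
up = np.array([2, 4, 6, 8, 10])        # moves with x    -> positive covariance
down = np.array([10, 8, 6, 4, 2])      # moves against x -> negative covariance
flat = np.array([3, 3, 3, 3, 3])       # does not vary with x -> zero covariance

print(np.cov(x, up)[0, 1])    # > 0
print(np.cov(x, down)[0, 1])  # < 0
print(np.cov(x, flat)[0, 1])  # 0.0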
Significance of Covariance in Assessing Linear Relationship
Covariance is significant in determining the linear relationship between variables. It suggests
the direction (negative or positive) and magnitude of the relationship between variables.
A larger absolute covariance value indicates a stronger linear association between the variables,
while a covariance of zero suggests no linear relationship.
Limitations and Considerations of Covariance
Covariance is influenced by the scales of measurement and is highly affected by outliers.
Covariance is restricted to measuring linear relationships and does not provide a standardized
measure of their strength.
Moreover, comparing covariances across different datasets demands caution, because the
variables may have different ranges and units.
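A quick sketch of the scale dependency with illustrative data: multiplying one variable by a constant rescales the covariance, while the correlation (introduced in the next section) stays the same:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

print(np.cov(x, y)[0, 1])             # covariance on the original scale
print(np.cov(x, y * 100)[0, 1])       # same data in different units -> covariance 100x larger
print(np.corrcoef(x, y)[0, 1])        # correlation is scale-free
print(np.corrcoef(x, y * 100)[0, 1])  # unchanged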
Correlation
Unlike covariance, correlation tells us both the direction and the strength of the relationship
between variables on a standardized scale. Correlation assesses the extent to which two random
variables move together.
Definition and Calculation of Correlation Coefficient
Correlation is a statistical concept that quantifies the strength of the relationship between two
numerical variables. When examining the relation between variables, we look at how a change in
one variable is associated with a change in the other.
When, throughout the study of two variables, movement in one variable is consistently
accompanied by a corresponding movement in the other, the variables are said to be
correlated.
The formula for calculating the correlation coefficient is as follows:

r = Cov(X, Y) / (σX · σY) = Σ (Xᵢ − X̄)(Yᵢ − Ȳ) / √[ Σ (Xᵢ − X̄)² · Σ (Yᵢ − Ȳ)² ]

Where,
Cov(X, Y) is the covariance between X and Y, and σX and σY are the standard deviations of X and Y.
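A minimal sketch of this formula with illustrative data, checked against NumPy's built-in np.corrcoef:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Pearson r = covariance divided by the product of the standard deviations
cov_xy = np.sum((X - X.mean()) * (Y - Y.mean())) / (len(X) - 1)
r = cov_xy / (X.std(ddof=1) * Y.std(ddof=1))

print("Manual r:", r)
print("NumPy r:", np.corrcoef(X, Y)[0, 1])  # should match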
In a negative correlation, one variable’s value increases while the second one’s value
decreases. A perfect negative correlation has a value of -1.
[Figure: scatter plot illustrating a negative correlation]
Just like in the case of Covariance, a zero correlation means no relation between the
variables. Therefore, whether one variable increases or decreases won’t affect the other
variable.
Strength and Direction of Correlation
Correlation assesses the direction and strength of a linear relationship between multiple
variables. The correlation coefficient varies from -1 to 1, with values near -1 or 1 implying a
high association (negative or positive, respectively) and values near 0 suggesting a weak or
no correlation.
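A short illustration with synthetic data of coefficients near +1, −1, and 0:

import numpy as np

rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)

strong_pos = x + rng.normal(0, 5, 100)   # closely tracks x -> r near +1
strong_neg = -x + rng.normal(0, 5, 100)  # mirrors x        -> r near -1
unrelated = rng.normal(0, 5, 100)        # independent noise -> r near 0

print(np.corrcoef(x, strong_pos)[0, 1])
print(np.corrcoef(x, strong_neg)[0, 1])
print(np.corrcoef(x, unrelated)[0, 1])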
Pearson Correlation Coefficient and Its Properties
The Pearson correlation coefficient (r) measures the linear connection between two
variables. The properties of the Pearson correlation coefficient include the following:
Strength: The coefficient’s absolute value indicates the relationship’s strength. The closer
the value of the coefficient is to 1, the stronger the correlation between variables. However,
a value nearer to 0 represents a weaker association.
Direction: The coefficient’s sign denotes the direction of the relationship. If the value is
positive, there is a positive correlation between the two variables, which means that if one
variable rises, the other will also rise. If the value is negative, there is a negative correlation,
which suggests that when one variable increases, the other will fall.
Range: The coefficient’s value varies from -1 to 1. A perfect negative linear relationship is
represented by a value of -1, the absence of a linear relationship is represented by 0, and a
perfect positive linear relationship is denoted by a value of 1.
Independence: The Pearson correlation coefficient quantifies how linearly dependent two
variables are but does not imply causality. There is no guarantee that a strong correlation
indicates a cause-and-effect connection.
Linearity: The Pearson correlation coefficient only assesses linear relationships between
variables. The coefficient could be insufficient to describe non-linear connections fully.
Sensitivity to Outliers: Outliers in the data might influence the correlation coefficient’s
value, inflating or deflating its magnitude.
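A brief sketch with illustrative data of how a single outlier can change the coefficient:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

print(np.corrcoef(x, y)[0, 1])  # close to 1 for this near-linear data

# add one outlying observation
x_out = np.append(x, 6.0)
y_out = np.append(y, -20.0)
print(np.corrcoef(x_out, y_out)[0, 1])  # much lower; the single outlier even flips the sign here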
Advantages and Disadvantages of Covariance
Following are the advantages and disadvantages of Covariance:
Advantages
Easy to Calculate: Calculating covariance doesn’t require any assumptions of the underlying
data distribution. Hence, it’s easy to calculate covariance with the formula given above.
Apprehends Relationship: Covariance gauges the extent of linear association between
variables, furnishing information about the relationship’s magnitude and direction (positive
or negative).
Beneficial in Portfolio Analysis: Covariance is typically employed in portfolio analysis to
evaluate the diversification advantages of integrating different assets.
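As a rough illustration of the portfolio use case, with made-up returns for two hypothetical assets, the covariance matrix from pandas summarizes how the assets move together:

import pandas as pd

# hypothetical daily returns for two assets (illustrative values only)
returns = pd.DataFrame({
    'asset_A': [0.01, -0.02, 0.015, 0.005, -0.01],
    'asset_B': [-0.005, 0.01, -0.01, 0.0, 0.02],
})

# off-diagonal entries are the covariances between the assets;
# a negative value suggests the assets tend to move in opposite directions,
# which is useful for diversification
print(returns.cov())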
Disadvantages
Restricted to Linear Relationships: Covariance only gauges linear relationships between
variables and does not capture non-linear associations.
Doesn’t Offer Relationship Magnitude: Covariance doesn’t offer a standardized estimation
of the intensity or strength of the relationship between variables.
Scale Dependency: Covariance is affected by the variables’ measurement scales, making
comparing covariances across various datasets or variables with distinct units challenging.
Advantages and Disadvantages of Correlation
The advantages and disadvantages of correlation are as follows:
Advantages
Determining Non-Linear Relationships: While correlation primarily estimates linear
relationships, it can also reveal the presence of non-linear (monotonic) connections, especially
when using alternative correlation measures such as Spearman’s rank correlation coefficient
(see the sketch after this list).
Standardized Criterion: Correlation coefficients, such as the Pearson correlation coefficient,
are standardized, varying from -1 to 1. This allows for easy comparison and interpretation of
the direction and strength of relationships across different datasets.
Robustness to Outliers: Correlation coefficients are typically less sensitive to outliers than
covariance, providing a more robust measure of the association between variables.
Scale Independence: Correlation is not affected by the measurement scales, making it
convenient for comparing relationships between variables with different units or scales.
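A minimal sketch with illustrative data comparing Pearson and Spearman correlation on a monotonic but non-linear relationship, using pandas’ built-in corr method:

import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6])
y = x ** 3  # monotonic but non-linear

print("Pearson:", x.corr(y))                      # below 1, since the relation is not linear
print("Spearman:", x.corr(y, method='spearman'))  # 1.0, since the ranks agree perfectly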
Disadvantages
Driven by Extreme Values: Extreme values can still affect the correlation coefficient, even
though it is less susceptible to outliers than Covariance.
Data Requirements: The Pearson correlation assumes that the data follow a bivariate normal
distribution, an assumption that may not always hold.
Limited to Bivariate Analysis: Because correlation only examines the connection between
two variables at a time, it cannot capture more complex multivariate relationships.
Using Pandas:
import pandas as pd

# sample data; the values for 'data1' were missing in the source and are
# reconstructed here so that the output shown below is reproduced
data = {
    'data1': [1, 2, 3, 4, 5],
    'data2': [5, 6, 7, 8, 9]
}
df = pd.DataFrame(data)
# Covariance calculation
covariance = df['data1'].cov(df['data2'])
# Correlation calculation
correlation = df['data1'].corr(df['data2'])
print("Covariance:", covariance)
print("Correlation:", correlation)
Output:
Covariance: 2.5
Correlation: 0.9999999999999999
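Continuing with the DataFrame df defined above, the full covariance and correlation matrices can also be obtained, which is handy when there are more than two columns:

# full matrices: the diagonal holds each column's variance (for cov)
# or a correlation of 1 with itself (for corr)
print(df.cov())
print(df.corr())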
Learning Objectives:
To understand the correlation and covariance between the variables.
Conclusion/Learning outcome:
The correlation and covariance between variables are understood and implemented in
Python and R code.
DOP | DOS | Conduction (R1) | File Record (R2) | Viva Voce (R3) | Total    | Signature
    |     | 5 Marks         | 5 Marks          | 5 Marks        | 15 Marks |