
Subject: Data Analytics & Visualization Course Code: CSL-601

Semester: 6 Course: AI & DS


Laboratory No: 315 Name of Subject Teacher: Rajesh Morey
Name of Student: Meghana Gade Roll Id: VU2S2223002

Practical Number 4
Title of Practical To implement Correlation and Covariance.

Prior Concept:
Covariance
It’s a statistical term describing a systematic association between two random variables,
where a change in one variable is mirrored by a corresponding change in the other.
Definition and Calculation of Covariance
Covariance indicates whether the two variables are directly or inversely related.
The covariance formula measures how jointly the data points in a dataset deviate from their
average values. For instance, you can compute the sample covariance between two random
variables, X and Y, using the following formula:

Cov(X, Y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Where,
xᵢ, yᵢ = the individual data points of X and Y
x̄, ȳ = the means of X and Y
n = the number of data points
Interpreting Covariance Values


Covariance values indicate the magnitude and direction (positive or negative) of the
relationship between variables. The covariance values range from -∞ to +∞. The positive
value implies a positive relationship, whereas the negative value represents a negative
relationship.
Positive, Negative, and Zero Covariance
The larger the magnitude of the covariance, the stronger the dependence between the
variables. Let’s examine each covariance type individually:

Page | 1
Positive Covariance
If the relationship between the two variables is a positive covariance, they are progressing in
the same direction. It represents a direct relationship between the variables. Hence, the
variables will behave similarly.
The relationship between the variables is a positive covariance only when larger values of
one variable correspond to larger values of the other, and smaller values to smaller values.
Negative Covariance
A negative number represents negative covariance between two random variables. It
implies that the variables share an inverse relationship. In negative covariance, the
variables move in opposite directions.
In contrast to positive covariance, the greater values of one variable correspond to the
smaller values of the other variable, and vice versa.
Zero Covariance
Zero Covariance indicates no relationship between two variables.
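The three cases above can be contrasted in a few lines of Python; `sample_cov` is a small illustrative helper (not a library function), and the data values are assumptions chosen to produce each sign:

```python
# Contrast positive, negative, and zero covariance on toy data.
def sample_cov(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
print(sample_cov(x, [5, 6, 7, 8, 9]))    # 2.5  (positive: variables move together)
print(sample_cov(x, [10, 8, 6, 4, 2]))   # -5.0 (negative: variables move oppositely)
print(sample_cov([1, 2, 3], [5, 0, 5]))  # 0.0  (zero: no linear relationship)
```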
Significance of Covariance in Assessing Linear Relationship
Covariance is significant in determining the linear relationship between variables. It suggests
the direction (negative or positive) and magnitude of the relationship between variables.
A larger covariance magnitude indicates a stronger linear association between the variables,
while a zero covariance suggests no linear relationship.
Limitations and Considerations of Covariance
Covariance is influenced by the scales of measurement and is highly affected by outliers.
Covariance is restricted to measuring only linear relationships and does not provide a
standardized measure of their strength.
Moreover, comparing covariances across various datasets demands caution due to different
variable ranges.

Correlation
Unlike covariance, correlation tells us both the direction and the strength of the relationship
between variables. Correlation assesses the extent to which two or more random variables
move in tandem.
Definition and Calculation of Correlation Coefficient
Correlation is a statistical concept measuring the strength of the relationship between two
numerical variables. While examining the relation between variables, we assess how a
change in one variable is associated with a change in the other.

When the movement of one variable is reciprocated by an analogous movement of the other
variable in some manner throughout the study of the two variables, the variables are said to
be correlated.
The formula for calculating the Pearson correlation coefficient is as follows:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² × Σ(yᵢ − ȳ)² ]

Where,
xᵢ, yᵢ = the individual data points of X and Y
x̄, ȳ = the means of X and Y

Equivalently, r is the covariance of X and Y divided by the product of their standard
deviations, which standardizes the value to the range -1 to 1.
Interpreting Correlation Values


There are three types of correlation based on the coefficient’s value: negative correlation,
positive correlation, and zero (no) correlation.
Positive, Negative, and Zero Correlation
If the variables are directly proportional to one another, the two variables are said to hold a
positive correlation. This implies that if one variable’s value rises, the other’s value rises as
well. An ideal positive correlation possesses a value of 1.
In a scatter plot, a positive correlation appears as points trending upward from left to right.

In a negative correlation, one variable’s value increases while the other’s value
decreases. A perfect negative correlation has a value of -1.
In a scatter plot, a negative correlation appears as points trending downward from left to right.

Just like in the case of Covariance, a zero correlation means no relation between the
variables. Therefore, whether one variable increases or decreases won’t affect the other
variable.
Strength and Direction of Correlation
Correlation assesses the direction and strength of a linear relationship between multiple
variables. The correlation coefficient varies from -1 to 1, with values near -1 or 1 implying a
high association (negative or positive, respectively) and values near 0 suggesting a weak or
no correlation.
Pearson Correlation Coefficient and Its Properties
The Pearson correlation coefficient (r) measures the linear connection between two
variables. The properties of the Pearson correlation coefficient include the following:
 Strength: The coefficient’s absolute value indicates the relationship’s strength. The closer
the value of the coefficient is to 1, the stronger the correlation between variables. However,
a value nearer to 0 represents a weaker association.
 Direction: The coefficient’s sign denotes the direction of the relationship. If the value is
positive, there is a positive correlation between the two variables, which means that if one
variable rises, the other will also rise. If the value is negative, there is a negative correlation,
which suggests that when one variable increases, the other will fall.
 Range: The coefficient’s value varies from -1 to 1. A perfect negative linear relationship is
represented by -1, the absence of a linear relationship is represented by 0, and a perfect
positive linear relationship is denoted by a value of 1.
 Independence: The Pearson correlation coefficient quantifies how linearly dependent two
variables are but does not imply causality. There is no guarantee that a strong correlation
indicates a cause-and-effect connection.
 Linearity: The Pearson correlation coefficient only assesses linear relationships between
variables. The coefficient could be insufficient to describe non-linear connections fully.
 Sensitivity to Outliers: Outliers in the data might influence the correlation coefficient’s
value, thereby boosting or deflating its size.
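The outlier sensitivity noted in the last property can be demonstrated with the same pandas API used later in this practical; the data values here are illustrative assumptions:

```python
import pandas as pd

# A perfectly linear relationship gives a coefficient of (about) 1.
x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 6, 8, 10])
print(x.corr(y))  # ~1.0

# Appending a single extreme outlier sharply weakens (here, even
# reverses the sign of) the Pearson coefficient.
x_out = pd.Series([1, 2, 3, 4, 5, 6])
y_out = pd.Series([2, 4, 6, 8, 10, -50])
print(x_out.corr(y_out))
```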

Other Types of Correlation Coefficients


Other correlation coefficients are:
 Spearman’s Rank Correlation: It’s a nonparametric indicator of rank correlation or the
statistical dependency between the ranks of two variables. It evaluates how effectively a
monotonic function can capture the connection between two variables.
 Kendall Rank Correlation: A statistic that measures the ordinal association between two
measured quantities. It represents the similarity of the data orderings when ranked by each
quantity and is a measure of rank correlation.
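The two rank-based coefficients can be contrasted with Pearson on a monotonic but non-linear relationship (here y = x³, an assumed toy dataset); Spearman is computed as Pearson on the ranks, and Kendall’s tau is counted directly from concordant and discordant pairs:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = x ** 3  # monotonic but non-linear

# Pearson understates the perfectly monotonic relationship.
print(x.corr(y))  # ~0.94

# Spearman's rank correlation = Pearson applied to the ranks.
print(x.rank().corr(y.rank()))  # 1.0

# Kendall's tau = (concordant - discordant pairs) / total pairs.
xs, ys = list(x), list(y)
pairs = [(i, j) for i in range(len(xs)) for j in range(i + 1, len(xs))]
conc = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) > 0 for i, j in pairs)
disc = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) < 0 for i, j in pairs)
tau = (conc - disc) / len(pairs)
print(tau)  # 1.0
```

Because the relationship is perfectly monotonic, both rank-based measures reach 1 while Pearson does not.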

Advantages and Disadvantages of Covariance
Following are the advantages and disadvantages of Covariance:
Advantages
 Easy to Calculate: Calculating covariance doesn’t require any assumptions of the underlying
data distribution. Hence, it’s easy to calculate covariance with the formula given above.
 Captures the Relationship: Covariance gauges the extent of linear association between
variables, furnishing information about the relationship’s direction (positive or negative).
 Beneficial in Portfolio Analysis: Covariance is typically employed in portfolio analysis to
evaluate the diversification advantages of integrating different assets.
Disadvantages
 Restricted to Linear Relationships: Covariance only gauges linear relationships between
variables and does not capture non-linear associations.
 Doesn’t Offer Relationship Magnitude: Covariance doesn’t offer a standardized estimation
of the intensity or strength of the relationship between variables.
 Scale Dependency: Covariance is affected by the variables’ measurement scales, making
comparing covariances across various datasets or variables with distinct units challenging.
Advantages and Disadvantages of Correlation
The advantages and disadvantages of correlation are as follows:
Advantages
 Detecting Monotonic Non-Linear Relationships: While correlation primarily estimates linear
relationships, rank-based alternatives such as Spearman’s rank correlation coefficient can
also reveal monotonic non-linear connections.
 Standardized Criterion: Correlation coefficients, such as the Pearson correlation coefficient,
are standardized, varying from -1 to 1. This allows for easy comparison and interpretation of
the direction and strength of relationships across different datasets.
 Robustness to Outliers: Correlation coefficients are typically less sensitive to outliers than
covariance, providing a more robust measure of the association between variables.
 Scale Independence: Correlation is not affected by the measurement scales, making it
convenient for comparing relationships between variables with distinct units or scales.
Disadvantages
 Driven by Extreme Values: Extreme values can still affect the correlation coefficient, even
though it is less susceptible to outliers than Covariance.
 Data Requirements: The Pearson correlation assumes that the data follows a
bivariate normal distribution, which may not always hold in practice.
 Limited to Bivariate Analysis: Because correlation only examines the relationship between
two variables at a time, it cannot capture complex multivariate relationships.

Python code for Correlation & Covariance

Using Pandas:
import pandas as pd

# Sample data in a DataFrame
data = {
    'data1': [1, 2, 3, 4, 5],
    'data2': [5, 6, 7, 8, 9]
}

df = pd.DataFrame(data)

# Covariance calculation
covariance = df['data1'].cov(df['data2'])

# Correlation calculation
correlation = df['data1'].corr(df['data2'])

print("Covariance:", covariance)
print("Correlation:", correlation)
Output:
Covariance: 2.5
Correlation: 0.9999999999999999

R-code for Correlation & Covariance


# Sample data
data1 <- c(1, 2, 3, 4, 5)
data2 <- c(5, 6, 7, 8, 9)
# Calculate correlation coefficient
correlation <- cor(data1, data2)
print(paste("Correlation coefficient:", correlation))
# Compute covariance
covariance <- cov(data1, data2)
print(paste("Covariance:", covariance))

Output:
"Correlation coefficient: 1"
"Covariance: 2.5"

Learning Objectives:
To understand the correlation and covariance between the variables.

Conclusion/Learning outcome:
Correlation and covariance between variables were understood and implemented in
Python and R.

DOP | DOS | Conduction (R1): 5 Marks | File Record (R2): 5 Marks | Viva Voce (R3): 5 Marks | Total: 15 Marks | Signature
