
Introduction to Correlation and Regression Analysis (1) (3)

The document is a project report on statistical analysis focusing on correlation and regression analysis, submitted by students of class 12. It includes methodologies, calculations, and interpretations of data collected from a second terminal examination of class 10 students in math and science. Key findings include a correlation coefficient of 0.64 and regression coefficients indicating relationships between the two subjects.

Uploaded by

sita03476
Copyright
© All Rights Reserved

STATISTICAL ANALYSIS

A project report submitted in partial fulfillment of the
requirement in mathematics of class 12
2025
Submitted by
SUSTHIR
LATIKA RAWAL
SHRESTHA
Namrata Baskota
Class 12

Submitted to
Krishna prasad Aryal
Department of Mathematics
Daisy English Boarding Secondary School
Page 1 of 14
Table of contents

Content                     Page No

➢ Introduction              3
➢ Methodology               9
➢ Interpretation of data    10
➢ Calculation               11
➢ Conclusion                14
➢ Reference                 14

Page 2 of 14
Introduction to Correlation and Regression Analysis
In this section we will first discuss correlation analysis, which is used to
quantify the association between two continuous variables (e.g.,
between an independent and a dependent variable or between two
independent variables). Regression analysis is a related technique to
assess the relationship between an outcome variable and one or more
risk factors or confounding variables. The outcome variable is also called
the response or dependent variable and the risk factors and
confounders are called the predictors, or explanatory or independent
variables. In regression analysis, the dependent variable is denoted "y"
and the independent variables are denoted by "x".
Correlation Analysis
In correlation analysis, we estimate a sample correlation coefficient, more
specifically the Karl Pearson product-moment correlation coefficient.
The sample correlation coefficient, denoted r, ranges between -1 and +1 and
quantifies the direction and strength of the linear association between the two
variables.
The correlation between two variables can be positive (i.e., higher levels of one
variable are associated with higher levels of the other) or negative (i.e., higher
levels of one variable are associated with lower levels of the other).
The sign of the correlation coefficient indicates the direction of the association.
The magnitude of the correlation coefficient indicates the strength of the
association.
For example, a correlation of r = 0.9 suggests a strong, positive association
between two variables, whereas a correlation of r = -0.2 suggests a weak, negative
association. A correlation close to zero suggests no linear association between two
continuous variables.

Page 3 of 14
Fig no. 1    Fig no. 2

Fig no. 3    Fig no. 4

→ Fig no. 1 depicts a strong positive association (r = 0.9), similar to what
we might see for the correlation between infant birth weight and birth
length.
→ Fig no. 2 might depict the strong negative association (r = -0.9)
generally observed between the number of hours of aerobic exercise
per week and percent body fat.
→ Fig no. 3 depicts a weaker association (r = 0.2) that we might expect to
see between age and body mass index (which tends to increase with
age).
→ Fig no. 4 might depict the lack of association (r approximately 0)
between the extent of media exposure in adolescence and the age at which
adolescents initiate sexual activity.

Page 4 of 14
Methods for the Determination of Correlation:
Three methods are commonly used to determine the correlation:
1. Scatter Plot Diagram
2. Karl Pearson Coefficient of Correlation
3. Spearman's Rank Correlation Coefficient
Of these, the Karl Pearson coefficient of correlation and Spearman's
rank correlation coefficient are used most often.
Karl Pearson Coefficient of Correlation:

The Karl Pearson coefficient of correlation, also known as Pearson's
correlation coefficient or simply Pearson's r, is a measure of the linear
relationship between two variables. It assesses the strength and
direction of the relationship between two quantitative variables. The
coefficient is denoted by r.
Correlation Coefficient Formula

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

where,
n = Number of values or elements
∑X = Sum of 1st values list
∑Y = Sum of 2nd values list
∑XY = Sum of the product of 1st and 2nd values
∑X² = Sum of squares of 1st values
∑Y² = Sum of squares of 2nd values
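The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the sample values are assumptions made for the example and are not part of the report.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums that appear in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up, perfectly linear data: every y is exactly twice x.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

The function fails with a division by zero if all values of one variable are equal, because the denominator is then zero and r is undefined.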

Page 5 of 14
Properties of Karl Pearson Coefficient of Correlation:
Karl Pearson's coefficient of correlation, commonly referred to as Pearson's
correlation coefficient or Simply Pearson's (r), possesses several important
properties:
1. Range: Pearson's (r) always falls between -1 and 1, inclusive. A value of -1
indicates a perfect negative linear relationship, 0 indicates no linear relationship,
and 1 indicates a perfect positive linear relationship.
2. Linearity: Pearson's (r) measures the strength of a linear relationship between
two variables. It assumes a linear relationship between the variables; therefore, it
may not accurately represent nonlinear relationships.
3. Symmetry: Pearson's (r) is symmetric, meaning that the correlation between
variable (x ) and ( y ) is the same as the correlation between variable ( y ) and ( x ).
4. Not affected by scale: Pearson's (r) is not affected by changes in the origin or scale of
measurement of the variables. This means that multiplying all the values of one
variable by a positive constant or adding a constant to all values does not change the
correlation coefficient.
5. Sensitive to outliers: Pearson's (r) can be influenced by outliers in the data.
Outliers can disproportionately affect the correlation coefficient, potentially
leading to misleading interpretations.
6. Affected by range: The correlation coefficient may be affected by the range of
values in the dataset. Limited variability in the data can result in an
underestimation of the true correlation.
7. Sample dependence: The sample size
influences the reliability of Pearson's (r). Generally, larger sample sizes provide
more accurate estimates of the population correlation.
8. Does not imply causation: A high correlation coefficient does not necessarily
imply a causal relationship between the variables. Correlation only measures the
strength and direction of association, not causation.
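The symmetry and scale properties can be checked numerically. The sketch below uses made-up values and a hypothetical helper `pearson_r`, so it is an illustration rather than part of the report's method. It also shows that multiplying one variable by a negative constant flips the sign of r while leaving its magnitude unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums (same formula as in the report)."""
    n = len(xs)
    num = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    den = math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2)
                    * (n * sum(y * y for y in ys) - sum(ys) ** 2))
    return num / den

x = [2, 4, 5, 7, 9]   # made-up values
y = [1, 3, 2, 6, 8]   # made-up values

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                            # symmetry: same value
r_scaled = pearson_r([3 * v + 10 for v in x], y)  # new origin and scale: same value
r_flipped = pearson_r([-v for v in x], y)         # negative multiplier: sign flips

print(r_xy, r_yx, r_scaled, r_flipped)
```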

Page 6 of 14
Regression Analysis
Regression analysis is a method of measuring the average relationship between a set of
variables called cause (independent) variables and the effect (dependent) variable. Correlation
measures the direction and strength of the association (positive, negative or zero),
whereas regression also describes how much the dependent variable changes, on
average, for a unit change in the independent variable.
Regression is one of the most widely used data analysis techniques among researchers and
academicians today. A regression equation expresses the linear relationship between
two variables.

Formula for regression equations and coefficients


Any regression line passes through the point of means (X̄, Ȳ). The regression
equation of Y on X can therefore be expressed as,

$$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$$

Where,

$$b_{YX} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

is the regression coefficient of Y on X.

Similarly, the regression equation of X on Y is,

$$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$$

Where,

$$b_{XY} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2}$$

is the regression coefficient of X on Y.
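As a rough illustration, the two regression coefficients can be computed from the same sums. The function name `regression_coefficients` and the sample data are assumptions made for this sketch.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): the regression coefficients of Y on X and of X on Y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y          # shared by both coefficients
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy

# Made-up data lying exactly on the line y = 2x + 1.
b_yx, b_xy = regression_coefficients([1, 2, 3, 4], [3, 5, 7, 9])
print(b_yx, b_xy)  # 2.0 0.5
```

For data that lie exactly on a line, the two coefficients are reciprocals of each other, so their product is 1.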

Properties of linear regression


➢ The two regression lines always intersect at the point of means (X̄, Ȳ).
➢ The regression coefficients have the same sign and are independent of the origin but
not of scale.
➢ If one of the regression coefficients is greater than unity, the other must be
less than 1. The product of the two regression coefficients is less than or equal to 1.
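A short derivation, using only the formulas for the two regression coefficients and for r given earlier in this report, shows why the product of the two regression coefficients cannot exceed 1:

```latex
b_{YX}\, b_{XY}
  = \frac{\left[n\sum XY - (\sum X)(\sum Y)\right]^2}
         {\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}
  = r^2 \le 1
```

Both coefficients share the numerator n∑XY − (∑X)(∑Y) and have positive denominators, which is also why they always carry the same sign.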

Page 7 of 14
Spearman's Rank Correlation Coefficient
Rank correlation is the degree of association between two variables when the
data are arranged in order or ranks. Data that are measured quantitatively
are analysed with Karl Pearson's correlation coefficient, and data that cannot be measured
quantitatively are analysed by assigning ranks. It is generally denoted by R.
There are 3 cases while calculating R. They are
1) when ranks are given
2) when ranks are not given
3) repeated ranks
The formula for calculating Spearman's rank correlation coefficient for case I and
case II is given by

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where,
n = Number of items ranked
d = difference between paired ranks (R₁ − R₂)
R₁ = the rank of items with respect to the first variable
R₂ = the rank of items with respect to the second variable

And, for case III,

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\, m_1(m_1^2 - 1) + \frac{1}{12}\, m_2(m_2^2 - 1)\right]}{n(n^2 - 1)}$$

where,

m₁, m₂, … are the numbers of times that an item is repeated
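The ranking step and the repeated-ranks formula can be sketched as follows. The helper names `average_ranks` and `spearman_r` are hypothetical, and the sketch assumes that rank 1 goes to the highest value and that tied values share the mean of the ranks they occupy.

```python
from collections import Counter

def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_r(xs, ys):
    """Spearman's R with the m(m^2 - 1)/12 correction for repeated values."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # One correction term for every group of m repeated values in either variable.
    tie_sizes = [m for data in (xs, ys) for m in Counter(data).values() if m > 1]
    correction = sum(m * (m * m - 1) / 12 for m in tie_sizes)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

print(average_ranks([50, 60, 50]))                   # [2.5, 1.0, 2.5]
print(spearman_r([10, 20, 30, 40], [1, 2, 3, 4]))    # 1.0
```

When no value is repeated, the correction is zero and the function reduces to the formula for case I and case II.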

Page 8 of 14
Methodology
For the statistical analysis, the data of the second terminal examination of class 10 were
taken from SHANGRILA ENGLISH BOARDING SCHOOL, Rapti -6. The aim of the observation is
to find out the better and more stable performance of students in two different subjects
(i.e. maths and science). About 28 students' data was collected. The Full Mark and
Pass Mark are 75 and 27 respectively. The marks data are given below.

S.N  Name of students     Maths marks  Science marks

1    Ishika Suyal         75           70
2    Suman Gurung         56           71
3    Kul Chandra Silwal   50           46
4    Aavash Tamang        60           37
5    Biraj Mahato         48           35
6    Abhinam Ranamagar    53           27
7    Diwash Thapaliya     27           29
8    Yurisha Mainali      38           28
9    Alok Chaudhary       55           30
10   Krishesh Shrestha    50           36
11   Sahara lama          39           31
12   Aayush Chaudhary     45           17*
13   Rojish Thapa         30           27
14   Sandip Shrestha      49           32

Page 9 of 14
CALCULATION:
The above data are the theory marks of students in two subjects (maths
and science). Let the marks of maths be X and science be Y. The
data are tabulated below,

S.N  X    Y    X²     Y²     XY     R₁    R₂    d = R₁ − R₂  d²
1    75   70   5625   4900   5250   1     2     -1           1
2    56   71   3136   5041   3976   3     1     2            4
3    50   46   2500   2116   2300   6.5   3     3.5          12.25
4    60   37   3600   1369   2220   2     4     -2           4
5    48   35   2304   1225   1680   9     6     3            9
6    53   27   2809   729    1431   5     12.5  -7.5         56.25
7    27   29   729    841    783    14    10    4            16
8    38   28   1444   784    1064   12    11    1            1
9    55   30   3025   900    1650   4     9     -5           25
10   50   36   2500   1296   1800   6.5   5     1.5          2.25
11   39   31   1521   961    1209   11    8     3            9
12   45   17   2025   289    765    10    14    -4           16
13   30   27   900    729    810    13    12.5  0.5          0.25
14   49   32   2401   1024   1568   8     7     1            1

∑X = 675   ∑Y = 516   ∑X² = 34519   ∑Y² = 22204   ∑XY = 26506   ∑d² = 157
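The column totals can be re-derived from the raw marks as a quick check. This sketch only repeats the arithmetic of the table; the variable names are chosen for the example.

```python
# Marks from the data table (X = maths, Y = science), in S.N order.
maths = [75, 56, 50, 60, 48, 53, 27, 38, 55, 50, 39, 45, 30, 49]
science = [70, 71, 46, 37, 35, 27, 29, 28, 30, 36, 31, 17, 27, 32]

n = len(maths)                                     # 14
sum_x = sum(maths)                                 # 675
sum_y = sum(science)                               # 516
sum_x2 = sum(x * x for x in maths)                 # 34519
sum_y2 = sum(y * y for y in science)               # 22204
sum_xy = sum(x * y for x, y in zip(maths, science))  # 26506

print(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
```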

Page 10 of 14
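At first, for the coefficient of correlation;
n = 14
Now,
Correlation Coefficient Formula

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$

$$r = \frac{14 \times 26506 - (675 \times 516)}{\sqrt{14 \times 34519 - (675)^2}\,\sqrt{14 \times 22204 - (516)^2}} = \frac{22784}{\sqrt{27641}\,\sqrt{44600}}$$

r ≈ 0.64 (more precisely, 0.649)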
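The substitution can be reproduced step by step from the table totals. The sketch below assumes n = 14, the number of rows in the table, and evaluates to roughly 0.649, which the report states as 0.64.

```python
import math

# Totals taken from the calculation table.
n = 14
sum_x, sum_y = 675, 516
sum_x2, sum_y2, sum_xy = 34519, 22204, 26506

numerator = n * sum_xy - sum_x * sum_y   # 22784
x_term = n * sum_x2 - sum_x ** 2         # 27641
y_term = n * sum_y2 - sum_y ** 2         # 44600

r = numerator / math.sqrt(x_term * y_term)
print(round(r, 3))  # 0.649
```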

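Again, utilizing the same data for regression,

For the regression coefficients,

$$b_{XY} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2}$$

$$b_{XY} = \frac{14 \times 26506 - (675 \times 516)}{14 \times 22204 - (516)^2}$$

$b_{XY} = 0.51$

Now,

$$b_{YX} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

$$b_{YX} = \frac{14 \times 26506 - (675 \times 516)}{14 \times 34519 - (675)^2}$$

∴ $b_{YX} = 0.82$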
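The two coefficients can be checked the same way from the table totals. The sketch also confirms that their product equals r², as expected from the formulas; the variable names are chosen for the example.

```python
# Totals taken from the calculation table.
n = 14
sum_x, sum_y = 675, 516
sum_x2, sum_y2, sum_xy = 34519, 22204, 26506

numerator = n * sum_xy - sum_x * sum_y           # 22784
b_xy = numerator / (n * sum_y2 - sum_y ** 2)     # X on Y: 22784 / 44600
b_yx = numerator / (n * sum_x2 - sum_x ** 2)     # Y on X: 22784 / 27641

# r squared, computed directly from the correlation formula.
r_squared = numerator ** 2 / ((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(round(b_xy, 2), round(b_yx, 2))  # 0.51 0.82
```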

Page 11 of 14
For the regression equations,

$$\text{Mean of } X = \bar{X} = \frac{\sum X}{n} = \frac{675}{14} = 48.21$$

$$\text{Mean of } Y = \bar{Y} = \frac{\sum Y}{n} = \frac{516}{14} = 36.85$$

Now, the regression equation of X on Y is

$$X - \bar{X} = b_{XY}\,(Y - \bar{Y})$$

or, X − 48.21 = 0.51(Y − 36.85)
X = 0.51Y + 29.41 is the required regression equation of X on Y

Again, the regression equation of Y on X is

$$Y - \bar{Y} = b_{YX}\,(X - \bar{X})$$

or, Y − 36.85 = 0.82(X − 48.21)
Y = 0.82X − 2.68 is the required regression equation of Y on X
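The rearrangement into slope and intercept form can be reproduced with the report's own two-decimal values. The helper names `predict_maths` and `predict_science` are hypothetical, and using the lines for prediction is an illustration rather than something the report does.

```python
# Values as quoted in the report (two decimal places).
mean_x, mean_y = 48.21, 36.85
b_xy, b_yx = 0.51, 0.82

# X - mean_x = b_xy (Y - mean_y)  rearranges to  X = b_xy * Y + intercept.
intercept_x_on_y = mean_x - b_xy * mean_y   # 29.4165, quoted as 29.41
# Y - mean_y = b_yx (X - mean_x)  rearranges to  Y = b_yx * X + intercept.
intercept_y_on_x = mean_y - b_yx * mean_x   # -2.6822, quoted as -2.68

def predict_maths(science_mark):
    """Estimate a maths mark from a science mark using the X on Y line."""
    return b_xy * science_mark + intercept_x_on_y

def predict_science(maths_mark):
    """Estimate a science mark from a maths mark using the Y on X line."""
    return b_yx * maths_mark + intercept_y_on_x

print(predict_maths(40), predict_science(50))
```

Each line returns the mean of one subject when given the mean of the other, which matches the property that both regression lines pass through the point of means.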

Page 12 of 14
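For Spearman's rank correlation coefficient
∑d² = 157
m₁ = 2 (the maths mark 50 occurs twice)
m₂ = 2 (the science mark 27 occurs twice)
According to Spearman's rank correlation coefficient formula for repeated ranks,

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}\, m_1(m_1^2 - 1) + \frac{1}{12}\, m_2(m_2^2 - 1)\right]}{n(n^2 - 1)}$$

$$\text{Or, } R = 1 - \frac{6\left[157 + \frac{1}{12} \cdot 2(2^2 - 1) + \frac{1}{12} \cdot 2(2^2 - 1)\right]}{14(14^2 - 1)} = 1 - \frac{6 \times 158}{2730}$$

R ≈ 0.65
Therefore, the Spearman's rank correlation coefficient of the given
data is approximately 0.65.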
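The rank arithmetic can be re-run from the R₁ and R₂ columns of the calculation table. With these inputs the expression evaluates to roughly 0.653. The variable names are chosen for the example.

```python
# Rank columns from the calculation table, in S.N order.
rank_maths = [1, 3, 6.5, 2, 9, 5, 14, 12, 4, 6.5, 11, 10, 13, 8]
rank_science = [2, 1, 3, 4, 6, 12.5, 10, 11, 9, 5, 8, 14, 12.5, 7]

n = len(rank_maths)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_maths, rank_science))  # 157.0

# One repeated pair in each subject: m1 = m2 = 2.
m1, m2 = 2, 2
correction = m1 * (m1 ** 2 - 1) / 12 + m2 * (m2 ** 2 - 1) / 12        # 0.5 + 0.5

R = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
print(round(R, 3))  # 0.653
```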

Page 13 of 14
Conclusion
From the above calculation, the observed value of the correlation between
the two variables X (maths) and Y (science) is 0.64, which is moderately close
to 1, so we conclude that the association is moderately strong and positive.
Hence, it indicates that the marks in mathematics tend to increase when the
marks in science increase.
A consistent result was obtained from the regression analysis and
Spearman's rank correlation coefficient.

Reference
I noted the definitions and formulas used in my project from the
class 11 textbook and websites.
✓ Foundation of MATHEMATICS class 11
✓ www.Wikipedia.com
✓ www.freedictionary.com
Some sentences are also from our subject teacher (Krishna
prasad Aryal).

signature
Krishna prasad Aryal
Subject teacher

Page 14 of 14
