8th MLlab

The document introduces linear regression using R, focusing on predicting the 'Evaporation Coefficient' based on 'Air Velocity' with 10 observations. It emphasizes the importance of visualizing relationships through scatter plots and checking for outliers using box plots. The goal is to establish a mathematical equation to predict the evaporation coefficient when air velocity is known.

Uploaded by

n44943916
Copyright
© © All Rights Reserved

8. Introduction to regression using R.

Velocity (cm/sec): 20, 60, 100, 140, 180, 220, 260, 300, 340, 380

Evaporation Coefficient (mm²/sec): 0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65

Introduction to Linear Regression

Linear regression is one of the most commonly used predictive modelling techniques. The aim
of linear regression is to find a mathematical equation for a continuous response variable Y
as a function of one or more predictor variables X, so that the model can be used to predict
Y when only X is known. For a single predictor it is expressed as Equation 1.

$$Y = \beta_1 + \beta_2 X + \epsilon \qquad (1)$$
where $\beta_1$ is the intercept, $\beta_2$ is the slope, and $\epsilon$ is the error term.
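
As a minimal sketch of how the intercept and slope in Equation 1 can be estimated, the block below applies the standard ordinary least squares formulas (slope = Sxy / Sxx, intercept = mean of Y minus slope times mean of X) to the ten observations listed above, then compares the result with R's built-in lm(). The names velocity, evap, Sxx, Sxy and fit are chosen here for illustration and are not part of the original lab sheet.

```r
# Data from the problem statement
velocity <- c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380)
evap     <- c(0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65)

# Sum of squares of X and sum of cross-products, both about the means
Sxx <- sum((velocity - mean(velocity))^2)
Sxy <- sum((velocity - mean(velocity)) * (evap - mean(evap)))

# Ordinary least squares estimates of the slope and intercept
slope     <- Sxy / Sxx
intercept <- mean(evap) - slope * mean(velocity)

# The same estimates obtained from lm(), for comparison
fit <- lm(evap ~ velocity)
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

On these data the slope should come out at roughly 0.0038 and the intercept at roughly 0.069, and the two printed vectors should agree.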

Problem Specification

In the given problem, 'Air velocity' and 'Evaporation Coefficient' are the variables, with 10
observations.
The goal here is to establish a mathematical equation for 'Evaporation Coefficient' as a function of
'Air velocity', so that it can be used to predict 'Evaporation Coefficient' when only the 'Air velocity'
is known. It is therefore desirable to build a linear regression model with 'Evaporation Coefficient'
as the response variable and 'Air velocity' as the predictor. Before building the
regression model, it is good practice to analyse and understand the variables.

> airvelocity <- c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380)
> evaporationcoefficient <- c(0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65)
> airvelocity
[1] 20 60 100 140 180 220 260 300 340 380
> evaporationcoefficient
[1] 0.18 0.37 0.35 0.78 0.56 0.75 1.18 1.36 1.17 1.65
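
One possible way to "analyse and understand the variables" before modelling is to collect the two vectors in a data frame and look at their summary statistics and correlation. This is a sketch only: the data frame name evap_data and the variable r are assumptions introduced here, not names used in the lab sheet.

```r
# The two vectors as entered above
airvelocity <- c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380)
evaporationcoefficient <- c(0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65)

# Hypothetical data frame holding both variables side by side
evap_data <- data.frame(airvelocity, evaporationcoefficient)

# Structure and five-number summaries (plus mean) of each variable
str(evap_data)
print(summary(evap_data))

# Pearson correlation between predictor and response
r <- cor(evap_data$airvelocity, evap_data$evaporationcoefficient)
print(r)
```

A correlation close to 1 would be consistent with a strong positive linear relationship between the two variables.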
Graphical analysis
The aim of this exercise is to build a simple regression model that can be used to predict
'Evaporation Coefficient'. But before jumping into the syntax, let's try to understand these
variables graphically.

Typically, for each of the predictors, the following plots help visualize the patterns:
Using Scatter Plot to Visualize the Relationship
Scatter plots can help visualize linear relationships between the response and predictor
variables. Ideally, if you have many predictor variables, a scatter plot is drawn for each one
of them against the response, along with the line of best fit as seen below.

> scatter.smooth(airvelocity, evaporationcoefficient, main="Airvelocity ~ Evaporation Coefficient")

[Figure: scatter plot titled "Airvelocity ~ Evaporation Coefficient", with airvelocity on the x-axis, evaporationcoefficient on the y-axis, and a smoothing line]

The scatter plot, together with the smoothing line, suggests a linear and positive
relationship between 'Air Velocity' and 'Evaporation Coefficient'.

This is a good thing, because one of the underlying assumptions of linear regression is
that the relationship between the response and predictor variables is linear and additive.
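
Note that scatter.smooth() draws a loess smoother rather than a straight line. If a straight least squares line is wanted on the scatter plot instead, one possible sketch is to use plot() followed by abline() on a fitted lm() object. The object name fit, the axis labels and the line colour are illustrative choices, not taken from the lab sheet.

```r
# The two vectors as entered earlier
airvelocity <- c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380)
evaporationcoefficient <- c(0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65)

# Fit a simple linear regression of the response on the predictor
fit <- lm(evaporationcoefficient ~ airvelocity)

# Scatter plot of the raw observations
plot(airvelocity, evaporationcoefficient,
     main = "Airvelocity ~ Evaporation Coefficient",
     xlab = "Air velocity (cm/sec)",
     ylab = "Evaporation coefficient (mm2/sec)")

# Overlay the fitted straight line (intercept and slope taken from fit)
abline(fit, col = "red", lwd = 2)
```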

Using BoxPlot to Check for Outliers


Generally, an outlier is any data point that lies more than 1.5 times the interquartile range
(IQR) below the 25th percentile or above the 75th percentile. The IQR is the distance between
the 25th and 75th percentile values for that variable (1).
# Set up the plotting area for 1 row and 2 columns
par(mfrow = c(1, 2))

# Box plot for airvelocity; the subtitle lists the row numbers of any outliers
boxplot(airvelocity, main = "Airvelocity",
        sub = paste("Outlier rows: ",
                    paste(which(airvelocity %in% boxplot.stats(airvelocity)$out), collapse = ", ")))

# Box plot for evaporationcoefficient; the subtitle lists the row numbers of any outliers
boxplot(evaporationcoefficient, main = "Evaporation Coefficient",
        sub = paste("Outlier rows: ",
                    paste(which(evaporationcoefficient %in% boxplot.stats(evaporationcoefficient)$out), collapse = ", ")))

[Figure: side-by-side box plots of airvelocity and evaporationcoefficient. Both subtitles read "Outlier rows:" with no row numbers listed.]
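
The 1.5 * IQR rule can also be checked by hand. The sketch below computes the quartiles with quantile() and flags any rows beyond the fences. The helper iqr_outlier_rows() is a hypothetical function written for this illustration. Note that boxplot.stats() derives its quartiles from fivenum() hinges, so its fences can differ slightly from those of quantile() on small samples.

```r
# The two vectors as entered earlier
airvelocity <- c(20, 60, 100, 140, 180, 220, 260, 300, 340, 380)
evaporationcoefficient <- c(0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65)

# Hypothetical helper: indices of values lying more than k * IQR
# below the 25th percentile or above the 75th percentile
iqr_outlier_rows <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]                      # interquartile range
  which(x < q[1] - k * spread | x > q[2] + k * spread)
}

print(iqr_outlier_rows(airvelocity))
print(iqr_outlier_rows(evaporationcoefficient))
```

For both variables the helper should return an empty index vector, which agrees with the empty "Outlier rows:" subtitles in the box plots.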
