ml2020 Pythonlab02

This document introduces linear regression and walks through fitting a model to a house price dataset: importing the necessary packages, downloading and exploring the data, splitting it into training and test sets, fitting a linear regression model to the training set, and printing the intercept and coefficients. It then sets an exercise: generate a random dataset, apply several linear regression methods to it (numpy.polyfit, scipy.stats.linregress, scipy.optimize.curve_fit, numpy.linalg.lstsq, statsmodels OLS, and sklearn.linear_model.LinearRegression), show the results on a graph, and find the fastest method.

Uploaded by

VINAY U PAI


Birla Institute of Technology and Science, Pilani

Department of Computer Science & Information Systems

BITS F464 - Machine Learning
I Semester 2020-21
28-Aug-20 Lab Sheet-02 – Linear Regression

Introduction

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent
variable y and one or more explanatory variables (or independent variables) denoted X. The case of one
explanatory variable is called simple linear regression. For more than one explanatory variable, the process
is called multiple linear regression.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in
other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations
regression), or by minimizing a penalized version of the least squares loss function as in ridge regression
(L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit
models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely
linked, they are not synonymous.
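As a minimal sketch of the difference between plain least squares and a penalized fit, the snippet below fits the same synthetic data with ordinary least squares and with ridge regression in closed form. The data, the "true" coefficients, and the penalty weight `lam` are assumptions chosen only for illustration; they are not part of the lab sheet.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
X = rng.normal(size=(50, 3))             # 50 samples, 3 explanatory variables
true_beta = np.array([2.0, -1.0, 0.5])   # hypothetical "true" coefficients
y = X @ true_beta + rng.normal(scale=0.1, size=50)

# Ordinary least squares: minimize ||y - X b||^2
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression: minimize ||y - X b||^2 + lam * ||b||^2 (closed-form solution)
lam = 10.0                               # assumed penalty weight
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

The ridge coefficients come out smaller in norm than the least squares ones, which is the shrinkage effect of the L2 penalty.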

Import packages and download the US house price dataset from

“https://www.kaggle.com/farhankarim1/usa-house-prices”

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

df = pd.read_csv("your dataset location")

df.head()

Check basic info on the data set

'info()' method to check the data types and the number of non-null entries in each column
df.info(verbose=True)

'describe()' method to get the statistical summary of the various features of the data set
df.describe(percentiles=[0.1,0.25,0.5,0.75,0.9])

'columns' attribute to get the names of the columns (features)

df.columns

Basic plotting and visualization on the data set

Pairplots using seaborn
sns.pairplot(df)
Distribution of price (the predicted quantity)
df['Price'].plot.hist(bins=25,figsize=(8,4))
df['Price'].plot.density()

Correlation matrix and heatmap

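df.corr(numeric_only=True) # numeric_only skips the string column Address (needed on recent pandas versions)
plt.figure(figsize=(10,7))
sns.heatmap(df.corr(numeric_only=True),annot=True,linewidths=2)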

Feature and variable sets

Make a list of data frame column names

l_column = list(df.columns) # Making a list out of column names
len_feature = len(l_column) # Length of column vector list
l_column

Put all the numerical features in X and Price in y. Ignore Address, which is a string column and cannot be used directly in linear regression.

X = df[l_column[0:len_feature-2]]
y = df[l_column[len_feature-2]]
print("Feature set size:",X.shape)
print("Variable set size:",y.shape)

X.head()
y.head()
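The positional slicing above depends on Price and Address being the last two columns. A sketch of a selection by data type and name, which does not depend on column order, might look like the following. The frame `df_demo` is a tiny synthetic stand-in, and every column name in it other than 'Price' and 'Address' is hypothetical.

```python
import pandas as pd

# Tiny synthetic frame; 'feature_a' and 'feature_b' are hypothetical column names
df_demo = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [10.0, 20.0, 30.0],
    "Price": [100.0, 200.0, 300.0],
    "Address": ["A St", "B St", "C St"],
})

# Keep only numeric columns, then separate the target from the features
numeric = df_demo.select_dtypes(include="number")
X_demo = numeric.drop(columns="Price")
y_demo = numeric["Price"]

print("Feature columns:", list(X_demo.columns))
print("Target column:", y_demo.name)
```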

Test-train split
Import train_test_split function from scikit-learn
from sklearn.model_selection import train_test_split # sklearn.cross_validation was removed in scikit-learn 0.20

Create X and y train and test splits in one command using a split ratio and a random seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

Check the size and shape of train/test splits (it should be in the ratio as per test_size parameter above)
print("Training feature set size:",X_train.shape)
print("Test feature set size:",X_test.shape)
print("Training variable set size:",y_train.shape)
print("Test variable set size:",y_test.shape)
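To see what the split ratio means in practice, here is a small sketch on synthetic arrays (100 rows, chosen only so the arithmetic is easy to follow). The names `X_demo`, `y_demo`, `X_tr`, and so on are illustrative and not part of the lab sheet.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(200).reshape(100, 2)  # 100 synthetic rows, 2 features
y_demo = np.arange(100)                  # one target value per row

# Same split ratio and seed as in the lab sheet
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=123
)

print("Train:", X_tr.shape, y_tr.shape)
print("Test: ", X_te.shape, y_te.shape)
```

With `test_size=0.3`, 30 of the 100 rows go to the test set and the remaining 70 to the training set.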

Model fit and training

Import linear regression model estimator from scikit-learn and instantiate

from sklearn.linear_model import LinearRegression

from sklearn import metrics

lm = LinearRegression() # Creating a Linear Regression object 'lm'

Fit the model on the training data, then check the intercept and coefficients and put them in a DataFrame
lm.fit(X_train,y_train) # fit() trains the 'lm' object in place, so there is no need to assign the result to another variable
print("The intercept term of the linear model:", lm.intercept_)
print("The coefficients of the linear model:", lm.coef_)

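idf = pd.DataFrame(data={"Coefficients": [lm.intercept_]}, index=["Intercept"]) # intercept as a one-row frame
cdf = pd.DataFrame(data=lm.coef_, index=X_train.columns, columns=["Coefficients"])
cdf = pd.concat([idf,cdf], axis=0) # intercept row first, then one row per feature
cdf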
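The sheet imports `metrics` from scikit-learn but does not use it. One plausible next step is to score the fitted model on the held-out test set. The sketch below does this on synthetic data so that it runs on its own; the data-generating line and the `_demo` names are assumptions, and on the real dataset the same three metric calls would take `y_test` and `lm.predict(X_test)`.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(123)
X_demo = rng.normal(size=(200, 2))       # synthetic features
# Hypothetical linear relationship plus a little noise
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=123)
lm_demo = LinearRegression().fit(X_tr, y_tr)
pred = lm_demo.predict(X_te)             # predictions on unseen rows

mae = metrics.mean_absolute_error(y_te, pred)
rmse = np.sqrt(metrics.mean_squared_error(y_te, pred))
r2 = metrics.r2_score(y_te, pred)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```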

Lab 02 Exercise (Submit the code in given time):

These are the various linear regression methods:
• numpy.polyfit (formerly also exposed as scipy.polyfit)
• scipy.stats.linregress
• scipy.optimize.curve_fit
• numpy.linalg.lstsq
• statsmodels OLS
• Analytic solution using the Moore-Penrose generalized inverse or a simple matrix inverse
• sklearn.linear_model.LinearRegression

Generate a random dataset of some points and apply all the above mentioned regression methods on this
dataset. Show the results on a graph. Also, find the most suitable method according to the execution time.
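As a starting point only, the sketch below generates a random dataset and times two of the listed methods (numpy.polyfit and the Moore-Penrose pseudo-inverse) with `time.perf_counter`. The true slope and intercept, the sample size, the repeat count, and the helper names `fit_polyfit`, `fit_pinv`, and `time_fit` are all assumptions made for illustration; the remaining methods and the plot are left to the exercise.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1000)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=1000)  # hypothetical true line plus noise

def fit_polyfit(x, y):
    """Degree-1 polynomial fit; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

def fit_pinv(x, y):
    """Analytic least squares solution via the Moore-Penrose pseudo-inverse."""
    A = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
    slope, intercept = np.linalg.pinv(A) @ y
    return slope, intercept

def time_fit(fit, repeats=100):
    """Return the last fit result and the mean wall-clock time per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fit(x, y)
    return result, (time.perf_counter() - start) / repeats

for name, fit in [("numpy.polyfit", fit_polyfit), ("pseudo-inverse", fit_pinv)]:
    (slope, intercept), seconds = time_fit(fit)
    print(f"{name:15s} slope={slope:.3f} intercept={intercept:.3f} time={seconds * 1e6:.1f} us")
```

Both methods solve the same least squares problem, so the fitted slope and intercept should agree; only the timings differ, and those depend on the machine and library versions.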
