hemraj_python_ass1

The document outlines assignments for building linear and logistic regression models using various datasets, including sales, real estate, user demographics, fish species, and iris flowers. It provides step-by-step programming instructions using Python libraries such as pandas, numpy, and scikit-learn for data manipulation and model training. Each assignment includes dataset creation, data splitting, model training, prediction, and evaluation of model accuracy.

Uploaded by

hemrajbhongale8
Copyright
© All Rights Reserved

Assignment 1: Linear and Logistic Regression

SET A
1) Create a 'sales' data set having 5 columns, namely ID, TV, Radio, Newspaper
and Sales (500 random entries). Identify the independent and target variables,
split them into training and testing sets in a 7:3 ratio, and print the sets.
Then build a simple linear regression model.
Program:-
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Step 1: Create the sales dataset


np.random.seed(42)
ID = np.arange(1, 501)
TV = np.random.uniform(0, 100, 500)
Radio = np.random.uniform(0, 50, 500)
Newspaper = np.random.uniform(0, 30, 500)
Sales = 3 + 0.05 * TV + 0.1 * Radio + 0.02 * Newspaper + np.random.normal(0, 5, 500)

sales_data = pd.DataFrame({
'ID': ID,
'TV': TV,
'Radio': Radio,
'Newspaper': Newspaper,
'Sales': Sales
})

# Step 2: Split the data into independent (X) and target (y) variables
X = sales_data[['TV', 'Radio', 'Newspaper']]
y = sales_data['Sales']

# Step 3: Split the dataset into training and testing sets (7:3 ratio)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Print the split data


print("Training set (X_train):")
print(X_train.head())
print("Testing set (X_test):")
print(X_test.head())
# Step 5: Build the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Make predictions


y_pred = model.predict(X_test)

# Print the coefficients


print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Step 7: Plot the results


plt.scatter(y_test, y_pred)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Linear Regression: Actual vs Predicted Sales")
plt.show()
Output:-
Example output for training set:

Training set (X_train):


TV Radio Newspaper
374 4.537760 25.451522 9.047601
28 70.243315 25.989796 22.231161
456 80.651719 44.563722 12.669033
209 60.330544 16.218829 26.485149
431 96.945695 27.497699 18.827547
Example output for testing set:

Testing set (X_test):


TV Radio Newspaper
80 8.139962 43.664348 3.476083
125 45.285008 15.660353 28.916305
225 65.058937 27.791765 3.798982
282 72.334036 48.151510 12.084336
305 55.535741 37.179261 9.443671
Coefficients and Intercept: After training the model, you will see the model's coefficients
and intercept printed, showing the relationship between the independent variables and the
target (sales).

Example output:

Coefficients: [0.05023864 0.09843639 0.02031991]


Intercept: 3.0009676921841325
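The task asks for a 7:3 split, and the coefficients alone do not show how well the model fits. The following sketch regenerates synthetic data with the same formula as the program above, checks the sizes of the two sets, and reports the test-set R² through `LinearRegression.score`. It is an illustrative addition rather than part of the original submission, and the variable names `n_rows` and `test_r2` are chosen here for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Regenerate synthetic sales data with the same formula as the program above.
np.random.seed(42)
n_rows = 500
sales_data = pd.DataFrame({
    "ID": np.arange(1, n_rows + 1),
    "TV": np.random.uniform(0, 100, n_rows),
    "Radio": np.random.uniform(0, 50, n_rows),
    "Newspaper": np.random.uniform(0, 30, n_rows),
})
sales_data["Sales"] = (
    3
    + 0.05 * sales_data["TV"]
    + 0.1 * sales_data["Radio"]
    + 0.02 * sales_data["Newspaper"]
    + np.random.normal(0, 5, n_rows)
)

X = sales_data[["TV", "Radio", "Newspaper"]]
y = sales_data["Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A 7:3 split of 500 rows should give 350 training rows and 150 testing rows.
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))

model = LinearRegression().fit(X_train, y_train)

# score() returns the coefficient of determination (R^2) on the given data.
test_r2 = model.score(X_test, y_test)
print("Test R^2:", round(test_r2, 3))
```

Because the noise term (standard deviation 5) is large compared with the contribution of the three predictors, the R² on this data is expected to be well below 1.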
2) Create 'realestate' Data set having 4 columns namely: ID, flat, houses and
purchases (random 500 entries). Build a linear regression model by
identifying independent and target variable. Split the variables into training
and testing sets and print them. Build a simple linear regression model for
predicting purchases.
Program:-
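import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 1: Create the real estate dataset
ID = np.arange(1, 501)
flat = np.random.uniform(50, 200, 500)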
houses = np.random.uniform(1, 10, 500)
purchases = 200 + 1.5 * flat + 3 * houses + np.random.normal(0, 50, 500)

realestate_data = pd.DataFrame({
'ID': ID,
'flat': flat,
'houses': houses,
'purchases': purchases
})

# Step 2: Split the data into independent (X) and target (y) variables
X_realestate = realestate_data[['flat', 'houses']]
y_realestate = realestate_data['purchases']

# Step 3: Split the dataset into training and testing sets


X_train_realestate, X_test_realestate, y_train_realestate, y_test_realestate = train_test_split(
    X_realestate, y_realestate, test_size=0.3, random_state=42)

# Step 4: Print the split data


print("Training set (X_train_realestate):")
print(X_train_realestate.head())
print("Testing set (X_test_realestate):")
print(X_test_realestate.head())

# Step 5: Build the linear regression model


model_realestate = LinearRegression()
model_realestate.fit(X_train_realestate, y_train_realestate)

# Step 6: Make predictions


y_pred_realestate = model_realestate.predict(X_test_realestate)

# Print the coefficients


print("Coefficients:", model_realestate.coef_)
print("Intercept:", model_realestate.intercept_)

# Step 7: Plot the results


plt.scatter(y_test_realestate, y_pred_realestate)
plt.xlabel("Actual Purchases")
plt.ylabel("Predicted Purchases")
plt.title("Linear Regression: Actual vs Predicted Purchases")
plt.show()
Output:-
Example structure of the dataset:

ID flat houses purchases
1 150.5 5.2 853.0
2 130.0 3.1 725.5
3 178.9 8.7 935.8
4 124.3 4.5 688.2
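Once fitted, the model can be used to predict purchases for flat/houses values it has not seen. The following is a minimal sketch under stated assumptions: it rebuilds synthetic data with the same formula as the program above (using a local `default_rng` generator instead of the global seed), and the two rows in `new_listings` are made-up inputs for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: a local random generator, so the example does not depend on global state.
rng = np.random.default_rng(42)
n_rows = 500
flat = rng.uniform(50, 200, n_rows)
houses = rng.uniform(1, 10, n_rows)
purchases = 200 + 1.5 * flat + 3 * houses + rng.normal(0, 50, n_rows)

realestate_data = pd.DataFrame({
    "ID": np.arange(1, n_rows + 1),
    "flat": flat,
    "houses": houses,
    "purchases": purchases,
})

X = realestate_data[["flat", "houses"]]
y = realestate_data["purchases"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Hypothetical new observations; the column names must match the training features.
new_listings = pd.DataFrame({"flat": [100.0, 180.0], "houses": [2.0, 8.0]})
predicted = model.predict(new_listings)

for (_, row), value in zip(new_listings.iterrows(), predicted):
    print(f"flat={row['flat']}, houses={row['houses']} -> predicted purchases={value:.1f}")
```

Passing a DataFrame with the same column names as the training data keeps the feature order explicit.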

3) Create 'User' Data set having 5 columns namely: User ID, Gender, Age,
EstimatedSalary and Purchased. Build a logistic regression model that can
predict whether on the given parameter a person will buy a car or not.
Program:-

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Step 1: Create the User dataset


user_id = np.arange(1, 501)
gender = np.random.choice(['Male', 'Female'], 500)
age = np.random.randint(18, 70, 500)
estimated_salary = np.random.uniform(15000, 120000, 500)
purchased = np.random.choice([0, 1], 500)

user_data = pd.DataFrame({
'User ID': user_id,
'Gender': gender,
'Age': age,
'EstimatedSalary': estimated_salary,
'Purchased': purchased
})

# Step 2: Encode categorical 'Gender' feature


le = LabelEncoder()
user_data['Gender'] = le.fit_transform(user_data['Gender'])

# Step 3: Split the data into independent (X) and target (y) variables
X_user = user_data[['Age', 'EstimatedSalary', 'Gender']]
y_user = user_data['Purchased']

# Step 4: Split the dataset into training and testing sets


X_train_user, X_test_user, y_train_user, y_test_user = train_test_split(X_user, y_user,
test_size=0.3, random_state=42)

# Step 5: Build the logistic regression model


log_reg_model = LogisticRegression()
log_reg_model.fit(X_train_user, y_train_user)

# Step 6: Make predictions


y_pred_user = log_reg_model.predict(X_test_user)

# Step 7: Print accuracy


accuracy = accuracy_score(y_test_user, y_pred_user)
print("Accuracy of the Logistic Regression Model:", accuracy)

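Output:-
Accuracy of the Logistic Regression Model: 0.89
Note: 'Purchased' is generated at random, independently of Age, EstimatedSalary and Gender, so an actual run of this program should report an accuracy close to 0.5 rather than 0.89.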
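Because the program above draws `Purchased` at random, the features carry no information about the label. The sketch below shows one way to make the exercise meaningful: labels are generated from a made-up rule (older, higher-paid users buy more often) and the features are standardized before fitting. The rule, its coefficients, and the 0/1 gender encoding are assumptions for illustration, not part of the original assignment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_users = 500
user_data = pd.DataFrame({
    "Age": rng.integers(18, 70, n_users),
    "EstimatedSalary": rng.uniform(15000, 120000, n_users),
    "Gender": rng.integers(0, 2, n_users),  # assumed encoding: 0 = Female, 1 = Male
})

# Hypothetical rule: purchase probability rises with age and salary.
logit = (
    0.15 * (user_data["Age"] - 44)
    + 0.00006 * (user_data["EstimatedSalary"] - 67500)
)
purchase_probability = 1 / (1 + np.exp(-logit))
user_data["Purchased"] = rng.binomial(1, purchase_probability)

X = user_data[["Age", "EstimatedSalary", "Gender"]]
y = user_data["Purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Standardizing puts Age and EstimatedSalary on comparable scales before fitting.
classifier = make_pipeline(StandardScaler(), LogisticRegression())
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])

print("Accuracy:", round(accuracy, 3))
print("Confusion matrix (rows = actual, columns = predicted):")
print(matrix)
```

The confusion matrix complements the single accuracy figure by showing how many buyers and non-buyers were classified correctly.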

SET B

1) Build a simple linear regression model for Fish Species Weight Prediction.
(Download the dataset from https://www.kaggle.com/aungpyaeap/fish-market?select=Fish.csv)
Program:-

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Step 1: Load the fish dataset


fish_data = pd.read_csv('Fish.csv')

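# Step 2: Split the data into independent (X) and target (y) variables
# Fish.csv stores three length measurements as Length1, Length2 and Length3;
# it has no single 'Length' column.
X_fish = fish_data[['Length1', 'Length2', 'Length3', 'Height', 'Width']]
y_fish = fish_data['Weight']

# Step 3: Split the dataset into training and testing sets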
X_train_fish, X_test_fish, y_train_fish, y_test_fish = train_test_split(X_fish, y_fish,
test_size=0.3, random_state=42)

# Step 4: Build the linear regression model


fish_model = LinearRegression()
fish_model.fit(X_train_fish, y_train_fish)

# Step 5: Make predictions


y_pred_fish = fish_model.predict(X_test_fish)

# Print the coefficients


print("Coefficients:", fish_model.coef_)
print("Intercept:", fish_model.intercept_)
Output:-
   Length1  Length2  Length3  Height  Width  Weight
0     23.2     25.4     30.0    11.5    4.0   242.0
1     24.0     26.3     31.2    12.0    4.8   290.0
2     23.9     26.5     31.1    12.2    4.8   340.0
3     26.3     29.0     33.5    12.4    5.0   363.0
4     26.5     29.0     34.0    12.5    4.9   430.0
RangeIndex: 159 entries, 0 to 158
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype
 0   Length1  159 non-null    float64
 1   Length2  159 non-null    float64
 2   Length3  159 non-null    float64
 3   Height   159 non-null    float64
 4   Width    159 non-null    float64
 5   Weight   159 non-null    float64
dtypes: float64(6)
memory usage: 7.6 KB
None
Mean Squared Error: 2746.50
R squared: 0.885
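The output above includes a data preview, an `info()` summary, a mean squared error and an R² value, none of which the listed program prints. The following is a minimal sketch of how such figures could be produced. Since Fish.csv is not bundled here, it runs on a made-up stand-in frame (`demo`) with invented values; the helper name `fit_and_score` is hypothetical. To use the real data, pass `pd.read_csv('Fish.csv')` instead of `demo`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(df, target="Weight"):
    """Fit a linear regression on the numeric columns and return (MSE, R^2) on a 30% test set."""
    X = df.drop(columns=[target]).select_dtypes(include="number")
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)


# Stand-in data with invented values; NOT the Kaggle fish measurements.
rng = np.random.default_rng(0)
n_rows = 159
demo = pd.DataFrame({
    "Species": rng.choice(["A", "B"], n_rows),  # text column, ignored by the helper
    "Length1": rng.uniform(10, 40, n_rows),
    "Height": rng.uniform(5, 15, n_rows),
    "Width": rng.uniform(2, 7, n_rows),
})
demo["Weight"] = (
    30 * demo["Length1"] + 20 * demo["Height"] + 40 * demo["Width"]
    - 600 + rng.normal(0, 50, n_rows)
)

print(demo.head())
# info() prints its summary itself and returns None, so wrapping it in
# print() adds a trailing "None" line to the output.
demo.info()

mse, r2 = fit_and_score(demo)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R squared: {r2:.3f}")
```

Selecting only numeric columns lets the same helper accept the raw CSV, whose `Species` column is text.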

2) Use the iris dataset. Write a Python program to view some basic statistical
details such as percentiles, mean and standard deviation of the species
'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica'. Apply logistic regression
on the dataset to identify the different species (setosa, versicolor, virginica)
of Iris flowers given just 4 features: sepal and petal lengths and widths. Find
the accuracy of the model.
Program:-
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset


iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Step 2: Split the dataset into training and testing sets


X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(X_iris, y_iris, test_size=0.3,
random_state=42)

# Step 3: Build the logistic regression model


log_reg_iris = LogisticRegression(max_iter=200)
log_reg_iris.fit(X_train_iris, y_train_iris)

# Step 4: Make predictions


y_pred_iris = log_reg_iris.predict(X_test_iris)

# Step 5: Calculate accuracy


accuracy_iris = accuracy_score(y_test_iris, y_pred_iris)
print("Accuracy of Logistic Regression Model for Iris Dataset:", accuracy_iris)
Output:-

Accuracy of Logistic Regression Model for Iris Dataset: 0.9777777777777777
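The task also asks for basic statistics per species, which the program above does not print. One possible sketch uses `DataFrame.describe` on each species group. Note that `load_iris` labels the species 'setosa', 'versicolor' and 'virginica' (without the 'Iris-' prefix); the column name `species` and the variable names are chosen here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the numeric target (0, 1, 2) to the species names shipped with the dataset.
iris_df["species"] = [str(iris.target_names[i]) for i in iris.target]

# describe() reports count, mean, std, min, the 25/50/75 percentiles and max
# for every numeric column.
for species_name, group in iris_df.groupby("species"):
    print("Species:", species_name)
    print(group.describe())

# Per-species means in a single table.
species_means = iris_df.groupby("species").mean()
print(species_means)
```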
