PANIPAT INSTITUTE OF ENGINEERING TECHNOLOGY Machine Learning with Python
Department of B. TECH CSE-AI & DS Page 2 Satyam
Program 4
Aim:
Read a dataset from a CSV file using the pandas library.
Objectives:
The program aims to showcase how to read data from a CSV file using the pandas library.
Key Points:
CSV Reading with Pandas: Demonstrate reading and manipulating data from a CSV file using the pandas
library.
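A minimal, self-contained sketch of the same idea is given here. It reads CSV text from an in-memory buffer instead of a file on disk, so it runs without any dataset. The column names and values are made up for illustration and are not taken from Salary_Data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content (made-up values, used only for illustration)
csv_text = """YearsExperience,Salary
1.0,40000
2.5,52000
4.0,65000
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.head())   # first rows of the table
print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # data type inferred for each column
```

Checking shape and dtypes right after loading is a quick way to confirm that the header row and the numeric columns were parsed as expected.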
Code:
import pandas as pd
data = pd.read_csv("C:/Users/Downloads/Salary_Data.csv")
print(data)
Output:
Program 5
Aim:
Using Matplotlib, visualize simulated data with suitable statistical measures.
Objectives:
1. Simulate and Analyze Data: Generate simulated data using a statistical distribution and compute key
statistical measures to summarize its characteristics.
2. Visualize Data Effectively: Create visualizations that clearly display the distribution of the data along with
relevant statistical measures.
Key Points:
1. Data Simulation: Use NumPy to create a dataset from a chosen distribution (e.g., normal distribution) to
represent the data accurately.
2. Statistical Measures: Calculate and display the mean, median, and standard deviation to provide insights
into the data’s central tendency and variability.
3. Visual Representation: Employ Matplotlib to create visualizations such as histograms and box plots,
overlaying statistical measures to enhance understanding of the data distribution.
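The three key points above can be sketched as follows. This is only an illustration under stated assumptions: the distribution parameters (mean 50, standard deviation 10), the sample size, and the seed are arbitrary choices, not values from this practical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate 1,000 values from a normal distribution (parameters are assumptions)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=50, scale=10, size=1000)

# Summary statistics
mean = samples.mean()
median = np.median(samples)
std = samples.std(ddof=1)  # sample standard deviation
print(f"Mean: {mean:.2f}, Median: {median:.2f}, Std: {std:.2f}")

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram with the mean and median overlaid as vertical lines
ax_hist.hist(samples, bins=30, edgecolor="white")
ax_hist.axvline(mean, color="red", linestyle="--", label=f"Mean = {mean:.2f}")
ax_hist.axvline(median, color="green", linestyle=":", label=f"Median = {median:.2f}")
ax_hist.set_title("Histogram of Simulated Data")
ax_hist.set_xlabel("Value")
ax_hist.set_ylabel("Frequency")
ax_hist.legend()

# Box plot of the same data
ax_box.boxplot(samples)
ax_box.set_title("Box Plot of Simulated Data")
ax_box.set_ylabel("Value")

fig.tight_layout()
plt.show()
```

For a symmetric distribution such as the normal, the mean and median lines should nearly coincide. A visible gap between them would suggest skew in the data.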
Code:
# for reading dataset
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv("C:/Users/Downloads/bigmac.csv")
print(data.head(7))  # show the first 7 rows
Output:
Code:
# for making plots
data = pd.read_csv("C:/Users/Downloads/bigmac.csv")
plt.figure(figsize=(8, 9))
plt.subplot(3, 2, 1)
plt.hist(data["GDP_dollar"], bins=100, edgecolor='white')
plt.title("GDP in Dollar", fontsize=16)
plt.xlabel("GDP", fontsize=14)
plt.ylabel("Frequency", fontsize=14)
plt.subplot(3, 2, 2)
plt.violinplot(data["GDP_dollar"])
plt.title("Violin Plot of GDP in Dollars", fontsize=16)
plt.ylabel("GDP in Dollars", fontsize=14)
plt.subplot(3, 2, 3)
plt.boxplot(data["GDP_dollar"])
plt.title("Box Plot of GDP in Dollars", fontsize=16)
plt.ylabel("GDP in Dollars", fontsize=14)
top_n = 10
top_countries = data.nlargest(top_n, 'GDP_dollar')
plt.subplot(3, 2, 4)
plt.bar(top_countries['Country'], top_countries['GDP_dollar'], color='orange')
plt.title(f"Top {top_n} Countries by GDP in Dollars", fontsize=16)
plt.xlabel("Country", fontsize=14)
plt.ylabel("GDP in Dollars", fontsize=14)
plt.xticks(rotation=45, ha='right')
plt.subplot(3, 2, 5)
plt.scatter(data["GDP_dollar"], data["dollar_price"], alpha=0.5, color='b')
plt.title("Scatter Plot of GDP vs Dollar Price", fontsize=16)
plt.xlabel("GDP in Dollars", fontsize=14)
plt.ylabel("Dollar Price", fontsize=14)
plt.tight_layout()
plt.show()
Output:
Program 6
Aim:
Generate simulated data in Python and apply simple linear regression analysis.
Objectives:
1. Simulate Data: Create synthetic datasets for simple linear regression scenarios.
2. Perform Regression Analysis: Apply simple linear regression with one dependent variable and one
independent variable.
Key Points:
1. Data Generation: Use NumPy to create a simple linear relationship y = mx + b with some
random noise for the simple regression.
2. Model Fitting: Utilize the statsmodels library to fit simple linear regression models, which provides easy
access to estimated parameters.
3. Retrieve Parameters: Access the estimated coefficients and intercepts from the fitted models using the
relevant attributes, typically found in the model's params property.
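The data generation and parameter recovery steps can be sketched as below. To keep the example short and dependency-light, it uses NumPy's least-squares polynomial fit (np.polyfit) in place of statsmodels; the "true" slope, intercept, noise level, and seed are assumptions chosen for illustration.

```python
import numpy as np

# Assumed "true" parameters of the line y = m*x + b
true_slope = 2.5
true_intercept = 4.0

# Simulate x values and add Gaussian noise to the exact line
rng = np.random.default_rng(seed=42)
x = np.linspace(0, 10, 100)
noise = rng.normal(loc=0.0, scale=1.0, size=x.size)
y = true_slope * x + true_intercept + noise

# Degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print(f"Estimated slope: {slope:.3f}")
print(f"Estimated intercept: {intercept:.3f}")
```

Because the data come from a known line, the estimates can be compared directly with the assumed values. They should be close but not identical, since the noise perturbs every observation.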
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
df = pd.read_csv('Salary_Data.csv')
print(df.head())
X_simple = df[['YearsExperience']]
y_simple = df['Salary']
X_train_simple, X_test_simple, y_train_simple, y_test_simple = train_test_split(
    X_simple, y_simple, test_size=0.2, random_state=42
)
model_simple = LinearRegression()
model_simple.fit(X_train_simple, y_train_simple)
y_pred_simple = model_simple.predict(X_test_simple)
mse_simple = mean_squared_error(y_test_simple, y_pred_simple)
r2_simple = r2_score(y_test_simple, y_pred_simple)
print("Simple Linear Regression - Mean Squared Error:", mse_simple)
print("Simple Linear Regression - R-squared:", r2_simple)
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.scatter(y_test_simple, y_pred_simple)
plt.xlabel("True Values")
plt.ylabel("Predictions")
plt.title("Simple Linear Regression: True vs Predicted Values")
# Ideal line: points on it have predicted value equal to true value
plt.plot([min(y_test_simple), max(y_test_simple)], [min(y_test_simple), max(y_test_simple)], color='red')
plt.tight_layout()
plt.show()
Output:
Program 7
Aim:
Demonstrate the creation and visualisation of a confusion matrix to evaluate the performance of a classification
model.
Objectives:
1. To calculate the confusion matrix for a binary classification task ("Spam" vs "Not Spam") and understand
how well a model's predictions align with the actual outcomes.
2. To visualize the confusion matrix using matplotlib and ConfusionMatrixDisplay, allowing for a clear
interpretation of model performance by displaying the true positives, true negatives, false positives, and
false negatives.
Key Points:
1. Confusion Matrix Interpretation: The confusion matrix provides a comprehensive view of how well the
model is classifying each category. In this case, it helps evaluate the accuracy of "Spam" and "Not Spam"
predictions.
2. Visualization of Model Accuracy: By using a color-coded plot, the confusion matrix visually highlights the
number of correct and incorrect predictions, which aids in understanding the model's performance at a
glance.
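To make the four cells concrete, here is a small sketch with imperfect predictions. The labels are made up for illustration, and "Spam" is treated as the positive class (an assumption; either class could play that role).

```python
from sklearn.metrics import confusion_matrix

# Made-up labels containing two mistakes
y_actual = ["Spam", "Spam", "Spam", "Not Spam", "Not Spam", "Not Spam"]
y_pred = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam", "Not Spam"]

# Rows are actual classes, columns are predicted classes,
# both in the order given by `labels`
cm = confusion_matrix(y_actual, y_pred, labels=["Spam", "Not Spam"])

# With "Spam" as the positive class:
tp, fn = cm[0]  # actual Spam: predicted Spam / predicted Not Spam
fp, tn = cm[1]  # actual Not Spam: predicted Spam / predicted Not Spam

accuracy = (tp + tn) / cm.sum()
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}, accuracy={accuracy:.2f}")
```

Here one spam message is missed (a false negative) and one legitimate message is flagged (a false positive), so 4 of the 6 predictions are correct.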
Code:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Actual and predicted labels
y_actual = ["Spam", "Spam", "Not Spam", "Not Spam"]
y_pred = ["Spam", "Spam", "Not Spam", "Not Spam"]
# Create the confusion matrix
cm = confusion_matrix(y_actual, y_pred, labels=["Spam", "Not Spam"])
# Display the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Spam", "Not Spam"])
disp.plot(cmap="Blues")
plt.title("Confusion Matrix - Perfect Prediction")
plt.show()
Output:
Program 8
Aim:
Write Python code to implement and evaluate a Support Vector Machine classifier on a dataset to predict the
target variable based on the provided features.
Objectives:
1. To preprocess the dataset by handling missing values and encoding categorical features to make the data
suitable for training an SVM model.
2. To train and evaluate a Support Vector Machine (SVM) classifier on the preprocessed dataset to predict
student absences, and assess the model’s performance using accuracy.
Key Points:
Data Preprocessing:
One-hot encoding is applied to categorical features, converting them into numerical format.
Missing value imputation is done by filling in missing values with the mean of each column using
SimpleImputer.
Support Vector Machine (SVM):
SVM is a supervised machine learning algorithm used for classification tasks. It tries to find the hyperplane
that best separates the classes in the feature space. Here, it's used to classify the "absences" based on the
available student data.
Model Evaluation:
The model's performance is evaluated using accuracy, which measures the proportion of correct
predictions made on the test set.
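The two preprocessing steps described above can be sketched on a tiny table. The DataFrame below is hypothetical (its column names and values are invented for illustration); it only shows how one-hot encoding and mean imputation change the data before an SVM is trained.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical student records with one categorical column and missing values
students = pd.DataFrame({
    "school": ["GP", "MS", "GP", "MS"],
    "age": [17.0, np.nan, 16.0, 18.0],
    "studytime": [2.0, 3.0, np.nan, 1.0],
})

# One-hot encoding: each category becomes its own 0/1 column
encoded = pd.get_dummies(students, columns=["school"], dtype=float)

# Mean imputation: each missing value is replaced by its column mean
imputer = SimpleImputer(strategy="mean")
features = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)

print(features)
```

In a full workflow the imputer would usually be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into the preprocessing.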
Code:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the breast cancer dataset bundled with scikit-learn (numeric features, no missing values)
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an SVM model
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
Output:
Program 9
Aim:
Write Python code to implement and evaluate a Decision Tree classifier on a dataset to predict the target
variable based on the provided features.
Objectives:
1. To preprocess the dataset by encoding categorical features into numerical values so that the model can
process them.
2. To train a Decision Tree Classifier on the preprocessed data and evaluate its performance using accuracy.
Key Points:
1. Decision Tree Classifier is a supervised machine learning model used for classification tasks. It works by
splitting the data into subsets based on feature values to make predictions.
2. Label Encoding is an important step for handling categorical data, where each unique category is assigned a
numerical value to make it interpretable by the machine learning algorithm.
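Both key points can be illustrated with a toy example. The weather/play data below is hypothetical and exists only to show label encoding followed by a decision tree fit; it is a sketch, not the dataset used in this practical.

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical feature and a binary target
weather = ["sunny", "rainy", "sunny", "cloudy", "rainy", "cloudy"]
play = [1, 0, 1, 1, 0, 1]

# Label encoding: categories are sorted and mapped to 0, 1, 2, ...
encoder = LabelEncoder()
weather_codes = encoder.fit_transform(weather)
print(list(encoder.classes_))  # category order used for the codes
print(list(weather_codes))     # integer code for each sample

# The tree splits on thresholds over the encoded feature
X = weather_codes.reshape(-1, 1)
model = DecisionTreeClassifier(random_state=42)
model.fit(X, play)

# Encode new samples with the same encoder before predicting
new_codes = encoder.transform(["rainy", "sunny"]).reshape(-1, 1)
print(model.predict(new_codes))
```

Label encoding imposes an arbitrary order on the categories. scikit-learn documents LabelEncoder as intended for target labels, so for input features OrdinalEncoder or one-hot encoding is often preferred.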
Code:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create a Decision Tree model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
Output: