
ML Lab Manual Final


EXPERIMENT NO:1

CREATE A DATAFRAME AND DEMONSTRATE DIFFERENT WAYS TO TREAT MISSING VALUES

AIM:
Create a DataFrame and Demonstrate Different ways to treat the missing
values.

DESCRIPTION:
Create a Sample DataFrame: First, we will create a sample DataFrame
representing student data. This DataFrame will intentionally have some
missing values.
Demonstrate Different Methods to Treat Missing Values: We will showcase
several methods, such as filling missing values with a default value,
dropping rows or columns with missing values, and imputing missing
values based on statistical measures like mean or median.
PROGRAM:
import pandas as pd
import numpy as np

# Step 1: Create a sample DataFrame with missing values

data={
'Name': ['siri', 'sony', None, 'Dini', 'Bhuvi'],
'Age': [22, 23, 14, None, 23],
'Grade': [81, 92, None, 75, 92]
}
df = pd.DataFrame(data)
print("Original DataFrame with Missing Values:")
print(df)

# Step 2: Different methods to treat missing values

# Method 1: Fill missing values with a default value

df_filled = df.fillna('Unknown')
print("\nDataFrame after filling missing values with 'Unknown':")
print(df_filled)

# Method 2: Drop rows with any missing values

df_dropped_rows = df.dropna()
print("\nDataFrame after dropping rows with missing values:")
print(df_dropped_rows)

# Method 3: Drop columns with any missing values

df_dropped_columns = df.dropna(axis=1)
print("\nDataFrame after dropping columns with missing values:")
print(df_dropped_columns)

# Method 4: Fill missing values with mean (for numerical columns)

df_filled_mean = df.copy()
df_filled_mean['Age'] = df_filled_mean['Age'].fillna(df_filled_mean['Age'].mean())
df_filled_mean['Grade'] = df_filled_mean['Grade'].fillna(df_filled_mean['Grade'].mean())
print("\nDataFrame after filling missing numerical values with mean:")
print(df_filled_mean)

# Method 5: Forward fill (use previous value to fill missing values)
# ffill() is used because fillna(method='ffill') is deprecated in recent pandas

df_forward_fill = df.ffill()
print("\nDataFrame after forward filling missing values:")
print(df_forward_fill)

# Method 6: Backward fill (use next value to fill missing values)
# bfill() is used because fillna(method='bfill') is deprecated in recent pandas

df_backward_fill = df.bfill()
print("\nDataFrame after backward filling missing values:")
print(df_backward_fill)

OUTPUT:

Original DataFrame with Missing Values:


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
2 None 14.0 NaN
3 Dini NaN 75.0
4 Bhuvi 23.0 92.0

DataFrame after filling missing values with 'Unknown':


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
2 Unknown 14.0 Unknown
3 Dini Unknown 75.0
4 Bhuvi 23.0 92.0

DataFrame after dropping rows with missing values:


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
4 Bhuvi 23.0 92.0

DataFrame after dropping columns with missing values:


Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]

DataFrame after filling missing numerical values with mean:


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
2 None 14.0 85.0
3 Dini 20.5 75.0
4 Bhuvi 23.0 92.0

DataFrame after forward filling missing values:


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
2 sony 14.0 92.0
3 Dini 14.0 75.0
4 Bhuvi 23.0 92.0

DataFrame after backward filling missing values:


Name Age Grade
0 siri 22.0 81.0
1 sony 23.0 92.0
2 Dini 14.0 75.0
3 Dini 23.0 75.0
4 Bhuvi 23.0 92.0
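
The description of this experiment also mentions imputing with the median, which the program does not show. A minimal sketch, reusing the same sample data; the names missing_counts and df_filled_median are illustrative and not part of the original program:

```python
import pandas as pd

# Same sample data as in the program for this experiment
df = pd.DataFrame({
    'Name': ['siri', 'sony', None, 'Dini', 'Bhuvi'],
    'Age': [22, 23, 14, None, 23],
    'Grade': [81, 92, None, 75, 92],
})

# Count the missing values in each column before treating them
missing_counts = df.isna().sum()
print(missing_counts)

# Fill each numeric column with its own median
# (the median is less sensitive to outliers than the mean)
df_filled_median = df.copy()
for col in ['Age', 'Grade']:
    df_filled_median[col] = df_filled_median[col].fillna(df_filled_median[col].median())
print(df_filled_median)
```

Counting missing values first with isna().sum() is a cheap way to decide whether dropping or filling is the more reasonable treatment for a given column.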
EXPERIMENT NO:2

IMPLEMENT DATA WRANGLING (CONCATENATE, MERGE, GROUP) AND DATA AGGREGATION

AIM:
Implement Data Wrangling (Concatenate, Merge, Group) and Data Aggregation.

DESCRIPTION:

Data Aggregation:
This involves summarizing data, like calculating averages or totals.

a. Concatenation: pd.concat combines df1 and df2. ignore_index=True
resets the index in the resulting DataFrame.

b. Grouping: groupby('Age') groups the data by the 'Age' column.
size() then counts the number of students in each age group.

c. Merging: pd.merge combines concatenated_df and grades_df using
the 'Student_ID' column as the key.

d. Data Aggregation: mean() calculates the average age of the students
in the concatenated DataFrame.
PROGRAM:
# To implement data wrangling (Concatenate) and data aggregation
import pandas as pd
sqrs = pd.DataFrame({
"nums":[15,16,17,18,19],
"sqrs":[i**2 for i in range(15,20)]
})
cubs = pd.DataFrame({
"nums":[45,46,47,48,49],
"cubs":[i**3 for i in range(45,50)]
})
print(sqrs.to_string(),"\n")
print(cubs.to_string(),"\n")
print("\nAfter concatenating:\n",pd.concat([sqrs,cubs]))
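
The description for this experiment names grouping by 'Age', merging on 'Student_ID' and averaging the age, none of which appear in the program. The following is a rough sketch of those four steps; the DataFrame names follow the description, but the student records themselves are made up for illustration:

```python
import pandas as pd

# Hypothetical student records split across two DataFrames
df1 = pd.DataFrame({'Student_ID': [1, 2, 3], 'Age': [20, 21, 20]})
df2 = pd.DataFrame({'Student_ID': [4, 5], 'Age': [21, 22]})
grades_df = pd.DataFrame({'Student_ID': [1, 2, 3, 4, 5],
                          'Grade': [85, 90, 78, 88, 92]})

# Concatenation: stack the rows and renumber the index from 0
concatenated_df = pd.concat([df1, df2], ignore_index=True)

# Grouping: count the students in each age group
age_counts = concatenated_df.groupby('Age').size()

# Merging: attach each student's grade using Student_ID as the key
merged_df = pd.merge(concatenated_df, grades_df, on='Student_ID')

# Aggregation: average age over all students
average_age = concatenated_df['Age'].mean()

print(age_counts)
print(merged_df)
print("Average age:", average_age)
```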
b. Write: 'DataFrame.to_csv()' for writing a DataFrame to a CSV file.

For text (.txt) files:

a. Read: Standard Python functions like 'open()' with 'read()' or 'readlines()'
for reading text files.

b. Write: Using 'open()' with the 'write()' or 'writelines()' methods for writing
to text files.

For .xls (Excel) files:

a. Read: 'pandas.read_excel()' for reading data from an Excel file into a
DataFrame.

b. Write: 'DataFrame.to_excel()' for writing a DataFrame to an Excel file.
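
For the plain-text case described above, here is a small sketch of open() with write(), writelines(), read() and readlines(). The file name and its contents are assumptions chosen for illustration, and the file is written to a temporary directory so the working directory is left untouched:

```python
import os
import tempfile

lines = ["Dini,201\n", "jain,204\n", "mary,208\n"]

# Hypothetical file location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "student_data.txt")

# Write: write() takes one string, writelines() takes an iterable of strings
with open(path, "w") as f:
    f.write("name,pin\n")
    f.writelines(lines)

# Read: read() returns the whole file as a single string
with open(path) as f:
    whole_text = f.read()

# Read: readlines() returns a list of lines, newline characters included
with open(path) as f:
    all_lines = f.readlines()

print(whole_text)
print(all_lines)
```

Using with blocks closes the file automatically, even if an error occurs while reading or writing.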


PROGRAM:

# Write a Computer Program to Conduct Read and Write Data into Files (.csv format)
import pandas as pd
df = pd.DataFrame({
"name":['Dini','jain','mary','rani'],
"pin":['201','204','208','209'],
})
df.to_csv('Student_data.csv',index=False)
stdata = pd.read_csv('Student_data.csv')
print("\n The Data Present in Student Data is:\n",stdata)

# Write a Computer Program to Conduct Read and Write Data into Files (.txt format)
import pandas as pd
df = pd.DataFrame({
"name":['Dini','jain','mary','rani'],
"pin":['201','204','208','209'],
})
df.to_csv('Student_data.txt',index=False)
# read_fwf expects fixed-width columns; this file is comma-delimited, so each
# line is read as a single 'name,pin' column (use pd.read_csv to split it)
stdata = pd.read_fwf('Student_data.txt')
print("\n The Data Present in Student Data is:\n",stdata)

# Write a Computer Program to Conduct Read and Write Data into Files (.xlsx format)
import pandas as pd
df = pd.DataFrame({
"name":['Dini','jain','mary','rani'],
"pin":['201','204','208','209'],
})
# to_excel writes a real Excel workbook (needs an Excel engine such as openpyxl);
# to_csv would only create a CSV file with an .xlsx name, which read_excel cannot open
df.to_excel('Student_data.xlsx',index=False)
stdata = pd.read_excel('Student_data.xlsx')
print("\n The Data Present in Student Data is:\n",stdata)

OUTPUT:

# Output for Read and Write Data into Files(.csv format)

# Output for Read and Write Data into Files(.txt format)

 The Data Present in Student Data is:

   name,pin
0  Dini,201
1  jain,204
2  mary,208
3  rani,209

# Output for Read and Write Data into Files(.xlsx format)

 The Data Present in Student Data is:

PROGRAM:

#Creating DataFrame using Dictionary

import pandas as pd
di = {
"nums" :[1,2,3,4,5,6],
"sqr" :[i**2 for i in range(1,7)]
}
df1 = pd.DataFrame(di)
df1

# Creating DataFrame using a list of tuples

l = [("a",20),("b",10),("c",30)]
df = pd.DataFrame(l)
df

OUTPUT:
#Output for Creating DataFrame using Dictionary

#Output for Creating DataFrame using Tuple

EXPERIMENT NO:5

WRITE A PROGRAM TO CONDUCT INDEXING AND SLICING
A) HEAD()
B) TAIL()
C) DESCRIBE()
D) SHAPE()
E) ITERROWS()

AIM:

A Program to Conduct Indexing and Slicing For:
a) Head()
b) Tail()
c) Describe()
d) Shape()
e) Iterrows()

DESCRIPTION:

a) Head(): Use head() to quickly peek at the first few rows of a DataFrame
(five by default).

b) Tail(): Use tail() to glance at the last few rows of a DataFrame
(five by default).

c) Describe(): Call describe() to get summary statistics of the numerical
columns in a DataFrame.

d) Shape(): Read the dimensions of a DataFrame from the shape attribute
(written without parentheses), which holds a tuple of (num_rows, num_columns).

e) Iterrows(): Iterate over the rows of a DataFrame with iterrows(), which
yields the index and the row data on each iteration.

These operations provide essential insights and information about the
structure and contents of a DataFrame in Python using Pandas.
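
As a compact illustration of all five operations, the sketch below rebuilds the small nums/sqr DataFrame from the dictionary example earlier in the manual. The variable names first_rows, last_rows, summary, n_rows and n_cols are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({
    "nums": [1, 2, 3, 4, 5, 6],
    "sqr": [i**2 for i in range(1, 7)],
})

first_rows = df1.head(3)      # first three rows
last_rows = df1.tail(2)       # last two rows
summary = df1.describe()      # count, mean, std, min, quartiles, max per column
n_rows, n_cols = df1.shape    # shape is an attribute, so no parentheses

print(first_rows)
print(last_rows)
print(summary)
print("Rows:", n_rows, "Columns:", n_cols)

# iterrows() yields (index, row) pairs; each row is a pandas Series
for index, row in df1.iterrows():
    print(index, row["nums"], row["sqr"])
```

iterrows() is convenient for inspection, but it is slow on large DataFrames, where column-wise operations are usually the better choice.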

PROGRAM:

df1

OUTPUT 1:
df1.head()

OUTPUT 2:

df1.head(3)

OUTPUT 3:
df1.tail()

OUTPUT 4:
EXPERIMENT NO:6

WRITE A COMPUTER PROGRAM TO COMPUTE LOC AND ILOC FUNCTIONS IN PANDAS

AIM:
A Computer Program to Compute loc and iloc Functions in Pandas.

DESCRIPTION:

• Import Pandas:
Start by importing the Pandas library (import pandas as pd).

• Create DataFrame:
Define your dataset as a Pandas DataFrame.

• Using loc:
Use loc to select rows and columns by labels.

Syntax: df.loc[row_labels, column_labels].

Example: df.loc[2] selects the row with label 2.

• Using iloc:
Use iloc to select rows and columns by integer indices.

Syntax: df.iloc[row_indices, column_indices].

Example: df.iloc[2] selects the row at integer position 2 (the third row), whatever its label.

Output: Print or use the selected data as needed.
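
The difference between label-based and position-based selection is easiest to see when the index labels are not 0, 1, 2. The sketch below uses a small made-up car table with labels 10, 20 and 30; both the data and the variable names are assumptions for illustration:

```python
import pandas as pd

cars = pd.DataFrame(
    {"Brand": ["Maruti", "Hyundai", "Tata"], "Year": [2012, 2014, 2011]},
    index=[10, 20, 30],  # non-default labels make label vs position visible
)

by_label = cars.loc[20]                  # row whose label is 20
by_position = cars.iloc[2]               # third row, regardless of its label
label_slice = cars.loc[10:20, "Brand"]   # label slices include both end points
position_slice = cars.iloc[0:2, 0]       # position slices exclude the end point

print(by_label)
print(by_position)
print(label_slice)
print(position_slice)
```

With the default integer index the two accessors often return the same rows, which is why the slicing rules are the more common source of surprises.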

PROGRAM:

import pandas as pd
data = pd.DataFrame({
    'Brand': ['Maruti', 'Hyundai', 'Tata', 'Mahindra', 'Maruti',
              'Hyundai', 'Renault', 'Tata', 'Maruti'],
    'Year': [2012, 2014, 2011, 2015, 2012, 2016, 2014, 2018, 2019],
    'Kms Driven': [50000, 30000, 60000, 25000, 10000, 46000,
                   31000, 15000, 12000],
    'City': ['Gurgaon', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai',
             'Delhi', 'Mumbai', 'Chennai', 'Ghaziabad'],
    'Mileage': [28, 27, 25, 26, 28, 29, 24, 21, 24]
})
print(data)
OUTPUT 1:
Import Libraries:
Import scikit-learn for machine learning and
data handling.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample data (Height in cm and Weight in kg)
height = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)  # Explanatory variable
weight = np.array([50, 60, 70, 80, 90])  # Dependent variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data (training)
model.fit(height, weight)

# Make predictions
weight_pred = model.predict(height)

# Print the coefficients
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

# The mean squared error and coefficient of determination (R^2)
print(f"Mean squared error: {mean_squared_error(weight, weight_pred)}")
print(f"Coefficient of determination (R^2): {r2_score(weight, weight_pred)}")

# Plot outputs
plt.scatter(height, weight, color='black')
plt.plot(height, weight_pred, color='blue', linewidth=3)
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()

OUTPUT:

Coefficients: [1.]
Intercept: -100.0
Mean squared error: 0.0
Coefficient of determination (R^2): 1.0

EXPERIMENT NO:9

IMPLEMENT LINEAR REGRESSION USING PYTHON SCRIPT AND IDENTIFY EXPLANATORY VARIABLES

AIM:
To Implement Linear Regression using Python Script and Identify Explanatory
Variables.

DESCRIPTION:

In your Python script, you’ll:


• Load data using pandas.

• Split data using train_test_split from scikit-learn.

• Create a LinearRegression model.

• Fit the model to the training data.

• Extract and analyze coefficients.

• Evaluate the model's performance.

• Optionally, visualize the relationship between variables.
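
The steps listed above can be sketched roughly as follows. The data is synthetic and built in memory rather than loaded from a file, and the column names x1, x2 and y are assumptions: y depends only on x1, so a fitted coefficient near zero for x2 marks it as non-explanatory.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3*x1 + 5, so x2 carries no information about y
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 50),
                   "x2": rng.uniform(0, 10, 50)})
df["y"] = 3 * df["x1"] + 5

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df[["x1", "x2"]], df["y"], test_size=0.2, random_state=42)

# Create the model and fit it to the training data
model = LinearRegression().fit(X_train, y_train)

# Pair each coefficient with its column name to read off the explanatory variables
coefficients = pd.Series(model.coef_, index=X_train.columns)
print(coefficients)
print("Intercept:", model.intercept_)

# Evaluate on the held-out test data
print("R^2 on test data:", r2_score(y_test, model.predict(X_test)))
```

On real data the coefficients are only comparable when the features are on similar scales, so standardizing the columns first is a common precaution.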

PROGRAM:

import numpy as np
from sklearn.linear_model import LinearRegression

#sample data
x = np.array([1, 2, 3, 4, 5]).reshape((-1,1))
y = np.array([2, 4, 6, 8, 10])

#Create a linear regression model
model = LinearRegression()

#Fit the model to the data
model.fit(x,y)

#Get the coefficients (slope and intercept)
slope = model.coef_[0]
intercept = model.intercept_

#print the coefficients
print("Slope (Explanatory Variable):", slope)
print("Intercept:", intercept)

OUTPUT:

Slope (Explanatory Variable): 2.0

Intercept: 0.0

EXPERIMENT NO:10

IMPLEMENT THE CLUSTERING TECHNIQUE FOR A GIVEN DATASET IN PYTHON

AIM:

To Implement the Clustering Technique for a given Dataset in Python.

DESCRIPTION:

• Import Libraries: Import pandas for data handling, scikit-learn for clustering
algorithms, and matplotlib/seaborn for visualization.

• Load and Prepare Data: Load your dataset into a DataFrame and preprocess it
if necessary.

• Choose a Clustering Algorithm: Select a clustering algorithm like K-Means,
DBSCAN, or Hierarchical Clustering based on your dataset characteristics
and requirements.

• Create and Fit the Model: Instantiate the chosen clustering algorithm and fit it
to your data.

• Predict Cluster Labels: If applicable, predict cluster labels for each data point.

• Visualize Clusters (Optional): Visualize the clustering results to understand
the data's structure better. Use scatter plots or other visualization techniques.

• Evaluate Clustering Performance (Optional): Evaluate the clustering with a
metric such as the silhouette score, which needs only the data and the predicted
labels, or, if ground truth labels are available, the adjusted Rand index.
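
The optional evaluation step can be sketched as below, using the same six points as the program for this experiment. The true_labels list is an assumed ground truth added only so that the adjusted Rand index can be shown; n_init and random_state are set to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
true_labels = [0, 0, 0, 1, 1, 1]  # assumed ground truth, for illustration only

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)

# Silhouette score: uses only the data and the predicted labels (range -1 to 1)
sil = silhouette_score(data, labels)

# Adjusted Rand index: compares predicted labels with ground truth,
# ignoring how the cluster numbers happen to be assigned
ari = adjusted_rand_score(true_labels, labels)

print("Silhouette score:", sil)
print("Adjusted Rand index:", ari)
```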

PROGRAM:

#Import necessary libraries
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd

#example dataset: Two-dimensional data
data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

#convert the dataset to a DataFrame for better visualization
df = pd.DataFrame(data, columns=['x', 'y'])

#Initialize the KMeans algorithm with 2 clusters
kmeans = KMeans(n_clusters=2)

#Fit the model on the dataset
kmeans.fit(df)

#Predict the cluster for each instance in the dataset
df['cluster'] = kmeans.predict(df)

#output the clustering results
print(df)
OUTPUT:
To Implement the Naive Bayesian Classifier for a sample Training
Dataset stored as .csv File.

DESCRIPTION:

Follow these steps:

• Import Libraries: Import necessary libraries such as pandas for data
manipulation and scikit-learn for machine learning algorithms.

• Load and Prepare Data: Load your dataset into a pandas DataFrame.
Preprocess the data if needed, including handling missing values, encoding
categorical variables, and splitting the data into features (independent
variables) and target variable (dependent variable).

• Choose a Naive Bayes Algorithm: Select a Naive Bayes algorithm based on
your dataset and requirements. Common options include Gaussian Naive
Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes.

• Create and Train the Model: Instantiate the chosen Naive Bayes algorithm and
fit it to your training data.

• Make Predictions: Use the trained model to make predictions on new or
unseen data.

• Evaluate Model Performance: Assess the performance of the Naive Bayes
classifier using evaluation metrics such as accuracy, precision, recall, or
F1-score.
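
The evaluation metrics named in the last step can be computed as in the sketch below. Because the contents of the manual's CSV file are not shown, scikit-learn's bundled iris dataset is used here as a stand-in, and macro averaging is one possible choice among several for a multi-class problem:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in data: the bundled iris dataset instead of a CSV file
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes suits continuous features such as these measurements
gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

# Macro averaging gives every class the same weight
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro")

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}  Recall: {recall:.2f}  F1-score: {f1:.2f}")
```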
PROGRAM:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the dataset; the last column is taken as the target
df = pd.read_csv('sample_data.csv')
x = df.iloc[:,:-1]
y = df.iloc[:,-1]

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.2,random_state=42)

# Train the classifier and predict on the test rows
gnb = GaussianNB()
gnb.fit(x_train,y_train)
y_pred = gnb.predict(x_test)

accuracy = accuracy_score(y_test,y_pred)
print(f'Accuracy:{accuracy:.2f}')

OUTPUT:

Accuracy:0.83
EXPERIMENT NO:12

WRITE A PROGRAM TO IMPLEMENT THE NAIVE BAYESIAN
CLASSIFIER FOR A SAMPLE TRAINING DATASET STORED AS
A .CSV FILE. COMPUTE THE ACCURACY OF THE
CLASSIFIER, CONSIDERING FEW TEST DATASETS.

AIM:
Program to Implement the Naive Bayesian Classifier for a Sample Training
Dataset stored as a .csv File. Compute the Accuracy of the Classifier,
Considering a few Test Datasets.

DESCRIPTION:

• Import Libraries: Import pandas for data manipulation, scikit-learn for
machine learning algorithms, and any other necessary libraries.

• Load and Prepare Data: Load the training dataset from the .csv file into
a pandas DataFrame. Preprocess the data as needed, including handling
missing values and encoding categorical variables.

• Split Data: Split the dataset into features (independent variables) and
the target variable (dependent variable).

• Choose a Naive Bayes Algorithm: Select a Naive Bayes algorithm suitable
for your dataset. Common options include Gaussian Naive Bayes,
Multinomial Naive Bayes, and Bernoulli Naive Bayes.

• Create and Train the Model: Instantiate the chosen Naive Bayes
algorithm and fit it to the training data.

• Load Test Data: Load the test dataset from .csv files into a pandas
DataFrame.

• Make Predictions: Use the trained model to make predictions on the test
data.

• Evaluate Model Performance: Compare the predicted labels with the
actual labels in the test data to compute the accuracy of the classifier.
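
The steps above describe loading the training and test data from separate .csv files. A minimal sketch of that variant follows; the file names train.csv and test.csv are hypothetical, and both files are generated here from scikit-learn's bundled iris data and written to a temporary directory so the example is self-contained:

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Create hypothetical train.csv and test.csv files from the bundled iris data
iris = load_iris(as_frame=True).frame  # four feature columns plus 'target'
train_part, test_part = train_test_split(iris, test_size=0.2, random_state=42)
tmp_dir = tempfile.mkdtemp()
train_path = os.path.join(tmp_dir, "train.csv")
test_path = os.path.join(tmp_dir, "test.csv")
train_part.to_csv(train_path, index=False)
test_part.to_csv(test_path, index=False)

# Load the training file and fit the classifier
train_df = pd.read_csv(train_path)
clf = GaussianNB().fit(train_df.drop(columns="target"), train_df["target"])

# Load the test file separately and score the predictions against its labels
test_df = pd.read_csv(test_path)
y_pred = clf.predict(test_df.drop(columns="target"))
accuracy = accuracy_score(test_df["target"], y_pred)
print("Accuracy:", accuracy)
```

To consider several test datasets, the last four lines can be repeated in a loop over a list of test file paths.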

PROGRAM:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv('/content/iris.csv')

df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
              'species']
df.info()

print(df)

X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = df['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

clf = GaussianNB()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print('Accuracy:', accuracy)

OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  Virginica
146           6.3          2.5           5.0          1.9  Virginica
147           6.5          3.0           5.2          2.0  Virginica
148           6.2          3.4           5.4          2.3  Virginica
149           5.9          3.0           5.1          1.8  Virginica

[150 rows x 5 columns]
Accuracy: 1.0
