ML Lab Manual Final
AIM:
Create a DataFrame and Demonstrate Different ways to treat the missing
values.
DESCRIPTION:
Create a Sample DataFrame: First, we will create a sample DataFrame
representing student data. This DataFrame will intentionally have some
missing values.
Demonstrate Different Methods to Treat Missing Values: We will showcase
several methods, such as filling missing values with a default value,
dropping rows or columns with missing values, and imputing missing
values based on statistical measures like mean or median.
PROGRAM:
import pandas as pd
import numpy as np
data = {
    'Name': ['siri', 'sony', None, 'Dini', 'Bhuvi'],
    'Age': [22, 23, 14, None, 23],
    'Grade': [81, 92, None, 75, 92]
}
df = pd.DataFrame(data)
print("Original DataFrame with Missing Values:")
print(df)
# Method 1: Fill every missing value with a default value
df_filled = df.fillna('Unknown')
print("\nDataFrame after filling missing values with 'Unknown':")
print(df_filled)
# Method 2: Drop rows that contain any missing value
df_dropped_rows = df.dropna()
print("\nDataFrame after dropping rows with missing values:")
print(df_dropped_rows)
# Method 3: Drop columns that contain any missing value
df_dropped_columns = df.dropna(axis=1)
print("\nDataFrame after dropping columns with missing values:")
print(df_dropped_columns)
# Method 4: Forward fill (copy the previous row's value into the gap)
# df.ffill() replaces the deprecated df.fillna(method='ffill')
df_forward_fill = df.ffill()
print("\nDataFrame after forward filling missing values:")
print(df_forward_fill)
# Method 5: Backward fill (copy the next row's value into the gap)
df_backward_fill = df.bfill()
print("\nDataFrame after backward filling missing values:")
print(df_backward_fill)
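The description also mentions imputing missing values with the mean or median. A minimal sketch of that, recreating the same student data so it runs on its own; only the numeric columns Age and Grade are imputed, because a mean or median is undefined for the Name column:

```python
import pandas as pd

# Same student data as the program above, recreated so this block is self-contained
df = pd.DataFrame({
    'Name': ['siri', 'sony', None, 'Dini', 'Bhuvi'],
    'Age': [22, 23, 14, None, 23],
    'Grade': [81, 92, None, 75, 92],
})

numeric_cols = ['Age', 'Grade']

# Mean imputation: fillna with a Series fills each column with its own mean
df_mean = df.copy()
df_mean[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Median imputation: same idea, but less sensitive to outliers such as Age 14
df_median = df.copy()
df_median[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

print("\nDataFrame after mean imputation:")
print(df_mean)
print("\nDataFrame after median imputation:")
print(df_median)
```

With this data the missing Age becomes 20.5 (mean) or 22.5 (median), and the missing Grade becomes 85.0 (mean) or 86.5 (median); the missing Name is left untouched.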
OUTPUT:
IMPLEMENT DATA WRANGLING (CONCATENATE, MERGE, GROUP) AND DATA AGGREGATION
AIM:
Implement Data Wrangling (Concatenate, Merge, Group) and Data
Aggregation.
DESCRIPTION:
Data Aggregation:
This involves summarizing data, like calculating averages or totals.
a. Concatenation: pd.concat combines df1 and df2. ignore_index=True
resets the index in the resulting DataFrame.
b. Grouping: groupby('Age') groups the data by the 'Age' column.
size() then counts the number of students in each age group.
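A minimal sketch of the three wrangling steps named in the AIM (concatenate, merge, group) followed by an aggregation. The tables df1, df2 and marks and their columns are hypothetical, chosen only to illustrate the calls:

```python
import pandas as pd

# Hypothetical student tables (names and columns are illustrative)
df1 = pd.DataFrame({'Name': ['siri', 'sony'], 'Age': [22, 23]})
df2 = pd.DataFrame({'Name': ['Dini', 'Bhuvi'], 'Age': [22, 24]})

# Concatenate: stack the rows; ignore_index=True renumbers the index 0..n-1
students = pd.concat([df1, df2], ignore_index=True)

# Merge: join with another table on the shared 'Name' key (inner join by default)
marks = pd.DataFrame({'Name': ['siri', 'sony', 'Dini', 'Bhuvi'],
                      'Grade': [81, 92, 75, 88]})
merged = pd.merge(students, marks, on='Name')

# Group: count the students in each age group
age_counts = merged.groupby('Age').size()

# Aggregate: several summary statistics of Grade per age group
grade_stats = merged.groupby('Age')['Grade'].agg(['mean', 'max', 'count'])

print(students, merged, age_counts, grade_stats, sep='\n\n')
```

Here the two 22-year-olds end up in one group with a mean grade of 78.0.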
b. Write: Use open() with the write() or writelines() method to write
to text files.
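A small sketch of writing a text file with plain open(), write() and writelines() and reading it back, without pandas. The file name student_names.txt is a hypothetical choice:

```python
# Write a text file with write() and writelines(), then read it back
path = 'student_names.txt'  # hypothetical file name

with open(path, 'w') as f:
    f.write('name,pin\n')                       # write() takes a single string
    f.writelines(['Dini,201\n', 'jain,204\n'])  # writelines() takes an iterable of strings; it adds no newlines

with open(path) as f:
    lines = f.read().splitlines()               # split on line breaks, dropping the '\n'

print(lines)
```

Note that writelines() does not insert line breaks, so each string carries its own '\n'.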
# Write a Computer Program to Conduct Read and Write Data into Files (.txt format)
import pandas as pd
df = pd.DataFrame({
    "name": ['Dini', 'jain', 'mary', 'rani'],
    "pin": ['201', '204', '208', '209'],
})
# to_csv writes comma-separated text, so read it back with read_csv
df.to_csv('Student_data.txt', index=False)
stdata = pd.read_csv('Student_data.txt')
print(stdata)
# Write a Computer Program to Conduct Read and Write Data into Files (.xlsx format)
import pandas as pd
df = pd.DataFrame({
    "name": ['Dini', 'jain', 'mary', 'rani'],
    "pin": ['201', '204', '208', '209'],
})
# to_excel writes a real Excel workbook (requires the openpyxl package);
# to_csv would only write plain text with an .xlsx name, which read_excel cannot open
df.to_excel('Student_data.xlsx', index=False)
stdata = pd.read_excel('Student_data.xlsx')
print("\nThe Data Present in Student Data is:\n", stdata)
OUTPUT:
   name  pin
0  Dini  201
1  jain  204
2  mary  208
3  rani  209
import pandas as pd
di = {
    "nums": [1, 2, 3, 4, 5, 6],
    "sqr": [i**2 for i in range(1, 7)]
}
df1 = pd.DataFrame(di)
print(df1)
l = [("a", 20), ("b", 10), ("c", 30)]
df = pd.DataFrame(l)
print(df)
OUTPUT:
#Output for Creating DataFrame using Dictionary
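When a DataFrame is built from a list of tuples without column names, pandas labels the columns 0 and 1. A short sketch of passing columns= instead; the names letter and value are illustrative:

```python
import pandas as pd

l = [("a", 20), ("b", 10), ("c", 30)]

# Without column names pandas uses positional labels 0, 1, ...
df = pd.DataFrame(l)

# columns= names each tuple position (names here are illustrative)
df_named = pd.DataFrame(l, columns=['letter', 'value'])
print(df_named)
```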
EXPERIMENT NO:5
AIM:
DESCRIPTION:
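a) head(): Use head() to quickly peek at the first few rows of a DataFrame (five by default, or n rows with head(n)).
b) tail(): Use tail() to view the last few rows of a DataFrame (five by default, or n rows with tail(n)).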
PROGRAM:
df1
OUTPUT 1:
df1.head()
OUTPUT 2:
df1.head(3)
OUTPUT 3:
df1.tail()
OUTPUT 4:
EXPERIMENT NO:6
AIM:
A Computer Program to Demonstrate the loc and iloc Functions in Pandas.
DESCRIPTION:
• Import Pandas:
Start by importing the Pandas library (import pandas as pd).
• Create DataFrame:
Define your dataset as a Pandas DataFrame.
• Using loc:
Use loc to select rows and columns by labels.
• Using iloc:
Use iloc to select rows and columns by integer indices.
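A minimal sketch contrasting the two selectors on a shortened copy of the car data used in the program. The key difference it illustrates: loc slices by label and includes the end label, while iloc slices by position and excludes the end position:

```python
import pandas as pd

# Shortened copy of the car data (first four rows, three columns)
data = pd.DataFrame({
    'Brand': ['Maruti', 'Hyundai', 'Tata', 'Mahindra'],
    'Year': [2012, 2014, 2011, 2015],
    'City': ['Gurgaon', 'Delhi', 'Mumbai', 'Delhi'],
})

# loc: label-based; the slice 1:2 INCLUDES row label 2
print(data.loc[1:2, ['Brand', 'City']])

# loc with a boolean mask: Brand of every car sold in Delhi
delhi = data.loc[data['City'] == 'Delhi', 'Brand']
print(delhi)

# iloc: position-based; the slice 1:2 EXCLUDES position 2, like a Python list
print(data.iloc[1:2, 0:2])

# iloc with single integers returns one cell
print(data.iloc[0, 0])
```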
PROGRAM:
import pandas as pd
data = pd.DataFrame({
    'Brand': ['Maruti', 'Hyundai', 'Tata', 'Mahindra', 'Maruti', 'Hyundai', 'Renault', 'Tata', 'Maruti'],
    'Year': [2012, 2014, 2011, 2015, 2012, 2016, 2014, 2018, 2019],
    'Kms Driven': [50000, 30000, 60000, 25000, 10000, 46000, 31000, 15000, 12000],
    'City': ['Gurgaon', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Delhi', 'Mumbai', 'Chennai', 'Ghaziabad'],
    'Mileage': [28, 27, 25, 26, 28, 29, 24, 21, 24]
})
print(data)
OUTPUT 1:
Import Libraries:
Import NumPy for data handling, Matplotlib for plotting, and scikit-learn
for the regression model and its evaluation metrics.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Make predictions
weight_pred = model.predict(height)
# Plot outputs
plt.scatter(height, weight, color='black')
plt.plot(height, weight_pred, color='blue', linewidth=3)
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
OUTPUT:
Coefficients: [1.]
Intercept: -100.0
Mean squared error: 0.0
Coefficient of determination (R^2): 1.0
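A self-contained sketch of the full fit, predict and evaluate cycle behind the output above. The height and weight values are hypothetical (the original data is not shown), so the numbers it prints differ from the output above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical heights (cm) and weights (kg); scikit-learn expects a 2-D feature array
height = np.array([150, 160, 170, 180, 190]).reshape(-1, 1)
weight = np.array([52, 58, 69, 77, 88])

# Fit the line weight = coef * height + intercept by least squares
model = LinearRegression()
model.fit(height, weight)
weight_pred = model.predict(height)

mse = mean_squared_error(weight, weight_pred)
r2 = r2_score(weight, weight_pred)

print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
print('Mean squared error:', mse)
print('Coefficient of determination (R^2):', r2)
```

For this data the slope is 0.91 kg per cm and the intercept is -85.9; an MSE of 0.0 and R^2 of 1.0, as in the output above, only occur when the points lie exactly on a line.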
EXPERIMENT NO:9
AIM:
To Implement Linear Regression using Python Script and Identify Explanatory
Variables.
DESCRIPTION:
PROGRAM:
import numpy as np
from sklearn.linear_model import LinearRegression
# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([2, 4, 6, 8, 10])
OUTPUT:
Intercept: 0.0
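A minimal sketch of fitting the sample data above and reading off the role of the explanatory variable. Here x is the explanatory (independent) variable and y the response; with several feature columns, coef_ would hold one slope per explanatory variable, in column order:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data from the program above; x must be 2-D for scikit-learn
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression().fit(x, y)

# coef_: change in y per unit change in each explanatory variable
# intercept_: predicted y when every explanatory variable is 0
print('Slope:', model.coef_)
print('Intercept:', model.intercept_)
print('R^2:', model.score(x, y))
```

Because y = 2x exactly, the slope is 2 and the intercept is 0 up to floating-point rounding.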
EXPERIMENT NO:10
DESCRIPTION:
• Import Libraries: Import pandas for data handling, scikit-learn for clustering
algorithms, and matplotlib/seaborn for visualization.
• Load and Prepare Data: Load your dataset into a DataFrame and preprocess it
if necessary.
• Create and Fit the Model: Instantiate the chosen clustering algorithm and fit it
to your data.
• Predict Cluster Labels: If applicable, predict cluster labels for each data point.
• Visualize Clusters (Optional): Visualize the clustering results to understand
the data's structure better. Use scatter plots or other visualization techniques.
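The steps above can be sketched with k-means, one common clustering algorithm, on the six points used in the program. Choosing KMeans and n_clusters=2 is an assumption that suits this data (two vertical groups at x=1 and x=10):

```python
import numpy as np
from sklearn.cluster import KMeans

# The six 2-D points from the program
data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Create and fit the model; n_init=10 restarts and keeps the best run, random_state fixes the seeds
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

labels = kmeans.labels_            # cluster index assigned to each training point
centers = kmeans.cluster_centers_  # coordinates of each cluster centre

# Predict cluster labels for new points: each goes to its nearest centre
new_labels = kmeans.predict([[0, 0], [12, 3]])

print(labels)
print(centers)
print(new_labels)
```

Which group receives label 0 and which receives label 1 is arbitrary; only the grouping itself is meaningful.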
PROGRAM:
data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
results
print(df)
OUTPUT:
To Implement the Naive Bayesian Classifier for a Sample Training
Dataset Stored as a .csv File.
DESCRIPTION:
• Load and Prepare Data: Load your dataset into a pandas DataFrame.
Preprocess the data if needed, including handling missing values, encoding
categorical variables, and splitting the data into features (independent
variables) and target variable (dependent variable).
• Create and Train the Model: Instantiate the chosen Naive Bayes algorithm and
fit it to your training data.
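A self-contained sketch of these steps. Because sample_data.csv is not available here, a small hypothetical CSV is read from a string with io.StringIO; the column names height, weight and label, the values, and the split settings are all assumptions:

```python
import io
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for sample_data.csv: numeric features, class label in the last column
csv_text = """height,weight,label
150,50,small
152,52,small
155,54,small
149,49,small
153,51,small
151,53,small
180,80,large
182,85,large
178,79,large
185,88,large
181,82,large
179,81,large
"""
df = pd.read_csv(io.StringIO(csv_text))

x = df.iloc[:, :-1]  # features: every column except the last
y = df.iloc[:, -1]   # target: the last column

# Hold out 25% for testing; stratify keeps both classes in the test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y)

gnb = GaussianNB()  # Gaussian NB suits continuous numeric features
gnb.fit(x_train, y_train)
y_pred = gnb.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy:{accuracy:.2f}')
```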
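PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
df = pd.read_csv('sample_data.csv')
x = df.iloc[:, :-1]  # features: every column except the last
y = df.iloc[:, -1]   # target: the last column
x_train, x_test, y_train, y_test = train_test_split(x, y)  # split settings not given in the original
gnb = GaussianNB()
gnb.fit(x_train, y_train)
y_pred = gnb.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy:{accuracy:.2f}')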
OUTPUT:
Accuracy = 0.83
EXPERIMENT NO:12
AIM:
Program to Implement the Naive Bayesian Classifier for a Sample Training
Dataset Stored as a .csv File. Compute the Accuracy of the Classifier,
Considering a Few Datasets.
DESCRIPTION:
• Load and Prepare Data: Load the training dataset from the .csv file into
a pandas DataFrame. Preprocess the data as needed, including handling
missing values and encoding categorical variables.
• Split Data: Split the dataset into features (independent variables) and
the target variable (dependent variable).
• Choose a Naive Bayes Algorithm: Select a Naive Bayes algorithm suitable
for your dataset. Common options include Gaussian Naive Bayes, Multinomial
Naive Bayes, and Bernoulli Naive Bayes.
• Create and Train the Model: Instantiate the chosen Naive Bayes
algorithm and fit it to the training data.
• Load Test Data: Load the test dataset from .csv files into a pandas
DataFrame.
• Make Predictions: Use the trained model to make predictions on the test
data.
• Evaluate Model Performance: Compare the predicted labels with the
actual labels in the test data to compute the accuracy of the classifier.
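Since the AIM asks for accuracy over a few datasets, a sketch that repeats the train and evaluate steps on three datasets bundled with scikit-learn (iris, wine, breast cancer). Using these built-in datasets instead of .csv files, and the 70/30 stratified split, are assumptions made so the example runs without external files:

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Built-in datasets stand in for several .csv files
datasets = {'iris': load_iris(), 'wine': load_wine(), 'breast_cancer': load_breast_cancer()}

results = {}
for name, ds in datasets.items():
    # Same split settings for every dataset so the accuracies are comparable
    X_train, X_test, y_train, y_test = train_test_split(
        ds.data, ds.target, test_size=0.3, random_state=42, stratify=ds.target)
    clf = GaussianNB().fit(X_train, y_train)
    results[name] = accuracy_score(y_test, clf.predict(X_test))

for name, acc in results.items():
    print(f'{name}: accuracy = {acc:.2f}')
```

Accuracy on a single random split varies with random_state; cross-validation would give a steadier estimate.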
PROGRAM:
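import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
df = pd.read_csv('/content/iris.csv')
print(df)
X, y = df.drop(columns='species'), df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y)  # split settings not given in the original
clf = GaussianNB().fit(X_train, y_train)  # Gaussian NB assumed for the continuous iris features
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)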
OUTPUT:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  Virginica
146           6.3          2.5           5.0          1.9  Virginica
147           6.5          3.0           5.2          2.0  Virginica
148           6.2          3.4           5.4          2.3  Virginica
149           5.9          3.0           5.1          1.8  Virginica

[150 rows x 5 columns]
Accuracy: 1.0