ML Lab Manual: Experiments 3-7 (MLP, MCA)

EXPERIMENT-3

Implement a Python program to prepare plots such as bar plot, histogram, distribution plot, box plot, and scatter plot.

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Generate some example data
np.random.seed(0)
data = np.random.randn(1000)  # Normally distributed data
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 30]

# 1. Bar Plot
plt.figure(figsize=(10, 6))
plt.bar(categories, values, color='skyblue')
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

# 2. Histogram
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, color='salmon', edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# 3. Distribution Plot (using seaborn)
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, stat='density', color='purple')  # stat='density' so the y-axis label below is accurate
plt.title('Distribution Plot')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()

# 4. Box Plot
plt.figure(figsize=(10, 6))
sns.boxplot(x=data, color='lightgreen')
plt.title('Box Plot')
plt.xlabel('Value')
plt.show()

# 5. Scatter Plot
# Generate example scatter data
x = np.random.rand(100)
y = np.random.rand(100) * 100
plt.figure(figsize=(10, 6))
plt.scatter(x, y, color='orange', alpha=0.5)
plt.title('Scatter Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()

Explanation:

1. Bar Plot: Creates a bar plot with categories on the x-axis and values on the y-axis. The bars are colored sky blue.
2. Histogram: Displays the distribution of the data using bins. Each bin shows the frequency of data points within its range.
3. Distribution Plot: Combines a histogram with a Kernel Density Estimate (KDE) to visualize the distribution of the data.
4. Box Plot: Shows the spread of the data based on quartiles and highlights any outliers.
5. Scatter Plot: Plots individual data points on a Cartesian plane to show the relationship between two variables.
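
Note: as an optional variation (not part of the original experiment), the same five plots can be drawn on a single figure with plt.subplots. This is a minimal sketch that assumes the variables data, categories, values, x, and y defined in the program above.

# Optional sketch: all five plots on one figure
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes[0, 0].bar(categories, values, color='skyblue')
axes[0, 0].set_title('Bar Plot')
axes[0, 1].hist(data, bins=30, color='salmon', edgecolor='black')
axes[0, 1].set_title('Histogram')
sns.histplot(data, kde=True, stat='density', color='purple', ax=axes[0, 2])
axes[0, 2].set_title('Distribution Plot')
sns.boxplot(x=data, color='lightgreen', ax=axes[1, 0])
axes[1, 0].set_title('Box Plot')
axes[1, 1].scatter(x, y, color='orange', alpha=0.5)
axes[1, 1].set_title('Scatter Plot')
axes[1, 2].axis('off')  # hide the unused sixth panel
plt.tight_layout()
plt.show()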
EXPERIMENT-4
Implement the Simple Linear Regression algorithm in Python, and implement the Gradient Descent algorithm for the above linear regression model.

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)  # 100 data points
y = 4 + 3 * X + np.random.randn(100, 1)  # Linear relationship with noise

# Plot the synthetic data
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(X, y, color='blue', alpha=0.6, edgecolors='w', s=100)
plt.title('Synthetic Data for Linear Regression')
plt.xlabel('X')
plt.ylabel('y')

# Define the cost function
def compute_cost(X, y, theta):
    m = len(y)
    predictions = X @ theta
    cost = (1 / (2 * m)) * np.sum(np.square(predictions - y))
    return cost

# Implement gradient descent
def gradient_descent(X, y, theta, learning_rate, iterations):
    m = len(y)
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        gradients = (1 / m) * X.T @ (X @ theta - y)
        theta = theta - learning_rate * gradients
        cost_history[i] = compute_cost(X, y, theta)
    return theta, cost_history

# Add a column of ones to X for the intercept term
X_b = np.c_[np.ones((X.shape[0], 1)), X]  # Add x0 = 1 to each instance
theta_initial = np.random.randn(2, 1)  # Random initialization
learning_rate = 0.1
iterations = 1000

# Perform gradient descent
theta_optimal, cost_history = gradient_descent(X_b, y, theta_initial, learning_rate, iterations)

# Plot cost function history
plt.subplot(1, 2, 2)
plt.plot(cost_history)
plt.title('Cost Function History')
plt.xlabel('Iterations')
plt.ylabel('Cost')

# Show plots
plt.tight_layout()
plt.show()

# Plot the best-fit line
plt.figure(figsize=(12, 6))
plt.scatter(X, y, color='blue', alpha=0.6, edgecolors='w', s=100)
x_values = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
X_b_values = np.c_[np.ones((x_values.shape[0], 1)), x_values]
y_values = X_b_values @ theta_optimal
plt.plot(x_values, y_values, color='red', linewidth=2)
plt.title('Linear Regression Fit')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Display the optimal parameters
print(f"Optimal parameters: \n{theta_optimal}")


Explanation:

1. Data Generation:
o Creates synthetic data with a linear relationship plus noise.
o Plots the synthetic data in the first subplot.
2. Cost Function:
o Computes the cost (mean squared error) for given parameters.
3. Gradient Descent:
o Updates the parameters iteratively to minimize the cost function.
o Tracks the cost history for visualization.
4. Visualization:
o Plots the cost function history to show convergence.
o Plots the synthetic data and the fitted regression line.
5. Output:
o Prints the optimal parameters learned by the model.
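
Note: as a sanity check (an addition, not part of the original experiment), the parameters found by gradient descent can be compared with the closed-form fit from sklearn's LinearRegression; both should land near the true intercept 4 and slope 3 used to generate the data.

# Sanity-check sketch: compare gradient descent with sklearn's closed-form fit
# (assumes X and y from the program above)
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
print("sklearn intercept:", lin_reg.intercept_)  # expected near 4
print("sklearn slope:", lin_reg.coef_)           # expected near 3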
Experiment 5:
Implement the Multiple Linear Regression algorithm using Python.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # enables the 3D projection on older matplotlib versions
from sklearn.preprocessing import StandardScaler

# Generate synthetic data
np.random.seed(42)
X1 = 2 * np.random.rand(100, 1)
X2 = 3 * np.random.rand(100, 1)
X = np.hstack([X1, X2])
y = 4 + 3 * X1 + 5 * X2 + np.random.randn(100, 1)  # Linear relationship with noise

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Add a column of ones to X for the intercept term
X_b = np.c_[np.ones((X_scaled.shape[0], 1)), X_scaled]  # Add x0 = 1 to each instance

# Compute the parameters using the Normal Equation
def normal_equation(X, y):
    return np.linalg.inv(X.T @ X) @ X.T @ y

theta_optimal = normal_equation(X_b, y)

# Display the optimal parameters
print(f"Optimal parameters: \n{theta_optimal}")

# Predicting new values
def predict(X, theta):
    X_b = np.c_[np.ones((X.shape[0], 1)), X]  # Add x0 = 1 to each instance
    return X_b @ theta

# Plotting the fitted plane
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
X1_range = np.linspace(X1.min(), X1.max(), 10)
X2_range = np.linspace(X2.min(), X2.max(), 10)
X1_grid, X2_grid = np.meshgrid(X1_range, X2_range)
X_grid = np.c_[X1_grid.ravel(), X2_grid.ravel()]
X_grid_scaled = scaler.transform(X_grid)
y_grid = predict(X_grid_scaled, theta_optimal).reshape(X1_grid.shape)  # predict() adds the intercept column itself

# Plot data points
ax.scatter(X1, X2, y, color='blue', alpha=0.6, edgecolors='w', s=100, label='Data points')

# Plot the plane
ax.plot_surface(X1_grid, X2_grid, y_grid, color='red', alpha=0.5)
ax.set_xlabel('X1')
ax.set_ylabel('X2')
ax.set_zlabel('y')
ax.set_title('Multiple Linear Regression Fit')
plt.show()
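
Note: normal_equation solves theta = (X^T X)^(-1) X^T y with an explicit matrix inverse, which can be numerically unstable or fail outright when features are collinear. A common, more robust alternative (an addition here, not the original method) is a least-squares solver or the pseudo-inverse:

# Sketch: numerically safer alternatives to the explicit inverse
# (assumes X_b and y from the program above)
theta_lstsq, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)
theta_pinv = np.linalg.pinv(X_b) @ y
print(theta_lstsq)  # should match theta_optimal when X_b has full column rank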
Experiment 6:
Implement a Python program to build logistic regression and decision tree models using the statsmodels and sklearn APIs.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier
import statsmodels.api as sm

# 1. Generate or Load Data
# Generate a synthetic dataset
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, n_redundant=2,
                           random_state=42)

# Convert to DataFrame for ease of use with statsmodels
data = pd.DataFrame(X, columns=[f'Feature_{i}' for i in range(X.shape[1])])
data['Target'] = y

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 2. Logistic Regression using statsmodels
# Add intercept term to the features
X_train_sm = sm.add_constant(X_train)

# Fit the model
logit_model = sm.Logit(y_train, X_train_sm)
logit_result = logit_model.fit()

# Display the summary
print(logit_result.summary())

# Predict on test set
X_test_sm = sm.add_constant(X_test)
y_pred_proba = logit_result.predict(X_test_sm)
y_pred = (y_pred_proba > 0.5).astype(int)

# Evaluate Logistic Regression
print("Logistic Regression Performance:")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# 3. Decision Tree using sklearn
# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize and train the Decision Tree model
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train_scaled, y_train)

# Predict on test set
y_pred_dt = dt_model.predict(X_test_scaled)

# Evaluate Decision Tree
print("Decision Tree Performance:")
print(confusion_matrix(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt))

# Optional: Visualize Decision Tree
from sklearn import tree
plt.figure(figsize=(20, 10))
tree.plot_tree(dt_model, filled=True, feature_names=[f'Feature_{i}' for i in range(X.shape[1])])
plt.title("Decision Tree Visualization")
plt.show()
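
Note: one step often added after fitting a statsmodels Logit (an addition here, not part of the original program) is to exponentiate the coefficients to read them as odds ratios.

# Sketch: interpret the Logit coefficients as odds ratios
# (assumes logit_result from the program above)
odds_ratios = np.exp(logit_result.params)
print("Odds ratio per unit increase in each feature:")
print(odds_ratios)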
Output:
Experiment 7:
Implement a Python program to perform the following activities: splitting the dataset into training and validation datasets, building a model on the training dataset using a Python package, and testing it on the validation dataset.

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# 1. Generate or Load Data
# Generate a synthetic classification dataset
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, n_redundant=2,
                           random_state=42)

# 2. Split the Data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Feature Scaling (important for models like logistic regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# 4. Build and Train the Models
# Logistic Regression
logistic_model = LogisticRegression(random_state=42)
logistic_model.fit(X_train_scaled, y_train)

# Predict on validation data
y_pred_logistic = logistic_model.predict(X_val_scaled)

# Evaluate Logistic Regression
print("Logistic Regression Performance:")
print(confusion_matrix(y_val, y_pred_logistic))
print(classification_report(y_val, y_pred_logistic))

# Decision Tree
decision_tree_model = DecisionTreeClassifier(random_state=42)
decision_tree_model.fit(X_train, y_train)

# Predict on validation data
y_pred_tree = decision_tree_model.predict(X_val)

# Evaluate Decision Tree
print("Decision Tree Performance:")
print(confusion_matrix(y_val, y_pred_tree))
print(classification_report(y_val, y_pred_tree))
Explanation:

1. Data Generation:
o A synthetic dataset is created with make_classification.
2. Data Splitting:
o train_test_split divides the data into training and validation sets. Here, 70% of
the data is used for training and 30% for validation.
3. Feature Scaling:
o StandardScaler is used to standardize features for the logistic regression model.
Scaling is not required for the decision tree but is important for models
sensitive to feature scales.
4. Model Building and Training:
o Logistic Regression: Model is trained on the scaled training data.
o Decision Tree: Model is trained on the original training data.
5. Model Evaluation:
o Logistic Regression: Predictions are made on the scaled validation set, and
performance metrics are printed.
o Decision Tree: Predictions are made on the validation set, and performance
metrics are printed.
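
Note: with only 200 samples, a single 70/30 split can give noisy estimates. A common extension (an addition here, assuming the same X and y as above) is k-fold cross-validation, with scaling placed inside a pipeline so each fold is scaled using only its own training part:

# Sketch: 5-fold cross-validation instead of a single split
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

logit_cv = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
scores = cross_val_score(logit_cv, X, y, cv=5, scoring='accuracy')
print(f"Logistic Regression CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")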

Output:

Logistic Regression Performance:
[[30  5]
 [ 7 18]]
              precision    recall  f1-score   support

           0       0.81      0.86      0.84        35
           1       0.78      0.72      0.75        25

    accuracy                           0.79        60
   macro avg       0.79      0.79      0.79        60
weighted avg       0.79      0.79      0.79        60

Decision Tree Performance:
[[29  6]
 [ 4 21]]
              precision    recall  f1-score   support

           0       0.88      0.83      0.85        35
           1       0.78      0.84      0.81        25

    accuracy                           0.82        60
   macro avg       0.83      0.83      0.83        60
weighted avg       0.83      0.82      0.82        60
