
Enrollment No :202103103510280

PRACTICAL:5

Aim: To implement principal component analysis.


Principal Component Analysis (PCA) is a statistical technique used for reducing the
dimensionality of data while preserving its important information. It's commonly employed in
data analysis and machine learning to simplify datasets with a large number of variables into a
smaller set of derived variables called principal components.

The key steps in PCA include:


1. Standardization: Normalize the dataset to have a mean of zero and a standard
deviation of one for each variable to ensure they are on the same scale.
2. Calculation of Covariance Matrix: Determine the covariance matrix of the standardized
data, which shows the relationships between variables.
3. Eigenvalue Decomposition: Compute the eigenvectors and eigenvalues of the
covariance matrix. Eigenvectors represent the directions (principal components)
of maximum variance, and eigenvalues indicate the magnitude of variance
along these directions.
4. Selection of Principal Components: Sort the eigenvectors based on their corresponding
eigenvalues in descending order. The principal components are chosen according to the
top eigenvalues, as they explain the most variance in the data.
5. Projection: Transform the original data into the new feature space formed by the
selected principal components. This transformation reduces the dimensions while
retaining most of the information present in the original dataset.
PCA is widely used in various fields such as image processing, pattern recognition,
finance, and many others to simplify complex datasets, remove redundant information, and
facilitate further analysis or visualization of data.

Here, we are performing Principal Component Analysis (PCA) on the Iris dataset
using Python.


Step 1: Import necessary libraries

This code block imports necessary libraries for data visualization and dimensionality reduction
using Principal Component Analysis (PCA). It includes Matplotlib for plotting, Pandas for data
manipulation, and scikit-learn's StandardScaler for feature scaling and PCA for dimensionality
reduction.
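A minimal sketch of this import block, matching the libraries described above, is:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA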

Step 2: Load the Iris dataset

Output:


This code block fetches the Iris dataset from the given URL and loads it into a Pandas
DataFrame named 'df'. The dataset contains measurements of sepal length, sepal width, petal
length, and petal width for different iris flowers, with the corresponding target variable
indicating the species of each iris. The 'names' parameter assigns column names to the
DataFrame. The resulting DataFrame 'df' is then printed, displaying the tabular representation
of the Iris dataset.
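The exact URL is not reproduced here; assuming the commonly used UCI Machine Learning Repository copy of the Iris data, the loading step might look like this:

# Assumed source URL for the Iris data (UCI Machine Learning Repository)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'target']
df = pd.read_csv(url, names=names)
print(df)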

Step 3: Standardize the data

Output:

In this code block, a list named 'features' is defined, containing the names of the four features in
the Iris dataset. The features (sepal length, sepal width, petal length, and petal width) are then
extracted from the previously loaded DataFrame 'df' and stored in the variable 'x'. The target
variable ('target', indicating the iris species) is extracted and stored in the variable 'y'. The
features in 'x' are then standardized using the StandardScaler from scikit-learn, ensuring that
they have a mean of 0 and a standard deviation of 1. Finally, the standardized feature values for
the first 5 rows are printed to the console using 'print(x[:5, :])'.


Standardizing features is a common preprocessing step in machine learning to ensure that all features contribute equally to the analysis, particularly in methods sensitive to the scale of input variables.
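A sketch of this standardization step, consistent with the variable names used above, is:

features = ['sepal length', 'sepal width', 'petal length', 'petal width']
x = df.loc[:, features].values         # feature matrix
y = df.loc[:, ['target']].values       # target (iris species)
x = StandardScaler().fit_transform(x)  # zero mean, unit variance
print(x[:5, :])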

Step 4: PCA projection to 2D


Output:

In this code block, a Principal Component Analysis (PCA) with two components is applied to
the standardized feature matrix 'x' using the 'PCA' class from scikit-learn. The resulting
principal components are stored in 'principalComponents'. These components are then used to


create a new DataFrame 'principalDf' with columns named 'principal component 1' and
'principal component 2'. The first 5 rows of 'principalDf' are printed. Additionally, the target
variable ('target') and the first 5 rows of the original DataFrame 'df' are printed to demonstrate
the correspondence between the reduced-dimensional data and the original dataset. Finally, a
new DataFrame 'finalDf' is created by concatenating 'principalDf' with the 'target' column
from the original DataFrame, providing a consolidated DataFrame that includes the principal
components along with the target variable for further analysis or visualization. This process is
often used to reduce the dimensionality of the data for visualization or modeling purposes
while retaining essential information captured by the principal components.
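A sketch of this projection step is:

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
print(principalDf.head(5))
print(df[['target']].head())
finalDf = pd.concat([principalDf, df[['target']]], axis=1)  # components plus target
print(finalDf.head(5))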

Step 5: Visualize 2D Projection

Output:
CGPIT/CE/SEM-6/Machine Intelligence
Enrollment No :202103103510280

This code block creates a scatter plot using Matplotlib to visualize the reduced-dimensional
representation of the Iris dataset obtained through PCA. The figure is set to be 8x6 inches, and a
subplot is added to the figure. The x-axis and y-axis labels are set, and the title is specified.
Three target classes ('Iris-setosa', 'Iris-versicolor', 'Iris-virginica') are assigned different colors ('r'
for red, 'g' for green, 'b' for blue). For each target class, a scatter plot is generated by identifying
the corresponding indices in the 'finalDf' DataFrame and plotting the values of the first two
principal components. The size of the points is set to 50, and a legend is added to the plot
indicating the target classes. Finally, grid lines are added to enhance readability. This
visualization provides insights into the distribution and separation of the iris species in the
reduced two-dimensional space.
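A sketch of the plotting code described above is:

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize=15)
ax.set_ylabel('Principal Component 2', fontsize=15)
ax.set_title('2 component PCA', fontsize=20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1'],
               finalDf.loc[indicesToKeep, 'principal component 2'],
               c=color, s=50)
ax.legend(targets)
ax.grid()
plt.show()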


PRACTICAL:6

Aim: Write a program to apply a decision tree classifier on the Pima Indian diabetes dataset.

A decision tree classifier is a machine learning model that learns to classify data by creating a
tree-like structure of rules based on the features and labels of the training data. The tree
consists of nodes and branches, where each node represents a test or a decision on a feature,
and each branch represents an outcome or a value of that feature. The leaf nodes at the bottom
of the tree contain the class labels or the predictions for the data.
The decision tree classifier works by recursively splitting the data into smaller subsets based
on the feature that best separates the classes. The feature is chosen by using a criterion such as
entropy or Gini impurity, which measures the level of disorder or uncertainty in the data. The
goal is to find the feature that maximizes the information gain or the reduction in entropy or
impurity after the split. The process stops when all the data in a subset belong to the same
class, or when a predefined limit such as the maximum depth of the tree or the minimum
number of samples in a node is reached.

The decision tree classifier can handle both numerical and categorical features, and can also
deal with missing values by assigning them to the most frequent value or the most probable
class. The decision tree classifier is easy to understand and interpret, as it provides a visual
representation of the logic behind the classification. However, it also has some drawbacks,
such as being prone to overfitting, being sensitive to noise and outliers, and being unstable due
to small changes in the data.

How does the decision tree algorithm work?


The basic idea behind any decision tree algorithm is as follows:

1. Select the best attribute using Attribute Selection Measures (ASM) to split the records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Start tree building by repeating this process recursively for each child until one of the following conditions is met:
• All the tuples belong to the same attribute value.
• There are no more remaining attributes.
• There are no more instances.

Here, we are applying decision tree classifier on the Pima Indian diabetes dataset using Python.


Step 1: Import necessary libraries

This code block imports the necessary libraries for the decision tree classifier. It starts by importing
necessary libraries, including Pandas for data manipulation and scikit-learn for machine learning.
The DecisionTreeClassifier from scikit-learn is then imported to create a Decision Tree model.
Additionally, the train_test_split function is imported for splitting the dataset into training and
testing sets, and the metrics module is imported to evaluate the model's accuracy.
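A minimal sketch of these imports is:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier        # Decision Tree model
from sklearn.model_selection import train_test_split   # train/test splitting
from sklearn import metrics                             # accuracy evaluation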

Step 2: Load the dataset

Output:

In this code block, a dataset related to Pima Indian women's health, specifically focusing on
diabetes, is loaded into a Pandas DataFrame. The dataset is obtained from a given URL and
has columns representing attributes such as the number of pregnancies, glucose levels, blood
pressure, skin thickness, insulin levels, body mass index (BMI), diabetes pedigree function,
age, and a binary label indicating the presence or absence of diabetes. The read_csv function is

used to read the data, skipping the first row (header) and assigning custom column names
specified in the col_names list. The resulting DataFrame, named 'pima,' is then displayed using
the head() function, providing a glimpse of the first few rows of the dataset.
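The URL and the short column names below are assumptions (a commonly used mirror of the Pima Indians diabetes CSV and the usual abbreviated names); adjust them to match the file actually used in the practical:

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"  # assumed mirror
pima = pd.read_csv(url, skiprows=1, names=col_names)  # skiprows=1 skips the first (header) row
print(pima.head())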

Step 3: Split the dataset into features and target variable

In this code block, the dataset is split into features (X) and the target variable (y). The features
are selected from the 'pima' DataFrame using the specified columns in the 'feature_cols' list,
which includes attributes such as the number of pregnancies, insulin levels, BMI, age, glucose
levels, blood pressure, and the diabetes pedigree function. These features are stored in the
variable X. The target variable y is assigned the values from the 'label' column in the 'pima' DataFrame, representing whether an individual has diabetes (1) or not (0). This separation of features and the target variable is a common preprocessing step before training a machine learning model, allowing for effective training and evaluation.
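A sketch of this step is:

feature_cols = ['pregnant', 'insulin', 'bmi', 'age', 'glucose', 'bp', 'pedigree']
X = pima[feature_cols]   # features
y = pima['label']        # target: 1 = diabetes, 0 = no diabetes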

Step 4: Split dataset into training set and test set

This code block utilizes the train_test_split function from scikit-learn to partition the dataset
into training and test sets. The feature matrix (X) and target variable (y) obtained from the
previous step are split into training sets (X_train and y_train) and test sets (X_test and
y_test). The parameter test_size=0.3 indicates that 30% of the data will be used for testing,
while the remaining 70% will be utilized for training the machine learning model. The
random_state=1 parameter ensures reproducibility by fixing the random seed during the
splitting process, resulting in consistent training and evaluation sets across multiple runs.
This step is crucial for assessing the model's generalization performance on unseen data.
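A sketch of the split is:

# 70% training, 30% testing; random_state fixed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)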


Step 5: Building decision tree model

In this code block, a Decision Tree Classifier is instantiated using the DecisionTreeClassifier
class from scikit-learn, creating an object named clf. Subsequently, the classifier is trained
using the training sets (X_train and y_train) through the fit method. This process involves the
algorithm recursively splitting the data based on features to construct a decision tree that can
make predictions. Once the model is trained, it is applied to the test dataset (X_test) using the
predict method, generating predictions stored in the variable y_pred. The decision tree model
is now ready for evaluation and analysis of its predictive performance on the test data.
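A sketch of the model-building step is:

clf = DecisionTreeClassifier()     # create the Decision Tree classifier object
clf = clf.fit(X_train, y_train)    # train on the training sets
y_pred = clf.predict(X_test)       # predict on the test set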

Step 6: Evaluating the model

Output:
Accuracy: 0.6796536796536796
This code block assesses the accuracy of the Decision Tree Classifier model on the test dataset. The accuracy_score function from scikit-learn's metrics module is utilized to compare the predicted labels (y_pred) with the actual labels (y_test). The result, printed as "Accuracy," represents the proportion of correctly classified instances. In this specific output, the accuracy is approximately 0.68, indicating that the model correctly predicted the target variable for around 68% of the instances in the test set. Evaluating accuracy is a common metric to gauge the overall performance of a classification model.
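A sketch of the evaluation step is:

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))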


Step 7: Visualizing decision trees

Output:


In this code block, the Decision Tree Classifier (clf) is visualized using Graphviz. The export_graphviz function generates a DOT format representation of the decision tree, which is stored in the dot_data variable. The pydotplus library is then used to create a graphical representation of the tree from the DOT data. The resulting image is saved as 'diabetes.png,' and the Image module from IPython is employed to display the visualized decision tree directly within the Colab notebook. This visualization provides a detailed overview of the decision-making process and the structure of the trained decision tree model.
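A sketch of this visualization, assuming Graphviz is installed and using the standard export_graphviz/pydotplus workflow, is:

from sklearn.tree import export_graphviz
from io import StringIO
from IPython.display import Image
import pydotplus

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data, filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols,
                class_names=['0', '1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  # build graph from the DOT data
graph.write_png('diabetes.png')                             # save the tree image
Image(graph.create_png())                                   # display inline in the notebook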

PRACTICAL:7

Aim: Write a program to classify the various types of iris flowers in the Iris dataset using Support
Vector Machine (SVM).
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. The primary objective of SVM is to find a hyperplane in an N-dimensional space (where N is the number of features) that distinctly classifies the data into different classes. This hyperplane is chosen in such a way that it maximizes the margin between the classes. The margin is the distance between the hyperplane and the nearest data point of each class.

Key Concepts:
1. Support Vectors:
• SVM works by finding the hyperplane that best separates the data into
different classes.
• Support Vectors are the data points that lie closest to the decision
boundary (hyperplane).
• These vectors are critical in determining the optimal hyperplane and


ultimately the classification boundary.


2. Hyperplane:
• In a two-dimensional space, a hyperplane is a simple line.
• In a three-dimensional space, it's a plane.
• For higher dimensions, it's referred to as a hyperplane.
• The goal of SVM is to find the hyperplane that best separates the data
into classes.
3. Margin:
• The margin is the distance between the hyperplane and the nearest
data point from either class.
• SVM aims to maximize this margin, resulting in a more robust classifier.
4. Kernel Trick:
• SVM can handle non-linear decision boundaries by transforming the
input features into a higher-dimensional space.

• This is done using kernel functions (e.g., polynomial, radial basis function)
to map the data into a space where a hyperplane can effectively
separate it.
5. C parameter:
• SVM has a regularization parameter denoted as 'C.'
• C determines the trade-off between having a smooth decision boundary
and classifying training points correctly.
• A smaller C allows for a softer decision boundary, while a larger C aims
for a more accurate classification on the training data.

Steps in SVM:
1. Data Collection:
• Gather a dataset with labelled samples.


2. Choose a Kernel Function:
• Select a suitable kernel function based on the nature of the data. Common choices include linear, polynomial, and radial basis function (RBF) kernels.
3. Model Training:
• Train the SVM model using the training dataset. The algorithm optimizes the hyperplane to maximize the margin.
4. Parameter Tuning:
• Adjust hyperparameters, such as the choice of kernel and the regularization parameter C, to optimize the model's performance.
5. Prediction:
• Use the trained model to predict the class labels of new, unseen data.

Advantages of SVM:
• Effective in high-dimensional spaces.
• Robust in the presence of outliers.
• Versatile with various kernel functions for different data types.

Limitations of SVM:
• Computationally expensive, especially for large datasets.
• The choice of the kernel and parameters requires careful tuning.
• It may not perform well when the number of features is much greater than the number of samples.

Step 1: Import necessary libraries


This code block imports the necessary Python libraries for implementing and evaluating a Support Vector Machine (SVM) on a dataset. It includes NumPy for numerical operations, Matplotlib for data visualization, scikit-learn's datasets module to load the Iris dataset, train_test_split for splitting the data into training and testing sets, SVC for creating an SVM classifier, and accuracy_score for evaluating the accuracy of the classifier. The SVM will be trained and tested on the Iris dataset, with the ultimate goal of predicting and assessing the accuracy of the classification results.
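A minimal sketch of these imports is:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score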

Step 2: Load the Iris dataset

In this code block, the Iris dataset is loaded using scikit-learn's datasets module. The dataset
consists of four features for each sample, but for simplicity, only the first two features are
selected and stored in the variables X and y. X represents the feature matrix containing sepal
length and sepal width, while y contains the corresponding target labels denoting the species of
iris flowers. This reduced feature set simplifies the visualization and classification task while
retaining essential information for training an SVM classifier.
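A sketch of this step is:

iris = datasets.load_iris()
X = iris.data[:, :2]   # first two features: sepal length, sepal width
y = iris.target        # species labels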

Step 3: Splits the data into training and testing sets

This code block uses scikit-learn's train_test_split function to split the Iris dataset into training and testing sets. The features (X) and target labels (y) are divided into X_train, X_test, y_train, and y_test, respectively. The parameter test_size=0.2 indicates that 20% of the data will be used for testing, while the remaining 80% will be used for training the Support Vector Machine (SVM) classifier. The random_state=42 parameter ensures reproducibility by fixing the random seed for the data split, allowing consistent results across different runs.
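A sketch of the split is:

# 80% training, 20% testing, with a fixed random seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)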

Step 4: Creates an SVM classifier with a linear kernel and trains it on the
training data.


In this code block, a Support Vector Machine (SVM) classifier is created and trained using
scikit-learn's SVC (Support Vector Classification) class with a linear kernel. The kernel='linear' parameter specifies that a linear decision boundary should be used for classification. The classifier is then trained on the training set (X_train, y_train). Subsequently,
predictions are made on the test set (X_test), and the predicted labels are stored in the variable
y_pred. This process allows the evaluation of the classifier's performance on unseen data, which
will be assessed further using accuracy metrics.
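A sketch of this step is:

svm_classifier = SVC(kernel='linear')    # linear decision boundary
svm_classifier.fit(X_train, y_train)     # train on the training set
y_pred = svm_classifier.predict(X_test)  # predict on the test set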

Step 5: Evaluates the accuracy of the classifier on the test set

Output:
Accuracy: 0.90
In this code block, the accuracy of the Support Vector Machine (SVM) classifier is evaluated by
comparing its predictions (y_pred) on the test set (X_test) with the true labels (y_test). The
accuracy_score function from scikit-learn's metrics module is used to calculate the accuracy,
which represents the proportion of correctly classified instances. The result is then printed to the
console, providing a quantitative measure of the SVM classifier's performance on the unseen
data. The accuracy score is a value between 0 and 1, with higher values indicating better
classification performance.
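A sketch of the evaluation step is:

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")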

Step 6: Visualizes the data points and decision boundary of the SVM classifier


Output:


PRACTICAL:8


Aim: Write a program to implement K-means clustering on the Iris dataset.

K-Means is an iterative and unsupervised machine learning algorithm that partitions a dataset
into K distinct, non-overlapping subsets or clusters. The algorithm aims to group similar data
points together and separate different groups based on certain features or attributes.
Algorithm Steps:

1. Initialization:
• Choose the number of clusters (K) that you want to identify in the dataset.
• Randomly initialize K centroids, one for each cluster. Centroids represent the mean position of all the points in a cluster.
2. Assignment:
• For each data point in the dataset, calculate the Euclidean distance to each centroid.
• Assign the data point to the cluster whose centroid is closest.
3. Update:
• Recalculate the centroids for each cluster as the mean of all data points assigned to that cluster.
4. Iteration:
• Repeat the assignment and update steps until convergence.
• Convergence occurs when the assignment of data points to clusters stabilizes, and centroids no longer change significantly.

Key Characteristics:
• Centroids: K-Means defines clusters by their centroids, which represent the center of mass for
the points in a cluster.
• Euclidean Distance: The algorithm uses Euclidean distance to measure the dissimilarity
between data points and centroids.
• Scalability: K-Means is computationally efficient and scalable to large datasets.


• Sensitivity to Initialization: The final clustering result can be sensitive to the initial placement
of centroids. Multiple runs with different initializations may be performed to mitigate this.
• Number of Clusters (K): The number of clusters needs to be predefined, and the algorithm
assumes that the data can be well-represented by this number. K-Means clustering is widely used
due to its simplicity, efficiency, and effectiveness in a variety of applications.
Here, we are applying K-means clustering on the iris dataset using Python.
Step 1: Import necessary libraries:

In this code block, the necessary libraries for implementing K-Means clustering on the Iris
dataset are imported. NumPy (np) is used for numerical operations, Pandas (pd) for data
manipulation, and Matplotlib (plt) for data visualization. The KMeans class from Scikit-Learn is
imported to perform the K-Means clustering algorithm, and the load_iris function is used to load
the Iris dataset. Additionally, StandardScaler from Scikit-Learn is imported to standardize the
features, ensuring that they have zero mean and unit variance, which is a common preprocessing
step for K-Means clustering to achieve better performance.
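A minimal sketch of these imports is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler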

Step 2: Load the Iris dataset:

In this code block, the Iris dataset is loaded using the load_iris function from Scikit-Learn. The
data matrix X contains the features of the dataset, and feature_names stores the names of these
features. The Iris dataset is a well-known benchmark dataset in machine learning, containing
measurements of sepal length, sepal width, petal length, and petal width for three different
species of iris flowers. This code block prepares the data for subsequent processing and analysis
within the K-Means clustering algorithm.
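A sketch of this step is:

iris = load_iris()
X = iris.data                       # feature matrix
feature_names = iris.feature_names  # names of the four features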

Step 3: Standardizes the features using StandardScaler:


In this code block, the features of the Iris dataset stored in the matrix X are standardized to have
zero mean and unit variance using the StandardScaler from Scikit-Learn. Standardization is a
preprocessing step commonly applied in K-Means clustering to ensure that all features contribute
equally to the clustering process, as it minimizes the impact of differences in the scales of
different features. The standardized feature matrix X_std is then obtained by fitting the scaler to
the original data (X) and transforming it accordingly. This standardization enhances the
performance and convergence of the K-Means algorithm by preventing features with larger
scales from dominating the clustering process.
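A sketch of the standardization step is:

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # zero mean, unit variance for every feature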

Step 4: Applies K-Means clustering with K=3 clusters:

In this code block, the K-Means clustering algorithm is applied to the standardized Iris dataset
(X_std) using the KMeans class from Scikit-Learn. The parameter n_clusters=3 specifies that the
algorithm should identify three clusters, corresponding to the three different species of iris
flowers in the dataset. The n_init=10 parameter determines the number of times the algorithm is
run with different initial centroids, and the result with the lowest inertia (sum of squared
distances from points to centroids) is selected. Setting random_state=42 ensures reproducibility
of results. The fit method then performs the actual clustering, assigning each data point to one
of the identified clusters based on their similarity to the cluster centroids.
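A sketch of this step is:

# K-Means with 3 clusters, 10 centroid initializations, and a fixed random seed
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X_std)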

Step 5: Get cluster labels and centroids:


In this code block, after applying the K-Means clustering algorithm, the cluster labels assigned
to each data point are obtained using the labels_ attribute of the K-Means model (kmeans).
Each label indicates the cluster to which the corresponding data point belongs. Additionally, the
coordinates of the centroids of the identified clusters are retrieved using the cluster_centers_
attribute. These centroids represent the average position of the data points within their
respective clusters. Both the cluster labels and centroids are important outputs for further
analysis and interpretation of the clustering results.
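A sketch of this step is:

labels = kmeans.labels_              # cluster assignment for each sample
centroids = kmeans.cluster_centers_  # coordinates of the cluster centroids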

Step 6: Visualizes the clustered data points in a 2D space and marks the cluster centroids with red 'X' markers:
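The plotting code is not described in detail; an illustrative sketch using the first two standardized features is:

plt.figure(figsize=(8, 6))
plt.scatter(X_std[:, 0], X_std[:, 1], c=labels, cmap='viridis', s=50)  # clustered points
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X',
            s=200, label='Centroids')                                  # cluster centroids
plt.xlabel(feature_names[0] + ' (standardized)')
plt.ylabel(feature_names[1] + ' (standardized)')
plt.title('K-Means clustering of the Iris dataset (K=3)')
plt.legend()
plt.show()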

Output:
