Meaningful Predictive Modeling Week-4 Assignment Cancer Disease Prediction

This document compares seven machine learning classifiers on a cancer dataset that loads as 568 rows and 32 columns. It imports the libraries, loads and preprocesses the data, splits it into training and test sets, trains the classifiers (Logistic Regression, KNN, Linear SVM, RBF SVM, Naive Bayes, Decision Tree, Random Forest), predicts on the test set, computes confusion matrices and accuracy scores, and plots the results. Logistic Regression achieved the highest test accuracy (97.89%) and the Decision Tree the lowest (90.14%).

Uploaded by

frankh

10/8/2020 Cancer

Meaningful Predictive Modeling Week-4 Assignment

CANCER DISEASE PREDICTION
In [1]:

#importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [16]:

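# Import the cancer dataset. read_csv() consumes the first row as the header
# (the labels in Out[17] are data values); header=None would keep that sample.

dataset = pd.read_csv('cancer.csv')
# Features: columns 2-30. The slice 2:31 omits the last column (index 31).
X = dataset.iloc[:, 2:31].values
# Target: the diagnosis label in column 1.
Y = dataset.iloc[:, 1].values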

In [17]:

dataset.head()

Out[17]:

     842302  M  17.99  10.38   122.8    1001   0.1184   0.2776  0.3001   0.1471  ...  25.38
0    842517  M  20.57  17.77  132.90  1326.0  0.08474  0.07864  0.0869  0.07017  ...  24.99
1  84300903  M  19.69  21.25  130.00  1203.0  0.10960  0.15990  0.1974  0.12790  ...  23.57
2  84348301  M  11.42  20.38   77.58   386.1  0.14250  0.28390  0.2414  0.10520  ...  14.91
3  84358402  M  20.29  14.34  135.10  1297.0  0.10030  0.13280  0.1980  0.10430  ...  22.54
4    843786  M  12.45  15.70   82.57   477.1  0.12780  0.17000  0.1578  0.08089  ...  15.47

5 rows × 32 columns

In [18]:

print("Cancer data set dimensions : {}".format(dataset.shape))

Cancer data set dimensions : (568, 32)
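
The column labels in Out[17] look like measurements rather than names, which suggests the file has no header line. The sketch below uses a hypothetical three-line CSV held in memory (not the real `cancer.csv`) to illustrate how the default `read_csv` call and `header=None` differ in that situation.

```python
import io

import pandas as pd

# Hypothetical three-line file with no header line, mimicking the layout of cancer.csv.
raw = "842302,M,17.99\n842517,M,20.57\n84300903,M,19.69\n"

# Default behaviour: the first line is consumed as column labels.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data and numbers the columns 0, 1, 2, ...
with_header_none = pd.read_csv(io.StringIO(raw), header=None)

print(with_default.shape)           # (2, 3)
print(with_header_none.shape)       # (3, 3)
print(list(with_default.columns))   # ['842302', 'M', '17.99']
```

If the real file behaves the same way, loading it with `header=None` would yield one more row than the 568 reported above.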

localhost:8888/nbconvert/html/Desktop/Cancer.ipynb?download=false 1/6

In [19]:

# Count missing values per column (isna() is an alias of isnull(), so one call is enough)
dataset.isnull().sum()

Out[19]:

842302 0
M 0
17.99 0
10.38 0
122.8 0
1001 0
0.1184 0
0.2776 0
0.3001 0
0.1471 0
0.2419 0
0.07871 0
1.095 0
0.9053 0
8.589 0
153.4 0
0.006399 0
0.04904 0
0.05373 0
0.01587 0
0.03003 0
0.006193 0
25.38 0
17.33 0
184.6 0
2019 0
0.1622 0
0.6656 0
0.7119 0
0.2654 0
0.4601 0
0.1189 0
dtype: int64

In [20]:

#Encoding categorical data values

from sklearn.preprocessing import LabelEncoder
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
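
To see which integer each diagnosis label receives, here is a minimal sketch on a toy label list. It assumes the diagnosis column holds only the strings `'B'` and `'M'`; only `'M'` is visible in the output above, so `'B'` is an assumption.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["M", "B", "B", "M"]  # toy diagnosis values, not the real column

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# LabelEncoder sorts the distinct labels, so 'B' maps to 0 and 'M' maps to 1.
print(encoder.classes_.tolist())                    # ['B', 'M']
print(encoded.tolist())                             # [1, 0, 0, 1]
print(encoder.inverse_transform([0, 1]).tolist())   # ['B', 'M']
```

Under that assumption, class 1 in the confusion matrices later in the notebook corresponds to `'M'`.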

In [21]:

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)
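
As a quick check on the split sizes, the sketch below applies the same `test_size` and `random_state` to a stand-in index array of 568 rows (the row count reported by `dataset.shape`). It does not use the real features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 568                 # row count reported by dataset.shape
indices = np.arange(n_samples)  # stand-in for the real feature rows

train_idx, test_idx = train_test_split(indices, test_size=0.25, random_state=0)

print(len(train_idx), len(test_idx))  # 426 142
```

The 142 test rows match the totals of the confusion matrices printed later, each of which sums to 142.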


In [22]:

#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
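
The scaler is fitted on the training set only and then reused on the test set. A small sketch with toy arrays (hypothetical values, not the cancer data) shows what that means in practice: the training columns end up with mean 0 and unit variance, while the test rows are shifted and scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: two features on very different scales.
X_train_toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test_toy = np.array([[4.0, 400.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_toy)  # learns mean and std from train
X_test_scaled = scaler.transform(X_test_toy)        # reuses the train mean and std

print(scaler.mean_)                  # per-feature training means
print(X_train_scaled.mean(axis=0))   # approximately 0 for each feature
print(X_train_scaled.std(axis=0))    # approximately 1 for each feature
print(X_test_scaled)                 # test row expressed in training units
```

Calling `fit_transform` on the test set instead would let test-set statistics leak into preprocessing.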


In [24]:

# Fit Logistic Regression to the training set
from sklearn.linear_model import LogisticRegression
classifier1 = LogisticRegression(random_state = 0)
classifier1.fit(X_train, Y_train)

# Fit k-nearest neighbors (k = 5, Euclidean distance: Minkowski metric with p = 2)
from sklearn.neighbors import KNeighborsClassifier
classifier2 = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier2.fit(X_train, Y_train)

# Fit a support vector machine with a linear kernel
from sklearn.svm import SVC
classifier3 = SVC(kernel = 'linear', random_state = 0)
classifier3.fit(X_train, Y_train)

# Fit a kernel SVM with an RBF kernel (SVC is already imported above)
classifier4 = SVC(kernel = 'rbf', random_state = 0)
classifier4.fit(X_train, Y_train)

# Fit Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
classifier5 = GaussianNB()
classifier5.fit(X_train, Y_train)

# Fit a decision tree using the entropy criterion
from sklearn.tree import DecisionTreeClassifier
classifier6 = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier6.fit(X_train, Y_train)

# Fit a random forest of 10 trees using the entropy criterion
from sklearn.ensemble import RandomForestClassifier
classifier7 = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier7.fit(X_train, Y_train)

C:\Users\ROHINI\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
  FutureWarning)

Out[24]:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=10,
                       n_jobs=None, oob_score=False, random_state=0, verbose=0,
                       warm_start=False)
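
The seven numbered classifier variables can also be kept in a dictionary and trained in a loop. The following is a sketch under stated assumptions, not the notebook's own code: it uses scikit-learn's bundled breast cancer data as a stand-in for `cancer.csv` (note that the bundled data encodes malignant as 0 and benign as 1), and `max_iter=1000` is an added setting to avoid convergence warnings. Its accuracies are therefore not expected to match the figures reported in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for cancer.csv.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale with statistics learned from the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Same seven model families as the notebook, keyed by short name.
models = {
    "LogR": LogisticRegression(random_state=0, max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
    "SVM": SVC(kernel="linear", random_state=0),
    "K-SVM": SVC(kernel="rbf", random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "RF": RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0),
}

# Fit each model and record its test accuracy in percent.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test)) * 100

for name, acc in accuracies.items():
    print(f"{name:6s}{acc:6.2f}")
```

Keeping the models in one mapping means the prediction, confusion-matrix, and accuracy steps can each be a single loop rather than seven near-identical lines.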


In [29]:

Y_pred1 = classifier1.predict(X_test)
Y_pred2 = classifier2.predict(X_test)
Y_pred3 = classifier3.predict(X_test)
Y_pred4 = classifier4.predict(X_test)
Y_pred5 = classifier5.predict(X_test)
Y_pred6 = classifier6.predict(X_test)
Y_pred7 = classifier7.predict(X_test)

In [30]:

from sklearn.metrics import confusion_matrix

cm1 = confusion_matrix(Y_test, Y_pred1)
cm2 = confusion_matrix(Y_test, Y_pred2)
cm3 = confusion_matrix(Y_test, Y_pred3)
cm4 = confusion_matrix(Y_test, Y_pred4)
cm5 = confusion_matrix(Y_test, Y_pred5)
cm6 = confusion_matrix(Y_test, Y_pred6)
cm7 = confusion_matrix(Y_test, Y_pred7)
print(cm1)
print(cm2)
print(cm3)
print(cm4)
print(cm5)
print(cm6)
print(cm7)

[[91  1]
 [ 2 48]]
[[91  1]
 [ 6 44]]
[[90  2]
 [ 4 46]]
[[92  0]
 [ 6 44]]
[[89  3]
 [ 6 44]]
[[84  8]
 [ 6 44]]
[[89  3]
 [ 6 44]]
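
Each matrix above follows scikit-learn's layout: rows are true classes and columns are predicted classes. The sketch below copies the printed matrices and derives accuracy, recall, and precision for class 1. Treating class 1 as the malignant label is an assumption that holds if the diagnosis column contains only `'B'` and `'M'`; the helper name `summarize` is illustrative.

```python
import numpy as np

# Confusion matrices copied from the printed output above
# (rows = true class, columns = predicted class).
matrices = {
    "LogR": [[91, 1], [2, 48]],
    "KNN": [[91, 1], [6, 44]],
    "SVM": [[90, 2], [4, 46]],
    "K-SVM": [[92, 0], [6, 44]],
    "NB": [[89, 3], [6, 44]],
    "DT": [[84, 8], [6, 44]],
    "RF": [[89, 3], [6, 44]],
}


def summarize(cm):
    """Return accuracy, recall and precision for class 1 of a 2x2 matrix."""
    tn, fp, fn, tp = np.asarray(cm).ravel().tolist()
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),      # share of true class-1 cases found
        "precision": tp / (tp + fp),   # share of class-1 predictions that are right
    }


for name, cm in matrices.items():
    s = summarize(cm)
    print(f"{name:6s} acc={s['accuracy']:.4f} "
          f"recall={s['recall']:.4f} precision={s['precision']:.4f}")
```

Read this way, six of the seven models miss 4 to 6 of the 50 class-1 cases, while Logistic Regression misses 2, a difference that the accuracy figure alone does not show.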


In [34]:

from sklearn.metrics import accuracy_score

acc1=accuracy_score(Y_test, Y_pred1)*100
acc2=accuracy_score(Y_test, Y_pred2)*100
acc3=accuracy_score(Y_test, Y_pred3)*100
acc4=accuracy_score(Y_test, Y_pred4)*100
acc5=accuracy_score(Y_test, Y_pred5)*100
acc6=accuracy_score(Y_test, Y_pred6)*100
acc7=accuracy_score(Y_test, Y_pred7)*100
print("LogR",acc1)
print("KNN",acc2)
print("SVM",acc3)
print("K-SVM",acc4)
print("NB",acc5)
print("DT",acc6)
print("RF",acc7)

LogR 97.88732394366197
KNN 95.07042253521126
SVM 95.77464788732394
K-SVM 95.77464788732394
NB 93.66197183098592
DT 90.14084507042254
RF 93.66197183098592
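
With 142 test samples, one prediction is worth about 0.70 percentage points, so the gap between KNN (95.07%) and SVM (95.77%) is a single sample. One way to get a steadier estimate is k-fold cross-validation. The sketch below is an illustration under assumptions, not part of the original notebook: it uses scikit-learn's bundled breast cancer data in place of `cancer.csv`, evaluates Logistic Regression only, and sets `max_iter=1000` as an added choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset stands in for cancer.csv.
X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits the scaler inside every fold, so no test-fold
# statistics leak into the preprocessing step.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five-fold cross-validated accuracy: one score per held-out fold.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")

print(scores.round(3))
print(f"mean {scores.mean():.3f}, std {scores.std():.3f}")
```

The spread across folds gives a rough sense of how much of a difference between two models could be due to the particular split.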

In [38]:

import matplotlib.pyplot as plt

data = [acc1, acc2, acc3, acc4, acc5, acc6, acc7]
labels = ['LogR', 'KNN', 'SVM', 'KSVM', 'NB', 'DT', 'RF']

# Bar chart of test accuracy per algorithm
plt.bar(range(len(data)), data,
        color=['pink', 'red', 'green', 'blue', 'cyan', 'yellow', 'purple'])
plt.xticks(range(len(data)), labels)
plt.xlabel('Algorithms')
plt.ylabel('Accuracy (%)')
plt.title('Comparison of Algorithms')
plt.show()
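
Because all seven accuracies fall between 90% and 98%, bars drawn from zero look nearly identical. A possible variant, sketched below with the printed accuracies rounded to two decimals, sorts the bars, narrows the y-axis, and labels each bar with its value. The axis limits, color, and the `Agg` backend line are illustrative assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this line inside a notebook
import matplotlib.pyplot as plt

# Accuracies from the output above, rounded to two decimals.
accuracies = {
    "LogR": 97.89, "KNN": 95.07, "SVM": 95.77, "K-SVM": 95.77,
    "NB": 93.66, "DT": 90.14, "RF": 93.66,
}

# Sort from best to worst so the ranking is visible at a glance.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
names = [name for name, _ in ranked]
values = [value for _, value in ranked]

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(names, values, color="steelblue")

# Narrow the y-axis so the few-point differences are readable.
ax.set_ylim(85, 100)
ax.set_xlabel("Algorithms")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Comparison of Algorithms")

# Write the numeric value above each bar.
for bar, value in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width() / 2, value + 0.2, f"{value:.2f}",
            ha="center", va="bottom")

fig.tight_layout()
# Call plt.show() in a notebook, or fig.savefig(...) in a script, to view the chart.
```

A truncated axis exaggerates differences visually, so the value labels are kept on the bars to make the actual magnitudes explicit.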
