Updated Presentation
Why I selected this dataset:
I selected this dataset because it provides a substantial amount of data, with 8,124 rows, making it ideal for machine learning models as it allows for robust training and
potentially higher accuracy. The dataset, sourced from the UCI Machine Learning Repository, is derived from reliable scientific observations, ensuring its credibility. The
dataset consists entirely of categorical data (e.g., cap shape, odor, habitat, and population), offering a rich opportunity for classification tasks. Additionally, the topic is
highly compelling, offering insights into the characteristics that determine whether a mushroom is edible or poisonous. This combination of a rich dataset, a reputable
source, and a meaningful subject makes it both an engaging and impactful project.
Key features and attributes in the dataset:
The dataset consists of 8,124 rows and 23 columns, providing a rich mix of categorical features that describe various mushroom characteristics. Each row represents a unique observation of a
mushroom, while the columns capture attributes such as cap shape, cap surface, cap color, and odor. All data in the dataset is categorical, with no missing values, making it well-suited for classification
tasks. The target column, class, specifies whether a mushroom is edible (e) or poisonous (p). This dataset offers a unique opportunity to uncover patterns and relationships among features to predict
the edibility of mushrooms with high accuracy (Dua and Graff, 2019).
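To make this concrete, below is a minimal loading sketch in Python. It assumes the raw data file is available at the standard UCI path and that the 23 column names match the dataset documentation; adjust the path if working from a local copy.

```python
# A minimal loading sketch (assumed path and column names, per the UCI
# dataset documentation); the raw file ships without a header row.
import pandas as pd

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "mushroom/agaricus-lepiota.data")
COLUMNS = [
    "class", "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
    "gill-attachment", "gill-spacing", "gill-size", "gill-color",
    "stalk-shape", "stalk-root", "stalk-surface-above-ring",
    "stalk-surface-below-ring", "stalk-color-above-ring",
    "stalk-color-below-ring", "veil-type", "veil-color", "ring-number",
    "ring-type", "spore-print-color", "population", "habitat",
]

df = pd.read_csv(URL, header=None, names=COLUMNS)
print(df.shape)                      # expected: (8124, 23)
print(df["class"].value_counts())    # e = edible, p = poisonous
```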
Problem Statement:
The problem at hand is the difficulty of identifying the characteristics that determine whether a mushroom is edible or poisonous, posing a potential risk to foragers and consumers.
Objective:
The goal of this project is to determine the characteristics that most accurately predict whether a mushroom is edible or poisonous.
Data Preprocessing, Cleaning and Transformation
No Outlier Removal
Outlier removal was not required as all features were categorical, and the dataset was already clean and well-structured.
Missing Values and Encoding
I checked for missing values (the dataset contains none) and encoded the categorical columns into numerical values, because machine learning algorithms require numerical inputs; for this I used one-hot encoding.
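As a rough illustration of these preprocessing steps, the sketch below assumes the `df` DataFrame loaded earlier. It checks for missing values and one-hot encodes the categorical features with pandas; `get_dummies` is one possible encoder, and scikit-learn's `OneHotEncoder` would work equally well.

```python
# A preprocessing sketch, assuming `df` is the DataFrame loaded earlier.
import pandas as pd

# 1. Check for missing values (the dataset is documented as complete).
print(df.isnull().sum().sum())       # expected: 0

# 2. Separate the target and one-hot encode the categorical features.
y = (df["class"] == "p").astype(int)                       # 1 = poisonous, 0 = edible
X = pd.get_dummies(df.drop(columns=["class"]), dtype=int)  # one column per category value
print(X.shape)
```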
Exploratory Data Analysis (EDA)
The three graphs highlight important patterns and relationships within the mushroom dataset. The first bar chart
visualizes the distribution of the target variable (class), showing the number of edible (e) and poisonous (p)
mushrooms. This balanced distribution ensures that both classes are well-represented for classification tasks. The
second graph explores the distribution of cap-shape, revealing that certain shapes, such as x and f, are more
prevalent, while others are less common. The third graph, a stacked bar chart, illustrates the relationship between
odor and habitat, showing how certain odors are dominant in specific habitats. For instance, n is prominent in
wooded areas (d), while other odors vary across habitats. Together, these visualizations provide valuable insights
into the distribution and interaction of features, helping to identify key patterns for predicting mushroom edibility.
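One possible way to reproduce the three plots described above, assuming `df` still holds the raw (un-encoded) data with the column names used earlier:

```python
# A sketch of the three EDA plots described above, assuming `df` holds the
# raw (un-encoded) data with the column names used earlier.
import matplotlib.pyplot as plt
import pandas as pd

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Distribution of the target variable (edible vs. poisonous).
df["class"].value_counts().plot(kind="bar", ax=axes[0], title="Class distribution")

# 2. Distribution of cap-shape values.
df["cap-shape"].value_counts().plot(kind="bar", ax=axes[1], title="Cap shape")

# 3. Stacked bar chart: odor counts within each habitat.
pd.crosstab(df["habitat"], df["odor"]).plot(
    kind="bar", stacked=True, ax=axes[2], title="Odor by habitat"
)

plt.tight_layout()
plt.show()
```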
Machine Learning Models
I will begin with a Logistic Regression model as a baseline due to its simplicity and effectiveness in
binary classification tasks. It provides interpretable coefficients, helping to understand the
relationship between features like odor and gill size with the target variable. Next, I will use a
Random Forest Classifier, which is reliable and capable of handling non-linear relationships. It also
provides feature importance metrics, which are crucial for identifying the most influential features,
such as cap shape and bruises. Finally, I will implement XGBoost, a more advanced gradient boosting
algorithm that refines predictions by reducing residual errors and capturing complex interactions
among features. These three models will enable a comprehensive evaluation of the dataset, ensuring
robust and accurate classification of mushrooms.
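A short sketch of how the three candidate models could be instantiated, assuming scikit-learn and the xgboost package are installed; the hyperparameters shown are illustrative defaults, not tuned values.

```python
# A sketch of the three candidate models, assuming scikit-learn and the
# xgboost package are installed; hyperparameters are illustrative, not tuned.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}
```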
Model Evaluation:
For model evaluation, I will use accuracy to measure how well the models correctly classify mushrooms
as edible or poisonous. Additionally, I will consider precision, recall, and the F1-score to evaluate the
performance of each model in distinguishing between the two classes. I will compare these metrics for
Logistic Regression, Random Forest, and XGBoost on the test data to determine which model performs
best (Hossain, Muhammad and Kwon, 2020).
I will train and test the Logistic Regression, Random Forest, and XGBoost models by splitting the dataset into training and testing subsets, using an 80/20 split. The models will be trained on the training data to learn the patterns and relationships between the features and the target variable, and then evaluated on the held-out test data.
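A minimal train-and-evaluate sketch under the 80/20 split described above, assuming the `X`, `y`, and `models` objects from the earlier snippets:

```python
# A minimal train-and-evaluate sketch, assuming `X`, `y`, and `models`
# from the earlier snippets. 80/20 split, stratified on the target.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: accuracy = {accuracy_score(y_test, y_pred):.4f}")
    print(classification_report(y_test, y_pred, target_names=["edible", "poisonous"]))
```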
Potential Challenges
My dataset is balanced, so I do not expect major class imbalance issues. However, I will monitor for skewed distributions in the target variable (class). If an imbalance is identified, I will address it using techniques such as oversampling with SMOTE or undersampling the majority class (Ganganwar, 2012).
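If an imbalance did appear, one possible remedy is sketched below; it assumes the imbalanced-learn package and applies SMOTE to the training split only, after one-hot encoding, so the test set remains untouched.

```python
# A hypothetical remedy if the classes were imbalanced, assuming the
# imbalanced-learn package. SMOTE is fitted on the training split only.
from collections import Counter
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train_res, y_train_res = smote.fit_resample(X_train, y_train)
print(Counter(y_train_res))          # both classes should now have equal counts
```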
I will handle the categorical variables by converting them into numerical representations
using one-hot encoding for all non-numeric features. This ensures that the machine
learning models can interpret the data accurately and effectively (Ganganwar, 2012).
Conclusion
In conclusion, by leveraging Logistic Regression, Random Forest, and XGBoost models,
I aim to accurately classify mushrooms as edible or poisonous while identifying the key
features driving these classifications. Logistic Regression provides a simple and
interpretable baseline, offering insights into the relationships between features and
the target variable. Random Forest serves as a robust model, highlighting feature
importance and handling non-linear relationships effectively. XGBoost builds on this by
refining predictions through its advanced capabilities in capturing complex interactions
among features. Looking ahead, I plan to improve the models by experimenting with
hyperparameter tuning and exploring additional feature engineering techniques.
Incorporating larger or more diverse datasets in the future could further enhance the
accuracy and reliability of the analysis.
References
Dua, D., & Graff, C. (2019). Mushroom Dataset. [online] UCI Machine Learning Repository. Available at: https://archive.ics.uci.edu/ml/datasets/mushroom (Accessed: 19 November
2024).
Singh, A., Halgamuge, M.N., & Lakshmiganthan, R. (2017). Impact of Different Data Types on Classifier Performance of Random Forest, Naïve Bayes, and K-Nearest Neighbors Algorithms.
International Journal of Advanced Computer Science and Applications, 8(12). doi: https://doi.org/10.14569/ijacsa.2017.081201.
Ganganwar, V. (2012). An overview of classification algorithms for imbalanced datasets. International Journal of Emerging Technology and Advanced Engineering, 2(4), 42–47.
Available at: https://www.researchgate.net/publication/292018027 (Accessed: 19 November 2024).
Hossain, M.S., Muhammad, G., & Kwon, J. (2020). A survey of big data architectures and machine learning algorithms for internet of things (IoT). Journal of Sensors, 15(8), 260.
Available at: https://www.mdpi.com/1999-5903/15/8/260 (Accessed: 17 November 2024).