DSBDA - Mini Project Report

Uploaded by

omkarshinde3905

SCTR's Pune Institute of Computer Technology

Dhankawadi, Pune

A PROJECT REPORT ON
Covid Vaccine Statewise Analysis

SUBMITTED BY
Omkar Shinde (31480)
Amey Wadgaonkar (31492)

Under the guidance of

Prof. Rutuja Kulkarni

DEPARTMENT OF COMPUTER ENGINEERING

Academic Year 2023-24
Title:
Mini-Project
Exploratory data analysis of the COVID-19 vaccination data of India using the given dataset.

Problem Statement:
Use the covid_vaccine_statewise.csv dataset available at
https://www.kaggle.com/sudalairajkumar/covid19-in-india?select=covid_vaccine_statewise.csv
and perform the following analytics:
● Describe the dataset
● State-wise number of persons vaccinated with the first dose in India
● State-wise number of persons vaccinated with the second dose in India
● Number of males vaccinated
● Number of females vaccinated

Objectives:
● To describe the dataset
● To preprocess the given dataset

Theory:
Libraries:
● Pandas - Pandas is a Python library for data manipulation and analysis. Its two core data structures,
the DataFrame (a two-dimensional labelled table) and the Series (a single labelled column), simplify
cleaning and preprocessing datasets before analysis or machine learning (ML). It supports selecting,
filtering, merging and transforming data, which lets users handle missing values and outliers and carry
out feature engineering. Pandas is built on NumPy and works directly with ML libraries such as
scikit-learn, and it can read and write many file formats (for example CSV, Excel and JSON), so it fits
easily into workflows that draw on diverse data sources. For exploratory data analysis, data
preprocessing and feature extraction alike, it provides a versatile and efficient toolkit.
● NumPy - NumPy is the fundamental library for numerical computing in Python and is widely used in
machine learning (ML) applications. Its array-oriented design allows efficient manipulation of large
multi-dimensional arrays and matrices, which are central to many ML algorithms. NumPy's mathematical
functions operate on whole arrays at once (vectorized operations), which is much faster than looping over
individual elements in Python. Data preprocessing, feature extraction and model evaluation all benefit
from this efficient handling of numerical data. NumPy also integrates with other libraries such as
Pandas, scikit-learn and TensorFlow, so data can be exchanged between them easily.

● Sklearn - Scikit-learn, a widely used machine learning library for Python, provides a comprehensive
set of tools for ML workflows. Its algorithms and utilities cover classification, regression, clustering,
dimensionality reduction and model selection, so practitioners can implement and compare different
models without building everything from scratch. The library offers a consistent API (estimators
expose fit() and predict(), and transformers expose transform()), which makes it straightforward to
preprocess data, split datasets into training and test sets, and evaluate model performance with various
metrics. Scikit-learn also includes modules for feature extraction, feature selection and
hyperparameter tuning, which help researchers fine-tune models for better performance.
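As a minimal sketch of the Pandas and NumPy capabilities described above, the snippet below builds a small DataFrame by hand, fills a missing value, filters rows with a boolean mask, and computes a vectorized second-to-first-dose ratio. The state labels, column names and numbers are hypothetical and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical state-level totals (illustrative values, not from the dataset).
df = pd.DataFrame({
    "State": ["A", "B", "C"],
    "First Dose Administered": [1000.0, 500.0, np.nan],
    "Second Dose Administered": [400.0, 250.0, 0.0],
})

# Pandas: fill the missing first-dose value with 0.
df["First Dose Administered"] = df["First Dose Administered"].fillna(0)

# Pandas: boolean-mask filtering keeps only states that reported first doses.
active = df[df["First Dose Administered"] > 0]
print(active["State"].tolist())  # ['A', 'B']

# NumPy: vectorized second/first-dose ratio over whole arrays (no Python loop).
# where= skips the division where first doses are zero, leaving 0.0 there.
first = df["First Dose Administered"].to_numpy()
second = df["Second Dose Administered"].to_numpy()
ratio = np.divide(second, first, out=np.zeros_like(second), where=first > 0)
print(ratio)
```

Using np.divide with out= and where= avoids a division-by-zero warning for state C instead of cleaning up infinities afterwards.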

Methods used:

● read_csv() - The pandas read_csv() function takes a path (or URL, or file-like object) to a CSV file and
reads the data into a Pandas DataFrame.

● describe() - The describe() method returns summary statistics for the data in the DataFrame. For each
numeric column it reports: count (the number of non-null values), mean (the average value), std (the
standard deviation), min, the 25th, 50th and 75th percentiles, and max.

● groupby() and sum() - DataFrame.groupby().sum() groups rows by one or more columns and computes the
sum of each remaining column within each group. groupby() returns a DataFrameGroupBy object, and
calling sum() on it adds up the values of each column separately for every group. The operation follows
the split-apply-combine pattern: the data is split into groups, a function is applied to each group, and
the results are combined into a new DataFrame. This makes it practical to summarize large amounts of
data, for example the total doses per state.

The DataFrame.sum() method returns the sum of the values along the requested axis. With the default
axis=0 (the index axis), it adds up all the values in each column and returns a Series containing one
total per column.
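The four methods above can be tried on an in-memory CSV. This sketch uses io.StringIO in place of the real file path, and the column names (State, First Dose Administered, Second Dose Administered) and values are assumptions for illustration, not an excerpt of covid_vaccine_statewise.csv.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; read_csv accepts any file-like object.
csv_text = """State,First Dose Administered,Second Dose Administered
A,100,40
A,150,60
B,80,20
"""
df = pd.read_csv(io.StringIO(csv_text))

# describe(): count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary.loc[["count", "mean"]])

# groupby().sum(): split rows by State, then sum each selected column per group.
dose_cols = ["First Dose Administered", "Second Dose Administered"]
per_state = df.groupby("State")[dose_cols].sum()
print(per_state)

# sum() with the default axis=0 adds down each column: one total per column.
totals = df[dose_cols].sum()
print(totals)
```

Selecting the dose columns before calling sum() keeps the aggregation to numeric data, so text columns (such as dates in the real file) are not aggregated.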
System Architecture:

Methodology:
1. Data collection: Gather the relevant data from reliable sources.
2. Data loading: Load the data into a suitable data structure (e.g., DataFrame) using a programming
language like Python.
3. Data overview: Get an initial understanding of the dataset by examining its structure, dimensions, and
basic statistical summaries.
4. Data cleaning: Handle missing values, duplicates, and outliers in the dataset to ensure data quality.
5. Data visualization: Create visual representations such as histograms, scatter plots, and box plots to
understand patterns, relationships, and distributions in the data.
6. Feature engineering: Extract or transform features to derive new meaningful variables that can
enhance the analysis and model performance.
7. Statistical analysis: Apply statistical techniques to uncover insights, correlations, and associations
within the data.
8. Data segmentation: Group and explore the data based on different criteria (e.g., demographics, time
periods) to uncover patterns and differences.
9. Conclusion and reporting: Summarize findings, draw conclusions, and present the results of the EDA
process in a clear and concise manner.
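The methodology steps above can be sketched for the problem-statement questions as follows. This is a minimal sketch under assumptions that should be checked against the real file: the column names (Updated On, State, First Dose Administered, Second Dose Administered, Male (Doses Administered), Female (Doses Administered)) are assumed and can be confirmed with df.columns; dates are assumed to be in day/month/year format; a national row labelled "India" is assumed to sit alongside the state rows; and the dose columns are assumed to be cumulative running totals per date. Under that last assumption, summing every row would count the same doses many times, so the sketch takes each state's most recent record instead. The inline CSV is hypothetical data, and steps 5 (visualization) and 6 (feature engineering) are omitted.

```python
import io

import pandas as pd

# Hypothetical excerpt; column names, date format and the "India" row are assumptions.
csv_text = """Updated On,State,First Dose Administered,Second Dose Administered,Male (Doses Administered),Female (Doses Administered)
16/01/2021,India,300,0,170,130
16/01/2021,A,100,0,60,40
17/01/2021,A,180,20,110,90
16/01/2021,B,200,0,110,90
17/01/2021,B,200,0,110,90
"""

# Step 2, data loading (use the real file path in practice).
df = pd.read_csv(io.StringIO(csv_text))

# Step 3, data overview: dimensions and column types.
print(df.shape)
print(df.dtypes)

# Step 4, data cleaning: drop exact duplicate rows, parse dates, drop rows without a state.
df = df.drop_duplicates()
df["Updated On"] = pd.to_datetime(df["Updated On"], format="%d/%m/%Y")
df = df.dropna(subset=["State"])

# Step 8, segmentation: exclude the assumed national row, then keep each
# state's most recent (cumulative) record.
states = df[df["State"] != "India"]
latest = states.sort_values("Updated On").groupby("State").last()

first_dose = latest["First Dose Administered"]    # state-wise first dose
second_dose = latest["Second Dose Administered"]  # state-wise second dose
males = latest["Male (Doses Administered)"].sum()
females = latest["Female (Doses Administered)"].sum()

print(first_dose)
print(second_dose)
print("Males:", males, "Females:", females)
```

If each row instead held only that day's new doses, the groupby().sum() aggregation described under Methods used would be the appropriate choice rather than taking the last record.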
Results:
Conclusion: In this project, we analyzed the COVID-19 vaccination data in India using the
"covid_vaccine_statewise.csv" dataset. We explored the number of individuals vaccinated for the first
and second doses across different states. Additionally, we determined the number of males and females
vaccinated in the country. The insights gained from this analysis can aid in understanding the progress
of vaccination efforts in India and help in formulating effective strategies to combat the COVID-19
pandemic.
