DATA SCIENCE USING PYTHON
INTRODUCTION TO PYTHON
• Python: A Versatile Programming Language
• Created by Guido van Rossum in the late 1980s
• Open-source and widely used for various applications
• Known for its simplicity and readability
CURRICULUM FOR PYTHON
Objects and Data Structures (Strings, Lists, Tuples, Dictionaries, Sets and Booleans)
Statements in Python (if/else, for loops and while loops)
Functions in Python
Modules and packages
Basic built-in Python modules
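The following minimal sketch touches each curriculum topic in a few lines: the core data structures, an if/elif/else statement, for and while loops, a function, and the built-in math module. All variable and function names (such as describe_number) are illustrative, not part of any library.

```python
import math  # a basic built-in module

# Core objects and data structures
name = "data"                           # string
scores = [88, 92, 79]                   # list (mutable, ordered)
point = (3, 4)                          # tuple (immutable)
ages = {"ana": 31, "ben": 27}           # dictionary (key -> value)
tags = {"python", "pandas", "python"}   # set (duplicate "python" is dropped)
is_ready = True                         # boolean


def describe_number(n):
    """Return a label for n using an if/elif/else statement (illustrative helper)."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"


# for loop: add up the list
total = 0
for s in scores:
    total += s

# while loop: count down to zero
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

distance = math.hypot(*point)  # Euclidean length of the vector (3, 4)

print(len(tags), total, steps, distance, describe_number(-5))
```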
NECESSARY MODULES FOR DATA SCIENCE
• NumPy
• Pandas
• Matplotlib
• Seaborn
• Sk-Learn
1.NUMPY
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely.
• NumPy stands for Numerical Python.
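A minimal sketch of the three NumPy uses listed above: vectorized array arithmetic, solving a linear system with np.linalg.solve, and a discrete Fourier transform with np.fft.fft. The numbers are made up for illustration.

```python
import numpy as np

# Arrays support element-wise (vectorized) arithmetic without explicit loops
a = np.array([1.0, 2.0, 3.0])
b = np.arange(3)                 # array([0, 1, 2])
print(a * 2 + b)                 # element-wise: [2. 5. 8.]

# Linear algebra: solve the system A @ x = y
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0])
x = np.linalg.solve(A, y)
print(x)                         # [0.8 1.4]

# Fourier transform of a short, made-up signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)    # complex frequency components
print(np.round(np.abs(spectrum), 6))  # magnitudes [0. 2. 0. 2.]
```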
2.PANDAS
Pandas is a Python library used for working with data
sets.
It has functions for analyzing, cleaning, exploring, and
manipulating data.
The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". Pandas was created by Wes McKinney in 2008.
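A short sketch of a typical Pandas workflow on a small made-up dataset: exploring it, cleaning a missing value, and summarizing by group. Filling with the column mean is just one possible cleaning choice, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# A small, made-up dataset with one missing value
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "temp_c": [31.0, np.nan, 29.0, 33.0],
})

# Exploring: shape and summary statistics
print(df.shape)                  # (4, 2)
print(df["temp_c"].describe())

# Cleaning: replace the missing temperature with the column mean (31.0)
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Analyzing/manipulating: average temperature per city
avg_by_city = df.groupby("city")["temp_c"].mean()
print(avg_by_city)
```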
3.MATPLOTLIB
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.
Matplotlib was created by John D. Hunter.
Matplotlib is open source and we can use it freely.
Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
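A minimal Matplotlib line plot, assuming a script or headless environment: the Agg backend renders to a file instead of opening a window, and the output filename line_plot.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")   # a simple line plot with point markers
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A basic Matplotlib line plot")
ax.legend()
fig.savefig("line_plot.png")     # hypothetical output filename
plt.close(fig)                   # free the figure's memory
```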
4.SEABORN
Seaborn is a widely used Python data visualization library for data science and machine learning tasks. It is built on top of the Matplotlib visualization library and is well suited to exploratory analysis. You can create informative statistical plots to answer questions about your data.
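A minimal Seaborn sketch on a small made-up dataset: one sns.scatterplot call produces a styled scatter plot coloured by group, while Matplotlib (reached through the returned Axes) handles the title and saving. The column names and the filename are illustrative.

```python
import matplotlib
matplotlib.use("Agg")            # render to a file, no window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data built locally (sns.load_dataset would download example data instead)
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score":    [52, 58, 61, 70, 74, 81],
    "group":         ["A", "B", "A", "B", "A", "B"],
})

# One call draws the scatter plot, colour-coded by group; Matplotlib renders it
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group")
ax.set_title("Exam score vs. hours studied")
ax.figure.savefig("seaborn_scatter.png")   # hypothetical output filename
plt.close(ax.figure)
```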
5.SK-LEARN
Scikit-learn (sklearn) is one of the most widely used and robust libraries for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, through a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy and SciPy and integrates with Matplotlib for plotting.
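The consistent interface mentioned above means every estimator exposes the same fit, predict and score methods. In this sketch, two different classifiers are trained and evaluated with identical code on the Iris dataset bundled with scikit-learn; the split ratio and random_state values are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # small dataset bundled with scikit-learn
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

results = {}
# Same fit/score calls for both models: only the constructor changes
for model in (DecisionTreeClassifier(random_state=0), KNeighborsClassifier(n_neighbors=5)):
    model.fit(X_train, y_train)
    results[type(model).__name__] = model.score(X_test, y_test)  # mean accuracy on held-out data

print(results)
```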
DATA PREPROCESSING
• Acquiring and importing the dataset.
• Handling the missing values.
• Handling categorical features.
• Scaling and normalizing the data.
• Feature engineering and feature selection (importance).
• Data decomposition and split.
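A sketch of several of these steps chained with scikit-learn's ColumnTransformer and Pipeline: splitting the data, imputing missing values, one-hot encoding a categorical column and standardizing numeric ones. The dataset is made up, and median and most-frequent imputation are assumed strategies; feature engineering, feature selection and decomposition are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Acquire/import: a tiny made-up dataset stands in for e.g. pd.read_csv(...)
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 29, 38, 50, 23],
    "income": [40, 52, 61, np.nan, 45, 58, 75, 38],   # illustrative values
    "city":   ["A", "B", "A", "C", "B", np.nan, "C", "A"],
    "bought": [0, 1, 1, 1, 0, 1, 1, 0],
})
X = df.drop(columns="bought")
y = df["bought"]

# Split first so statistics learned below come from training data only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

preprocess = ColumnTransformer([
    # Numeric: fill missing values with the median, then standardize (mean 0, std 1)
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age", "income"]),
    # Categorical: fill missing with the most frequent value, then one-hot encode
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city"]),
])

X_train_ready = preprocess.fit_transform(X_train)  # learn medians, scales, categories
X_test_ready = preprocess.transform(X_test)        # reuse them on the test rows
print(X_train_ready.shape, X_test_ready.shape)
```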
EDA (EXPLORATORY DATA ANALYSIS)
• Answering questions through data.
• Data visualization (line and scatter plots).
• Analyzing various aspects of the data.
• Statistical analysis.
• Correlation analysis (positive and negative correlation, multicollinearity).
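A sketch of statistical and correlation analysis with Pandas on synthetic data, assuming made-up housing-style columns: price is constructed to rise with size (positive correlation) and fall with age (negative correlation), and rooms is built as a near copy of size so the pair shows multicollinearity. The 0.9 threshold used to flag it is an assumed cut-off, not a fixed standard.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
size = rng.uniform(50, 150, n)                      # made-up feature
rooms = size / 25 + rng.normal(0, 0.2, n)           # nearly a copy of size -> multicollinearity
age = rng.uniform(0, 40, n)
price = 3 * size - 2 * age + rng.normal(0, 10, n)   # rises with size, falls with age

df = pd.DataFrame({"size": size, "rooms": rooms, "age": age, "price": price})

# Statistical analysis: count, mean, std and quartiles per column
print(df.describe().round(2))

# Correlation analysis: Pearson coefficients between -1 and 1
corr = df.corr()
print(corr.round(2))

# Flag feature pairs whose absolute correlation exceeds the assumed 0.9 cut-off
features = ["size", "rooms", "age"]
high = [(f1, f2) for i, f1 in enumerate(features) for f2 in features[i + 1:]
        if abs(corr.loc[f1, f2]) > 0.9]
print("Possible multicollinearity:", high)
```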
MODELING (MACHINE LEARNING MODELS)
Creating all machine learning models from scratch (theory and code implementation) and with modules.
Regression:
• Linear Regression
• Polynomial Regression
• Multiple Linear Regression
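A sketch of the "from scratch and with modules" idea for regression, on synthetic data. The from-scratch version solves the least-squares problem with np.linalg.lstsq, which gives the same coefficients as the normal equation θ = (XᵀX)⁻¹Xᵀy but is numerically more stable; scikit-learn's LinearRegression should recover the same coefficients. Polynomial regression is shown as linear regression on features expanded by PolynomialFeatures. The true coefficients in the data are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# Multiple linear regression from scratch; made-up data: y = 4 + 2*x1 - 3*x2 + noise
X = rng.uniform(0, 10, size=(100, 2))
y = 4 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.5, 100)

X_b = np.column_stack([np.ones(len(X)), X])       # column of 1s for the intercept
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)    # minimizes ||X_b @ theta - y||^2
print("from scratch:", theta.round(2))             # approximately [4, 2, -3]

# The same model with the scikit-learn module
lr = LinearRegression().fit(X, y)
print("sklearn:     ", np.r_[lr.intercept_, lr.coef_].round(2))

# Polynomial regression = linear regression on expanded features (x, x^2)
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y_curve = 1 - 2 * x[:, 0] + 0.5 * x[:, 0] ** 2     # noise-free quadratic
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly = LinearRegression().fit(x_poly, y_curve)
print("poly:", round(poly.intercept_, 2), poly.coef_.round(2))
```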
Classification:
• KNN (k-Nearest Neighbors)
• SVM (Support Vector Machine)
• Logistic Regression
• Decision Tree
• Random Forest
• Naïve Bayes
Parameter optimization using Grid Search.
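A sketch of one classifier, KNN, built from scratch and with the scikit-learn module, followed by parameter optimization with GridSearchCV. knn_predict is a hypothetical helper written for this example, and the candidate values of n_neighbors and weights are arbitrary. The two accuracies can differ slightly because tie-breaking between equally common labels is not necessarily the same.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_new, k=5):
    """Hypothetical from-scratch KNN: majority vote among the k training points
    closest to each new point by Euclidean distance (a teaching sketch, not optimized)."""
    preds = []
    for x in X_new:
        dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
        nearest = np.argsort(dists)[:k]               # indices of the k closest points
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# From scratch vs. with the scikit-learn module
scratch_acc = np.mean(knn_predict(X_train, y_train, X_test, k=5) == y_test)
sk_acc = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).score(X_test, y_test)
print(round(scratch_acc, 3), round(sk_acc, 3))

# Grid search: every combination (5 x 2 = 10) is scored with 5-fold cross-validation
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.score(X_test, y_test), 3))
```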
EVALUATION