
Machine Learning

With Real Life Project

By – Rishabh Gaur
Table of Contents
• Certificates of Course
• Company Overview
• Introduction To Machine Learning
• Need For ML
• Pre-Requisite For ML
• Methods Of ML
• ML Algorithms
• Step-Wise Procedure For ML
• Project Overview
• Applications
• Future Work
• References
Company Overview
• Ciperschools is a training institute in Chandigarh.
• It is an online learning platform where students can get guidance on technologies such as –
• Machine Learning
• Data Science
• MERN Stack Development
• Web Development
• About Mentor – Kanav Bansal
• Kanav Bansal has more than four years of teaching experience and a strong knowledge of technologies that are popular today.
• Technologies – Python, Data Science, R Programming, Machine Learning, Deep Learning.
Introduction To Machine Learning
• Machine Learning is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method.
• The main focus of ML is to enable machines to learn from experience without being explicitly programmed and without human intervention.
Need For Machine Learning
• Let us understand this with a very basic example.
• Given a dataset of Height, Weight and Gender.
Height Weight Gender
160 55 Female
175 74 Male
180 82 Male
168 63 Female
168 72 Male

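• In the given dataset we have to predict the gender from the height and weight.
• Without ML, we could write a hand-made rule such as:
• if (height > 155 && height < 170) && (weight > 50 && weight < 70)
  then Gender = Female
  else Gender = Male
• Such rules have to be tuned by hand and stop working as the data grows or new features are added; ML learns these decision rules from the data instead.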
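A minimal Python sketch of the hand-written rule above, checked against the five rows of the table. The function name `rule_based_gender` is hypothetical; the point is that every threshold is picked by hand, which is exactly what an ML model replaces with values learned from data.

```python
def rule_based_gender(height, weight):
    """Hand-written rule: thresholds are chosen manually, not learned."""
    if 155 < height < 170 and 50 < weight < 70:
        return "Female"
    return "Male"

# Toy dataset from the table above: (height, weight, gender)
data = [
    (160, 55, "Female"),
    (175, 74, "Male"),
    (180, 82, "Male"),
    (168, 63, "Female"),
    (168, 72, "Male"),
]

for height, weight, gender in data:
    print(height, weight, gender, rule_based_gender(height, weight))
```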
Pre-Requisite of ML
• Mathematics – Mathematics plays a central role in ML, since ML algorithms are built on mathematical concepts. Before starting with ML we should have some knowledge of these topics –
1. Linear Algebra
2. Statistics
3. Probability
4. Calculus
• Python – Python is a popular high-level, object-oriented programming language. Its easy-to-learn syntax and portability make it popular today.
Strengths of Python –
1. Extensive Libraries
2. Improved Productivity
3. Simple and Easy
4. Readable
5. Portable
Python modules used in ML –
1. NumPy
2. Matplotlib
3. Pandas
4. Seaborn
5. Plotly
6. Scikit-Learn
Methods of Machine Learning

• Supervised Learning – The algorithm learns from examples in a labelled dataset, i.e., a dataset for which we know the desired output. It is of two types –
1. Classification
2. Regression
• Unsupervised Learning – The algorithm works on the data without guidance: information is grouped according to similarities, patterns and differences, without any prior training on labelled data.
• Reinforcement Learning – The machine interacts with an environment, decides what to do to perform the given task, and learns from the experience, as we can see in the figure.
Machine Learning Algorithms
• K-Nearest Neighbor Algorithm - The k-nearest neighbors (KNN) algorithm is a
simple, easy-to-implement supervised machine learning algorithm that can be used to solve
both classification and regression problems.

Suppose a new data point with height 180 and weight 70 comes in the future and we have to categorize it.
1. Find the K points in the training data that are closest to the given data point.
2. Look at the categories of these K points.
3. If the majority of them are in the Male category, the given data point is classified as Male.
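A minimal from-scratch sketch of the KNN steps above, using the height/weight table as training data and the new point (180, 70). The choice k = 3 and the function name `knn_classify` are assumptions; an odd k avoids ties in a two-class problem. The features are left unscaled here, which is only acceptable because height and weight have similar ranges in this toy data.

```python
import math
from collections import Counter

# Training data from the height/weight table: ((height, weight), gender)
training_data = [
    ((160, 55), "Female"),
    ((175, 74), "Male"),
    ((180, 82), "Male"),
    ((168, 63), "Female"),
    ((168, 72), "Male"),
]

def knn_classify(query, data, k=3):
    """Classify `query` by majority vote among its k nearest neighbours (Euclidean distance)."""
    # Sort the training points by distance to the query and keep the k closest
    neighbours = sorted(data, key=lambda item: math.dist(query, item[0]))[:k]
    # Majority vote over the labels of those neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_classify((180, 70), training_data, k=3))  # Male
```

For (180, 70) the three nearest points are (175, 74), (180, 82) and (168, 72), all labelled Male, so the vote is unanimous.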
• Linear Regression Algorithm – This technique models the relationship between a dependent (target) variable and one or more independent (input) variables. In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Linear regression is of two types –
Simple Linear Regression – Uses a single explanatory variable, i.e., it concerns two-dimensional sample points with one independent variable and one dependent variable.
Multiple Linear Regression – Models the linear relationship between multiple input variables and the target variable.
In the simple case the model is $Y = mX + c$, where Y is the target variable, X is the input variable, m is the slope and c is the intercept.
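A minimal sketch of simple linear regression fitted by ordinary least squares, the standard way of estimating m and c from data. Predicting weight from height with the five rows of the earlier table is an illustration chosen here, not part of the original project; five points are far too few for a real model.

```python
def fit_simple_linear_regression(x, y):
    """Return (m, c) minimizing the sum of squared errors of Y = mX + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    # The least-squares line passes through the point of means
    c = mean_y - m * mean_x
    return m, c

heights = [160, 175, 180, 168, 168]
weights = [55, 74, 82, 63, 72]
m, c = fit_simple_linear_regression(heights, weights)
print(f"m = {m:.3f}, c = {c:.2f}")
```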
• Logistic Regression – It is a supervised learning algorithm used when the dependent variable (target) is categorical or discrete. For example,
• To predict whether an email is spam (1) or not (0)
• Whether a tumor is malignant (1) or not (0)
The model passes a linear combination of the inputs through the sigmoid function to get the probability of class 1, so its decision boundary is the line that best separates the categories of the data points.
Because this boundary is linear, the model works best when the data points are (roughly) linearly separable, as we can see in the figure.
The accuracy of the model is the fraction of correctly classified points.
The model parameters themselves are learned by minimizing the log-loss with an optimization method such as gradient descent.
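A minimal from-scratch sketch of logistic regression trained with batch gradient descent on the log-loss, followed by the accuracy computation described above. The six 2-D points, the learning rate (0.1) and the number of epochs (1000) are assumptions for illustration; in practice a library class such as scikit-learn's `LogisticRegression` is the usual choice.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        # Step against the average gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

def accuracy(w, b, X, y):
    """Fraction of correctly classified points."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)

# Hypothetical, linearly separable toy data: class 0 near (-1, -1), class 1 near (1, 1)
X = [(-2, -1), (-1, -2), (-1, -1), (1, 1), (1, 2), (2, 1)]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print("accuracy:", accuracy(w, b, X, y))  # accuracy: 1.0
```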
• K-Means Clustering Algorithm – This is an unsupervised learning algorithm that divides the data points into K homogeneous clusters, i.e., points with similar properties are grouped together and dissimilar points fall in different clusters.
1. Identify the initial cluster centroids (mean points) of the current partition.
2. Assign each point to a cluster: compute the distance from each point to every centroid and allot the point to the cluster whose centroid is nearest.
3. After re-allotting the points, compute the centroid of each newly formed cluster.
4. Repeat steps 2 and 3 until the assignments stop changing.
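A minimal from-scratch sketch of the K-Means loop above (assign to the nearest centroid, recompute centroids, repeat until nothing changes). Using the first k points as initial centroids is a simplifying assumption made here so the result is deterministic; library implementations such as scikit-learn's `KMeans` use smarter initialization (k-means++ by default).

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster 2-D (or n-D) points into k groups; returns (labels, centroids)."""
    # Simplifying assumption: use the first k points as the initial centroids
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical toy data with two obvious groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```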
• Support Vector Machine Algorithm – This is one of the most powerful and flexible supervised machine learning models. Unlike logistic regression, it can also classify data that is not linearly separable by using kernel functions. It separates the data points with a hyperplane, and the goal of SVM is to find the maximum marginal hyperplane (MMH), i.e., the hyperplane with the largest distance to the nearest points of each class.
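A minimal sketch of a linear soft-margin SVM, trained by sub-gradient descent on the regularized hinge loss with labels -1/+1. This version has no kernel, so it only covers the linear case; handling non-linear data as described above needs a kernel method such as scikit-learn's `SVC`. The regularization strength `lam`, the learning rate and the epoch count are illustrative assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))); y must be -1 or +1."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data
X = [(-2, -1), (-1, -2), (-1, -1), (1, 1), (1, 2), (2, 1)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, xi) for xi in X])  # [-1, -1, -1, 1, 1, 1]
```

The `lam` term is what makes this a maximum-margin method: shrinking ||w|| widens the margin 2/||w||, while the hinge term keeps points on the correct side.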
Step-Wise Procedure of ML
Project on Adult Data Set
Step 1 – Collection of Data
• The Adult dataset has 48,842 rows and 15 columns.
• Step 2 – Exploratory Data Analysis (EDA)
• When getting started with an ML project, one critical principle to keep in mind is that data is everything. But deriving truth and insight from a pile of data can be a complicated and error-prone job.
• EDA helps us analyze the data up front, i.e., it describes the data with statistical and visualization techniques to bring its important aspects into focus.
• EDA can be done in three ways –
1. Statistical Analysis
2. Univariate Analysis (one column at a time)
3. Bivariate Analysis (both for numerical and categorical columns)
• Statistical Analysis – It gives information about the central tendency and the spread of the data. Central tendency means the mean, median and mode of the data columns, and spread means the variation in the values of the data (e.g., standard deviation, range, interquartile range).
• Statistical analysis is done with the describe() function of a pandas DataFrame. Let us see an example of the describe function.
• The output of the describe function for the Adult dataset is shown in the figure.

• Observations –
1. The average hours-per-week is 40.42.
2. There is a huge difference between the mean and the median of the capital-gain and capital-loss columns, which indicates that these columns contain a lot of outliers.
3. The IQR of the hours-per-week column is very small, which means that there is not much variation in this column.
4. Since the values of fnlwgt are not meaningful for this analysis, we can remove this column from the dataset.
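The mean/median gap in observation 2 can be reproduced with a small, describe()-like summary written with Python's `statistics` module. The function name `describe` and the capital-gain values are hypothetical toy data, not the real Adult column; `method="inclusive"` gives the linear-interpolation quartiles that pandas uses by default, and `stdev` is the sample standard deviation that pandas reports.

```python
import statistics

def describe(values):
    """A small summary of one numerical column, modelled on pandas' describe()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical, heavily skewed values resembling a capital-gain column
capital_gain = [0, 0, 0, 0, 0, 0, 0, 0, 0, 15024]
summary = describe(capital_gain)
print(summary["mean"], summary["50%"])  # 1502.4 0.0
```

A single large value pulls the mean far above the median (which stays at 0), which is the pattern observation 2 reads as a sign of outliers.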
• Univariate Analysis – It examines a single column of the dataset with the help of graphs or plots. We can use the Matplotlib and Seaborn libraries of Python to perform univariate analysis. The plots used are –
1. Boxplot
2. Histogram
3. Countplot
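A boxplot marks as outliers the points lying more than 1.5 × IQR below the first quartile or above the third quartile (Matplotlib's default whisker length). A minimal sketch of that rule; the function name `iqr_outliers` and the age values are hypothetical.

```python
import statistics

def iqr_outliers(values, whisker=1.5):
    """Values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]: the points a boxplot draws individually."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]

ages = [25, 30, 35, 40, 45, 50, 90]  # hypothetical values
print(iqr_outliers(ages))  # [90]
```

Here Q1 = 32.5 and Q3 = 47.5, so the fences are 10 and 70 and only 90 falls outside them.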
• Bivariate Analysis – Here we compare two columns of the dataset with the help of plots. Bivariate analysis can be done between two numerical columns as well as between a numerical and a categorical column. The plots used for numerical and categorical columns are –
1. Strip Plot
2. Box Plot
3. Bar Plot
4. Line Plot
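For a numerical column against a categorical one, a bar plot typically shows one summary value per category; Seaborn's `barplot` uses the mean by default. A minimal sketch of that computation; the function name `group_means` and the (income, hours-per-week) pairs are hypothetical.

```python
from collections import defaultdict

def group_means(rows):
    """Mean of the numerical value for each category (the bar heights of a bar plot)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, value in rows:
        totals[category] += value
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}

# Hypothetical (income, hours-per-week) pairs
rows = [("<=50K", 40), ("<=50K", 35), (">50K", 50), (">50K", 45), ("<=50K", 30)]
print(group_means(rows))  # {'<=50K': 35.0, '>50K': 47.5}
```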
• Step 3 – Data Preparation
• Since ML models can only work with numbers, before giving the data to the model we have to convert it into numerical form.
• For this purpose we use the pandas get_dummies() function (one-hot encoding) for categorical data and scikit-learn's StandardScaler for numerical data.
• The transformed data is shown in the figure.
• Now the only step left is to split the data into training and test sets.
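The three Step 3 operations can be sketched in plain Python to show what they do. The functions `one_hot`, `standardize` and `split_train_test` are hypothetical stand-ins for pandas get_dummies(), scikit-learn's StandardScaler and a train/test split, not those libraries' APIs; the 20% test size and the seed 42 are assumptions.

```python
import random
import statistics

def one_hot(values, prefix):
    """One 0/1 column per category, named prefix_category (similar to get_dummies)."""
    categories = sorted(set(values))
    return {f"{prefix}_{cat}": [1 if v == cat else 0 for v in values] for cat in categories}

def standardize(values):
    """Rescale to mean 0 and standard deviation 1, the idea behind StandardScaler."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population std, as StandardScaler uses
    return [(v - mean) / std for v in values]

def split_train_test(rows, test_size=0.2, seed=42):
    """Shuffle reproducibly and split into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]

sex = ["Male", "Female", "Female", "Male"]  # hypothetical column values
hours = [40, 50, 30, 40]
print(one_hot(sex, "sex"))  # {'sex_Female': [0, 1, 1, 0], 'sex_Male': [1, 0, 0, 1]}
print(standardize(hours))
train, test = split_train_test(range(10))
print(len(train), len(test))  # 8 2
```

In practice the scaler is usually fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.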
• Step 4 – Modeling
• Since the target variable is discrete, this is a classification problem and can be solved by –
1. Logistic Regression
2. SVM
3. KNN
• After applying the above three algorithms, the final classification reports are –
• Logistic Regression –
• SVM –
• KNN –
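A classification report summarizes per-class precision, recall and F1 together with overall accuracy. A minimal sketch of those metrics; the function name `classification_summary` and the label lists are hypothetical. Unlike scikit-learn's `classification_report`, which returns formatted text by default, this returns a plain dictionary.

```python
def classification_summary(y_true, y_pred):
    """Per-class precision, recall, F1 and support, plus overall accuracy."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # points predicted as this class
        actual = sum(t == label for t in y_true)     # points truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": actual}
    report["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report

y_true = [1, 0, 1, 1, 0, 1]  # hypothetical labels
y_pred = [1, 0, 0, 1, 1, 1]
print(classification_summary(y_true, y_pred))
```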

• Conclusion –
1. We can conclude that `SVM` has the highest `Accuracy Score`, i.e., 85.2%.
2. The `Logistic Regression` accuracy score, 84.8%, is very close to that of `SVM`.
3. For `KNeighborsClassifier` the accuracy score is 82.3%.
Applications
• Image Recognition
• Speech Recognition
• Traffic prediction
• Product recommendations
• Self-driving cars
• Email Spam and Malware Filtering
• Virtual Personal Assistant
• Online Fraud Detection
• Stock Market trading
• Medical
• Automatic Language Translation
Future Work
• Working on different datasets.
• Starting with Deep Learning.
-- Compared with classical ML, deep learning can give more accurate predictions on large and complex data such as images, audio and text, as it is based on neural networks. However, deep learning requires more data and much more computational power than classical machine learning.
• Natural Language Processing – It is concerned with the interaction between computers and human language, such as speech recognition.
References
• For datasets: https://www.kaggle.com/datasets/
• For concepts and courses:
https://www.coursera.org/learn/machine-learning
https://www.geeksforgeeks.org/machine-learning/
https://www.simplilearn.com/big-data-and-analytics/machine-learning-certification-training-course
• For projects:
https://www.kdnuggets.com/2020/03/20-machine-learning-datasets-project-ideas.html
