Machine Learning (Jan 2nd)
Conducted by
Ratnam and Ratnam Training Zone
Kakinada
What is Machine learning?
• Machine Learning is a part of Artificial Intelligence that focuses on the study of computational and mathematical algorithms and datasets to make decisions without manually written code.
• In other words, machine learning is writing code that lets machines make decisions by applying pre-defined algorithms to the datasets they are given.
• If machines can learn from previous experience without being explicitly programmed, this is known as Machine Learning.
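A minimal sketch of this idea, assuming scikit-learn is available (the hours/marks numbers below are made up purely for illustration): instead of hand-writing rules, the model learns the relationship from past examples and then predicts for new input.

from sklearn.linear_model import LinearRegression

# "Previous experience": hours studied -> marks scored (illustrative numbers)
hours = [[1], [2], [3], [4], [5]]
marks = [35, 45, 55, 65, 75]

model = LinearRegression()
model.fit(hours, marks)        # the machine learns the pattern from past data

print(model.predict([[6]]))    # predicts roughly 85 for 6 hours of study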
“Previous experience” in Machine learning
Recommendation engines
For example, if I search for a Syska power bank on Amazon.com, the results page usually looks like this.
Now let's scroll down: on the same page or screen we can find two more sections, “Customers who viewed this item also viewed” and “Customers who bought this item also bought”, as shown in the images below.
Types of Machine learning
Supervised learning
• Supervised learning algorithms try to model the relationships and dependencies between the target prediction output and the input features, so that we can predict the output values for new data based on the relationships learned from previously supplied datasets.
Supervised learning
• The aim of supervised machine learning is to build a
model that makes predictions based on evidence in
the presence of uncertainty.
• A supervised learning algorithm takes a known set
of input data and known responses to the data
(output) and trains a model to generate reasonable
predictions for the response to new data.
• Supervised learning uses classification and
regression techniques to develop predictive models
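A minimal sketch of the two techniques on tiny synthetic data, assuming scikit-learn: regression predicts a continuous number, classification predicts a discrete label.

from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]

# Regression: predict a continuous value
y_reg = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[7]]))      # roughly 70.0

# Classification: predict a discrete label (0 = no, 1 = yes)
y_cls = [0, 0, 0, 1, 1, 1]
clf = LogisticRegression().fit(X, y_cls)
print(clf.predict([[7]]))      # predicts class 1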
Supervised learning
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVM)
• K Nearest Neighbors (KNN)
• Random Forest
• Decision Trees
Unsupervised learning algorithms
• K-Means
• Apriori
• C-Means
Reinforcement Learning
• Q-Learning
• SARSA (State Action Reward State Action)
Popular Machine learning algorithms
• Linear Regression
• Logistic Regression
• Decision Tree
• SVM
• Naive Bayes
• kNN
• K-Means
• Random Forest
• Dimensionality Reduction Algorithms
• Gradient Boosting algorithms
– GBM
– XGBoost
– LightGBM
– CatBoost
Linear Regression Algorithm basics
• The dataset contains tip data from different customers: females and males, smokers and non-smokers, from Thursday to Sunday, at lunch or dinner, and from tables of different sizes.
• We want to predict how much tip the waiter will earn based on the other parameters.
• Now let's answer a few questions to explore the data for the linear regression (a code sketch covering these questions follows this list).
• What is the hardest day to work? (Based on the number of tables served.)
• Let's find out which is the best day to work, i.e. the one with the maximum tips (sum and percentage).
• We can see that the tips are around 15% of the bill.
• Next we will analyze who eats more (and tips more): smokers or non-smokers?
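This walkthrough appears to use the well-known tips dataset; the sketch below assumes the copy that ships with seaborn (columns total_bill, tip, sex, smoker, day, time, size) and answers the questions above with simple pandas group-bys.

import seaborn as sns

tips = sns.load_dataset("tips")

# Hardest day to work: the day with the most tables served
print(tips["day"].value_counts())

# Best day to work: total tips and average tip percentage per day
tips["tip_pct"] = tips["tip"] / tips["total_bill"] * 100
print(tips.groupby("day").agg(total_tips=("tip", "sum"), mean_tip_pct=("tip_pct", "mean")))

# Tips hover around 15% of the bill on average
print(tips["tip_pct"].mean())

# Who eats more (and tips more): smokers or non-smokers?
print(tips.groupby("smoker")[["total_bill", "tip"]].mean())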
Transform and clean the data
                  PN (Predicted No)   PY (Predicted Yes)
AN (Actual No)           105                  21
AY (Actual Yes)           25                  63
• Here the (Actual No, Predicted No) cell is 105 and the (Actual Yes, Predicted Yes) cell is 63.
• So to calculate the accuracy we add 105 + 63 = 168 and divide it by the total, i.e. (105 + 21 + 25 + 63) = 214, which gives about 0.785 (a short code version follows this list).
• Here 105 is also called the True Negative count.
• 21 is called the False Positive count.
• 63 is called the True Positive count.
• 25 is called the False Negative count.
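A minimal sketch of the same accuracy calculation in code, using the four counts from the matrix above:

# True Negative, False Positive, False Negative, True Positive
TN, FP, FN, TP = 105, 21, 25, 63

accuracy = (TP + TN) / (TN + FP + FN + TP)
print(accuracy)    # 168 / 214, about 0.785

In practice the same four counts would come from a function such as sklearn.metrics.confusion_matrix applied to the actual and predicted labels.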
Decision Trees
• A decision tree is a classification algorithm that comes under the supervised learning technique.
• A decision tree is a graphical representation of all the possible solutions to a decision.
• Decisions are made based on some conditions.
• The decisions made can be easily explained.
• Decision trees have a root node, decision nodes and leaf nodes.
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be split any further once a leaf node is reached.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub Tree: A sub-tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted
branches from the tree.
• Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.
Decision Tree Steps
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified any further; such a final node is called a leaf node. (A code sketch of these steps follows.)
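A minimal sketch, assuming scikit-learn: DecisionTreeClassifier carries out these steps internally, picking the best attribute at each node with an attribute selection measure (gini or entropy) and splitting recursively. The iris dataset here is only an illustration, not the course dataset.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned root node, decision nodes and leaf nodes as text
print(export_text(tree, feature_names=load_iris().feature_names))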
Example
Simple examples
• From the above figure:
• The root node is “Salary at least $50,000”.
• The decision nodes are the outcomes Yes and No.
• The leaf nodes are the final decisions made by the model, i.e. accept the offer or decline the offer (a small sketch of this logic follows).
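A small sketch of this decision logic as plain code, using only the salary condition mentioned in the text (the actual figure may contain more decision nodes):

def job_offer_decision(salary):
    # Root node: is the salary at least $50,000?
    if salary >= 50000:
        return "Accept offer"      # leaf node
    return "Decline offer"         # leaf node

print(job_offer_decision(60000))   # Accept offer
print(job_offer_decision(40000))   # Decline offer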
Attribute Selection Measures
• There are two popular techniques for ASM (a short sketch computing both follows this list), which are:
• Information Gain
• Gini Index
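A minimal sketch of both measures, computed from the class counts at a node; the 9/5 class counts and the example split are illustrative numbers, not taken from the course dataset.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(entropy([9, 5]))   # about 0.940
print(gini([9, 5]))      # about 0.459

# Information gain of a split = entropy(parent) - weighted average entropy of the children
parent, left, right = [9, 5], [6, 2], [3, 3]
gain = entropy(parent) - (8 / 14) * entropy(left) - (6 / 14) * entropy(right)
print(gain)              # about 0.048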
Building decision tree manually