Module 1 & 2
Machine Learning algorithms are programs that can learn hidden patterns from data, predict outputs, and improve their performance from experience on their own. Different algorithms can be used for different tasks: for example, simple linear regression can be used for prediction problems such as stock market prediction, while the KNN algorithm can be used for classification problems.
In this topic, we will see an overview of some popular and most commonly used machine learning algorithms, along with their use cases and categories.
The below diagram illustrates the different ML algorithms along with their categories:
1) Supervised Learning Algorithm
Supervised learning is a type of machine learning in which the machine needs external supervision to learn. Supervised learning models are trained using a labeled dataset. Once training and processing are done, the model is tested by providing sample test data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, much as a student learns things under a teacher's supervision. An example of supervised learning is spam filtering.
Supervised learning can be divided into two categories of problems:
o Classification
o Regression
2) Unsupervised Learning Algorithm
Unsupervised learning is a type of machine learning in which the machine learns from an unlabeled dataset without any supervision, finding the hidden patterns in the data on its own. It can be further divided into two categories:
o Clustering
o Association
3) Reinforcement Learning
In reinforcement learning, an agent interacts with its environment by producing actions and learns with the help of feedback. The feedback is given to the agent in the form of rewards: for each good action, it gets a positive reward, and for each bad action, it gets a negative reward. There is no supervision provided to the agent. The Q-learning algorithm is commonly used in reinforcement learning.
1. Linear Regression
Linear regression is one of the most popular and simple machine learning algorithms, used for predictive analysis. Here, predictive analysis means predicting something, and linear regression makes predictions for continuous numbers such as salary, age, etc. It can be represented as:
y = a0 + a1*x + ε
Where,
y = dependent variable (target)
x = independent variable (predictor)
a0 = intercept of the line
a1 = linear regression coefficient
ε = random error
The below diagram shows linear regression for the prediction of weight according to height:
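As a rough sketch of how the regression line is fitted, the intercept a0 and coefficient a1 can be computed with the closed-form least-squares formulas in plain Python; the height and weight figures below are made up purely for illustration.

```python
# Simple linear regression via least squares; the data is invented.
heights = [150, 160, 170, 180, 190]   # x: independent variable (cm)
weights = [52, 58, 66, 74, 82]        # y: dependent variable (kg)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# slope a1 = covariance(x, y) / variance(x); intercept a0 = mean_y - a1 * mean_x
a1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) \
     / sum((x - mean_x) ** 2 for x in heights)
a0 = mean_y - a1 * mean_x

# predict the weight of a (hypothetical) 175 cm person from the fitted line
predicted = a0 + a1 * 175
```

The same coefficients are what a library such as scikit-learn would produce for one feature; the closed form just makes the "best-fit line" idea explicit.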
2. Logistic Regression
Logistic regression is a supervised learning algorithm which is used to predict categorical variables or discrete values. It can be used for classification problems in machine learning, and the output of the logistic regression algorithm can be Yes or No, 0 or 1, Red or Blue, etc.
Logistic regression is similar to linear regression except in how it is used: linear regression is used to solve regression problems and predict continuous values, whereas logistic regression is used to solve classification problems and predict discrete values.
Instead of fitting a best-fit line, it forms an S-shaped curve that lies between 0 and 1. The S-shaped curve is also known as the logistic function, which uses the concept of a threshold: any value above the threshold tends to 1, and any value below the threshold tends to 0.
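The S-shaped curve and the threshold idea can be sketched in a few lines of Python; the input values and the 0.5 threshold here are illustrative, not taken from any particular model.

```python
import math

def sigmoid(z):
    # logistic function: squeezes any real number into the interval (0, 1)
    return 1 / (1 + math.exp(-z))

def classify(z, threshold=0.5):
    # values above the threshold tend to 1, values below tend to 0
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))      # 0.5 — exactly at the decision boundary
print(classify(3.2))   # 1
print(classify(-1.7))  # 0
```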
In the support vector machine (SVM) algorithm, the data points that help to define the hyperplane are known as support vectors, and hence it is named the support vector machine algorithm.
7. K-Means Clustering
K-means clustering is one of the simplest unsupervised learning algorithms, used to solve clustering problems. The dataset is grouped into K different clusters based on similarities and dissimilarities: data points with the most commonalities remain in one cluster, which has few or no commonalities with the other clusters. In K-means, K refers to the number of clusters, and means refers to averaging the data points in order to find the centroid.
It can be used for spam detection and filtering, identification of fake news, etc.
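A minimal sketch of the assign-then-average loop that K-means performs, on made-up 1-D data with K = 2 and hand-picked initial centroids:

```python
import statistics

def kmeans_1d(points, centroids, iterations=10):
    # Lloyd's algorithm on 1-D data: assign each point to its nearest
    # centroid, then move each centroid to the mean of its cluster.
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        centroids = [statistics.mean(members) if members else c
                     for c, members in clusters.items()]
    return sorted(centroids)

# two obvious groups in the data; the initial centroids 0 and 10 are arbitrary
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
print(kmeans_1d(data, [0.0, 10.0]))  # converges to centroids near 1.0 and 8.1
```

Real implementations work in many dimensions and choose K and the initial centroids more carefully, but the core loop is the same.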
8. Random Forest
Random forest combines multiple decision trees, each built on a subset of the given dataset, and averages their predictions to improve the predictive accuracy of the model. A random forest should contain 64-128 trees; a greater number of trees leads to higher accuracy of the algorithm.
Random forest is a fast algorithm and can deal efficiently with missing and incorrect data.
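The bagging idea behind random forests (bootstrap subsets of the data, one tree per subset, averaged predictions) can be sketched with depth-1 "stump" trees instead of full decision trees; the dataset, the 25-tree count, and the split criterion here are all invented for illustration.

```python
import random
import statistics

def stump_predict(threshold, left_value, right_value, x):
    # one decision "tree" of depth 1 (a stump)
    return left_value if x < threshold else right_value

def fit_stump(sample):
    # pick the split that minimises squared error on this bootstrap sample
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x < threshold]
        right = [y for x, y in sample if x >= threshold]
        if not left or not right:
            continue
        lv, rv = statistics.mean(left), statistics.mean(right)
        err = sum((y - (lv if x < threshold else rv)) ** 2 for x, y in sample)
        if best is None or err < best[0]:
            best = (err, threshold, lv, rv)
    return best[1:] if best else None

random.seed(0)
data = [(1, 5), (2, 6), (3, 7), (10, 20), (11, 21), (12, 22)]
stumps = []
for _ in range(25):                                  # a small "forest"
    sample = [random.choice(data) for _ in data]     # bootstrap subset
    fitted = fit_stump(sample)
    if fitted:
        stumps.append(fitted)

# the forest's prediction is the average over all trees
prediction = statistics.mean(stump_predict(t, lv, rv, 11) for t, lv, rv in stumps)
```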
9. Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm that is used to solve association problems. It uses frequent itemsets to generate association rules and is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected to each other. The algorithm uses a breadth-first search and a hash tree to calculate the itemsets efficiently.
The algorithm works iteratively to find the frequent itemsets in a large dataset.
The Apriori algorithm was given by R. Agrawal and R. Srikant in the year 1994. It is mainly used for market basket analysis and helps to understand which products are likely to be bought together. It can also be used in the healthcare field to find drug reactions in patients.
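A toy market-basket sketch of the support and confidence calculations behind Apriori; the transactions are made up, and the candidate-pruning step of the real algorithm (only extending itemsets that are already frequent) is skipped here for brevity.

```python
from itertools import combinations

# Made-up transaction database: each basket is a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
min_support = 0.5  # an itemset must appear in at least half of the baskets

items = sorted({item for basket in transactions for item in basket})
frequent = {}
for size in (1, 2):  # Apriori proper grows sizes until no candidates remain
    for candidate in combinations(items, size):
        # support = fraction of transactions containing the whole itemset
        support = sum(set(candidate) <= b for b in transactions) / len(transactions)
        if support >= min_support:
            frequent[candidate] = support

# confidence of the rule {bread} -> {milk}: support(bread, milk) / support(bread)
confidence = frequent[("bread", "milk")] / frequent[("bread",)]
```

High-support, high-confidence itemsets are exactly the "products bought together" that market basket analysis looks for.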
Principal component analysis (PCA) works by considering the variance of each attribute, because high variance indicates that the attribute carries more information; attributes with low variance can be removed, which reduces the dimensionality.
Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels.
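A minimal sketch of the variance idea in PCA: find the direction of maximum variance of 2-D data via power iteration on the covariance matrix, then project onto it to go from 2-D to 1-D. The data points are illustrative.

```python
# Made-up, positively correlated 2-D data.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# 2x2 sample covariance matrix [[cxx, cxy], [cxy, cyy]]
cxx = sum(x * x for x, _ in centered) / (n - 1)
cyy = sum(y * y for _, y in centered) / (n - 1)
cxy = sum(x * y for x, y in centered) / (n - 1)

# power iteration: repeatedly multiplying a vector by the covariance matrix
# converges to the first principal component (the highest-variance direction)
vx, vy = 1.0, 0.0
for _ in range(50):
    nx = cxx * vx + cxy * vy
    ny = cxy * vx + cyy * vy
    norm = (nx * nx + ny * ny) ** 0.5
    vx, vy = nx / norm, ny / norm

# projecting each point onto (vx, vy) reduces the data from 2-D to 1-D
projected = [x * vx + y * vy for x, y in centered]
```

Library implementations compute all components at once with an eigendecomposition or SVD, but the highest-variance direction is the same one this loop finds.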
Nowadays, machine learning has become one of the first choices for many freshers and IT professionals. But in order to enter this field, one must have certain prerequisite skills, and one of those skills is Mathematics. Mathematics is very important for learning ML technology and developing efficient applications for business. Mathematics for machine learning especially focuses on Probability and Statistics, which are the essential topics to get started with ML. Probability and statistics are considered the foundation of ML and data science: they are needed to develop ML algorithms and build decision-making capabilities, and they are the primary prerequisites for learning ML.
Example: When tossing a fair coin, the probability of getting a head is
P(H) = 1/2 = 0.5
Where H denotes the event of getting a head.
Types of Probability
For a better understanding of probability, it can be further categorized into different types, such as marginal, joint, and conditional probability.
The conditional probability of event A, given that event B has occurred, is:
P(A|B) = P(A∩B)/P(B)
Similarly, P(B|A) = P(A∩B)/P(A). We can write the joint probability of A and B as P(A∩B) = P(A)·P(B|A), which means: "the chance of both things happening is the chance that the first one happens, times the chance that the second one happens given that the first has happened."
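A quick arithmetic check of these formulas, using a standard 52-card deck as a made-up example, where A = "the card is a king" and B = "the card is a face card" (jack, queen, or king):

```python
# Probabilities when drawing one card from a standard 52-card deck.
p_b = 12 / 52          # 12 face cards in the deck
p_a_and_b = 4 / 52     # every king is a face card, so A ∩ B = the 4 kings

# conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = p_a_and_b / p_b   # 1/3: a third of the face cards are kings

# joint probability the other way round: P(A ∩ B) = P(B) * P(A|B)
p_joint = p_b * p_a_given_b     # recovers 4/52, as the formula promises
```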
Statistics is the part of applied mathematics that deals with studying and developing methods for gathering, analyzing, interpreting, and drawing conclusions from empirical data. It can be used to make better-informed business decisions.
o Descriptive Statistics
o Inferential Statistics
Use of Statistics in ML
Statistical methods are used to understand the training data as well as to interpret the results of testing different machine learning models. Further, statistics can be used to make better-informed business and investing decisions.
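As a small sketch of the descriptive-statistics side, Python's standard `statistics` module can summarize a dataset's central tendency and spread; the daily sales figures below are made up.

```python
import statistics

# Made-up daily sales figures for one week.
sales = [120, 135, 150, 110, 160, 145, 130]

mean = statistics.mean(sales)      # central tendency: the average value
median = statistics.median(sales)  # central tendency: the middle value
stdev = statistics.stdev(sales)    # spread: variation around the mean
```

Inferential statistics, by contrast, would use a sample like this to draw conclusions about a larger population (for example, via confidence intervals or hypothesis tests).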
Note: A decision tree can contain categorical data (YES/NO) as well as numeric
data.
Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.
In a decision tree, to predict the class of a given dataset, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (real dataset) and, based on the comparison, follows the branch and jumps to the next node.
At the next node, the algorithm again compares the attribute value with the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; the final node is called a leaf node.
Example: Suppose a candidate has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
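The job-offer example can also be written as nested conditions, which is exactly how a fitted tree is traversed from root to leaf; the salary and distance thresholds here are invented purely for illustration.

```python
def decide(salary, distance_km, has_cab):
    # Root node: Salary attribute (threshold of $40k is made up).
    if salary < 40_000:
        return "Declined offer"          # leaf node
    # Decision node: distance from the office (30 km threshold is made up).
    if distance_km > 30:
        # Decision node: cab facility.
        if has_cab:
            return "Accepted offer"      # leaf node
        return "Declined offer"          # leaf node
    return "Accepted offer"              # leaf node

print(decide(55_000, 10, False))  # Accepted offer
print(decide(55_000, 45, False))  # Declined offer
print(decide(30_000, 5, True))    # Declined offer
```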
1. Information Gain:
Information gain measures how much information a feature gives us about a class; the attribute with the highest information gain is chosen for splitting:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
S = total number of samples
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
The Gini index measures impurity; an attribute with a low Gini index is preferred for splitting:
Gini Index = 1 - Σj (Pj)^2
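For a two-class (yes/no) node, both impurity measures named above reduce to short formulas that can be computed directly; the probabilities passed in below are illustrative.

```python
import math

def entropy(p_yes):
    # Entropy(S) = -P(yes)*log2 P(yes) - P(no)*log2 P(no), for two classes
    if p_yes in (0, 1):
        return 0.0          # a pure node has zero entropy
    p_no = 1 - p_yes
    return -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)

def gini(p_yes):
    # Gini Index = 1 - sum of squared class probabilities
    return 1 - (p_yes ** 2 + (1 - p_yes) ** 2)

print(entropy(0.5))  # 1.0 — a 50/50 split is maximally impure
print(gini(0.5))     # 0.5
print(entropy(1.0))  # 0.0 — a pure node has no impurity
```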
A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learning tree without reducing accuracy is therefore known as pruning. There are mainly two types of tree pruning techniques used:
o Cost complexity pruning
o Reduced error pruning
Linear models in machine learning are easy to implement and interpret and are helpful in
solving many real-life use cases.
Linear Regression
Linear Regression is a statistical approach that predicts the result of a response variable by
combining numerous influencing factors. It attempts to represent the linear connection
between features (independent variables) and the target (dependent variables). The cost
function enables us to find the best possible values for the model parameters. A detailed
discussion on linear regression is presented in a different article.
Example: An analyst would be interested in seeing how market movement influences the
price of ExxonMobil (XOM). The value of the S&P 500 index will be the independent
variable, or predictor, in this example, while the price of XOM will be the dependent
variable. In reality, various elements influence an event's result. Hence, we usually have
many independent features.
Logistic Regression
Logistic regression is an extension of linear regression. The sigmoid function first transforms the linear regression output into a value between 0 and 1. After that, a predefined threshold helps to determine the class from this probability: values higher than the threshold tend towards a probability of 1, whereas values lower than the threshold tend towards a probability of 0. A separate article dives deeper into the mathematics
behind the Logistic Regression Model.
Example: A bank wants to predict if a customer will default on their loan based on their
credit score and income. The independent variables would be credit score and income, while
the dependent variable would be whether the customer defaults (1) or not (0).
Other examples of approximately linear relationships include:
o The relationship between the boiling point of water and change in altitude.
o The relationship between spending on advertising and the revenue of an organization.
o The relationship between the amount of fertilizer used and crop yields.
o The performance of athletes and their training regimen.