
Machine Learning

Dr. Windhya Rankothge (PhD – UPF, Barcelona)

Faculty of Graduate Studies & Research


Machine Learning
• A branch of artificial intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so.

Machine Learning
• In data science, an algorithm is a sequence of statistical processing steps.
• In machine learning, algorithms are 'trained' to find patterns and features in massive amounts of data in order to make decisions and predictions based on new data.
• The better the algorithm, the more accurate the decisions and predictions will become as it processes more data.

Machine Learning

[Diagram: training data (past) is used to learn a model/predictor; the model/predictor then predicts outcomes for new testing data (future).]
Machine Learning
The machine learning workflow:

1. Select and prepare a training data set
2. Choose an algorithm to run on the training data set
3. Train the algorithm to create the model
4. Use and improve the model

Supervised Learning Algorithms



Supervised Learning Algorithms

Labeled data set

Supervised Learning Algorithms
• Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

• The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data (see the sketch below).
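To make the fit-then-predict idea concrete, here is a minimal sketch (assuming Python with scikit-learn; the toy data and the choice of k-nearest neighbors are hypothetical, not from the slides):

# A minimal supervised-learning sketch: learn the mapping Y = f(X) from
# labeled examples, then predict Y for unseen inputs.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled data set: X holds feature vectors, y holds labels.
X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [1.2, 2.2], [5.5, 8.5]]
y = [0, 0, 1, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = KNeighborsClassifier(n_neighbors=1)   # any supervised algorithm fits here
model.fit(X_train, y_train)                   # learn the mapping f from labeled data
print(model.predict(X_test))                  # predict Y for new input data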

Supervised Learning Algorithms
• Classification: accurately assign test data into specific categories.
• Regression: understand the relationship between dependent and independent variables.

Supervised Learning Algorithms

Naïve Bayes
• A statistical classification technique based on Bayes' Theorem.
• One of the simplest and fastest supervised learning algorithms.
• The Naive Bayes classifier assumes that the effect of a particular feature in a class is independent of the other features.
• For example, whether a loan applicant is desirable or not depends on his/her income, previous loan and transaction history, age, and location.
• Even if these features are interdependent, they are still considered independently.
• This assumption simplifies computation, and that is why it is considered 'naive'.

Naïve Bayes

Bayes' Theorem: P(h|D) = P(D|h) × P(h) / P(D)

• P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.
• P(D): the probability of the data (regardless of the hypothesis). This is known as the prior probability of the data (the evidence).
• P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.
• P(D|h): the probability of data D given that hypothesis h was true. This is known as the likelihood.

Naïve Bayes
• Assume we have a bunch of emails that we want to classify as spam or not spam.
• Our dataset has 15 Not Spam emails and 10 Spam emails. Some analysis has been done, and the frequency of each word has been recorded in a table (shown as an image on the original slide).
• Note: stop words like "the", "a", "on", "is", "all" have been removed, as they do not carry important meaning and are usually removed from texts. The same applies to numbers and punctuation.

Naïve Bayes
Exploring some probabilities:
• P(Dear|Not Spam) = 8/34
• P(Visit|Not Spam) = 2/34
• P(Dear|Spam) = 3/47
• P(Visit|Spam) = 6/47

• Now assume we have the message "Hello friend" and we want to know whether it is spam or not.

Naïve Bayes
So, using Bayes' Theorem:

P(Not Spam | Hello friend) = P(Hello friend | Not Spam) × P(Not Spam) / P(Hello friend)

Ignoring the denominator, since it is the same for both classes, we only need to compare P(Hello friend | Not Spam) × P(Not Spam) with P(Hello friend | Spam) × P(Spam).

But P(Hello friend | Not Spam) = 0: the exact sentence "Hello friend" never appears in our dataset, because we recorded counts for single words, not whole sentences. The same goes for P(Hello friend | Spam). That would make the scores for spam and not spam both zero, which has no meaning!!

Naïve Bayes
But wait!! We said that Naive Bayes assumes that `the features we use to predict the target are independent`. So we can treat each word as an independent feature and multiply:

P(Hello friend | Not Spam) = P(Hello | Not Spam) × P(Friend | Not Spam)

and likewise for the Spam class.

Naïve Bayes
Now let's calculate the probabilities of being spam and not spam using the same procedure (the arithmetic was shown on the original slide).

Comparing the two, the Not Spam score is higher. So, the message "Hello friend" is not spam.
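A runnable sketch of this word-level calculation (the full word-frequency table was an image in the original slides, so the counts for 'Hello' and 'Friend' below are assumed for illustration; the 'Dear'/'Visit' counts and the class totals of 34 and 47 words match the probabilities quoted earlier):

# Word-level Naive Bayes spam scoring, following the slides' procedure.
# Hypothetical counts for 'hello' and 'friend'; the 'dear'/'visit' counts
# and word totals (34 not-spam, 47 spam) match the slide's probabilities.
word_counts = {
    "not_spam": {"dear": 8, "visit": 2, "hello": 5, "friend": 3},
    "spam":     {"dear": 3, "visit": 6, "hello": 2, "friend": 1},
}
total_words = {"not_spam": 34, "spam": 47}
class_prior = {"not_spam": 15 / 25, "spam": 10 / 25}  # 15 Not Spam, 10 Spam emails

def score(message, label):
    # Multiply the class prior by each word's conditional probability,
    # treating words as independent given the class (the naive assumption).
    p = class_prior[label]
    for word in message.lower().split():
        p *= word_counts[label].get(word, 0) / total_words[label]
    return p

msg = "Hello friend"
spam_score, not_spam_score = score(msg, "spam"), score(msg, "not_spam")
print("spam score:    ", spam_score)
print("not-spam score:", not_spam_score)
print("verdict:", "spam" if spam_score > not_spam_score else "not spam")

With these counts the not-spam score wins, matching the slide's conclusion.
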
Supervised Learning Algorithms

Support Vector Machines (SVM)
• Typically leveraged for classification problems (it can be used for regression too), SVM constructs a hyperplane where the distance between two classes of data points is at its maximum.
• This hyperplane is known as the decision boundary, separating the classes of data points (e.g., oranges vs. apples) on either side of the plane.

Support Vector Machines (SVM)
• Plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate.
• Perform classification by finding the hyperplane that best differentiates the two classes.

Which hyperplane?

Support Vector Machines (SVM)
• The vector points closest to the hyperplane are known as the support vectors, because only these points contribute to the result of the algorithm; the other points do not.
• The distance from the hyperplane to the closest support vector points is called the margin.
• We would like to choose the hyperplane that maximizes the margin between the classes.

Support Vector Machines (SVM)
• Maximizing the margin is equivalent to minimizing the loss (minimizing misclassification).
• The loss function that SVM uses is known as the hinge loss: L(y, f(x)) = max(0, 1 - y*f(x)), where y is the true label (+1 or -1) and f(x) is the model's output.

• The dimension of the hyperplane depends upon the number of features:
• If the number of input features is 2, the hyperplane is just a line (a linear hyperplane).
• If the number of input features is 3, the hyperplane becomes a two-dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
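To make the hinge-loss connection concrete, here is a small sketch (assuming scikit-learn and NumPy; the toy data is hypothetical):

import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical 2-D toy data: two linearly separable classes.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

# LinearSVC minimizes the (regularized) hinge loss, which is equivalent
# to maximizing the margin between the two classes.
clf = LinearSVC(C=1.0, loss="hinge").fit(X, y)

# Hinge loss per sample: max(0, 1 - y * f(x)), with f(x) = w.x + b.
f = clf.decision_function(X)
print("decision values:", np.round(f, 2))
print("hinge losses:  ", np.round(np.maximum(0, 1 - y * f), 2))

Points that sit outside the margin get a hinge loss of exactly 0; only points on or inside the margin (the support vectors) contribute to the loss.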

Support Vector Machines (SVM)
• SVM has a technique called the kernel trick: functions that take a low-dimensional input space and transform it into a higher-dimensional space.
• This converts a non-separable problem into a separable one.
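A sketch of the kernel trick in practice (assuming scikit-learn; the concentric-circles data is a standard non-linearly-separable toy set, not from the slides):

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not separable by any line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear SVM fails here, but an RBF kernel implicitly maps the data to a
# higher-dimensional space where the two classes become separable.
linear = SVC(kernel="linear").fit(X_train, y_train)
rbf = SVC(kernel="rbf").fit(X_train, y_train)

print("linear kernel accuracy:", linear.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf.score(X_test, y_test))
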
Supervised Learning Algorithms

K-Nearest Neighbors (K-NN)
• Can be used for regression as well as classification, but it is mostly used for classification problems.
• A non-parametric algorithm, which means it does not make any assumptions about the underlying data.

K-Nearest Neighbors (K-NN)
• Research has shown that no single optimal number of neighbors (k) suits all kinds of data sets; each data set has its own requirements.
• With a small number of neighbors, noise has a higher influence on the result, while a large number of neighbors makes the algorithm computationally expensive (see the sketch below).
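One common way to pick k is to try several values and keep the one that validates best. A minimal sketch (assuming scikit-learn; the iris data set and candidate k values are illustrative choices, not from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# There is no universally optimal k: score several candidates and keep
# the one that cross-validates best on this particular data set.
for k in (1, 3, 5, 9, 15):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:2d}  mean accuracy={scores.mean():.3f}")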

Supervised Learning Algorithms
• Classification: accurately assign test data into specific categories.
• Regression: understand the relationship between dependent and independent variables.

Supervised Learning Algorithms

Linear Regression
• Performs the task of predicting a dependent variable value (y) based on a given independent variable (x).
• So, this regression technique finds a linear relationship between x (input) and y (output), i.e. a line y = mx + c, where m is the slope and c is the intercept.
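A short sketch of fitting such a line by least squares (assuming NumPy; the sample points are hypothetical):

import numpy as np

# Hypothetical (x, y) pairs with a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of y = m*x + c (polyfit returns [slope, intercept]).
m, c = np.polyfit(x, y, deg=1)
print(f"slope m={m:.2f}, intercept c={c:.2f}")

# Predict y for a new input x.
x_new = 6.0
print(f"predicted y at x={x_new}: {m * x_new + c:.2f}")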
