UNIT 1

The document provides an overview of machine learning basics, including its definition, types, and processes involved in building machine learning algorithms. It discusses the importance of data preparation, feature engineering, model building, and evaluation, as well as the distinction between traditional machine learning and deep learning. Additionally, it covers concepts such as neural networks, perceptrons, and optimization techniques like gradient descent and backpropagation.

Uploaded by

Manoj CSE

DEEP LEARNING FOUNDATION

IV YEAR I SEMESTER
UNIT-1: MACHINE LEARNING BASICS

Prepared By:
Manoj G
Assistant Professor
GNIT, Hyderabad
WHY MACHINE LEARNING?
• Machine learning excels at handling vast amounts of complex data.
• Human analysis becomes impractical when dealing with large datasets or intricate
patterns.
• Machine learning models adapt and self-improve with new data, reducing the need
for constant manual intervention.
• Automation of tasks is efficient and consistent, leading to quicker decision-
making.
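• Machine learning can reduce some human biases in decision-making, although models can inherit biases from their training data.
• Machine learning can uncover insights that would otherwise go unseen.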
WHAT IS LEARNING?
• Herbert Simon: “Learning is any process by which a system improves
performance from experience.”
• What is the task?
Classification
Categorization/clustering
Problem solving / planning / control
Prediction
others
MACHINE LEARNING: DEFINITION
Definition:
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.
MACHINE LEARNING BASICS
Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed.

[Figure: in training, labeled data is fed to a machine learning algorithm to produce a learned model; in prediction, the learned model maps data to a prediction.]

Methods that can learn from and make predictions on data


MACHINE LEARNING AS A PROCESS
The process moves through five stages:
• Define Objectives
  - Define measurable and quantifiable goals
  - Use this stage to learn about the problem
• Data Preparation
  - Normalization
  - Transformation
  - Missing Values
  - Outliers
• Model Building
  - Data Splitting
  - Features Engineering
  - Estimating Performance
  - Evaluation and Model Selection
• Model Evaluation
  - Study model accuracy
  - Does it work better than the naïve approach or the previous system?
  - Do the results make sense in the context of the problem?
• Model Deployment
ML AS A PROCESS: DATA PREPARATION

• Needed for several reasons:
  • Some models have strict data requirements, such as the scale of the data or the intervals between data points
  • Some characteristics of the data can dramatically affect model performance
• Time spent on data preparation should not be underestimated
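
As a rough illustration of two of the preparation steps named above (missing values and the scale of the data), the sketch below fills gaps with the column mean and rescales the result to the range [0, 1]. The function names and the sample column are illustrative assumptions, not from the slides, and mean imputation with min-max scaling is only one of several common choices.

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column (house sizes in square feet) with one gap.
sizes = [1000.0, None, 2000.0, 3000.0]
filled = impute_mean(sizes)     # the gap becomes the mean of the other values
scaled = min_max_scale(filled)  # values now lie in [0, 1]
```

In practice the mean, minimum and maximum would be computed on the training data only and then reused for validation and test data, so that no information leaks from the held-out sets.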
ML AS A PROCESS: FEATURE ENGINEERING

• Determining which predictors (features) to use is one of the most critical questions
• Sometimes we need to add predictors
• Sometimes we need to reduce their number:
  • Fewer predictors give a more interpretable and less costly model
  • Most models are affected by high dimensionality, especially when many predictors are non-informative
ML AS A PROCESS: MODEL BUILDING

• Data Splitting
  • Allocate data to different tasks:
    • model training
    • performance evaluation
  • Define Training, Validation and Test sets
• Feature Selection (review the decisions made previously)
• Estimating Performance
  • Visualization of results: discover interesting areas of the problem space
  • Statistics and performance measures
• Evaluation and Model Selection
  • The 'no free lunch' theorem: no a priori assumption can be made about which model will work best
  • Avoid defaulting to favorite models; let the evaluation results guide the choice
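
The data-splitting step can be sketched as follows. This is a minimal illustration, assuming a 60/20/20 split and a fixed random seed; the function name, the fractions and the stand-in data are assumptions made for the example, not values given in the slides.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test lists."""
    rows = list(rows)                    # copy so the caller's data is not reordered
    random.Random(seed).shuffle(rows)    # fixed seed makes the split reproducible
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


data = list(range(100))                  # stand-in for 100 labeled examples
train, val, test = train_val_test_split(data)
```

The training set is used to fit the model, the validation set to compare models and settings, and the test set only once at the end to estimate performance on unseen data.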
TYPES OF LEARNING
• Supervised: Learning with a labeled training set
Example: email classification with already labeled emails
• Unsupervised: Discover patterns in unlabeled data
Example: cluster similar documents based on text
• Reinforcement learning: learn to act based on feedback/reward
Example: learn to play Go, reward: win or lose

[Figure: examples of classification, regression, and clustering.]
MAXIMUM LIKELIHOOD ESTIMATION (MLE)

• Basic Idea:
MLE is a method to find the parameters of a statistical model that make the observed data most
probable under that model.
It seeks the values that maximize the likelihood function, which measures how well the model
explains the data.
• Process:
Given data and a model, MLE calculates the parameter values that maximize the likelihood of
observing the given data.
It finds the parameter values that make the data most likely to occur, assuming the model is true.
• Example in Machine Learning:
Linear Regression: In a simple linear regression model, MLE estimates the slope and intercept
that result in the line best fitting the data points.
MLE: EXAMPLE
• Suppose we have a dataset of house prices and their corresponding sizes (in square feet). We
want to use a linear regression model to predict house prices based on size. The model is
defined as:
Price = θ0 + θ1 · Size
• Here, θ0 represents the intercept, and θ1 represents the slope of the line. We want to find the
best values for θ0 and θ1 using MLE.
• Given observed data and the model, MLE determines the values of θ0 and θ1 that maximize
the likelihood.
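• For linear regression with Gaussian noise of constant variance, maximizing the likelihood is
equivalent to minimizing the sum of squared differences between predicted and actual prices,
so MLE finds the line that best fits the data points.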
• Through MLE, we obtain the optimal parameters that make the linear regression model most
likely to produce the observed house prices based on their sizes.
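
A minimal sketch of this example follows. It assumes that each price equals the line plus independent Gaussian noise of constant variance (an assumption not stated in the slides); under that assumption the maximum likelihood estimates of θ0 and θ1 coincide with the ordinary least-squares solution, which has a closed form. The function name and the four data points are made up for illustration.

```python
def fit_line_mle(sizes, prices):
    """Closed-form least-squares fit of Price = theta0 + theta1 * Size.

    Under the Gaussian-noise assumption this is also the maximum
    likelihood estimate of the intercept and slope.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    theta1 = sxy / sxx                   # slope
    theta0 = mean_y - theta1 * mean_x    # intercept
    return theta0, theta1


# Hypothetical data: sizes in square feet and prices that lie exactly on a line.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [150000.0, 200000.0, 250000.0, 300000.0]
theta0, theta1 = fit_line_mle(sizes, prices)
predicted = theta0 + theta1 * 1800.0     # predicted price for an 1800 sq ft house
```

Because the made-up prices lie exactly on a line, the fit recovers that line; with real, noisy data the same formulas give the line with the smallest total squared error.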
BUILDING MACHINE LEARNING ALGORITHM

1. Define the Problem: Understand the problem you're addressing and set clear objectives.
Example: The objective is to develop a system that automatically identifies and classifies emails as
either "spam" or "not spam" based on their content and features.
2. Gather Data: Collect relevant and representative data for training and testing.
Example: Collect a dataset containing a diverse range of emails, both spam and non-spam, along
with corresponding labels indicating their classification.
3. Data Preparation: Clean, transform, and preprocess the data to make it suitable for
modeling.
Example: Preprocess the email text data by removing unnecessary symbols, converting text to
lowercase, and transforming words into numerical representations (e.g., word embeddings).
4. Choose a Model: Select an appropriate machine learning algorithm for the task.
Example: Select a classification algorithm like Naive Bayes, Support Vector Machine (SVM), or a
deep learning model such as a Recurrent Neural Network (RNN) for text classification.
5. Evaluation: Assess the model's performance using appropriate metrics on test data.
Example: Split the dataset into training and testing sets. Train the chosen model on the training data
and evaluate its performance using metrics like accuracy, precision, recall, and F1-score on the test
set.
6. Parameter Tuning: Optimize model settings (hyperparameters) to achieve better performance.
Example: Fine-tune model parameters, like regularization strength in SVM or learning rate in an
RNN, to optimize classification performance.
7. Prediction: Deploy the trained model to make predictions on new, unseen data.
Example: After achieving satisfactory evaluation results, deploy the trained model to classify new
incoming emails as spam or not spam, helping users filter out unwanted content.
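
The evaluation metrics named in step 5 can be computed directly from counts of correct and incorrect predictions. The sketch below is illustrative only: the function name, the label strings and the eight example labels are assumptions, not data from the slides.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted spam, how much is spam
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual spam, how much was caught
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
```

Here one spam email is missed and one legitimate email is flagged, so accuracy is 6/8 while precision, recall and F1 are each 2/3.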
ML VS. DEEP LEARNING
• Most machine learning methods work well because of human-designed representations and input
features
• ML becomes just optimizing weights to best make a final prediction.
WHAT IS DEEP LEARNING (DL) ?
• A subfield of machine learning concerned with learning representations of data. Exceptionally
effective at learning patterns.
• Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers.
• Given large amounts of data, the system begins to capture its structure and
respond in useful ways.
WHY IS DL USEFUL?
o Manually designed features are often over-specified, incomplete and take a long time to
design and validate
o Learned Features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable framework for
representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
In ~2010 DL started outperforming other ML techniques first in speech and vision, then NLP
NEURAL NETWORK INTRO
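h = σ(W1 x + b1)
y = σ(W2 h + b2)

W1 and W2 are the weights, b1 and b2 are the biases, and σ is the activation function.

How do we train?

For a network with 3 inputs, 4 hidden neurons and 2 output neurons:
• 4 + 2 = 6 neurons (not counting inputs)
• [3 x 4] + [4 x 2] = 20 weights
• 4 + 2 = 6 biases
• 26 learnable parameters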
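
The two layer equations and the parameter count can be checked with a short sketch. It assumes a network with 3 inputs, 4 hidden neurons and 2 outputs (the sizes implied by the weight count [3 x 4] + [4 x 2]) and a sigmoid activation; the random weights, zero biases and input vector are placeholders chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, biases, inputs):
    """One fully connected layer: sigmoid(W x + b), computed row by row."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2          # assumed layer sizes

# Placeholder parameters: random weights, zero biases.
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

x = [0.5, -1.0, 2.0]                     # arbitrary input vector
h = dense(W1, b1, x)                     # h = sigma(W1 x + b1)
y = dense(W2, b2, h)                     # y = sigma(W2 h + b2)

n_weights = n_in * n_hidden + n_hidden * n_out   # 12 + 8
n_biases = n_hidden + n_out                      # 4 + 2
n_params = n_weights + n_biases
```

Training means adjusting these 26 numbers so that y moves closer to the desired output for each training example.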
NEURAL NETWORKS AND THE BRAIN

• A neural network is a model of reasoning inspired by the human brain.
• The brain consists of a densely interconnected set of nerve cells, or basic
information-processing units, called neurons.
• The human brain contains roughly 86 billion neurons and on the order of 100 trillion
connections, called synapses, between them.
• By using many neurons in parallel, the brain performs tasks such as perception and pattern
recognition very efficiently, even though individual neurons are far slower than computer hardware.
BIOLOGICAL NEURAL NETWORK
ARCHITECTURE OF A TYPICAL ARTIFICIAL NEURAL NETWORK
THE NEURON AS A SIMPLE COMPUTING ELEMENT

• Diagram of a neuron
WHAT IS PERCEPTRON?
• A perceptron is an artificial neuron.
• It is the simplest possible neural network.
• Perceptrons are the building blocks of neural networks.
• In 1957, Frank Rosenblatt invented the perceptron.
• Rosenblatt's idea was that perceptrons could simulate brain
principles, with the ability to learn and make decisions.
• The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input, and
that the sum of the values should be greater than a threshold value before making a
decision like yes or no (true or false) (0 or 1).
PERCEPTRON EXAMPLE
• Imagine a perceptron (in your brain).
• The perceptron tries to decide if you should go to a concert.
• Is the artist good? Is the weather good?
• What weights should these facts have?

Criteria                Input          Weight
Artist is Good          x1 = 0 or 1    w1 = 0.7
Weather is Good         x2 = 0 or 1    w2 = 0.6
Friend will Come        x3 = 0 or 1    w3 = 0.5
Food is Served          x4 = 0 or 1    w4 = 0.3
Cold drink is Served    x5 = 0 or 1    w5 = 0.4
THE PERCEPTRON ALGORITHM
• Frank Rosenblatt suggested this algorithm:
  • Set a threshold value
  • Multiply all inputs with their weights
  • Sum all the results
  • Activate the output

1. Set a threshold value:
   Threshold = 1.5
2. Multiply all inputs with their weights:
   x1 * w1 = 1 * 0.7 = 0.7
   x2 * w2 = 0 * 0.6 = 0
   x3 * w3 = 1 * 0.5 = 0.5
   x4 * w4 = 0 * 0.3 = 0
   x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
   0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)
4. Activate the output:
   Return true if the sum > 1.5 ("Yes, I will go to the concert")
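
The four steps translate almost line for line into code. The sketch below uses the weights, inputs and threshold from the concert example; the function name and the tuple it returns are choices made for illustration.

```python
def perceptron(inputs, weights, threshold):
    """Return (decision, weighted_sum) for binary inputs and fixed weights."""
    # Steps 2 and 3: multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Step 4: activate the output only if the sum exceeds the threshold.
    return weighted_sum > threshold, weighted_sum


# Weights from the criteria table: artist, weather, friend, food, cold drink.
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
# Inputs from the worked example: good artist, friend comes, cold drink served.
inputs = [1, 0, 1, 0, 1]

go, total = perceptron(inputs, weights, threshold=1.5)  # step 1: threshold = 1.5
```

With these inputs the weighted sum is about 1.6, which is above the threshold, so the decision is to go. Changing any one of the three active inputs to 0 drops the sum below 1.5 and flips the decision.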
PERCEPTRON: CLASSIFICATION
• The equation below describes a hyperplane in the input space. This hyperplane is used to
separate the two classes C1 and C2:
w1x1 + w2x2 + b = 0 (decision boundary)
• Decision region for C1: w1x1 + w2x2 + b > 0
• Decision region for C2: w1x1 + w2x2 + b <= 0
• w1 and w2 are the weights and b is the bias.
[Figure: the x1-x2 plane, with the decision boundary separating the region for C1 from the region for C2.]
MULTILAYER NEURAL NETWORKS

• The simplest kind of feed-forward network is a multilayer perceptron (MLP).
• A multilayer perceptron is a feedforward neural network with one or more hidden
layers.
• The network consists of an input layer of source neurons, at least one middle or
hidden layer of computational neurons, and an output layer of computational
neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
MULTILAYER PERCEPTRON
MLP LEARNING PROCEDURE
The MLP learning procedure is as follows:
• Starting with the input layer, propagate data forward to the output
layer. This step is the forward propagation.
• Based on the output, calculate the error (the difference between the
predicted and known outcome). The error needs to be minimized.
• Backpropagate the error. Find its derivative with respect to each
weight in the network, and update the model.
• Repeat the three steps given above over multiple epochs to learn ideal
weights.
Finally, the output is taken via a threshold function to obtain the predicted class
labels.
ACTIVATION FUNCTION
• The weighted inputs are summed and passed through an activation function, sometimes
called a transfer function. An activation function is a simple mapping of summed weighted
input to the output of the neuron.
• It is called an activation function because it governs the threshold at which the neuron is
activated and the strength of the output signal.
• Traditionally nonlinear activation functions are used. This allows the network to combine
the inputs in more complex ways and in turn provide a richer capability in the functions they
can model. Examples of activation function include:
• Sigmoid function: returns values between 0 and 1.
• ReLU: negative values are clamped to zero and positive values are passed through unchanged.
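
Both activation functions fit in a few lines. The sketch below is a plain-Python illustration; the two-branch form of the sigmoid is an implementation choice made here to avoid overflow for large negative inputs, not something the slides prescribe.

```python
import math


def sigmoid(z):
    """Map any real number to the range 0..1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)            # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def relu(z):
    """Clamp negative values to zero; pass positive values through unchanged."""
    return max(0.0, z)


weighted_sums = [-2.0, 0.0, 3.0]
sigmoid_outputs = [sigmoid(z) for z in weighted_sums]
relu_outputs = [relu(z) for z in weighted_sums]
```

Sigmoid gives exactly 0.5 at zero and saturates toward 0 or 1 for large magnitudes, while ReLU is zero for negative sums and grows without bound for positive ones.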
BACKPROPAGATION ALGORITHM
• Backpropagation Algorithm has two phases:
• Forward pass phase: computes ‘functional signal’, feed forward propagation of
input pattern signals through network
• Backward pass phase: computes ‘error signal’, propagates the error backwards
through network starting at output units (where the error is the difference between
actual and desired output values)
BACKPROPAGATION ALGORITHM
• Step 1: Determine the architecture
  • how many input and output neurons; what output encoding?
  • how many hidden neurons and layers.
• Step 2: Initialize all weights and biases to small random values, typically ∈ [-1,1], and choose a learning rate η.
• Step 3: Repeat until the termination criteria are satisfied
  • Present a training example and propagate it through the network (forward pass)
  • Calculate the actual output
    • Inputs applied
    • Multiplied by weights
    • Summed
    • 'Squashed' by sigmoid activation function
    • Output passed to each neuron in next layer
  • Adapt weights starting from the output layer and working backwards (backward pass)
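
The steps above can be sketched for a network with one hidden layer and a single sigmoid output. This is an illustrative sketch under stated assumptions, not the procedure from any particular library: the squared-error loss 0.5 * (y - t)^2, the fixed number of epochs used as the termination criterion, the hidden-layer size, the learning rate and the logical-OR training data are all choices made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: returns the hidden activations and the single output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y


def mean_squared_error(data, W1, b1, W2, b2):
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 2: small random initial weights and biases in [-1, 1].
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    initial_loss = mean_squared_error(data, W1, b1, W2, b2)

    for _ in range(epochs):                    # Step 3: repeat (fixed epoch count)
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)  # forward pass
            # Backward pass: error signal at the output, then at each hidden unit.
            delta_out = (y - t) * y * (1.0 - y)
            delta_hidden = [delta_out * W2[j] * h[j] * (1.0 - h[j])
                            for j in range(n_hidden)]
            # Adapt weights starting from the output layer and working backwards.
            for j in range(n_hidden):
                W2[j] -= eta * delta_out * h[j]
            b2 -= eta * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= eta * delta_hidden[j] * x[i]
                b1[j] -= eta * delta_hidden[j]

    final_loss = mean_squared_error(data, W1, b1, W2, b2)
    return (W1, b1, W2, b2), initial_loss, final_loss


# Tiny illustrative dataset: logical OR of two binary inputs.
or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params, initial_loss, final_loss = train(or_data)
```

The hidden-layer error signals are computed before any weight is changed, so every update in one step uses gradients taken at the same point. Comparing initial_loss with final_loss shows whether training reduced the error.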
GRADIENT DESCENT
Gradient Descent is
▶ An optimization technique/algorithm.
▶ Mostly used in supervised machine learning models and deep learning.
▶ Also called a first-order optimization algorithm, because it uses only first derivatives.
▶ One of the most used algorithms for optimizing parameters in ML models.
The meaning of Gradient Descent:
▶ Gradient: the first-order derivative, or slope, of a curve.
▶ Descent: movement to a lower point.
▶ The algorithm thus uses the gradient/slope to move toward the minimum (lowest point) of a
loss function, such as Mean Squared Error (MSE).
GD FORMULA
While performing gradient descent, the machine iteratively calculates the next point it has to
reach by taking the gradient at the current position, scaling it by a learning rate, and
subtracting the result from the current parameter value. The formula looks like:

θ_next = θ − η · ∇J(θ)

where η is the learning rate, θ is the parameter to be optimized, and ∇J(θ) is the gradient of the
expected loss function J.
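
A minimal sketch of the update rule on a one-parameter problem follows. The loss J(θ) = (θ − 3)², its gradient, the starting point, the learning rate and the number of steps are all assumptions chosen so the behavior is easy to follow by hand.

```python
def gradient_descent(grad, theta, eta=0.1, steps=100):
    """Repeatedly step against the gradient, scaled by the learning rate eta."""
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta


def grad_J(theta):
    """Gradient of the illustrative loss J(theta) = (theta - 3)^2."""
    return 2.0 * (theta - 3.0)


theta_min = gradient_descent(grad_J, theta=0.0)
```

For this loss each step shrinks the distance to the minimum at θ = 3 by a factor of 1 − 2η = 0.8, so after 100 steps the parameter is very close to 3. A learning rate above 1.0 would make the same iteration diverge, which is why the choice of η matters.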
STOCHASTIC GRADIENT DESCENT (SGD)
▶ SGD computes the gradient for only one random sample at each iteration.
▶ This makes each iteration faster and cheaper, because it does not have to process all
the data every time.
▶ However, the randomness of SGD means that in some cases it can settle on a suboptimal
solution, such as a local minimum, rather than the global minimum.
▶ One technique to mitigate this is to decrease the learning rate over time, which
reduces the size of the parameter updates with each iteration.
▶ SGD also has variants, such as Mini-Batch SGD, where the gradient is computed on a
random subset of the data, and Momentum SGD, where a term is added to the gradient update to
help the optimization and to avoid getting stuck at a local minimum.
▶ SGD is widely used in deep learning and has found applications in classification,
regression, and neural machine translation.
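
The sketch below applies SGD with a decaying learning rate to a small linear regression problem. It is illustrative only: the function name, the decay schedule eta0 / (1 + decay * step), the hyperparameter values and the noise-free data generated from y = 1 + 2x are assumptions. Setting batch_size above 1 turns it into mini-batch SGD.

```python
import random


def sgd_linear_regression(xs, ys, eta0=0.1, decay=0.001, epochs=500,
                          batch_size=1, seed=0):
    """Fit y = theta0 + theta1 * x by (mini-batch) SGD on squared error."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    indices = list(range(len(xs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(indices)                       # visit samples in random order
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            g0 = g1 = 0.0
            for i in batch:                        # gradient on this batch only
                error = (theta0 + theta1 * xs[i]) - ys[i]
                g0 += 2.0 * error
                g1 += 2.0 * error * xs[i]
            g0 /= len(batch)
            g1 /= len(batch)
            eta = eta0 / (1.0 + decay * step)      # learning rate shrinks over time
            theta0 -= eta * g0
            theta1 -= eta * g1
            step += 1
    return theta0, theta1


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0 + 2.0 * x for x in xs]
theta0, theta1 = sgd_linear_regression(xs, ys)
```

Each update touches only the samples in the current batch, so its cost does not grow with the size of the dataset. Because the example data has no noise, the estimates approach the generating intercept and slope; with noisy data they would fluctuate around the least-squares solution, and the decaying learning rate damps that fluctuation.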
THE CURSE OF DIMENSIONALITY
Many machine learning problems become exceedingly difficult when the number of dimensions in the data is
high. This phenomenon is known as the curse of dimensionality.

For d dimensions and v values to be distinguished along each axis, we seem to need O(v^d) regions and examples. This
is an instance of the curse of dimensionality.
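
The growth is easy to make concrete: a grid that distinguishes v values along each of d axes has v^d cells. The short sketch below tabulates that count for v = 10; the function name and the chosen dimensions are illustrative assumptions.

```python
def regions_needed(v, d):
    """Number of grid cells when v values are distinguished along each of d axes."""
    return v ** d


# Cells needed to distinguish 10 values per axis as the dimension grows.
table = {d: regions_needed(10, d) for d in (1, 2, 3, 10)}
for d, cells in table.items():
    print(f"d = {d:2d}: {cells} regions")
```

Going from 3 to 10 dimensions raises the count from a thousand to ten billion, so a dataset that covers the space well in low dimensions becomes extremely sparse in high dimensions.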
THANK YOU