UNIT 1
IV YEAR I SEMESTER
UNIT-1: MACHINE LEARNING BASICS
Prepared By:
Manoj G
Assistant Professor
GNIT, Hyderabad
WHY MACHINE LEARNING?
• Machine learning excels at handling vast amounts of complex data.
• Human analysis becomes impractical when dealing with large datasets or intricate
patterns.
• Machine learning models adapt and self-improve with new data, reducing the need
for constant manual intervention.
• Automation of tasks is efficient and consistent, leading to quicker decision-
making.
• Machine learning can make decisions without inherent human biases.
• Machine learning can uncover patterns and insights in data that human analysis would miss.
WHAT IS LEARNING?
• Herbert Simon: “Learning is any process by which a system improves
performance from experience.”
• What is the task?
Classification
Categorization/clustering
Problem solving / planning / control
Prediction
others
MACHINE LEARNING: DEFINITION
Definition (Tom Mitchell, 1997):
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.
Example: for spam filtering, T is classifying emails as spam or not spam, P is the fraction of emails classified correctly, and E is a corpus of emails already labeled by users.
MACHINE LEARNING BASICS
Machine learning is a field of computer science that gives computers the ability to
learn without being explicitly programmed (a definition popularized by Arthur Samuel).
[Figure: supervised learning workflow. Training: labeled data is fed into a machine learning algorithm to produce a learned model. Prediction: the learned model assigns labels to new data.]
[Figure: the ML process runs from data preparation (normalization, transformation, handling missing values and outliers) through model building to model deployment.]
ML AS A PROCESS: FEATURE ENGINEERING
• Determining which predictors (features) to use is one of the most critical decisions
• Sometimes new predictors need to be added (feature engineering)
• Reduce the number of predictors where possible:
• Fewer predictors yield a more interpretable and less costly model
• Most models are hurt by high dimensionality, especially by non-informative predictors
ML AS A PROCESS: MODEL BUILDING
• Data Splitting
• Allocate data to different tasks
• model training
• performance evaluation
• Define Training, Validation and Test sets (a splitting sketch follows after this list)
• Feature Selection (Review the decision made previously)
• Estimating Performance
• Visualization of results – discovering interesting areas of the problem space
• Statistics and performance measures
• Evaluation and Model selection
• The ‘no free lunch’ theorem: no a priori assumptions can be made about which model will perform best
• Avoid defaulting to a favorite model; let the evaluation decide
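As a concrete illustration of the splitting step above, here is a minimal sketch using scikit-learn (assumed to be available); the 60/20/20 ratio, synthetic data, and variable names are illustrative assumptions, not prescribed by the slides.

    # Split synthetic data into 60% training, 20% validation, 20% test.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 5)               # 1000 samples, 5 predictors (synthetic)
    y = np.random.randint(0, 2, size=1000)    # binary labels (synthetic)

    # Hold out 20% of the data as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Split the remaining 80% into training and validation (0.25 * 0.8 = 0.2 overall).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

    print(len(X_train), len(X_val), len(X_test))   # 600 200 200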
TYPES OF LEARNING
• Supervised: Learning with a labeled training set
Example: email classification with already labeled emails
• Unsupervised: Discover patterns in unlabeled data
Example: cluster similar documents based on text
• Reinforcement learning: learn to act based on feedback/reward
Example: learn to play Go, reward: win or lose
[Figure: toy examples of classification (separating class A from class B), clustering (grouping unlabeled points), and regression (fitting a line to the data).]
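A minimal sketch contrasting supervised and unsupervised learning on the same toy points, using scikit-learn (assumed available); the data and model choices here are illustrative, not from the slides.

    # Supervised: learn from labels. Unsupervised: discover groups without labels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
    y = np.array([0, 0, 1, 1])                    # labels available -> supervised

    clf = LogisticRegression().fit(X, y)          # classification with labels
    print(clf.predict([[1.0, 0.9]]))              # -> [0]

    km = KMeans(n_clusters=2, n_init=10).fit(X)   # clustering, no labels used
    print(km.labels_)                             # group assignments, e.g. [0 0 1 1]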
MAXIMUM LIKELIHOOD ESTIMATION (MLE)
• Basic Idea:
MLE is a method to find the parameters of a statistical model that make the observed data most
probable under that model.
It seeks the values that maximize the likelihood function, which measures how well the model
explains the data.
• Process:
Given data and a model, MLE calculates the parameter values that maximize the likelihood of
observing the given data.
It finds the parameter values that make the data most likely to occur, assuming the model is true.
• Example in Machine Learning:
Linear Regression: In a simple linear regression model, MLE estimates the slope and intercept
that result in the line best fitting the data points.
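To make the linear-regression connection concrete: assuming the observation noise is Gaussian (a standard assumption, stated here rather than on the slide), maximizing the likelihood is exactly equivalent to minimizing the sum of squared errors:

    L(\theta_0, \theta_1) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \theta_0 - \theta_1 x_i)^2}{2\sigma^2}\right)

    \log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i)^2

    \hat{\theta} = \arg\max_{\theta} \log L = \arg\min_{\theta} \sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i)^2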
MLE: EXAMPLE
• Suppose we have a dataset of house prices and their corresponding sizes (in square feet). We want to use a linear regression model to predict house prices based on size. The model is defined as:
Price = θ0 + θ1 ⋅ Size
• Here, θ0 represents the intercept, and θ1 represents the slope of the line. We want to find the best values for θ0 and θ1 using MLE.
• Given observed data and the model, MLE determines the values of θ0 and θ1 that maximize
the likelihood.
• MLE finds the line that best fits the data points, minimizing the difference between
predicted and actual prices.
• Through MLE, we obtain the optimal parameters that make the linear regression model most
likely to produce the observed house prices based on their sizes.
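A minimal NumPy sketch of this estimation (NumPy assumed available). Under the Gaussian-noise assumption above, the MLE for θ0 and θ1 coincides with ordinary least squares, so a least-squares fit recovers the MLE; the size/price numbers below are synthetic and purely illustrative.

    # MLE for simple linear regression equals least squares under Gaussian noise.
    import numpy as np

    size  = np.array([1000, 1500, 1800, 2400, 3000], dtype=float)  # sq. ft (synthetic)
    price = np.array([200, 290, 350, 455, 560], dtype=float)       # $1000s (synthetic)

    theta1, theta0 = np.polyfit(size, price, deg=1)   # returns [slope, intercept]
    print(f"Price = {theta0:.2f} + {theta1:.4f} * Size")

    # Equivalently, via the normal equations:
    X = np.column_stack([np.ones_like(size), size])
    theta0_ne, theta1_ne = np.linalg.lstsq(X, price, rcond=None)[0]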
BUILDING MACHINE LEARNING ALGORITHM
1. Define the Problem: Understand the problem you're addressing and set clear objectives.
Example: The objective is to develop a system that automatically identifies and classifies emails as
either "spam" or "not spam" based on their content and features.
2. Gather Data: Collect relevant and representative data for training and testing.
Example: Collect a dataset containing a diverse range of emails, both spam and non-spam, along
with corresponding labels indicating their classification.
3. Data Preparation: Clean, transform, and preprocess the data to make it suitable for
modeling.
Example: Preprocess the email text data by removing unnecessary symbols, converting text to
lowercase, and transforming words into numerical representations (e.g., word embeddings).
BUILDING MACHINE LEARNING ALGORITHM
4. Choose a Model: Select an appropriate machine learning algorithm for the task.
Example: Select a classification algorithm like Naive Bayes, Support Vector Machine (SVM), or a
deep learning model such as a Recurrent Neural Network (RNN) for text classification.
5. Evaluation: Assess the model's performance using appropriate metrics on test data.
Example: Split the dataset into training and testing sets. Train the chosen model on the training data
and evaluate its performance using metrics like accuracy, precision, recall, and F1-score on the test
set.
6. Parameter Tuning: Optimize model settings (hyperparameters) to achieve better performance.
Example: Fine-tune model parameters, like regularization strength in SVM or learning rate in an
RNN, to optimize classification performance.
7. Prediction: Deploy the trained model to make predictions on new, unseen data.
Example: After achieving satisfactory evaluation results, deploy the trained model to classify new
incoming emails as spam or not spam, helping users filter out unwanted content.
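Putting the seven steps together, here is a minimal end-to-end sketch with scikit-learn (assumed available). The four hard-coded emails stand in for a real dataset, and Naive Bayes is just one of the candidate models named in step 4; everything here is illustrative.

    # Steps 2-7 in miniature: spam classification with bag-of-words + Naive Bayes.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    emails = ["win a free prize now", "cheap pills free offer",       # spam
              "meeting at noon tomorrow", "project report attached"]  # not spam
    labels = [1, 1, 0, 0]   # 1 = spam, 0 = not spam (tiny synthetic dataset)

    # Data preparation (lowercasing, tokenizing, counting words) + model, as one pipeline.
    model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
    model.fit(emails, labels)                    # training

    print(model.predict(["free prize offer"]))   # -> [1], i.e. classified as spam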
ML VS. DEEP LEARNING
• Most machine learning methods work well because of human-designed representations and input
features
• ML then reduces to optimizing weights to make the best final prediction.
WHAT IS DEEP LEARNING (DL) ?
• A machine learning subfield focused on learning representations of data; exceptionally
effective at learning patterns.
• Deep learning algorithms attempt to learn (multiple levels of) representation by
using a hierarchy of multiple layers
• If you provide the system tons of information, it begins to understand it and
respond in useful ways.
WHY IS DL USEFUL?
o Manually designed features are often over-specified, incomplete and take a long time to
design and validate
o Learned Features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable framework for
representing world, visual and linguistic information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
In ~2010 DL started outperforming other ML techniques first in speech and vision, then NLP
NEURAL NETWORK INTRO
• A simple two-layer network:
h = σ(W1 x + b1)
y = σ(W2 h + b2)
• W1, W2 are the weight matrices and b1, b2 the bias vectors; σ is an activation function (e.g., sigmoid).
• How do we train? By adjusting the weights to reduce the error of the output (see gradient descent below).
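A minimal NumPy sketch of this two-layer forward pass (the layer sizes and random weights are illustrative assumptions):

    # Forward pass: h = sigmoid(W1 @ x + b1), y = sigmoid(W2 @ h + b2).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0])                  # input with 3 features
    W1, b1 = np.random.randn(4, 3), np.zeros(4)     # layer 1: 3 inputs -> 4 hidden units
    W2, b2 = np.random.randn(1, 4), np.zeros(1)     # layer 2: 4 hidden -> 1 output

    h = sigmoid(W1 @ x + b1)                        # hidden activations
    y = sigmoid(W2 @ h + b2)                        # output in (0, 1)
    print(y)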
NEURAL NETWORKS AND THE BRAIN
• Diagram of a neuron
WHAT IS PERCEPTRON?
• A Perceptron is an Artificial Neuron
• It is the simplest possible Neural Network
• Neural Networks are the building blocks of Machine
Learning.
• In 1957, Frank Rosenblatt invented the Perceptron.
• Rosenblatt's idea was that Perceptrons could simulate brain
principles, with the ability to learn and make decisions.
WHAT IS PERCEPTRON?
• The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input, and
that the sum of the values should be greater than a threshold value before making a
decision like yes or no (true or false) (0 or 1).
PERCEPTRON EXAMPLE
• Imagine a perceptron (in your brain).
• The perceptron tries to decide if you should go to a concert.
• Is the artist good? Is the weather good?
• What weights should these facts have?
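A minimal sketch of that decision as a perceptron in Python; the inputs, weights, and threshold below are illustrative assumptions, not values from the slide:

    # Perceptron decision: compare a weighted sum of binary inputs to a threshold.
    inputs  = [1, 0]        # [artist is good, weather is good]
    weights = [0.7, 0.4]    # how much each fact matters (assumed)
    threshold = 0.5

    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    go_to_concert = 1 if weighted_sum > threshold else 0   # binary output
    print(go_to_concert)    # -> 1: a good artist outweighs the bad weather here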
GRADIENT DESCENT
• Parameters are updated iteratively in the direction that lowers the loss:
θ ← θ − η ⋅ ∇L(θ)
where η is the learning rate, θ is the parameter to be optimized, and ∇L(θ) denotes the gradient of the
expected loss function.
STOCHASTIC GRADIENT DESCENT (SGD)
▶ SGD computes the gradient for only one random sample at each iteration.
▶ This makes SGD faster and more efficient, since it does not have to process the entire
dataset in each iteration.
▶ However, the randomness of SGD means it can sometimes return a suboptimal
solution (a local minimum) rather than the global minimum.
▶ One technique to mitigate this drawback is to decrease the learning rate over time,
which shrinks the parameter updates with each iteration.
▶ SGD also has variants, like Mini-Batch SGD, where the gradient is computed on a
random subset of the data, and Momentum SGD, where a momentum term is added to the
gradient update to speed up optimization and avoid getting stuck in local minima.
▶ SGD is widely used in Deep Learning and has found applications in classification,
regression, and neural machine translation.
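A minimal NumPy sketch of plain SGD fitting a simple linear model like the one from the MLE example (the learning rate, step count, and synthetic data are illustrative assumptions):

    # SGD for linear regression: update parameters using ONE random sample per step.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 2, size=100)
    y = 3.0 + 2.0 * X + rng.normal(0, 0.1, size=100)   # true (theta0, theta1) = (3, 2)

    theta0, theta1, eta = 0.0, 0.0, 0.05               # eta is the learning rate
    for step in range(2000):
        i = rng.integers(len(X))                       # one random sample per iteration
        err = (theta0 + theta1 * X[i]) - y[i]          # prediction error on that sample
        theta0 -= eta * err                            # gradient step for the intercept
        theta1 -= eta * err * X[i]                     # gradient step for the slope

    print(theta0, theta1)                              # close to (3.0, 2.0)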
THE CURSE OF DIMENSIONALITY
Many machine learning problems become exceedingly difficult when the number of dimensions in the data is
high. This phenomenon is known as the curse of dimensionality.
For d dimensions and v values to be distinguished along each axis, we seem to need O(v^d) regions and examples; for example, with v = 10 values per axis and d = 10 dimensions, that is 10^10 regions. This is an instance of the curse of dimensionality.
THANK YOU