01ML Introduction

The document provides an introduction to Machine Learning (ML), defining it as a subset of Artificial Intelligence that enables computers to learn from data without explicit programming. It outlines various types of ML, including supervised, unsupervised, semi-supervised, and reinforcement learning, along with examples and applications in fields such as healthcare and finance. Additionally, it contrasts ML with traditional programming approaches, highlighting the adaptability and data dependency of ML systems.

Uploaded by

tanveerok355
Copyright
© All Rights Reserved

Machine Learning:

Introduction

Books
• Text Books:
• "Machine Learning: A Probabilistic
Perspective" by Kevin P. Murphy
• "Pattern Recognition and Machine
Learning" by Christopher M. Bishop
• "Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow" by Aurélien Géron

Machine Learning -- A Formal Definition
• "A computer program is said to learn from
experience E with respect to some class of tasks T and
performance measure P if its performance at tasks in T,
as measured by P, improves with experience E." (Tom
M. Mitchell)
• Machine Learning is the field of study that
gives computers the ability to learn without
being explicitly programmed. (Arthur Samuel)

Machine Learning
• Machine Learning (ML) is a subset of
Artificial Intelligence (AI) that focuses on
building systems that can learn from and
make decisions based on data. Instead of
being explicitly programmed to perform
specific tasks, machine learning models
improve automatically over time by
identifying patterns and relationships
within data.
• Machine Learning is a branch of
artificial intelligence that enables
computers to learn patterns from data and
make predictions or decisions without
being explicitly programmed.

Human Learning
Example: the number prediction/guessing game
A person tells you a series of (integer) numbers and then asks you to
predict/guess the next number in the series using the process below.

Iteration 0
The first number in the series is 1.
What is the next number?

Iteration 1
The first two numbers in the series are 1, 3.
What is the next number?

Iteration 2
The first three numbers in the series are 1, 3, 5.
What is the next number?

Iteration 3
The first four numbers in the series are 1, 3, 5, 7.
What is the next number?

Human Learning
Example: the number prediction/guessing game

Iteration 0: The first number in the series is 1. What is the next number?
Iteration 1: The first two numbers in the series are 1, 3. What is the next number?
Iteration 2: The first three numbers in the series are 1, 3, 5. What is the next number?
Iteration 3: The first four numbers in the series are 1, 3, 5, 7. What is the next number?

As you see, in iteration 0, your ability to predict the next number is
weak because the data is insufficient to make a good guess. In iteration 1,
your ability to guess the next number improves somewhat because now two
numbers are given. In iteration 2, your ability to guess the next number is
even better, and you can tell with more certainty that the next number is 7.
The more data you see/experience, the better your performance at predicting
the next number.

Human Learning

Example: the number prediction/guessing game

Iteration (n)   Next number (N)
0               1
1               3
2               5
3               7
4               9
…               …

After seeing sufficient numbers in the series, you are able to predict the
next number with higher confidence. If you think more deeply about the link
between the numbers in the series, you may discover a formula/model that
enables you to tell the next number in any iteration.

The model/predictor/hypothesis is:

$N = 2n + 1$

where $n$ is the given iteration number and $N$ is the next number in the
iteration. We say that you derived a model of the data, which is the above
equation.

Machine Learning and Human Learning

Iteration (n)   Next number (N)
0               1
1               3
2               5
3               7
4               9
…               …

Human learning: a human can derive a model from the data that describes
the data ($N = 2n + 1$).

Machine learning: replace the human with an ML algorithm. The table above is
the input to the ML algorithm, the algorithm infers the model from the data,
and the model ($N = 2n + 1$) is the output from the ML algorithm.

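As a rough illustration of "the algorithm infers the model from the data", the sketch below fits a straight line to the series with ordinary least squares. Least squares is only one possible learning algorithm, chosen here as an assumption; the function name `fit_line` is hypothetical and not part of the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The table from the slide: iteration number n and next number N.
iterations = [0, 1, 2, 3, 4]
next_numbers = [1, 3, 5, 7, 9]

slope, intercept = fit_line(iterations, next_numbers)
print(f"N = {slope:g} n + {intercept:g}")
```

On this data the fitted slope and intercept match the model $N = 2n + 1$ that a human would derive by inspection.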
Corona infection propagation model

[Figure: number of infections ($y$) plotted against day ($x$)]

The model: $y = 2.19x^2 - 0.86x - 0.27$

Machine Learning Process
Example 1. Object recognition
- The data contains images of a hen, a cat, and a dog in the form of pixel data.
- The ML algorithm processes the data (patterns) and outputs a model, e.g.,
in the form of a mathematical formula.
- Given a new pattern (i.e., a new object), the model tells the class type of
the new object/pattern, e.g., "the hen".

Example 2. Classifying breast cancer
- Doctors collect data of breast cancer patients.
- The data contains information about the tumor, e.g., size, diameter, etc.
of the tumor, and whether the tumor is benign or malignant.
- The ML algorithm derives a model from the data (patterns) that can be used
to predict the cancer type of a new patient.
- Given a new pattern, the model tells the class type of the new pattern.

The Machine Learning Process

The features of the tumor are the inputs; the target is the output. Each row
is one observation (patient).

            Mean     Mean      Mean    Mean               Fractal
            radius   texture   area    perimeter   …      dimension   target
patient 1   17.9     10.4      12.3    26.54       …      0.12        0 (benign)
patient 2   20.56    17.8      13.3    18.6        …      0.08        0 (benign)
patient 3   19.6     21.2      13.0    24.3        …      0.08        0 (benign)
…           …        …         …       …           …      …           …
patient m   7.70     24.5      47.8    0.00        …      0.070       1 (malignant)

The machine learning algorithm takes this data as input. The output from the
algorithm is a model such as:

$\hat{y} = 0.45 + 3.2x_1 + 0.5x_2 + \dots + 3.2x_n$

When the data of a new patient is put into the model, the value of $\hat{y}$
is expected to come out to be 0 if the tumor is benign and 1 if malignant.
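To make the role of the learned formula concrete, here is a minimal sketch of how a linear model of this shape could be evaluated for a new patient. The weights, bias, feature values, and the 0.5 decision threshold below are hypothetical placeholders, not the coefficients from the slide (whose middle terms are elided) and not values learned from real data.

```python
def linear_score(features, weights, bias):
    """Compute y_hat = bias + w1*x1 + w2*x2 + ... + wn*xn."""
    return bias + sum(w * x for w, x in zip(weights, features))

def classify(features, weights, bias, threshold=0.5):
    """Map the score to a class: 0 (benign) or 1 (malignant).

    The threshold is an assumption made for this sketch.
    """
    return 1 if linear_score(features, weights, bias) >= threshold else 0

# Hypothetical two-feature model, for illustration only.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1

print(classify([1.0, 2.0], WEIGHTS, BIAS))  # score 0.1  -> class 0
print(classify([2.0, 1.0], WEIGHTS, BIAS))  # score 0.85 -> class 1
```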

the (machine) learning process

The dataset is partitioned into the training data and the test data. The
training set is used to derive the model and the test data is used to
validate/test the model.

- train: the training data is given to the machine learning algorithm
- infer: the algorithm infers the model (hypothesis, predictor)
- validate: the model is validated on the test data
- predict: the model is applied to new data to produce a prediction
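The partitioning step can be sketched in a few lines. This is a simplified illustration rather than a prescribed procedure: the helper name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (training data, test data)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy dataset: (n, N) pairs generated from N = 2n + 1.
dataset = [(n, 2 * n + 1) for n in range(20)]
train, test = train_test_split(dataset)
print(len(train), len(test))
```

The model would then be derived from `train` only, and `test` would be held back to check how well it predicts data it has not seen.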

Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
• Reinforcement Learning

Supervised Learning
• The model is trained on labeled data, which means it
knows the correct output during the training process
• Examples: Classification (e.g., spam detection),
regression (e.g., predicting house prices)
• Algorithms: Linear Regression, Decision Trees, Support
Vector Machines (SVM), Neural Networks

Example of Classification:
Step-1: Model Construction

Training Data -> Classification Algorithms -> Classifier (Model)

Training Data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Classifier (Model):

IF rank = ‘professor’
OR years > 6
THEN tenured = ‘yes’

Step-2: Testing & Using the Model in Prediction

The classifier is applied first to the testing data and then to unseen data.

Testing Data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen Data: (Jeff, Professor, 4)
Tenured?

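The two steps can be traced in code. The sketch below writes the rule from Step-1 as a function and applies it to the testing data and to the unseen record from Step-2; the function name `predict_tenured` is an assumption made for illustration.

```python
def predict_tenured(rank, years):
    """The classifier from Step-1: IF rank = 'professor' OR years > 6 THEN 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, actual label).
test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

correct = sum(
    predict_tenured(rank, years) == label
    for _, rank, years, label in test_data
)
accuracy = correct / len(test_data)
print(f"accuracy on testing data: {accuracy}")  # 0.75 (Merlisa is misclassified)

# Unseen data: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))  # yes
```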
Unsupervised Learning
• The model is trained on data without labels. It tries to
find hidden patterns or intrinsic structures in the data
• Examples: Clustering (e.g., customer segmentation),
association rule mining (e.g., market basket analysis)
• Algorithms: K-means Clustering, Principal Component
Analysis (PCA), Hierarchical Clustering

Semi-supervised Learning
• Semi-supervised learning (SSL) leverages both
labeled and unlabeled data to improve model
performance
• It uses a small amount of labeled data along
with a large amount of unlabeled data to
improve learning accuracy
• This is particularly useful when labeling data is
expensive or time-consuming, but collecting
unlabeled data is easy

Semi-supervised Learning:
Example
• Imagine you are building a spam detection system for emails
• You have 1,000 labeled emails (some marked as spam, some as
non-spam)
• You also have 100,000 unlabeled emails (emails without any
spam/non-spam tags)
• Instead of labeling all 100,000 emails manually (which is
expensive), you can use semi-supervised learning to train a
model as follows:
1. Train an initial model using the 1,000 labeled emails
2. Use this model to predict labels for the 100,000 unlabeled emails
with some level of confidence
3. Retrain the model using both labeled and confidently predicted data
• This process can improve accuracy without requiring extensive
manual labeling

Approach Used in Semi-Supervised Learning

• Self-Training (Pseudo-Labeling)
• Train a model on a small labeled dataset
• Predict labels for unlabeled data
• Select emails with high-confidence predictions (e.g.,
above 90% confidence) and add them to the labeled
dataset
• Retrain the model using both real and pseudo-labeled
data
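The self-training loop above can be sketched with a deliberately tiny model. Everything here is an illustrative assumption: each email is reduced to a single hypothetical numeric feature, the "model" is a nearest-centroid classifier, and confidence is derived from relative distance to the two class centroids. Only the 90% threshold comes from the slide.

```python
def fit_centroids(labeled):
    """Train: compute the mean feature value of each class (0 and 1)."""
    centroids = {}
    for cls in (0, 1):
        values = [x for x, y in labeled if y == cls]
        centroids[cls] = sum(values) / len(values)
    return centroids

def predict_with_confidence(centroids, x):
    """Return (predicted label, confidence in [0.5, 1.0])."""
    d0 = abs(x - centroids[0])
    d1 = abs(x - centroids[1])
    if d0 + d1 == 0:
        return 0, 0.5
    p1 = d0 / (d0 + d1)  # the closer x is to centroid 1, the larger p1
    return (1, p1) if p1 >= 0.5 else (0, 1 - p1)

def self_train(labeled, unlabeled, threshold=0.9):
    """One round of self-training (pseudo-labeling)."""
    centroids = fit_centroids(labeled)            # train on labeled data
    pseudo = []
    for x in unlabeled:                           # predict unlabeled data
        label, confidence = predict_with_confidence(centroids, x)
        if confidence >= threshold:               # keep confident predictions
            pseudo.append((x, label))
    return fit_centroids(labeled + pseudo), pseudo  # retrain on both

labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
unlabeled = [1.5, 5.0, 8.5, 10.0]
model, pseudo_labeled = self_train(labeled, unlabeled)
print(pseudo_labeled)
```

In this toy run only the points lying very close to a centroid are pseudo-labeled; the ambiguous point at 5.0 and the point at 10.0 (confidence 0.85) are left out.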

Real-World Applications of
Semi-Supervised Learning

• Medical Diagnosis:
• Labeled X-rays are expensive to obtain
(require doctors' expertise), but there are
many unlabeled X-rays
• SSL helps train better models without
excessive manual labeling

Reinforcement Learning
• The model, called an agent, interacts with an
environment and learns to take actions that maximize a
cumulative reward over time. Feedback is given through
rewards and penalties.
• Examples: Game playing (e.g., AlphaGo), robotic
control, recommendation systems.
• Algorithms: Q-learning, Deep Q-Networks (DQN),
Policy Gradient Methods
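A minimal tabular Q-learning sketch may help make the agent/environment/reward loop concrete. The environment (a five-cell corridor with a reward of 1 at the right end) and all hyperparameter values are assumptions invented for this example, not taken from the slides.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right

def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action choice, with random tie-breaking.
            explore = rng.random() < epsilon or q[state][0] == q[state][1]
            action = rng.randrange(2) if explore else int(q[state][1] > q[state][0])
            next_state, reward, done = step(state, action)
            # Q-learning update toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
policy = ["right" if row[1] > row[0] else "left" for row in q_table[:-1]]
print(policy)
```

After training, the greedy policy read off the table moves right in every non-terminal cell, which is the behavior that maximizes the cumulative reward in this toy environment.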

Machine Learning vs Traditional
Programming
• The key difference between Machine
Learning (ML) and Traditional
Programming lies in how they handle
problems and generate solutions.

Approach to Problem-Solving
• Traditional Programming: In traditional programming, a
developer writes explicit rules (code) to solve a problem. The
logic is manually coded based on the programmer’s
understanding of the task. The system follows these rules to
process inputs and generate outputs.

• Machine Learning: In machine learning, instead of manually
coding rules, the system learns patterns from data. A model is
trained on examples, and based on these examples, it can
generalize and make predictions on new data. The algorithm
learns the rules on its own.
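The contrast can be shown side by side on a toy problem. In this sketch the feature (number of links in an email), the hand-picked threshold, and the training examples are all hypothetical; the "learning algorithm" simply picks the threshold with the fewest training errors.

```python
def handwritten_rule(num_links):
    """Traditional programming: the programmer chooses the rule (threshold 5)."""
    return num_links > 5

def learn_threshold(examples):
    """Machine learning: choose the threshold with the fewest training errors."""
    candidates = sorted({x for x, _ in examples})

    def errors(threshold):
        return sum((x > threshold) != is_spam for x, is_spam in examples)

    return min(candidates, key=errors)

# Hypothetical labeled examples: (number of links, is_spam).
examples = [(0, False), (1, False), (2, False), (3, True), (4, True), (7, True)]

threshold = learn_threshold(examples)
print(threshold)  # 2: emails with more than 2 links are predicted as spam

handwritten_errors = sum(handwritten_rule(x) != y for x, y in examples)
print(handwritten_errors)  # the fixed rule misclassifies the emails with 3 and 4 links
```

If the data changes, the learned threshold changes when the function is run again on the new examples, whereas the handwritten rule has to be edited by the programmer.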

Rules vs. Learning
• Traditional Programming: The rules are explicitly defined by
humans. If there is a change in the logic or requirements, the
programmer must modify the code accordingly.

• Machine Learning: The system automatically derives rules from
the data during training. If the input data changes or grows, the
model can be retrained to adapt, without new rules having to be
programmed manually.

Flexibility and Adaptability
• Traditional Programming: It is less flexible and less adaptive.
If new data or scenarios arise that are not accounted for in the
programmed rules, the system will fail or require additional
coding. Traditional programs are rigid and do not generalize well
to changes in the data.

• Machine Learning: ML models are highly adaptable. Once
trained, they can make predictions on new, unseen data. As the
data evolves, the model can be retrained to capture new patterns
without needing significant code changes.

Handling Complexity
• Traditional Programming: Works well for simple or moderately
complex tasks with clear, well-defined rules. For highly complex
problems with many variables and patterns, writing explicit rules
becomes difficult, time-consuming, and error-prone.

• Machine Learning: Excels in handling complex tasks where the
relationships between variables are too complicated for humans
to code explicitly. ML algorithms can handle high-dimensional
data (many features) and non-linear relationships.

Data Dependency
• Traditional Programming: Works based on the developer’s
understanding of the problem, with limited reliance on data. Data
is used as input, but it doesn't change the logic or rules of the
system.

• Machine Learning: Heavily dependent on data. The quality and
quantity of data directly influence the performance of the model.
The system learns from data, and more data usually improves
the model’s performance.

Errors and Maintenance
• Traditional Programming: If there is an error or bug, it usually
requires human intervention to debug and fix the logic in the
code. Maintaining the system can be difficult, especially as
complexity grows.

• Machine Learning: Errors may come from incorrect training
data or the choice of algorithm, and fixing these issues often
involves tweaking the data or retraining the model rather than
debugging specific code. Maintenance focuses more on updating
the model with new data.

Performance Over Time
• Traditional Programming: Over time, traditional systems may
become outdated if new scenarios or edge cases arise. They
require periodic updates and manual modifications to stay
relevant.

• Machine Learning: Machine learning models can improve over
time as they are retrained on more data, leading to better
performance. With more examples, the model learns and
generalizes better to new data.

Human Involvement
• Traditional Programming: Heavy reliance on human
intervention for designing and updating systems. Every new case
or scenario must be accounted for by the programmer.

• Machine Learning: Reduces the need for constant human
intervention once the system is trained. Human involvement is
primarily during the training and evaluation phases. Afterward,
the model makes predictions autonomously.

Applications of Machine
Learning
• Healthcare: AI models help in diagnosing
diseases, analyzing medical images, and
predicting patient outcomes.
• Finance: Fraud detection systems and
personalized financial advice are driven by
machine learning.
• Retail: Product recommendation systems
(e.g., Amazon), dynamic pricing, and
customer segmentation.

Applications of Machine Learning
• Self-Driving Cars: Machine learning helps cars
recognize objects, make decisions, and navigate streets.
• Natural Language Processing (NLP): Chatbots,
virtual assistants, and language translation are powered
by machine learning models that understand and
generate human language.

Challenges in Machine
Learning
• Data Quality and Availability: High-quality, relevant,
and sufficient data is crucial for training effective
models. Poor data leads to poor models.
• Overfitting: When a model performs well on training
data but poorly on new, unseen data.
• Bias and Fairness: ML models can unintentionally
perpetuate biases if the data used for training is biased.

Challenges in Machine
Learning
• Explainability: Some models, especially deep learning
models, are often black boxes, making it difficult to
understand how they arrived at a decision.
• Computational Resources: Training large models,
especially in deep learning, can be computationally
expensive and require powerful hardware.

Advantages of ML
• Machine Learning (ML) offers several advantages that
make it a powerful tool for solving complex problems
across various industries. These advantages stem from
its ability to learn from data, adapt to changing
conditions, and handle tasks that are difficult for
traditional programming approaches.

Automation of Tasks
• Reduced Human Intervention: Once trained, machine learning
models can perform tasks autonomously, reducing the need for
human involvement in decision-making processes. This
automation increases efficiency and reduces labor costs.

• Scalability: ML can automate tasks on a large scale, handling
vast amounts of data without needing manual intervention. This
makes it suitable for applications such as customer service
chatbots, recommendation systems, and autonomous vehicles.

Ability to Handle Large and
Complex Data

• Big Data-Friendly: Machine learning algorithms excel at
handling large volumes of data, making them particularly useful
in the era of "big data." They can analyze and derive meaningful
insights from complex and high-dimensional datasets, which
would be difficult to process using traditional methods.

• Multidimensional Data: ML can handle multiple features or
variables simultaneously, uncovering hidden patterns and
relationships within large datasets that may not be obvious.

Improved Accuracy Over
Time
• Self-Learning Capability: Machine learning models improve
their accuracy and performance as they are exposed to more
data. The model’s ability to learn from experience (data) allows it
to adapt and refine its predictions over time, reducing errors and
increasing reliability.

• Adaptability: In dynamic environments where data changes
frequently (e.g., financial markets, fraud detection), ML models
can update themselves through retraining, ensuring continued
accuracy without the need for frequent manual updates.

Efficiency and Speed
• Fast Processing of Large Data: ML algorithms can process
large datasets much faster than human analysts. Once trained,
models can make predictions and decisions in real time, enabling
quick responses to changing situations (e.g., fraud detection
systems).

• Automation of Complex Calculations: Machine learning can
automate complex tasks that require large-scale calculations,
such as image recognition or natural language processing, which
would be difficult and time-consuming for humans.

Ability to Identify Patterns and
Trends
• Pattern Recognition: Machine learning models are excellent at
identifying hidden patterns and trends within data, often beyond
human capability. This allows organizations to gain valuable
insights and make data-driven decisions.

• Predictive Analysis: ML models can predict future outcomes
based on historical data, helping businesses forecast demand,
detect early signs of equipment failure, or anticipate customer
behavior trends.

Adaptability to Dynamic
Environments
• Real-Time Learning: Machine learning models, particularly in
the case of online learning or reinforcement learning, can learn
and adapt to new data in real-time. This makes them suitable for
environments that are constantly changing, such as stock
trading, weather forecasting, and traffic management.

• Continuous Improvement: With more data, ML models can
continuously improve their predictions and decision-making
processes, making them highly effective in dynamic settings.

Reduction of Human Bias
• Data-Driven Decisions: Machine learning models base their
decisions on data, which can reduce the influence of human
biases, provided the training data is not itself biased. This is
especially important in fields like hiring, lending, or criminal
justice, where decisions based on biased human judgment could
lead to unfair outcomes.

• More Objective Insights: By learning from diverse data,
machine learning can provide more objective insights that are
not influenced by personal opinions or emotions.

Handling of Unstructured
Data
• Versatility with Different Data Types: Machine learning
algorithms can work with both structured data (e.g., databases,
spreadsheets) and unstructured data (e.g., images, text, audio).
This is particularly valuable as much of the world's data is
unstructured, like social media posts, emails, or videos.

• Natural Language Processing (NLP): ML enables systems to
understand, process, and generate human language, allowing for
applications like language translation, sentiment analysis, and
virtual assistants (e.g., Siri, Alexa).

Wide Range of Applications
• Cross-Industry Use Cases: Machine learning is applied in
numerous fields including healthcare, finance, retail, marketing,
autonomous vehicles, and more. Its flexibility means it can be
used for tasks such as disease diagnosis, fraud detection,
customer segmentation, or predictive maintenance.

• Healthcare Advancements: ML can assist in diagnosing
diseases, predicting patient outcomes, and even discovering new
drugs by analyzing medical data, making healthcare more
efficient and precise.

Reduction of Errors
• Lower Risk of Human Errors: Since machine learning models
are based on data and predefined algorithms, they are less prone
to the types of errors that humans make because of fatigue or
oversight. Once a model is trained, it applies the learned
patterns consistently to all inputs.

• Higher Precision: In applications like medical diagnostics or
financial forecasting, machine learning models can achieve
higher levels of precision than traditional approaches, reducing
the likelihood of errors in critical decision-making.

Cost-Effectiveness
• Efficiency Gains: By automating repetitive tasks and reducing
the need for human intervention, machine learning helps reduce
operational costs. For instance, in customer service, ML-powered
chatbots can handle routine queries, freeing up human agents for
more complex tasks.

• Scalability: Once developed, machine learning systems can
scale without the need for additional manual effort, allowing
companies to grow their operations without a corresponding
increase in costs.

Real-Time Data Processing
• Immediate Insights: ML systems can process and analyze data
in real-time, making them suitable for applications where quick
decisions are critical, such as real-time bidding in online
advertising or detecting fraudulent transactions as they occur.

• Time-Sensitive Decisions: In sectors like finance or healthcare,
timely insights can make the difference between success and
failure. ML models can provide those insights almost
instantaneously.

Solving Complex Problems
• Ability to Handle Non-Linear Relationships: Machine
learning is especially useful for solving problems where
relationships between variables are complex or non-linear. For
example, neural networks can identify patterns in image data
that are far too complex for rule-based systems to capture.

• Multi-Tasking: Some machine learning models
(especially deep learning models) can solve multiple tasks
simultaneously, such as classifying images and identifying
objects within them.

Disadvantages of ML
• While Machine Learning (ML) offers many
advantages, it also comes with certain challenges and
disadvantages. Understanding these limitations is
important to ensure its effective application and to
address the potential issues that arise in real-world
scenarios.

Data Dependency
• Requires Large Amounts of Data: Machine learning
algorithms are data-hungry, meaning they typically require vast
amounts of data to function effectively. The quality, quantity, and
diversity of the data are crucial for model performance. Without
enough representative data, the model may fail to generalize
well to new data.

• Garbage In, Garbage Out: The accuracy of machine learning
models is heavily dependent on the quality of the data. If the
data is incomplete, noisy, or biased, the model will learn
inaccurate or misleading patterns, resulting in unreliable
predictions.

Complexity and Interpretability
• Black Box Nature: Many machine learning models,
especially deep learning and neural networks, are
considered "black boxes" because their decision-making
processes are not easily interpretable. It is often challenging
to understand how the model arrived at a particular decision
or prediction, which can be problematic in fields where
transparency is crucial (e.g., healthcare, finance).

• Lack of Explainability: The lack of explainability in
machine learning models can be a barrier in industries that
require clear and interpretable decision-making processes,
such as legal or regulatory environments. This also makes
debugging and improving the model more challenging.

High Computational Costs
• Resource-Intensive: Training complex machine learning
models, especially deep learning models, can require significant
computational resources, including powerful GPUs, large amounts
of memory, and distributed computing. This can make ML models
expensive to develop and deploy, especially for small
organizations.

• Time-Consuming: Developing and fine-tuning machine learning
models can take a lot of time. The process of training, validating,
and tuning models to achieve good performance is iterative and
time-intensive, especially when working with large datasets.

Overfitting
• Overfitting to Training Data: Overfitting occurs when a model
performs exceptionally well on the training data but fails to
generalize to new, unseen data. This happens when the model
becomes too complex and starts memorizing the training data,
including noise and outliers, rather than learning the underlying
patterns.

• Mitigating Overfitting: Preventing overfitting requires
techniques like cross-validation, regularization, or pruning, which
can add complexity to the development process.

Requires Expertise
• Specialized Knowledge Needed: Developing, training, and fine-tuning
machine learning models requires a high level of expertise in various domains,
such as data science, statistics, and computer science. The scarcity of skilled
professionals in machine learning can make it difficult for organizations to
implement ML solutions effectively.

• Complex Model Selection: Choosing the right algorithm and model for a
particular task can be difficult. There is no one-size-fits-all solution, and
determining the best model often requires experimenting with multiple
algorithms and tuning hyperparameters, which adds complexity to the
process.

Bias and Fairness Issues
• Bias in Data: Machine learning models can inherit biases
present in the training data, leading to unfair or discriminatory
outcomes. If the data used for training is not representative of
the broader population, the model may produce biased
predictions.

• Unintended Consequences: Bias in machine learning can have
serious ethical implications in sensitive areas like criminal justice,
hiring, and lending, where biased predictions can lead to
discrimination.

Generalization Issues
• Struggles with Generalization: Some machine learning
models, especially those trained on specific, narrow datasets, can
struggle to generalize to new, unseen data. They may work well
within the training environment but perform poorly in real-world
applications where the data distribution differs from the training
data.

• Domain Shifts: Machine learning models can fail when the data
distribution changes significantly after deployment. This issue,
called domain shift, can make the model's predictions unreliable
in new environments.

Security and Privacy Concerns
• Data Privacy Issues: Machine learning models often require
vast amounts of personal or sensitive data to function effectively.
Collecting, storing, and processing such data can raise privacy
concerns, especially when used in applications like healthcare,
finance, or surveillance.

• Vulnerability to Attacks: Machine learning models are
vulnerable to adversarial attacks, where malicious actors
manipulate the input data to deceive the model. For instance, in
image recognition, small changes to an image can trick the
model into making incorrect classifications.

High Initial Costs
• Initial Development Costs: Developing and deploying machine
learning solutions can be expensive, especially when building
custom models from scratch. This includes costs related to hiring
experts, acquiring necessary computational resources, and
curating high-quality datasets.

• Ongoing Maintenance Costs: Machine learning models require
continuous monitoring, retraining, and maintenance to ensure
they remain accurate as new data becomes available. This
ongoing maintenance can add to the long-term cost.

Ethical Concerns
• Lack of Accountability: In many machine learning systems,
particularly those used in decision-making processes (e.g., in
hiring, lending, legal, or law enforcement), it can be difficult to
assign accountability if the system makes a wrong or biased
decision.

• Unintended Consequences: ML systems can have far-reaching
societal impacts, sometimes reinforcing harmful stereotypes or
creating dependencies that could lead to job displacement in
certain industries.

Prerequisites of ML
• Before diving into machine learning, it's helpful to have
a solid foundation in several key areas. These
prerequisites will give you the tools to understand how
machine learning models work, how to apply them
effectively, and how to troubleshoot issues when they
arise.

Linear Algebra
• Understanding concepts such as vectors,
matrices, eigenvalues, and eigenvectors is
crucial. Many machine learning algorithms,
particularly in deep learning, rely heavily
on linear algebra for operations like matrix
multiplication, transformations, and
optimizations.
• Key topics: Matrices and vectors, matrix
multiplication, dot product, matrix inverses,
determinants, eigenvalues, and eigenvectors.

Probability and Statistics
• Machine learning models often make
predictions based on probabilities.
Understanding probability distributions,
Bayes' theorem, and statistical methods for
hypothesis testing helps in interpreting
model predictions and performance.
• Key topics: Conditional probability, Bayes'
theorem, distributions (normal, binomial, etc.),
variance, expectation, hypothesis testing.
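As a small worked example of Bayes' theorem, the sketch below computes the probability that an email is spam given that it contains a certain word. The three input probabilities are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers:
#   P(spam) = 0.2, P(word | spam) = 0.6, P(word | not spam) = 0.05
p_spam_given_word = bayes_posterior(prior=0.2, likelihood=0.6,
                                    false_positive_rate=0.05)
print(round(p_spam_given_word, 2))  # 0.75
```

Here the posterior is 0.12 / (0.12 + 0.04) = 0.75: seeing the word raises the probability of spam from 20% to 75%.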

Calculus
• Many machine learning algorithms,
especially in optimization (such as gradient
descent), require a solid understanding of
differential calculus. You’ll often use
derivatives to minimize a loss function.

• Key topics: Differentiation, partial derivatives,
gradients, chain rule, optimization techniques.
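To show how a derivative is used to minimize a loss function, here is a minimal gradient descent sketch. The loss $(w - 3)^2$, the learning rate, and the number of steps are assumptions picked for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Loss: L(w) = (w - 3)^2, so the derivative is dL/dw = 2 * (w - 3).
minimum = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(round(minimum, 4))  # 3.0
```

Each step shrinks the distance to the minimizer $w = 3$ by a constant factor of 0.8, so the iterate converges quickly for this simple loss.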

Programming
• Python: Python is the most widely used programming
language for machine learning due to its simplicity and
powerful libraries. You should be familiar with writing
clean, efficient code, debugging, and working with
Python’s ML-related libraries.
• Key topics: Variables, functions, control structures (loops,
conditionals), object-oriented programming, modules, and
error handling.

Programming
• Key Libraries
• NumPy: For numerical computations and
working with arrays.
• Pandas: For data manipulation and analysis.
• Matplotlib / Seaborn: For data visualization.
• Scikit-Learn: For implementing various
machine learning algorithms.
• TensorFlow / PyTorch: For deep learning and
neural networks.

Data Handling
• Data Collection: Understand how to collect,
store, and retrieve data from various sources
(databases, APIs, CSV files, etc.).
• Data Preprocessing: Before feeding data into
a model, it often requires cleaning (handling
missing values, duplicates), normalization,
encoding categorical data, feature selection, and
handling imbalanced datasets.
• Exploratory Data Analysis (EDA): Techniques
like data visualization and descriptive statistics
are essential for understanding the data before
applying machine learning models.

Algorithms and Data Structures
• Basic Algorithms: Understanding sorting, searching,
and optimization algorithms can be helpful in optimizing
machine learning workflows.
• Data Structures: Knowledge of arrays, lists,
dictionaries, stacks, queues, and trees will help in
organizing and managing data efficiently.

Machine Learning Concepts
• Understanding of ML Types: You should
be familiar with different types of machine
learning—supervised, unsupervised, and
reinforcement learning.
• Model Evaluation Metrics: Learn about
metrics like accuracy, precision, recall, F1
score, confusion matrix, and AUC-ROC for
classification tasks, and RMSE or MAE for
regression tasks.
• Training vs Testing: Understand the
concept of training a model on a portion of
the dataset and testing it on unseen data.
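
The evaluation metrics named above can be computed directly from predictions. The sketch below is a plain-Python illustration; the label and prediction lists are made-up values, and the function names are assumptions for this example (Scikit-Learn ships ready-made equivalents).

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up classification labels and predictions.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_metrics(labels, predicted)
print(scores)

# Made-up regression targets and predictions.
reg_mae = mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
reg_rmse = rmse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(reg_mae, reg_rmse)
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.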
70
Machine Learning Concepts
• Overfitting and Underfitting: Learn about these
common problems and how to tackle them using
techniques like regularization and cross-validation.
• Cross-Validation: Techniques like K-fold cross-validation
estimate how well your model generalizes to unseen data by
training and testing it on several different splits.
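
To make the K-fold idea concrete, here is a minimal sketch that only produces the index splits. The function name `k_fold_indices` is an assumption for this example, and the data is not shuffled, which a real workflow would normally do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample is used for testing exactly once and for training in the other folds. The model is trained and scored once per fold, and the scores are averaged.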

71
Deep Learning
• Once you have a solid understanding of
traditional machine learning, you can
explore deep learning. This involves using
artificial neural networks (ANNs) for tasks
like image recognition, natural language
processing, and reinforcement learning.
• Neural Networks: Understand the structure of
a neural network (layers, neurons, activation
functions, weights, biases).
• Optimization: Learn about optimization techniques like
stochastic gradient descent and the Adam optimizer, and
about backpropagation, which computes the gradients
they rely on.
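
The structure of a neural network (layers, neurons, activation functions, weights, biases) can be sketched as a forward pass in plain Python. The weights and biases below are hand-picked, hypothetical values rather than trained ones, and the two-layer shape is an assumption made for illustration.

```python
import math


def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]


def forward(x):
    """Run the input through the hidden layer, then the output layer."""
    hidden = dense_layer(x, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)[0]


prediction = forward([2.0, 1.0])
print(prediction)
```

Training would adjust the weights and biases to reduce a loss, using gradients of that loss with respect to each parameter.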

72
Problem-Solving and Analytical Thinking

• Machine learning involves working with real-world data,
which is often messy and complex. You’ll need strong
problem-solving skills to clean the data, choose
appropriate models, tune hyperparameters, and
interpret results.

73
ML Frameworks and Tools
• Jupyter Notebooks: A popular environment for writing
Python code, visualizing results, and sharing ML
projects.
• ML Frameworks: Familiarity with frameworks like
Scikit-learn for classical ML tasks, and TensorFlow or
PyTorch for deep learning, will help you implement
models effectively.
• Version Control (Git): Using version control systems
like Git helps in managing code and collaborating on
projects.

74
Cloud Computing
(Optional but Valuable)

• Knowledge of cloud platforms like AWS, Google Cloud,
or Azure for deploying machine learning models can be
beneficial, especially when dealing with large-scale
models or datasets.

75
learning modes
• Supervised learning (input: features of the tumor;
output: type of the tumor): The expert provides us data
about tumors of various patients and tells which cases
were found to be malignant and which benign. Also called
learning with a teacher.
• Unsupervised learning (input: features of the tumor):
No output is provided to us. The learning algorithm has to
group the data in categories. Also called learning without
a teacher.
• Semi-supervised learning (input: features of the tumor;
output: type of the tumor): Only partial output is
provided to us, i.e., some of the output values are
missing.
• Reinforcement learning: A machine learning
mode/strategy for learning in a dynamic environment …
when the learning agent takes an action, it receives
negative or positive feedback from the environment telling
it whether the action was correct or incorrect, forcing it
to modify its behavior. Autonomous vehicles and robots may
use this type of learning, for example, to learn how to
navigate congested environments.
76
learning tasks
• Regression (input and output): The output is a
continuous value. In math terms, the task is finding a
function that maps the input to one or more continuous
output values.
• Classification (input and output): The output is a
discrete value that represents a class label. For example,
in the breast cancer problem, which involves classifying a
tumor as benign or malignant based on the features of the
tumor, the output may be 1 or 0, with 1 denoting malignant
and 0 benign. In math terms, this is equivalent to finding
a function that maps the input to one of these class
labels.
• Clustering: There is no output given in the training
data, and the task of the algorithm is to group the
instances in categories based on some measure of
similarity.
• Dimensionality reduction: Involves the reduction of
features or inputs in the data.
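
The regression and classification tasks described on this slide can each be sketched as fitting a function from data. The example below is a plain-Python illustration under stated assumptions: a least-squares line stands in for regression and a 1-nearest-neighbor rule stands in for classification, and the tumor sizes and labels are made-up values (1 for malignant, 0 for benign, as on the slide).

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(x, train_xs, train_labels):
    """Classification: return the label of the closest training example."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_labels[nearest]


# Regression: the output is a continuous value.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
predicted_value = slope * 5.0 + intercept
print(slope, intercept, predicted_value)

# Classification: the output is a discrete class label.
tumor_sizes = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]  # hypothetical single feature
tumor_labels = [0, 0, 0, 1, 1, 1]             # 1 = malignant, 0 = benign
print(nearest_neighbor_label(1.8, tumor_sizes, tumor_labels))
print(nearest_neighbor_label(4.2, tumor_sizes, tumor_labels))
```

Both functions learn from input and output pairs. The difference is only in the kind of output: a real number in the first case, a label from a fixed set in the second.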

77
learning modes and tasks
• Regression and classification (input and output):
supervised/semi-supervised learning.
• Clustering (just input): unsupervised learning (as in
data mining).
• Dimensionality reduction: may occur in all types of
learning: supervised, unsupervised, and semi-supervised.
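
Clustering, the task that works from just the input, can be sketched with a one-dimensional k-means loop. This is an illustrative choice of algorithm, not one named on the slide. The points and the initial centers are made-up values, and the function name `k_means_1d` is an assumption for this example.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around centers using absolute distance as similarity."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# No labels are given, only observations.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

No output values appear anywhere in the data. The grouping comes entirely from how close the observations are to each other.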

78
machine learning models
o artificial neural networks
o decision trees
o support vector machines
o regression analysis
o Bayesian networks
o genetic algorithms
o training models
o federated learning

79
machine learning applications
o computer vision
o natural language processing (NLP)

80
