
Deep Learning for Natural Language

Bloomington-Normal
July 18th, 2023

💬🧠🤖 Myles Harrison


https://www.linkedin.com/in/mylesharrison/
Who am I?
I'm Myles, a data scientist.

Most recently, I was the head of data science at a Global Tech Bootcamp (~4 years).

Previously, I was a consultant, working in data science at large organizations such as Accenture, PwC, and Sapient.

Currently, I am teaching Conversational AI at Georgian College, north of Toronto.

I live in Ontario's Lake Country and love the outdoors (fishing and hiking) as well as creative writing.
Fundamentals

🏛🔠
Okay, but what's the deal with ChatGPT?
ChatGPT is an example of a large language model (LLM), a type of deep learning model trained with hundreds of millions or billions of parameters on very large bodies of text. Large language models currently represent the state of the art in NLP.

While we're here: ChatGPT is not sentient, nor is it an example of an Artificial General Intelligence (AGI).

Let's take a step back…

Image credit: Leon Neal/Getty Images


What is Natural Language Processing (NLP)?
Natural language processing lies at the intersection of the domains of linguistics, computer
science, and artificial intelligence.

We are primarily concerned with NLP as it pertains to the field of data science and AI, where it refers to teaching computers to process - and perhaps even "understand" - text written in ordinary language and perform associated tasks.

Though the term processing usually refers specifically to altering and preparing data, in the
domain of AI, NLP is often used to refer more generally to any language problem, including
those of applying machine learning (ML) to language, since these still require processing text
data beforehand.

🔡🛠💡
A Brief History of NLP (according to Wikipedia)

Symbolic (1950s-1970s): rules-based methods for language tasks such as translation and conversation.

Statistical / ML (1980s-2000s): the advent of statistical techniques and the application of machine learning.

Neural (2000s-present): breakthroughs in deep learning leading to rapid advances in the field up to today.
What is Machine Learning?
Machine learning (ML) is a relatively new field and sits at the intersection of software
engineering, computational mathematics, and statistics.

Whereas traditional software development is deterministic and requires the coding of specific
logic, machine learning models can learn from training data and infer relationships or make
predictions based upon patterns in a given data set, without being given explicit instructions.

Much of the mathematical backing for machine learning techniques has existed for quite some
time; it is only fairly recent advances in computing power, scale, and availability that have
enabled their application computationally, giving rise to the field of ML.

🤖🎓
Types of Machine Learning
Supervised Learning: make predictions from a dataset and data labels - associated categorical or numeric values. In NLP, it can be used to classify documents based upon their content, or to predict the next character in generative text applications.

Unsupervised Learning: uses statistical techniques to uncover patterns in a dataset. In NLP, major applications are topic modeling and embeddings - finding representations of language in a vector space that capture their statistical properties.

Reinforcement Learning: teaches an agent a behavior by optimizing against a target objective with a reward function. It is an important aspect of some large language models (LLMs), which use Reinforcement Learning from Human Feedback (RLHF) as part of their training.
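As a concrete illustration of the supervised case above, here is a minimal, hypothetical sketch of document classification with scikit-learn; the tiny dataset, labels, and model choice (bag-of-words features with a Naive Bayes classifier) are invented for the example and are not from the talk.

# Supervised learning for NLP: classify documents from labeled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["the game went to overtime", "stocks fell sharply today",
         "the striker scored twice", "markets rallied on earnings"]
labels = ["sports", "finance", "sports", "finance"]

# Turn text into bag-of-words counts, then fit a simple classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["the team won the match"]))  # predicts a label for the new document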
What is Deep Learning?
• Deep Learning is a specialized type of ML that takes inspiration from the structure of the human brain

• Unlike other machine learning, deep learning models - or artificial neural networks - are composed of many nodes which can be viewed as individual "sub-models"

• The theoretical foundations for deep learning have existed since the 1960s (or even earlier), but they have only recently been realized with the rise of cheap, powerful computing

• In NLP, deep learning models represent the state-of-the-art (SOTA) and can be used for supervised, unsupervised, and semi-supervised problems
A Neuron in the Brain
The human brain is composed of billions of neurons: electrically excitable cells composed of a cell body, dendrites, an axon, and a terminal.

Neurons receive input through their dendrites, and when firing, an electrical impulse travels down the axon to the terminal and releases neurotransmitters to the next cell.
An Artificial Neuron (Perceptron)
The structure of an artificial neuron, or perceptron, follows that of those in the human brain.

Inputs are multiplied by weights to form a weighted sum, which is then passed through an activation function to produce the model activation - analogous to the electrical impulse of a neuron firing.

Image from: https://deepai.org/machine-learning-glossary-and-terms/perceptron
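To make the weighted sum and activation concrete, here is a minimal numerical sketch of a single perceptron in Python with NumPy; the input, weight, and bias values are arbitrary examples.

import numpy as np

# A single artificial neuron: weighted sum of inputs plus bias,
# passed through an activation function (sigmoid here).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs (features)
w = np.array([0.8,  0.1, -0.4])  # weights
b = 0.2                          # bias

z = np.dot(w, x) + b             # weighted sum
activation = sigmoid(z)          # "firing" strength of the neuron
print(activation)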


Structure of a Neural Network
• Multiple perceptrons are put together into layers composed of nodes (each perceptron) to create a neural network.

• The outputs of previous layers become the inputs of the following layer.

• The number of layers and number of nodes - known as the network architecture - is arbitrary and up to the choice of the modeler. There are also specific architectures that are well suited to particular types of problems.
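A minimal sketch, assuming NumPy and arbitrary random weights, of how stacked layers pass their outputs forward to become the next layer's inputs; the layer sizes here (3 inputs, 4 hidden nodes, 2 outputs) are invented for illustration.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# Toy fully-connected network: 3 input features -> 4 hidden nodes -> 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([0.5, -1.2, 3.0])   # the input layer just passes the features through

h = relu(W1 @ x + b1)            # hidden layer: weighted sum plus activation
y = W2 @ h + b2                  # output layer (raw scores)
print(y)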
Input Layer
• The input layer is not a “true” layer but just passes the data through to the following layers – no activation, linear passthrough

• Each neuron in the input layer represents a feature of the data

• Number of nodes in the input layer = number of features
Output Layer
• Final layer of a neural network that produces the network's predictions (output)

• Number of nodes dependent on problem: a single node for binary classification or regression; for multi-class, number of nodes = number of classes

• Activation function depends on problem: e.g. sigmoid for binary classification (probability 0-1), softmax for multiclass (multiple probabilities 0-1 that sum to 1)
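A small NumPy sketch of the two output activations mentioned above; the input scores are arbitrary.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

# Binary classification: one output node, sigmoid squashes the score to a probability in (0, 1)
print(sigmoid(1.3))

# Multi-class: one node per class, softmax gives probabilities that sum to 1
print(softmax(np.array([2.0, 1.0, 0.1])))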
Hidden Layers
• Intermediate layers between the input and output – so called as they are "hidden" between the two

• Perform computations on the outputs of the previous layer (linear combination of outputs and weights with an activation function applied)

• Size of each layer is arbitrary (part of the network architecture)

• Speaking here only of fully-connected (feed-forward) networks
Activation Functions

• Applied to the linear combination of inputs and weights for each layer

• Each layer may have a different activation function

• There are families of well-known functions that perform well and have desirable mathematical properties

• It is from these that the power of deep learning - and its ability to learn highly complex relationships - arises

Image source: https://machine-learning.paperspace.com/wiki/activation-function
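A short sketch, assuming NumPy, of three common activation functions applied element-wise to a range of pre-activation values.

import numpy as np

# Common activation functions applied element-wise to a layer's pre-activations
z = np.linspace(-3, 3, 7)

sigmoid = 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)
tanh    = np.tanh(z)                 # squashes to (-1, 1)
relu    = np.maximum(0, z)           # zero for negative inputs, identity otherwise

print(sigmoid, tanh, relu, sep="\n")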


Loss Functions

• A measurement of error

• Not unique to deep learning – also used in traditional machine learning

• How “wrong” are the predictions?

• Used to optimize the weights

• Examples: Mean Squared Error (regression), Cross-Entropy Loss (classification)

How do Neural Networks Learn?
• Calculate the direction of change in which the slope of the loss with respect to the weights is negative (the gradients)

• Find the global minimum of the error

• A large number of weights (millions?) = a highly complex optimization problem
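A toy sketch of one gradient descent step on a single-weight model with a squared-error loss; the values and learning rate are arbitrary and only meant to show the weight moving against the gradient.

# One step of gradient descent on a single-weight model y = w * x,
# minimizing squared error. The gradient gives the direction of steepest
# increase of the loss, so we step the weight in the opposite direction.
x, y_true = 2.0, 8.0
w, lr = 1.0, 0.1

y_pred = w * x
loss = (y_pred - y_true) ** 2
grad = 2 * (y_pred - y_true) * x     # d(loss)/d(w)
w = w - lr * grad                    # move against the gradient

print(loss, w)                       # the updated weight now predicts closer to y_true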
Forward Pass and Backpropagation
• In the forward pass, training data is run through the network to compute the predictions, and the error is calculated from the loss function

• Backpropagation (“backprop”) applies changes to the weights in the network as determined from the gradients (the direction of greatest decrease of the error)

(Diagram: Training Data → Forward Pass (perform calculations) → Loss & Gradients → Backprop (update weights) → back to the Forward Pass)
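A minimal sketch of the forward pass, loss, backprop, and weight update using PyTorch's automatic differentiation; the network shape and random data are placeholders.

import torch

# Forward pass, loss, and backpropagation on a tiny fully-connected network.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 4), torch.nn.ReLU(), torch.nn.Linear(4, 1)
)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)        # a batch of 8 training examples
y = torch.randn(8, 1)        # their targets

pred = model(x)              # forward pass: compute predictions
loss = loss_fn(pred, y)      # how "wrong" are the predictions?
loss.backward()              # backprop: compute gradients of the loss w.r.t. the weights
optimizer.step()             # update weights in the direction that reduces the loss
optimizer.zero_grad()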


Epochs and Batches

• Deep learning differs from other machine learning in that neural networks are trained with batches (subsets of fixed size) of the training data

• Once all the data has gone through the network once, this is referred to as a single epoch of training

• One epoch = many batches; networks are trained for many epochs and see the whole training dataset multiple times

(Diagram: the training data split into Batches 1-4, passed through the network in Epoch 1, Epoch 2, and so on)
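A plain-Python sketch of how training iterates over the dataset in batches within epochs; the dataset here is just a list of integers standing in for training examples.

# Training in epochs and batches: each epoch iterates over the whole dataset
# in fixed-size chunks (batches).
data = list(range(100))      # stand-in for 100 training examples
batch_size = 25
num_epochs = 3

for epoch in range(num_epochs):
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # forward pass, loss, backprop, and weight update happen here per batch
    print(f"epoch {epoch + 1}: saw {len(data)} examples in {len(data) // batch_size} batches")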
(Python) Deep Learning Frameworks

TensorFlow
• Google product
• Graph-based computation, GPU training
• Other deployment options (TensorFlow Lite, TF.js)
• Easy to use with the integration of Keras into TF 2.x

PyTorch
• Facebook product
• Graph-based computation, GPU training
• PyTorch Mobile for embedded, no web deployment (ONNX?)
• OOP dev focus (ML engineering); Lightning as the equivalent to Keras
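To show the difference in developer experience, here is the same tiny classifier sketched in both frameworks; the layer sizes are arbitrary and this is illustrative rather than a recommended setup.

# TensorFlow / Keras: declarative, layers stacked in a Sequential model
import tensorflow as tf

keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
keras_model.compile(optimizer="adam", loss="binary_crossentropy")

# PyTorch: object-oriented, you subclass nn.Module and write forward() yourself
import torch

class TorchModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(10, 16), torch.nn.ReLU(),
            torch.nn.Linear(16, 1), torch.nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

torch_model = TorchModel()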
Language Models

🔠🧪💻
Sequence-to-Sequence Models
• Sequence-to-sequence (Seq2Seq) neural networks take a sequence as input and return a sequence as output

• Applications in language (generative models, translation, text-to-speech, summarization), time series, and audio / video (captioning, transcription)

• e.g. RNNs and Transformers

Image source: https://google.github.io/seq2seq/


Recurrent Neural Networks (RNNs)
• Simplest type of sequence-to-sequence model

• Uses the outputs of previous nodes to affect the inputs of following ones (memory)

• Can be one-to-one, one-to-many, many-to-one, or many-to-many

• Computationally expensive to train, as sequential in nature

• Suffered from "forgetting" due to vanishing gradients

(Diagram: one-to-one, one-to-many, many-to-one, and many-to-many configurations)
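A minimal many-to-one sketch using PyTorch's built-in RNN layer; the feature, hidden, and class sizes are arbitrary.

import torch

# Many-to-one RNN: read a sequence of vectors and use the final hidden state
# to make a single prediction (e.g. classify the whole sequence).
rnn = torch.nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = torch.nn.Linear(16, 2)    # 2 output classes

x = torch.randn(4, 10, 8)        # batch of 4 sequences, 10 steps, 8 features each
outputs, h_n = rnn(x)            # h_n: final hidden state, shape (1, 4, 16)
logits = head(h_n[-1])           # one prediction per sequence
print(logits.shape)              # torch.Size([4, 2])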
Long-Short Term Memory Networks (LSTMs)
• Special type of RNN that captures long-term dependencies

• Deals with the problem of vanishing gradients (forgetting)

• Additional components for remembering and forgetting

• Still difficult to train due to lack of parallelization, and did not solve the forgetting problem entirely

Image from: https://d2l.ai/chapter_recurrent-modern/lstm.html
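The same sketch with PyTorch's LSTM layer, whose extra gating components also maintain a cell state for remembering and forgetting; again the sizes are arbitrary.

import torch

# Drop-in LSTM replacement for the RNN above.
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)
outputs, (h_n, c_n) = lstm(x)    # LSTM also returns a cell state c_n
print(h_n.shape, c_n.shape)      # both torch.Size([1, 4, 16])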
The Transformer Architecture
• The groundbreaking paper "Attention is All You Need" from Google researchers (2017) introduced the Transformer architecture

• Attention discards the notion of recurrence: it deals with forgetting by having the decoder look at all previous states of the encoder (a weighted sum)

• Now represents the state of the art for LLMs, and is also applied in domains outside of language (e.g. image generation)
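A compact sketch of scaled dot-product attention, the weighted sum over all states described above; implemented here with PyTorch for illustration, with arbitrary tensor sizes.

import math
import torch

# Scaled dot-product attention: every position attends to (takes a weighted
# sum over) all positions, so nothing has to be "forgotten".
def attention(Q, K, V):
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))  # query/key similarity
    weights = torch.softmax(scores, dim=-1)                   # attention weights sum to 1
    return weights @ V                                        # weighted sum of values

Q = torch.randn(1, 5, 64)   # 5 positions, 64-dimensional queries
K = torch.randn(1, 5, 64)
V = torch.randn(1, 5, 64)
print(attention(Q, K, V).shape)   # torch.Size([1, 5, 64])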
RNNs vs. Transformers
RNNs
● Recurrent structure: outputs from previous inputs are used to make future predictions
● Suffered from "vanishing gradients", making it difficult to learn long-term dependencies (i.e. your model has ADHD)
● LSTM networks addressed the vanishing gradient problem somewhat, but not entirely
● Do not parallelize well due to their recurrent nature

Transformers
● Non-sequential in nature (no recurrence) - make predictions based on the whole input sequence
● Do not require information to be in order; keep track of position (e.g. of words in a sentence) using positional encoding instead
● Have a much more complex architecture, implementing the notion of self-attention
● Parallelize well in training as there are no sequential dependencies
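A short sketch of the sinusoidal positional encoding from the original Transformer paper, which injects word-order information in place of recurrence; the sequence length and model dimension are arbitrary.

import torch

# Sinusoidal positional encoding: added to token embeddings so the model
# knows word order even though it processes all positions in parallel.
def positional_encoding(seq_len, d_model):
    pos = torch.arange(seq_len).unsqueeze(1).float()
    i = torch.arange(0, d_model, 2).float()
    angles = pos / torch.pow(10000.0, i / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = torch.cos(angles)   # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)  # torch.Size([10, 16])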
LLM Model Development History

Image Source: https://huggingface.co/learn/nlp-course/chapter1/4


To the Moon?

GPT-2 (2019): 1.5B parameters

GPT-3 (2020): 175B parameters

GPT-4 (March 14, 2023): 1.76T (?) parameters

(Chart: parameter counts of GPT-1, GPT-2, and GPT-3)
Source: https://research.aimultiple.com/gpt/
Evaluating Large Language Models
(Figure: Massive Multitask Language Understanding (MMLU) performance over time)

As LLMs have become more sophisticated and begun excelling at "few-shot" and "zero-shot" learning tasks, general evaluation has become more challenging.

As such, a series of benchmarks has arisen which are closer to the knowledge and reasoning tasks that would be given to a human.

Some of these benchmarks are composites encompassing existing benchmarks as a suite (e.g. HELM).

Source: https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu
Chinchilla: Bigger is better?
The Chinchilla model was presented by Google DeepMind in March 2022 and followed the development of the earlier Gopher model.

Though smaller in size than previous LLMs (70B parameters vs. Gopher's 280B), it was trained on a larger dataset. It outperformed other, larger models on standard benchmarks, with the authors claiming it even outperforms GPT-3 (175B parameters).

Figure from: https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training


Breaking News
Meta released Llama 2 today, a 70B parameter model optimized for dialog. Following the Chinchilla path, Llama is a smaller model compared with some more recent ones, but uses a very large training set (2 trillion tokens).

The Llama models are "semi-open": available for research and commercial use under a license; however, the details of the specific training data and methods have not been entirely released.
Limitations

https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI
🤯
🙃
🙄
LLMs: Player Pianos or beginnings of AGI?
The Shameless Plug

NLP4Free 🔠⚡🤖🧠😃
https://mylesharrison.com/nlp4free/

A Free Natural Language Processing (NLP) microcourse, from basics to deep learning
Let's keep learning together!
Feel free to connect with me and
continue the conversation:

www.mylesharrison.com

linkedin.com/in/mylesharrison/

calendly.com/mylesmharrison
Thanks for listening!

😀
Image Attribution
