Lecture 5: Intro to DL
DEEP LEARNING & LARGE LANGUAGE MODELS
TWUMASI MENSAH-BOATENG
INTRODUCTION
WHY DEEP LEARNING?
• Big Data
• Hardware
• Software
$$\hat{y} = g\left(w_0 + \sum_{i=1}^{m} x_i w_i\right)$$

where:
• $g$ is the (non-linear) activation function
• $w_0$ is the bias weight
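As a concrete illustration, here is a minimal NumPy sketch of this forward pass; the sigmoid choice for $g$ and the toy values are assumptions for illustration, not from the slides:

```python
import numpy as np

def sigmoid(z):
    # One possible non-linear activation g
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0):
    # y_hat = g(w0 + sum_i x_i * w_i)
    return sigmoid(w0 + np.dot(x, w))

# Toy example: 3 inputs with arbitrary weights
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.1, -0.4])
print(perceptron(x, w, w0=0.2))
```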
ACTIVATION FUNCTIONS
Sigmoid:
$$g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\left(1 - g(z)\right)$$

Hyperbolic tangent (tanh):
$$g(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}, \qquad g'(z) = 1 - g(z)^2$$

ReLU:
$$g(z) = \max(0, z), \qquad g'(z) = \begin{cases} 1, & z > 0 \\ 0, & \text{otherwise} \end{cases}$$
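A minimal sketch of these three activations and their derivatives in NumPy (the function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # g'(z) = g(z)(1 - g(z))

def tanh(z):
    return np.tanh(z)             # (e^z - e^-z) / (e^z + e^-z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # g'(z) = 1 - g(z)^2

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 if z > 0, else 0
```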
IMPORTANCE OF ACTIVATION FUNCTIONS
Demonstration
NEURAL NETWORKS
Hidden layer:
$$z_j = g\left(w_{0,j}^{(1)} + \sum_{i=1}^{m} x_i\, w_{i,j}^{(1)}\right)$$

Output layer:
$$\hat{y} = g\left(w_0^{(2)} + \sum_{j=1}^{d} z_j\, w_j^{(2)}\right)$$

where $d$ is the number of hidden units.
TRAINING A NEURAL NETWORK
• Training a neural network involves two main steps:
  1. Quantify the loss (forward pass).
  2. Optimize the loss (backpropagation).

Loss/Cost Function:
$$J(W) = \frac{1}{n} \sum_{i=1}^{n} L\left(f\left(x^{(i)}; W\right),\, y^{(i)}\right)$$

Optimizing with Gradient Descent:
1. Initialize weights randomly.
2. Loop until convergence:
3. Compute the gradient $\frac{\partial J(W)}{\partial W}$.
4. Update the weights: $W \leftarrow W - \eta\, \frac{\partial J(W)}{\partial W}$.
5. Return the weights.
Backpropagation computes these gradients with the chain rule, working backwards from the loss:
$$\frac{\partial J(W)}{\partial w_1} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w_1}$$
$$\frac{\partial J(W)}{\partial w_2} = \frac{\partial J(W)}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_2}$$
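Putting the loss, the gradient-descent loop, and the chain rule together, here is a hedged sketch: a single sigmoid neuron trained with mean squared error. The loss choice, learning rate, and toy data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # toy inputs
y = (X[:, 0] > 0).astype(float)           # toy targets

w = rng.normal(size=3)                    # 1. initialize weights randomly
b = 0.0
eta = 0.5                                 # learning rate (eta)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):                  # 2. loop (fixed step count here)
    z = X @ w + b
    y_hat = sigmoid(z)                    # forward pass
    J = np.mean((y_hat - y) ** 2)         # J(W): mean squared error
    # 3. gradient via the chain rule:
    #    dJ/dw = dJ/dy_hat * dy_hat/dz * dz/dw
    dJ_dyhat = 2 * (y_hat - y) / len(y)
    dyhat_dz = y_hat * (1 - y_hat)
    grad_z = dJ_dyhat * dyhat_dz
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()
    w -= eta * grad_w                     # 4. update: w <- w - eta * dJ/dw
    b -= eta * grad_b

print(J, w)                               # 5. inspect final loss and weights
```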
10-MINUTE BREAK
INTRODUCTION TO LARGE LANGUAGE MODELS
WHAT IS LANGUAGE?
A language is a system of communication that allows individuals to express thoughts, ideas,
and emotions.
Sequential modeling has applications in a wide range of tasks, including but not limited
to natural language processing (e.g., language modeling, machine translation, text
generation), time series analysis (e.g., forecasting, anomaly detection), speech
recognition, music generation, and DNA sequence analysis.
SEQUENTIAL MODELING
To model sequences, we need two preprocessing steps (sketched under the next heading):
• Indexing: map each token to an integer ID.
• Embedding: map each ID to a dense vector the network can learn from.
ENCODING LANGUAGE FOR NEURAL NETWORKS
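A minimal sketch of the two steps above; the sentence, vocabulary, and embedding size are made up for illustration:

```python
import numpy as np

sentence = "my name is tom".split()

# Indexing: map each token to an integer ID
vocab = {word: i for i, word in enumerate(sorted(set(sentence)))}
ids = [vocab[w] for w in sentence]        # [1, 2, 0, 3]

# Embedding: map each ID to a dense vector (learned during training;
# random stand-ins here)
embed_dim = 8
embedding_table = np.random.default_rng(0).normal(size=(len(vocab), embed_dim))
vectors = embedding_table[ids]            # shape: (4, 8)
print(ids, vectors.shape)
```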
THE MODEL: TRANSFORMERS
Self-Attention Mechanism: The Transformer model relies
on self-attention mechanisms to weigh the importance of
different words in a sentence when processing natural
language data. This mechanism allows the model to focus
on relevant words and learn contextual representations
effectively.
Example tasks:
• Sentiment classification
• Next-word prediction: "My name is ___" → "My name is Tom"
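A hedged sketch of single-head, unmasked scaled dot-product self-attention in NumPy; the projection matrices are random stand-ins for weights that would be learned in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                   # e.g. the 4 tokens of "my name is tom"

X = rng.normal(size=(seq_len, d_model))   # token embeddings
Wq = rng.normal(size=(d_model, d_model))  # query projection (learned in practice)
Wk = rng.normal(size=(d_model, d_model))  # key projection
Wv = rng.normal(size=(d_model, d_model))  # value projection

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d_model)       # how strongly each word attends to the others

# softmax over each row so the attention weights sum to 1
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

output = weights @ V                      # contextual representation per token
print(weights.round(2))                   # each row sums to 1
```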
ENCODER-DECODER MODELS
• They are also known as sequence-to-sequence models.
Tasks include:
• Summarization
• Translation
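For example, assuming the Hugging Face transformers library is installed, a pretrained encoder-decoder model can be tried on one of these tasks in a few lines (the default checkpoint is whatever the library currently ships, and the model downloads on first run):

```python
from transformers import pipeline

# Summarization is a classic encoder-decoder (seq2seq) task
summarizer = pipeline("summarization")
text = ("Deep learning models learn layered representations of data. "
        "Transformers use self-attention to model long-range context, "
        "which makes them effective for summarization and translation.")
print(summarizer(text, max_length=30, min_length=10)[0]["summary_text"])
```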
THANK YOU!
RESOURCES TO LEARN MORE
• Intro to Deep Learning (TensorFlow)
• https://www.deeplearning.ai/ai-notes/initialization/index.html
• https://playground.tensorflow.org/