Word Embedding Learning Process
Please explain the process of learning word embeddings using the picture.
This image illustrates the process of learning word embeddings in a neural language model. Here’s a
step-by-step breakdown of what’s happening:
1. Word-to-Index Mapping
Each word in the input sentence is first mapped to its index in the vocabulary. Example:
"I" → 4343
"want" → 9665
"a" → 1
"glass" → 3852
"of" → 6163
"orange" → 6257
2. Embedding Lookup
Each index is used to look up a dense vector in the embedding matrix, so every word is represented by a learned embedding rather than a raw index. Example: the index 6257 for "orange" selects the corresponding row of the embedding matrix.
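A minimal sketch of this lookup, assuming the 10,000-word vocabulary mentioned below and a 300-dimensional embedding matrix E (the dimension and the random initialization are assumptions for illustration):

```python
import numpy as np

vocab_size = 10000       # vocabulary size (matches the 10,000 words mentioned below)
embedding_dim = 300      # assumed embedding dimension

# Embedding matrix E: one dense row vector per vocabulary word (randomly initialized here).
E = np.random.randn(vocab_size, embedding_dim) * 0.01

# Looking up the embedding for "orange" (index 6257) is just a row selection.
e_orange = E[6257]
print(e_orange.shape)    # (300,)
```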
3. Contextual Processing
The words "I" and "want" are connected with a red box, indicating they are part of the context
input.
Words like "a", "glass", "of", and "orange" are mapped to embeddings as well.
The embeddings of the context words are passed through a neural network.
The final layer uses a Softmax function to predict the most probable next word.
The output vocabulary contains around 10,000 words (as indicated in the yellow box).
The model learns embeddings by adjusting the weights W and biases b through backpropagation.
These weights and biases improve over time to generate meaningful word embeddings.
The handwritten notes show that the model learns relationships like:
"a glass of orange" → "juice"
Conclusion
The image demonstrates how a neural language model learns word embeddings by mapping words to
dense vectors, processing them through layers, and predicting the next word in context. Over time, the
learned embeddings capture meaningful relationships between words, improving language
understanding.
What does the Softmax do in the situation above? Please explain with an example.
In the situation above, the Softmax function is used to predict the next word based on the given
context. It converts raw scores (logits) from the neural network into probabilities that sum to 1, helping
the model make a prediction.
Example Walkthrough
The network generates raw scores (logits) for all possible words in the vocabulary. Suppose we have a
vocabulary of 10,000 words, and the model produces scores for a few relevant words:
| Word | Raw score (logit) |
|-------|-------------------|
| juice | 7.2 |
| apple | 4.5 |
| table | 1.8 |
| book  | 0.5 |
The Softmax converts these scores into probabilities:

$$P(y_i) = \frac{e^{o_i}}{\sum_{j} e^{o_j}}$$

where $o_i$ is the raw score (logit) for word $i$ and the sum in the denominator runs over the scores of all candidate words (here, just the four words above).

Applying Softmax:

$$P(\text{"juice"}) = \frac{e^{7.2}}{e^{7.2} + e^{4.5} + e^{1.8} + e^{0.5}}$$

Approximating with $e^{7.2} \approx 1339$, $e^{4.5} \approx 90$, $e^{1.8} \approx 6$, and $e^{0.5} \approx 1.6$:

$$P(\text{"juice"}) \approx \frac{1339}{1339 + 90 + 6 + 1.6} \approx 0.93$$

$$P(\text{"apple"}) = \frac{e^{4.5}}{e^{7.2} + e^{4.5} + e^{1.8} + e^{0.5}} \approx 0.06$$
Step 5: Prediction
The model selects the word with the highest probability as the next word; here "juice" is chosen with a probability of about 0.93.
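The arithmetic above can be checked with a few lines of NumPy; the four words and their scores are the ones from the example, and everything else is just the Softmax formula:

```python
import numpy as np

words  = ["juice", "apple", "table", "book"]
logits = np.array([7.2, 4.5, 1.8, 0.5])

# Softmax: exponentiate the scores and normalize so they sum to 1.
probs = np.exp(logits) / np.exp(logits).sum()
for w, p in zip(words, probs):
    print(f"{w}: {p:.3f}")   # juice: 0.932, apple: 0.063, table: 0.004, book: 0.001

print(words[int(np.argmax(probs))])  # "juice" is selected as the prediction
```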
Conclusion
The Softmax function converts raw scores into probabilities that sum to 1, allowing the model to select
the most probable next word. Over training, the model learns to assign higher probabilities to correct
words, improving its language understanding.
The Skip-Gram Model
The Skip-Gram model is a popular method for learning word embeddings using a neural network. It
is part of the Word2Vec family and is designed to predict surrounding words (context) given a target
word. This approach helps the model learn meaningful word relationships.
1. How Skip-Gram Works
The model is trained using large text corpora and updates embeddings based on co-occurrence
patterns.
We choose a window size (context size), which determines how many words around the target
word we consider.
Example: Consider the sentence:
"The cat sat on the mat."
If we use a context window of 2, then for the target word "sat", the context words are:
"The", "cat", "on", "the"
For each target word, we create training pairs consisting of the target word and each context word.
Example for context window = 2:
| Target Word | Context Word |
|-------------|--------------|
| sat | cat |
| sat | on  |
| sat | The |
| sat | the |
2. Model Architecture
Input Layer: A one-hot vector representing the target word.
Hidden Layer: A weight matrix where each row corresponds to a word embedding.
Output Layer: A Softmax function that predicts the probability of each word in the vocabulary being a context word.
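In terms of shapes, the two layers boil down to two weight matrices; this is a sketch that assumes the 5-word toy vocabulary used in the walkthrough below and an arbitrary 3-dimensional embedding:

```python
import numpy as np

vocab_size, embedding_dim = 5, 3            # toy sizes for illustration

# Input-to-hidden matrix: each row is the embedding of one vocabulary word.
W_in  = np.random.randn(vocab_size, embedding_dim) * 0.01
# Hidden-to-output matrix: scores every vocabulary word as a possible context word.
W_out = np.random.randn(embedding_dim, vocab_size) * 0.01

# A one-hot input simply selects the corresponding row of W_in.
one_hot_sat = np.array([0, 0, 1, 0, 0])     # "sat" in ["the", "cat", "sat", "on", "mat"]
print((one_hot_sat @ W_in).shape)           # (3,) - the embedding of "sat"
```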
3. Example Walkthrough
Let's assume we have a small vocabulary:
["the", "cat", "sat", "on", "mat"]
("sat", "cat")
("sat", "on")
("sat", "The")
("sat", "the")
Training Process
1. The target word "sat" is converted into a one-hot vector over the vocabulary:

```
[0, 0, 1, 0, 0]
```

2. The one-hot vector is multiplied by the hidden-layer weight matrix, which simply selects the embedding of "sat".
3. The output layer applies Softmax over the whole vocabulary to score each word as a possible context word.
4. The model is trained to maximize the probability of the correct context words appearing near the target word (see the sketch after this list).
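A minimal sketch of one such training step for the pair ("sat", "cat"), using toy matrices like the ones above; the cross-entropy loss is the standard objective for a Softmax output, and a real implementation would also apply a gradient update:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
vocab_size, embedding_dim = len(vocab), 3

W_in  = np.random.randn(vocab_size, embedding_dim) * 0.01   # rows = word embeddings
W_out = np.random.randn(embedding_dim, vocab_size) * 0.01   # output (context) weights

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

# Forward pass for the training pair ("sat", "cat").
target_idx, context_idx = vocab.index("sat"), vocab.index("cat")
h = W_in[target_idx]                  # hidden layer = embedding of "sat"
probs = softmax(h @ W_out)            # probability of each word being a context word

# Training maximizes probs[context_idx], i.e. minimizes this cross-entropy loss.
loss = -np.log(probs[context_idx])
print(probs.round(3), float(loss))
```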
4. Advantages of Skip-Gram
✅ Works well with small datasets
✅ Learns meaningful word relationships
✅ Handles infrequent words better than CBOW
✅ Embeddings capture word analogies (e.g., "king" - "man" + "woman" ≈ "queen"; see the sketch after this list)
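The analogy in the last point can be checked with pretrained vectors; this sketch uses gensim's downloader API (assumes gensim is installed and downloads a small GloVe model on first use; the exact result depends on the pretrained vectors chosen):

```python
import gensim.downloader as api

# Small pretrained GloVe vectors (downloaded and cached on first call).
vectors = api.load("glove-wiki-gigaword-50")

# "king" - "man" + "woman" ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns "queen" as the closest word.
```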
5. Comparison with CBOW (Continuous Bag of Words)
| Feature | Skip-Gram | CBOW |
|---------|-----------|------|
| Predicts | Context words from a target word | Target word from context words |
6. Conclusion
The Skip-Gram model is an effective technique for learning word embeddings by predicting
surrounding words based on a given target word. It captures rich semantic relationships between words
and is widely used in NLP applications like text classification, sentiment analysis, and machine
translation.
Would you like me to show how to implement Skip-Gram in Python using Word2Vec? 🚀