Word Embedding Learning Process

The document explains the process of learning word embeddings using a neural language model, detailing steps such as word representation, embedding lookup, contextual processing, predicting the next word, training through backpropagation, and learning word relationships. It also covers the role of the Softmax function in converting raw scores into probabilities for word prediction and provides a detailed explanation of the Skip-Gram model, which predicts context words based on a target word. The Skip-Gram model is highlighted for its effectiveness in capturing semantic relationships and is compared with the Continuous Bag of Words (CBOW) model.


Word Embedding Learning Process

Please explain the process of learning word embeddings using the picture.

This image illustrates the process of learning word embeddings in a neural language model. Here’s a
step-by-step breakdown of what’s happening:

1. Word Representation as Indices

Words in a sentence are represented as numerical indices.

Example:

"I" → 4343

"want" → 9665

"a" → 1

"glass" → 3852

"of" → 6163
"orange" → 6257

2. Embedding Lookup

Each word index is mapped to an embedding vector by looking it up in the embedding matrix E.

Example:

o_4343 → e_4343 (the embedding for "I")

o_9665 → e_9665 (the embedding for "want")

Similarly, each word has its corresponding embedding vector.
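
As a rough illustration (the 300-dimensional embedding size and the random values below are assumptions, not details from the image), the lookup is simply a row selection from the embedding matrix, which is equivalent to multiplying a one-hot vector by that matrix:

```python
import numpy as np

vocab_size, embed_dim = 10000, 300           # assumed sizes for illustration
E = np.random.randn(vocab_size, embed_dim)   # embedding matrix (learned during training)

word_index = 4343                            # index for "I" in the example above
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# Both lines produce the same 300-dimensional embedding vector e_4343
e_by_lookup = E[word_index]
e_by_matmul = one_hot @ E
assert np.allclose(e_by_lookup, e_by_matmul)
```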

3. Contextual Processing

The words "I" and "want" are connected with a red box, indicating they are part of the context
input.

Words like "a", "glass", "of", and "orange" are mapped to embeddings as well.

4. Predicting the Next Word

The embeddings of the context words are passed through a neural network.

The final layer uses a Softmax function to predict the most probable next word.
The output vocabulary contains around 10,000 words (as indicated in the yellow box).

5. Training Through Backpropagation

The model learns embeddings by adjusting the weights W and biases b through backpropagation.

These weights and biases improve over time to generate meaningful word embeddings.

6. Learning Word Relationships

The handwritten notes show that the model learns relationships like:
"a glass of orange" → "juice"

Similarly, "apple" is also associated with "juice"

This means the model is learning semantic similarities between words.

Conclusion

The image demonstrates how a neural language model learns word embeddings by mapping words to
dense vectors, processing them through layers, and predicting the next word in context. Over time, the
learned embeddings capture meaningful relationships between words, improving language
understanding.
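
To make this pipeline concrete, here is a minimal sketch of such a neural language model in PyTorch (the layer sizes, the fixed four-word context, and the index chosen for "juice" are assumptions for illustration, not details taken from the image):

```python
import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, context_len=4, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)       # the embedding matrix E
        self.hidden = nn.Linear(context_len * embed_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)               # scores for every word

    def forward(self, context_ids):          # context_ids: (batch, context_len)
        e = self.embed(context_ids)          # (batch, context_len, embed_dim)
        h = torch.relu(self.hidden(e.flatten(1)))
        return self.out(h)                   # raw logits; softmax is applied in the loss

# Hypothetical indices for "a glass of orange" -> predict "juice"
context = torch.tensor([[1, 3852, 6163, 6257]])
target = torch.tensor([4852])                # assumed index for "juice"
logits = TinyNeuralLM()(context)
loss = nn.CrossEntropyLoss()(logits, target) # softmax + negative log-likelihood
loss.backward()                              # backpropagation updates E, W, b
```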

What does the Softmax do in the situation above? Please explain with an example.

Role of Softmax in the Neural Language Model

In the situation above, the Softmax function is used to predict the next word based on the given
context. It converts raw scores (logits) from the neural network into probabilities that sum to 1, helping
the model make a prediction.

Example Walkthrough

Step 1: Input Sentence & Context

We have the input phrase:

"I want a glass of orange"


The model needs to predict the next word, which is likely "juice".

Step 2: Embedding Lookup


Each word is converted into a word embedding.

These embeddings are processed by the neural network.

Step 3: Raw Scores (Logits)

The network generates raw scores (logits) for all possible words in the vocabulary. Suppose we have a
vocabulary of 10,000 words, and the model produces scores for a few relevant words:

| Word  | Raw Score (Logit) |
|-------|-------------------|
| juice | 7.2 |
| apple | 4.5 |
| table | 1.8 |
| book  | 0.5 |

These logits are unnormalized and can be any real number.

Step 4: Applying Softmax

Softmax transforms these raw scores into probabilities:

$$P(y_i) = \frac{e^{o_i}}{\sum_j e^{o_j}}$$

where:

o_i is the raw score (logit) for word i.

The denominator is the sum of the exponentials of the logits of all words in the vocabulary.

Applying Softmax:

$$P(\text{"juice"}) = \frac{e^{7.2}}{e^{7.2} + e^{4.5} + e^{1.8} + e^{0.5}}$$

Approximating ($e^{7.2} \approx 1340$, $e^{4.5} \approx 90$, $e^{1.8} \approx 6$, $e^{0.5} \approx 1.6$):

$$P(\text{"juice"}) \approx \frac{1340}{1340 + 90 + 6 + 1.6} \approx 0.93$$

Similarly, for "apple":

$$P(\text{"apple"}) = \frac{e^{4.5}}{e^{7.2} + e^{4.5} + e^{1.8} + e^{0.5}} \approx 0.06$$

Step 5: Prediction

The model picks the word with the highest probability.


Since P("juice") ≈ 0.93, the model predicts "juice" as the most likely next word.
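
This can be checked numerically; the following minimal NumPy sketch (restricting the vocabulary to the four example words above) reproduces these probabilities:

```python
import numpy as np

words  = ["juice", "apple", "table", "book"]
logits = np.array([7.2, 4.5, 1.8, 0.5])

probs = np.exp(logits) / np.exp(logits).sum()   # softmax
for w, p in zip(words, probs):
    print(f"{w}: {p:.3f}")
# juice: 0.932, apple: 0.063, table: 0.004, book: 0.001

print("prediction:", words[int(np.argmax(probs))])  # -> juice
```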

Conclusion

The Softmax function converts raw scores into probabilities that sum to 1, allowing the model to select
the most probable next word. Over training, the model learns to assign higher probabilities to correct
words, improving its language understanding.

Please explain the Skip-Gram model in detail with an example.

Skip-Gram Model in Detail

The Skip-Gram model is a popular method for learning word embeddings using a neural network. It
is part of the Word2Vec family and is designed to predict surrounding words (context) given a target
word. This approach helps the model learn meaningful word relationships.

1. What is the Skip-Gram Model?


Given a target word, the Skip-Gram model tries to predict the context words that appear around
it.
It uses a neural network to learn word embeddings such that similar words have similar vector
representations.

The model is trained using large text corpora and updates embeddings based on co-occurrence
patterns.

2. How Does the Skip-Gram Model Work?


Step 1: Define the Context Window

We choose a window size (context size), which determines how many words around the target
word we consider.
Example: Consider the sentence:

"The cat sat on the mat"

If we use a context window of 2, then for the target word "sat", the context words are:

Previous words: "The", "cat"


Next words: "on", "the"

Step 2: Create Training Pairs

For each target word, we create training pairs consisting of the target word and each context word.
Example for context window = 2:

| Target Word | Context Words |
|-------------|---------------|
| "The" | "cat", "sat" |
| "cat" | "The", "sat", "on" |
| "sat" | "The", "cat", "on", "the" |
| "on"  | "cat", "sat", "the", "mat" |
| "the" | "sat", "on", "mat" |
| "mat" | "on", "the" |

So the training data consists of pairs like:

("sat", "cat")

("sat", "on")
("sat", "The")

("sat", "the")

Step 3: Training the Model

The model consists of:

1. Input Layer: One-hot encoded vector for the target word.


2. Hidden Layer: A simple weight matrix that serves as the word embedding lookup table.
3. Output Layer: A softmax classifier that predicts context words.

Neural Network Structure

Input: One-hot vector of the target word.

Hidden Layer: A weight matrix where each row corresponds to a word embedding.
Output Layer: Softmax function that predicts the probability of each word in the vocabulary being a
context word.

Step 4: Learning Word Embeddings

The hidden layer weights (embeddings) are updated using backpropagation.

Words that appear in similar contexts develop similar embeddings.


Over time, similar words (like "cat" and "dog") have closer vectors in the embedding space.
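
Closeness between vectors is commonly measured with cosine similarity; the sketch below uses toy 4-dimensional vectors invented purely for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, invented for illustration only
cat   = np.array([0.9, 0.1, 0.8, 0.2])
dog   = np.array([0.8, 0.2, 0.9, 0.1])
table = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine_similarity(cat, dog))    # high -> appear in similar contexts
print(cosine_similarity(cat, table))  # low  -> appear in different contexts
```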

3. Example Walkthrough
Let's assume we have a small vocabulary:
["the", "cat", "sat", "on", "mat"]

For "sat" with a window size of 2, the training pairs are:

("sat", "cat")

("sat", "on")
("sat", "The")
("sat", "the")

Training Process

1. The input word "sat" is converted to a one-hot vector:

```
[0, 0, 1, 0, 0]
```

2. This vector is multiplied by the weight matrix (word embedding layer).


3. The resulting embedding vector is then passed to a softmax function, which assigns probabilities
to all words in the vocabulary.

4. The model is trained to maximize the probability of correct context words appearing near the
target word.
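
Putting the four steps above together, here is a minimal NumPy sketch of one forward pass through such a network (the 3-dimensional embeddings and random initial weights are assumptions chosen for readability):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
V, D = len(vocab), 3                       # vocabulary size, embedding dimension (assumed)

W_in  = np.random.randn(V, D) * 0.1        # input weights = word embeddings (step 2)
W_out = np.random.randn(D, V) * 0.1        # output weights for the softmax classifier

x = np.zeros(V)                            # step 1: one-hot vector for "sat"
x[vocab.index("sat")] = 1.0                # -> [0, 0, 1, 0, 0]

h = x @ W_in                               # step 2: select the embedding row for "sat"
scores = h @ W_out                         # raw scores for every vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()   # step 3: softmax over the vocabulary

# Step 4: training would raise probs at the indices of the true
# context words ("cat", "on", ...) via backpropagation.
print(dict(zip(vocab, np.round(probs, 3))))
```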

4. Advantages of Skip-Gram
✅ Works well with small datasets
✅ Learns meaningful word relationships
✅ Handles infrequent words better than CBOW
✅ Embeddings capture word analogies (e.g., "king" - "man" + "woman" = "queen")
5. Comparison with CBOW (Continuous Bag of Words)
| Feature | Skip-Gram | CBOW |
|---------|-----------|------|
| Predicts | Context words from a target word | Target word from context words |
| Works better for | Rare words | Frequent words |
| Computational cost | Higher | Lower |
| Learning method | Each target word trains against multiple context words | Uses the average of the context words |

6. Conclusion
The Skip-Gram model is an effective technique for learning word embeddings by predicting
surrounding words based on a given target word. It captures rich semantic relationships between words
and is widely used in NLP applications like text classification, sentiment analysis, and machine
translation.
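
As a quick illustration of that practical usage, a minimal sketch with the gensim library (assuming gensim 4.x; the toy sentences are invented) might look like this:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 selects the Skip-Gram architecture (sg=0 would be CBOW)
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1, epochs=100)

print(model.wv["cat"][:5])               # first 5 dimensions of the "cat" embedding
print(model.wv.most_similar("cat"))      # nearest words in the embedding space
```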

Would you like me to show how to implement Skip-Gram in Python using Word2Vec? 🚀
