05. Vector Semantics and Embeddings
Contents
Vector Semantics
Word Embeddings
Distributed Representation
Training Word Embeddings
Semantic Relationships in Embeddings
Limitations
Contextualized Word Embeddings
Vector Semantics
Concept
Vector semantics represents words or phrases as numerical vectors in
a high-dimensional space to capture their semantic relationships and
meanings.
This involves mapping words to points in the vector space, enabling
mathematical manipulation and analysis based on semantic
similarities.
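As a small illustration of this idea, the following minimal sketch uses numpy to compute cosine similarity, one common way of comparing word vectors mathematically; the 3-dimensional vectors are invented purely for illustration.

import numpy as np

# Cosine similarity: the standard measure of closeness between word vectors.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 3-dimensional "word vectors", purely for illustration.
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.35])
car = np.array([0.1, 0.9, 0.6])

print(cosine_similarity(cat, dog))  # close to 1: related meanings
print(cosine_similarity(cat, car))  # much lower: unrelated meanings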
Word Embeddings
Concept
Word embeddings, grounded in vector semantics, are dense, low-dimensional, continuous-valued word representations that preserve semantic relationships. Learned from large amounts of text with techniques such as neural networks, they give similar words similar vectors.
Effectiveness in Natural Language Processing Tasks
Text Classification: Word embeddings help models understand the meanings and context of words, improving the accuracy of text classification tasks such as sentiment analysis, topic categorization, and spam detection.
Information Retrieval: Word embeddings enable more relevant search results by capturing synonyms, context, and relationships between query terms and document content.
Natural Language Understanding: In tasks such as sentiment analysis, emotion detection, and intent recognition, word embeddings enable models to capture nuanced linguistic patterns.
Example
In this example, running the code prints similarity scores between word pairs and the word most similar to "queen" obtained through vector arithmetic; a reconstruction of the code is sketched below.
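The original code is not reproduced on the slide, so the following is a minimal reconstruction of the described behaviour. It assumes the gensim library and the publicly available pretrained glove-wiki-gigaword-50 vectors (downloaded on first use); any reasonably large pretrained embedding set would behave similarly.

import gensim.downloader as api

# Load pretrained word vectors (returns a gensim KeyedVectors object).
vectors = api.load("glove-wiki-gigaword-50")

# Similarity scores between word pairs.
print(vectors.similarity("king", "queen"))   # relatively high
print(vectors.similarity("king", "apple"))   # much lower

# Words most similar to "queen".
print(vectors.most_similar("queen", topn=3))

# Vector arithmetic: king - man + woman lands close to "queen".
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))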
Distributed Representation
Concept
In a distributed representation, a word's meaning is spread across many vector dimensions, with each dimension encoding a small part of its semantic content, rather than one dimension standing for one word.
Encoding Semantic Features in Vector Dimensions
Illustrating Similar Words in the Embedding Space
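The kind of figure referred to here can be reproduced with a short script. The sketch below is only illustrative: it assumes gensim, scikit-learn, and matplotlib, uses the pretrained glove-wiki-gigaword-50 vectors, and an arbitrary word list; the point is that a 2-D projection places similar words close together.

import gensim.downloader as api
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

vectors = api.load("glove-wiki-gigaword-50")
words = ["cat", "dog", "horse", "car", "truck", "bus", "apple", "banana", "pear"]

# Reduce the 50-dimensional vectors to 2-D for plotting.
coords = PCA(n_components=2).fit_transform([vectors[w] for w in words])

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y))
plt.title("Word embeddings projected to 2-D: similar words cluster together")
plt.show()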
Example
In this example:
We use the Word2Vec model from gensim to learn word embeddings from a small dataset of sentences.
The parameter vector_size defines the dimensionality of the word vectors, window sets the maximum distance between the current and predicted word, and min_count sets the minimum number of occurrences a word needs in order to be included in training.
We use the CBOW architecture (sg=0), which predicts the target word from its context words; setting sg=1 would select Skip-gram instead.
We access the learned word vectors through the .wv attribute of the trained model.
We retrieve the word vector for "machine" and find the words most similar to "learning" based on the learned embeddings.
Training Word Embeddings
Techniques for Training Word Embeddings
Word2Vec:
This method learns embeddings by predicting context words from a target word (Skip-gram) or predicting a target word from its context (Continuous Bag of Words, CBOW).
By iterating over the training text, it refines the embeddings, which become good at capturing semantic relations and analogies.
FastText:
Extending Word2Vec, FastText treats each word as a set of character n-grams.
This makes it possible to build embeddings for unseen words and to exploit subword information.
A word's vector can be represented as the sum of the vectors of its character n-grams.
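To make the subword idea concrete, here is a minimal FastText sketch (assuming gensim; the toy corpus and the min_n/max_n values are illustrative). Because vectors are built from character n-grams, even a word absent from the training data can be assigned an embedding.

from gensim.models import FastText

corpus = [
    ["language", "models", "learn", "word", "representations"],
    ["subword", "information", "helps", "with", "rare", "words"],
]

# Each word is decomposed into character n-grams of length min_n..max_n.
model = FastText(corpus, vector_size=50, window=3, min_count=1, min_n=3, max_n=5)

# A word never seen during training still gets a vector, built from its n-grams.
print(model.wv["languages"][:5])
print(model.wv.similarity("word", "words"))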
Iterative Optimization Process
Initialization
Initialize word vectors randomly or with pre-trained values.
Objective Function
Define an objective function that quantifies the quality of the embeddings. For Word2Vec, the objective function measures how well the model predicts context words from target words (or vice versa).
Gradient Descent
Use optimization techniques like gradient descent to update the word vectors iteratively. The goal is to minimize the difference between predicted and actual words in the context.
Negative Sampling (Word2Vec)
To avoid updating weights for all words in the vocabulary (which can be computationally expensive), Word2Vec uses negative sampling: it selects a small number of negative samples (words not in the context) for each positive sample and updates only their weights.
Updating Vectors
The optimization process adjusts word vectors to improve the model's ability to predict context words. Similar words (based on co-occurrence) end up having similar vectors.
Convergence
Repeat the optimization process for multiple epochs until the model's performance stabilizes or converges.
Extracting Word Embeddings
Once training is complete, the learned vectors are the word embeddings and can be used for various NLP tasks.
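To tie these steps together, the following numpy sketch performs a single Skip-gram negative-sampling update. The vocabulary size, dimensionality, learning rate, and number of negatives are arbitrary illustrative values; production toolkits such as gensim implement the same idea far more efficiently.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, lr, k = 1000, 50, 0.025, 5

# Initialization: random input (target) and output (context) vectors.
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(target, context, negatives):
    """One gradient-descent step on the Skip-gram negative-sampling objective."""
    v_t = W_in[target]
    grad_t = np.zeros(dim)
    # The positive pair is pushed toward sigmoid(score) = 1, negatives toward 0.
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(W_out[c] @ v_t)
        g = score - label            # derivative of the log-loss w.r.t. the score
        grad_t += g * W_out[c]       # accumulate gradient for the target vector
        W_out[c] -= lr * g * v_t     # update this context/negative vector
    W_in[target] -= lr * grad_t      # update the target vector

# One update for the pair (target=3, context=7) with k randomly drawn negatives.
sgns_step(3, 7, rng.integers(0, vocab_size, size=k))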
Semantic Relationships in Embeddings
Similar Words in the Vector Space
In the embedding space, words with similar meanings or contexts
are represented by vectors that are close together.
For example:
The vectors for "cat" and "dog" are closer to each other than to the
vector for "elephant," indicating their semantic similarity.
The vectors for synonyms like "big" and "large" are also close, as are
words that often appear in similar contexts.
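These claims can be checked numerically with pretrained vectors. A minimal sketch, assuming gensim and the glove-wiki-gigaword-50 vectors used earlier (exact scores depend on the embedding set):

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

print(vectors.similarity("cat", "dog"))        # typically the highest of the three
print(vectors.similarity("cat", "elephant"))   # typically lower
print(vectors.similarity("big", "large"))      # synonyms: typically high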
Limitations
Polysemy and Homonymy Challenges:
Polysemy: Words often have multiple meanings depending on context. Word embeddings may struggle to disambiguate these meanings, because a single vector has to represent all contexts (see the sketch after this list).
Homonymy: Different words with the same spelling (homonyms) can be confused in embeddings, leading to vectors that mix their meanings.
Capturing Rare Words:
Word embeddings are trained on large corpora, which can make them
less effective for representing rare words or domain-specific terms
with limited occurrences in the training data. Rare words may not have
well-defined vectors.
Domain-Specific Information:
Word embeddings trained on general text may not capture domain-specific nuances and terminology. For specialized domains, the embeddings might not adequately reflect the required semantics.
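The polysemy limitation above can be demonstrated directly: a static embedding model stores exactly one vector per word form, so both senses of "bank" map to the same representation. A minimal sketch, assuming gensim and the pretrained glove-wiki-gigaword-50 vectors:

import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# The same vector is returned regardless of the intended sense.
financial_bank = vectors["bank"]   # as in "a bank account"
river_bank = vectors["bank"]       # as in "the river bank"
print(np.array_equal(financial_bank, river_bank))  # True: the senses are conflated

# The nearest neighbours mix both senses to whatever degree the corpus contains them.
print(vectors.most_similar("bank", topn=5))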
Contextualized Word Embeddings
BERT (Bidirectional Encoder Representations from
Transformers):
BERT models consider both left and right context words, capturing
richer contextual information for each word.
This approach helps in resolving polysemy and homonymy challenges,
as words can have different meanings based on their surroundings.
ELMo (Embeddings from Language Models):
ELMo uses a bidirectional LSTM (Long Short-Term Memory) model to
create embeddings that are sensitive to context.
It generates multiple layers of embeddings, each capturing different
levels of language information.
GPT (Generative Pre-trained Transformer):
GPT models are trained to predict the next word from the preceding context (left to right), in contrast to BERT's bidirectional masked-word prediction.
They excel at generating coherent and contextually relevant text.
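In contrast to static embeddings, a contextual model produces a different vector for each occurrence of a word. The sketch below assumes the transformers and torch packages and the public bert-base-uncased checkpoint; it compares the embeddings BERT assigns to "bank" in two different sentences.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (num_tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

v1 = embedding_of("she sat by the river bank", "bank")
v2 = embedding_of("he opened a bank account", "bank")

# Unlike static embeddings, the two occurrences of "bank" get different vectors.
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())  # noticeably below 1.0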
Practice
Dataset
sentences = [
    ["natural", "language", "processing", "is", "a", "subfield", "of", "artificial", "intelligence"],
    ["word", "embeddings", "capture", "semantic", "meanings", "of", "words"],
    ["machine", "learning", "techniques", "are", "widely", "used", "in", "text", "mining"],
    ["vector", "semantics", "enable", "semantic", "similarities", "between", "words"],
    ["deep", "learning", "models", "have", "revolutionized", "language", "processing"]
]
Instructions:
Create a Python program that explores advanced concepts in vector semantics and word embeddings using the Word2Vec model.
You'll work with a larger dataset of sentences and perform more sophisticated tasks: similarity calculations, finding most similar words, and word analogies.
Practice
Tasks
Task 1: Training Word2Vec Model
Write a function train_word2vec_model(sentences) that takes a list of sentences as
input and trains a Word2Vec model on the sentences. Use parameters vector_size=100,
window=5, min_count=1, and sg=0 for the model.
Task 2: Calculate Similarity
Write a function calculate_similarity(model, word1, word2) that takes the trained
Word2Vec model and two words as input. Calculate the cosine similarity between the word
vectors of the given words using the model.wv attribute.
Task 3: Most Similar Words
Write a function find_most_similar_words(model, word, topn) that takes the trained
Word2Vec model, a word, and the number of most similar words to find (topn) as input.
Use the model.wv.most_similar() method to find the most similar words to the given
word.
Task 4: Word Analogies
Write a function word_analogies(model, word1, word2, word3, topn) that takes the
trained Word2Vec model and three words (word1, word2, and word3) as input. The
function should perform the analogy: word1 - word2 + word3 and find the most similar
words to the result.
Task 5: Test Your Functions
Apply each function to the provided dataset and demonstrate the use of word embeddings
for similarity calculations, finding most similar words, and word analogies.
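One possible solution sketch for these tasks is given below (assuming gensim; it reuses the sentences list from the Dataset section above, and on such a small corpus the numerical results are mainly illustrative).

from gensim.models import Word2Vec

def train_word2vec_model(sentences):
    # Task 1: CBOW model (sg=0) with the requested hyperparameters.
    return Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

def calculate_similarity(model, word1, word2):
    # Task 2: cosine similarity between the two word vectors.
    return model.wv.similarity(word1, word2)

def find_most_similar_words(model, word, topn):
    # Task 3: nearest neighbours of `word` in the embedding space.
    return model.wv.most_similar(word, topn=topn)

def word_analogies(model, word1, word2, word3, topn):
    # Task 4: word1 - word2 + word3, then the words closest to the result.
    return model.wv.most_similar(positive=[word1, word3], negative=[word2], topn=topn)

# Task 5: apply the functions to the provided dataset.
model = train_word2vec_model(sentences)
print(calculate_similarity(model, "language", "processing"))
print(find_most_similar_words(model, "learning", topn=3))
print(word_analogies(model, "language", "processing", "semantic", topn=3))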
Q&A