05. Vector Semantics and Embeddings

The document discusses vector semantics and word embeddings, which represent words as numerical vectors in high-dimensional space to capture semantic relationships. It covers various techniques for training word embeddings, their applications in natural language processing, and the limitations of traditional embeddings, including challenges with polysemy and domain-specific terms. Additionally, it introduces contextualized word embeddings like BERT and ELMo, which address some of these limitations by considering word context.

Vector Semantics and Embeddings
Contents
 Vector Semantics
 Word Embeddings
 Distributed Representation
 Training Word Embeddings
 Semantic Relationships in Embeddings
 Limitations
 Contextualized Word Embeddings

2
Vector Semantics
 Concept
 Vector semantics represents words or phrases as numerical vectors in
a high-dimensional space to capture their semantic relationships and
meanings.
 This involves mapping words to points in the vector space, enabling
mathematical manipulation and analysis based on semantic
similarities.

 Capturing Semantic Relationships
 Vector semantics aims to capture both syntactic and semantic relationships among words.
 Similar or contextually linked words are represented by vectors positioned close together in the vector space.
 The proximity and orientation of vectors convey word relationships and meanings, allowing algorithms to measure semantic similarity and perform tasks such as word analogy and similarity computation.
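
As a concrete illustration of how such similarity measurements work, the following is a minimal Python sketch (not from the original slides; the vector values are made up) that computes the cosine similarity between two hypothetical word vectors:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: a.b / (|a| |b|).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional word vectors (illustrative values only).
v_cat = np.array([0.5, 0.7, 0.1, 0.3])
v_dog = np.array([0.4, 0.6, 0.2, 0.3])

print(cosine_similarity(v_cat, v_dog))  # close to 1.0 for similar words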

3
Word Embeddings
 Concept
 Word embeddings, derived from vector semantics, are compact, continuous-valued word representations that preserve semantic relationships. Learned from large amounts of text with techniques such as neural networks, they ensure that similar words receive similar vectors.

 Word embeddings excel at capturing intricate word relationships such as synonyms, antonyms, context, and analogies. A classic example is that the vector for "king" - "man" + "woman" lies close to the vector for "queen" in a well-trained embedding space.

 By leveraging word embeddings, natural language processing (NLP) algorithms improve on diverse tasks such as text classification, sentiment analysis, and machine translation. These embeddings substantially improve computers' grasp of human language, leading to greater precision and effectiveness across NLP applications.

4
Word Embeddings
 Effectiveness in Natural Language Processing Tasks
 Text Classification: Word embeddings help models understand
the meanings and context of words, improving the accuracy of
text classification tasks like sentiment analysis, topic
categorization, and spam detection.

 Machine Translation: By learning word embeddings from multilingual text, models can better understand how words in different languages relate to each other, leading to improved translation accuracy.

 Named Entity Recognition (NER): Word embeddings aid in identifying named entities (names of people, places, organizations) by recognizing patterns and contexts associated with them.

5
Word Embeddings
 Effectiveness in Natural Language Processing Tasks
 Information Retrieval: Word embeddings enable more relevant
search results by understanding synonyms, context, and
relationships between query terms and document content.

 Question Answering: Models can use word embeddings to better understand questions and find relevant passages or answers by identifying the semantic meanings within the text.

 Text Similarity and Clustering: Word embeddings help measure semantic similarity between texts and cluster similar documents together.

6
Word Embeddings
 Effectiveness in Natural Language Processing Tasks
 Natural Language Understanding: In tasks like sentiment
analysis, emotion detection, and intent recognition, word
embeddings enable models to capture nuanced linguistic patterns.

 Word Analogy Tasks: Word embeddings can solve analogy tasks like "king - man + woman = queen" by manipulating the vectors to preserve semantic relationships.

 Text Generation: Word embeddings enhance the quality and coherence of generated text by ensuring that words chosen by the model fit contextually.

 Speech Recognition: In automatic speech recognition, word embeddings assist in understanding the spoken words and their semantic meanings.

7
Example
In this example:

 We use the spaCy library to load pre-trained word vectors for English words (en_core_web_md).

 We calculate the similarity between the words "apple," "banana," and "orange" using the .similarity() method.

 We demonstrate vector arithmetic by finding a word similar to "king - man + woman," which should be similar to "queen."

When run, this code outputs the similarity scores between the words and the word most similar to "queen" obtained by vector arithmetic.
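
The code itself appears only as a screenshot in the original slides. The following is a minimal sketch of what the bullets describe, assuming spaCy and the en_core_web_md model are installed (python -m spacy download en_core_web_md); the candidate list used for the analogy search is an illustrative assumption, and the input words are excluded from it as is standard for analogy queries:

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")

# Similarity between words via the .similarity() method.
apple, banana, orange = nlp("apple"), nlp("banana"), nlp("orange")
print("apple ~ banana:", apple.similarity(banana))
print("apple ~ orange:", apple.similarity(orange))

# Vector arithmetic: king - man + woman should land near "queen".
target = (nlp.vocab["king"].vector
          - nlp.vocab["man"].vector
          + nlp.vocab["woman"].vector)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Search a small illustrative candidate list instead of the whole vocabulary.
candidates = ["queen", "princess", "monarch", "prince", "throne"]
best = max(candidates, key=lambda w: cosine(target, nlp.vocab[w].vector))
print("king - man + woman ->", best)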

8
Distributed Representation
 Concept

 Distributed representation, or distributed semantics, is central to word embeddings.

 This concept represents words as vectors in a multi-dimensional space.

 Each vector dimension encapsulates a unique semantic trait.

 Here, a word's meaning stems from its interactions with other words, enabling a nuanced and contextually rich understanding.

9
Distributed Representation
 Encoding Semantic Features in Vector Dimensions

 In word embeddings, each dimension of a vector is dedicated to encoding a particular semantic feature.

 These features might include things like gender, tense, sentiment, context, and more.

 Instead of representing a word as a binary presence-absence signal (as in one-hot encodings), distributed representation assigns values to multiple dimensions, allowing words to capture a combination of semantic attributes.

10
Distributed Representation
 Illustrating Similar Words in the Embedding Space

 Consider a hypothetical two-dimensional vector space for illustration purposes. In reality, word embeddings typically have hundreds of dimensions. Let's say we're working with a small vocabulary of animals, and we've learned the following word embeddings:

 "cat": [0.5, 0.7]
 "dog": [0.4, 0.6]
 "elephant": [0.2, 0.1]
 "tiger": [0.45, 0.65]

11
Distributed Representation
 Illustrating Similar Words in the Embedding Space

 In this two-dimensional space, the vectors can be visualized as points: similar animals such as "cat" and "dog" sit close together, clearly separated from "elephant."

 Distances and angles between vectors reflect word relationships: similar words have close vectors, while distinct ones lie far apart. This distributed nature of word embeddings captures intricate semantics.

 Real-world embeddings span hundreds of dimensions and capture far more nuance. This distributed representation is central to natural language processing, as it refines how word meanings and relationships are modeled.
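
The proximity claims above can be checked numerically. Here is a short numpy sketch using the toy vectors from the previous slide:

import numpy as np

vectors = {
    "cat": np.array([0.5, 0.7]),
    "dog": np.array([0.4, 0.6]),
    "elephant": np.array([0.2, 0.1]),
    "tiger": np.array([0.45, 0.65]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point in the same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Both the angle (cosine similarity) and the straight-line (Euclidean)
# distance show that "elephant" is the outlier among these animals.
for w in ["dog", "tiger", "elephant"]:
    print(f"cat vs {w}: cosine = {cosine(vectors['cat'], vectors[w]):.3f}, "
          f"distance = {np.linalg.norm(vectors['cat'] - vectors[w]):.3f}")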

12
Example

In this example:

We use the Word2Vec model from gensim to learn word embeddings from a small dataset of sentences.
The parameter vector_size defines the dimensionality of the word vectors, window sets the maximum distance between the current and predicted word, and min_count sets the minimum number of occurrences a word needs in order to be included in training.
We set sg=0, which selects the CBOW model (predicting the target word from its context); setting sg=1 would select the Skip-gram model, which predicts context words from the target word.
We access the learned word vectors using the .wv attribute of the trained model.
We find the word vector for the word "machine" and the most similar words to "learning" based on the learned embeddings.
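
The code on this slide is shown as an image in the original. Below is a minimal sketch of what the bullets describe, assuming gensim is installed; the tiny training corpus is an illustrative assumption:

from gensim.models import Word2Vec

# A tiny illustrative corpus: each sentence is a list of tokens.
sentences = [
    ["machine", "learning", "is", "a", "subfield", "of", "artificial", "intelligence"],
    ["deep", "learning", "is", "part", "of", "machine", "learning"],
    ["word", "embeddings", "support", "machine", "learning", "for", "text"],
]

# vector_size: dimensionality; window: context size; min_count: minimum
# word frequency; sg=0 selects CBOW (sg=1 would select Skip-gram).
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

print(model.wv["machine"][:10])                    # vector for "machine"
print(model.wv.most_similar("learning", topn=3))   # nearest neighbours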
13
"learning" based on the learned embeddings.
Training Word Embeddings
 Techniques for Training Word Embeddings

 Word2Vec
 This method learns embeddings by predicting context words from a target word
(Skip-gram) or predicting a target word from context (Continuous Bag of Words,
CBOW).
 By iterating over the text, it refines the embeddings, which become adept at capturing semantic relations and analogies.

 GloVe (Global Vectors for Word Representation)
 GloVe learns embeddings from corpus-wide word co-occurrence statistics.
 It factorizes a co-occurrence matrix into vectors, capturing both local (contextual) and global (corpus-wide) relationships.

 FastText
 Extending Word2Vec, FastText treats each word as a set of character n-grams.
 This enables embeddings for unseen words and provides subword-level information (see the sketch below).
 A word's vector can be obtained by summing the vectors of its character n-grams.
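
A minimal sketch of the subword idea, using gensim's FastText implementation (the choice of library is an assumption; the slides do not name one). Because vectors are built from character n-grams, even a word absent from the training data gets a usable vector:

from gensim.models import FastText

# Tiny illustrative corpus.
sentences = [
    ["word", "embeddings", "capture", "meaning"],
    ["fasttext", "uses", "character", "ngrams"],
    ["subword", "information", "helps", "with", "rare", "words"],
]

model = FastText(sentences, vector_size=50, window=3, min_count=1)

# "embedding" (singular) never occurs in the corpus, but FastText can still
# build a vector for it from the character n-grams it shares with "embeddings".
print(model.wv["embedding"][:5])
print(model.wv.similarity("embedding", "embeddings"))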

14
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Initialize word vectors randomly or with pre-trained values

 Objective Function
 Gradient Descent
 Negative Sampling (Word2Vec)
 Updating Vectors
 Convergence
 Extracting Word Embeddings

15
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Define an objective function that quantifies the quality of the embeddings. For
Word2Vec, the objective function measures how well the model predicts context
words from target words (or vice versa).

 Gradient Descent
 Negative Sampling (Word2Vec)
 Updating Vectors
 Convergence
 Extracting Word Embeddings

16
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Gradient Descent
 Use optimization techniques like gradient descent to update the word vectors
iteratively. The goal is to minimize the difference between predicted and actual
words in the context.

 Negative Sampling (Word2Vec)


 Updating Vectors
 Convergence
 Extracting Word Embeddings

17
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Gradient Descent
 Negative Sampling (Word2Vec)
 To avoid updating weights for all words in the vocabulary (which can be
computationally expensive), Word2Vec uses negative sampling. It selects a
small number of negative samples (words not in the context) for each positive
sample and updates their weights.

 Updating Vectors
 Convergence
 Extracting Word Embeddings

18
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Gradient Descent
 Negative Sampling (Word2Vec)
 Updating Vectors
 The optimization process adjusts word vectors to improve the model's ability to
predict context words. Similar words (based on co-occurrence) end up having
similar vectors.

 Convergence
 Extracting Word Embeddings

19
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Gradient Descent
 Negative Sampling (Word2Vec)
 Updating Vectors
 Convergence
 Repeat the optimization process for multiple epochs until the model's performance stabilizes or
converges.
 Extracting Word Embeddings

20
Training Word Embeddings
 Iterative Optimization Process

 Initialization
 Objective Function
 Gradient Descent
 Negative Sampling (Word2Vec)
 Updating Vectors
 Convergence
 Extracting Word Embeddings
 Once training is complete, the learned vectors represent word embeddings.
These vectors can be used for various NLP tasks.
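
To make the steps above concrete, here is a minimal numpy sketch of skip-gram training with negative sampling. The toy corpus, hyperparameters, and uniform negative sampling are simplifying assumptions; real implementations such as word2vec add many refinements (subsampling, a smoothed unigram noise distribution, learning-rate decay):

import numpy as np

# Toy corpus (illustrative only).
corpus = [["natural", "language", "processing", "is", "fun"],
          ["language", "models", "learn", "word", "vectors"],
          ["word", "vectors", "capture", "meaning"]]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 25                     # vocabulary size, embedding dimension
window, lr, epochs, k = 2, 0.05, 200, 3   # context size, step size, passes, negatives
rng = np.random.default_rng(0)

# Initialization: small random target (W) and context (C) matrices.
W = rng.normal(scale=0.1, size=(V, D))
C = rng.normal(scale=0.1, size=(V, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):                              # repeat until convergence
    for sent in corpus:
        for pos, word in enumerate(sent):
            t = idx[word]
            lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
            for cpos in range(lo, hi):
                if cpos == pos:
                    continue
                c = idx[sent[cpos]]
                # Positive pair: push sigmoid(W[t].C[c]) towards 1.
                g = sigmoid(W[t] @ C[c]) - 1.0
                dW, dC = g * C[c], g * W[t]
                W[t] -= lr * dW
                C[c] -= lr * dC
                # Negative sampling: k random words pushed towards 0
                # (a full implementation would skip accidental true contexts).
                for n in rng.integers(0, V, size=k):
                    g = sigmoid(W[t] @ C[n])
                    dW, dC = g * C[n], g * W[t]
                    W[t] -= lr * dW
                    C[n] -= lr * dC

# Extracting word embeddings: each row of W is a learned word vector.
print(W[idx["word"]][:5])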

21
Semantic Relationships in Embeddings

 Word embeddings capture semantic relationships by representing words as dense vectors in a high-dimensional space.

 The key idea is that words with similar meanings or contextual usage are represented by vectors that are close to each other in this space.

 This closeness enables word embeddings to capture various types of semantic relationships and analogies.

22
Semantic Relationships in Embeddings

23
Semantic Relationships in Embeddings
 Similar Words in the Vector Space
 In the embedding space, words with similar meanings or contexts
are represented by vectors that are close together.

 This proximity reflects the semantic similarity between the words.

 For example:

 The vectors for "cat" and "dog" are closer to each other than to the
vector for "elephant," indicating their semantic similarity.

 The vectors for synonyms like "big" and "large" are also close, as are
words that often appear in similar contexts.

24
Limitations
 Polysemy and Homonymy Challenges:
 Polysemy: Words often have multiple meanings based on context.
Word embeddings may struggle to disambiguate these meanings,
resulting in a single vector that represents all contexts.
 Homonymy: Different words with the same spelling (homonyms) can
be confused in embeddings, leading to vectors that mix their
meanings.
 Capturing Rare Words:
 Word embeddings are trained on large corpora, which can make them
less effective for representing rare words or domain-specific terms
with limited occurrences in the training data. Rare words may not have
well-defined vectors.
 Domain-Specific Information:
 Word embeddings trained on general text may not capture domain-
specific nuances and terminology. For specialized domains, the
embeddings might not adequately reflect the required semantics.

25
Contextualized Word Embeddings
 BERT (Bidirectional Encoder Representations from
Transformers):
 BERT models consider both left and right context words, capturing
richer contextual information for each word.
 This approach helps in resolving polysemy and homonymy challenges,
as words can have different meanings based on their surroundings.
 ELMo (Embeddings from Language Models):
 ELMo uses a bidirectional LSTM (Long Short-Term Memory) model to
create embeddings that are sensitive to context.
 It generates multiple layers of embeddings, each capturing different
levels of language information.
 GPT (Generative Pre-trained Transformer):
 GPT models are trained to predict the next word in a sentence from its preceding context, reading left to right rather than bidirectionally as BERT does.
 They excel in generating coherent and contextually relevant text.
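
A brief sketch of the practical difference from static embeddings, assuming the transformers and torch packages and the bert-base-uncased checkpoint are available: the same surface word "bank" receives different vectors in different sentences.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def contextual_vector(sentence, word):
    # Return the final hidden state of the (single-token) word `word`.
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

v_river = contextual_vector("She sat on the river bank.", "bank")
v_money = contextual_vector("He deposited cash at the bank.", "bank")

# Below 1.0: the two occurrences of "bank" get context-dependent vectors.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())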

26
Practice
 Dataset
sentences = [
    ["natural", "language", "processing", "is", "a", "subfield", "of", "artificial", "intelligence"],
    ["word", "embeddings", "capture", "semantic", "meanings", "of", "words"],
    ["machine", "learning", "techniques", "are", "widely", "used", "in", "text", "mining"],
    ["vector", "semantics", "enable", "semantic", "similarities", "between", "words"],
    ["deep", "learning", "models", "have", "revolutionized", "language", "processing"]
]

 Instructions:
Create a Python program that explores advanced concepts in vector semantics and word embeddings using the Word2Vec model.
You'll work with a larger dataset of sentences and perform more sophisticated tasks to analyze word embeddings and relationships.

27
Practice
 Tasks
 Task 1: Training Word2Vec Model
Write a function train_word2vec_model(sentences) that takes a list of sentences as
input and trains a Word2Vec model on the sentences. Use parameters vector_size=100,
window=5, min_count=1, and sg=0 for the model.
 Task 2: Calculate Similarity
Write a function calculate_similarity(model, word1, word2) that takes the trained
Word2Vec model and two words as input. Calculate the cosine similarity between the word
vectors of the given words using the model.wv attribute.
 Task 3: Most Similar Words
Write a function find_most_similar_words(model, word, topn) that takes the trained
Word2Vec model, a word, and the number of most similar words to find (topn) as input.
Use the model.wv.most_similar() method to find the most similar words to the given
word.
 Task 4: Word Analogies
Write a function word_analogies(model, word1, word2, word3, topn) that takes the
trained Word2Vec model and three words (word1, word2, and word3) as input. The
function should perform the analogy: word1 - word2 + word3 and find the most similar
words to the result.
 Task 5: Test Your Functions
Apply each function to the provided dataset and demonstrate the use of word embeddings
for similarity calculations, finding most similar words, and word analogies.
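
One possible solution sketch for the tasks above, assuming gensim is installed (treat it as a reference, not the only correct answer; with such a small corpus the numerical results will be noisy):

from gensim.models import Word2Vec

# The dataset from the previous slide.
sentences = [
    ["natural", "language", "processing", "is", "a", "subfield", "of", "artificial", "intelligence"],
    ["word", "embeddings", "capture", "semantic", "meanings", "of", "words"],
    ["machine", "learning", "techniques", "are", "widely", "used", "in", "text", "mining"],
    ["vector", "semantics", "enable", "semantic", "similarities", "between", "words"],
    ["deep", "learning", "models", "have", "revolutionized", "language", "processing"]
]

def train_word2vec_model(sentences):
    # Task 1: CBOW Word2Vec with the required parameters.
    return Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

def calculate_similarity(model, word1, word2):
    # Task 2: cosine similarity between the two word vectors.
    return model.wv.similarity(word1, word2)

def find_most_similar_words(model, word, topn):
    # Task 3: top-n nearest neighbours of the given word.
    return model.wv.most_similar(word, topn=topn)

def word_analogies(model, word1, word2, word3, topn):
    # Task 4: word1 - word2 + word3, then find the closest words.
    return model.wv.most_similar(positive=[word1, word3], negative=[word2], topn=topn)

# Task 5: apply the functions to the provided dataset.
model = train_word2vec_model(sentences)
print(calculate_similarity(model, "language", "processing"))
print(find_most_similar_words(model, "learning", topn=3))
print(word_analogies(model, "language", "words", "semantic", topn=3))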

28
Q&A

29
