Assignment 05 CL
55
BE AI & DS
---------------------------------------------------------------------------------------------------------------------------------------
1. What is word embedding, and why is it important in natural language processing (NLP)?
ANS: Word embedding is a technique used in natural language processing (NLP) and machine learning to represent words as dense vectors of real numbers. Each word is mapped to a point in a continuous vector space, where its position reflects its contextual relationships with other words. Word embeddings are typically learned from large corpora of text using methods such as Word2Vec, GloVe, or FastText (a short training sketch follows the list below).
1. Semantic Similarity: Words with similar meanings tend to have similar vector representations. This
allows algorithms to understand the semantic relationships between words and capture nuances in
meaning.
2. Dimensionality Reduction: Word embeddings reduce the dimensionality of the input space, making it
easier to work with large vocabularies and allowing for more efficient computation.
3. Contextual Understanding: Word embeddings capture syntactic and semantic relationships between
words based on their context in the training data. This contextual understanding is crucial for tasks like
sentiment analysis, named entity recognition, and machine translation.
4. Improved Generalization: Models trained using word embeddings tend to generalize better to new, unseen data. By leveraging the semantic information captured in the embeddings, models can better understand and process text even if the exact words or phrases were not seen during training.
5. Efficient Representation: Compared to one-hot encoding or other sparse representations of words,
word embeddings provide a more efficient and dense representation of words, which is more suitable
for neural network architectures commonly used in NLP tasks.
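To make this concrete, here is a minimal sketch of training word embeddings with the gensim library (gensim, the toy corpus, and the hyperparameters are assumptions chosen for illustration; the question does not prescribe a toolkit, and meaningful embeddings require a much larger corpus).

# Minimal Word2Vec sketch using gensim (assumes gensim >= 4.0 is installed).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens. Real embeddings need a large corpus.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "chased", "the", "mouse"],
]

# sg=1 selects the skip-gram objective; vector_size is the embedding dimension.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

print(model.wv["king"].shape)                 # dense 50-dimensional vector
print(model.wv.similarity("king", "queen"))   # cosine similarity between word vectors
print(model.wv.most_similar("king", topn=3))  # nearest neighbours in the embedding space

On a sufficiently large corpus, semantically related words such as "king" and "queen" end up close together in the vector space, which is the property described in point 1 above.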
6. What is Neural Machine Translation (NMT), and how does it differ from traditional statistical machine translation approaches?
ANS: Neural Machine Translation (NMT) is an approach to machine translation that uses artificial neural networks to translate text from one language to another. Unlike traditional statistical machine translation (SMT) approaches, which rely on hand-engineered features and complex pipelines, NMT models learn the translation mapping from input to output text directly (a short usage sketch follows the list below).
1. End-to-End Learning: In NMT, the entire translation process is modeled by a single neural network
architecture, typically using recurrent neural networks (RNNs), convolutional neural networks (CNNs),
or transformer architectures. This end-to-end learning approach allows for better optimization and integration of the various components of the translation process.
2. Word Embeddings: NMT models often use word embeddings to represent words in continuous vector
spaces, capturing semantic and syntactic information. These embeddings are learned directly from the
training data and are optimized alongside the translation model. In contrast, traditional SMT systems
typically rely on sparse representations of words or phrases and handcrafted features.
3. Contextual Understanding: NMT models have the ability to capture long-range dependencies and
contextual information in the source and target languages, allowing for more accurate translations of
complex sentences. This is achieved through the use of recurrent or self-attention mechanisms in the
neural network architectures.
4. Parameterization: NMT models have a large number of parameters that are learned from data, allowing them to capture complex relationships between words and phrases in the source and target languages. Traditional SMT systems, on the other hand, rely on manually tuned parameters and feature weights, which may not generalize well across different language pairs or domains.
5. Data Requirements: NMT models typically require larger amounts of parallel training data compared
to traditional SMT systems. However, once trained on sufficient data, NMT models often outperform
traditional approaches in terms of translation quality, especially for languages with complex syntax and
morphology.
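As a usage illustration, the sketch below translates a sentence with a pre-trained MarianMT encoder-decoder from the Hugging Face transformers library (the library, the checkpoint name, and the example sentence are assumptions chosen for illustration, not part of the question).

# Sketch: English-to-German translation with a pre-trained MarianMT model
# (assumes the transformers and sentencepiece packages are installed and the
# "Helsinki-NLP/opus-mt-en-de" checkpoint can be downloaded).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

sentences = ["Neural machine translation learns the mapping end to end."]

# Tokenize, run the encoder-decoder transformer, and decode the generated ids.
batch = tokenizer(sentences, return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))

A single neural model takes the process from tokenized input to translated output, in contrast to the multi-stage pipelines of traditional SMT systems.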
7. Explain the BERT score and how it differs from BLEU in evaluating translation quality.
ANS: The BERT (Bidirectional Encoder Representations from Transformers) score, commonly written BERTScore, is a metric used for evaluating the quality of machine translation outputs. Unlike BLEU (Bilingual Evaluation Understudy), which is based on n-gram overlap between the reference (human-generated) translation and the machine-generated translation, BERT score measures the semantic similarity between the reference and candidate translations by comparing their contextual embeddings (a short comparison sketch follows the list below).
1. Contextual Embeddings: BERT score utilizes contextual embeddings generated by pre-trained BERT models to capture the semantic meaning of words and phrases in the reference and candidate translations. This allows it to consider not only individual words or n-grams but also the context in which they appear, resulting in a more nuanced evaluation of translation quality.
2. Bidirectionality: BERT models are bidirectional, meaning they capture dependencies from both left and right contexts in a sentence. This enables BERT score to better understand the relationships between words and phrases in the translations, resulting in more accurate evaluations, especially for languages with flexible word order and complex syntactic structures.
3. Robustness to Synonyms and Paraphrases: BERT score is more robust to variations in word choice and sentence structure than BLEU. Since BERT embeddings capture semantic similarity, translations that use synonyms or paraphrases of the reference text are more likely to receive higher scores if they convey the same meaning effectively.
4. No Need for Reference Length Normalization: Unlike BLEU, which applies a brevity penalty when the candidate translation is shorter than the reference, BERT score does not require explicit length normalization. This makes BERT score more suitable for evaluating translations across different lengths and styles of text.
5. Correlation with Human Judgments: BERT score has been shown to correlate more strongly with
human judgments of translation quality compared to BLEU, especially for languages with complex
syntax and semantics. This makes it a more reliable metric for assessing the fluency and adequacy of
machine translations.
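The sketch below contrasts the two metrics on a paraphrased candidate (the bert-score and sacrebleu packages and the example sentences are assumptions chosen for illustration).

# Sketch: BLEU vs. BERT score on a paraphrase with little n-gram overlap
# (assumes the bert-score and sacrebleu packages are installed; the first
# BERTScore call downloads a pre-trained English model).
from bert_score import score as bert_score
import sacrebleu

references = ["The cat sat on the mat."]
candidates = ["A cat was sitting on the rug."]   # paraphrase of the reference

# BLEU counts n-gram overlap with the reference, so the paraphrase scores low.
bleu = sacrebleu.corpus_bleu(candidates, [references])
print("BLEU:", bleu.score)

# BERT score compares contextual embeddings, so a meaning-preserving
# paraphrase can still score highly.
P, R, F1 = bert_score(candidates, references, lang="en")
print("BERTScore F1:", F1.mean().item())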
8. What is the BERT (Bidirectional Encoder Representations from Transformers) model, and how is it pre-
trained for various NLP tasks?
ANS: BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art pre-trained model
for natural language processing (NLP) tasks introduced by researchers at Google AI Language in 2018. It is
based on the Transformer architecture, which is a neural network architecture designed specifically for sequence
transduction tasks such as machine translation and language modeling.
1. Bidirectional: BERT is bidirectional, meaning it can capture context from both left and right directions
in a sentence. This bidirectionality allows it to better understand the meaning of words and phrases in
context, which is crucial for many NLP tasks.
2. Transformer Architecture: BERT is based on the Transformer architecture, which utilizes self-attention mechanisms to capture long-range dependencies in sequences. This architecture enables BERT to effectively model relationships between words in a sentence without relying on recurrent neural networks (RNNs) or convolutional neural networks (CNNs).
3. Pre-training: BERT is pre-trained on large amounts of text data using unsupervised learning objectives. During pre-training, BERT learns to predict masked words in a sentence (Masked Language Model, MLM) and to predict the relationship between pairs of sentences (Next Sentence Prediction, NSP). By pre-training on large corpora of text data, BERT learns general language representations that can be fine-tuned for specific NLP tasks (the MLM objective is illustrated in the sketch after this list).
4. Transfer Learning: After pre-training, the BERT model can be fine-tuned on downstream NLP tasks
such as text classification, named entity recognition, question answering, and machine translation.
Fine-tuning involves training BERT on task-specific labeled data, which allows it to adapt its pre-
learned representations to the specific requirements of the task.
5. Multi-layer Representation: BERT consists of multiple layers of encoders, each capturing a different level of abstraction in the input text. The final hidden states of these encoders are used as contextualized representations of words, which can then be fed into task-specific output layers.
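As a brief illustration of points 3 to 5, the sketch below uses the publicly available bert-base-uncased checkpoint from the Hugging Face transformers library to run the masked-language-model objective and to extract contextualized representations (the library and checkpoint are assumptions; any BERT-style checkpoint would do).

# Sketch: BERT's MLM objective and contextual embeddings
# (assumes the transformers and torch packages are installed).
import torch
from transformers import pipeline, AutoTokenizer, AutoModel

# 1) Masked Language Model: BERT predicts the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The capital of France is [MASK].")[:3]:
    print(pred["token_str"], round(pred["score"], 3))

# 2) Contextualized representations: the final hidden states can feed a
#    task-specific head during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("BERT produces contextual embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (batch_size, sequence_length, hidden_size)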