NLP Assignment 2024
No two groups may do the same assignment. Assignments will be allotted on a
first-come, first-allotted basis. One member of each group should email me
the assignment that group wants to do, along with the email IDs of the other
members of that group. After getting my approval, please fill in the
following form:
https://docs.google.com/spreadsheets/d/145rzZE2yrKEOGQaUZsH-98JhWxF991m82xR6BpsVwxM/edit?usp=sharing
Objective:
Fine-tune a BERT-based model for Named Entity Recognition (NER)
using a publicly available dataset like CoNLL-2003.
Tasks:
1. Preprocessing: Load and preprocess the CoNLL-2003 dataset. This includes:
- Tokenizing the text using the BERT tokenizer.
- Structuring the data into a format compatible with the `transformers`
library for token classification (a label-alignment sketch follows the
deliverables).
2. Fine-Tuning:
- Fine-tune BERT on the NER task using Hugging Face’s `transformers` library.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use precision, recall, and F1-score as evaluation metrics to
measure the model's performance.
- Compare the performance of your model with the benchmark
results from the CoNLL-2003 challenge.
4. Error Analysis:
- Perform detailed error analysis to understand the common mistakes
made by the model (e.g., confusion between similar entity types).
- Suggest potential improvements, such as using Conditional Random
Fields (CRF) or data augmentation.
Deliverables:
- A Jupyter notebook with the implementation of preprocessing,
model training, and evaluation.
- A written report detailing the model architecture, the
hyperparameters used, and an analysis of the model’s performance
along with the error analysis.
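A minimal sketch of the trickiest preprocessing step, aligning word-level NER tags with BERT’s sub-word tokens, assuming the Hugging Face `datasets` and `transformers` libraries; the function name and the choice to label only the first sub-token of each word are illustrative, not prescribed.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("conll2003")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_and_align_labels(examples):
    # The dataset is already split into words, so tell the tokenizer that.
    tokenized = tokenizer(examples["tokens"], truncation=True,
                          is_split_into_words=True)
    all_labels = []
    for i, labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous, aligned = None, []
        for word_id in word_ids:
            if word_id is None:           # special tokens ([CLS], [SEP])
                aligned.append(-100)      # -100 is ignored by the loss
            elif word_id != previous:     # first sub-token of a word
                aligned.append(labels[word_id])
            else:                         # remaining sub-tokens of the word
                aligned.append(-100)
            previous = word_id
        all_labels.append(aligned)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align_labels, batched=True)
```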
---
Objective:
Fine-tune RoBERTa for sentiment analysis using the IMDB movie reviews
dataset for binary classification (positive/negative).
Tasks:
1. Preprocessing:
- Clean the IMDB dataset (e.g., removing special characters, handling
missing data).
- Tokenize the data using RoBERTa’s tokenizer.
2. Fine-Tuning:
- Fine-tune RoBERTa on the sentiment classification task.
- Experiment with different hyperparameters like learning rate, batch
size, and training epochs to optimize model performance.
3. Model Evaluation:
- Evaluate the fine-tuned model using metrics like accuracy,
precision, recall, and F1-score.
- Test the model on a custom set of movie reviews and analyze the model’s
performance.
4. Hyperparameter Experimentation:
- Conduct an experiment to study the effect of various
hyperparameters on model performance (learning rate, batch size,
etc.).
Deliverables:
- A notebook showing the fine-tuning process and hyperparameter
experiments (a minimal training skeleton is sketched below).
- A report analyzing the impact of hyperparameters on performance,
including any failure cases (reviews that were incorrectly classified).
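A minimal fine-tuning skeleton for the notebook, assuming the `datasets`/`transformers` stack and a recent 4.x `transformers` release; the hyperparameter values are starting points for the Task 4 experiments, not tuned results.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True),
                      batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=2)
args = TrainingArguments(
    output_dir="roberta-imdb",
    learning_rate=2e-5,              # vary these three in Task 4
    per_device_train_batch_size=16,
    num_train_epochs=3,
    evaluation_strategy="epoch",
)
# Passing the tokenizer lets the Trainer pad each batch dynamically.
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"],
                  tokenizer=tokenizer)
trainer.train()
```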
---
Objective:
Fine-tune BERT for document classification using the 20 Newsgroups
dataset.
Tasks:
1. Preprocessing:
- Clean and preprocess the text in the 20 Newsgroups dataset.
- Tokenize using BERT’s tokenizer, making sure to handle input length
properly by splitting long documents if necessary.
2. Fine-Tuning:
- Fine-tune BERT for multi-class classification on the dataset.
- Experiment with different pooling strategies (CLS token pooling,
mean pooling) to aggregate document representations; both strategies are
sketched after the deliverables.
3. Evaluation:
- Use accuracy, precision, recall, and F1-score to evaluate the
model’s performance.
- Perform cross-validation to get robust results and reduce the risk of
overfitting.
Deliverables:
- Code that shows the preprocessing, fine-tuning, and evaluation steps.
- A report comparing pooling strategies and analyzing their effect on the
model’s performance, supported by experimental results.
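A sketch of the two pooling strategies from Task 2 on a frozen encoder, to make the comparison concrete; masking out padding tokens in the mean is an implementation choice, not something the assignment fixes.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(["First document.", "A longer second document."],
                   padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (batch, seq_len, 768)

# CLS pooling: the representation of the first ([CLS]) token.
cls_vec = hidden[:, 0]

# Mean pooling: average over real tokens only, using the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1)      # (batch, seq_len, 1)
mean_vec = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```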
---
Objective:
Fine-tune mBERT for Named Entity Recognition (NER) using a
multilingual dataset like WikiAnn.
Tasks:
1. Preprocessing:
- Load and preprocess the WikiAnn dataset for multiple languages.
- Tokenize the text using the multilingual BERT tokenizer, and
structure the data accordingly.
2. Fine-Tuning:
- Fine-tune mBERT on NER for multiple languages (e.g., English,
German, French).
- Implement transfer learning: fine-tune the model on one language and
evaluate on another (sketched after the deliverables).
3. Evaluation:
- Compare performance across different languages using
standard NER metrics (precision, recall, F1-score).
- Perform error analysis on low-resource languages to understand
where the model struggles.
Deliverables:
- A Jupyter notebook/code that fine-tunes mBERT and evaluates
performance across multiple languages.
- A detailed report discussing the challenges of multilingual NER and the
impact of transfer learning on low-resource languages.
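A sketch of the transfer-learning experiment from Task 2, assuming the WikiAnn dataset as published on the Hugging Face Hub (config names are ISO language codes); tokenization, label alignment, and Trainer setup are the same as in the monolingual NER assignment.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
en = load_dataset("wikiann", "en")   # language used for fine-tuning
de = load_dataset("wikiann", "de")   # language used only for evaluation

# Tokenize and align labels for both languages with the same function as in
# the monolingual setup, then fine-tune a Trainer on en["train"] only.
# The zero-shot transfer result is a single evaluation call on data in a
# language the model never saw during fine-tuning, e.g.:
# metrics = trainer.evaluate(eval_dataset=tokenized_de["test"])
```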
---
Objective:
Fine-tune RoBERTa for zero-shot classification on a custom set of
documents, using Hugging Face’s `transformers` library.
Tasks:
1. Model Setup:
- Load a pre-trained RoBERTa model using the `transformers`
library for zero-shot classification.
2. Data Collection:
- Create or use a custom dataset containing various categories
(e.g., news, sports, technology).
- Use the zero-shot learning setup to classify the documents into
predefined categories (a pipeline sketch follows the deliverables).
3. Evaluation:
- Evaluate the model’s performance by comparing the assigned labels
with the ground truth.
- Perform error analysis to identify which categories are difficult for the
model to classify.
4. Improvements:
- Suggest improvements based on error analysis, such as better
prompt engineering or dataset augmentation.
Deliverables:
- Code demonstrating the setup, fine-tuning, and evaluation of
RoBERTa for zero-shot classification.
- A report analyzing the model’s performance, the challenges
encountered, and potential improvements.
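A minimal sketch of the zero-shot setup from Tasks 1-2 via the `zero-shot-classification` pipeline. Note that this pipeline is backed by an NLI-tuned checkpoint; roberta-large-mnli is one RoBERTa-based choice, and the category labels below are illustrative.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")
candidate_labels = ["news", "sports", "technology"]

result = classifier("The new GPU doubles training throughput.",
                    candidate_labels)
print(result["labels"][0], result["scores"][0])  # top label and its score
```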
---
Objective:
Fine-tune DistilBERT for token classification on a task like Part-of-Speech
(POS) tagging.
Tasks:
1. Preprocessing:
- Preprocess a POS tagging dataset, ensuring that the text is
tokenized properly using DistilBERT’s tokenizer.
2. Fine-Tuning:
- Fine-tune DistilBERT on the POS tagging task, experimenting with
different batch sizes, learning rates, and epochs.
3. Evaluation:
- Evaluate the model using token-level accuracy, F1-score, and other
relevant metrics.
- Compare the performance of DistilBERT to BERT and analyze the
trade-offs between model size and performance.
4. Efficiency Analysis:
- Analyze the performance of DistilBERT in terms of computational
efficiency (e.g., training time, memory usage) compared to BERT (a timing
sketch follows the deliverables).
Deliverables:
- A Jupyter notebook with the preprocessing, fine-tuning, and evaluation
of DistilBERT for token classification.
- A report discussing the performance trade-offs between DistilBERT
and BERT, and an analysis of computational efficiency.
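An illustrative harness for the Task 4 efficiency comparison; the checkpoints, the batch, and the label count (17, as in universal POS tag sets) are assumptions, and absolute numbers will vary with hardware.

```python
import time
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

sentences = ["The quick brown fox jumps over the lazy dog."] * 32

for name in ["bert-base-cased", "distilbert-base-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(name,
                                                            num_labels=17)
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    n_params = sum(p.numel() for p in model.parameters())
    start = time.perf_counter()
    with torch.no_grad():
        model(**batch)                 # one forward pass as a rough proxy
    elapsed = time.perf_counter() - start
    print(f"{name}: {n_params / 1e6:.0f}M params, {elapsed:.3f}s per batch")
```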
---
Objective:
Fine-tune a BART-based model (facebook/bart-base is recommended as it is
smaller and faster) for sequence classification on the CoLA dataset. Ref:
https://openreview.net/pdf?id=rJ4km2R5t7
Each example is a sequence of words annotated with whether it is a
grammatical English sentence.
Tasks:
1. Load and preprocess the CoLA dataset. This includes:
- Tokenizing the text using BARTTokenizer/AutoTokenizer.
2. Fine-Tuning:
- Fine-tune BART on the sequence classification task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use accuracy as the metric to measure the model's performance.
- Gold labels are not available for the CoLA test set, so use a small part
of the development data as the development set and the remaining data as
the test set (a split sketch follows this section).
- Compare the performance of your model with the benchmark result from
the BERT model.
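A sketch of the development/test split described in the Evaluation step, assuming the GLUE copy of CoLA on the Hugging Face Hub; the 50/50 ratio and the seed are arbitrary choices that make the split reproducible.

```python
from datasets import load_dataset

cola = load_dataset("glue", "cola")
# The official test split has no gold labels, so carve a labelled test set
# out of the validation data and keep the rest as the development set.
split = cola["validation"].train_test_split(test_size=0.5, seed=42)
dev_set, test_set = split["train"], split["test"]
print(len(cola["train"]), len(dev_set), len(test_set))
```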
---
Objective:
Fine-tune a BART-based model for bitext classification on RTE dataset.
Ref: https://openreview.net/pdf?id=rJ4km2R5t7
RTE stands for Recognizing Textual Entailment, i.e., deciding whether one
sentence entails (supports) another.
Tasks:
1. Load and preprocess the RTE dataset. This includes:
- Tokenizing the sentence pairs using BARTTokenizer/AutoTokenizer (a
pair-encoding sketch follows this section).
2. Fine-Tuning:
- Fine-tune BART on the bitext classification task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use accuracy as the metric to measure the model's performance.
- Compare the performance of your model with the benchmark result from
the BERT model.
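A sketch of the pair encoding for Task 1: the bitext nature of RTE is handled at the input level by letting the tokenizer pack both sentences into one sequence with separator tokens. The field names sentence1/sentence2 follow the GLUE release on the Hugging Face Hub.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

rte = load_dataset("glue", "rte")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

def tokenize_pairs(batch):
    # sentence1 is the premise, sentence2 the hypothesis; the pair is
    # truncated jointly so long premises do not crowd out the hypothesis.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

rte = rte.map(tokenize_pairs, batched=True)
```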
---
Objective:
Fine-tune a T5 model to classify whether two sentences are semantically
(meaningfully) equivalent. Use the MRPC dataset. Ref:
https://openreview.net/pdf?id=rJ4km2R5t7
Tasks:
1. Load and preprocess the MRPC dataset. This includes:
- Tokenizing the text using T5Tokenizer (a text-to-text formatting sketch
follows this section).
2. Fine-Tuning:
- Fine-tune T5 on the paraphrase classification task.
- Implement appropriate hyperparameter tuning for model optimization.
3. Evaluation:
- Use accuracy and F1 as the metrics to measure the model's performance.
- Compare the performance of your model with the benchmark result from
the BERT model.
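A sketch of casting MRPC into T5's text-to-text format. The prefix and the target strings follow the convention of the original T5 paper; any consistent scheme works, as long as training and decoding agree on the target vocabulary.

```python
from datasets import load_dataset
from transformers import T5Tokenizer

mrpc = load_dataset("glue", "mrpc")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
target_text = {0: "not_equivalent", 1: "equivalent"}

def preprocess(batch):
    sources = [f"mrpc sentence1: {a} sentence2: {b}"
               for a, b in zip(batch["sentence1"], batch["sentence2"])]
    model_inputs = tokenizer(sources, truncation=True)
    # T5 predicts the label as text, so the targets are tokenized strings.
    targets = tokenizer([target_text[l] for l in batch["label"]])
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs

mrpc = mrpc.map(preprocess, batched=True)
```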
---
Objective:
Fine-tune a T5 model on the WiC dataset. Ref: https://arxiv.org/pdf/1905.00537
Preprocess the dataset and perform appropriate hyperparameter tuning for
model optimization (one possible input serialization is sketched at the end
of this section).
Task description:
Input: a word w that appears in two sentences. The task is to classify
whether the word is used in the same sense in both sentences or not.
This is a binary classification problem.
Evaluation:
- Report accuracy.
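One possible serialization of a WiC example for T5, assuming the SuperGLUE field names (word, sentence1, sentence2, label) as released on the Hugging Face Hub; the template itself is an arbitrary but consistent choice.

```python
def wic_to_text(example):
    # Flatten the word and both sentences into a single source string and
    # map the binary label to a target word that T5 learns to generate.
    source = (f"wic word: {example['word']} "
              f"sentence1: {example['sentence1']} "
              f"sentence2: {example['sentence2']}")
    target = "true" if example["label"] == 1 else "false"
    return {"source": source, "target": target}
```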
---
Objective:
Fine-tune a T5 model on the BoolQ dataset. Ref:
https://arxiv.org/pdf/1905.00537
Preprocess the dataset and perform appropriate hyperparameter tuning for
model optimization (an input serialization sketch follows this section).
Task example:
Passage: Barq’s – Barq’s is an American soft drink. Its brand of root beer is
notable for having caffeine. Barq’s, created by Edward Barq and bottled
since the turn of the 20th century, is owned by the Barq family but bottled
by the Coca-Cola Company. It was known as Barq’s Famous Olde Tyme
Root Beer until 2012.
Question: is barq’s root beer a pepsi product
Answer: No
Evaluation:
- Report accuracy.
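The same text-to-text recipe applied to BoolQ, assuming the SuperGLUE release on the Hugging Face Hub (fields question, passage, and a 0/1 label); the template is again one choice among many.

```python
from datasets import load_dataset

boolq = load_dataset("super_glue", "boolq")

def boolq_to_text(example):
    # Question first, then the passage; T5 is trained to emit "yes" or "no".
    source = (f"boolq question: {example['question']} "
              f"passage: {example['passage']}")
    target = "yes" if example["label"] == 1 else "no"
    return {"source": source, "target": target}

boolq = boolq.map(boolq_to_text)
```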
---
Objective:
Fine-tune a T5 model on the WSC dataset. Ref: https://arxiv.org/pdf/1905.00537
Preprocess the dataset and perform appropriate hyperparameter tuning for
model optimization (a serialization sketch follows the evaluation step).
Task Description:
Coreference resolution is the task of determining which entity a pronoun
refers to.
Example:
Text: Mark told Pete many lies about himself, which Pete included in his
book. He should have been more truthful.
Coreference: False
Note: the output should be “False” because the pronoun “He” refers to Mark,
not Pete.
Evaluation:
- Report accuracy.
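A possible WSC serialization for T5, assuming the SuperGLUE fields (text, span1_text for the candidate entity, span2_text for the pronoun, and a 0/1 label); as with WiC, the template is a choice, not a fixed format.

```python
def wsc_to_text(example):
    # Expose the candidate entity and the pronoun alongside the full text;
    # the model generates "true" if the pronoun refers to the entity.
    source = (f"wsc text: {example['text']} "
              f"entity: {example['span1_text']} "
              f"pronoun: {example['span2_text']}")
    target = "true" if example["label"] == 1 else "false"
    return {"source": source, "target": target}
```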