NLP Quick Notes
S.NO Topic Name Summary
1 NLP: Natural Language Processing
2 What is NLP: Natural language processing (NLP) refers to the branch of computer science, and more specifically the branch of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
12 Term Frequency and Inverse Document Frequency: Term frequency refers to the number of times that a term t occurs in a document d. Inverse document frequency is a measure of whether a term is common or rare across the documents of a given corpus.
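A minimal sketch of the two quantities in plain Python (the toy corpus and the smoothing term in the denominator are illustrative assumptions; libraries such as scikit-learn use slightly different weighting variants):

```python
import math

# Toy corpus of three "documents" (illustrative only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def term_frequency(term, doc):
    # Raw count of the term in a single document.
    return doc.split().count(term)

def inverse_document_frequency(term, corpus):
    # log(N / (1 + df)): rare terms get a higher weight, common terms a lower one.
    df = sum(1 for d in corpus if term in d.split())
    return math.log(len(corpus) / (1 + df))

print(term_frequency("the", docs[0]))           # 2
print(inverse_document_frequency("cat", docs))  # rarer term, higher IDF
```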
34 BERT CASED AND UNCASED FORMATS? The cased version of BERT preserves the original case of the text, while the uncased version converts all text to lowercase. For example, the cased version would treat "Apple" and "apple" as two distinct tokens, while the uncased version would treat them as the same token.
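The difference is easy to see by tokenizing the same string with both checkpoints; a minimal sketch using the Hugging Face transformers library (assumes the bert-base-cased and bert-base-uncased checkpoints can be downloaded):

```python
from transformers import AutoTokenizer

cased = AutoTokenizer.from_pretrained("bert-base-cased")
uncased = AutoTokenizer.from_pretrained("bert-base-uncased")

print(cased.tokenize("Apple apple"))    # cased tokenizer keeps "Apple" and "apple" distinct
print(uncased.tokenize("Apple apple"))  # uncased tokenizer lowercases both to "apple"
```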
35 HUGGING FACE TRANSFORMERS? Hugging Face Transformers is an open-source library developed by Hugging Face that provides pre-trained models, which can be fine-tuned on task-specific datasets for various NLP tasks such as text classification, question answering, and language translation. The models are available in both PyTorch and TensorFlow, making the library accessible to users of both frameworks. It also includes utilities for tokenization, data preprocessing, and evaluation of NLP models.
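A minimal sketch of the library's pipeline API (the default sentiment-analysis checkpoint is downloaded on first use; the example sentence is illustrative):

```python
from transformers import pipeline

# pipeline() wires together a tokenizer and a pre-trained model for a task.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face Transformers makes NLP easy."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```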
36 HIDDEN REP AND CLS HEAD IN NLP? In Natural Language Processing (NLP), the term "hidden representation" generally refers to the output of the final layer of a pre-trained language model: a contextualized encoding of the input text in which each token in the input sequence is represented as a high-dimensional vector that encodes its meaning and context within that sequence. The "CLS head" stands for "classification head" and refers to a neural network layer that is typically added on top of the hidden representation of a pre-trained language model. The CLS head works by taking the hidden representation of the special [CLS] token that is added to the beginning of the input sequence in BERT, and passing it through a linear layer and activation function to produce the final output vector. This output vector is then used as input to a final classification layer that outputs the predicted class for the input text.
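A minimal sketch (PyTorch plus transformers) of extracting the [CLS] hidden representation and feeding it to a classification head; the two-class linear head below is a hypothetical, untrained stand-in for what fine-tuning would learn:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("NLP quick notes", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hidden representation of the [CLS] token (position 0 of the sequence).
cls_hidden = outputs.last_hidden_state[:, 0, :]

# Hypothetical 2-class classification head on top of the [CLS] vector.
cls_head = torch.nn.Linear(model.config.hidden_size, 2)
logits = cls_head(cls_hidden)
print(logits.shape)  # torch.Size([1, 2])
```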
37 Text classification: BERT can be fine-tuned for text classification tasks such as sentiment analysis, topic classification, and spam detection. Fine-tuning involves training the BERT model on a task-specific dataset to generate task-specific embeddings, which can then be used as inputs to a classification model.
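A minimal sketch of loading BERT with a sequence-classification head ready for fine-tuning; num_labels=2 and the spam example are assumptions for a binary task, and the head's logits are only meaningful after training:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Free prize! Click now!", return_tensors="pt")
logits = model(**inputs).logits  # untrained head: fine-tune before trusting these scores
print(logits)
```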
38 Question answering: BERT can be used for question answering tasks, where the model is trained to generate answers to questions based on a given context. This has applications in tasks such as chatbots and customer service.
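A minimal sketch of extractive question answering with the pipeline API; the SQuAD-fine-tuned DistilBERT checkpoint named below is a commonly used default, not something specified in these notes:

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What does BERT stand for?",
    context="BERT stands for Bidirectional Encoder Representations from Transformers.",
)
print(result["answer"])  # span extracted from the context
```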
39 Named entity recognition: BERT can be fine-tuned for named entity recognition, where the model is trained to identify and classify named entities such as people, organizations, and locations in a text.
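A minimal sketch of named entity recognition with the pipeline API (the default NER checkpoint is downloaded on first use; aggregation_strategy merges word pieces into whole entity spans):

```python
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))
# e.g. [{'entity_group': 'ORG', 'word': 'Hugging Face', ...},
#       {'entity_group': 'LOC', 'word': 'New York City', ...}]
```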
41 Text generation: BERT can be used for text generation tasks such as summarization, paraphrasing, and text completion, where the model generates text based on a given prompt or context.
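Because BERT is a masked language model, the "text completion" case is most directly illustrated with the fill-mask pipeline rather than free-form generation (which is usually done with decoder models); a minimal sketch with an illustrative prompt:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("The capital of France is [MASK]."):
    # Each candidate is a predicted token for the masked position, with a score.
    print(candidate["token_str"], round(candidate["score"], 3))
```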