BERT
• Masked word prediction
• The trick here is that, during pre-training, some of the words in each sentence are hidden (masked), so BERT doesn't know the exact word that's missing. It's like trying to solve a puzzle without knowing all the pieces. BERT has to pay close attention to the context, the other words in the sentence, to make an educated guess about what the missing word could be (see the sketch after this list).
• Because of this, BERT doesn't have a fixed idea of what each word means on its own. Instead, it
learns the meaning of words based on how they're used in different sentences. This way, each word
gets its meaning from the words around it, not from some pre-set definition. This helps BERT
understand language in a more flexible and context-dependent way.
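A minimal sketch of this masked-word guessing with the Hugging Face transformers library. The model name bert-base-uncased and the example sentence are illustrative choices, not taken from the slides:

```python
# Minimal sketch: ask a pre-trained BERT to guess a masked word.
# Requires: pip install transformers torch
from transformers import pipeline

# The fill-mask pipeline loads a BERT model trained with masked word prediction.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT only sees the context around [MASK]; it must guess the hidden word.
predictions = unmasker("She went to the [MASK] to deposit money.")

for p in predictions:
    # Each prediction carries the candidate word and a confidence score.
    print(f"{p['token_str']:>10}  {p['score']:.3f}")
```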
Self-attention mechanisms
• BERT also relies on a self-attention mechanism that captures and understands relationships among the words in a sentence. The bidirectional transformers at the center of BERT's design make this possible. This is significant because a word's meaning often shifts as a sentence develops: each added word changes the overall meaning of the word the model is focusing on, so the word in focus cannot be pinned down until all of that surrounding context is taken into account. BERT accounts for this by reading bidirectionally, weighing the effect of every other word in the sentence on the focus word and eliminating the left-to-right momentum that biases words toward a certain meaning as a sentence progresses. A sketch of the underlying attention computation follows this list.
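For the curious, here is a minimal sketch of the scaled dot-product self-attention at the heart of the transformer, in plain NumPy with toy sizes. The random matrices stand in for learned weights and are purely illustrative, not BERT's real parameters:

```python
# Minimal sketch of scaled dot-product self-attention (one head, toy sizes).
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: learned projection matrices."""
    q = x @ w_q                                    # queries: what each word is looking for
    k = x @ w_k                                    # keys:    what each word offers as context
    v = x @ w_v                                    # values:  the information actually mixed in
    scores = q @ k.T / np.sqrt(k.shape[-1])        # how much each word attends to every other
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for the softmax
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v                             # each output is a context-weighted blend of all words

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                            # e.g. a 5-word sentence, 8-dim embeddings
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                                   # (5, 8): every word now reflects its full context
```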
Self-attention mechanisms
• Think of BERT like a detective trying to understand a story. It uses a special tool called self-attention
to figure out how all the words in a sentence relate to each other. This helps BERT understand how
the meaning of a word might change as the sentence goes on.
• The cool thing about BERT is that it doesn't just look at words one after another. It looks at all the
words in the sentence at the same time, kind of like how you might scan a whole page of a book.
This helps it understand the connections between words better.
• For example, if you have a sentence like "She went to the bank to deposit money," the word "bank"
could mean a riverbank or a place where you put money. BERT looks at all the words around "bank"
to figure out which meaning makes sense.
• By reading the sentence in both directions at once, BERT can catch these shifts in meaning as the sentence unfolds. This helps it avoid getting stuck on just one meaning of a word and makes it better at understanding the whole story. The sketch after this list shows this effect on the word "bank".
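A minimal sketch of that contextual effect with the Hugging Face transformers library: the same word "bank" gets a different vector in a money sentence than in a river sentence. The sentences and the bert-base-uncased checkpoint are illustrative assumptions:

```python
# Minimal sketch: the vector BERT produces for "bank" depends on its context.
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return BERT's contextual embedding for the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (num_tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]                        # vector at the "bank" position

money = bank_vector("She went to the bank to deposit money.")
river = bank_vector("They had a picnic on the bank of the river.")
same = bank_vector("He opened an account at the bank.")

cos = torch.nn.functional.cosine_similarity
# The two financial uses of "bank" should be closer to each other than to the riverbank use.
print(cos(money, same, dim=0).item(), cos(money, river, dim=0).item())
```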
Next sentence prediction
• NSP is a training technique that teaches BERT to predict whether a certain sentence follows a
previous sentence to test its knowledge of relationships between sentences.
• Specifically, BERT is shown both sentence pairs that are correctly ordered and pairs that are mismatched, so it learns to tell the difference.
• Over time, BERT gets better at predicting next sentences accurately.
Next sentence prediction
• NSP involves giving BERT two sentences, sentence 1 and sentence 2. Then, BERT is asked the question: "HEY BERT, DOES SENTENCE 2 COME AFTER SENTENCE 1?" and BERT replies with isNextSentence or NotNextSentence.
• Which of the sentences would you say follows the other logically? Sentence 2 after sentence 1? Probably not. These are the questions that BERT is supposed to answer.
• Sentence 3 follows sentence 1 because of the contextual follow-up between the two; an easier giveaway is that both sentences mention "Tony". A code sketch of NSP follows below.
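A minimal sketch of NSP with a pre-trained BERT via Hugging Face transformers. The two example pairs here are made up for illustration; they are not the slides' sentences 1 to 3:

```python
# Minimal sketch: ask pre-trained BERT whether sentence B plausibly follows sentence A.
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

def is_next_sentence(sentence_a, sentence_b):
    # Encode the pair as [CLS] A [SEP] B [SEP], the format NSP was trained on.
    inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index 0 = "B follows A" (isNextSentence), index 1 = "B is random" (NotNextSentence).
    return "isNextSentence" if logits.argmax(dim=-1).item() == 0 else "NotNextSentence"

print(is_next_sentence("Tony opened the fridge.", "He grabbed the last slice of pizza."))
print(is_next_sentence("Tony opened the fridge.", "The stock market closed higher today."))
```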
What is BERT used for?
Language understanding and generation tasks such as the following (a question-answering sketch follows this list):
• Question answering.
• Abstractive summarization.
• Sentence prediction.
• Conversational response generation.
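A minimal sketch of one of these uses, extractive question answering, with the Hugging Face pipeline. The pipeline's default checkpoint is a distilled BERT variant fine-tuned on SQuAD; any BERT-style question-answering checkpoint could be passed via model=, and the context and question below are made up for illustration:

```python
# Minimal sketch: extractive question answering with a BERT-style model.
# Requires: pip install transformers torch
from transformers import pipeline

# Loads a model fine-tuned on SQuAD-style question answering;
# pass model="..." to pick a specific BERT checkpoint instead of the default.
qa = pipeline("question-answering")

context = "She went to the bank to deposit money before it closed at five."
result = qa(question="Why did she go to the bank?", context=context)

# The model returns the answer span it extracted from the context, plus a score.
print(result["answer"], result["score"])
```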