Understanding BERT
Transformers
BERT builds on the Transformer architecture, specifically its encoder stack, which uses self-attention to relate every token in a sequence to every other token rather than reading text strictly left to right.
What is BERT?
BERT (Bidirectional Encoder Representations from Transformers) is a language model introduced by Google in 2018. It is pretrained on large amounts of unlabeled text to learn deep bidirectional representations, which can then be fine-tuned for a wide range of downstream NLP tasks.
Architecture of BERT
BERT is a stack of Transformer encoder layers. The two standard configurations are BERT-Base (12 layers, hidden size 768, 12 attention heads, roughly 110 million parameters) and BERT-Large (24 layers, hidden size 1024, 16 attention heads, roughly 340 million parameters). Inputs are limited to 512 tokens.
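As a quick check of these numbers, here is a minimal sketch assuming the Hugging Face transformers library is installed; its default BertConfig corresponds to the BERT-Base hyperparameters:

# Minimal sketch: inspecting BERT-Base hyperparameters
# (assumes the Hugging Face `transformers` package is installed)
from transformers import BertConfig

config = BertConfig()  # defaults correspond to BERT-Base

print(config.num_hidden_layers)        # 12 encoder layers
print(config.hidden_size)              # 768-dimensional hidden states
print(config.num_attention_heads)      # 12 attention heads per layer
print(config.max_position_embeddings)  # 512-token input limit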
BERT is trained on large corpora (like Wikipedia and BookCorpus) using two tasks:
1. Masked Language Modeling (MLM): a fraction of the input tokens is replaced with a [MASK] token, and the model learns to predict the original words from the surrounding context.
2. Next Sentence Prediction (NSP): given a pair of sentences, the model learns to predict whether the second sentence actually follows the first in the original text.
Example (MLM):
Input: The man went to the [MASK] to buy milk.
Output: store
This forces BERT to learn bidirectional context since it needs both left and
right words to fill in the blank.
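This prediction can be reproduced with a short sketch, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint are available:

# Minimal sketch: masked language modeling with pretrained BERT
# (assumes the Hugging Face `transformers` package and the
# `bert-base-uncased` checkpoint are available)
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in [MASK] using both the left and right context.
for prediction in unmasker("The man went to the [MASK] to buy milk."):
    print(prediction["token_str"], round(prediction["score"], 3))
# The top candidates should include words like "store".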
Example (NSP):
Sentence A: The man went to the store.
Sentence B: He bought a gallon of milk.
Label: IsNext (sentence B plausibly follows sentence A)
This enables BERT to perform well on tasks like question answering and
natural language inference.
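A minimal NSP sketch, again assuming the Hugging Face transformers library and the bert-base-uncased checkpoint:

# Minimal sketch: next sentence prediction with pretrained BERT
# (assumes the Hugging Face `transformers` package and the
# `bert-base-uncased` checkpoint are available)
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

encoding = tokenizer("The man went to the store.",
                     "He bought a gallon of milk.",
                     return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits

# Index 0 corresponds to "B follows A", index 1 to "B is a random sentence".
probs = torch.softmax(logits, dim=1)
print("P(IsNext) =", probs[0, 0].item())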
Common Tasks:
- Text classification (e.g., sentiment analysis, spam detection)
- Named entity recognition (NER)
- Question answering (extractive, SQuAD-style)
- Natural language inference
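As a sketch of how BERT is adapted to one such task, the snippet below attaches a classification head on top of BERT; it assumes the Hugging Face transformers library, and the head is randomly initialized, so it would still need fine-tuning on labeled data:

# Minimal sketch: BERT for text classification
# (assumes the Hugging Face `transformers` package; the classification
# head is randomly initialized and would need fine-tuning on labeled data)
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g., positive vs. negative sentiment
)

inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): one score per class

print(torch.softmax(logits, dim=1))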
Limitations of BERT