
BERT

INSTRUCTOR NAME: SHUKDEV DATTA


ML DEVELOPER AT INNOVATIVE SKILLS
What is BERT?
• BERT is an open source machine learning framework for natural language processing (NLP). It is designed to help computers understand the meaning of ambiguous language in text by using the surrounding text to establish context. The BERT framework was pretrained on text from Wikipedia and can be fine-tuned with question-and-answer data sets.
• BERT, which stands for Bidirectional Encoder Representations from Transformers, is based on transformers, a deep learning architecture in which every output element is connected to every input element and the weightings between them are calculated dynamically based on their relationship.
Background and history of BERT
• Google first introduced the transformer model in 2017. At that time, language models
primarily used recurrent neural networks (RNN) and convolutional neural networks (CNN) to
handle NLP tasks.
• CNNs and RNNs are competent models; however, they require sequences of data to be processed in a fixed order. Transformer models are considered a significant improvement because they don't require data sequences to be processed in any fixed order.
• Because transformers can process data in any order, they enable training on larger amounts
of data than was possible before their existence. This facilitated the creation of pretrained
models like BERT, which was trained on massive amounts of language data prior to its
release.
How BERT works
• BERT was pretrained using only a collection of unlabeled, plain text, namely the entirety of English Wikipedia and the BookCorpus. Because this pretraining objective is self-supervised, BERT learns from unlabeled text alone, and the pretrained model can keep being adapted to new text and queries in practical applications such as Google Search.
• BERT's pretraining serves as a base layer of knowledge from which it can build its responses.
From there, BERT can adapt to the ever-growing body of searchable content and queries,
and it can be fine-tuned to a user's specifications. This process is known as transfer learning.
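A minimal sketch of that transfer-learning step, assuming the Hugging Face transformers library and PyTorch are installed; the two-example "data set", the labels and the learning rate are purely illustrative, not taken from the slides.

```python
# Sketch: adapting pretrained BERT to a downstream task (transfer learning).
# Assumes Hugging Face `transformers` and PyTorch; the tiny "data set" is illustrative.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["The product works great.", "This was a waste of money."]
labels = torch.tensor([1, 0])  # hypothetical labels: 1 = positive, 0 = negative

# Tokenize the examples and run one gradient step on top of the pretrained weights.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # the classification head is trained from scratch
outputs.loss.backward()
optimizer.step()
print("fine-tuning loss:", outputs.loss.item())
```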
Components leading to BERT's creation
• Transformers

• Masked language modeling

• Self-attention mechanisms

• Next sentence prediction


Transformers
• Google's work on transformers made BERT possible. The
transformer is the part of the model that gives BERT its
increased capacity for understanding context and
ambiguity in language. The transformer processes any
given word in relation to all other words in a sentence,
rather than processing them one at a time. By looking at
all surrounding words, the transformer enables BERT to
understand the full context of the word and therefore
better understand searcher intent.
• This is contrasted against the traditional method of language processing, known as word embedding, an approach used in models such as GloVe and word2vec. Those models map every single word to a vector that represents only one dimension of that word's meaning.

Fig: Word Embeddings
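To make the contrast concrete, here is a purely illustrative sketch of a static word-embedding lookup; the words and vector values are invented and far smaller than the 100- to 300-dimensional vectors GloVe and word2vec actually learn.

```python
# Sketch: a static word-embedding table, as used by word2vec/GloVe-style models.
# The vectors are toy values invented for illustration.
static_embeddings = {
    "bank":  [0.21, -0.53, 0.88],   # one vector, whether "river bank" or "savings bank"
    "river": [0.35, -0.48, 0.91],
    "money": [-0.62, 0.14, 0.07],
}

# The lookup ignores context entirely: "bank" gets the same vector in every sentence,
# which is exactly the limitation BERT's contextual representations remove.
print(static_embeddings["bank"])
```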
Masked language modeling
• Word embedding models require large data sets of text to train. While they are adept at many general NLP tasks, they struggle with the context-heavy, predictive nature of question answering, because every word is fixed to a single vector, and therefore a single meaning.
• BERT uses a masked language modeling (MLM) method to keep the word in focus from seeing itself, that is, from having a fixed meaning independent of its context. BERT is forced to identify the masked word based on context alone. In BERT, words are defined by their surroundings, not by a predetermined identity. How?
Masked language modeling
• Imagine you're playing a guessing game where you have to figure out a missing word in a sentence.
BERT, which is a type of language model, plays a similar game. But instead of just guessing, it learns
to predict the missing word by looking at the words around it in a sentence.

• The trick here is that BERT doesn't know the exact word that's missing. It's like trying to solve a
puzzle without knowing all the pieces. So, BERT has to pay close attention to the context or the
other words in the sentence to make an educated guess about what the missing word could be.

• Because of this, BERT doesn't have a fixed idea of what each word means on its own. Instead, it
learns the meaning of words based on how they're used in different sentences. This way, each word
gets its meaning from the words around it, not from some pre-set definition. This helps BERT
understand language in a more flexible and context-dependent way.
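A minimal sketch of this guessing game, assuming the Hugging Face transformers library is installed; the fill-mask pipeline call and the example sentence are assumptions of this sketch, not something named in the slides.

```python
# Sketch: BERT's masked-word "guessing game" via the Hugging Face fill-mask pipeline.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT never sees the hidden word; it must infer it from the surrounding context alone.
for prediction in fill_mask("She went to the [MASK] to deposit money."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```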
Self-attention mechanisms
• BERT also relies on a self-attention mechanism that captures and understands relationships among
words in a sentence. The bidirectional transformers at the center of BERT's design make this
possible. This is significant because often, a word may change meaning as a sentence develops.
Each word added augments the overall meaning of the word the NLP algorithm is focusing on. The
more words that are present in each sentence or phrase, the more ambiguous the word in focus
becomes. BERT accounts for the augmented meaning by reading bidirectionally, accounting for the
effect of all other words in a sentence on the focus word and eliminating the left-to-right
momentum that biases words towards a certain meaning as a sentence progresses.
Self-attention mechanisms
• Think of BERT like a detective trying to understand a story. It uses a special tool called self-attention
to figure out how all the words in a sentence relate to each other. This helps BERT understand how
the meaning of a word might change as the sentence goes on.

• The cool thing about BERT is that it doesn't just look at words one after another. It looks at all the
words in the sentence at the same time, kind of like how you might scan a whole page of a book.
This helps it understand the connections between words better.

• For example, if you have a sentence like "She went to the bank to deposit money," the word "bank"
could mean a riverbank or a place where you put money. BERT looks at all the words around "bank"
to figure out which meaning makes sense.

• By reading both forwards and backwards in the sentence, BERT can catch these changes in meaning
as the sentence unfolds. This helps it avoid getting stuck on just one meaning of a word and makes
it better at understanding the whole story.
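For readers who want to see the mechanism itself, below is a small sketch of scaled dot-product self-attention in PyTorch. The dimensions and random weights are toy values; real BERT uses many layers, multiple attention heads and learned 768-dimensional representations.

```python
# Sketch: scaled dot-product self-attention over a toy "sentence" of 5 token vectors.
import torch
import torch.nn.functional as F

seq_len, d_model = 5, 8
x = torch.randn(seq_len, d_model)            # stand-in for token embeddings

# Learned projections turn the same tokens into queries, keys and values ("self"-attention).
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token attends to every other token at once -- no left-to-right ordering.
scores = Q @ K.T / (d_model ** 0.5)          # similarity of every token pair
weights = F.softmax(scores, dim=-1)          # each row sums to 1: how much a token "looks at" the others
contextual = weights @ V                     # each output mixes information from the whole sentence

print(weights.shape, contextual.shape)       # (5, 5) attention map, (5, 8) contextual vectors
```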
Next sentence prediction
• NSP is a training technique that teaches BERT to predict whether a certain sentence follows a
previous sentence to test its knowledge of relationships between sentences.
• Specifically, BERT is given both sentence pairs that are correctly paired and pairs that are wrongly
paired so it gets better at understanding the difference.
• Over time, BERT gets better at predicting next sentences accurately.
Next sentence prediction
• NSP involves giving BERT two sentences, sentence 1 and sentence 2. Then, BERT is asked the question: "HEY BERT, DOES SENTENCE 2 COME AFTER SENTENCE 1?" --- and BERT replies with isNextSentence or NotNextSentence.

Consider the following three sentences below:


1. Tony drove home after playing football in front of his friend’s house for three hours.
2. In the Milky Way galaxy, there are eight planets, and Earth is neither the smallest nor the largest.
3. Once home, Tony ate the remaining food he had left in the fridge and fell asleep on the floor.

• Which of the sentences would you say followed the other logically? 2 after 1? Probably not. These
are the questions that BERT is supposed to answer.
• Sentence 3 follows sentence 1 because of the contextual follow-up between the two. An easy giveaway is that both sentences mention "Tony".
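A sketch of this exchange using the Hugging Face BertForNextSentencePrediction head, assuming transformers and PyTorch are installed; it follows the library's convention that class 0 means "is the next sentence".

```python
# Sketch: asking BERT the NSP question for the slide's sentence pairs.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_1 = "Tony drove home after playing football in front of his friend's house for three hours."
sentence_2 = "In the Milky Way galaxy, there are eight planets, and Earth is neither the smallest nor the largest."
sentence_3 = "Once home, Tony ate the remaining food he had left in the fridge and fell asleep on the floor."

for follow_up in (sentence_2, sentence_3):
    inputs = tokenizer(sentence_1, follow_up, return_tensors="pt")
    logits = model(**inputs).logits
    is_next = logits.argmax(dim=-1).item() == 0   # class 0 = isNextSentence in this convention
    print("isNextSentence" if is_next else "NotNextSentence", "-", follow_up[:40] + "...")
```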
What is BERT used for?
Sequence-to-sequence language generation tasks such as:

• Question answering (see the sketch below).
• Abstractive summarization.
• Sentence prediction.
• Conversational response generation.

NLU tasks such as:

• Polysemy and coreference resolution.


• Word sense disambiguation.
• Sentiment classification.
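As an illustration of the question-answering use case flagged above, here is a sketch using the Hugging Face question-answering pipeline with a published BERT checkpoint fine-tuned on SQuAD; the checkpoint name and the passage are assumptions of this example.

```python
# Sketch: extractive question answering with a BERT model fine-tuned on SQuAD.
from transformers import pipeline

qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

context = ("BERT stands for Bidirectional Encoder Representations from Transformers. "
           "Google released it in 2018 after pretraining it on large amounts of English text.")
result = qa(question="What does BERT stand for?", context=context)
print(result["answer"], f"(score={result['score']:.2f})")
```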
Polysemy and coreference resolution
Word sense disambiguation
Clearly the word bank in sentence S1 refers to a
sloping land near a water body and bank in S2
refers to a financial institution. This is an example
of lexical ambiguity that arises in linguistics due to
different interpretations of meanings of a word.
While this task of disambiguation of a polysemous
word seems pretty obvious for humans, it turns out
that it is not so for machines and algorithms. In
NLP, we formally call this a problem of Word Sense
Disambiguation (WSD) and BERT addresses this
issues well.
BERT vs. generative pre-trained transformers (GPT)
• While BERT and GPT models are among the best language models, they exist for different reasons. The initial GPT-3 model and OpenAI's subsequent, more advanced GPT models are also language models trained on massive data sets. While they share this in common with BERT, BERT differs in multiple ways.
BERT
• Google developed BERT to serve as a bidirectional transformer model that examines words within
text by considering both left-to-right and right-to-left contexts. It helps computer systems
understand text, as opposed to generating text, which is what GPT models are built to do. BERT excels at NLU tasks such as sentiment analysis, which makes it well suited to applications like Google Search and customer-feedback analysis.
GPT
• GPT models differ from BERT in both their objectives and their use cases. GPT models are forms of
generative AI that generate original text and other forms of content. They're also well-suited for
summarizing long pieces of text and text that's hard to interpret.
Thank You!!!
