Introduction To Natural Language Processing - GeeksforGeeks
Introduction To Natural Language Processing - GeeksforGeeks
Machine Learning Tutorial Data Analysis Tutorial Python – Data visualization tutorial NumPy Pandas
The essence of Natural Language Processing lies in making computers understand the
natural language. That’s not an easy task though. Computers can understand the
structured form of data like spreadsheets and the tables in the database, but human
languages, texts, and voices form an unstructured category of data, and it gets difficult
for the computer to understand it, and there arises the need for Natural Language
Processing. There’s a lot of natural language data out there in various forms and it
would get very easy if computers can understand and process that data. We can train
the models in accordance with expected output in different ways. Humans have been
writing for thousands of years, there are a lot of literature pieces available, and it
would be great if we make computers understand that. But the task is never going to
be easy. There are various challenges floating out there like understanding the correct
meaning of the sentence, correct Named-Entity Recognition(NER), correct prediction of
various parts of speech, coreference resolution(the most challenging thing in my
opinion). Computers can’t truly understand the human language. If we feed enough
data and train a model properly, it can distinguish and try categorizing various parts of
speech(noun, verb, adjective, supporter, etc…) based on previously fed data and
experiences. If it encounters a new word it tried making the nearest guess which can
be embarrassingly wrong few times. It’s very difficult for a computer to extract the
exact meaning from a sentence. For example – The boy radiated fire like vibes. The
boy had a very motivating personality or he actually radiated fire? As you see over
here, parsing English with a computer is going to be complicated. There are various
stages involved in training a model. Solving a complex problem in Machine Learning
means building a pipeline. In simple terms, it means breaking a complex problem into
a number of small problems, making models for each of them and then integrating
these models. A similar thing is done in NLP. We can break down the process of
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 1/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
understanding English for a model into a number of small pieces. It would be really
great if a computer could understand that San Pedro is an island in Belize district in
Central America with a population of 16, 444 and it is the second largest town in
Belize. But to make the computer understand this, we need to teach computer very
basic concepts of written language. So let’s start by creating an NLP pipeline. It has
various steps which will give us the desired output(maybe not in a few rare cases) at
the end.
Overall, NLP is a rapidly evolving field that is driving new advances in computer
science and artificial intelligence, and has the potential to transform the way we
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 2/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that deals
with the interaction between computers and human languages. NLP is used to
analyze, understand, and generate natural language text and speech. The goal of NLP
is to enable computers to understand and interpret human language in a way that is
similar to how humans process language.
NLP involves the use of several techniques, such as machine learning, deep learning,
and rule-based systems. Some popular tools and libraries used in NLP include NLTK
(Natural Language Toolkit), spaCy, and Gensim.
Overall, NLP is a rapidly growing field with many practical applications, and it has the
potential to revolutionize the way we interact with computers and machines using
natural language.
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 3/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Language translation: NLP techniques are used to translate text from one
language to another, which is useful for tasks such as global communication and e-
commerce.
Text summarization: NLP techniques are used to summarize long text documents
into shorter versions, which is useful for tasks such as news summarization and
document indexing.
Sentiment analysis: NLP techniques are used to determine the sentiment or
emotion expressed in text, which is useful for tasks such as customer feedback
analysis and social media monitoring.
Question answering: NLP techniques are used to answer questions asked in natural
language, which is useful for tasks such as chatbots and virtual assistants.
NLP is a rapidly growing field and it is being used in many industries such as
healthcare, education, e-commerce, and customer service. NLP is also used to
improve the performance of natural language-based systems like chatbot, virtual
assistants, recommendation systems, and more. With the advancement in NLP, it
has become possible for computers to understand and process human languages in
a way that can be used for various applications such as speech recognition,
language translation, question answering, and more.
Step #1: Sentence Segmentation Breaking the piece of text in various sentences.
Input : San Pedro is a town on the southern part of the island of Ambergris Caye
in the Belize District of the nation of Belize, in Central America. According to
2015 mid-year estimates, the town has a population of about 16, 444. It is the
second-largest town in the Belize District and largest in the Belize Rural South
constituency. Output : San Pedro is a town on the southern part of the island of
Ambergris Caye in the 2.Belize District of the nation of Belize, in Central
America. According to 2015 mid-year estimates, the town has a population of
about 16, 444. It is the second-largest town in the Belize District and largest in
the Belize Rural South constituency. For coding a sentence segmentation model,
we can consider splitting a sentence when it encounters any punctuation mark.
But modern NLP pipelines have techniques to split even if the document isn’t
formatted properly.
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 4/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Step #2: Word Tokenization Breaking the sentence into individual words called as
tokens. We can tokenize them whenever we encounter a space, we can train a
model in that way. Even punctuations are considered as individual tokens as they
have some meaning.
Input : San Pedro is a town on the southern part of the island of Ambergris Caye
in the Belize District of the nation of Belize, in Central America. According to
2015 mid-year estimates, the town has a population of about 16, 444. It is the
second-largest town in the Belize District and largest in the Belize Rural South
constituency. Output : ‘San Pedro’, ’ is’, ’a’, ’town’ and so.
Step #3: Predicting Parts of Speech for each token Predicting whether the word is
a noun, verb, adjective, adverb, pronoun, etc. This will help to understand what the
sentence is talking about. This can be achieved by feeding the tokens( and the
words around it) to a pre-trained part-of-speech classification model. This model
was fed a lot of English words with various parts of speech tagged to them so that
it classifies the similar words it encounters in future in various parts of speech.
Again, the models don’t really understand the ‘sense’ of the words, it just classifies
them on the basis of its previous experience. It’s pure statistics. The process will
look like this:
For example –
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 5/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Here, both Buffalo and Buffaloes mean the same. But, the computer can confuse it
as two different terms as it doesn’t know anything. So we have to teach the
computer that both terms mean the same. We have to tell a computer that both
sentences are talking about the same concept. So we need to find out the most
basic form or root form or lemma of the word and feed it to the model accordingly.
In a similar fashion, we can use it for verbs too. ‘Play’ and ‘Playing’ should be
considered as same.
Step #5: Identifying stop words There are various words in the English language
that are used very frequently like ‘a’, ‘and’, ‘the’ etc. These words make a lot of
noise while doing statistical analysis. We can take these words out. Some NLP
pipelines will categorize these words as stop words, they will be filtered out while
doing some statistical analysis. Definitely, they are needed to understand the
dependency between various tokens to get the exact sense of the sentence. The
list of stop words varies and depends on what kind of output are you expecting.
Step 6.1: Dependency Parsing This means finding out the relationship between the
words in the sentence and how they are related to each other. We create a parse
tree in dependency parsing, with root as the main verb in the sentence. If we talk
about the first sentence in our example, then ‘is’ is the main verb and it will be the
root of the parse tree. We can construct a parse tree of every sentence with one
root word(main verb) associated with it. We can also identify the kind of
relationship that exists between the two words. In our example, ‘San Pedro’ is the
subject and ‘island’ is the attribute. Thus, the relationship between ‘San Pedro’ and
‘is’, and ‘island’ and ‘is’ can be established. Just like we trained a Machine Learning
model to identify various parts of speech, we can train a model to identify the
dependency between words by feeding many words. It’s a complex task though. In
2016, Google released a new dependency parser Parsey McParseface which used a
deep learning approach.
Step 6.2: Finding Noun Phrases We can group the words that represent the same
idea. For example – It is the second-largest town in the Belize District and largest in
the Belize Rural South constituency. Here, tokens ‘second’, ‘largest’ and ‘town’ can
be grouped together as they together represent the same thing ‘Belize’. We can use
the output of dependency parsing to combine such words. Whether to do this step
or not completely depends on the end goal, but it’s always quick to do this if we
don’t want much information about which words are adjective, rather focus on
other important details.
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 6/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Step #7: Named Entity Recognition(NER) San Pedro is a town on the southern part
of the island of Ambergris Caye in the 2. Belize District of the nation of Belize, in
Central America. Here, the NER maps the words with the real world places. The
places that actually exist in the physical world. We can automatically extract the
real world places present in the document using NLP. If the above sentence is the
input, NER will map it like this way:
NER systems look for how a word is placed in a sentence and make use of other
statistical models to identify what kind of word actually it is. For example –
‘Washington’ can be a geographical location as well as the last name of any
person. A good NER system can identify this. Kinds of objects that a typical NER
system can tag:
People’s names.
Company names.
Geographical locations
Product names.
Date and time.
Amount of money.
Events.
Step #8: Coreference Resolution: San Pedro is a town on the southern part of the
island of Ambergris Caye in the Belize District of the nation of Belize, in Central
America. According to 2015 mid-year estimates, the town has a population of
about 16, 444. It is the second-largest town in the Belize District and largest in the
Belize Rural South constituency. Here, we know that ‘it’ in the sentence 6 stands for
San Pedro, but for a computer, it isn’t possible to understand that both the tokens
are same because it treats both the sentences as two different things while it’s
processing them. Pronouns are used with a high frequency in English literature and
it becomes difficult for a computer to understand that both things are same.
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 7/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
ADVANTAGES OR DISADVANTAGES:
Important points:
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 9/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
2. Feature Extraction: Feature extraction is the process of representing the text data
as a set of features that can be used in machine learning models.
3. Word Embeddings: Word embeddings are a type of feature representation that
captures the semantic meaning of words in a high-dimensional space.
4. Neural Networks: Deep learning models, such as neural networks, have shown
promising results in NLP tasks, such as language modeling, sentiment analysis, and
machine translation.
5. Evaluation Metrics: It is important to use appropriate evaluation metrics for NLP
tasks, such as accuracy, precision, recall, F1 score, and perplexity.
Here are some important points to keep in mind when it comes to Natural
Language Processing:
1. NLP is a subfield of computer science and artificial intelligence that deals with the
interaction between computers and human languages.
2. The primary goal of NLP is to enable computers to understand, interpret, and
generate natural language, the way humans do.
3. NLP involves a variety of techniques, including computational linguistics, machine
learning, and statistical modeling.
4. NLP is used in a wide range of industries, including finance, healthcare, education,
and entertainment.
5. Some of the main applications of NLP include language translation, speech
recognition, sentiment analysis, text classification, and information retrieval.
6. NLP is a rapidly evolving field that is driving new advances in computer science and
artificial intelligence.
7. NLP has the potential to transform the way we interact with technology in our daily
lives.
Similar Reads
Natural Language Processing - ML | Natural Language
Overview Processing using Deep
Learning
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 10/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Related Tutorials
Computer Vision Tutorial Pandas AI: The Generative AI
Python Library
Previous Next
Article Contributed By :
Jaydeep1998
J Jaydeep1998
Follow
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 11/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Company Explore
About Us Job-A-Thon Hiring Challenge
Legal Hack-A-Thon
Careers GfG Weekly Contest
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 12/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 13/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
Commerce UPSC
Accountancy Polity Notes
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 14/15
8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks
https://www.geeksforgeeks.org/introduction-to-natural-language-processing/ 15/15