
Introduction To Natural Language Processing - GeeksforGeeks

Natural Language Processing (NLP) is a field of computer science that deals with interactions between computers and humans using natural language. NLP involves analyzing, understanding, and generating text and speech. Common NLP tasks include text classification, named entity recognition, and sentiment analysis.

Uploaded by

RAPTER GAMING

8/30/23, 10:37 AM Introduction to Natural Language Processing - GeeksforGeeks


Introduction to Natural Language Processing


Jaydeep1998


The essence of Natural Language Processing lies in making computers understand natural language. That is not an easy task. Computers handle structured data such as spreadsheets and database tables well, but human language, whether text or speech, is unstructured data that computers find difficult to interpret. That is where the need for Natural Language Processing arises. There is a lot of natural language data out there in various forms, and it would be very useful if computers could understand and process it. Humans have been writing for thousands of years, so a great deal of literature is available, and it would be great if we could make computers understand it. We can train models in different ways, depending on the output we expect.

But the task is never going to be easy. The challenges include understanding the correct meaning of a sentence, correct Named Entity Recognition (NER), correct prediction of parts of speech, and coreference resolution (the most challenging one, in my opinion). Computers can’t truly understand human language. If we feed enough data and train a model properly, it can distinguish and categorize the various parts of speech (noun, verb, adjective, adverb, etc.) based on the data it has previously seen. If it encounters a new word, it makes the nearest guess, which can sometimes be embarrassingly wrong. It is very difficult for a computer to extract the exact meaning of a sentence. For example: “The boy radiated fire-like vibes.” Did the boy have a very motivating personality, or did he actually radiate fire? As you can see, parsing English with a computer is complicated.

There are various stages involved in training a model. Solving a complex problem in Machine Learning means building a pipeline. In simple terms, it means breaking a complex problem into a number of small problems, building a model for each of them, and then integrating these models. The same is done in NLP: we can break down the process of understanding English into a number of small pieces. It would be really great if a computer could understand that San Pedro is a town in the Belize District in Central America with a population of about 16,444, and that it is the second-largest town in the Belize District. But to make the computer understand this, we need to teach it the very basic concepts of written language. So let’s start by creating an NLP pipeline. It has several steps that give us the desired output at the end (though perhaps not in a few rare cases).

https://www.geeksforgeeks.org/introduction-to-natural-language-processing/
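The pipeline idea described above can be sketched in a few lines of Python. This is only an illustration: `run_pipeline` and the three stand-in stages are hypothetical names chosen for this example, and real NLP stages would be trained models rather than simple string functions.

```python
from functools import reduce

def run_pipeline(data, stages):
    """Pass data through each stage in order; each stage's output feeds the next."""
    return reduce(lambda value, stage: stage(value), stages, data)

# Three deliberately trivial stand-in stages (hypothetical, for illustration only).
stages = [
    str.strip,   # remove surrounding whitespace
    str.lower,   # normalize case
    str.split,   # split on whitespace into rough tokens
]

result = run_pipeline("  San Pedro is a town  ", stages)
print(result)
```

Because every stage is just a function from input to output, a stage can be swapped or added without touching the rest of the pipeline.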

Natural Language Processing (NLP) is a subfield of computer science and artificial intelligence that deals with the interaction between computers and human languages. The primary goal of NLP is to enable computers to understand, interpret, and generate natural language, the way humans do.

NLP involves a variety of techniques, including computational linguistics, machine learning, and statistical modeling. These techniques are used to analyze, understand, and manipulate human language data, including text, speech, and other forms of communication.

Some of the main applications of NLP include language translation, speech recognition, sentiment analysis, text classification, and information retrieval. NLP is used in a wide range of industries, including finance, healthcare, education, and entertainment, to name a few.

Overall, NLP is a rapidly evolving field that is driving new advances in computer science and artificial intelligence, and has the potential to transform the way we interact with technology in our daily lives.

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that deals
with the interaction between computers and human languages. NLP is used to
analyze, understand, and generate natural language text and speech. The goal of NLP
is to enable computers to understand and interpret human language in a way that is
similar to how humans process language.

Natural Language Processing (NLP) is a field of computer science and artificial intelligence that focuses on the interaction between computers and humans using natural language. It involves analyzing, understanding, and generating human language data, such as text and speech.

NLP has a wide range of applications, including sentiment analysis, machine translation, text summarization, chatbots, and more. Some common tasks in NLP include:

1. Text Classification: Classifying text into different categories based on its content, as in spam filtering, sentiment analysis, and topic categorization.
2. Named Entity Recognition (NER): Identifying and categorizing named entities in text, such as people, organizations, and locations.
3. Part-of-Speech (POS) Tagging: Assigning a part of speech to each word in a sentence, such as noun, verb, adjective, or adverb.
4. Sentiment Analysis: Determining whether the sentiment of a piece of text is positive, negative, or neutral.
5. Machine Translation: Translating text from one language to another.

NLP involves the use of several techniques, such as machine learning, deep learning,
and rule-based systems. Some popular tools and libraries used in NLP include NLTK
(Natural Language Toolkit), spaCy, and Gensim.

Overall, NLP is a rapidly growing field with many practical applications, and it has the
potential to revolutionize the way we interact with computers and machines using
natural language.

NLP techniques are used in a wide range of applications, including:

Speech recognition and transcription: NLP techniques are used to convert speech to text, which is useful for tasks such as dictation and voice-controlled assistants.
Language translation: NLP techniques are used to translate text from one language to another, which is useful for tasks such as global communication and e-commerce.
Text summarization: NLP techniques are used to summarize long text documents into shorter versions, which is useful for tasks such as news summarization and document indexing.
Sentiment analysis: NLP techniques are used to determine the sentiment or emotion expressed in text, which is useful for tasks such as customer feedback analysis and social media monitoring.
Question answering: NLP techniques are used to answer questions asked in natural language, which is useful for tasks such as chatbots and virtual assistants.

NLP is a rapidly growing field and it is being used in many industries such as healthcare, education, e-commerce, and customer service. NLP is also used to improve the performance of natural language-based systems like chatbots, virtual assistants, recommendation systems, and more. With the advancement in NLP, it has become possible for computers to understand and process human languages in a way that can be used for various applications such as speech recognition, language translation, question answering, and more.

Step #1: Sentence Segmentation
Breaking the piece of text into individual sentences.

Input : San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America. According to 2015 mid-year estimates, the town has a population of about 16,444. It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.

Output :
1. San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America.
2. According to 2015 mid-year estimates, the town has a population of about 16,444.
3. It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.

For coding a sentence segmentation model, we can consider splitting the text whenever we encounter sentence-ending punctuation such as a full stop. But modern NLP pipelines have techniques to split sentences even if the document isn’t formatted properly.
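A minimal sketch of the punctuation-based approach, using only Python's standard `re` module. The function name `split_sentences` is an assumption made for this example, and the rule (split after `.`, `!` or `?` when followed by whitespace and a capital letter) is a simplification that would wrongly split abbreviations such as "Dr. Smith".

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and a capital letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]

text = (
    "San Pedro is a town on the southern part of the island of Ambergris Caye "
    "in the Belize District of the nation of Belize, in Central America. "
    "According to 2015 mid-year estimates, the town has a population of about 16,444. "
    "It is the second-largest town in the Belize District and largest in the "
    "Belize Rural South constituency."
)

sentences = split_sentences(text)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```

Trained segmenters in libraries such as NLTK or spaCy handle abbreviations and badly formatted text far better than a single regular expression.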


Step #2: Word Tokenization
Breaking the sentence into individual words, called tokens. We can split the sentence into tokens whenever we encounter a space, and we can train a model to do so. Punctuation marks are also treated as individual tokens, as they carry some meaning.

Input : San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America. According to 2015 mid-year estimates, the town has a population of about 16,444. It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.

Output : ‘San’, ‘Pedro’, ‘is’, ‘a’, ‘town’, and so on.
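As a rough illustration of this step, the sketch below tokenizes with a single regular expression instead of a trained model. The `tokenize` name and the pattern are assumptions for this example: words (including hyphenated ones) become tokens, and each punctuation mark becomes a token of its own.

```python
import re

def tokenize(sentence):
    """Return words (hyphens and apostrophes inside a word are kept) and punctuation marks."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)

tokens = tokenize("It is the second-largest town in the Belize District.")
print(tokens)
```

Note that this simple pattern would break a number such as 16,444 into three tokens, which is one reason real tokenizers carry many special cases.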

Step #3: Predicting Parts of Speech for each token
Predicting whether a word is a noun, verb, adjective, adverb, pronoun, etc. This helps us understand what the sentence is talking about. It can be achieved by feeding each token (and the words around it) to a pre-trained part-of-speech classification model. This model was trained on a large number of English words tagged with their parts of speech, so that it can classify similar words it encounters in the future. Again, the model doesn’t really understand the ‘sense’ of the words; it just classifies them on the basis of its previous experience. It’s pure statistics. The process will look like this:

Input : tokens, passed to the part-of-speech classification model

Output : Town - common noun
is - verb
the - determiner

And similarly, it will classify the other tokens.
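To make the "pure statistics" point concrete, here is a toy most-frequent-tag tagger. Everything in it is an assumption for illustration: the tiny hand-labelled corpus is made up, the tag names are arbitrary, and unlike the model described above it ignores the surrounding words entirely.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled corpus, made up for illustration; real taggers train on large treebanks.
tagged_corpus = [
    ("the", "DET"), ("town", "NOUN"), ("is", "VERB"), ("a", "DET"),
    ("island", "NOUN"), ("is", "VERB"), ("the", "DET"), ("largest", "ADJ"),
    ("town", "NOUN"), ("has", "VERB"), ("a", "DET"), ("population", "NOUN"),
]

def train_unigram_tagger(pairs):
    """Remember, for every word, the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}

def tag(tokens, model, default="NOUN"):
    """Tag each token; unseen words fall back to a default guess."""
    return [(token, model.get(token.lower(), default)) for token in tokens]

model = train_unigram_tagger(tagged_corpus)
tagged = tag(["The", "town", "is", "quiet"], model)
print(tagged)
```

The unseen word "quiet" receives the default guess NOUN, which is wrong here; this is the kind of nearest-guess error a purely statistical model makes on words it has never seen.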


Step #4: Lemmatization
Feeding the model with the root word.

For example –

There’s a Buffalo grazing in the field.
There are Buffaloes grazing in the field.

Here, both Buffalo and Buffaloes refer to the same thing. But the computer can treat them as two different terms, as it doesn’t know anything about the language. So we have to teach the computer that both terms mean the same, and that both sentences are talking about the same concept. To do that, we find the most basic form, or root form, or lemma of each word and feed that to the model. In a similar fashion, we can use it for verbs too: ‘Play’ and ‘Playing’ should be treated as the same word.
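A deliberately crude sketch of the idea, assuming a small lookup table for irregular forms plus a few suffix rules. The table entries, the rules, and the `lemmatize` name are all illustrative; the suffix part is closer to stemming than to true lemmatization, which normally relies on a dictionary and the word's part of speech.

```python
# Irregular forms go in a lookup table; the entries are illustrative, not exhaustive.
IRREGULAR = {"buffaloes": "buffalo", "is": "be", "are": "be", "was": "be"}

# (suffix, replacement) rules tried in order; deliberately crude.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("s", "")]

def lemmatize(word):
    """Return a rough base form of the word in lower case."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    for suffix, replacement in SUFFIX_RULES:
        # Only strip a suffix when at least three characters of stem remain.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: len(lower) - len(suffix)] + replacement
    return lower

for word in ["Buffalo", "Buffaloes", "Play", "Playing"]:
    print(word, "->", lemmatize(word))
```

Rules this simple fail quickly (for instance, "grazing" becomes "graz"), which is why practical lemmatizers use vocabulary lookups rather than suffix stripping alone.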
Step #5: Identifying stop words
There are various words in the English language that are used very frequently, like ‘a’, ‘and’, ‘the’, etc. These words create a lot of noise in statistical analysis, so we can take them out. Some NLP pipelines categorize these words as stop words and filter them out before doing statistical analysis. They are, however, still needed to understand the dependency between tokens and get the exact sense of a sentence. The list of stop words varies and depends on what kind of output you are expecting.
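Filtering stop words can be sketched as a simple set lookup. The stop-word set below is a small illustrative sample chosen for this example, not a standard list; libraries such as NLTK and spaCy ship much longer ones.

```python
# A small illustrative stop-word set; real lists are longer and task-dependent.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "on"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]

tokens = ["San", "Pedro", "is", "a", "town", "on", "the", "island", "of", "Ambergris", "Caye"]
content_tokens = remove_stop_words(tokens)
print(content_tokens)
```

Passing the set as a parameter reflects the point made above: the right list depends on the output you expect, so it should be easy to swap.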
Step #6.1: Dependency Parsing
This means finding out how the words in a sentence are related to each other. In dependency parsing we create a parse tree with the main verb of the sentence as its root. If we take the first sentence in our example, ‘is’ is the main verb and it will be the root of the parse tree. We can construct a parse tree for every sentence, with one root word (the main verb) associated with it. We can also identify the kind of relationship that exists between two words. In our example, ‘San Pedro’ is the subject and ‘town’ is the attribute. Thus, the relationships between ‘San Pedro’ and ‘is’, and between ‘town’ and ‘is’, can be established. Just like we trained a Machine Learning model to identify parts of speech, we can train a model to identify the dependencies between words by feeding it many example sentences. It’s a complex task though. In 2016, Google released a new dependency parser, Parsey McParseface, which used a deep learning approach.
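The sketch below shows one way a dependency parse could be stored and queried once a parser has produced it. The parse itself is written by hand for a shortened version of the example sentence, so it is an assumption rather than real parser output, and the `Token` class and relation labels are hypothetical names for this example.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    head: int      # index of the governing token; the root points to itself
    relation: str  # label of the link between this token and its head

# Hand-annotated parse of "San Pedro is a town" (an assumption, not parser output).
# "San Pedro" is kept as a single token to keep the example short.
sentence = [
    Token("San Pedro", 1, "subject"),
    Token("is", 1, "root"),
    Token("a", 3, "determiner"),
    Token("town", 1, "attribute"),
]

def find_root(tokens):
    """The root is the only token whose head index is its own position."""
    return next(token for index, token in enumerate(tokens) if token.head == index)

def children(tokens, head_index):
    """All tokens attached directly to the token at head_index."""
    return [t for i, t in enumerate(tokens) if t.head == head_index and i != head_index]

root = find_root(sentence)
for child in children(sentence, sentence.index(root)):
    print(f"{child.text} --{child.relation}--> {root.text}")
```

Storing one head index per token is enough to recover the whole tree, since every word except the root has exactly one head.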
Step #6.2: Finding Noun Phrases
We can group the words that represent the same idea. For example – It is the second-largest town in the Belize District and largest in the Belize Rural South constituency. Here, the tokens ‘second’, ‘largest’ and ‘town’ can be grouped together, as together they describe one thing, the town of San Pedro. We can use the output of dependency parsing to combine such words. Whether to do this step or not depends entirely on the end goal, but it is a quick way to simplify the sentence if we don’t need to know which words are adjectives and would rather focus on other important details.
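As a simplified stand-in for the dependency-based grouping described above, the following sketch groups words using part-of-speech tags instead: it collects runs of determiners, adjectives and nouns. The tags are supplied by hand, and the tag names and the `noun_phrases` function are assumptions made for this example.

```python
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPN"}
HEAD_TAGS = {"NOUN", "PROPN"}

def noun_phrases(tagged_tokens):
    """Group maximal runs of DET/ADJ/NOUN/PROPN tokens that contain at least one noun."""
    phrases, current = [], []
    for text, tag in tagged_tokens + [("", "END")]:  # sentinel flushes the last run
        if tag in NP_TAGS:
            current.append((text, tag))
            continue
        if any(t in HEAD_TAGS for _, t in current):
            phrases.append(" ".join(word for word, _ in current))
        current = []
    return phrases

# Hand-tagged tokens (an assumption; normally these come from the tagging step).
tagged = [
    ("It", "PRON"), ("is", "VERB"), ("the", "DET"), ("second-largest", "ADJ"),
    ("town", "NOUN"), ("in", "ADP"), ("the", "DET"), ("Belize", "PROPN"),
    ("District", "PROPN"),
]

phrases = noun_phrases(tagged)
print(phrases)
```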


Step #7: Named Entity Recognition (NER)
San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America.

Here, NER maps words to real-world places, that is, places that actually exist in the physical world. Using NLP, we can automatically extract the real-world places mentioned in a document. If the above sentence is the input, NER will map it like this:

San Pedro - Geographic Entity
Ambergris Caye - Geographic Entity
Belize - Geographic Entity
Central America - Geographic Entity

NER systems look at how a word is placed in a sentence and make use of other statistical models to identify what kind of word it actually is. For example, ‘Washington’ can be a geographical location as well as a person’s last name. A good NER system can tell these apart. Kinds of objects that a typical NER system can tag:

People’s names.
Company names.
Geographical locations.
Product names.
Date and time.
Amount of money.
Events.
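The simplest possible entity tagger is a lookup table of known names, often called a gazetteer. The sketch below uses that approach, which is not the context-based statistical method described above: the `GAZETTEER` entries and function name are hypothetical, and a lookup like this cannot tell whether ‘Washington’ is a place or a person.

```python
# Hypothetical gazetteer: a hand-made lookup table of known names (lower-cased word tuples).
GAZETTEER = {
    ("san", "pedro"): "Geographic Entity",
    ("ambergris", "caye"): "Geographic Entity",
    ("belize",): "Geographic Entity",
    ("central", "america"): "Geographic Entity",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def find_entities(tokens):
    """Scan left to right, preferring the longest gazetteer match at each position."""
    entities, i = [], 0
    while i < len(tokens):
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + size])
            if key in GAZETTEER:
                entities.append((" ".join(tokens[i:i + size]), GAZETTEER[key]))
                i += size
                break
        else:
            i += 1  # no entity starts at this token
    return entities

tokens = "San Pedro is a town on the island of Ambergris Caye in Belize , in Central America .".split()
entities = find_entities(tokens)
for name, label in entities:
    print(name, "-", label)
```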

Step #8: Coreference Resolution
San Pedro is a town on the southern part of the island of Ambergris Caye in the Belize District of the nation of Belize, in Central America. According to 2015 mid-year estimates, the town has a population of about 16,444. It is the second-largest town in the Belize District and largest in the Belize Rural South constituency.

Here, we know that ‘It’ in the third sentence stands for San Pedro, but a computer cannot easily tell that the two tokens refer to the same thing, because it treats the sentences as separate while processing them. Pronouns are used very frequently in English text, and it is difficult for a computer to work out that a pronoun and the name it replaces refer to the same thing.


ADVANTAGES AND DISADVANTAGES:

Advantages of Natural Language Processing:

1. Improves human-computer interaction: NLP enables computers to understand and
respond to human languages, which improves the overall user experience and
makes it easier for people to interact with computers.
2. Automates repetitive tasks: NLP techniques can be used to automate repetitive
tasks, such as text summarization, sentiment analysis, and language translation,
which can save time and increase efficiency.
3. Enables new applications: NLP enables the development of new applications, such
as virtual assistants, chatbots, and question answering systems, that can improve
customer service, provide information, and more.
4. Improves decision-making: NLP techniques can be used to extract insights from
large amounts of unstructured data, such as social media posts and customer
feedback, which can improve decision-making in various industries.
5. Improves accessibility: NLP can be used to make technology more accessible, such
as by providing text-to-speech and speech-to-text capabilities for people with
disabilities.
6. Facilitates multilingual communication: NLP techniques can be used to translate
and analyze text in different languages, which can facilitate communication
between people who speak different languages.
7. Improves information retrieval: NLP can be used to extract information from large
amounts of data, such as search engine results, to improve information retrieval
and provide more relevant results.
8. Enables sentiment analysis: NLP techniques can be used to analyze the sentiment
of text, such as social media posts and customer reviews, which can help
businesses understand how customers feel about their products and services.
9. Improves content creation: NLP can be used to generate content, such as
automated article writing, which can save time and resources for businesses and
content creators.
10. Supports data analytics: NLP can be used to extract insights from text data, which
can support data analytics and improve decision-making in various industries.
11. Enhances natural language understanding: NLP research and development can
lead to improved natural language understanding, which can benefit various
industries and applications.

Disadvantages of Natural Language Processing:

1. Limited understanding of context: NLP systems have a limited understanding of
context, which can lead to misinterpretations or errors in the output.
2. Requires large amounts of data: NLP systems require large amounts of data to train
and improve their performance, which can be expensive and time-consuming to
collect.
3. Limited ability to understand idioms and sarcasm: NLP systems have a limited
ability to understand idioms, sarcasm, and other forms of figurative language,
which can lead to misinterpretations or errors in the output.
4. Limited ability to understand emotions: NLP systems have a limited ability to
understand emotions and tone of voice, which can lead to misinterpretations or
errors in the output.
5. Difficulty with multi-lingual processing: NLP systems may struggle to accurately
process multiple languages, especially if they are vastly different in grammar or
structure.
6. Dependency on language resources: NLP systems heavily rely on language
resources, such as dictionaries and corpora, which may not always be available or
accurate for certain languages or domains.
7. Difficulty with rare or ambiguous words: NLP systems may struggle to accurately
process rare or ambiguous words, which can lead to errors in the output.
8. Lack of creativity: NLP systems are limited to processing and generating output
based on patterns and rules, and may lack the creativity and spontaneity of human
language use.
9. Ethical considerations: NLP systems may perpetuate biases and stereotypes, and
there are ethical concerns around the use of NLP in areas such as surveillance and
automated decision-making.

Additional important points:

1. Preprocessing: Before applying NLP techniques, it is essential to preprocess the
text data by cleaning, tokenizing, and normalizing it.

2. Feature Extraction: Feature extraction is the process of representing the text data
as a set of features that can be used in machine learning models.
3. Word Embeddings: Word embeddings are a type of feature representation that
captures the semantic meaning of words as dense vectors in a continuous vector space.
4. Neural Networks: Neural networks, and deep learning models in particular, have shown
promising results in NLP tasks such as language modeling, sentiment analysis, and
machine translation.
5. Evaluation Metrics: It is important to use appropriate evaluation metrics for NLP
tasks, such as accuracy, precision, recall, F1 score, and perplexity.
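
For a classification task, precision, recall and F1 score from the evaluation metrics point can be computed directly from counts of true positives, false positives and false negatives. The sketch below is a plain-Python illustration; the function name and the gold and predicted labels are made up for this example.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compute precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up sentiment labels for six reviews (illustrative data only).
gold      = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]

precision, recall, f1 = precision_recall_f1(gold, predicted, positive="pos")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so precision, recall and F1 all come out to 2/3.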

Here are some important points to keep in mind when it comes to Natural
Language Processing:

1. NLP is a subfield of computer science and artificial intelligence that deals with the
interaction between computers and human languages.
2. The primary goal of NLP is to enable computers to understand, interpret, and
generate natural language, the way humans do.
3. NLP involves a variety of techniques, including computational linguistics, machine
learning, and statistical modeling.
4. NLP is used in a wide range of industries, including finance, healthcare, education,
and entertainment.
5. Some of the main applications of NLP include language translation, speech
recognition, sentiment analysis, text classification, and information retrieval.
6. NLP is a rapidly evolving field that is driving new advances in computer science and
artificial intelligence.
7. NLP has the potential to transform the way we interact with technology in our daily
lives.

Last Updated : 21 Apr, 2023


Article Contributed By : Jaydeep1998

Improved By : avinashrayz28, avinashrat55252, snehalmahasagar, chinmaya121221

Article Tags : Machine Learning

Practice Tags : Machine Learning
