
Natural Language Processing

 Natural Language Processing (NLP) is a field that


combines computer science, linguistics, and machine
learning to study how computers and humans
communicate in natural language.
 The goal of NLP is for computers to be able to interpret
and generate human language. This not only improves
the efficiency of work done by humans but also makes interaction with machines more natural.
 NLP bridges the gap of interaction between humans and electronic devices.

Time line of NLP


1. 1940s - Work on Natural Language Processing began in the 1940s.
2. 1948 - In 1948, the first recognizable NLP application was introduced at Birkbeck
College, London.
3. 1950s - In the 1950s, there were conflicting views between linguistics and computer
science. Chomsky published his first book, Syntactic Structures, and claimed that language
is generative in nature.
4. 1960-1980 - Flavored with Artificial Intelligence (AI). Some of the key developments are:
 Augmented Transition Networks (ATN) - an extension of the finite state machine
formalism used to parse natural language.
 Case Grammar - expresses the relationships between nouns and verbs in languages such
as English by using prepositions.
o SHRDLU - allows users to communicate with the computer and move objects. It
can handle instructions such as "pick up the green ball" and also answer
questions like "What is inside the black box?"
o LUNAR - a classic example of a natural language database interface system
that used ATNs and Woods' Procedural Semantics.
5. 1980 - present: Until 1980, natural language processing systems were based on
complex sets of hand-written rules. After 1980, NLP introduced machine learning algorithms for
language processing.
Components of NLP
1. Natural Language Understanding (NLU)
 Natural Language Understanding (NLU) helps the
machine to understand and analyze human
language by extracting metadata from content
such as concepts, entities, keywords, emotion,
relations, and semantic roles.
 NLU is mainly used in business applications to
understand the customer's problem in both spoken
and written language.
 NLU involves the following tasks:
 Mapping the given input into a useful representation.
 Analyzing different aspects of the language.
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts the computerized data into
natural language representation. It mainly involves Text planning, Sentence planning, and Text
Realization.
NLU                                                  NLG
NLU is the process of reading and interpreting       NLG is the process of writing or generating
language.                                            language.
It produces non-linguistic outputs from natural      It constructs natural language outputs from
language inputs.                                     non-linguistic inputs.

Ambiguity in NLP
NLU is naturally harder than NLG. There is a lot of ambiguity while learning or trying to
interpret a language. NLP has the following types of ambiguities:
1. Lexical Ambiguity: The ambiguity of a single word is called lexical ambiguity. For example,
treating the word silver as a noun, an adjective, or a verb.
2. Syntactic Ambiguity: This kind of ambiguity occurs when a sentence is parsed in different
ways. For example, the sentence "The man saw the girl with the telescope". It is ambiguous
whether the man saw the girl carrying a telescope or he saw her through his telescope.
3. Semantic Ambiguity: This kind of ambiguity occurs when the meaning of the words
themselves can be misinterpreted. In other words, semantic ambiguity happens when a sentence
contains an ambiguous word or phrase. For example, the sentence "The car hit the pole while it
was moving" is having semantic ambiguity because the interpretations can be "The car, while
moving, hit the pole" and "The car hit the pole while the pole was moving.
4. Anaphoric Ambiguity: This kind of ambiguity arises due to the use of anaphora entities in
discourse. For example: "The horse ran up the hill. It was very steep. It soon got tired." Here, the
anaphoric reference of "it" in the two situations causes ambiguity.
5. Pragmatic ambiguity: This kind of ambiguity refers to the situation where the context of a
phrase gives it multiple interpretations. In simple words, we can say that pragmatic ambiguity arises
when the statement is not specific. For example, the sentence "I like you too" can have multiple
interpretations, such as "I like you (just as you like me)" and "I like you (just as someone else does)".
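As a rough illustration of lexical ambiguity, the following sketch (assuming NLTK is installed and its WordNet corpus has been downloaded) lists the different senses and parts of speech that WordNet records for the word "silver" from the example above:

# Lexical ambiguity sketch: one surface form, many recorded senses.
# Assumes NLTK is installed and nltk.download('wordnet') has been run.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("silver"):
    # Each synset is one candidate sense; pos() is 'n' (noun), 'a'/'s' (adjective) or 'v' (verb).
    print(synset.name(), synset.pos(), "-", synset.definition())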

Steps to build an NLP pipeline


Step1: Sentence Segmentation
Sentence segmentation is the first step in building the NLP pipeline. It breaks a paragraph into
separate sentences.
Example: Consider the following paragraph -
Independence Day is one of the important festivals for every Indian citizen. It is celebrated on
the 15th of August each year ever since India got independence from the British rule. The day
celebrates independence in the true sense.
Sentence segmentation produces the following result:
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the
British rule."
3. "This day celebrates independence in the true sense."
Step2: Word Tokenization:
Word Tokenizer is used to break the sentence into separate words or tokens.
Example:
JavaTrainSoft offers Corporate Training, Summer Training, Online Training, and Winter
Training.
Word Tokenizer generates the following result:
"JavaTrainSoft", "offers", "Corporate", "Training", "Summer", "Training", "Online", "Training",
"and", "Winter", "Training", "."
Step3: Stemming:
Stemming is used to normalize words into their base or root form. For example, celebrates,
celebrated, and celebrating all originate from the single root word "celebrate". The
big problem with stemming is that it sometimes produces a root word which may not have any
meaning.
For example, intelligence, intelligent, and intelligently all originate from the
single root word "intelligen". In English, the word "intelligen" does not have any meaning.
Step 4: Lemmatization:
Lemmatization is quite similar to stemming. It is used to group different inflected forms of
a word into a single item, called the lemma. The main difference between stemming and lemmatization is that
lemmatization produces a root word which has a meaning.
Example:
In lemmatization, the words intelligence, intelligent, and intelligently have the root word intelligent,
which has a meaning.
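A minimal lemmatization sketch with NLTK's WordNet lemmatizer (assuming the wordnet data is downloaded); unlike a stemmer, it returns dictionary words:

# Lemmatization sketch using NLTK (requires nltk.download('wordnet')).
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# The pos argument tells the lemmatizer how to treat the word: 'v' = verb, 'n' = noun, 'a' = adjective.
print(lemmatizer.lemmatize("celebrating", pos="v"))  # -> celebrate
print(lemmatizer.lemmatize("geese", pos="n"))        # -> goose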
Step 5: Identifying Stop Words:
In English, there are a lot of words that appear very frequently like "is", "and", "the", and "a".
NLP pipelines will flag these words as stop words. Stop words might be filtered out before doing
any statistical analysis.
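A minimal stop-word filtering sketch with NLTK (assuming the stopwords and punkt data have been downloaded):

# Stop-word removal sketch using NLTK's English stop-word list.
import nltk
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = nltk.word_tokenize("Independence Day is one of the important festivals for every Indian citizen.")
print([t for t in tokens if t.lower() not in stop_words])  # drops "is", "of", "the", "for", ...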
Step 6: Dependency Parsing:
Dependency Parsing is used to find how all the words in a sentence are related to each
other.
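A minimal dependency parsing sketch using spaCy, assuming the small English model en_core_web_sm has been installed (python -m spacy download en_core_web_sm):

# Dependency parsing sketch using spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The man saw the girl with the telescope.")
for token in doc:
    # token.dep_ is the dependency label; token.head is the word this token attaches to.
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")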
Step 7: POS tags:
POS stands for parts of speech, which include noun, verb, adverb, and adjective. It indicates
how a word functions, both in meaning and grammatically, within a sentence. A
word can have one or more parts of speech depending on the context in which it is used.
Example:
"Google" something on the Internet.
In the above example, Google is used as a verb, although it is a proper noun.
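A minimal POS tagging sketch with NLTK (assuming the punkt and averaged_perceptron_tagger data are downloaded); note that an off-the-shelf tagger may still label "Google" as a proper noun here, which illustrates how context-dependent this decision is:

# Part-of-speech tagging sketch using NLTK.
import nltk

tokens = nltk.word_tokenize("Google something on the Internet.")
print(nltk.pos_tag(tokens))  # each token is paired with a Penn Treebank tag such as NN, NNP or VB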
Step 8: Named Entity Recognition (NER)
Named Entity Recognition (NER) is the process of detecting named entities such as person
names, movie names, organization names, or locations.
Example:
Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.
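A minimal NER sketch using spaCy on the sentence above (assuming en_core_web_sm is installed; the exact labels depend on the model):

# Named entity recognition sketch using spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.")
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g. Steve Jobs -> PERSON, San Francisco -> GPE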
Step 9: Chunking
Chunking is used to collect individual pieces of information and group them into larger
units, such as phrases.
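A minimal noun-phrase chunking sketch using NLTK's regular-expression chunker (assuming the punkt and averaged_perceptron_tagger data are downloaded); the grammar below is an illustrative assumption, not the only way to chunk:

# Noun-phrase chunking sketch using NLTK's RegexpParser.
import nltk

tagged = nltk.pos_tag(nltk.word_tokenize("Steve Jobs introduced iPhone at the Macworld Conference."))
# Group an optional determiner, any adjectives, and one or more nouns into an NP chunk.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
print(chunker.parse(tagged))  # NP subtrees mark the chunks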

Advantages of NLP
1. Improved Human-Computer Interaction: A specific advantage of NLP is that it enables
computers to understand and process human language. This makes it easier for people to interact
with their computers and vice versa. NLP results in more efficient and effective human-computer
interaction and communication.
2. Time-Efficiency and Cost-Effectiveness: Language-based task automation is one of the
applications of NLP. Examples of these tasks include text and speech processing, morphological
and syntactic analyses, and lexical and relational semantics. This can result in time and cost
savings for individuals and organizations.
3. Data Generation and Content Creation: Another advantage of natural language processing is
that it aids in the generation of data and even in the creation of new and original content such as
texts and images based on text-based training datasets and using natural language as command
input.
4. Professional and Business Applications: NLP can also benefit individuals. It can improve
and optimize sales and after-sales services through chatbots. It can aid professionals with their
tasks, as demonstrated by generative AI products such as ChatGPT from OpenAI or writing
tools such as Grammarly.
5. Advances Artificial Intelligence: Remember that NLP is one of the main goals and fields of
artificial intelligence. Developments in natural language processing mark further developments
in AI. It is also important to note that advances in large language models are critical to advancing
artificial intelligence applications.
Disadvantages of NLP
1. Possible Issues with Context and Meanings: One of the more specific limitations of NLP is
its limited understanding of context and meanings. A particular model may not always
understand the nuances of human language. It may not be able to identify sarcasm and idioms.
This can lead to errors, inaccuracies, or irrelevance.
2. Biased Results from Biased Training Data: It is important to note that the quality of an NLP
model depends on its training data. A dataset containing biases or inaccuracies would result in
this particular model producing biased or inaccurate results. This can further result in
controversial outcomes such as prejudicial claims.
3. Issues with Rare or Out-of-Vocabulary Words: NLP models and their applications may
struggle to process certain words, such as jargon and slang, that are not included in their training
data. This can lead to unreliable outcomes for specific NLP tasks such as text classification and
named entity recognition.
4. Technical and Computational Requirements: Another disadvantage of natural language
processing is that its high-level applications depend on large language models which require high
computational power. This makes it difficult for an individual or small organization with limited
resources to deploy in-house NLP capabilities.
5. Possible Ethical Concerns and Legal Issues: Developing an NLP model requires using data.
Some of these data are obtained from the personal or private data of individuals and
organizations. The deployment of NLP applications raises concerns over data ownership, privacy
rights, and intellectual property infringement, among others.

Applications of NLP
1. Question Answering: NLP can be seen in action by using Google Search or Siri Services. A
major use of NLP is to make search engines understand the meaning of what we are asking and
generate natural language in return to give us the answers.
2. Spam Filters: One of the most irritating things about email is spam. Gmail uses natural
language processing (NLP) to discern which emails are legitimate and which are spam. These
spam filters look at the text in all the emails you receive and try to figure out what it means to
see if it's spam or not.
3. Algorithmic Trading: Algorithmic trading is used for predicting stock market conditions.
Using NLP, this technology examines news headlines about companies and stocks and attempts
to comprehend their meaning in order to determine if you should buy, sell, or hold certain stocks.
4. Summarizing Information: On the internet, there is a lot of information, and a lot of it comes
in the form of long documents or articles. NLP is used to decipher the meaning of the data and
then provides shorter summaries of the data so that humans can comprehend it more quickly.

Five Phases of NLP


The process of NLP can be divided into five distinct phases: Lexical Analysis, Syntactic
Analysis, Semantic Analysis, Discourse Integration, and Pragmatic Analysis.

Lexical and Morphological Analysis


 The lexical phase in Natural Language Processing (NLP) involves scanning text and breaking
it down into smaller units such as paragraphs, sentences, and words. This process, known
as tokenization, converts raw text into manageable units called tokens or lexemes.
Tokenization is essential for understanding and processing text at the word level.
 Morphological analysis is another critical phase in NLP, focusing on identifying morphemes,
the smallest units of a word that carry meaning and cannot be further divided. Understanding
morphemes is vital for grasping the structure of words and their relationships.
Steps to design a morphological parser:
Lexicon
Morphotactics
Orthographic Rules

Lexicon: Stores basic information about a word - whether the word is a stem or an affix; if a
stem, whether it is a verb stem or a noun stem; if an affix, whether it is a prefix, infix, or suffix.

Morphotactics: A set of rules that decides the ordering of morphemes - which morphemes can
appear before, after, or in between other morphemes within a word.

Orthographic rules: A set of rules used to decide spelling changes when morphemes are
combined. For example:
Lady + s -> Ladys (invalid form)
Lady + s -> Ladies (y and s change to ies to give a valid word - a change in spelling)
Knife + s -> Knives

Thus, morphology means the study of words and how they are formed.

Some words have their own meaning, for example: Camera, Board, Pen.
Some words, when divided into different parts, yield new words that each have their own
meaning, for example:

Some words do not have a meaning of their own, but when they are combined with other
words, they become meaningful, for example:
Thus, the detailed study of how words are formed is called Morphological Analysis.
Some more examples:
Input      Morphological parse       Explanation
Cats       Cat + N + PL              Root word (cat) + Noun + Plural
Geese      Goose + N + PL            Root word (goose) + Noun + Plural
Caught     Catch + V + PAST          Root word (catch) + Verb + Past tense
Merging    Merge + V + PRES_PART     Root word (merge) + Verb + Present participle
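A rough sketch of automated morphological analysis using spaCy (assuming en_core_web_sm is installed); the exact lemmas and feature strings it prints are model-dependent:

# Morphological analysis sketch using spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cats caught the geese.")
for token in doc:
    # token.lemma_ is the root form; token.morph lists features such as Number=Plur or Tense=Past.
    print(f"{token.text:10} lemma={token.lemma_:10} morph={token.morph}")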

Syntax Analysis or Parsing


Refer class notes
Semantic Analysis
1. Semantic analysis
Semantic analysis is the process of looking for meaning in a statement. It concentrates mostly on
the literal meaning of words, phrases, and sentences.
It also deals with putting words together to form sentences. It extracts the text's exact meaning
or dictionary definition.
The meaning of the text is examined. This is accomplished by mapping syntactic structures to
objects in the task domain.
For example: "The guava ate an apple." The sentence is syntactically valid, yet it is illogical
because guavas cannot eat.
2. Parts of Semantic Analysis:
Semantic Analysis of Natural Language can be classified into two broad parts:
* Lexical Semantic Analysis: Lexical semantic analysis involves understanding the meaning
of each word of the text individually. It basically refers to fetching the dictionary meaning that
a word in the text is intended to carry.
* Compositional Semantics Analysis: Although knowing the meaning of each word of the text
is essential, it is not sufficient to completely understand the meaning of the text. For example,
consider the following two sentences:
Sentence 1: Students love GeeksforGeeks.
Sentence 2: GeeksforGeeks loves Students.
Although both these sentences 1 and 2 use the same set of root words (student, love,
geeksforgeeks), they convey entirely different meanings. Hence, under Compositional
Semantics Analysis, we try to understand how combinations of individual words form the
meaning of the text.
3. Tasks involved in Semantic Analysis:
In order to understand the meaning of a sentence, the following are the major processes
involved in Semantic Analysis:
1. Word Sense Disambiguation: In Natural Language, the meaning of a word may vary as per
its usage in sentences and the context of the text. Word Sense Disambiguation involves
interpreting the meaning of a word based upon the context of its occurrence in a text.
For example, the word 'bark' may mean 'the sound made by a dog' or 'the outermost layer of a
tree'. Likewise, the word 'rock' may mean 'a stone' or 'a genre of music'. Hence, the accurate
meaning of the word is highly dependent upon its context and usage in the text. Thus, the
ability of a machine to overcome the ambiguity involved in identifying the meaning of a word
based on its usage and context is called Word Sense Disambiguation.
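A minimal word sense disambiguation sketch using NLTK's implementation of the Lesk algorithm (assuming the wordnet and punkt data are downloaded); Lesk is only a simple dictionary-overlap heuristic, so the sense it picks is not always the intuitive one:

# Word sense disambiguation sketch using NLTK's Lesk algorithm.
import nltk
from nltk.wsd import lesk

tokens = nltk.word_tokenize("The dog began to bark at the stranger.")
sense = lesk(tokens, "bark")  # picks the WordNet sense whose definition best overlaps the context
print(sense, "-", sense.definition() if sense else "no sense found")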
2. Relationship Extraction: Another important task involved in Semantic Analysis is
Relationship Extracting. It involves firstly identifying various entities present in the sentence
and then extracting the relationships between those entities.
For example, consider the following sentence:
"Semantic Analysis is a topic of NLP which is explained on the university blog."
The entities involved in this text are "Semantic Analysis", "NLP", and "the university blog",
connected by the relationships "is a topic of" and "is explained on".

4. Elements of Semantic Analysis:


Some of the critical elements of Semantic Analysis that must be scrutinized and taken into
account while processing Natural Language are:
Hyponymy: Hyponymy refers to a term that is an instance of a more generic term. It can be
understood using the class-object analogy. For example: 'Color' is a hypernym, while
'grey', 'blue', 'red', etc., are its hyponyms.
Homonymy: Homonymy refers to two or more lexical terms with the same spelling but
completely distinct meanings. For example: 'Rose' might mean 'the past form of rise' or 'a
flower' - same spelling but different meanings; hence, 'rose' is a homonym.
Synonymy: When two or more lexical terms that might be spelt distinctly have the same or
similar meaning, they are called synonyms. For example: (Job, Occupation), (Large, Big),
(Stop, Halt).
Antonymy: Antonymy refers to a pair of lexical terms that have contrasting meanings - they are
symmetric about a semantic axis. For example: (Day, Night), (Hot, Cold), (Large, Small).
Polysemy: Polysemy refers to lexical terms that have the same spelling but multiple closely
related meanings. It differs from homonymy in that, in the case of homonymy, the meanings of
the terms need not be closely related. For example: 'man' may mean 'the human species',
'a male human', or 'an adult male human' - since all these different meanings bear a close
association, the lexical term 'man' is polysemous.
Meronomy: Meronomy refers to a relationship wherein one lexical term is a constituent of
some larger entity. For example: 'Wheel' is a meronym of 'Automobile'.
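These relations can be explored programmatically with NLTK's WordNet interface; the sketch below is an illustration only, and the exact synset names used (such as 'color.n.01') are assumptions that may vary with the WordNet version:

# Lexical relations sketch using WordNet via NLTK (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

color = wn.synset("color.n.01")
print([s.name() for s in color.hyponyms()][:5])   # hyponyms: specific colours
print([s.name() for s in color.hypernyms()])      # hypernyms: more generic terms

hot = wn.synset("hot.a.01")
print([a.name() for l in hot.lemmas() for a in l.antonyms()])  # antonyms, e.g. 'cold'

car = wn.synset("car.n.01")
print([s.name() for s in car.part_meronyms()][:5])  # meronyms: parts of a car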
5. Meaning Representation:
While, as humans, it is pretty simple for us to understand the meaning of textual information, it
is not so in the case of machines. Thus, machines tend to represent the text in specific formats
in order to interpret its meaning. This formal structure that is used to understand the meaning
of a text is called meaning representation.
In order to accomplish Meaning Representation in Semantic Analysis, it is vital to understand
the building units of such representations. The basic units of semantic systems are explained
below:
* Entity: An entity refers to a particular unit or individual, such as a person or a
location. For example: Bangalore University, Delhi, etc.
* Concept: A concept may be understood as a generalization of entities. It refers to a broad
class of individual units. For example: Learning Portals, City, Students.
* Relations: Relations help establish relationships between various entities and concepts. For
example: 'Bangalorelearninghub is a Learning Portal', 'Delhi is a City', etc.
* Predicate: Predicates represent the verb structures of sentences. In meaning
representation, we employ these basic units to represent textual information.
Approaches to Meaning Representation:
Now that we are familiar with the basic understanding of Meaning Representations, here are
some of the most popular approaches to meaning representation:
First-order predicate logic (FOPL), Semantic Nets, Frames, Conceptual dependency (CD),
Rule-based architecture, Case Grammar & Conceptual Graphs
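As a simple illustration (not taken from the source), the sketch below expresses a few of the example relations above in a first-order-predicate-logic style using plain Python data structures; the predicate and entity names are assumptions chosen to mirror the examples:

# Predicate-argument meaning representation sketch.
from dataclasses import dataclass

@dataclass
class Fact:
    predicate: str
    arguments: tuple

facts = [
    Fact("is_a", ("Bangalorelearninghub", "LearningPortal")),
    Fact("is_a", ("Delhi", "City")),
    Fact("loves", ("Students", "GeeksforGeeks")),
]

for fact in facts:
    print(f"{fact.predicate}({', '.join(fact.arguments)})")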
6. Semantic Analysis Techniques:
Based upon the end goal one is trying to accomplish, Semantic Analysis can be used in various
ways. Two of the most common Semantic Analysis techniques are:
* Text Classification: In text classification, our aim is to label the text according to the
insights we intend to gain from the textual data.
* Text Extraction: In text extraction, we aim at obtaining specific information from our text.
7. Significance of Semantics Analysis:
Semantics Analysis is a crucial part of Natural Language Processing (NLP). In the ever
expanding era of textual information, it is important for organizations to draw insights from
such data to fuel businesses. Semantic Analysis helps machines interpret the meaning of texts
and extract useful information, thus providing invaluable data while reducing manual efforts.

Discourse Integration
Discourse Integration is the fourth phase of Natural Language Processing (NLP). This phase
deals with comprehending the relationship between the current sentence and earlier sentences
or the larger context. Discourse integration is crucial for contextualizing text and
understanding the overall message conveyed.
Role of Discourse Integration
Discourse integration examines how words, phrases, and sentences relate to each other within a
larger context. It assesses the impact a word or sentence has on the structure of a text and how
the combination of sentences affects the overall meaning. This phase helps in understanding
implicit references and the flow of information across sentences.
Importance of Contextualization
In conversations and texts, words and sentences often depend on preceding or following
sentences for their meaning. Understanding the context behind these words and sentences is
essential to accurately interpret their meaning.
Example of Discourse Integration
Consider the following examples:
 Contextual Reference: "This is unfair!"
To understand what "this" refers to, we need to examine the preceding or following sentences.
Without context, the statement's meaning remains unclear.
 Anaphora Resolution: "Taylor went to the store to buy some groceries. She realized she
forgot her wallet."
In this example, the pronoun "she" refers back to "Taylor" in the first sentence. Understanding
that "Taylor" is the antecedent of "she" is crucial for grasping the sentence's meaning.
Application of Discourse Integration
Discourse integration is vital for various NLP applications, such as machine translation,
sentiment analysis, and conversational agents. By understanding the relationships and context
within texts, NLP systems can provide more accurate and coherent responses.
Pragmatic Analysis
Pragmatic Analysis is the fifth and final phase of Natural Language Processing (NLP),
focusing on interpreting the inferred meaning of a text beyond its literal content. Human
language is often complex and layered with underlying assumptions, implications, and
intentions that go beyond straightforward interpretation. This phase aims to grasp these deeper
meanings in communication.
Role of Pragmatic Analysis
Pragmatic analysis goes beyond the literal meanings examined in semantic analysis, aiming to
understand what the writer or speaker truly intends to convey. In natural language, words and
phrases can carry different meanings depending on context, tone, and the situation in which
they are used.
Importance of Understanding Intentions
In human communication, people often do not say exactly what they mean. For instance, the
word "Hello" can have various interpretations depending on the tone and context in which it is
spoken. It could be a simple greeting, an expression of surprise, or even a signal of anger.
Thus, understanding the intended meaning behind words and sentences is crucial.
Examples of Pragmatic Analysis
Consider the following examples:
 Contextual Greeting: "Hello! What time is it?"
o "Hello!" is more than just a greeting; it serves to establish contact.
o "What time is it?" might be a straightforward request for the current time, but it
could also imply concern about being late.
 Figurative Expression: "I'm falling for you."
o The word "falling" literally means collapsing, but in this context, it means the
speaker is expressing love for someone.
Application of Pragmatic Analysis
Pragmatic analysis is essential for applications like sentiment analysis, conversational AI, and
advanced dialogue systems. By interpreting the deeper, inferred meanings of texts, NLP
systems can understand human emotions, intentions, and subtleties in communication, leading
to more accurate and human-like interactions.
