Unit-3 Notes Part-1
MODULE III
Understanding Natural Languages
B.Tech.(CSE), VII
Artificial Intelligence (CSE 401)
Processing information
• Any expression carries a large amount of
information.
• Many kinds of information can be interpreted from it.
• It can even help in predicting human behaviour.
NLP Pipeline
(Real-Time Classification of Airline Twitter Data)
Tokenization
• Segmentation of running text into
sentences and words.
• Cutting a text into pieces called tokens.
• Removing certain characters, such as
punctuation.
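As a minimal illustration (not from the slides; the regex and function name are illustrative), tokenization can be sketched in Python:

```python
import re

def tokenize(text):
    """Split running text into word tokens, dropping punctuation."""
    # Lowercase the text and keep only runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("The swift black cat jumps over the wall."))
# ['the', 'swift', 'black', 'cat', 'jumps', 'over', 'the', 'wall']
```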
Stop Words Removal
• Removing common language articles,
pronouns and prepositions such as “and”,
“the” or “to” in English.
• Adopting pre-defined stop words.
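A sketch of stop-word removal against a small pre-defined list (the list itself is illustrative; real systems adopt much larger ones):

```python
# A tiny stand-in for a pre-defined stop-word list.
STOP_WORDS = {"and", "the", "to", "a", "of", "in", "is"}

def remove_stop_words(tokens):
    """Drop common articles, pronouns and prepositions."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "bird", "pecks", "the", "grains"]))
# ['bird', 'pecks', 'grains']
```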
Stemming
• Slicing the end or the beginning of words.
• The intent is to remove affixes.
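A crude suffix-stripping stemmer sketch (the suffix list and length check are simplifications; real stemmers such as Porter's use many more rules):

```python
def stem(word):
    """Slice a common suffix off the end of a word, if one is present."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print([stem(w) for w in ["flowing", "passed", "hurdles", "cat"]])
# ['flow', 'pass', 'hurdl', 'cat']
```

Note that the output need not be a dictionary word ("hurdl"); that is the usual trade-off of stemming versus lemmatization.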
Lemmatization
• Reduction of a word to its base form.
• Grouping together different forms of the
same word.
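In contrast to stemming, lemmatization looks words up in a vocabulary. A tiny lookup table can stand in for a real lemma dictionary such as WordNet (the entries here are illustrative):

```python
# Illustrative stand-in for a real lemma dictionary.
LEMMAS = {"went": "go", "better": "good", "mice": "mouse", "jumps": "jump"}

def lemmatize(word):
    """Reduce a word to its base (dictionary) form, if known."""
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ["went", "mice", "wall"]])
# ['go', 'mouse', 'wall']
```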
Traditional algorithms
• Bag of Words
• Allows counting all words in a piece of
text.
• Creates an occurrence matrix.
• Occurrences are used as features for
classifier training.
Bag of words
• Amitians are flowing out like endless rain
into a paper cup,
• They slither while they pass, they slip
away across the hurdles.
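The example lines above can be turned into word counts with a short sketch (the tokenizing regex is an assumption):

```python
from collections import Counter
import re

text = ("Amitians are flowing out like endless rain into a paper cup, "
        "They slither while they pass, they slip away across the hurdles.")

# Tokenize, then count occurrences; these counts become classifier features.
tokens = re.findall(r"[a-z']+", text.lower())
counts = Counter(tokens)

print(counts["they"], counts["amitians"])  # 3 1
```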
Topic Modeling
• Uncovering hidden structures in sets of
texts or documents.
• Groups texts to discover latent topics.
• Assumes each document consists of a
mixture of topics and that each topic
consists of a set of words.
Topic Modeling
(Example)
Parsing
• Breaking down a given sentence into its
grammatical constituents.
• Example:
• “Who won the cricket worldcup in 2019?”
• “The swift black cat jumps over the wall”
Part-of-speech (POS) tagging
Constituency parsing
• Need to identify and define commonly
seen grammatical patterns.
• Divide words into groups, called
constituents, based on their grammatical
role in the sentence.
• Example:
• ‘Amitian — read — an article on Syntactic
Analysis’
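One way to represent the constituents of the example is as nested tuples of (label, children); the exact bracketing below is one plausible analysis, not the slide's own:

```python
# (label, children...) tuples; strings are the words themselves.
tree = ("S",
        ("NP", ("N", "Amitian")),
        ("VP", ("V", "read"),
               ("NP", ("Det", "an"), ("N", "article"),
                ("PP", ("P", "on"), ("NP", ("N", "Syntactic Analysis"))))))

def leaves(node):
    """Read the words back off the tree, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

print(" ".join(leaves(tree)))
# Amitian read an article on Syntactic Analysis
```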
Dependency Parsing
• Dependencies are established between
words themselves.
• Example:
• ‘Amitians attend classes’
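A dependency analysis can be represented directly as word-to-head links; the relation names below (nsubj, obj, root) follow common convention and are illustrative:

```python
# Each word maps to (its head word, the relation); the root has no head.
dependencies = {
    "Amitians": ("attend", "nsubj"),  # subject depends on the verb
    "classes":  ("attend", "obj"),    # object depends on the verb
    "attend":   (None, "root"),       # the main verb is the root
}

for word, (head, relation) in dependencies.items():
    print(f"{word} --{relation}--> {head}")
```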
Co-reference resolution
• Coreference resolution is the task of
finding all expressions that refer to the
same entity in a text.
• Example: mentions in a news story can be
grouped into two entities, ‘Michael Cohen’
and ‘Mr. Trump’.
Word sense
disambiguation
• NLP involves resolving different kinds of
ambiguity.
• A word can take different meanings
making it ambiguous to understand.
• Word sense disambiguation (WSD) means
selecting the correct word sense for a
particular word.
Word sense
disambiguation
• Example:
• The word “bank”. It can refer to a financial
institution or the land alongside a river.
• These different meanings are called word
senses.
• Context can be used effectively to perform
WSD.
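Using context for WSD can be sketched with a simplified Lesk-style overlap: pick the sense whose gloss words overlap most with the surrounding words (the sense names and gloss sets below are assumptions):

```python
# Illustrative gloss words for the two senses of "bank".
SENSES = {
    "financial": {"money", "deposit", "loan", "account", "institution"},
    "river":     {"water", "land", "shore", "flow", "fish"},
}

def disambiguate(context_words):
    """Pick the sense whose gloss overlaps most with the context."""
    context = set(context_words)
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate(["deposit", "money", "account"]))  # financial
print(disambiguate(["fish", "water", "shore"]))       # river
```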
Named entity
recognition
• Identification of named entities such as
persons, locations, organisations which
are denoted by proper nouns.
• Example:
• “Michael Jordan is a professor at
Berkeley.”
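The simplest form of NER is a gazetteer lookup; the labels below are assumptions (the example sentence is deliberately ambiguous, since "Michael Jordan" could name a professor or a basketball player):

```python
# Tiny gazetteer mapping known names to entity labels.
ENTITIES = {"Michael Jordan": "PERSON", "Berkeley": "ORGANIZATION"}

def find_entities(sentence):
    """Return every known entity name that appears in the sentence."""
    return [(name, label) for name, label in ENTITIES.items()
            if name in sentence]

print(find_entities("Michael Jordan is a professor at Berkeley."))
# [('Michael Jordan', 'PERSON'), ('Berkeley', 'ORGANIZATION')]
```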
Context free grammars
• It is a grammar that consists of rules with a
single non-terminal symbol on the left-hand side of each
rewrite rule. Let us create a grammar to
parse the sentence
• “The bird pecks the grains”
Context free grammars
Context free grammars
• The parse tree breaks down the sentence
into structured parts so that the computer
can easily understand and process it.
• In order for the parsing algorithm to
construct this parse tree, a set of rewrite
rules, which describe what tree structures
are legal, needs to be constructed.
Context free grammars
• These rules say that a certain symbol may
be expanded in the tree by a sequence of
other symbols.
• For example, the rule S → NP VP says
that if a Noun Phrase (NP) is followed by
a Verb Phrase (VP), the combined string
is a sentence.
Context free grammars
• The rewrite rules for the sentence are as
follows −
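The slide's figure is not reproduced here. A plausible reconstruction of the rewrite rules for "The bird pecks the grains", together with a tiny recursive-descent recognizer, might look like this (the exact rule set is an assumption, except that V → pecks | peck is stated on a later slide):

```python
# Rewrite rules: each non-terminal maps to a list of productions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["bird"], ["grains"]],
    "V":   [["pecks"], ["peck"]],
}

def parse(symbol, words, pos):
    """Recursive-descent parse: return the position after the match, or None."""
    for production in GRAMMAR.get(symbol, []):
        p = pos
        for part in production:
            if part in GRAMMAR:                 # non-terminal: recurse
                p = parse(part, words, p)
                if p is None:
                    break
            elif p < len(words) and words[p] == part:
                p += 1                          # terminal: consume the word
            else:
                p = None
                break
        if p is not None:
            return p
    return None

words = "the bird pecks the grains".split()
print(parse("S", words, 0) == len(words))  # True: the sentence is accepted
```

Note that this same grammar also accepts "the bird peck the grains", since V can rewrite to either verb form; that is exactly the subject-verb agreement problem discussed on the next slide.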
Context free grammars
• The parse tree can be created as shown −
Context free grammars
• Now consider the above rewrite rules.
Since V can be replaced by either "peck" or
"pecks", sentences such as "The bird peck
the grains" are wrongly permitted,
i.e. a subject-verb agreement error is
accepted as correct.
Context free grammars
• Merit − The simplest style of grammar,
and therefore a widely used one.
• Demerits −
They are not highly precise. For example,
"The grains peck the bird" is
syntactically correct according to the parser,
but even though it makes no sense, the parser
accepts it as a correct sentence.
Context free grammars
• Demerits
To achieve high precision, multiple sets of
grammar rules need to be prepared.
It may require completely different sets
of rules for parsing singular and plural
variations, passive sentences, etc., which
can lead to a huge, unmanageable set
of rules.
Transformational
Grammar
• These are grammars in which the
sentence can be represented structurally
in two stages.
• Obtaining different structures from
sentences having the same meaning is
undesirable in language-understanding
systems.
• Sentences with the same meaning should
always correspond to the same internal
knowledge structures.
Transformational
Grammar
• In one stage the basic structure of the
sentence is analyzed to determine the
grammatical constituent parts; the
second stage is just the reverse of the first.
• The second stage reveals the surface structure of the
sentence, the way the sentence is used in
speech or in writing.
Transformational Grammar (Examples)
• The two example sentences above are
different sentences, but they have the same
meaning.
• Thus this is an example of a transformational
grammar.
• These grammars were never widely used in
computational models of natural language.
• The applications of this grammar include
changing of voice (Active to Passive and
Passive to Active), changing a question to
declarative form, etc.
TRANSITION NETWORK
• The transition from N1 to N2 will be made if
an article is the first input symbol.
• If successful, state N2 is entered.
• The transition from N2 to N3 can be made if
a noun is found next.
• If successful, state N3 is entered.
• The transition from N3 to N4 can be made if
an auxiliary is found and so on.
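The network diagram itself is not reproduced in these notes. A hypothetical arc table matching the bullets above can be traversed like this (the states, categories and the tiny lexicon are all assumptions):

```python
# Each arc is (from_state, word_category, to_state), per the bullets above.
ARCS = [
    ("N1", "article",   "N2"),
    ("N2", "noun",      "N3"),
    ("N3", "auxiliary", "N4"),
]

# Tiny stand-in lexicon mapping words to their categories.
CATEGORIES = {"a": "article", "boy": "noun", "is": "auxiliary"}

def traverse(words, state="N1"):
    """Follow an arc for each word's category; return the final state."""
    for word in words:
        category = CATEGORIES.get(word)
        for src, label, dst in ARCS:
            if src == state and label == category:
                state = dst
                break
        else:
            return None  # no arc matched: the input is rejected
    return state

print(traverse(["a", "boy", "is"]))  # N4
```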
• Consider the sentence “A boy is
eating a banana”.
• If the sentence is parsed on the above
transition network, then first ‘A’ is an
article,
• so a successful transition is made from node N1 to
N2. Then “boy” is a noun (so N2 to N3), “is” is
an auxiliary (N5 to N6), and finally “banana”
is a noun (N6 to N7).
• So the above sentence is successfully
parsed.
TYPES OF TRANSITION
NETWORK
• There are generally two types of transition
networks:
1. Recursive Transition Networks (RTN)
2. Augmented Transition Networks (ATN)
Recursive Transition Networks (RTN)
• It is a modified version of the transition
network.
• It permits arc labels to refer to other
networks, rather than just word categories,
and those networks may in turn refer back
to the referring network (hence "recursive").
Augmented Transition Network
(ATN)
• An ATN is a modified transition network.
• It is an extension of RTN.
• The ATN uses a top-down parsing
procedure to gather various types of
information to be used later by the
understanding system.
• It produces a data structure suitable for
further processing and capable of storing
semantic details.
• An augmented transition network (ATN) is
a recursive transition network that can
perform tests and take actions during arc
transitions.
• An ATN uses a set of registers to store
information.
• A set of actions is defined for each arc and
the actions can look at and modify the
registers.
• An arc may have a test associated with it.
• The arc is traversed (and its action taken)
only if the test succeeds.
• When a lexical arc is traversed, the word it
consumes is put in a special variable (*) that
keeps track of the current word.
• The ATN was first used in the LUNAR system.
• In an ATN, an arc can have a further arbitrary
test and an arbitrary action.
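A minimal sketch of one ATN arc, assuming a hypothetical noun arc and two registers (the names SUBJ and NUMBER, and the feature dictionary, are illustrative):

```python
# Registers the ATN fills in as arcs are traversed.
registers = {"SUBJ": None, "NUMBER": None}

def noun_arc(word, features):
    """Traverse the arc only if its test succeeds; the action fills registers."""
    star = word                                  # '*' holds the current word
    if features.get("category") == "noun":       # the arc's test
        registers["SUBJ"] = star                 # the arc's actions:
        registers["NUMBER"] = features.get("number")  # read/modify registers
        return True
    return False

ok = noun_arc("boy", {"category": "noun", "number": "singular"})
print(ok, registers)
# True {'SUBJ': 'boy', 'NUMBER': 'singular'}
```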
The structure of ATN