Unit-3 Notes Part-1

The document discusses various natural language processing (NLP) techniques used to understand human language, including tokenization, stop words removal, stemming, lemmatization, bag-of-words, topic modeling, parsing, part-of-speech tagging, constituency parsing, dependency parsing, co-reference resolution, word sense disambiguation, named entity recognition, context free grammars, transformational grammar, and transition networks. Context free grammars and transformational grammars are types of grammars used in NLP to represent language structures, while transition networks represent language using directed graphs.

Uploaded by

Toxic Lucien
Copyright © All Rights Reserved

Amity School of Engineering and Technology

MODULE III
Understanding Natural Languages
B.Tech.(CSE), VII
Artificial Intelligence (CSE 401)
Processing information
• Any linguistic expression carries a large
amount of information.
• Many kinds of information can be
interpreted from language.
• Understanding language helps in
predicting human behaviour.

NLP Pipeline
(Real-Time Classification of Airline Twitter Data)

Tokenization
• Segmentation of running text into
sentences and words.
• Cutting a text into pieces called tokens.
• Removing certain characters, such as
punctuation.
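A minimal sketch of these steps, assuming a simple regex-based splitter rather than any particular library's tokenizer:

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of letters/digits/apostrophes;
    # punctuation such as '!' or ',' is dropped in the process.
    return re.findall(r"[A-Za-z0-9']+", text.lower())

tokens = tokenize("The swift black cat jumps over the wall!")
# tokens -> ['the', 'swift', 'black', 'cat', 'jumps', 'over', 'the', 'wall']
```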

Stop Words Removal
• Removing common language articles,
pronouns and prepositions such as “and”,
“the” or “to” in English.
• Adopting a pre-defined list of stop words.
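A sketch with a small hand-picked stop-word set (real systems adopt much larger pre-defined lists; this set is illustrative only):

```python
# Illustrative stop-word set; production lists are far larger.
STOP_WORDS = {"and", "the", "or", "to", "a", "an", "of", "in", "is"}

def remove_stop_words(tokens):
    # Keep only the tokens that are not stop words.
    return [t for t in tokens if t not in STOP_WORDS]

filtered = remove_stop_words(["the", "bird", "pecks", "the", "grains"])
# filtered -> ['bird', 'pecks', 'grains']
```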

Stemming
• Slicing off the end or the beginning of
words with the intent of removing affixes.
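A crude suffix-stripping sketch in the spirit of a stemmer; unlike a real algorithm such as Porter's, it can over- or under-stem:

```python
def stem(word):
    # Strip one common English suffix, keeping at least a 3-letter stem.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# stem("jumping") -> "jump", stem("classes") -> "class"
```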

Lemmatization
• Reduction of a word to its base form.
• Grouping together different forms of the
same word.
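A toy dictionary-based sketch; a real lemmatizer consults a vocabulary and morphological rules, and the entries below are illustrative assumptions:

```python
# Toy lemma dictionary mapping inflected forms to base forms.
LEMMAS = {"am": "be", "is": "be", "are": "be",
          "geese": "goose", "better": "good", "attended": "attend"}

def lemmatize(word):
    # Unknown words are returned unchanged.
    return LEMMAS.get(word, word)
```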

Traditional algorithms
• Bag of Words
• Allows counting all words in a piece of
text.
• Creates an occurrence matrix.
• Occurrences are used as features for
classifier training.
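A minimal sketch of the occurrence matrix, using the two example lines that follow: one row per document, one column per vocabulary word, each cell the number of occurrences.

```python
from collections import Counter

docs = ["Amitians are flowing out like endless rain into a paper cup",
        "They slither while they pass they slip away across the hurdles"]

# Build the vocabulary, then one occurrence-count row per document.
tokenized = [d.lower().split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
matrix = [[Counter(doc)[w] for w in vocab] for doc in tokenized]
```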

Bag of words
• Amitians are flowing out like endless rain
into a paper cup,
• They slither while they pass, they slip
away across the hurdles.

Topic Modeling
• Uncovering hidden structures in sets of
texts or documents.
• Groups texts to discover latent topics.
• Assumes each document consists of a
mixture of topics and that each topic
consists of a set of words.
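A toy illustration of "document = mixture of topics": score a document against hand-written topic word sets. Real topic models (e.g. LDA) learn both the topics and the mixtures from the corpus; the topic sets below are assumptions for illustration.

```python
# Hand-written topic word sets (illustrative only).
TOPICS = {
    "sports": {"cricket", "worldcup", "match", "won"},
    "finance": {"bank", "loan", "interest", "money"},
}

def topic_scores(document):
    # Count how many of each topic's words appear in the document.
    words = set(document.lower().split())
    return {name: len(words & keywords) for name, keywords in TOPICS.items()}

scores = topic_scores("Who won the cricket worldcup in 2019")
```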

Topic Modeling
(Example)

Parsing
• Breaking down a given sentence into its
grammatical constituents.
• Example:
• “Who won the cricket worldcup in 2019?”
• “The swift black cat jumps over the wall”

Part-of-speech (POS) tagging

• According to its role in a sentence, a word
can be tagged as a noun, verb, adjective,
adverb, preposition, etc.
• The tagger should assign the correct tag
to each word in context.
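A toy lexicon-lookup tagger as a sketch; the lexicon below is an assumption, and a real tagger would also use sentence context to resolve ambiguous words:

```python
# Illustrative word-to-tag lexicon.
LEXICON = {"the": "DET", "swift": "ADJ", "black": "ADJ", "cat": "NOUN",
           "jumps": "VERB", "over": "PREP", "wall": "NOUN"}

def pos_tag(tokens):
    # Look each token up; unknown words get the tag "UNK".
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]

tags = pos_tag(["the", "swift", "black", "cat", "jumps"])
```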

Constituency parsing
• Need to identify and define commonly
seen grammatical patterns.
• Divide words into groups, called
constituents, based on their grammatical
role in the sentence.
• Example:
• ‘Amitian — read — an article on Syntactic
Analysis’

Dependency Parsing
• Dependencies are established between
words themselves.
• Example:
• ‘Amitians attend classes’
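The dependencies for the example can be written as hand-made (head, relation, dependent) triples; a dependency parser would derive these automatically, and the relation names here follow common conventions:

```python
# Hand-written dependency triples for 'Amitians attend classes'.
dependencies = [
    ("attend", "nsubj", "Amitians"),  # subject depends on the verb
    ("attend", "obj", "classes"),     # object depends on the verb
]

def dependents_of(head):
    # All words that depend directly on the given head word.
    return [dep for h, _, dep in dependencies if h == head]
```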

Co-reference resolution
• Coreference resolution is the task of
finding all expressions that refer to the
same entity in a text.
• Example: recognizing that ‘Michael Cohen’
and ‘Mr. Trump’ are two different entities,
and linking every later mention of each to
the correct one.

Word sense
disambiguation
• NLP involves resolving different kinds of
ambiguity.
• A word can take different meanings
making it ambiguous to understand.
• Word sense disambiguation (WSD) means
selecting the correct word sense for a
particular word.

Word sense
disambiguation
• Example:
• The word “bank”. It can refer to a financial
institution or the land alongside a river.
• These different meanings are called word
senses.
• Context can be used effectively to perform
WSD.
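A simplified Lesk-style sketch of using context for WSD: pick the sense whose gloss words overlap most with the context. The two glosses below are toy assumptions:

```python
# Toy sense glosses for "bank" (illustrative only).
SENSES = {
    "bank (financial institution)": {"money", "deposit", "financial", "institution"},
    "bank (river side)": {"river", "land", "water", "shore"},
}

def disambiguate(context):
    # Choose the sense with the largest gloss/context word overlap.
    words = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

sense = disambiguate("he sat on the river shore near the bank")
```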

Named entity
recognition
• Identification of named entities such as
persons, locations, organisations which
are denoted by proper nouns.
• Example:
• “Michael Jordan is a professor at
Berkeley.”
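A toy NER pass as a sketch: treat runs of capitalised tokens as candidate named entities. Real systems use trained sequence models; this heuristic is only for illustration.

```python
def find_entities(tokens):
    # Collect maximal runs of capitalised tokens as entity candidates.
    entities, current = [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.append(tok)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

ents = find_entities("Michael Jordan is a professor at Berkeley .".split())
# ents -> ['Michael Jordan', 'Berkeley']
```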

Context free grammars
• A context-free grammar consists of
rewrite rules with a single non-terminal
symbol on the left-hand side. Let us create
a grammar to parse the sentence
• “The bird pecks the grains”

Context free grammars

Context free grammars
• The parse tree breaks down the sentence
into structured parts so that the computer
can easily understand and process it.
• In order for the parsing algorithm to
construct this parse tree, a set of rewrite
rules, which describe what tree structures
are legal, needs to be constructed.

Context free grammars
• These rules say that a certain symbol may
be expanded in the tree by a sequence of
other symbols.
• For example, the rule S → NP VP says that
if there are two strings, a Noun Phrase
(NP) followed by a Verb Phrase (VP), then
the string combining NP followed by VP is
a sentence.

Context free grammars
• The rewrite rules for the sentence are as
follows −
S → NP VP
NP → DET N
VP → V NP
DET → the
N → bird | grains
V → pecks | peck

Context free grammars
• The parse tree can be created as shown −
(S (NP (DET The) (N bird))
   (VP (V pecks) (NP (DET the) (N grains))))

Context free grammars
• Now consider the above rewrite rules.
Since V can be replaced by both “peck”
and “pecks”, sentences such as “The bird
peck the grains” are wrongly permitted,
i.e. the subject-verb agreement error is
accepted as correct.
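This behaviour can be reproduced with a minimal recogniser sketch for the assumed rewrite rules (S → NP VP, NP → DET N, VP → V NP, DET → the, N → bird | grains, V → pecks | peck); since V covers both verb forms, the agreement error is accepted:

```python
# Toy grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["the"]],
    "N": [["bird"], ["grains"]],
    "V": [["pecks"], ["peck"]],  # both forms allowed: no agreement check
}

def parse(symbol, tokens, pos=0):
    # Try each production; return the position after a successful match,
    # or None if none of the productions fit.
    for production in GRAMMAR[symbol]:
        p = pos
        for part in production:
            if part in GRAMMAR:                                  # nonterminal
                p = parse(part, tokens, p)
            elif p < len(tokens) and tokens[p] == part:          # terminal
                p += 1
            else:
                p = None
            if p is None:
                break
        if p is not None:
            return p
    return None

def accepts(sentence):
    tokens = sentence.lower().split()
    return parse("S", tokens) == len(tokens)
```

Both `accepts("The bird pecks the grains")` and the ungrammatical `accepts("The bird peck the grains")` return True, showing the demerit discussed on the next slide.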

Context free grammars
• Merit − The simplest style of grammar,
and therefore the most widely used one.
• Demerits −
They are not highly precise. For example,
“The grains peck the bird” is syntactically
correct according to the parser; even
though it makes no sense, the parser
accepts it as a correct sentence.

Context free grammars
• Demerits
 To achieve high precision, multiple sets of
grammar rules need to be prepared.
 It may require completely different sets
of rules for parsing singular and plural
variations, passive sentences, etc., which
can lead to the creation of a huge,
unmanageable set of rules.

Transformational
Grammar
• These are grammars in which the sentence
is represented structurally in two stages.
• Obtaining different structures from
sentences having the same meaning is
undesirable in language understanding
systems.
• Sentences with the same meaning should
always correspond to the same internal
knowledge structures.
Transformational
Grammar
• In the first stage, the basic (deep) structure
of the sentence is analyzed to determine
the grammatical constituent parts; the
second stage applies transformations in
the reverse direction.
• The second stage reveals the surface
structure of the sentence, the way the
sentence is used in speech or in writing.

Transformational Grammar

• Alternatively, we can also say that
application of the transformation rules can
produce a change from passive voice to
active voice and vice versa.

Transformational Grammar

• Both of the above are different sentences,
but they have the same meaning.
• Thus they are an example of
transformational grammar.
• These grammars were never widely used in
computational models of natural language.
• The applications of this grammar include
changing of voice (Active to Passive and
Passive to Active), changing a question to
declarative form, etc.
TRANSITION NETWORK

• It is a method of representing natural
language, based on directed graphs and
finite state automata.
• A transition network is constructed from
some inputs, states and outputs.
• A transition network consists of states or
nodes, with labeled arcs along which the
parse moves from one state to the next.
• An arc represents the rule or condition
upon which the transition is made from
one state to another.
• For example, a sentence consisting of an
article, a noun, an auxiliary, a verb, an
article and a noun would be recognized by
the transition network as follows.

• The transition from N1 to N2 will be made if
an article is the first input symbol.
• If successful, state N2 is entered.
• The transition from N2 to N3 can be made if
a noun is found next.
• If successful, state N3 is entered.
• The transition from N3 to N4 can be made if
an auxiliary is found and so on.
• Consider the sentence “A boy is eating a
banana”.
• If the sentence is parsed on the above
transition network, then first ‘A’ is an
article, so the transition from N1 to N2
succeeds.
• Then ‘boy’ is a noun (N2 to N3), ‘is’ is an
auxiliary (N3 to N4), ‘eating’ is a verb
(N4 to N5), ‘a’ is an article (N5 to N6) and
finally ‘banana’ is a noun (N6 to N7).
• So the above sentence is successfully
parsed by the network.
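The traversal above can be sketched as a state machine with arcs labelled by word categories; the small word-category lexicon is an assumption for illustration:

```python
# States N1..N7; each arc is keyed by (current state, word category).
ARCS = {
    ("N1", "article"): "N2", ("N2", "noun"): "N3",
    ("N3", "auxiliary"): "N4", ("N4", "verb"): "N5",
    ("N5", "article"): "N6", ("N6", "noun"): "N7",
}
# Toy lexicon mapping words to their categories.
CATEGORY = {"a": "article", "boy": "noun", "is": "auxiliary",
            "eating": "verb", "banana": "noun"}

def recognise(sentence):
    state = "N1"
    for word in sentence.lower().split():
        state = ARCS.get((state, CATEGORY.get(word)))
        if state is None:          # no arc for this word: reject
            return False
    return state == "N7"           # accept only in the final state
```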
TYPES OF TRANSITION
NETWORK
• There are generally two types of transition
networks:
1. Recursive Transition Networks (RTN)
2. Augmented Transition Networks (ATN)

Recursive Transition Networks (RTN)

• RTNs are a development of finite state
automata, with the essential addition that
definitions may be recursive.
• A recursive transition network consists of
nodes (states) and labeled arcs
(transitions).

• Rather than permitting only word
categories, it allows arc labels to refer to
other networks, and these in turn may
refer back to the referring network.
• It is thus a modified, recursive version of
the basic transition network.

Augmented Transition Network
(ATN)
• An ATN is a modified transition network.
• It is an extension of RTN.
• The ATN uses a top-down parsing
procedure to gather various types of
information to be used later by the
understanding system.
• It produces a data structure suitable for
further processing and capable of storing
semantic details.
• An augmented transition network (ATN) is
a recursive transition network that can
perform tests and take actions during arc
transitions.
• An ATN uses a set of registers to store
information.
• A set of actions is defined for each arc and
the actions can look at and modify the
registers.
• An arc may have a test associated with it.
• The arc is traversed (and its action taken)
only if the test succeeds.
• When a lexical arc is traversed, the current
word is placed in a special variable (*)
that keeps track of it.
• The ATN was first used in the LUNAR
system.
• In an ATN, an arc can carry an arbitrary
further test and an arbitrary action.
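A sketch of one ATN idea, under assumed toy lexicon entries: an action stores the subject's number in a register, and a test on the verb arc enforces the subject-verb agreement that a plain CFG cannot:

```python
# Toy lexicon: word -> (category, grammatical number).
LEX = {"bird": ("noun", "singular"), "birds": ("noun", "plural"),
       "pecks": ("verb", "singular"), "peck": ("verb", "plural")}

def atn_accepts(subject, verb):
    registers = {}
    cat, number = LEX[subject]
    if cat != "noun":
        return False
    registers["subject_number"] = number          # action: set a register
    cat, number = LEX[verb]
    # test on the verb arc: traverse only if agreement holds
    return cat == "verb" and number == registers["subject_number"]
```

Unlike the earlier CFG, `atn_accepts("bird", "peck")` is rejected because the register test fails.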

The structure of ATN
