Lecture_5_Part_Of_Speech_Tagging
@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)
1. Part-of-Speech Tagging (Some Concepts)
A description of the 8 parts-of-speech:
- Noun: a word (other than a pronoun) used to identify any of a class of people,
places, or things (common noun), e.g., teacher, city, book.
- Verb: a word used to describe an action, state, or occurrence, and forming the
main part of the predicate of a sentence, e.g., hear, become, happen.
- Pronoun: a word that can function as a noun phrase used by itself and that
refers either to the participants in the discourse (e.g., I, you) or to someone
or something mentioned elsewhere in the discourse (e.g., she, it, this).
- Preposition: a word usually used in front of a noun or pronoun to show the
relationship between that noun or pronoun and other words in a
sentence, e.g., after, in, to, on, and with.
- Adverb: a word or phrase that modifies the meaning of an adjective, verb, or
other adverb, expressing manner, place, time, or degree
(e.g., gently, here, now, very).
1. Part-of-Speech Tagging (Some Concepts) (Cont…)
A description of the 8 parts-of-speech (continued):
- Conjunction: a word used to connect clauses or sentences or to coordinate
words in the same clause (e.g., and, but, if).
- Participle: a word formed from a verb (e.g., going, gone, being, been) and
used as an adjective (e.g., working woman, burnt toast) or a noun (e.g., good
breeding).
In English; participles are also used to make compound verb forms
(e.g., is going, has been ).
- Article: Articles are words that define a noun as specific or unspecific.
Consider the following examples:
Example- 1: After the long day, the cup of tea tasted particularly good.
Example-2: After a long day, a cup of tea tastes particularly good.
1. Part-of-Speech Tagging (Some Concepts) (Cont…)
More recent lists of parts-of-speech (or tagsets) have many more word
classes:
- 45 for the Penn Treebank (Marcus et al., 1993).
- 87 for the Brown corpus (Francis, 1979).
- 146 for the C7 tagset (Garside et al., 1997).
Part-of-speech tagging (or just tagging for short) is the process of
assigning a part-of-speech or other syntactic class marker to each word in a corpus.
Because tags are generally also applied to punctuation, tagging requires that the
punctuation marks (period, comma, etc.) be separated from the words.
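As a minimal sketch of this preprocessing step, assuming the NLTK library and its punkt tokenizer model are available (both are assumptions, not part of the lecture), punctuation can be split off before tagging:

```python
# Sketch: separating punctuation from words prior to tagging.
# Assumes NLTK is installed; the punkt model may need a one-time download.
import nltk
# nltk.download('punkt')  # uncomment on first run

sentence = "The grand jury commented on a number of other topics."
tokens = nltk.word_tokenize(sentence)
print(tokens)  # the final '.' becomes its own token, ready to receive a tag
```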
2. English Word Classes (Cont…)
(ii) Common nouns are divided into:
1) Count nouns, which allow grammatical enumeration; that is:
- they can occur in both the singular and plural
- e.g., (goat/goats, relationship/relationships), and they can be counted
(one goat, two goats).
2. English Word Classes (Cont…)
(d) The final open class, adverbs, is rather a hodge-podge, both semantically and
formally.
For example; all the italicized words are adverbs:
Unfortunately, John walked home extremely slowly yesterday.
(i) Directional adverbs or locative adverbs;
- Specify the direction or location of some action. e.g., home, downhill, etc.
(ii) Degree adverbs;
- Specify the extent of some action, process, or property. e.g., extremely, very,
somewhat, etc.
(iii) Manner adverbs;
- Describe the manner of some action or process. e.g., slowly, delicately, etc.
(iv) Temporal adverbs;
- Describe the time that some action or event took place. e.g., yesterday, Monday, etc.
3. Tagsets for English
Most of the popular tagsets for English evolved from the 87-tag tagset used
for the Brown corpus.
Two of the most commonly used tagsets are:
- the small 45-tag Penn Treebank tagset;
- the medium-sized 61-tag C5 tagset.
Example:
- Some examples of tagged sentences from the Penn Treebank
version of the Brown corpus are:
a) The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
b) There/EX are/VBP 70/CD children/NNS there/RB
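As an illustrative sketch, an off-the-shelf tagger reproduces Penn Treebank-style tags like those above (this assumes NLTK and its averaged_perceptron_tagger model, neither of which is part of the lecture; exact output depends on the trained model):

```python
# Sketch: tagging with NLTK's default tagger, which uses the Penn Treebank tagset.
import nltk
# nltk.download('averaged_perceptron_tagger')  # first run only

tokens = "There are 70 children there .".split()
print(nltk.pos_tag(tokens))
# Expected output along the lines of:
# [('There', 'EX'), ('are', 'VBP'), ('70', 'CD'),
#  ('children', 'NNS'), ('there', 'RB'), ('.', '.')]
```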
3. Tagsets for English (Class Participation)
a) Although preliminary findings were reported more than a year ago, the latest results appear in today's New England Journal of Medicine.
b) Mrs. Shaefer never got around to joining.
c) All we gotta do is go around the corner.
d) She told off her friends.
e) She stepped off the train.
f) They were married by the Justice of the Peace yesterday at 5:00.
4. Part-of-Speech Tagging
Part-of-speech tagging;
- is the process of assigning a part-of-speech or other syntactic class marker to each word in a corpus.
Problem:
Book/VB that/DT flight/NN ./.
Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.
Book is ambiguous.
- That is, it has more than one possible usage and part-of-speech.
(i) It can be a verb (as in book that flight or to book the suspect),
(ii) or a noun (as in hand me that book or a book of matches).
Solution:
The problem of POS-tagging is to resolve these ambiguities, choosing the proper tag for the context.
Finer-grained versions of POS tagging use larger tagsets, such as the 87-tag Brown corpus tagset.
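As a small sketch of this ambiguity, again assuming NLTK (the tags shown are the ideal outcome; the actual output depends on the trained model):

```python
# Sketch: the same word 'book' should receive different tags in different
# contexts: ideally VB in the first sentence and NN in the second.
import nltk

for sent in ["Book that flight .", "Hand me that book ."]:
    print(nltk.pos_tag(sent.split()))
```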
4. Part-of-Speech Tagging (Brown Corpus Tags) (Cont…)
Example: each form of the verb go receives a different tag.
Go away! (go/VB)
He sometimes goes to the cafe. (goes/VBZ)
All the cakes have gone. (gone/VBN)
We went on the excursion. (went/VBD)
4. Part-of-Speech Tagging (Penn Treebank tagset vs Brown Corpus Tags)
(Class Participation)
My aunt's can opener can open a drum should look like this:
The old car broke down in the car park
At least two men broke in and stole my TV
The horses were broken in and ridden in two weeks
Kim and Sandy both broke up with their partners
The horse which Kim sometimes rides is more bad tempered than mine
The horse as well as the rabbits which we wanted to eat has escaped
It was my aunt's car which we sold at auction last year in February
The only rabbit that I ever liked was eaten by my parents one summer
The veterans who I thought that we would meet at the reunion were dead
Natural disasters – storms, flooding, hurricanes – occur infrequently but cause devastation that strains resources to breaking point
Letters delivered on time by old-fashioned means are increasingly rare, so it is as well that that is not the only option available
It won't rain but there might be snow on high ground if the temperature stays about the same over the next 24 hours
The long and lonely road to redemption begins with self-reflection: the need to delve inwards to deconstruct layers of psychological obfuscation
My wildest dream is to build a POS tagger which processes 10K words per second and uses only 1MB of RAM, but it may prove too hard
5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)
(2) The second stage used large lists of hand-written disambiguation rules to
narrow down this list to a single part-of-speech for each word.
5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)
(1) First stage: use a dictionary to assign each word a list of potential parts-of-speech. (Example)
Which POS is more likely in a corpus (1,273,000 tokens)?
NN VB Total
race 400 600 1000
P(NN|race) = P(race&NN) / P(race) by the definition of conditional probability
- P(race) ≅ 1000/1,273,000 = .0008
- P(race&NN) ≅ 400/1,273,000 =.0003
- P(race&VB) ≅ 600/1,273,000 = .0005
And so we obtain:
- P(NN|race) = P(race&NN)/P(race) = .0003/.0008 = .375
- P(VB|race) = P(race&VB)/P(race) = .0005/.0008 = .625
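The same arithmetic in a few lines of Python (note that the exact ratios are .4 and .6; the slide's .375 and .625 come from rounding the intermediate values to four decimal places first):

```python
# Recomputing the conditional probabilities from the raw counts above.
total = 1_273_000
race_nn, race_vb = 400, 600

p_race = (race_nn + race_vb) / total   # ≈ .0008
p_race_and_nn = race_nn / total        # ≈ .0003
p_race_and_vb = race_vb / total        # ≈ .0005

print(p_race_and_nn / p_race)  # P(NN|race) = 0.4 (.375 with the slide's rounding)
print(p_race_and_vb / p_race)  # P(VB|race) = 0.6 (.625 with the slide's rounding)
```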
5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)
(2) Second stage: hand-written disambiguation rules.
- Uses a 56,000-word lexicon which lists the parts-of-speech for each word
(using two-level morphology).
- Uses up to 3,744 rules, or constraints, for POS disambiguation.
Reminder:
- ADJ = attribute of a noun, e.g., sweet color, red car, sixteen candles.
- ADV = modifies a verb, adjective, or other adverb, e.g., very tall, too quickly.
5. Rule-Based Part-Of-Speech Tagging (Algorithm) (Cont…)
Algorithm description:
The first two clauses of this rule check that the that directly precedes a
sentence-final adjective, adverb, or quantifier.
The last clause eliminates cases preceded by verbs like consider or believe,
which can take a noun and an adjective; this is to avoid tagging the following
instance of that as an adverb:
I consider that odd.
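A toy sketch of this constraint (this is not the actual ENGTWOL rule; the mini-lexicon and verb list below are hypothetical stand-ins):

```python
# Toy version of the adverbial-'that' constraint described above: 'that' is
# adverbial only when it directly precedes a sentence-final adjective,
# adverb, or quantifier, and is not preceded by a verb like 'consider'.
ADJ_ADV_QUANT = {"odd", "quickly", "much"}     # hypothetical mini-lexicon
EXCEPTION_VERBS = {"consider", "believe"}

def that_is_adverbial(tokens, i):
    """tokens: lowercased sentence tokens; i: index of 'that'."""
    next_is_final_target = (i + 1 == len(tokens) - 1
                            and tokens[i + 1] in ADJ_ADV_QUANT)
    preceded_by_exception = i > 0 and tokens[i - 1] in EXCEPTION_VERBS
    return next_is_final_target and not preceded_by_exception

print(that_is_adverbial("it isn't that odd".split(), 2))    # True  -> ADV
print(that_is_adverbial("i consider that odd".split(), 2))  # False -> not ADV
```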
6. Statistical Tagging (HMM Part-of-speech tagging)
In part-of-speech tagging, probability-based tagging plays a major role as an
alternative to rule-based tagging with hand-written rules.
Machines can learn from examples
- Learning can be supervised or unsupervised.
Given training data, machines analyze the data, and learn rules which generalize
to new examples.
- Can be sub-symbolic (rule may be a mathematical function) e.g., neural nets.
- Or it can be symbolic (rules are in a representation that is similar to
representation used for hand-coded rules).
In general, machine learning approaches allow for more tuning to the needs of a
corpus, and can be reused across corpora.
6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)
In a classification task, we are given some observation(s) and our job is to
determine which of a set of classes it belongs to.
Part-of-speech tagging is generally treated as a sequence classification task.
- the observation is a sequence of words (let’s say a sentence), and it is our job to
assign them a sequence of part-of-speech tags.
For example, say we are given a sentence like
- “He will race”.
• What is the best sequence of tags which corresponds to this sequence of
words?
- The Bayesian interpretation of this task starts by considering all possible
sequences of classes (in this case, all possible sequences of tags).
- Out of this universe of tag sequences, we want to choose the tag sequence which
is most probable given the observation sequence of n words w1..wn.
6. Statistical Tagging (HMM Part-of-speech tagging) [Example]
What you want to do is find the "best sequence" of POS tags T=T1..Tn for a
sentence W=W1..Wn.
- (Here T1 is pos_tag(W1).)
- That is, find a sequence of POS tags T that maximizes P(T|W).
Using Bayes' Rule, we can write:
P(T|W) = P(W|T)*P(T)/P(W)
We want to find the value of T which maximizes the RHS.
=> The denominator can be discarded (it is the same for every T).
=> So find the T which maximizes P(W|T) * P(T).
Example: He will race
- W = W1 W2 W3 = He will race
- T = T1 T2 T3
Possible sequences (4 different probability values for "He will race"):
• T = PRP MD NN (He/PRP will/MD race/NN)
• T = PRP NN NN (He/PRP will/NN race/NN)
• T = PRP MD VB (He/PRP will/MD race/VB)
• T = PRP NN VB (He/PRP will/NN race/VB)
6. Statistical Tagging (HMM Part-of-speech tagging) [Independence Assumptions]
Assumption (Case 1):
Assume that the current event depends only on the previous n-1 events (for a
bigram model, only on the previous one event):
P(T1….Tn) ≅ Πi=1, n P(Ti| Ti-1)
- assumes that the event of a POS tag occurring is independent of the event of any
other POS tag occurring, except for the immediately previous POS tag.
=> From a linguistic standpoint, this seems an unreasonable assumption, due to
long-distance dependencies (e.g., Ali and his friends [go or goes?]).
Assumption (Case 2):
P(W1….Wn | T1….Tn) ≅ Πi=1, n P(Wi| Ti)
- assumes that the event of a word appearing in a category is independent of the
event of any surrounding word or tag, except for the tag at this position.
6. Statistical Tagging (HMM Part-of-speech tagging)
POS Tagging Based on Bigrams
Problem: Find T which maximizes P(W | T) * P(T)
- Here W=W1..Wn and T=T1..Tn
Using the bigram model, we get:
(a) Transition probabilities (prob. of transitioning from one state/tag to
another):
• P(T1….Tn) ≅ Πi=1, n P(Ti|Ti-1)
(b) Emission probabilities (prob. of emitting a word at a given state):
• P(W1….Wn | T1….Tn) ≅ Πi=1, n P(Wi| Ti)
So, we want to find the value of T1..Tn which maximizes:
Πi=1, n P(Wi| Ti) * P(Ti| Ti-1)
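A brute-force sketch of this maximization for "He will race". Apart from P(PRP|&lt;s&gt;) = 1, P(MD|PRP) = .8, and P(NN|MD) = .4, which are quoted on the next slide, every probability below is an illustrative made-up value:

```python
# Brute force: score every candidate tag sequence with the
# product of P(Wi|Ti) * P(Ti|Ti-1) and keep the best one.
from itertools import product

words = ["He", "will", "race"]
candidates = {"He": ["PRP"], "will": ["MD", "NN"], "race": ["NN", "VB"]}

# Toy probability tables ('<s>' marks the start of the sentence).
trans = {("<s>", "PRP"): 1.0, ("PRP", "MD"): 0.8, ("PRP", "NN"): 0.2,
         ("MD", "NN"): 0.4, ("MD", "VB"): 0.6,
         ("NN", "NN"): 0.3, ("NN", "VB"): 0.2}
emit = {("He", "PRP"): 0.3, ("will", "MD"): 0.8, ("will", "NN"): 0.2,
        ("race", "NN"): 0.4, ("race", "VB"): 0.6}

def score(tags):
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= trans.get((prev, t), 0.0) * emit.get((w, t), 0.0)
        prev = t
    return p

best = max(product(*(candidates[w] for w in words)), key=score)
print(best, score(best))  # ('PRP', 'MD', 'VB') with these toy numbers
```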
6. Statistical Tagging (HMM Part-of-speech tagging)
POS Tagging Based on Bigrams
(a) Transition probabilities: P(T1….Tn) ≅ Πi=1, n P(Ti|Ti-1)
Example: He will race
Choices for T=T1..T3
- T= PRP MD NN
- T= PRP NN NN
- T = PRP MD VB
- T = PRP NN VB
POS bigram probabilities from a training corpus can be used for P(T), e.g.:
P(PRP-MD-NN) = P(PRP|start) * P(MD|PRP) * P(NN|MD) = 1 * .8 * .4 = .32
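As a quick check of that product:

```python
# Transition probability of the tag sequence PRP MD NN,
# using the bigram values quoted above.
p = 1.0 * 0.8 * 0.4   # P(PRP|start) * P(MD|PRP) * P(NN|MD)
print(p)              # 0.32 (up to floating-point rounding)
```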
6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)
(a) Transition probabilities
From the training corpus, we need to find the T which maximizes
Πi=1, n P(Wi| Ti) * P(Ti| Ti-1)
So we'll also need to factor in the lexical generation (emission)
probabilities:
Choices for T=T1..T3
- T= PRP MD NN
- T= PRP NN NN
- T = PRP MD VB
- T = PRP NN VB
6. Statistical Tagging (HMM Part-of-speech tagging) (Cont…)
(b) Adding Emission probabilities
7. HMM Part-of-speech tagging (Tag Transition Probability)
HMM part-of-speech tagging uses two kinds of probabilities:
(a) Tag transition probabilities
(b) Word likelihood probabilities
(a) The tag transition probabilities, P(ti|ti−1), represent the probability of a tag
given the previous tag, and are computed from counts of tag-and-tag combinations.
(b) The word likelihood probabilities represent the probability of a word given its
tag, e.g.:
P(is|VBZ) = C(VBZ, is) / C(VBZ) = 10073 / 21627 = .47
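As a sketch, this maximum-likelihood estimate is just a ratio of corpus counts:

```python
# Maximum-likelihood estimate of a word-likelihood probability
# from corpus counts, as in the P(is|VBZ) example above.
def likelihood(count_tag_word, count_tag):
    return count_tag_word / count_tag

print(round(likelihood(10073, 21627), 2))  # 0.47
```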
7. HMM Part-of-speech tagging (Example)
Example of tagging the word race (as in to race) as VB as well as NN.
8. Formalizing Hidden Markov Model Taggers
The HMM is an extension of the finite automaton. A finite automaton is defined by a set of states, and a set of
transitions between states that are taken based on the input observations.
A weighted finite-state automaton is a simple augmentation of the finite automaton in which each arc is
associated with a probability, indicating how likely that path is to be taken. The probability on all the arcs
leaving a node must sum to 1.
A Markov chain is a special case of a weighted automaton in which the input sequence uniquely determines
which states the automaton will go through. Because it cannot represent inherently ambiguous problems, a
Markov chain is only useful for assigning probabilities to unambiguous sequences. While the Markov chain
is appropriate for situations where we can see the actual conditioning events, it is not appropriate in part-of-
speech tagging. This is because in part-of-speech tagging, while we observe the words in the input, we do not
observe the part-of-speech tags.
Thus we can’t condition any probabilities on, say, a previous part-of-speech tag, because we cannot be
completely certain exactly which tag applied to the previous word.
A Hidden Markov Model (HMM) allows us to talk about both observed events (like the words that we
see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our
probabilistic model. An HMM is specified by the following components:
8. Formalizing Hidden Markov Model Taggers (Variable definitions)
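Following the standard textbook formalization (Jurafsky & Martin), an HMM for tagging is specified by:
- Q = q1, q2, …, qN: a set of N states (here, the tags);
- A = a11, a12, …, aNN: a transition probability matrix, where aij is the probability of moving from state i to state j, with Σj aij = 1 for all i;
- O = o1, o2, …, oT: a sequence of T observations (here, the words);
- B = bi(ot): a sequence of observation likelihoods (emission probabilities), each giving the probability of observation ot being generated from state i;
- q0, qF: special start and end states that are not associated with observations.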
8. Formalizing Hidden Markov Model Taggers (Apply Chain rule)
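Applying Bayes' rule, the chain rule, and the two independence assumptions from Section 6 gives the standard derivation (following Jurafsky & Martin; t1..tn are tags and w1..wn are words):

```latex
\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
            = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)}
            = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)
            \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```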
8. Formalizing Hidden Markov Model Taggers (Example)
• Example of “Ali is Intelligent”.
[Figure: HMM state diagram with hidden states NNP, VB, and TO, plus Start and End states; the observed word Ali is attached to the NNP state, and the arcs a01, a02, a03, a11, a12, a13, a14, a31, a32, a33, a34 are the tag transition probabilities aij between states.]
8. Formalizing Hidden Markov Model Taggers (Class Participation)
Apply the single chain rule of HMM taggers to the following NLP sentences:
• Secretariat is expected to race tomorrow.
• is Secretariat expected to race tomorrow.
• expected Secretariat is to race tomorrow.
• to Secretariat is expected race tomorrow.
• race Secretariat is expected to tomorrow.
• tomorrow Secretariat is expected to race .
9. The Viterbi Algorithm for HMM Tagging
For any model, such as an HMM, that contains hidden variables,
- the task of determining which sequence of variables is the underlying source of some
sequence of observations is called the decoding task.
The Viterbi algorithm is perhaps the most common decoding algorithm used for HMMs,
whether for part-of-speech tagging or for speech recognition.
- looks a lot like the minimum edit distance algorithm.
The slightly simplified version of the Viterbi algorithm that we will present takes as
input a single HMM and
- a set of observed words O = (o1 o2 o3 . . . oT ), and
- returns the most probable state/tag sequence Q = (q1 q2 q3 . . . qT), together with its
probability.
Let the HMM be defined by two tables (next slide). The first table expresses the aij probabilities,
- the transition probabilities between hidden states (i.e., part-of-speech tags).
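A compact Python sketch of this decoder (the toy transition table a and observation-likelihood table b below are illustrative stand-ins, not the actual tables from the slides):

```python
# Viterbi decoding: given transition probabilities a[prev][cur] (with a '<s>'
# start state) and observation likelihoods b[tag][word], return the most
# probable tag sequence for the observed words, together with its probability.
def viterbi(words, tags, a, b):
    # V[t][tag] = (best prob. of any sequence ending in 'tag' at step t, backpointer)
    V = [{tag: (a["<s>"].get(tag, 0.0) * b[tag].get(words[0], 0.0), None)
          for tag in tags}]
    for t in range(1, len(words)):
        col = {}
        for tag in tags:
            prob, prev = max(
                (V[t - 1][p][0] * a[p].get(tag, 0.0) * b[tag].get(words[t], 0.0), p)
                for p in tags)
            col[tag] = (prob, prev)
        V.append(col)
    best_prob, last = max((V[-1][tag][0], tag) for tag in tags)
    path = [last]
    for t in range(len(words) - 1, 0, -1):      # follow backpointers
        last = V[t][last][1]
        path.append(last)
    return list(reversed(path)), best_prob

# Toy run on "He will race" (illustrative numbers only):
a = {"<s>": {"PRP": 1.0}, "PRP": {"MD": 0.8, "NN": 0.2},
     "MD": {"NN": 0.4, "VB": 0.6}, "NN": {"NN": 0.3, "VB": 0.2}, "VB": {}}
b = {"PRP": {"He": 0.3}, "MD": {"will": 0.8},
     "NN": {"will": 0.2, "race": 0.4}, "VB": {"race": 0.6}}
print(viterbi(["He", "will", "race"], ["PRP", "MD", "NN", "VB"], a, b))
# -> (['PRP', 'MD', 'VB'], ≈0.06912)
```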
9. The Viterbi Algorithm for HMM Tagging (Cont…)
[Table: the aij tag transition probabilities between hidden states.]
9. The Viterbi Algorithm for HMM Tagging (Cont…)
The figure expresses the bi(ot) probabilities, the observation likelihoods of words given tags.