
Part-of-Speech Tagging

The process of assigning a part-of-speech to each word in a sentence

WORDS: heat water in a large vessel
TAGS:  { N, V, P, DET, ADJ }

Example

Word     Tag     (other possible tags)
heat     verb    (noun)
water    noun    (verb)
in       prep    (noun, adv)
a        det     (noun)
large    adj     (noun)
vessel   noun

What is POS tagging good for?


- Useful in Information Retrieval
- Text-to-Speech: object(N) vs. object(V); discount(N) vs. discount(V)
- Word Sense Disambiguation
- Useful as a preprocessing step for parsing: a unique tag for each word reduces the number of parses

Choosing a tagset
- Need to choose a standard set of tags to do POS tagging: one tag for each part of speech
- Could pick a very coarse tagset: N, V, Adj, Adv, Prep
- A more commonly used set is finer-grained, e.g., the UPenn TreeBank II tagset, which has 36 word tags: PRP, PRP$, VBG, VBD, JJR, JJS, ... (it also has tags for phrases)
- Even more finely-grained tagsets exist

Why is POS tagging hard?


Ambiguity:
- Plants/N need light and water. vs. Each one plant/V one.
- Flies like a flower
  Flies: noun or verb?
  like: preposition, adverb, conjunction, noun, or verb?
  a: article, noun, or preposition?
  flower: noun or verb?

Methods for POS tagging


- Rule-based POS tagging, e.g., ENGTWOL [Voutilainen, 1995]: a large collection (> 1000) of constraints on what sequences of tags are allowable
- Transformation-based tagging, e.g., Brill's tagger [Brill, 1995]: sorry, I don't know anything about this
- Stochastic (probabilistic) tagging, e.g., TnT [Brants, 2000]: I'll discuss this in a bit more detail

Stochastic Tagging
- Based on the probability of a certain tag occurring, given various possibilities
- Necessitates a training corpus: a collection of sentences that have already been tagged
- Several such corpora exist; one of the best known is the Brown University Standard Corpus of Present-Day American English (or just the Brown Corpus): about 1,000,000 words from a wide variety of sources, with a POS tag assigned to each
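As a concrete illustration (not in the original slides), the Brown Corpus ships with the NLTK toolkit. A minimal sketch of loading its tagged sentences, assuming nltk is installed and the corpus data has been downloaded:

```python
# Minimal sketch: load the tagged Brown Corpus via NLTK.
# Assumes nltk is installed (pip install nltk) and the corpus data
# has been fetched once with nltk.download('brown').
from nltk.corpus import brown

tagged_sents = brown.tagged_sents()   # sentences as lists of (word, tag) pairs
print(len(tagged_sents))              # roughly 57,000 sentences (~1.1M words)
print(tagged_sents[0][:5])            # first few (word, tag) pairs
```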

Approach 1
- Assign each word its most likely POS tag
- If w has tags t1, ..., tk, then use
  P(ti | w) = c(w, ti) / (c(w, t1) + ... + c(w, tk)),
  where c(w, ti) = the number of times w/ti appears in the corpus
- Success rate: about 91% for English
- Example: heat :: noun/89, verb/5
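A minimal sketch of this baseline, assuming the corpus is a list of sentences of (word, tag) pairs as loaded above; the function names and the unknown-word fallback are illustrative choices, not part of the slides:

```python
# Minimal sketch of Approach 1 (most-likely-tag baseline).
from collections import Counter, defaultdict

def train_counts(tagged_sents):
    """Collect c(w, t): how often word w appears with tag t in the corpus."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return counts

def most_likely_tag(counts, word, default="NN"):
    """Return argmax_t P(t | w) = c(w, t) / (c(w, t1) + ... + c(w, tk))."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return default  # unseen word: fall back to a common open-class tag (assumption)

# Usage:
# counts = train_counts(tagged_sents)
# most_likely_tag(counts, "heat")   # e.g., 'noun' tag if heat/noun dominates
```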

Approach 2
- Given: a sequence of words W = w1, w2, ..., wn (a sentence), e.g., W = heat water in a large vessel
- Assign a sequence of tags T = t1, t2, ..., tn
- Find the T that maximizes P(T | W)

Practical Statistical Tagger


- By Bayes' rule, P(T | W) = P(W | T) P(T) / P(W). Since P(W) is fixed for a given sentence, it suffices to find the T that maximizes P(W | T) P(T)
- Chain rule: P(T) = P(t1) P(t2 | t1) P(t3 | t1, t2) ... P(tn | t1, ..., tn-1)
- As an approximation (a tag-bigram model), use P(T) ≈ P(t1) P(t2 | t1) P(t3 | t2) ... P(tn | tn-1)
- Assume each word depends only on its own POS tag: given its POS tag, it is conditionally independent of the other words around it. Then P(W | T) = P(w1 | t1) P(w2 | t2) ... P(wn | tn)
- So P(T) P(W | T) ≈ P(t1) P(t2 | t1) ... P(tn | tn-1) P(w1 | t1) P(w2 | t2) ... P(wn | tn)
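The slides give the factored objective but not an algorithm for maximizing it over all tag sequences. A standard choice is the Viterbi algorithm; below is a minimal sketch, assuming the probabilities come as nested dicts (trans_p[prev][t] for P(t | prev), emit_p[t][w] for P(w | t), start_p[t] for P(t1)). All names are illustrative:

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Find argmax over T of P(t1) P(w1|t1) * prod_i P(ti|ti-1) P(wi|ti)."""
    # best[i][t] = probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: start_p.get(t, 0.0) * emit_p.get(t, {}).get(words[0], 0.0)
             for t in tags}]
    back = [{}]  # backpointers for recovering the best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Best previous tag p for current tag t
            prev, score = max(
                ((p, best[i - 1][p] * trans_p.get(p, {}).get(t, 0.0)) for p in tags),
                key=lambda x: x[1])
            best[i][t] = score * emit_p.get(t, {}).get(words[i], 0.0)
            back[i][t] = prev
    # Trace back from the best final tag
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))
```

In practice one works with log-probabilities to avoid numerical underflow and smooths the estimates so that unseen words and tag bigrams do not zero out every path; both refinements are omitted here for brevity.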

Getting the Conditional Probabilities


Want to compute
P(T) P(W|T) ≈ P(t1) P(t2|t1) ... P(tn|tn-1) P(w1|t1) P(w2|t2) ... P(wn|tn)

Let
  c(ti) = frequency of ti in the corpus
  c(wi, ti) = frequency of wi/ti in the corpus
  c(ti-1, ti) = frequency of the tag bigram ti-1 ti in the corpus

Then we can use the relative-frequency estimates
  P(ti | ti-1) = c(ti-1, ti) / c(ti-1)
  P(wi | ti) = c(wi, ti) / c(ti)
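A minimal sketch of these count-based estimates, under the same assumed tagged_sents format; the nested-dict output is shaped to match the trans_p/emit_p arguments of the viterbi() sketch above. No smoothing is applied, so unseen events get probability zero:

```python
# Minimal sketch of the relative-frequency estimates above.
from collections import Counter, defaultdict

def estimate(tagged_sents):
    """Compute trans[p][t] = c(p, t) / c(p) and emit[t][w] = c(w, t) / c(t)."""
    tag_count = Counter()              # c(ti)
    word_tag = defaultdict(Counter)    # c(wi, ti), keyed by tag
    tag_bigram = defaultdict(Counter)  # c(ti-1, ti), keyed by previous tag
    for sent in tagged_sents:
        prev = None
        for word, tag in sent:
            tag_count[tag] += 1
            word_tag[tag][word] += 1
            if prev is not None:
                tag_bigram[prev][tag] += 1
            prev = tag
    trans = {p: {t: c / tag_count[p] for t, c in nxt.items()}
             for p, nxt in tag_bigram.items()}
    emit = {t: {w: c / tag_count[t] for w, c in ws.items()}
            for t, ws in word_tag.items()}
    return trans, emit
```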

Example
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
  to/TO race/???
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
  the/DT race/???

For each word wi, ti = argmax_t P(t | ti-1) P(wi | t)
Here: max( P(VB|TO) P(race|VB), P(NN|TO) P(race|NN) )

From the Brown corpus:
  P(NN|TO) = .021    P(race|NN) = .00041
  P(VB|TO) = .34     P(race|VB) = .00003

So:
  P(NN|TO) P(race|NN) = .021 × .00041 ≈ .0000086
  P(VB|TO) P(race|VB) = .34 × .00003 ≈ .00001
Since .00001 > .0000086, race after to/TO is tagged VB.
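A quick check of the arithmetic, using the Brown-corpus values quoted above:

```python
# Disambiguating race after to/TO: compare P(t|TO) * P(race|t) for t in {NN, VB}.
p_nn = 0.021 * 0.00041  # P(NN|TO) * P(race|NN) ≈ 8.6e-06
p_vb = 0.34 * 0.00003   # P(VB|TO) * P(race|VB) ≈ 1.0e-05
print("race ->", "NN" if p_nn > p_vb else "VB")  # race -> VB
```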

UPenn TreeBank II word tags


CC - Coordinating conjunction
CD - Cardinal number
DT - Determiner
EX - Existential there
FW - Foreign word
IN - Preposition or subordinating conjunction
JJ - Adjective
JJR - Adjective, comparative
JJS - Adjective, superlative
LS - List item marker
MD - Modal
NN - Noun, singular or mass
NNS - Noun, plural
NNP - Proper noun, singular
NNPS - Proper noun, plural
PDT - Predeterminer
POS - Possessive ending
PRP - Personal pronoun
PRP$ - Possessive pronoun
RB - Adverb
RBR - Adverb, comparative
RBS - Adverb, superlative
RP - Particle
SYM - Symbol
TO - to
UH - Interjection
VB - Verb, base form
VBD - Verb, past tense
VBG - Verb, gerund or present participle
VBN - Verb, past participle
VBP - Verb, non-3rd person singular present
VBZ - Verb, 3rd person singular present
WDT - Wh-determiner
WP - Wh-pronoun
WP$ - Possessive wh-pronoun
WRB - Wh-adverb
