
Natural Language Processing - Kai-Wei Chang - 09: POS (CuuDuongThanCong.com)

This document discusses part-of-speech (POS) tagging. It begins by describing the traditional parts of speech, such as nouns, verbs, adjectives, and adverbs. It then discusses POS tagsets and defines tagging as the process of assigning a part-of-speech tag to each word. POS tagging is useful for tasks such as parsing, information extraction, speech recognition, and machine translation. The document also discusses open and closed word classes and gives examples of tags from common tagsets such as the Penn Treebank. It concludes by noting that POS tagging is evaluated by the percentage of words in a test set that are tagged correctly.


Lecture 9: Part of Speech

Kai-Wei Chang
CS @ University of Virginia
[email protected]

Course webpage: http://kwchang.net/teaching/NLP16

CS6501 Natural Language Processing 1


CuuDuongThanCong.com https://fb.com/tailieudientucntt
This lecture

• Parts of speech (POS)
• POS tagsets

Parts of Speech
• Traditional parts of speech
  • About 8 of them

POS examples

Tag   Part of speech   Examples
N     noun             chair, bandwidth, pacing
V     verb             study, debate, munch
ADJ   adjective        purple, tall, ridiculous
ADV   adverb           unfortunately, slowly
P     preposition      of, by, to
PRO   pronoun          I, me, mine
DET   determiner       the, a, that, those

Parts of Speech
• A.k.a. parts-of-speech, lexical categories, word classes, morphological classes, lexical tags...
• Lots of debate within linguistics about the number, nature, and universality of these

POS Tagging
• The process of assigning a part-of-speech tag to each word in a collection (e.g., a sentence).

WORD    TAG
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
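Tagging maps a sequence of n words to a sequence of n tags, one per word. The minimal Python sketch below illustrates only that input/output contract: the lexicon `LEXICON` and the function `tag_sentence` are hypothetical, cover just the example sentence above, and ignore context, which a real tagger must use to resolve ambiguous words.

```python
# Hypothetical toy lexicon covering only the example sentence above.
LEXICON = {
    "the": "DET",
    "koala": "N",
    "put": "V",
    "keys": "N",
    "on": "P",
    "table": "N",
}


def tag_sentence(words: list[str], unknown: str = "UNK") -> list[tuple[str, str]]:
    """Assign exactly one tag to each word by dictionary lookup (no context used)."""
    return [(w, LEXICON.get(w.lower(), unknown)) for w in words]


sentence = "the koala put the keys on the table".split()
tagged = tag_sentence(sentence)
for word, tag in tagged:
    print(f"{word}\t{tag}")
```

The output always has the same length as the input, which is what makes tagging a per-token labeling problem.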
Why is POS Tagging Useful?
• First step in a vast number of practical tasks
  • Parsing
    • Need to know whether a word is an N or a V before you can parse
  • Information extraction
    • Finding names, relations, etc.
  • Speech synthesis/recognition
    • OBject vs. obJECT
    • OVERflow vs. overFLOW
    • DIScount vs. disCOUNT
    • CONtent vs. conTENT
  • Machine translation
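For speech synthesis, the tag can select the pronunciation of a word whose stress depends on its part of speech. The sketch below is a hypothetical illustration: the table `STRESS` and the function `pronounce` are invented for this example, cover only the noun/verb pairs object, overflow, and discount from the list above, and write the stressed syllable in capitals as the slide does.

```python
# Hypothetical pronunciation table keyed by (word, coarse POS).
# The stressed syllable is written in capitals, as on the slide.
STRESS = {
    ("object", "N"): "OBject",
    ("object", "V"): "obJECT",
    ("overflow", "N"): "OVERflow",
    ("overflow", "V"): "overFLOW",
    ("discount", "N"): "DIScount",
    ("discount", "V"): "disCOUNT",
}


def pronounce(word: str, pos: str) -> str:
    """Return the stress pattern for a word given its POS, or the word unchanged."""
    return STRESS.get((word.lower(), pos), word)


print(pronounce("object", "N"))  # OBject
print(pronounce("object", "V"))  # obJECT
```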

Open and Closed Classes
• Closed class: a small, fixed membership
  • Prepositions: of, in, by, …
  • Pronouns: I, you, she, mine, his, them, …
  • Usually function words (short, common words that play a role in grammar)
• Open class: new members can be created
  • English has four: nouns, verbs, adjectives, adverbs
  • Many languages have these four, but not all!

Open Class Words

• Nouns
  • Proper nouns (Boulder, Granby, Eli Manning)
  • Common nouns (the rest)
    • Count nouns and mass nouns
      • Count: have plurals, get counted: goat/goats, one goat, two goats
      • Mass: don’t get counted (snow, salt, communism) (*two snows)
• Verbs
  • In English, have morphological affixes (eat/eats/eaten)

Closed Class Words
Examples:
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, who, I, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …
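Because closed classes have a small, fixed membership, they can be listed outright. This Python sketch stores only the example words from the list above, so it is far from complete; `CLOSED_CLASSES` and `closed_classes` are hypothetical names. Note that "on" comes back as both a preposition and a particle, so list membership alone cannot settle its tag.

```python
# Closed-class word lists copied from the examples above (not exhaustive).
CLOSED_CLASSES = {
    "preposition": {"on", "under", "over"},
    "particle": {"up", "down", "on", "off"},
    "determiner": {"a", "an", "the"},
    "pronoun": {"she", "who", "i"},
    "conjunction": {"and", "but", "or"},
    "auxiliary": {"can", "may", "should"},
    "numeral": {"one", "two", "three", "third"},
}


def closed_classes(word: str) -> list[str]:
    """Return every closed class whose list contains the word.

    An empty result means the word is in none of these toy lists, so under
    this simplified model it is presumed to belong to an open class.
    """
    w = word.lower()
    return sorted(name for name, members in CLOSED_CLASSES.items() if w in members)


print(closed_classes("on"))    # ['particle', 'preposition']
print(closed_classes("goat"))  # []
```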

Prepositions from CELEX

CELEX: online dictionary


Frequency counts are from the COBUILD 16-million-word corpus

English Particles

Conjunctions

Choosing a Tagset

• Could pick very coarse tagsets
  • N, V, Adj, Adv, Other
• More commonly used sets are finer-grained
  • E.g., the Penn Treebank tagset, 45 tags: PRP$, WRB, WP$, VBG
  • Brown corpus, 87 tags
  • Prague Dependency Treebank (Czech)
    • 4452 tags
    • AAFP3----3N---- (nejnezajímavějším): Adj Regular Feminine Plural … Superlative [Hajic 2006, VMC tutorial]
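A fine-grained tagset like the Czech one packs many morphological features into a single positional tag, one character per feature. The sketch below decodes the example tag, assuming the 15-position PDT layout (POS, SubPOS, Gender, Number, Case, PossGender, PossNumber, Person, Tense, Grade, Negation, Voice, two reserved slots, Var); the value labels cover only the characters in this one tag, and `decode_pdt_tag` is a hypothetical helper, not part of any PDT tool.

```python
# Position names assume the 15-position PDT layout.
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]

# Small subset of value labels, enough for the example tag only.
VALUES = {
    ("POS", "A"): "adjective",
    ("SubPOS", "A"): "regular",
    ("Gender", "F"): "feminine",
    ("Number", "P"): "plural",
    ("Case", "3"): "dative",
    ("Grade", "3"): "superlative",
    ("Negation", "N"): "negated",
}


def decode_pdt_tag(tag: str) -> dict[str, str]:
    """Split a positional tag into named fields, skipping unused '-' positions."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    fields = {}
    for name, char in zip(POSITIONS, tag):
        if char != "-":
            # Fall back to the raw character when no label is known.
            fields[name] = VALUES.get((name, char), char)
    return fields


print(decode_pdt_tag("AAFP3----3N----"))
```

Encoding features by position is what lets a tagset reach thousands of distinct tags while each feature stays individually readable.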

Penn TreeBank POS Tagset

Using the Penn Tagset

• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
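The word/TAG notation above is easy to read back into (word, tag) pairs. A minimal Python sketch, assuming whitespace-separated tokens: splitting on the last slash keeps tokens such as "./." and words that themselves contain a slash intact. The function name `parse_tagged` is hypothetical.

```python
from collections import Counter


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split 'word/TAG' tokens on the LAST slash, so words containing '/' survive."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"malformed token: {token!r}")
        pairs.append((word, tag))
    return pairs


example = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
           "of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(example)
print(pairs[:3])  # [('The', 'DT'), ('grand', 'JJ'), ('jury', 'NN')]
print(Counter(tag for _, tag in pairs))
```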

Universal Tagset

• 12 tags
  • NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, “.”, X
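The universal tags are coarse enough that fine-grained Penn tags can be collapsed into them. The table below is a partial mapping written for illustration, assuming the usual correspondences (e.g., all NN* tags to NOUN, IN to ADP, CC to CONJ); Penn tags it does not list fall back to X, which is a simplification and may not match published mapping files. `PENN_TO_UNIVERSAL` and `to_universal` are hypothetical names.

```python
# Partial Penn Treebank -> universal tag mapping (illustrative, not complete).
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV", "WRB": "ADV",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
    "DT": "DET", "PDT": "DET", "WDT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT", "TO": "PRT", "POS": "PRT",
    ".": ".", ",": ".", ":": ".",
}


def to_universal(penn_tag: str) -> str:
    """Map a Penn tag to a universal tag, falling back to X for unlisted tags."""
    return PENN_TO_UNIVERSAL.get(penn_tag, "X")


# Tags from the Penn example sentence ("The grand jury commented ...").
penn = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN", "IN", "JJ", "NNS", "."]
print([to_universal(t) for t in penn])
```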

POS Tagging vs. Word Clustering

• Words often have more than one POS, e.g., back:
  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB

These examples are from Dekang Lin.
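One way to see this ambiguity in data is to collect, for every word, the set of tags it appears with in a tagged corpus. The sketch below does that for the four "back" phrases; the tags on words other than "back" are an illustrative Penn-style annotation added here, not part of the original examples, and `tag_dictionary` is a hypothetical name.

```python
from collections import defaultdict

# The four "back" phrases; tags on the other words are illustrative additions.
CORPUS = [
    "The/DT back/JJ door/NN",
    "On/IN my/PRP$ back/NN",
    "Win/VB the/DT voters/NNS back/RB",
    "Promised/VBD to/TO back/VB the/DT bill/NN",
]


def tag_dictionary(sentences: list[str]) -> dict[str, set[str]]:
    """Collect the set of tags observed for each lowercased word."""
    tags: dict[str, set[str]] = defaultdict(set)
    for sentence in sentences:
        for token in sentence.split():
            word, _, tag = token.rpartition("/")
            tags[word.lower()].add(tag)
    return dict(tags)


lexicon = tag_dictionary(CORPUS)
ambiguous = sorted(w for w, ts in lexicon.items() if len(ts) > 1)
print(sorted(lexicon["back"]))  # ['JJ', 'NN', 'RB', 'VB']
print(ambiguous)                # ['back']
```

A single word type with four tags is exactly why a tagger cannot rely on the word alone, unlike clustering that assigns each word type to one group.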
How Hard is POS Tagging?

POS tag sequences

• Some tag sequences are more likely to occur than others
• POS Ngram view: https://books.google.com/ngrams/graph?content=_ADJ_+_NOUN_%2C_ADV_+_NOUN_%2C+_ADV_+_VERB_

Existing methods often model POS tagging as a sequence tagging problem.
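The claim that some tag sequences are more likely than others can be made concrete with tag bigram statistics. A minimal sketch, assuming maximum-likelihood estimates of P(tag_i | tag_{i-1}) with a start symbol "<s>": the first toy sequence copies the tags of the koala sentence, and the second is a hypothetical addition. `bigram_probs` is a hypothetical name.

```python
from collections import Counter

# Toy tag sequences: the koala sentence's tags, plus one hypothetical sequence.
TAG_SEQUENCES = [
    ["DET", "N", "V", "DET", "N", "P", "DET", "N"],
    ["DET", "ADJ", "N", "V", "ADV"],
]


def bigram_probs(sequences: list[list[str]]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood estimates of P(tag_i | tag_{i-1}), with <s> as start."""
    bigrams: Counter = Counter()
    history: Counter = Counter()
    for tags in sequences:
        padded = ["<s>"] + tags
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1  # count prev only when it has a successor
    return {(prev, cur): n / history[prev] for (prev, cur), n in bigrams.items()}


probs = bigram_probs(TAG_SEQUENCES)
print(probs[("DET", "N")])    # 0.75
print(probs[("DET", "ADJ")])  # 0.25
```

Even on this tiny sample, DET is followed by N far more often than by ADJ; sequence taggers exploit exactly this kind of regularity.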

Evaluation

• How many words in the unseen test data can be tagged correctly?
• Usually evaluated on the Penn Treebank
• State of the art: ~97%
• Trivial baseline (most likely tag): ~94%
• Human performance: ~97%
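Accuracy is the fraction of test tokens whose predicted tag matches the gold tag, and the trivial baseline tags each word with the tag it carried most often in training. The sketch below computes both on a hypothetical toy dataset (not Penn Treebank data, so the number it prints says nothing about the ~94% figure); the fallback tag NN for unseen words is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(train: list[tuple[str, str]]) -> dict[str, str]:
    """For each word, keep the tag it carries most often in the training data."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for word, tag in train:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy data, not real Penn Treebank text.
train = [("the", "DT"), ("back", "NN"), ("back", "NN"), ("back", "RB"), ("door", "NN")]
test = [("the", "DT"), ("back", "RB"), ("door", "NN"), ("koala", "NN")]

model = train_most_frequent_tag(train)
UNSEEN_TAG = "NN"  # assumption: unseen words default to a common open-class tag
predicted = [model.get(w.lower(), UNSEEN_TAG) for w, _ in test]
gold = [t for _, t in test]
print(accuracy(predicted, gold))  # 0.75
```

The baseline misses "back" used as RB because it always picks the word's majority tag; closing that gap requires using context.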
