NLP Unit 3
SEMANTIC ANALYSIS
TYPES OF AMBIGUITY
• Lexical Ambiguity results when a word has more than one possible meaning. For example, “board” can be the verb “to board” (to get on) or a noun referring to a flat slab of wood.
• Syntactic Ambiguity is present when more than one parse of a sentence exists.
Eg. “He lifted the branch with the red leaf.”
The verb phrase may contain “with the red leaf” as part of the embedded noun phrase describing the branch, or “with the red leaf” may be interpreted as a prepositional phrase describing the action instead of the branch, implying that he used the red leaf to lift the branch.
• Semantic Ambiguity exists when more than one possible meaning exists for a sentence, as in “He lifted the branch with the red leaf.” It may mean that the person in question used a red leaf to lift the branch, or that he lifted a branch that had a red leaf on it.
• Referential Ambiguity is the result of referring to something without explicitly naming it, by using words like “it”, “he” and “they.” These words require the target to be looked up and may be impossible to resolve, as in the sentence “The interface sent the peripheral device data, which caused it to break,” where “it” could mean the peripheral device, the data, or the interface.
RELATIONS BETWEEN WORDS
• Hyponymy
It is the relationship between a generic term and instances of that generic term. The generic term is called the hypernym and its instances are called hyponyms.
Eg. the word “color” is a hypernym, and the colors blue, yellow, etc. are its hyponyms.
• Homonymy
It is defined as the relation between words having the same spelling or the same form but different and unrelated meanings.
Eg. the word “bat” is a homonym, because a bat can be an implement used to hit a ball, or a nocturnal flying mammal.
• Polysemy
Polysemy means “many signs”. It refers to words with different but related senses: a polysemous word has the same spelling but several related meanings.
Eg. the word “bank” is a polysemous word having the following meanings −
• A financial institution.
• The building in which such an institution is located.
• A synonym for “to rely on” (as in “bank on”).
• Synonymy
It is the relation between two lexical items having different forms but expressing the same or a closely related meaning. Examples are ‘author/writer’, ‘fate/destiny’.
• Antonymy
It is the relation between two lexical items whose semantic components are symmetric relative to an axis. The scope of antonymy is as follows −
• Application of a property or not − Example is ‘life/death’, ‘certitude/incertitude’
• Application of a scalable property − Example is ‘rich/poor’, ‘hot/cold’
• Application of a usage − Example is ‘father/son’, ‘moon/sun’
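The relations above can also be explored programmatically. As an illustration (not from the slides), the sketch below uses NLTK's WordNet interface; it assumes NLTK is installed and the WordNet corpus has been downloaded with nltk.download('wordnet').

from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

# Hyponymy / hypernymy: "color" as a generic term and its instances
color = wn.synsets('color', pos=wn.NOUN)[0]
print([s.name() for s in color.hyponyms()][:5])   # specific colors (hyponyms)
print([s.name() for s in color.hypernyms()])      # more generic terms (hypernyms)

# Synonymy: lemmas sharing a synset are synonyms
print(wn.synsets('author')[0].lemma_names())

# Antonymy: stored on lemmas, e.g. rich/poor
print([a.name() for l in wn.lemmas('rich') for a in l.antonyms()])

# Homonymy / polysemy: one spelling, several senses of "bank"
for s in wn.synsets('bank')[:3]:
    print(s.name(), '-', s.definition())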
BUILDING BLOCKS OF A SEMANTIC SYSTEM
HOW TO REPRESENT MEANING
• Semantic analysis uses the following approaches for the representation of meaning −
• First order predicate logic (FOPL)
• Semantic Nets
• Frames
• Conceptual dependency (CD)
• Rule-based architecture
• Case Grammar
• Conceptual Graphs
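As a brief illustration of the first approach (an added example, not from the slides), the sentence “Every student reads a book” can be written in FOPL as:

∀x (student(x) → ∃y (book(y) ∧ reads(x, y)))

A second reading, in which one particular book is read by every student, swaps the order of the two quantifiers: ∃y (book(y) ∧ ∀x (student(x) → reads(x, y))).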
SYNTAX DRIVEN SEMANTIC ANALYZER
• Establish an analogy with syntax-directed translation (see Fig. 18.4).
LEXICAL SEMANTICS
• The first part of semantic analysis, representing the meaning of individual words, is called lexical semantics. It includes words, sub-words, affixes (sub-units), compound words and phrases. All the words, sub-words, etc. are collectively called lexical items.
ENTITY AND EVENT RESOLUTION
COREFERENCE RESOLUTION
Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is useful for downstream tasks such as document summarization, question answering, and information extraction.
PREDICATE ARGUMENT STRUCTURE IN NLP?
WORD SENSE DISAMBIGUATION (WSD)
• Words have different meanings depending on the context of their usage in a sentence. Word sense disambiguation, in natural language processing (NLP), is the process of deciding which meaning of a word is activated by its use in a particular context.
• Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.
• Part-of-speech (POS) taggers with a high level of accuracy can resolve a word’s syntactic ambiguity.
• The problem of resolving semantic ambiguity is called WSD (word sense disambiguation) – the task of selecting the correct sense for a word. Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
Eg. I can hear bass sound.
He likes to eat grilled bass.
After disambiguation with WSD, the correct meanings are assigned as follows:
I can hear bass/frequency sound.
He likes to eat grilled bass/fish.
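One classic dictionary-based way to make this assignment is the Lesk algorithm. The sketch below (an added illustration, not part of the slides) uses NLTK's simplified Lesk implementation; it assumes NLTK, the punkt tokenizer and WordNet are installed, and the exact senses returned can vary since simplified Lesk is only a heuristic.

from nltk.tokenize import word_tokenize   # requires nltk.download('punkt')
from nltk.wsd import lesk                  # requires nltk.download('wordnet')

# Two contexts for the ambiguous word "bass"
sent1 = word_tokenize("I can hear bass sound")
sent2 = word_tokenize("He likes to eat grilled bass")

# lesk() returns the WordNet synset whose gloss overlaps most with the context
print(lesk(sent1, 'bass'))               # expected: a frequency-related sense
print(lesk(sent2, 'bass', pos='n'))      # expected: the fish sense
print(lesk(sent2, 'bass', pos='n').definition())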
– Sita has a strong interest in computational linguistics.
– Sita pays a large amount of interest on her credit card.
• Applications of WSD
- question answering
- information retrieval
- text classification
• A WSD system / algorithm takes as input a word in context along with a fixed inventory (storage / dictionary) of potential word senses, and returns as output the correct word sense for that use.
• The inventory of senses depends on the task at hand –
- For machine translation from English to Spanish, the sense inventory for an English word can be its set of possible Spanish translations.
DIFFERENT APPROACHES TO WSD
• Dictionary and knowledge-based methods
These rely primarily on dictionaries, thesauri, and lexical knowledge bases, without using any corpus evidence.
• Supervised methods
These make use of sense-annotated corpora for training.
• Semi-supervised methods
These methods require a very small amount of annotated text and a large amount of plain unannotated text.
• Unsupervised methods
This approach works directly from raw, unannotated corpora.
SUPERVISED WORD SENSE DISAMBIGUATION
• For a lexical sample task, a small pre-selected set of target words is chosen along with an inventory of senses for each word.
• Because the set of words and the set of senses are small, supervised ML approaches are used for the lexical sample task.
• For each word, a number of instances (from the corpus) can be selected and hand-labeled with the correct sense of the target word in each.
• Classifiers can be trained with the labeled examples. Trained classifiers can then be used to label unlabeled occurrences of the target words.
• For the all-words task, systems are given an entire text and a lexicon with an inventory of senses for each entry, and are required to disambiguate every content word in the text.
This task is similar to POS tagging, but the scope is larger: the set of “tags” (senses) is much larger.
• Supervised WSD uses data hand-labeled with correct word senses.
• A supervised WSD approach therefore extracts features from the text that are helpful in predicting particular senses.
• Based on these features, a classifier is trained.
• This classifier is used to assign senses to unlabeled words in context.
• A commonly used classifier is Naïve Bayes.
• SemCor – a corpus with 234,000 examples manually sense-tagged (used for the all-words task).
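A minimal sketch of this pipeline (illustrative only: the tiny hand-made training set, the sense labels and the bag-of-words features are assumptions, not from the slides) could use scikit-learn's Naïve Bayes:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy hand-labeled contexts for the target word "bass" (labels are illustrative)
contexts = [
    "he played a deep bass line on stage",
    "turn up the bass on the speakers",
    "they caught a large bass in the lake",
    "grilled bass with lemon for dinner",
]
senses = ["music", "music", "fish", "fish"]

# Bag-of-words features from the surrounding context + Naive Bayes classifier
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(contexts, senses)

# Assign a sense to a new, unlabeled occurrence of the target word
print(clf.predict(["I can hear the bass sound from the subwoofer"]))  # likely 'music' on this toy data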
MEANING REPRESENTATION IN NLP
HOW TO REPRESENT THE MEANING OF A SENTENCE
VECTOR SEMANTICS AND EMBEDDINGS
• The following figure shows a visualization of embeddings learned for sentiment analysis, showing the location of selected words projected down from a 60-dimensional space into a two-dimensional space. Notice the distinct regions containing positive words, negative words, and neutral function words.
FEATURE EXTRACTION AND EMBEDDINGS IN NLP
1. ONE-HOT ENCODING:
• For better analysis of the text we want to process, we need a numerical representation of each word.
• We require a method that can control the size of the representation; one-hot encoding does this by mapping each word to a fixed-size vector with a single 1 in the position corresponding to that word.
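A minimal sketch of one-hot encoding over a made-up toy vocabulary:

import numpy as np

# Toy vocabulary; each word gets an index into a fixed-size vector
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Vector of zeros with a single 1 at the word's index
    vec = np.zeros(len(vocab), dtype=int)
    vec[word_to_index[word]] = 1
    return vec

print(one_hot("cat"))   # [0 1 0 0 0]
print(one_hot("mat"))   # [0 0 0 0 1]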
ANALOGIES, ETC
WORD2VEC
• These models work using context.
• This means that the embedding is learned by looking at nearby words; if a group of words is always found close to the same words, they will end up having similar embeddings.
• Thus, countries will be closely related, so will animals, and so on.
• To label how close words are to each other, we first set a window size.
CONSTRUCTING WORD PAIRS
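A hedged sketch of how (target, context) training pairs can be built from a tokenized sentence with a window size of 2 (the sentence and the helper function are made up for illustration):

def make_pairs(tokens, window=2):
    # For each position, pair the center word with every word
    # that falls inside the window on either side.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
print(make_pairs(sentence, window=2))
# e.g. ('quick', 'the'), ('quick', 'brown'), ('quick', 'fox'), ...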
TYPES OF ARCHITECTURES
• Skip-gram: works well with a small amount of training data and represents even rare words or phrases well.
• CBOW: several times faster to train than skip-gram, with slightly better accuracy for frequent words.
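A hedged usage sketch with the gensim library, where sg=1 selects skip-gram and sg=0 selects CBOW (parameter names follow gensim 4.x; older versions used e.g. size instead of vector_size):

from gensim.models import Word2Vec

# Tiny toy corpus: a list of tokenized sentences
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 -> skip-gram architecture, sg=0 -> CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cat"].shape)            # (50,) embedding vector
print(model.wv.most_similar("cat"))     # nearest words in the toy corpus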
CBOW
• As indicated in the figure, the context words are initially supplied as an input to an embedding layer.
• The word embeddings are then transferred to a lambda layer, where the word embeddings are averaged.
• The embeddings are then passed to a dense softmax layer, predicting our target word. We compute the loss after matching this with our target word and then run backpropagation with each epoch to update the embedding layer in the process.
• Once the training is complete, we may extract the embeddings of the required words from our embedding layer.
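A minimal Keras sketch of this CBOW architecture (the vocabulary size, embedding dimension and window are made-up placeholders, and preparation of the training pairs is omitted):

import tensorflow as tf

vocab_size, embed_dim, window = 5000, 100, 2          # assumed toy values

# Input: 2*window context word indices around the target word
context_in = tf.keras.Input(shape=(2 * window,))
emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(context_in)
# Lambda layer: average the context word embeddings
avg = tf.keras.layers.Lambda(lambda e: tf.reduce_mean(e, axis=1))(emb)
# Dense softmax layer: predict the target word over the vocabulary
out = tf.keras.layers.Dense(vocab_size, activation="softmax")(avg)

cbow = tf.keras.Model(context_in, out)
cbow.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
cbow.summary()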
SKIP-GRAM
• Individual embedding layers are passed both the target and context word pairs, yielding dense word embeddings for each of these two words.
• The dot product of these two embeddings is computed using a “merge” layer, and the dot product value is obtained.
• The value of the dot product is then transmitted to a dense sigmoid layer, which outputs 0 or 1.
• The output is compared to the actual value or label and the loss is calculated, then backpropagation is used to update the embedding layer at each epoch.
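A corresponding Keras sketch of this skip-gram set-up (again with assumed placeholder sizes; generating the positive and negative (target, context) pairs is omitted):

import tensorflow as tf

vocab_size, embed_dim = 5000, 100                     # assumed toy values

target_in = tf.keras.Input(shape=(1,))
context_in = tf.keras.Input(shape=(1,))

# Separate embedding layers for target and context words
target_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(target_in)
context_emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(context_in)

# Merge layer: dot product of the two embeddings
dot = tf.keras.layers.Dot(axes=-1)([target_emb, context_emb])
dot = tf.keras.layers.Flatten()(dot)

# Dense sigmoid layer: 1 if the pair really co-occurred, 0 otherwise
out = tf.keras.layers.Dense(1, activation="sigmoid")(dot)

skipgram = tf.keras.Model([target_in, context_in], out)
skipgram.compile(loss="binary_crossentropy", optimizer="adam")
skipgram.summary()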
DISCOURSE PROCESSING
COHERENCE
RELATIONSHIP BETWEEN ENTITIES
TEXT COHERENCE
1. RESULT
4. ELABORATION
DISCOURSE COHERENCE AND STRUCTURE
• For example, the following passage (about Rohit) can be represented as a hierarchical structure − [hierarchical structure figure omitted]
REFERENCE RESOLUTION
• Referent − It is the entity that is referred to. For example, in the last given example, Rohit is a referent.
• Corefer − When two expressions are used to refer to the same entity, they are called corefers. For example, Rohit and he are corefers.
TERMINOLOGY USED IN REFERENCE RESOLUTION (CONTD.)
• Antecedent − The term that licenses the use of another (referring) term. For example, Rohit is the antecedent of the reference he.
• Anaphora & anaphoric − Anaphora may be defined as reference to an entity that has been previously introduced into the discourse, and the referring expression is called anaphoric. [See next slide]
• Discourse model − The model that contains the representations of the entities that have been referred to in the discourse and the relationships they are engaged in.
ANAPHORA
• Anaphora refers to the use of a pronoun to refer back to a previously mentioned noun or noun phrase. For example, in the sentence “John went to the store. He bought some milk,” the pronoun “he” is used anaphorically to refer back to John.
Two reference resolution tasks:
1. COREFERENCE RESOLUTION
2. PRONOMINAL ANAPHORA RESOLUTION
1. COREFERENCE RESOLUTION
• It is the task of finding referring expressions in a text that refer to the same entity.
• It is the task of finding coreferring expressions.
• A set of coreferring expressions is called a coreference chain.
• For example, he, his, she, and it are referring expressions.
EG: COREFERENCE RESOLUTION
• CONSTRAINTS ON COREFERENCE RESOLUTION
TYPICAL ALGORITHM
1. A series of words that potentially refer to real-world entities is extracted. We call these words mentions.
2. For each mention and each pair of mentions, we compute a set of features. This is commonly done by averaging the word embeddings of the mention and its adjacent words to consider context information.
3. Then, we input these features into machine learning models to find the most likely antecedent for each mention (if there is one). A minimal sketch of steps 2–3 follows.
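An illustrative sketch of steps 2–3 (the tiny embeddings, the mentions and the cosine-similarity scoring are all made up; a real system would train the scorer on annotated coreference data):

import numpy as np

# Toy word embeddings (made up, 3-dimensional for readability)
emb = {
    "john":  np.array([0.9, 0.1, 0.0]),
    "he":    np.array([0.8, 0.2, 0.1]),
    "store": np.array([0.1, 0.9, 0.2]),
    "milk":  np.array([0.0, 0.8, 0.9]),
}

def mention_vector(words):
    # Step 2: represent a mention by averaging the embeddings of its words
    return np.mean([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mentions extracted from "John went to the store. He bought some milk."
mentions = {"John": ["john"], "the store": ["store"], "he": ["he"]}

# Step 3 (heuristic stand-in for a trained model): pick the most similar
# earlier mention as the antecedent of the pronoun "he"
pronoun = mention_vector(mentions["he"])
candidates = ["John", "the store"]
best = max(candidates, key=lambda m: cosine(pronoun, mention_vector(mentions[m])))
print("antecedent of 'he':", best)   # -> John (for these toy vectors)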
WHY COREFERENCE RESOLUTION?
2. PRONOMINAL ANAPHORA RESOLUTION
• Unlike coreference resolution, pronominal anaphora resolution may be defined as the task of finding the antecedent for a single pronoun.
END