Lecture 7: PPMI
Chapter 6:
Vector Semantics
Let's define words by their
usages
In particular, words are defined by their
environments (the words around them)
[Figure: 2D plot of words placed by the contexts they appear in; visible labels include not good, bad, dislike, worst, incredibly bad, worse, to, by, 's, that, now, are, a, i, you, than, with, is, and fool]
Vectors are the basis of
information retrieval
[Figure: the vector [1,1] plotted in a two-dimensional space with axes labeled 'digital' and 'data']
Visualizing cosines
(well, angles)
[Figure: the words apricot, digital, and information plotted as vectors, with Dimension 1 = 'large' and Dimension 2 = 'data', to visualize the angles between them]
Cosine for computing similarity (Sec. 6.3)

$$\cos(\vec{v},\vec{w}) = \frac{\vec{v}\cdot\vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\;\sqrt{\sum_{i=1}^{N} w_i^2}}$$

Term-context counts:

              large   data   computer
apricot         1      0        0
digital         0      1        2
information     1      6        1

Which pair of words is more similar?

$$\text{cosine}(\text{apricot},\text{information}) = \frac{1+0+0}{\sqrt{1+0+0}\,\sqrt{1+36+1}} = \frac{1}{\sqrt{38}} = .16$$

$$\text{cosine}(\text{digital},\text{information}) = \frac{0+6+2}{\sqrt{0+1+4}\,\sqrt{1+36+1}} = \frac{8}{\sqrt{5}\,\sqrt{38}} = .58$$

$$\text{cosine}(\text{apricot},\text{digital}) = \frac{0+0+0}{\sqrt{1+0+0}\,\sqrt{0+1+4}} = 0$$
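As a sanity check, here is a minimal Python sketch (not from the slides) that recomputes these cosines from the count table above:

```python
import math

# Term-context counts from the slide (contexts: large, data, computer)
vectors = {
    "apricot":     [1, 0, 0],
    "digital":     [0, 1, 2],
    "information": [1, 6, 1],
}

def cosine(v, w):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi ** 2 for vi in v))
    norm_w = math.sqrt(sum(wi ** 2 for wi in w))
    return dot / (norm_v * norm_w)

print(cosine(vectors["apricot"], vectors["information"]))  # ~0.16
print(cosine(vectors["digital"], vectors["information"]))  # ~0.58
print(cosine(vectors["apricot"], vectors["digital"]))      # 0.0
```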
But raw frequency is a bad
representation
Frequency is clearly useful; if sugar appears a lot
near apricot, that's useful information.
But overly frequent words like the, it, or they are not very informative about the context.
We need a function that resolves this frequency paradox!
tf-idf: combine two factors

tf: term frequency. frequency count (usually log-transformed):
$$\text{tf}_{t,d} = 1 + \log_{10}\text{count}(t,d) \text{ if count}(t,d) > 0, \text{ else } 0$$
idf: inverse document frequency:
$$\text{idf}_i = \log_{10}\frac{N}{\text{df}_i}, \text{ where } N = \text{total \# of docs and } \text{df}_i = \text{\# of docs that have word } i$$
Words like "the" or "good" appear in many documents and so have very low idf.
Example (toy document vector):

Dimension    I     Like  to    ride  cycle often Ali   and   Hassan ate   apple oranges
Raw Count    0     0     0     0     0     0     1     1     1      2     0     2
TF           0     0     0     0     0     0     1     1     1      1.3   0     1.3
IDF          0.48  0.48  0.48  0.48  0.48  0.48  0.18  0.48  0.48   0.18  0.18  0.18
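A minimal Python sketch of these two factors, using a hypothetical two-document toy corpus (the sentences are assumptions for illustration, so the numbers will not match the table above exactly):

```python
import math
from collections import Counter

# Hypothetical toy corpus (assumed for illustration; not the corpus behind the table above)
docs = [
    "I like to ride cycle often".lower().split(),
    "Ali and Hassan ate apple oranges".lower().split(),
]

def tf(count):
    """Log-transformed term frequency: 1 + log10(count), or 0 if the word is absent."""
    return 1 + math.log10(count) if count > 0 else 0.0

def idf(word, docs):
    """Inverse document frequency: log10(N / df_i), where df_i = # of docs containing the word."""
    df = sum(1 for d in docs if word in d)
    return math.log10(len(docs) / df) if df else 0.0

# tf-idf weights for one document
counts = Counter(docs[1])
for word, count in counts.items():
    print(word, round(tf(count) * idf(word, docs), 2))
```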
Pointwise Mutual Information
PMI measures whether the probability of two words x and y occurring together is higher than what we would expect if x and y were unrelated (independent):

$$\text{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)}$$
Positive Pointwise Mutual Information
◦ PMI ranges from −∞ to + ∞
◦ But the negative values are problematic
◦ Things are co-occurring less than we expect by chance
◦ So we just replace negative PMI values by 0
◦ Positive PMI (PPMI) between word1 and word2:
$$\text{PPMI}(word_1, word_2) = \max\!\left(\log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)},\; 0\right)$$
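A minimal Python sketch of these two definitions (the function names and the probabilities you would pass in are assumptions, not from the slides):

```python
import math

def pmi(p_xy, p_x, p_y):
    """PMI: log2 of the observed joint probability over the expected-if-independent probability."""
    return math.log2(p_xy / (p_x * p_y))

def ppmi(p_xy, p_x, p_y):
    """Positive PMI: negative PMI values are replaced by 0."""
    return max(pmi(p_xy, p_x, p_y), 0.0)
```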
Computing PPMI on a term-context matrix

Matrix F with W rows (words) and C columns (contexts)
f_ij is the # of times word w_i occurs in context c_j

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$
$$p(w_i) = \frac{\sum_{j=1}^{C} f_{ij}}{N} \qquad p(c_j) = \frac{\sum_{i=1}^{W} f_{ij}}{N}$$

p(w=information, c=data) = 6/19 = .32
p(w=information) = 11/19 = .58
p(c=data) = 7/19 = .37

p(w, context) and the marginals p(w), p(context):

              computer   data   pinch   result   sugar     p(w)
apricot         0.00     0.00    0.05    0.00     0.05     0.11
pineapple       0.00     0.00    0.05    0.00     0.05     0.11
digital         0.11     0.05    0.00    0.05     0.00     0.21
information     0.05     0.32    0.00    0.21     0.00     0.58
p(context)      0.16     0.37    0.11    0.26     0.11
$$\text{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}}$$

where p_{i*} = p(w_i) and p_{*j} = p(c_j) are the row and column marginals in the table above.
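A minimal Python sketch (not from the slides) that recomputes PPMI from the joint-probability table above:

```python
import math

words    = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "sugar"]

# p(w, context) values from the slide (rounded to two decimals)
p_joint = [
    [0.00, 0.00, 0.05, 0.00, 0.05],   # apricot
    [0.00, 0.00, 0.05, 0.00, 0.05],   # pineapple
    [0.11, 0.05, 0.00, 0.05, 0.00],   # digital
    [0.05, 0.32, 0.00, 0.21, 0.00],   # information
]

# Marginals: p(w_i) = row sum, p(c_j) = column sum
p_w = [sum(row) for row in p_joint]
p_c = [sum(p_joint[i][j] for i in range(len(words))) for j in range(len(contexts))]

def ppmi(i, j):
    """PPMI(w_i, c_j) = max(log2(p_ij / (p(w_i) * p(c_j))), 0); defined as 0 when p_ij is 0."""
    if p_joint[i][j] == 0:
        return 0.0
    return max(math.log2(p_joint[i][j] / (p_w[i] * p_c[j])), 0.0)

# e.g. PPMI(information, data) comes out around 0.58 with these rounded probabilities
print(round(ppmi(words.index("information"), contexts.index("data")), 2))
```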
Weighting PMI
PMI is biased toward infrequent events
◦ Very rare words have very high PMI values
Two solutions:
◦ Give rare words slightly higher probabilities
◦ Use add-one smoothing (which has a similar effect)
Weighting PMI: Giving rare context words slightly higher probability

Raise the context probabilities to the power of α = 0.75:

$$\text{PPMI}_{\alpha}(w, c) = \max\!\left(\log_2 \frac{P(w,c)}{P(w)\,P_{\alpha}(c)},\; 0\right) \qquad P_{\alpha}(c) = \frac{\text{count}(c)^{\alpha}}{\sum_{c'} \text{count}(c')^{\alpha}}$$

Because α < 1, large counts are shrunk more than small ones, so rare contexts get slightly higher probability than their raw relative frequency.
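A minimal Python sketch of the α-weighted context distribution, using context counts consistent with the p(context) row above (assuming the total of 19 from the worked example):

```python
ALPHA = 0.75

# Context counts implied by the p(context) row above with N = 19 (3/19 ≈ .16, 7/19 ≈ .37, ...)
context_counts = {"computer": 3, "data": 7, "pinch": 2, "result": 5, "sugar": 2}

def weighted_context_probs(counts, alpha=ALPHA):
    """P_alpha(c) = count(c)^alpha / sum over c' of count(c')^alpha.
    Raising counts to a power below 1 shrinks large counts more than small ones,
    so rare contexts end up with slightly higher probability than count(c)/N."""
    denom = sum(c ** alpha for c in counts.values())
    return {ctx: (c ** alpha) / denom for ctx, c in counts.items()}

for ctx, p in weighted_context_probs(context_counts).items():
    raw = context_counts[ctx] / sum(context_counts.values())
    print(ctx, round(raw, 3), "->", round(p, 3))
```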
Summary for Part I
• Idea of embeddings: represent a word as a function of its co-occurrence distribution with other words
• Tf-idf
• Cosines
• PPMI