A Mathematical Model For Universal Semantics
Abstract—We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge bases or thesauri. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader both to understand short texts in a given language (automated question answering) and to match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify the local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source code are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica
Index Terms—recurring patterns in texts, semantic model, recurrence time, hitting time, word translation, question answering
1 INTRODUCTION
Fig. 1. Counting long-range transitions between word patterns. A transition from $W_i$ to $W_j$ counts towards long-range statistics if the underlined text fragment in between contains no occurrences of $W_i$ and lasts strictly longer than the longest word in $W_i \cup W_j$. For each long-range transition, the effective fragment length $L_{ij}$ discounts the length of the longest word in $W_i \cup W_j$.
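To make the counting rule concrete, here is a minimal sketch (in Python; the function names, the regex rendering of word patterns, and the character-based length measure are our own reading of the caption, not code from the paper's repository) that tallies the effective fragment lengths for a pair of word patterns.

```python
import re

def occurrences(pattern, text):
    """Character offsets (start, end) of a word-pattern regex in the text."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]

def effective_fragment_lengths(text, pat_i, pat_j, w_max):
    """Collect effective fragment lengths L_ij for transitions W_i -> W_j.

    Following Fig. 1: a fragment running from an occurrence of W_i to the
    next occurrence of W_j counts as a long-range transition if it contains
    no occurrence of W_i and is strictly longer than w_max, the length of
    the longest word in W_i ∪ W_j; L_ij then discounts w_max.
    """
    occ_i, occ_j = occurrences(pat_i, text), occurrences(pat_j, text)
    lengths = []
    for _, end in occ_i:
        # the fragment must stop before the next occurrence of W_i
        next_i = min((s for s, _ in occ_i if s > end), default=len(text))
        # first occurrence of W_j inside the W_i-free stretch
        hit = min((s for s, _ in occ_j if end <= s <= next_i), default=None)
        if hit is not None and hit - end > w_max:
            lengths.append(hit - end - w_max)
    return lengths

# e.g., diagonal statistics L_ii for W_i = Jane(∅|'s), as in Fig. 2:
# L_ii = effective_fragment_lengths(novel, r"\bJane(?:’s)?", r"\bJane(?:’s)?", w_max=6)
```

The quadratic scans over occurrence lists keep the sketch short; a production version would use sorted-array bisection.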
[Fig. 2 graphics, panels (a)–(e): barcodes for $W_i$ = Jane(∅|'s) and $W_j$ = than, count histograms on linear and logarithmic axes, and the recurrence-statistics scatter plot of $\langle\log L_{ii}\rangle$ versus $\log\langle L_{ii}\rangle$, whose upper region is marked AREA FORBIDDEN BY JENSEN'S INEQUALITY. See caption below.]
Fig. 2. Statistical analysis of recurrence times and topicality. (a) Barcode representations (adapted from [5, Fig. 2]) for the coverage of $W_i$ = Jane(∅|'s) (291 occurrences) and $W_j$ = than (282 occurrences) in the whole text of Pride and Prejudice. The horizontal axis scales linearly with the text length, measured in the number of constituent letters, spaces and punctuation marks. (b) Counts of the word than within a consecutive block of 1217 words (spanning about 1% of the entire text), drawn from 1000 randomly chosen blocks, fitted to a Poisson distribution with mean 2.776 (blue curve). (c) Histogram of the effective fragment length $L_{ii}$ (see Fig. 1 for its definition) for the topical pattern $W_i$ = Jane(∅|'s), fitted to an exponential distribution (blue line in the semi-log plot) and a weighted mixture of two exponential distributions $c_1 k_1 e^{-k_1 t} + c_2 k_2 e^{-k_2 t}$ (red curve, with $c_1 : c_2 \approx 1 : 3$, $k_1 : k_2 \approx 1 : 7$). (d) Histogram of $L_{jj}$ for the function word $W_j$ = than, fitted to an exponential distribution (blue line in the semi-log plot). All the parameter estimators in panels b–d are based on maximum likelihood. (c′)–(d′) Reinterpretations of panels c–d, with logarithmic binning on the horizontal axes, to give fuller coverage of the dynamic ranges of the statistics. (e) Recurrence statistics for word patterns in Jane Austen's Pride and Prejudice, where $\langle\cdots\rangle$ denotes averages over $n_{ii}$ samples of long-range transitions. Data points in gray, green and red have radii $1/(4\sqrt{n_{ii}})$. Labels for proper names and some literary motifs are attached next to the corresponding colored dots. Jensen's bound (green dashed line) has unit slope and zero intercept. Exponentially distributed recurrence statistics reside on the line of Poissonian banality (blue line), with unit slope and negative intercept. Red (resp. green) dots mark significant downward (resp. upward) departures from the blue line.
... of the effective fragment lengths $L_{ii}$ (Figs. 1, 2a). Here, while counting as in Fig. 1, we ignore contacts between short-range neighbors, which may involve language-dependent redundancies.²

² For example, the German phrase liebe Studentinnen und Studenten, with short-range recurrence, is the gender-inclusive equivalent of the English expression dear students. Some Austronesian languages (such as Malay and Hawaiian) use reduplication for plurality or emphasis.

2.1.1 Recurrence of non-topical patterns
In a memoryless (hence banal) Poisson process (Fig. 2b), recurrence times are exponentially distributed (Fig. 2d,d′). The same is also true for word recurrence in a randomly reshuffled text [5]. If we have $n_{ii}$ independent samples of exponentially distributed random variables $L_{ii}$, then the statistic $\delta_i := \log\langle L_{ii}\rangle - \langle\log L_{ii}\rangle - \gamma_0 + \frac{1}{2n_{ii}}$ satisfies the inequality

$$|\delta_i| < \frac{2}{\sqrt{n_{ii}}}\,\sqrt{\frac{\pi^2}{6} - 1 - \frac{1}{2n_{ii}}} \qquad (1)$$

with probability 95% (see Theorem 1 in Appendix A for a two-sigma rule). Here, $\gamma_0 := \lim_{n\to\infty}\left(-\log n + \sum_{m=1}^{n}\frac{1}{m}\right)$ is the Euler–Mascheroni constant.

As a working definition, we consider a word pattern $W_i$ non-topical if its $n_{ii}$ counts of effective fragment lengths $L_{ii}$ are exponentially distributed, $P(L_{ii} > t) \sim e^{-kt}$, within 95% margins of error [that is, satisfying (1) above].

2.1.2 Recurrence of topical patterns
In contrast, we consider a word pattern $W_i$ topical if its diagonal statistics $n_{ii}$, $L_{ii}$ constitute a significant departure from the Poissonian line $\langle\log L_{ii}\rangle - \log\langle L_{ii}\rangle + \gamma_0 = 0$ (Fig. 2e, blue line), violating the bound in (1).
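A minimal numerical rendering of this working definition (our own sketch, assuming the samples of $L_{ii}$ were gathered as in Fig. 1; the constants follow (1) exactly):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # γ0

def delta_statistic(L_ii):
    """δ_i = log⟨L_ii⟩ − ⟨log L_ii⟩ − γ0 + 1/(2 n_ii) for samples of L_ii."""
    L = np.asarray(L_ii, dtype=float)
    n = len(L)
    return np.log(L.mean()) - np.log(L).mean() - EULER_GAMMA + 1 / (2 * n)

def is_topical(L_ii):
    """True when (1) is violated, i.e. significant departure from Poissonian banality."""
    n = len(L_ii)
    bound = (2 / np.sqrt(n)) * np.sqrt(np.pi ** 2 / 6 - 1 - 1 / (2 * n))
    return abs(delta_statistic(L_ii)) >= bound
```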
[Fig. 3 graphics. Panel (a): worked example merging the word counts felt 100, feelings 86, feel 39, feeling 38, feels 4, feelingly 1 into a single word pattern via word count censorship, alphabetic sorting, and vertical sequence alignment. Panels (b), (b′): word-pattern clouds for the English and French versions of the novel. Panel (c): matrix of Ružička similarities $s_R(b^{\rm en}_i, b^{\rm fr}_j)$, rows $W_i^{\rm en}$ against columns $W_j^{\rm fr}$, color scale 0–1. See caption below.]
Fig. 3. Automated topic extraction and raw alignment across bilingual corpora. (a) Schematic diagram illustrating our graphical representation of morphologically related words (identified by supervised algorithms in Supplementary Materials) in a word pattern. To avoid unprintably small characters, rarely occurring forms (less than 5% of the total sum of all the words ranked above) are ignored in the graphical display. To enhance the visibility of word stems, we print shared letters only once, and compress other letters vertically, with heights proportional to their corresponding word counts. (b) Word patterns $W_i$ in Jane Austen's Pride and Prejudice, sorted by descending $n_{ii}$, with font size proportional to the square root of $e^{-\langle\log L_{ii}\rangle}$ (a better indicator of a reader's impression than the number of recurrences $n_{ii} \propto e^{-\log\langle L_{ii}\rangle}$). Topical (that is, significantly non-Poissonian) patterns painted in red (resp. green) reside below (resp. above) the critical line of Poissonian banality (blue line in Fig. 2e), where the deviations exceed the error margin prescribed in (1) of §2.1. (b′) A similar service on a French version of Pride and Prejudice (tr. Valentine Leconte & Charlotte Pressoir). (c) A low-cost and low-yield word translation, based on chapter-wise word counts $b^{\rm en}_i$ and $b^{\rm fr}_j$: Ružička similarities $s_R(b^{\rm en}_i, b^{\rm fr}_j)$ between selected topics (sorted by descending $n_{ii} \ge 20$) in English and French versions of Pride and Prejudice. Rows and columns with maximal $s_R(b^{\rm en}_i, b^{\rm fr}_j)$ less than 0.7 are not shown. Correct matchings are indicated by green cross-hairs.
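As an aside on notation: for counting purposes, a merged word pattern like the one built in Fig. 3a can be represented by an ordinary regular expression. A minimal illustration (ours; the paper's actual merging procedure is the supervised alignment described in its Supplementary Materials):

```python
import re
from collections import Counter

# Our regex rendering of the merged pattern from Fig. 3a:
# feel(∅|s|ing|ings|ingly) together with the suppletive form felt.
FEEL = re.compile(r"\b(?:feel(?:s|ing|ings|ingly)?|felt)\b", re.IGNORECASE)

def form_counts(text):
    """Word counts of the individual forms inside the merged pattern."""
    return Counter(m.group(0).lower() for m in FEEL.finditer(text))
```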
Notably, most data points for topics (colored dots in Fig. 2e) in Jane Austen's Pride and Prejudice mark systematic downward departures from the Poissonian line. This suggests that the topical recurrence times $\tau = L_{ii}$ follow weighted mixtures of exponential distributions (Fig. 2c,c′):

$$P(\tau > t) \sim \sum_m c_m e^{-k_m t}, \qquad (2)$$

(where $c_m, k_m > 0$, and $\sum_m c_m = 1$), which impose an inequality constraint on the recurrence time $\tau = L_{ii}$:

$$\langle\log L_{ii}\rangle - \log\langle L_{ii}\rangle + \gamma_0 = \sum_m c_m \log\frac{1}{k_m} - \log\sum_m \frac{c_m}{k_m} \le 0. \qquad (3)$$
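Inequality (3) is an instance of Jensen's inequality applied to the concave logarithm; a quick numerical confirmation (our own sketch, not part of the paper's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.dirichlet(np.ones(5))       # mixture weights c_m > 0, summing to 1
k = rng.uniform(0.1, 10.0, size=5)  # decay rates k_m > 0

lhs = np.sum(c * np.log(1 / k)) - np.log(np.sum(c / k))
assert lhs <= 1e-12  # Jensen: Σ c_m log(1/k_m) ≤ log Σ c_m/k_m
```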
2.1.3 Raw alignment of topical patterns
If a word pattern $W_i$ qualifies as a topic by our definition (Fig. 3b,b′), then the signals in its coarse-grained timecourse (say, a vector $b_i = (b_{i,1}, \ldots, b_{i,61})$ representing word counts in each chapter of Pride and Prejudice) are not overwhelmed by Poisson noise.

This vectorization scheme, together with the Ružička similarity [6]

$$s_R(b^A_i, b^B_j) := \frac{\|b^A_i \wedge b^B_j\|_1}{\|b^A_i \vee b^B_j\|_1} \qquad (4)$$

between two vectors with non-negative entries, allows us to align some topics found in parallel versions of the same document, in languages A and B (Fig. 3c). Here, in the definition of the Ružička similarity, $\wedge$ (resp. $\vee$) denotes the componentwise minimum (resp. maximum) of vectors; $\|b\|_1$ sums over all the components of $b$.

2.2 Markov text model
2.2.1 Transition probabilities via pattern analysis
The diagonal statistics $n_{ii}$, $L_{ii}$ (Fig. 1) have enabled us to extract topics automatically through recurrence time analysis (Figs. 2e and 3b,b′). The off-diagonal statistics $n_{ij}$, $L_{ij}$ (Fig. 1) will allow us to determine how strongly one word pattern $W_i$ binds to another word pattern $W_j$, through hitting time analysis. In an empirical Markov matrix $\mathbf P = (p_{ij})$, the long-range transition rate $p_{ij}$ is estimated by

$$p_{ij} := \frac{n_{ij}\, e^{-\langle\log L_{ij}\rangle}}{\sum_{k=1}^{N} n_{ik}\, e^{-\langle\log L_{ik}\rangle}}, \qquad (5)$$
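Both (4) and (5) reduce to a few lines of array arithmetic. A sketch (ours, not the paper's released code; it assumes the chapter-count vectors and the off-diagonal statistics $n_{ij}$, $\langle\log L_{ij}\rangle$ have already been collected as above):

```python
import numpy as np

def ruzicka(b_a, b_b):
    """Ružička similarity (4) of two non-negative count vectors."""
    b_a, b_b = np.asarray(b_a, float), np.asarray(b_b, float)
    return np.minimum(b_a, b_b).sum() / np.maximum(b_a, b_b).sum()

def markov_matrix(n, mean_log_L):
    """Empirical Markov matrix (5) from counts n_ij and means ⟨log L_ij⟩."""
    w = n * np.exp(-mean_log_L)              # n_ij e^{−⟨log L_ij⟩}
    return w / w.sum(axis=1, keepdims=True)  # row-wise normalization
```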
[Fig. 4 graphics, panels (a)–(c): vector components of π and π∗ plotted against word count rank, with decay rates $r_n$ on logarithmic scales; the caption is not recoverable from the extraction.]

... becomes closer as we go to higher iterates $\mathbf P^{(n)} = (p^{(n)}_{ij})$, where $n$ is a small positive integer (Fig. 4b).

On an ergodic Markov chain with detailed balance, one can show that recurrence times are distributed as weighted mixtures of exponential decays (see Theorem 3 in Appendix B.3), thus offering a theoretical explanation for (2).
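The following toy computation (ours, not the proof of Theorem 3) illustrates the claim numerically: on a small reversible chain, the survival function of the recurrence time to a state is governed by powers of the substochastic block that avoids that state, whose real eigenvalues $\lambda_m$ supply the decay rates $k_m = -\log \lambda_m$ in (2).

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.uniform(size=(6, 6))
W = W + W.T                          # symmetric weights => detailed balance
P = W / W.sum(axis=1, keepdims=True)

Q = P[1:, 1:]                        # substochastic block avoiding state 0
lam = np.sort(np.linalg.eigvals(Q).real)[::-1]  # real for reversible chains

# survival of the recurrence time T to state 0: P(T > t) = p_0 · Q^{t−1} · 1
p0, ones = P[0, 1:], np.ones(5)
survival = [p0 @ np.linalg.matrix_power(Q, t - 1) @ ones for t in range(1, 10)]
# each survival value is a fixed combination of lam**t, i.e. Σ_m c_m e^{−k_m t}
print(lam, survival)
```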
2.2.3 Spectral invariance under translation
The spectrum $\sigma(\mathbf P)$ (the collection of eigenvalues) is approximately invariant under translations of texts (Fig. 4c), which ...
[Fig. 5 graphics. Panel (a) legend: W1 = Eliza(∅|beth|beth's), W2 = Darcy(∅|'s), W3 = pr(ide|ided|oud|oudly|oudest), W4 = Jane(∅|'s); horizontal axes $\ell$; semantic-clique insets in English and French. Panel (b) horizontal axes: $-\log|\lambda(\mathbf R^{[i]})|$, with clique insets in French, Russian and Finnish. Panel (c) axes: #Topics (0–120) for correct / close / incorrect matchings, with a legend of languages (Danish, German, Dutch, Spanish, French, Latin, Polish, Russian, Korean, Turkish, Finnish, Hungarian, Basque) grouped by family (Indo-European, Koreanic, Turkic, Uralic, Vasconic). See caption below.]
Fig. 5. Semantic cliques and their applications to word translation. (a) Empirical distributions of $\langle\log L_{ij}\rangle$ in Pride and Prejudice, as gray and colored dots with radii $1/(4\sqrt{n_{ij}})$, compared to the Gaussian model $\alpha_{ij}(\ell)$ (colored curves parametrized by (7) and (8)). The numerical samplings of $W_j$'s exhaust all the textual patterns available in the novel, including topical word patterns, non-topical word patterns and function words. Only those textual patterns with over 40 occurrences are displayed as data points. The inset of each frame shows the semantic clique $S_i$ surrounding the topic $W_i$ (painted in black), color-coded by the $\alpha_{ij}(\langle\log L_{ij}\rangle)$ score. The areas of the bounding boxes for individual word patterns are proportional to the components of $\pi^{[i]}$ (the equilibrium state of $\mathbf P^{[i]}$). (b) Distributions of the magnitudes of eigenvalues (LISF) of the recurrence matrices $\mathbf R^{[i]}$, for three concepts from four versions of Pride and Prejudice. The color encoding for languages follows Fig. 4. The largest $\lfloor e^{\eta_i}\rfloor$ magnitudes of eigenvalues are displayed as solid lines, while the remaining terms are shown as dashed lines. The inset of each frame shows the semantic clique $S_i$, counterclockwise from top-left, in French, Russian and Finnish. (c) Yields from bipartite matching of LISF (see Fig. 6 for English–French) for topical words between the English original of Pride and Prejudice and its translations into 13 languages from 5 language families.
... $P(\langle\log L_{ij}\rangle > \ell)$ by a Gaussian model $\alpha_{ij}(\ell)$ (colored curves in Fig. 5a)

$$P(\langle\log L_{ij}\rangle > \ell) \approx \alpha_{ij}(\ell) := \sqrt{\frac{n_{ij}}{2\pi\beta_i}} \int_{\ell}^{\infty} e^{-\frac{n_{ij}(x-\ell_i)^2}{2\beta_i}}\,\mathrm d x, \qquad (7)$$

whose mean and variance are deducible from $n_{ij}$ and $L_{ii}$ (see Theorem 4 in Appendix B.4):

$$\ell_i := \frac{\langle L_{ii}\log L_{ii}\rangle}{\langle L_{ii}\rangle} - 1, \qquad \beta_i := \frac{\langle L_{ii}(\ell_i - \log L_{ii})^2\rangle}{\langle L_{ii}\rangle}. \qquad (8)$$

The parameters in the Gaussian model are justified by the relation between hitting and recurrence times [8] on an ergodic Markov chain with detailed balance, and become asymptotically exact if distinct word patterns are statistically independent (such as $\alpha_{13}$, $\alpha_{24}$, $\alpha_{31}$, $\alpha_{34}$ in Fig. 5a). Here, statistical independence justifies additivity of variances, hence the $\sqrt{n_{ij}}$ factor in (7); sums of independent samples of $\log L_{ij}$ become asymptotically Gaussian, thanks to the central limit theorem. Failing that, the actual ranking of $\langle\log L_{ij}\rangle$ may deviate from the Gaussian model prediction in (7), as for the intimately related pairs of words Elizabeth/Darcy, Elizabeth/Jane, Darcy/Elizabeth, Darcy/pride and pride/Darcy.
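A compact rendering of (7) and (8) (our sketch; the Gaussian upper tail is evaluated with the complementary error function):

```python
import numpy as np
from math import erfc, sqrt

def gaussian_params(L_ii):
    """ℓ_i and β_i from (8), with ⟨···⟩ taken as sample averages over L_ii."""
    L = np.asarray(L_ii, dtype=float)
    ell = (L * np.log(L)).mean() / L.mean() - 1
    beta = (L * (ell - np.log(L)) ** 2).mean() / L.mean()
    return ell, beta

def alpha_score(mean_log_L_ij, n_ij, ell_i, beta_i):
    """α_ij(⟨log L_ij⟩) from (7): a Gaussian upper tail with variance β_i/n_ij."""
    z = (mean_log_L_ij - ell_i) / sqrt(beta_i / n_ij)
    return 0.5 * erfc(z / sqrt(2.0))
```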
2.3.3 Markov criteria for semantic cliques
Empirically, we find that higher $\alpha_{ij}(\ell)$ scores point to closer affinities between word patterns (Fig. 5a), attributable to kinship (Elizabeth, Jane), courtship (Darcy, Elizabeth), disposition (Darcy, pride) and so on. Our robot reader automatically detects such affinities, without references other than the novel itself. Therefore, we can use the $\alpha_{ij}(\ell)$ scores as guides to numerical approximations of semantic fields, hereafter referred to as semantic cliques.

We invite a topical pattern $W_j$ into the semantic clique $S_i$ (Figs. 5a and b, insets) surrounding $W_i$ if $\min\{\alpha_{ij}(\langle\log L_{ij}\rangle), \alpha_{ji}(\langle\log L_{ji}\rangle)\} > \alpha_*$ for a standard Gaussian threshold $\alpha_* := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{1} e^{-x^2/2}\,\mathrm d x \approx 0.8413$. This operation emulates the brainstorming procedure of a human reader, who associates one word with another only when they stay much closer than two randomly picked words, according to his or her impression.

Indeed, by numerical brainstorming from $W_i$, our semantic cliques $S_i$ (Figs. 5a and b, insets) inform us about their center word $W_i$, through several types of semantic relations, including, but not limited to
• Synonyms (pride and vanity in English, orgueil and fierté in French, etc.);
• Temperaments (Elizabeth, a delightful girl, often laughs, corresponding to the French verbs sourire and rire);
• Co-references (e.g. Darcy as a personification of pride);
• Causalities (such as pride based on fortune).

On a local graph with vertices $S_i = \{W_{i_1} = W_i, W_{i_2}, \ldots, W_{i_{N_i}}\}$, we specify the connectivity of each directed edge by a localized Markov matrix $\mathbf P^{[i]} = (p^{[i]}_{jk})_{1\le j,k\le N_i}$. This localized Markov matrix is the row-wise normalization of an $N_i \times N_i$ subblock of $\mathbf P$ with the same set of vertices as $S_i$. Resetting the entries $p^{[i]}_{1k}$ and $p^{[i]}_{j1}$ to zero, one arrives at the localized recurrence matrix $\mathbf R^{[i]}$. We call $\mathbf R^{[i]}$ a recurrence matrix because one can use it to compute the distribution of recurrence times to the Markov state $W_i$ in $S_i$. As we will see in the applications below, the eigenvalues of $\mathbf R^{[i]}$, when properly arranged, become language-independent semantic fingerprints.
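Putting the clique criterion and the localized matrices together, a sketch (ours; it assumes a precomputed matrix of $\alpha_{ij}(\langle\log L_{ij}\rangle)$ scores, and the simple descending sort of eigenvalue magnitudes stands in for the paper's "proper arrangement" of the LISF):

```python
import numpy as np

ALPHA_STAR = 0.8413447460685429  # Φ(1), the threshold α∗

def semantic_clique(i, alpha):
    """S_i: the center W_i plus every W_j passing the two-sided α∗ test."""
    n = alpha.shape[0]
    members = [j for j in range(n)
               if j != i and min(alpha[i, j], alpha[j, i]) > ALPHA_STAR]
    return [i] + members

def lisf(P, clique):
    """Eigenvalue magnitudes of the localized recurrence matrix R^[i]."""
    sub = P[np.ix_(clique, clique)]
    P_loc = sub / sub.sum(axis=1, keepdims=True)  # row-wise renormalization
    R = P_loc.copy()
    R[0, :] = 0.0  # reset p_{1k}: transitions out of the center word ...
    R[:, 0] = 0.0  # ... and p_{j1}: transitions back into it
    return np.sort(np.abs(np.linalg.eigvals(R)))[::-1]
```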
[Fig. 6 graphics: the full English–French similarity matrix, rows $W_i^{\rm en}$ against columns $W_j^{\rm fr}$, color-coded by $s(W_i^{\rm en}, W_j^{\rm fr})$ from 0 to 1.]

Fig. 6. Automated alignments of vectorized topics via bipartite matching of semantic similarities. The semantic similarities $s(W_i^{\rm en}, W_j^{\rm fr})$ [...]. Rows and columns filled with zeros are not shown. Cross-hairs meet at optimal nodes that solve [...] inversely proportional to the row-wise (resp. column-wise) ranking of [...], with better recall than Fig. 3c, without much cost of precision.
[Fig. 7 graphics. Panel (a): the question "WikiQA-Q26: How did Anne Frank die?", a thought bubble of clique words (ANNE, FRANK('S), D(EATH|IED), 1933, FILM, DOCUMENTARY, BERGEN-BELSEN, CONCENTRATION, ACTION, PRIVATE, ...) drawn from the Wikipedia page "Anne Frank", and the top candidate answers, including: (1) "Anne Frank and her sister, Margot, were eventually transferred to the Bergen-Belsen concentration camp, where they died of typhus in March 1945." (4) "As persecutions of the Jewish population increased in July 1942, the family went into hiding in the hidden rooms of Anne's father, Otto Frank's, office building." (5) "The Frank family moved from Germany to Amsterdam in 1933, the year the Nazis gained control over Germany." Panel (b) table, recoverable entries (MAP / MRR): CNN 0.6190 / 0.6281; LISF∗ 0.6091 / 0.6268; Random Sort 0.3913 / 0.3990; the remaining rows are lost in the extraction.]
Fig. 7. Applications of semantic cliques to question answering. (a) A construction of the semantic clique $Q \cup Q'$ (based on $Q$ = {Anne, Frank, die}) weighted by the PageRank equilibrium state $\tilde\pi$, and the subsequent question answering. The top 5 candidate answers, with punctuation and spacing as given by WikiQA, are shown with font sizes proportional to the entropy production score in (11). Here, the top-scoring sentence with highlighted background is the same as the official answer chosen by the WikiQA team. Like a human reader, our algorithm automatically detects the place ("Bergen-Belsen concentration camp"), cause ("typhus"), and year ("1945") of Anne Frank's death. (b) Evaluations of our model (LISF and LISF∗) on the WikiQA data set, in comparison with established algorithms.
(2) Solve a bipartite matching problem (Fig. 6) that maximizes $\sum_{i,j} s(W_i^A, W_j^B)$, using the Hungarian Method [13] attributed to Jacobi–Kőnig–Egerváry–Kuhn [14].
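Step (2) is a standard linear assignment problem; with SciPy's Hungarian-method solver, a sketch (assuming a precomputed similarity matrix s, rows indexing topics of language A and columns those of language B; requires SciPy 1.4 or later for the maximize flag):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_topics(s):
    """One-to-one matching maximizing Σ_{i,j} s(W_i^A, W_j^B)."""
    rows, cols = linear_sum_assignment(np.asarray(s), maximize=True)
    return list(zip(rows.tolist(), cols.tolist()))
```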
3.2 Machine-assisted text comprehension on the WikiQA data set
By automatically discovering related words through numerical brainstorming (Figs. 5a and b, insets), our semantic cliques $S_i$ are useful in text comprehension and question answering. We can expand a set of question words $Q = \{W_{q_1}, \ldots, W_{q_K}\}$ into $Q \cup Q' = \bigcup_{k=1}^{K} S_{q_k}$, by bringing together the semantic cliques $S_{q_k}$ generated from a reference text by each and every question word $W_{q_k}$.

As before, we construct a localized Markov matrix $\mathbf P = (p_{ij})_{1\le i,j\le N}$ on this subset of word patterns $Q \cup Q'$. We further use the Brin–Page damping [15] to derive an ergodic Markov matrix $\tilde{\mathbf P} = (\tilde p_{ij})_{1\le i,j\le N}$, where $\tilde p_{ij} = 0.85\,p_{ij} + \frac{0.15}{N}$.

By analogy to the behavior of internet surfing [15], [16], we model the process of associative reasoning [17] as a navigation through the nodes $Q \cup Q'$ according to $\tilde{\mathbf P}$, which quantifies the click-through rate from one idea to another. The PageRank recursion [16] ensures a unique equilibrium state $\tilde\pi$ attached to $\tilde{\mathbf P}$. If our question $Q$ and a candidate answer $A$ contain, respectively, words from $W_{Q_1}, \ldots, W_{Q_m} \in Q$ and $W_{A_1}, \ldots, W_{A_n} \in Q \cup Q'$ (counting multiplicities, but excluding function words and patterns with fewer than 3 occurrences in the reference document), then we assign the following entropy production score

$$F[Q, A] := -\sum_{i=1}^{m}\sum_{j=1}^{n} \tilde\pi_{Q_i}\,\tilde p_{Q_i A_j} \log \tilde p_{Q_i A_j} \qquad (11)$$

to this question-answer pair.⁵

⁵ One may compare the score $F[Q,A]$ to the Kolmogorov–Sinai entropy production rate [9, (4.27)] $\eta(\mathbf P) = -\sum_{i=1}^{N}\sum_{j=1}^{N} \pi_i p_{ij} \log p_{ij}$ of a Markov matrix $\mathbf P = (p_{ij})_{1\le i,j\le N}$. The score $F[Q,A]$ is modeled after Boltzmann's partition entropies, weighted by words in the question, and sifted by topics in the answer. Such a weighting and sifting method is analogous to the definition of scattering cross-sections in particle physics.
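The damping, the equilibrium state and the score (11) fit in a few lines; a sketch (ours, assuming a localized Markov matrix P on $Q \cup Q'$ and index lists, repeated according to multiplicity, for question and answer words):

```python
import numpy as np

def damped_equilibrium(P, damping=0.85, tol=1e-12):
    """PageRank equilibrium π̃ of P̃ = 0.85 P + 0.15/N (Brin–Page damping)."""
    N = P.shape[0]
    P_tilde = damping * P + (1.0 - damping) / N
    pi = np.full(N, 1.0 / N)
    while True:
        nxt = pi @ P_tilde
        if np.abs(nxt - pi).sum() < tol:
            return nxt, P_tilde
        pi = nxt

def entropy_production_score(P, q_idx, a_idx):
    """F[Q,A] of (11), with q_idx/a_idx carrying word multiplicities."""
    pi, P_tilde = damped_equilibrium(P)
    block = P_tilde[np.ix_(q_idx, a_idx)]  # p̃_{Q_i A_j}, strictly positive
    return float(-np.sum(pi[q_idx][:, None] * block * np.log(block)))
```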
A sample work flow is shown in Fig. 7a, to illustrate how our rudimentary question-answering machine handles a query. To answer a question, we use a single Wikipedia page (without infoboxes and other structural data) as the only reference document and training source. Like a typical human reader of Wikipedia, our numerical associative reasoning generates a weighted set of nodes $Q \cup Q'$ (presented graphically as a thought bubble in Fig. 7a), without the help of external stimuli or knowledge feeds. Here, the relative weights in the nodes of $Q \cup Q'$ are computed from the equilibrium state $\tilde\pi$ of $\tilde{\mathbf P}$, via the PageRank algorithm.

We then test our semantic model (LISF in Fig. 7b) on all 1242 questions in the WikiQA data set, each of which is accompanied by at least one correct answer located in a designated Wikipedia page. Our algorithm's performance is roughly on par with the LCLR and CNN benchmarks [18], improving upon the baseline by a significant margin. This is perhaps remarkable, considering the relatively scant data at our disposal. Unlike the LCLR approach, our numerical discovery of synonyms does not draw on the WordNet database [19] or pre-existing corpora of question-answer pairs. Unlike the CNN method, we do not need pre-trained word2vec embeddings [20] as semantic input.

Moreover, our algorithm (LISF∗ in Fig. 7b) performs slightly better on a subset of 990 questions that do not require quantitative cues (How big? How long? How many? How old? What became of? What happened to? What year? and so on). This indicates that, with a Markov chain description of two-body interactions between topics, our structural model fits associative reasoning better than rule-based reasoning [17], while imitating human behavior in the presence of limited data. To enhance the reasoning capabilities of our algorithm, it may be appropriate to apply a Markov random field [21, §4.1.3] to graphs of word patterns, to capture many-body interactions among different topics.

4 CONCLUSION
In our current work, we define semantics through algebraic invariants that are concept-specific and language-independent. To construct such invariants, we develop a stochastic model that assigns a semantic fingerprint (a list of recurrence eigenvalues) to each concept via its long-range contexts. Consistently using a single Markov framework, we are able to extract topics (Figs. 2e, 3b,b′), translate topics (Figs. 3c, 4c, 5b,c, 6) and understand topics (Figs. 5a,b, 7a,b), through statistical mining of short and medium-length texts. In view of these three successful applications, we are probably close to a complete set of semantic invariants, after demystifying the long-range behavior of human languages.

Notably, our algorithms apply to documents of moderate length, similar to the experience of human readers. This contrasts with data-hungry algorithms in machine learning [18], [22], which utilize high-dimensional numerical representations of words and phrases [11], [12], [20], [23] from large corpora. Our semantic mechanism exhibits universality on long-range linguistic scales. This adds to our quantitative understanding of diversity on shorter-range linguistic scales, such as phonology [24], [25], [26], morphology [27], [28], [29], [30] and syntax [3], [30], [31], [32], [33].
Thanks to the independence between semantics and syntax [3], our current model conveniently ignores the non-Markovian syntactic structures which are essential to fluent speech. In the near future, we hope to extend our framework further, to incorporate both Markovian and non-Markovian features across different ranges. The Mathematical Principles of Natural Languages, as we envision, must and will combine the statistical analysis of a Markov model with linguistic properties on shorter time scales that convey morphological [27], [28], [29], [30] and syntactical [3], [30], [31], [32], [33] information.

ACKNOWLEDGMENTS
We thank N. Chomsky and S. Pinker for their inputs on several problems of linguistics. We thank X. Sun for discussions on neural networks. We thank X. Wan, R. Yan and D. Zhao for their suggestions on experimental design, during the early stages of this work. We thank two anonymous reviewers, whose thoughtful comments helped us improve the presentation of this work.

REFERENCES
[1] F. de Saussure, Cours de linguistique générale, 5th ed. Paris, France: Payot, 1949.
[2] S. Pinker and A. Prince, "On language and connectionism: Analysis of a parallel distributed processing model of language acquisition," Cognition, vol. 28, no. 1-2, pp. 73–193, 1988.
[3] N. Chomsky, Syntactic Structures, 2nd ed. Berlin, Germany: Mouton de Gruyter, 2002.
[4] A. D. Friederici, "The neurobiology of language comprehension," in Language Comprehension: A Biological Perspective, A. D. Friederici, Ed. Berlin, Germany: Springer, 1999, ch. 9, pp. 265–304.
[5] J. P. Herrera and P. A. Pury, "Statistical keyword detection in literary corpora," Eur. Phys. J. B, vol. 63, pp. 135–146, 2008.
[6] M. Ružička, "Anwendung mathematisch-statistischer Methoden in der Geobotanik (Synthetische Bearbeitung von Aufnahmen)," Biológia (Bratislava), vol. 13, pp. 647–661, 1958.
[7] Y. Zhou and X. Zhuang, "Robust reconstruction of the rate constant distribution using the phase function method," Biophys. J., vol. 91, no. 11, pp. 4045–4053, 2006.
[8] N. Haydn, Y. Lacroix, and S. Vaienti, "Hitting and return times in ergodic dynamical systems," Ann. Probab., vol. 33, pp. 2043–2050, 2005.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley Interscience, 2006.
[10] M. Pollicott and M. Yuri, Dynamical Systems and Ergodic Theory, ser. London Mathematical Society Student Texts. Cambridge, UK: Cambridge University Press, 1998, vol. 40.
[11] A. Joulin, P. Bojanowski, T. Mikolov, H. Jégou, and E. Grave, "Loss in translation: Learning bilingual word mapping with a retrieval criterion," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, Oct.–Nov. 2018, pp. 2979–2984.
[12] X. Chen and C. Cardie, "Unsupervised multilingual word embeddings," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, Oct.–Nov. 2018, pp. 261–270.
[13] H. W. Kuhn, "The Hungarian Method for the assignment problem," Nav. Res. Logist. Q., vol. 2, pp. 83–97, 1955.
[14] ——, "A tale of three eras: The discovery and rediscovery of the Hungarian Method," Eur. J. Oper. Res., vol. 219, pp. 641–651, 2012.
[15] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Comput. Networks ISDN, vol. 30, no. 1-7, pp. 107–117, 1998.
[16] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford InfoLab, Tech. Rep., 1999, http://ilpubs.stanford.edu:8090/422/.
[17] S. A. Sloman, "The empirical case for two systems of reasoning," Psychol. Bull., vol. 119, no. 1, pp. 3–22, 1996.
[18] Y. Yang, W.-t. Yih, and C. Meek, "WikiQA: A challenge dataset for open-domain question answering," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Lisbon, Portugal: Association for Computational Linguistics, 2015.
[19] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database (Language, Speech, and Communication). Cambridge, MA: MIT Press, 1998.
[20] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems 26. La Jolla, CA: NIPS, 2013, pp. 3111–3119.
[21] D. Mumford and A. Desolneux, Pattern Theory: The Stochastic Analysis of Real-World Signals. Natick, MA: A K Peters, 2010.
[22] V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, "Unsupervised word embeddings capture latent knowledge from materials science literature," Nature, vol. 571, no. 7763, pp. 95–98, 2019.
[23] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, "A latent variable model approach to PMI-based word embeddings," Transactions of the Association for Computational Linguistics, vol. 4, pp. 385–399, 2016.
[24] M. A. Nowak and D. C. Krakauer, "The evolution of language," Proc. Natl. Acad. Sci. USA, vol. 96, no. 14, pp. 8028–8033, 1999.
[25] C. Everett, D. E. Blasí, and S. G. Roberts, "Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots," Proc. Natl. Acad. Sci. USA, vol. 112, pp. 1322–1327, 2015.
[26] C. Everett, "Languages in drier climates use fewer vowels," Frontiers in Psychology, vol. 8, Article 1285, 2017.
[27] S. Pinker, "Words and rules in the human brain," Nature, vol. 387, no. 6633, pp. 547–548, 1997.
[28] W. D. Marslen-Wilson and L. K. Tyler, "Dissociating types of mental computation," Nature, vol. 387, no. 6633, pp. 592–594, 1997.
[29] E. Lieberman, J.-B. Michel, J. Jackson, T. Tang, and M. A. Nowak, "Quantifying the evolutionary dynamics of language," Nature, vol. 449, no. 7163, pp. 713–716, 2007.
[30] M. G. Newberry, C. A. Ahern, R. Clark, and J. B. Plotkin, "Detecting evolutionary forces in language change," Nature, vol. 551, no. 7679, pp. 223–226, 2017.
[31] S. Pinker, "Survival of the clearest," Nature, vol. 404, no. 6777, pp. 441–442, 2000.
[32] M. A. Nowak, J. B. Plotkin, and V. A. A. Jansen, "The evolution of syntactic communication," Nature, vol. 404, no. 6777, pp. 495–498, 2000.
[33] M. Dunn, S. J. Greenhill, S. C. Levinson, and R. D. Gray, "Evolved structure of language shows lineage-specific trends in word-order universals," Nature, vol. 473, no. 7345, pp. 79–82, 2011.
[34] C. Berg, J. P. R. Christensen, and P. Ressel, Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, ser. Graduate Texts in Mathematics. New York, NY: Springer-Verlag, 1984, vol. 100.
[35] S. M. Ross, Stochastic Processes, 2nd ed. Hoboken, NJ: John Wiley & Sons, 1995.

Weinan E received his B.S. degree in mathematics from the University of Science and Technology of China in 1982, and his Ph.D. degree in mathematics from the University of California, Los Angeles in 1989. He has been a full professor at the Department of Mathematics and the Program in Applied and Computational Mathematics, Princeton University, since 1999. Prof. E has made significant contributions to numerical analysis, fluid mechanics, partial differential equations, multiscale modeling, and stochastic processes. His monograph Principles of Multiscale Modeling (Cambridge University Press, 2011) is a standard reference for mathematical modeling of physical systems on multiple length scales. He was awarded the Peter Henrici Prize in 2019 for his recent contributions to machine learning.