
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPAMI.2020.3022533, IEEE Transactions on Pattern Analysis and Machine Intelligence.

A mathematical model for universal semantics


Weinan E and Yajun Zhou

Abstract—We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts. Approximating texts by Markov processes on a long-range time scale, we are able to extract topics, discover synonyms, and sketch semantic fields from a particular document of moderate length, without consulting external knowledge bases or thesauri. Our Markov semantic model allows us to represent each topical concept by a low-dimensional vector, interpretable as algebraic invariants in succinct statistical operations on the document, targeting local environments of individual words. These language-independent semantic representations enable a robot reader to both understand short texts in a given language (automated question-answering) and match medium-length texts across different languages (automated word translation). Our semantic fingerprints quantify local meaning of words in 14 representative languages across 5 major language families, suggesting a universal and cost-effective mechanism by which human languages are processed at the semantic level. Our protocols and source codes are publicly available at https://github.com/yajun-zhou/linguae-naturalis-principia-mathematica

Index Terms—recurring patterns in texts, semantic model, recurrence time, hitting time, word translation, question answering

1 INTRODUCTION

A QUANTITATIVE model for the meaning of words not only helps us understand how we transmit information and absorb knowledge, but also provides a foothold for algorithms in machine processing of natural language texts. Ideally, a universal mechanism of semantics should be based on numerical characteristics of human languages, transcending concrete written and spoken forms of verbal messages. In this work, we demonstrate, in both theory and practice, that the time structure of recurring language patterns is a good candidate for such a universal semantic mechanism. Through statistical analysis of recurrence times and hitting times, we numerically characterize connectivity and association of individual concepts, thereby devising language-independent semantic fingerprints (LISF).

Concretely speaking, we define semantics through algebraic invariants of a stochastic text model that approximately governs the empirical hopping rates on a web of word patterns. Such a stochastic model explains the distribution of recurrence times and outputs recurrence eigenvalues as semantic fingerprints. Statistics of recurrence times allow machines to tell non-topical words from topical ones. A comparison of hitting and recurrence times further generates quantitative fingerprints for topics, enabling machines to overcome language barriers in translation tasks and perform associative reasoning in comprehension tasks, like humans.

Akin to the physical world, there is a hierarchy of length scales in languages. On short scales such as syllables, words, and phrases, human languages do not exhibit a universal pattern related to semantics. Except for a few onomatopoeias, the sounds of words do not affect their meaning [1]. Neither do morphological parameters [2] (say, singular/plural, present/past) or syntactic rôles [3] (say, subject/object, active/passive). In short, there are no universal semantic mechanisms at the phonological, lexical or syntactical levels [4]. Grammatical “rules and principles” [2], [3], however typologically diverse, play no definitive rôle in determining the inherent meaning of a word.

Motivated by the observations above, we will build our quantitative semantic model on long-range and language-independent textual features. Specifically, we will measure the lengths of text fragments flanked by word patterns of interest (Fig. 1). Here, a word pattern is a collection of content words that are identical up to morphological parameters and syntactic rôles. A content word signifies definitive concepts (like apple, eat, red), instead of serving purely grammatical or logical functions (like but, of, the). Fragment length statistics will tell us how tightly/loosely one concept is connected to another. This, in turn, will provide us with quantitative criteria for inclusion/exclusion of different concepts within the same (computationally constructed) semantic field. Such statistical semantic mining will then pave the way for machine comprehension and machine translation.

2 METHODOLOGY

We quantify the time structure of an individual word pattern Wi through the statistics of its recurrence times τii. We characterize the dynamic impact of a word pattern Wi on another word pattern Wj by the statistics of their hitting times τij. In what follows, we will describe the statistical analyses of τii and τij, on which we build a language-independent Markov model for semantics.

2.1 Recurrence times and topicality

Assuming uniform reading speed,1 we measure the recurrence times τii for a word pattern Wi through nii samples of the effective fragment lengths Lii (Figs. 1, 2a). Here, while counting as in Fig. 1, we ignore contacts between short-range neighbors, which may involve language-dependent redundancies.2

• Weinan E is with the Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University, and Beijing Institute of Big Data Research. E-mail: [email protected]
• Yajun Zhou is with Beijing Institute of Big Data Research. E-mail: [email protected]

1. On the scale of words (rather than phonemes), this assumption works fine in most languages that are written alphabetically. However, this working hypothesis does not extend to Japanese texts, which interlace Japanese syllabograms (lasting one mora per written unit) with Chinese ideograms (lasting one or more morae per written unit).
2. For example, a German phrase liebe Studentinnen und Studenten with short-range recurrence is the gender-inclusive equivalent of the English expression dear students. Some Austronesian languages (such as Malay and Hawaiian) use reduplication for plurality or emphasis.


Wi := happ(ier|ily|iness|y) ≡ {happier, happily, happiness, happy},  Wj := marr(iage|ied|y) ≡ {marriage, married, marry}

[Fig. 1 graphic: two annotated strips of running text, one marking recurrences of Wi with the effective fragment lengths Lii, the other marking transitions from Wi to Wj with the effective fragment lengths Lij.]

Fig. 1. Counting long-range transitions between word patterns. A transition from Wi to Wj counts towards long-range statistics, if the underlined text fragment in between contains no occurrences of Wi, and lasts strictly longer than the longest word in Wi ∪ Wj. For each long-range transition, the effective fragment length Lij discounts the length of the longest word in Wi ∪ Wj.
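To make the counting rule of Fig. 1 concrete, here is a minimal Python sketch (a simplified illustration, not the released implementation; the whitespace tokenization, the character-based length unit, and all helper names are assumptions) that collects the effective fragment lengths Lii and Lij from a short text.

```python
from collections import defaultdict

def occurrences(tokens, pattern):
    """Character offsets (start, end) of tokens belonging to a word pattern."""
    hits, pos = [], 0
    for tok in tokens:
        if tok.lower() in pattern:
            hits.append((pos, pos + len(tok)))
        pos += len(tok) + 1                         # +1 for the separating space
    return hits

def fragment_samples(tokens, patterns):
    """Effective fragment lengths L_ij for every ordered pair of patterns (Fig. 1)."""
    occ = {name: occurrences(tokens, p) for name, p in patterns.items()}
    longest = {name: max(len(w) for w in p) for name, p in patterns.items()}
    samples = defaultdict(list)                     # (i, j) -> list of L_ij samples
    for i, hits_i in occ.items():
        for j, hits_j in occ.items():
            cutoff = max(longest[i], longest[j])    # longest word in W_i ∪ W_j
            for (s, e) in hits_i:
                # next occurrence of W_j strictly after this occurrence of W_i ...
                nxt = next(((s2, e2) for (s2, e2) in hits_j if s2 > e), None)
                if nxt is None:
                    continue
                # ... provided no W_i occurs in between (a long-range transition)
                if any(e < s3 < nxt[0] for (s3, _) in hits_i):
                    continue
                gap = nxt[0] - e                    # raw fragment length in characters
                if gap > cutoff:                    # must outlast the longest word
                    samples[(i, j)].append(gap - cutoff)   # discounted length L_ij
    return samples

text = "lorem ipsum happy dolor sit amet happy consectetur married elit marry sed happily do"
patterns = {"happ": {"happy", "happily", "happier", "happiness"},
            "marr": {"marriage", "married", "marry"}}
L = fragment_samples(text.split(), patterns)
print(len(L[("happ", "happ")]), len(L[("happ", "marr")]))   # n_ii and n_ij counts
```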

[Fig. 2 graphic, panels (a)-(e): barcode plots for Wi = Jane(∅|’s) and Wj = than; a Poisson fit of per-block counts of than; histograms of Lii and Ljj with exponential and mixed-exponential fits; log-binned versions of the same; and a scatter plot of ⟨log Lii⟩ versus log⟨Lii⟩ for word patterns in Pride and Prejudice, with the region forbidden by Jensen’s inequality marked.]

Fig. 2. Statistical analysis of recurrence times and topicality. (a) Barcode representations (adapted from [5, Fig. 2]) for the coverage of Wi = Jane(∅|’s) (291 occurrences) and Wj = than (282 occurrences) in the whole text of Pride and Prejudice. The horizontal axis scales linearly with respect to the text length measured in the number of constituting letters, spaces and punctuation marks. (b) Counts of the word than within a consecutive block of 1217 words (spanning about 1% of the entire text), drawn from 1000 randomly chosen blocks, fitted to a Poisson distribution with mean 2.776 (blue curve). (c) Histogram of the effective fragment length Lii (see Fig. 1 for its definition) for the topical pattern Wi = Jane(∅|’s), fitted to an exponential distribution (blue line in the semi-log plot) and a weighted mixture of two exponential distributions c1 k1 e^{-k1 t} + c2 k2 e^{-k2 t} (red curve, with c1 : c2 ≈ 1 : 3, k1 : k2 ≈ 1 : 7). (d) Histogram of Ljj for the function word Wj = than, fitted to an exponential distribution (blue line in the semi-log plot). All the parameter estimators in panels b-d are based on maximum likelihood. (c′)-(d′) Reinterpretations of panels c-d, with logarithmic binning on the horizontal axes, to give fuller coverage of the dynamic ranges for the statistics. (e) Recurrence statistics for word patterns in Jane Austen’s Pride and Prejudice, where ⟨···⟩ denotes averages over nii samples of long-range transitions. Data points in gray, green and red have radii 1/(4√nii). Labels for proper names and some literary motifs are attached next to the corresponding colored dots. Jensen’s bound (green dashed line) has unit slope and zero intercept. Exponentially distributed recurrence statistics reside on the line of Poissonian banality (blue line), with unit slope and negative intercept. Red (resp. green) dots mark significant downward (resp. upward) departure from the blue line.

2.1.1 Recurrence of non-topical patterns

In a memoryless (hence banal) Poisson process (Fig. 2b), recurrence times are exponentially distributed (Fig. 2d,d′). The same is also true for word recurrence in a randomly reshuffled text [5]. If we have nii independent samples of exponentially distributed random variables Lii, then the statistic δi := log⟨Lii⟩ − ⟨log Lii⟩ − γ0 + 1/(2nii) satisfies an inequality

|\delta_i| < \frac{2}{\sqrt{n_{ii}}} \sqrt{\frac{\pi^2}{6} - 1 - \frac{1}{2 n_{ii}}}    (1)

with probability 95% (see Theorem 1 in Appendix A for a two-sigma rule). Here, γ0 := lim_{n→∞} (−log n + Σ_{m=1}^{n} 1/m) is the Euler–Mascheroni constant.

As a working definition, we consider a word pattern Wi non-topical if its nii counts of effective fragment lengths Lii are exponentially distributed, P(Lii > t) ∼ e^{−kt}, within 95% margins of error [that is, satisfying (1) above].

2.1.2 Recurrence of topical patterns

In contrast, we consider a word pattern Wi topical if its diagonal statistics nii, Lii constitute significant departure from the Poissonian line ⟨log Lii⟩ − log⟨Lii⟩ + γ0 = 0 (Fig. 2e, blue line), violating the bound in (1).

Notably, most data points for topics (colored dots on Fig. 2e) in Jane Austen’s Pride and Prejudice mark systematic downward departures from the Poissonian line. This suggests that the topical recurrence times τ = Lii follow weighted mixtures of exponential distributions (Fig. 2c,c′):

P(\tau > t) \sim \sum_{m} c_m e^{-k_m t},    (2)

(where cm, km > 0, and Σm cm = 1), which impose an inequality constraint on the recurrence time τ = Lii:

\langle \log L_{ii} \rangle - \log \langle L_{ii} \rangle + \gamma_0 = \sum_{m} c_m \log \frac{1}{k_m} - \log \sum_{m} \frac{c_m}{k_m} \le 0.    (3)
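The test in (1) is easy to automate. Below is a minimal sketch (not the released implementation) that evaluates δi and the two-sigma bound for a list of effective fragment lengths, flagging a pattern as topical when the bound is violated; the synthetic samples are only there to contrast a single-rate exponential with a two-rate mixture as in (2).

```python
import numpy as np

GAMMA0 = float(np.euler_gamma)   # Euler–Mascheroni constant γ0 ≈ 0.5772

def topicality(L_ii):
    """Return (delta_i, bound) for the test in (1); topical if |delta_i| > bound."""
    L = np.asarray(L_ii, dtype=float)
    n = L.size
    delta = np.log(L.mean()) - np.log(L).mean() - GAMMA0 + 1.0 / (2 * n)
    bound = (2.0 / np.sqrt(n)) * np.sqrt(np.pi**2 / 6 - 1 - 1.0 / (2 * n))
    return delta, bound

# Exponentially distributed recurrences (non-topical behaviour) vs. a two-rate mixture
rng = np.random.default_rng(0)
banal   = rng.exponential(scale=100.0, size=300)
mixture = np.concatenate([rng.exponential(20.0, 75), rng.exponential(140.0, 225)])
for name, L in [("banal", banal), ("mixture", mixture)]:
    d, b = topicality(L)
    print(f"{name:8s}  delta={d:+.3f}  bound={b:.3f}  topical={abs(d) > b}")
```

For the mixture, δi is systematically positive (the downward departure from the Poissonian line described above), whereas the single-rate samples stay within the bound about 95% of the time.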


[Fig. 3 graphic, panels (a)-(c). Panel (a): the morphologically related forms felt (100), feelings (86), feel (39), feeling (38), feels (4), feelingly (1) undergo word-count censorship, alphabetic sorting, vertical sequence alignment, and merger into a single stem graphic. Panels (b) and (b′): word clouds of patterns extracted from the English and French versions of Pride and Prejudice, with topical patterns highlighted. Panel (c): a matrix of Ružička similarities between chapter-wise count vectors of English and French topics.]

Fig. 3. Automated topic extraction and raw alignment across bilingual corpora. (a) Schematic diagram illustrating our graphical representation of morphologically related words (identified by supervised algorithms in Supplementary Materials) in a word pattern. To avoid unprintably small characters, rarely occurring forms (less than 5% of the total sum of all the words ranked above) are ignored in graphical display. To enhance the visibility of word stems, we print shared letters only once, and compress other letters vertically, with heights proportional to their corresponding word counts. (b) Word patterns Wi in Jane Austen’s Pride and Prejudice, sorted by descending nii, with font size proportional to the square root of e^{−⟨log Lii⟩} (a better indicator of a reader’s impression than the number of recurrences nii ∝ e^{−log⟨Lii⟩}). Topical (that is, significantly non-Poissonian) patterns painted in red (resp. green) reside below (resp. above) the critical line of Poissonian banality (blue line in Fig. 2e), where the deviations exceed the error margin prescribed in (1) of §2.1. (b′) A similar service on a French version of Pride and Prejudice (tr. Valentine Leconte & Charlotte Pressoir). (c) A low-cost and low-yield word translation, based on chapter-wise word counts b_i^en and b_j^fr. Ružička similarities sR(b_i^en, b_j^fr) between selected topics (sorted by descending nii ≥ 20) in English and French versions of Pride and Prejudice. Rows and columns with maximal sR(b_i^en, b_j^fr) less than 0.7 are not shown. Correct matchings are indicated by green cross-hairs.

2.1.3 Raw alignment of topical patterns

If a word pattern Wi qualifies as a topic by our definition (Fig. 3b,b′), then the signals in its coarse-grained timecourse (say, a vector bi = (b_{i,1}, . . . , b_{i,61}) representing word counts in each chapter of Pride and Prejudice) are not overwhelmed by Poisson noise.

This vectorization scheme, together with the Ružička similarity [6]

s_R(\mathbf b_i^A, \mathbf b_j^B) := \frac{\| \mathbf b_i^A \wedge \mathbf b_j^B \|_1}{\| \mathbf b_i^A \vee \mathbf b_j^B \|_1}    (4)

between two vectors with non-negative entries, allows us to align some topics found in parallel versions of the same document, in languages A and B (Fig. 3c). Here, in the definition of the Ružička similarity, ∧ (resp. ∨) denotes the component-wise minimum (resp. maximum) of vectors; ‖b‖1 sums over all the components in b.
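A minimal sketch of this chapter-wise vectorization and the Ružička similarity (4) (the toy chapters and the single-pattern counting are illustrative assumptions, not the released implementation):

```python
import numpy as np

def chapter_counts(chapters, pattern):
    """b_i = per-chapter occurrence counts of a word pattern (a set of surface forms)."""
    return np.array([sum(tok.lower() in pattern for tok in ch.split()) for ch in chapters])

def ruzicka(b_a, b_b):
    """s_R(b^A, b^B) = ||b^A ∧ b^B||_1 / ||b^A ∨ b^B||_1, per equation (4)."""
    lo = np.minimum(b_a, b_b).sum()
    hi = np.maximum(b_a, b_b).sum()
    return lo / hi if hi > 0 else 0.0

# Toy example: two "chapters" per language version of the same text
english = ["Jane was happy to dance", "Darcy was proud and Jane smiled"]
french  = ["Jane était heureuse de danser", "Darcy était fier et Jane souriait"]
b_en = chapter_counts(english, {"jane"})
b_fr = chapter_counts(french,  {"jane"})
print(ruzicka(b_en, b_fr))   # 1.0 -- identical chapter profiles for this proper name
```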

2.2 Markov text model

2.2.1 Transition probabilities via pattern analysis

The diagonal statistics nii, Lii (Fig. 1) have enabled us to extract topics automatically through recurrence time analysis (Figs. 2e and 3b,b′). The off-diagonal statistics nij, Lij (Fig. 1) will allow us to determine how strongly one word pattern Wi binds to another word pattern Wj, through hitting time analysis. In an empirical Markov matrix P = (pij), the long-range transition rate pij is estimated by

p_{ij} := \frac{n_{ij}\, e^{-\langle \log L_{ij} \rangle}}{\sum_{k=1}^{N} n_{ik}\, e^{-\langle \log L_{ik} \rangle}},    (5)

where nij counts the number of long-range transitions from Wi to Wj, and Lij is a statistic that measures the effective fragment lengths of such transitions (Fig. 1).

2.2.2 Equilibrium state and detailed balance

Numerically, we find that our empirical Markov matrix P = (pij) defined in (5) is a fair approximation to an ergodic3 matrix P* = (p*ij), which in turn, governs the stochastic hoppings between content word patterns during text generation.

Each ergodic Markov matrix P* = (p*ij)_{1≤i,j≤N} possesses a unique equilibrium state π* = (π*i)_{1≤i≤N}. The equilibrium state π* represents a probability distribution (that is, π*i ≥ 0 for 1 ≤ i ≤ N and Σ_{i=1}^{N} π*i = 1) that satisfies π*P* = π* (that is, Σ_{1≤i≤N} π*i p*ij = π*j for 1 ≤ j ≤ N). In our numerical experiments, the dominant eigenvector π (satisfying πP = π) consistently reproduces word frequency statistics that are proportional to the ideal equilibrium state π* (Fig. 4a).

Furthermore, through numerical experimentation, we find that our empirical Markov matrix P ≈ P* approximately honors the detailed balance condition π*i p*ij = π*j p*ji for 1 ≤ i, j ≤ N. The approximation πi p^{(n)}_{ij} ≈ πj p^{(n)}_{ji} becomes closer as we go to higher iterates P^n = (p^{(n)}_{ij}), where n is a small positive integer (Fig. 4b).

On an ergodic Markov chain with detailed balance, one can show that recurrence times are distributed as weighted mixtures of exponential decays (see Theorem 3 in Appendix B.3), thus offering a theoretical explanation for (2).

3. If a Markov chain is ergodic, then there is a strictly positive probability to transition from any Markov state (that is, any individual word pattern in our model) to any other state, after finitely many steps.
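As a concrete sketch of these diagnostics (not the released implementation; the transition statistics below are random stand-ins), the following snippet assembles the empirical Markov matrix of (5), extracts the dominant eigenvector π, and evaluates the detailed-balance residual r_n and the eigenvalue moduli that Fig. 4 reports for the real data.

```python
import numpy as np

def markov_matrix(n, mean_log_L):
    """p_ij = n_ij exp(-<log L_ij>) / sum_k n_ik exp(-<log L_ik>), per equation (5)."""
    W = n * np.exp(-mean_log_L)              # unnormalized hopping weights
    return W / W.sum(axis=1, keepdims=True)

def stationary(P):
    """Dominant left eigenvector of P, normalized to a probability vector."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = np.abs(pi)
    return pi / pi.sum()

def balance_residual(P, pi, power=1):
    """r_n = (1/2) sum_ij |pi_i p_ij^(n) - pi_j p_ji^(n)|, as in Fig. 4b."""
    Pn = np.linalg.matrix_power(P, power)
    F = pi[:, None] * Pn
    return 0.5 * np.abs(F - F.T).sum()

rng = np.random.default_rng(1)
N = 100
n_ij = rng.integers(1, 50, size=(N, N))           # stand-in transition counts n_ij
mean_log_L = rng.uniform(5.0, 9.0, size=(N, N))   # stand-in <log L_ij> statistics
P = markov_matrix(n_ij, mean_log_L)
pi = stationary(P)
print("r_1 =", round(balance_residual(P, pi, 1), 3))           # order 0.3 for random data
print("leading |lambda|:", np.sort(np.abs(np.linalg.eigvals(P)))[::-1][:5])
```

Run on the empirical statistics of a real document, the same routines reproduce the rapid decay of r_n and the eigenvalue distributions compared across languages in Fig. 4.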
[Fig. 4 graphic, panels (a)-(c): per-rank comparison of π with π*; decay of the residual rn with the matrix power n; cumulative distributions of log|λ(P)| and arg λ(P) for the English, French, Russian and Finnish versions.]

Fig. 4. Quantitative properties of the Markov text model. (a) Dominant eigenvector π of a 100 × 100 Markov matrix P, computed from one of the four versions of Pride and Prejudice, in comparison with π*, the list of normalized frequencies for the top 100 word patterns. (b) Precipitous decays of rn := (1/2) Σ_{1≤i,j≤100} |πi p^{(n)}_{ij} − πj p^{(n)}_{ji}| from the initial value r1 ≈ 0.07, for matrix powers P^n = (p^{(n)}_{ij})_{1≤i,j≤100} constructed from four versions of Pride and Prejudice. (In contrast, one has r1 ≈ 0.33 for a random 100 × 100 Markov matrix.) Such quick relaxations support our working hypothesis about detailed balance π*i p*ij = π*j p*ji. (c) Distributions of eigenvalues λ of empirical Markov matrices P, with nearly language-independent modulus |λ(P)| and phase angle arg λ(P).

2.2.3 Spectral invariance under translation

The spectrum σ(P) (collection of eigenvalues) is approximately invariant against translations of texts (Fig. 4c), which can be explained by a matrix equation

P_A T_{A\to B} = T_{A\to B} P_B.    (6)

Here, both sides of the identity above quantify the transition probabilities from words in language A to words in language B, from the impressions of Alice and Bob, two monolingual readers in a thought experiment. On the left-hand side, Alice first processes the input in her native language A by a Markov matrix PA, and then translates into language B, using a dictionary matrix TA→B; on the right-hand side, Bob needs to first translate the input into language B, using the same dictionary TA→B, before brainstorming in his own native language, using PB. Putatively, the matrix equation holds because semantic content is shared by native speakers of different languages. In the ideal scenario where translation is lossless (with invertible TA→B), the Markov matrices PA and PB are indeed linked to each other by a similarity transformation that leaves their spectrum intact.

2.3 Localized Markov matrices and semantic cliques

2.3.1 Semantic contexts for recurrent topics

Specializing spectral invariance to individual topical patterns, we will be able to generate semantic fingerprints through a list of topic-specific and language-independent eigenvalues. Here, we will be particularly interested in recurrence eigenvalues of individual topical patterns, which correspond to multiple decay rates in the weighted mixtures of exponential distributions.

Unlike the single exponential decays associated to non-topical recurrence patterns, the multiple exponential decay modes will enable our robot reader to easily discern one topic from another. In general, it is numerically challenging to recover multiple exponential decay modes from a limited amount of recurrence time measurements [7]. However, in text processing, we can circumvent such difficulties by off-diagonal statistics nij and Lij that provide semantic contexts for individual topical patterns.

To quantitatively define the semantic content of a topical pattern Wi, we specify a local, directed, and weighted graph, corresponding to a localized Markov transition matrix P^[i].


[Fig. 5 graphic, panels (a)-(c). Panel (a): empirical distributions of ⟨log Lij⟩ for Wj ranging over all textual patterns, with W1 = Eliza(∅|beth|beth’s), W2 = Darcy(∅|’s), W3 = pr(ide|ided|oud|oudly|oudest), W4 = Jane(∅|’s); insets show the semantic cliques surrounding the center patterns. Panel (b): magnitudes of recurrence eigenvalues −log|λ(R^[i])| for three concepts in the English, French, Russian and Finnish versions. Panel (c): counts of correct, close and incorrect topic matches for the 13 translations, grouped by language family (Indo-European: Danish, German, Dutch, Spanish, French, Latin, Polish, Russian; Koreanic: Korean; Turkic: Turkish; Uralic: Finnish, Hungarian; Vasconic: Basque).]

Fig. 5. Semantic cliques and their applications to word translation. (a) Empirical distributions of ⟨log Lij⟩ in Pride and Prejudice, as gray and colored dots with radii 1/(4√nij), compared to the Gaussian model αij(ℓ) (colored curves parametrized by (7) and (8)). The numerical samplings of Wj’s exhaust all the textual patterns available in the novel, including topical word patterns, non-topical word patterns and function words. Only those textual patterns with over 40 occurrences are displayed as data points. Inset of each frame shows the semantic clique Si surrounding topic Wi (painted in black), color-coded by the αij(⟨log Lij⟩) score. The areas of the bounding boxes for individual word patterns are proportional to the components of π^[i] (the equilibrium state of P^[i]). (b) Distributions for the magnitudes of eigenvalues (LISF) in the recurrence matrices R^[i], for three concepts from four versions of Pride and Prejudice. The color encoding for languages follows Fig. 4. The largest ⌊e^{ηi}⌋ magnitudes of eigenvalues are displayed as solid lines, while the remaining terms are shown in dashed lines. Inset of each frame shows the semantic clique Si, counterclockwise from top-left, in French, Russian and Finnish. (c) Yields from bipartite matching of LISF (see Fig. 6 for English–French) for topical words between the English original of Pride and Prejudice and its translations into 13 languages out of 5 language families.

2.3.2 Localized Markov contexts of topical patterns

To localize, we need to remove edges between two vertices Wi and Wj, when the hitting times Lij and Lji are “long enough” relative to what one could naïvely expect from recurrence time statistics nij, nji and Lii, Ljj. Here, for the naïve expectation, we approximate the probability P(⟨log Lij⟩ > ℓ) by a Gaussian model αij(ℓ) (colored curves in Fig. 5a),

P(\langle \log L_{ij} \rangle > \ell) \approx \alpha_{ij}(\ell) := \sqrt{\frac{n_{ij}}{2\pi \beta_i}} \int_{\ell}^{\infty} e^{-\frac{n_{ij}(x - \ell_i)^2}{2\beta_i}} \, \mathrm d x,    (7)

whose mean and variance are deducible from nij and Lii (see Theorem 4 in Appendix B.4):

\ell_i := \frac{\langle L_{ii} \log L_{ii} \rangle}{\langle L_{ii} \rangle} - 1, \qquad \beta_i := \frac{\langle L_{ii} (\ell_i - \log L_{ii})^2 \rangle}{\langle L_{ii} \rangle}.    (8)

The parameters in the Gaussian model are justified by the relation between hitting and recurrence times [8] on an ergodic Markov chain with detailed balance, and become asymptotically exact if distinct word patterns are statistically independent (such as α13, α24, α31, α34 in Fig. 5a). Here, statistical independence justifies additivity of variances, hence the nij factor in (7); sums of independent samples of log Lij become asymptotically Gaussian, thanks to the central limit theorem. Failing that, the actual ranking of ⟨log Lij⟩ may deviate from the Gaussian model prediction in (7), such as the intimately related pairs of words Elizabeth/Darcy, Elizabeth/Jane, Darcy/Elizabeth, Darcy/pride and pride/Darcy.

2.3.3 Markov criteria for semantic cliques

Empirically, we find that higher αij(ℓ) scores point to closer affinities between word patterns (Fig. 5a), attributable to kinship (Elizabeth, Jane), courtship (Darcy, Elizabeth), disposition (Darcy, pride) and so on. Our robot reader automatically detects such affinities, without references other than the novel itself. Therefore, we can use the αij(ℓ) scores as guides to numerical approximations of semantic fields, hereafter referred to as semantic cliques.

We invite a topical pattern Wj to the semantic clique Si (Figs. 5a and b, insets) surrounding Wi, if min{αij(⟨log Lij⟩), αji(⟨log Lji⟩)} > α* for a standard Gaussian threshold α* := (1/√(2π)) ∫_{−∞}^{1} e^{−x²/2} dx ≈ 0.8413. This operation emulates the brainstorming procedure of a human reader, who associates one word with another only when they stay much closer than two randomly picked words, according to his/her impression.

Indeed, by numerical brainstorming from Wi, our semantic cliques Si (Figs. 5a and b, insets) inform us about their center word Wi, through several types of semantic relations, including, but not limited to

• Synonyms (pride and vanity in English, orgueil and fierté in French, etc.);
• Temperaments (Elizabeth, a delightful girl, often laughs, corresponding to the French verbs sourire and rire);
• Co-references (e.g. Darcy as a personification of pride);
• Causalities (such as pride based on fortune).

On a local graph with vertices Si = {W_{i1} = Wi, W_{i2}, . . . , W_{iNi}}, we specify the connectivity of each directed edge by a localized Markov matrix P^[i] = (p^[i]_{jk})_{1≤j,k≤Ni}. This localized Markov matrix is the row-wise normalization of an Ni × Ni subblock of P with the same set of vertices as Si. Resetting the entries p^[i]_{1k} and p^[i]_{j1} as zero, one arrives at the localized recurrence matrix R^[i]. We call R^[i] a recurrence matrix, because one can use it to compute the distribution for recurrence times to the Markov state Wi in Si. As we will see soon in the applications below, the eigenvalues of R^[i], when properly arranged, become language-independent semantic fingerprints.
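A sketch of the clique construction and the resulting eigenvalue list (not the released implementation; it assumes the statistics ⟨log Lij⟩, nij and the samples Lii are already available, and it anticipates the fingerprint that §3.1 formalizes as the LISF in (9)):

```python
import numpy as np
from math import erf, sqrt

ALPHA_STAR = 0.5 * (1 + erf(1 / sqrt(2)))        # Gaussian threshold alpha* ≈ 0.8413

def alpha_score(mean_log_Lij, n_ij, L_ii):
    """alpha_ij(<log L_ij>) from the Gaussian model (7) with parameters (8)."""
    L = np.asarray(L_ii, dtype=float)
    ell_i = (L * np.log(L)).sum() / L.sum() - 1.0                # mean parameter, (8)
    beta_i = (L * (ell_i - np.log(L)) ** 2).sum() / L.sum()      # variance parameter, (8)
    z = (mean_log_Lij - ell_i) * sqrt(n_ij / beta_i)
    return 0.5 * (1 - erf(z / sqrt(2)))                          # upper Gaussian tail

def semantic_clique(i, others, mean_log_L, n, L_samples):
    """Indices j admitted to S_i: min(alpha_ij, alpha_ji) > alpha*, per §2.3.3."""
    keep = [i]
    for j in others:
        a_ij = alpha_score(mean_log_L[i, j], n[i, j], L_samples[i])
        a_ji = alpha_score(mean_log_L[j, i], n[j, i], L_samples[j])
        if min(a_ij, a_ji) > ALPHA_STAR:
            keep.append(j)
    return keep

def recurrence_eigenvalues(P, clique):
    """Sorted |eigenvalues| of R^[i]: the localized matrix P^[i] with the
    center's row and column reset to zero (clique[0] is the center W_i)."""
    sub = P[np.ix_(clique, clique)]
    sub = sub / sub.sum(axis=1, keepdims=True)    # localized Markov matrix P^[i]
    R = sub.copy()
    R[0, :] = 0.0                                 # reset p^[i]_{1k} ...
    R[:, 0] = 0.0                                 # ... and p^[i]_{j1}
    return np.sort(np.abs(np.linalg.eigvals(R)))[::-1]
```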

[Fig. 6 graphic: a similarity matrix between English word patterns Wi^en (rows) and French word patterns Wj^fr (columns), with cell shading giving s(Wi^en, Wj^fr) on a 0-1 scale and cross-hairs marking the optimal bipartite matching.]

Fig. 6. Automated alignments of vectorized topics via bipartite matching of semantic similarities. The semantic similarities s(Wi^en, Wj^fr) are computed for selected topics (sorted by descending nii ≥ 20) in two versions of Pride and Prejudice. Rows and columns filled with zeros are not shown. Cross-hairs meet at optimal nodes that solve the bipartite matching problem. The thickness of each horizontal (resp. vertical) cross-hair is inversely proportional to the row-wise (resp. column-wise) ranking of the similarity score for the optimal node. A green (resp. amber) cross-hair indicates an exact (resp. a close but non-exact) match. At the same confidence level (0.7) for similarities, this experiment has better recall than Fig. 3c, without much cost of precision.
3 APPLICATIONS

3.1 Automated word translations from bilingual documents

Experimentally, we resolve the connectivity of an individual pattern Wi through the recurrence spectrum σ(R^[i]) (Fig. 5b). The dominant eigenvalues of R^[i] are concept-specific while remaining nearly language-independent (a localized version of the invariance in Fig. 4c). Such empirical evidence motivates us to define the language-independent semantic fingerprint (LISF) of a word pattern Wi by a descending list for the magnitudes of eigenvalues

\mathbf v_i = (|\lambda_1(\mathbf R^{[i]})|, |\lambda_2(\mathbf R^{[i]})|, \ldots),    (9)

computable from its semantic clique Si. We zero-pad this vector from the (⌊e^{ηi}⌋ + 1)st component onwards, where ηi is the Kolmogorov–Sinai entropy production rate of the Markov matrix P^[i], measured in nats per word.4

Via bipartite matching (Fig. 6) of word vectors vi across languages, our algorithm translates words from parallel texts at very high precision (Fig. 5c), being competitive with state-of-the-art algorithms for bilingual word mapping [11], [12].

Unlike the vector bi (Fig. 3c) that captures only chapter-scale features of Wi, the semantic fingerprint vi in (9) characterizes the kinetic behavior of Wi on all the long-range time scales.

Given a topical pattern Wi^A in language A, its semantic fingerprint vi^A (a descending list of recurrence eigenvalues, as in Fig. 5b) allows us to numerically locate a semantically close pattern in a parallel text written in another language B, in two steps:

(1) Divide the document into K chapters, and define the semantic similarity function as s(Wi^A, Wj^B) := sR(vi^A, vj^B) if

s_R(\mathbf b_i^A, \mathbf b_j^B) \ge \max\left\{ 1 - 0.07\sqrt{K},\; 1 - \sqrt{\frac{\| \mathbf b_i^A \wedge \mathbf b_j^B \|_0}{\| \mathbf b_i^A \vee \mathbf b_j^B \|_1}} \right\}    (10)

(which is a ballpark screening more robust than Fig. 3c, with ‖b‖0 counting the number of non-zero components in b) and sR(vi^A, vj^B) ≥ 0.7; s(Wi^A, Wj^B) := 0 otherwise.

(2) Solve a bipartite matching problem (Fig. 6) that maximizes Σ_{i,j} s(Wi^A, Wj^B), using the Hungarian Method [13] attributed to Jacobi–Kőnig–Egerváry–Kuhn [14].

4. The entropy production rate η(P) := −Σ_{i,j} πi pij log pij [9, (4.27)] of a Markov matrix P represents the weighted average (assigning probability mass πi to the ith Markov state) of Boltzmann’s partition entropies −Σ_j pij log pij [10, §8.2]. We have η(P) ≤ log N for an N × N Markov matrix P with strictly positive entries [10, Theorem 14.1].
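The two-step recipe above can be prototyped in a few lines. The sketch below (not the released implementation) assumes the chapter-count vectors bi and the fingerprint vectors vi of (9), zero-padded to a common length, are precomputed for both languages; the screening follows (10), and the assignment step calls SciPy’s Hungarian-method solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ruzicka(a, b):
    hi = np.maximum(a, b).sum()
    return np.minimum(a, b).sum() / hi if hi > 0 else 0.0

def similarity(bA, bB, vA, vB, K):
    """s(W_i^A, W_j^B) per step (1): Ruzicka on fingerprints, gated by the screening (10)."""
    gate = max(1 - 0.07 * np.sqrt(K),
               1 - np.sqrt(np.minimum(bA, bB).astype(bool).sum() /
                           np.maximum(bA, bB).sum()))
    if ruzicka(bA, bB) >= gate and ruzicka(vA, vB) >= 0.7:
        return ruzicka(vA, vB)
    return 0.0

def match_topics(b_A, b_B, v_A, v_B, K):
    """Step (2): maximum-weight bipartite matching via the Hungarian method."""
    S = np.array([[similarity(ba, bb, va, vb, K)
                   for bb, vb in zip(b_B, v_B)] for ba, va in zip(b_A, v_A)])
    rows, cols = linear_sum_assignment(-S)          # negate to maximize total similarity
    return [(i, j, S[i, j]) for i, j in zip(rows, cols) if S[i, j] > 0]
```

Only pairs passing both the chapter-scale screening and the 0.7 fingerprint threshold contribute to the assignment, which mirrors the sparsity of the similarity matrix in Fig. 6.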


[Fig. 7 graphic, panel (a): for the WikiQA question Q26 “How did Anne Frank die?”, a thought bubble of related word patterns (Anne, Frank, death, died, typhus, Holocaust, Bergen-Belsen, concentration camp, 1945, ...) built from the reference Wikipedia page “Anne Frank”, shown next to the top 5 candidate answer sentences scored by (11). Panel (b): MAP and MRR scores of competing methods, tabulated below.]

    Method        MAP      MRR
    CNN           0.6190   0.6281
    LISF*         0.6091   0.6268
    LCLR          0.5993   0.6086
    LISF          0.5899   0.6060
    PV            0.5110   0.5160
    Word Count    0.4891   0.4924
    Random Sort   0.3913   0.3990

Fig. 7. Applications of semantic cliques to question-answering. (a) A construction of the semantic clique Q ∪ Q′ (based on Q = {Anne, Frank, die}) weighted by the PageRank equilibrium state π̃, and subsequent question-answering. Top 5 candidate answers, with punctuation and spacing as given by WikiQA, are shown with font sizes proportional to the entropy production score in (11). Here, the top-scoring sentence with highlighted background is the same as the official answer chosen by the WikiQA team. Like a human reader, our algorithm automatically detects the place (“Bergen-Belsen concentration camp”), cause (“typhus”), and year (“1945”) of Anne Frank’s death. (b) Evaluations of our model (LISF and LISF*) on the WikiQA data set, in comparison with established algorithms.
3.2 Machine-assisted text comprehension on the WikiQA data set

By automatically discovering related words through numerical brainstorming (Figs. 5a and b, insets), our semantic cliques Si are useful in text comprehension and question answering. We can expand a set of question words Q = {W_{q1}, . . . , W_{qK}} into Q ∪ Q′ = ⋃_{k=1}^{K} S_{qk}, by bringing together the semantic cliques S_{qk} generated from a reference text by each and every question word W_{qk}.

As before, we construct a localized Markov matrix P = (pij)_{1≤i,j≤N} on this subset of word patterns Q ∪ Q′. We further use the Brin–Page damping [15] to derive an ergodic Markov matrix P̃ = (p̃ij)_{1≤i,j≤N}, where p̃ij = 0.85 pij + 0.15/N.

By analogy to the behavior of internet surfing [15], [16], we model the process of associative reasoning [17] as a navigation through the nodes Q ∪ Q′ according to P̃, which quantifies the click-through rate from one idea to another. The PageRank recursion [16] ensures a unique equilibrium state π̃ attached to P̃. If our question Q and a candidate answer A contain, respectively, words from W_{Q1}, . . . , W_{Qm} ∈ Q and W_{A1}, . . . , W_{An} ∈ Q ∪ Q′ (counting multiplicities, but excluding function words and patterns with fewer than 3 occurrences in the reference document), then we assign the following entropy production score

F[Q, A] := -\sum_{i=1}^{m} \sum_{j=1}^{n} \tilde\pi_{Q_i} \tilde p_{Q_i A_j} \log \tilde p_{Q_i A_j}    (11)

to this question-answer pair.5

A sample work flow is shown in Fig. 7a, to illustrate how our rudimentary question-answering machine handles a query. To answer a question, we use a single Wikipedia page (without infoboxes and other structural data) as the only reference document and training source. Like a typical human reader of Wikipedia, our numerical associative reasoning generates a weighted set of nodes Q ∪ Q′ (presented graphically as a thought bubble in Fig. 7a), without the help of external stimuli or knowledge feed. Here, the relative weights in the nodes of Q ∪ Q′ are computed from the equilibrium state π̃ of P̃, via the PageRank algorithm.

We then test our semantic model (LISF in Fig. 7b) on all the 1242 questions in the WikiQA data set, each of which is accompanied by at least one correct answer located in a designated Wikipedia page. Our algorithm’s performance is roughly on par with the LCLR and CNN benchmarks [18], improving upon the baseline by a significant margin. This is perhaps remarkable, considering the relatively scant data at our disposal. Unlike the LCLR approach, our numerical discovery of synonyms does not draw on the WordNet database [19] or pre-existent corpora of question-answer pairs. Unlike the CNN method, we do not need pre-trained word2vec embeddings [20] as semantic input.

Moreover, our algorithm (LISF* in Fig. 7b) performs slightly better on a subset of 990 questions that do not require quantitative cues (How big? How long? How many? How old? What became of? What happened to? What year? and so on). This indicates that, with a Markov chain description of two-body interactions between topics, our structural model fits associative reasoning better than rule-based reasoning [17], while imitating human behavior in the presence of limited data. To enhance the reasoning capabilities of our algorithm, it is perhaps appropriate to apply a Markov random field [21, §4.1.3] to graphs of word patterns, to capture many-body interactions among different topics.

5. One may compare the score F[Q, A] to the Kolmogorov–Sinai entropy production rate η(P) = −Σ_{i=1}^{N} Σ_{j=1}^{N} πi pij log pij [9, (4.27)] of a Markov matrix P = (pij)_{1≤i,j≤N}. The score F[Q, A] is modeled after Boltzmann’s partition entropies, weighted by words in the question, and sifted by topics in the answer. Such a weighting and sifting method is analogous to the definition of scattering cross-sections in particle physics.
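A sketch of this scoring pipeline (not the released implementation; the localized Markov matrix on Q ∪ Q′ is a random stand-in here, and the index bookkeeping for question and answer words is assumed to be given): it applies the Brin–Page damping, computes the PageRank equilibrium state π̃ by power iteration, and evaluates the entropy-production score (11) for a question-answer pair.

```python
import numpy as np

def damped(P, d=0.85):
    """Brin–Page damping: ergodic matrix p~_ij = d p_ij + (1 - d)/N."""
    N = P.shape[0]
    return d * P + (1 - d) / N

def pagerank(P_tilde, tol=1e-12, max_iter=1000):
    """Equilibrium state pi~ of the damped Markov matrix (left power iteration)."""
    N = P_tilde.shape[0]
    pi = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        new = pi @ P_tilde
        if np.abs(new - pi).sum() < tol:
            break
        pi = new
    return pi

def answer_score(P_tilde, pi, question_idx, answer_idx):
    """Entropy-production score (11): F[Q,A] = -sum_i sum_j pi~_Qi p~_QiAj log p~_QiAj."""
    F = 0.0
    for qi in question_idx:                     # indices of question words in Q
        for aj in answer_idx:                   # indices of candidate-answer words in Q ∪ Q'
            p = P_tilde[qi, aj]                 # strictly positive thanks to damping
            F -= pi[qi] * p * np.log(p)
    return F

# Toy usage on a random localized Markov matrix over 6 word patterns
rng = np.random.default_rng(2)
P = rng.random((6, 6)); P /= P.sum(axis=1, keepdims=True)
Pt = damped(P)
pi = pagerank(Pt)
print(answer_score(Pt, pi, question_idx=[0, 1], answer_idx=[2, 3, 4]))
```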

4 CONCLUSION

In our current work, we define semantics through algebraic invariants that are concept-specific and language-independent. To construct such invariants, we develop a stochastic model that assigns a semantic fingerprint (a list of recurrence eigenvalues) to each concept via its long-range contexts. Consistently using a single Markov framework, we are able to extract topics (Figs. 2e, 3b,b′), translate topics (Figs. 3c, 4c, 5b,c, 6) and understand topics (Figs. 5a,b, 7a,b), through statistical mining of short and medium-length texts. In view of these three successful applications, we are probably close to a complete set of semantic invariants, after demystifying the long-range behavior of human languages.

Notably, our algorithms apply to documents of moderate lengths, similar to the experience of human readers. This contrasts with data-hungry algorithms in machine learning [18], [22], which utilize high-dimensional numerical representations of words and phrases [11], [12], [20], [23] from large corpora. Our semantic mechanism exhibits universality on long-range linguistic scales. This adds to our quantitative understanding of diversity on shorter-range linguistic scales, such as phonology [24], [25], [26], morphology [27], [28], [29], [30] and syntax [3], [30], [31], [32], [33].

Thanks to the independence between semantics and syntax [3], our current model conveniently ignores the non-Markovian syntactic structures which are essential to fluent speech. In the near future, we hope to extend our framework further, to incorporate both Markovian and non-Markovian features across different ranges. The Mathematical Principles of Natural Languages, as we envision, must and will combine the statistical analysis of a Markov model with linguistic properties on shorter time scales that convey morphological [27], [28], [29], [30] and syntactical [3], [30], [31], [32], [33] information.

ACKNOWLEDGMENTS

We thank N. Chomsky and S. Pinker for their inputs on several problems of linguistics. We thank X. Sun for discussions on neural networks. We thank X. Wan, R. Yan and D. Zhao for their suggestions on experimental design, during the early stages of this work. We thank two anonymous reviewers, whose thoughtful comments helped us improve the presentation of this work.

REFERENCES

[1] F. de Saussure, Cours de linguistique générale, 5th ed. Paris, France: Payot, 1949.
[2] S. Pinker and A. Prince, “On language and connectionism: Analysis of a parallel distributed processing model of language acquisition,” Cognition, vol. 28, no. 1-2, pp. 73–193, 1988.
[3] N. Chomsky, Syntactic Structures, 2nd ed. Berlin, Germany: Mouton de Gruyter, 2002.
[4] A. D. Friederici, “The neurobiology of language comprehension,” in Language Comprehension: A Biological Perspective, A. D. Friederici, Ed. Berlin, Germany: Springer, 1999, ch. 9, pp. 265–304.
[5] J. P. Herrera and P. A. Pury, “Statistical keyword detection in literary corpora,” Eur. Phys. J. B, vol. 63, pp. 135–146, 2008.
[6] M. Ružička, “Anwendung mathematisch-statistischer Methoden in der Geobotanik (Synthetische Bearbeitung von Aufnahmen),” Biológia (Bratislava), vol. 13, pp. 647–661, 1958.
[7] Y. Zhou and X. Zhuang, “Robust reconstruction of the rate constant distribution using the phase function method,” Biophys. J., vol. 91, no. 11, pp. 4045–4053, 2006.
[8] N. Haydn, Y. Lacroix, and S. Vaienti, “Hitting and return times in ergodic dynamical systems,” Ann. Probab., vol. 33, pp. 2043–2050, 2005.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley Interscience, 2006.
[10] M. Pollicott and M. Yuri, Dynamical Systems and Ergodic Theory, ser. London Mathematical Society Student Texts. Cambridge, UK: Cambridge University Press, 1998, vol. 40.
[11] A. Joulin, P. Bojanowski, T. Mikolov, H. Jégou, and E. Grave, “Loss in translation: Learning bilingual word mapping with a retrieval criterion,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, Oct.–Nov. 2018, pp. 2979–2984.
[12] X. Chen and C. Cardie, “Unsupervised multilingual word embeddings,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, Oct.–Nov. 2018, pp. 261–270.
[13] H. W. Kuhn, “The Hungarian Method for the assignment problem,” Nav. Res. Logist. Q., vol. 2, pp. 83–97, 1955.
[14] ——, “A tale of three eras: The discovery and rediscovery of the Hungarian Method,” Eur. J. Oper. Res., vol. 219, pp. 641–651, 2012.
[15] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Comput. Networks ISDN, vol. 30, no. 1-7, pp. 107–117, 1998.
[16] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: Bringing order to the web,” Stanford InfoLab, Tech. Rep., 1999, http://ilpubs.stanford.edu:8090/422/.
[17] S. A. Sloman, “The empirical case for two systems of reasoning,” Psychol. Bull., vol. 119, no. 1, pp. 3–22, 1996.
[18] Y. Yang, W.-t. Yih, and C. Meek, “WikiQA: A challenge dataset for open-domain question answering,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Lisbon, Portugal: Association for Computational Linguistics, 2015.
[19] C. Fellbaum, Ed., WordNet: An Electronic Lexical Database (Language, Speech, and Communication). Cambridge, MA: MIT Press, 1998.
[20] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems 26. La Jolla, CA: NIPS, 2013, pp. 3111–3119.
[21] D. Mumford and A. Desolneux, Pattern Theory: The Stochastic Analysis of Real-World Signals. Natick, MA: A K Peters, 2010.
[22] V. Tshitoyan, J. Dagdelen, L. Weston, A. Dunn, Z. Rong, O. Kononova, K. A. Persson, G. Ceder, and A. Jain, “Unsupervised word embeddings capture latent knowledge from materials science literature,” Nature, vol. 571, no. 7763, pp. 95–98, 2019.
[23] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, “A latent variable model approach to PMI-based word embeddings,” Transactions of the Association for Computational Linguistics, vol. 4, pp. 385–399, 2016.
[24] M. A. Nowak and D. C. Krakauer, “The evolution of language,” Proc. Natl. Acad. Sci. USA, vol. 96, no. 14, pp. 8028–8033, 1999.
[25] C. Everett, D. E. Blasí, and S. G. Roberts, “Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots,” Proc. Natl. Acad. Sci. USA, vol. 112, pp. 1322–1327, 2015.
[26] C. Everett, “Languages in drier climates use fewer vowels,” Frontiers in Psychology, vol. 8, Article 1285, 2017.
[27] S. Pinker, “Words and rules in the human brain,” Nature, vol. 387, no. 6633, pp. 547–548, 1997.
[28] W. D. Marslen-Wilson and L. K. Tyler, “Dissociating types of mental computation,” Nature, vol. 387, no. 6633, pp. 592–594, 1997.
[29] E. Lieberman, J.-B. Michel, J. Jackson, T. Tang, and M. A. Nowak, “Quantifying the evolutionary dynamics of language,” Nature, vol. 449, no. 7163, pp. 713–716, 2007.
[30] M. G. Newberry, C. A. Ahern, R. Clark, and J. B. Plotkin, “Detecting evolutionary forces in language change,” Nature, vol. 551, no. 7679, pp. 223–226, 2017.
[31] S. Pinker, “Survival of the clearest,” Nature, vol. 404, no. 6777, pp. 441–442, 2000.
[32] M. A. Nowak, J. B. Plotkin, and V. A. A. Jansen, “The evolution of syntactic communication,” Nature, vol. 404, no. 6777, pp. 495–498, 2000.
[33] M. Dunn, S. J. Greenhill, S. C. Levinson, and R. D. Gray, “Evolved structure of language shows lineage-specific trends in word-order universals,” Nature, vol. 473, no. 7345, pp. 79–82, 2011.
[34] C. Berg, J. P. R. Christensen, and P. Ressel, Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, ser. Graduate Texts in Mathematics. New York, NY: Springer-Verlag, 1984, vol. 100.
[35] S. M. Ross, Stochastic Processes, 2nd ed. Hoboken, NJ: John Wiley & Sons, 1995.

Weinan E received his B.S. degree in mathematics from the University of Science and Technology of China in 1982, and his Ph.D. degree in mathematics from the University of California, Los Angeles in 1989. He has been a full professor at the Department of Mathematics and the Program in Applied and Computational Mathematics, Princeton University since 1999. Prof. E has made significant contributions to numerical analysis, fluid mechanics, partial differential equations, multiscale modeling, and stochastic processes. His monograph Principles of Multiscale Modeling (Cambridge University Press, 2011) is a standard reference for mathematical modeling of physical systems on multiple length scales. He was awarded the Peter Henrici Prize in 2019 for his recent contributions to machine learning.


Yajun Zhou received his B.S. degree in Physics from Fudan University (Shanghai, China) in 2004, and his Ph.D. degree in Chemistry from Harvard University in 2010. Having undertaken postdoctoral training in applied and computational mathematics at Princeton University, he now works in the Laboratory for Natural Language Processing & Cognitive Intelligence at Beijing Institute of Big Data Research. Dr. Zhou has published in numerical analysis, special functions, number theory, electromagnetic theory, quantum theory, probability theory and stochastic processes, solving several open problems in the related fields.

A book chapter in Elliptic Integrals, Elliptic Functions and Modular Forms in Quantum Field Theory (Springer, 2019) surveys his proofs of various mathematical conjectures. Dr. Zhou is a polyglot, with a working knowledge of 24 languages spanning 7 language families.

