
Matrices, Digraphs, Markov Chains & Their Use by Google
Leslie Hogben
Iowa State University and
American Institute of Mathematics
Bay Area Mathematical Adventures
February 27, 2008
With material from Becky Atherton
Outline
Matrices
Markov Chains
Digraphs
Google's PageRank
Introduction to Matrices
A matrix is a rectangular array of numbers.
Matrices are used to solve systems of equations.
Matrices are easy for computers to work with.
Matrix arithmetic
Matrix Addition

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + \begin{pmatrix} 3 & -1 \\ -2 & 0 \end{pmatrix} = \begin{pmatrix} 1+3 & 2+(-1) \\ 3+(-2) & 4+0 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ 1 & 4 \end{pmatrix}$$

Matrix Multiplication

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -2 & 0 \end{pmatrix} = \begin{pmatrix} (1)(3)+(2)(-2) & (1)(-1)+(2)(0) \\ (3)(3)+(4)(-2) & (3)(-1)+(4)(0) \end{pmatrix} = \begin{pmatrix} -1 & -1 \\ 1 & -3 \end{pmatrix}$$
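For readers who want to check this arithmetic on a computer, here is a minimal sketch using NumPy (not part of the original slides):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[3, -1],
              [-2, 0]])

# Entrywise sum: [[4, 1], [1, 4]]
print(A + B)

# Matrix product (rows of A times columns of B): [[-1, -1], [1, -3]]
print(A @ B)
```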
Introduction to Markov Chains
At each time period, every object in the system is in exactly one state, one of 1, ..., n.
Objects move according to the transition probabilities: the probability of going from state j to state i is $t_{ij}$.
Transition probabilities do not change over time.
The transition matrix of a Markov chain, $T = [t_{ij}]$, is an $n \times n$ matrix.
Each entry $t_{ij}$ is the probability of moving from state j to state i.
$0 \le t_{ij} \le 1$
The sum of the entries in each column must equal 1 (T is stochastic).
Example: Customers can choose from three major grocery stores: H-Mart, Freddy's, and Shoppers Market.
Each year H-Mart retains 80% of its customers, while losing 15% to Freddy's and 5% to Shoppers Market.
Freddy's retains 65% of its customers, loses 20% to H-Mart and 15% to Shoppers Market.
Shoppers Market keeps 70% of its customers, loses 20% to H-Mart and 10% to Freddy's.


Example: the transition matrix.

$$T = \begin{pmatrix} .80 & .20 & .20 \\ .15 & .65 & .10 \\ .05 & .15 & .70 \end{pmatrix}$$



Look at the calculation used to determine the probability of starting at H-Mart and shopping there two years later:

P(H→H)P(H→H) + P(H→F)P(F→H) + P(H→S)P(S→H) = .80(.80) + .15(.20) + .05(.20) = .68

We can obtain the same result by multiplying row one by column one in the transition matrix:

$$T^2 = \begin{pmatrix} .80 & .20 & .20 \\ .15 & .65 & .10 \\ .05 & .15 & .70 \end{pmatrix} \begin{pmatrix} .80 & .20 & .20 \\ .15 & .65 & .10 \\ .05 & .15 & .70 \end{pmatrix} = \begin{pmatrix} .68 & .32 & .32 \\ .22 & .47 & .16 \\ .10 & .21 & .52 \end{pmatrix}$$
This matrix tells us the probabilities of going from one store to another after 2 years.
Compute the probability of shopping at each store 2 years after shopping at Shoppers Market:

$$T^2 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} .68 & .32 & .32 \\ .22 & .47 & .16 \\ .10 & .21 & .52 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} .32 \\ .16 \\ .52 \end{pmatrix}$$
If the initial distribution was evenly divided between H-Mart, Freddy's, and Shoppers Market, compute the distribution after two years:

$$T^2 \begin{pmatrix} .333 \\ .333 \\ .333 \end{pmatrix} = \begin{pmatrix} .68 & .32 & .32 \\ .22 & .47 & .16 \\ .10 & .21 & .52 \end{pmatrix} \begin{pmatrix} .333 \\ .333 \\ .333 \end{pmatrix} = \begin{pmatrix} .44 \\ .285 \\ .275 \end{pmatrix}$$
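These two-year probabilities can be reproduced numerically; the following is a minimal NumPy sketch (not part of the original slides) using the grocery transition matrix above:

```python
import numpy as np

# Columns are "from" stores (H-Mart, Freddy's, Shoppers Market); rows are "to" stores.
T = np.array([[0.80, 0.20, 0.20],
              [0.15, 0.65, 0.10],
              [0.05, 0.15, 0.70]])
assert np.allclose(T.sum(axis=0), 1.0)   # each column sums to 1, so T is stochastic

T2 = T @ T                               # exact two-year transition probabilities
print(T2)                                # compare with the (rounded) T^2 shown above
print(T2 @ np.array([0.0, 0.0, 1.0]))    # start at Shoppers Market: [.32, .165, .515]
print(T2 @ np.array([1/3, 1/3, 1/3]))    # uniform start: [.44, .285, .275]
```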
To utilize a Markov chain to compute probabilities, we need to know the initial probability vector $q^{(0)}$.
If there are n states, let the initial probability vector be

$$q^{(0)} = \begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix}$$

where
$q_i$ is the probability of being in state i initially,
all entries satisfy $0 \le q_i \le 1$,
and the column sum equals 1.
Example: What happens after 10 years?

$$T^{10} \approx \begin{pmatrix} .50 & .50 & .50 \\ .28 & .28 & .28 \\ .22 & .22 & .22 \end{pmatrix}$$

so for any initial probability vector (since $q_1 + q_2 + q_3 = 1$),

$$T^{10} \begin{pmatrix} q_1 \\ q_2 \\ q_3 \end{pmatrix} = \begin{pmatrix} .50 & .50 & .50 \\ .28 & .28 & .28 \\ .22 & .22 & .22 \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ q_3 \end{pmatrix} = \begin{pmatrix} .50 \\ .28 \\ .22 \end{pmatrix}$$

Let $q^{(k)}$ be the probability distribution after k steps.
We are iterating $q^{(k+1)} = T q^{(k)}$.
Eventually, for a large enough k, $q^{(k+1)} = q^{(k)} = s$, resulting in $s = Ts$.
s is called a steady state vector.
$s = q^{(k)}$ is an eigenvector of T for eigenvalue 1.
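A minimal sketch (not in the original slides) of finding the steady state vector for the grocery example two ways: by iterating $q^{(k+1)} = T q^{(k)}$, and by taking the eigenvector for eigenvalue 1:

```python
import numpy as np

T = np.array([[0.80, 0.20, 0.20],
              [0.15, 0.65, 0.10],
              [0.05, 0.15, 0.70]])

# Power iteration: apply T repeatedly until the distribution stops changing.
q = np.array([1/3, 1/3, 1/3])
for _ in range(100):
    q = T @ q
print(q)                     # approx [.50, .28, .22], the columns of T^10 above

# Equivalently, s is the eigenvector of T for eigenvalue 1, rescaled to sum to 1.
vals, vecs = np.linalg.eig(T)
s = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
print(s / s.sum())
```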
In the grocery example, there was a unique steady state vector s, and $T q^{(k)} \to s$. This does not need to be the case:

$$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad T^{2k} = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad T^{2k+1} = T, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} b \\ a \end{pmatrix}$$
How can we guarantee convergence to a unique steady state vector regardless of initial conditions?
One way is by having a regular transition matrix.
A nonnegative matrix is regular if some power of the matrix has only nonzero entries. For example,

$$B = \begin{pmatrix} .15 & 1 \\ .85 & 0 \end{pmatrix}, \quad B^2 = \begin{pmatrix} .8725 & .15 \\ .1275 & .85 \end{pmatrix}$$
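Regularity can be tested numerically by checking whether some power of the matrix has all positive entries; a small illustrative sketch (not from the slides):

```python
import numpy as np

def is_regular(M, max_power=50):
    """Return True if some power M^k (k <= max_power) has only positive entries."""
    P = np.eye(M.shape[0])
    for _ in range(max_power):
        P = P @ M
        if np.all(P > 0):
            return True
    return False

B = np.array([[0.15, 1.0],
              [0.85, 0.0]])
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(is_regular(B))      # True: B^2 already has all positive entries
print(is_regular(swap))   # False: its powers alternate between swap and I, never all positive
```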
Digraphs
A directed graph (digraph) is a set of vertices (nodes) and a set of directed edges (arcs) between vertices.
The arcs indicate relationships between nodes.
Digraphs can be used as models, e.g.
cities and airline routes between them
web pages and links



How Matrices, Markov Chains and Digraphs are used by Google
How does Google work?
Robot web crawlers find web pages
Pages are indexed & cataloged
Pages are assigned PageRank values
PageRank is a program that prioritizes pages, developed by Larry Page & Sergey Brin in 1998.
When pages are identified in response to a query, they are ranked by PageRank value.



Why is PageRank important?
Only a few years ago, users waited much longer for search engines to return results to their queries.
When a search engine finally responded, the returned list had many links to irrelevant information, and useless links invariably appeared at or near the top of the list while useful links were deeply buried.
The Web's information is not structured like information in organized databases and document collections; it is self-organized.
The enormous size of the Web, currently containing ~10^9 pages, completely overwhelmed traditional information retrieval (IR) techniques.
By 1997 it was clear that the IR technology of the past wasn't well suited for Web search, and researchers set out to devise new approaches.
Two big ideas emerged, each capitalizing on the link structure of the Web to differentiate between relevant information and fluff.
One approach, HITS (Hypertext Induced Topic Search), was introduced by Jon Kleinberg.
The other, which changed everything, is Google's PageRank, developed by Sergey Brin and Larry Page.

How are PageRank values assigned?
The number of links to and from a page gives information about the importance of the page.
The more inlinks, the more important the page.
Inlinks from good pages carry more weight than inlinks from weaker pages.
If a page points to several pages, its weight is distributed proportionally.
Imagine the World Wide Web as a directed graph (digraph):
each page is a vertex, and each link is an arc.

[Figure: a sample 6-page web, drawn as a digraph with vertices 1-6]

PageRank defines the rank of page i recursively by

$$r_i = \sum_{j \in I_i} \frac{r_j}{|O_j|}$$

where
$r_j$ is the rank of page j,
$I_i$ is the set of pages that point into page i,
$O_j$ is the set of pages that have outlinks from page j.
For example, the rank of page 2 in our sample web:

$$r_2 = \frac{r_1}{3} + \frac{r_3}{4} + \frac{r_5}{3}$$
Since this is a recursive definition, PageRank assigns an initial ranking equally to all pages:

$$r_i^{(0)} = \frac{1}{n}$$

then iterates

$$r_i^{(k+1)} = \sum_{j \in I_i} \frac{r_j^{(k)}}{|O_j|}$$
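Here is a minimal sketch (not from the slides) of running this iteration directly on the link structure of the 6-page sample web. The out-link lists below are read off the digraph, and page 6 is assumed to have no outlinks (a dangling node), which is exactly the problem the slides address next:

```python
# Out-link lists for the 6-page sample web; page 6 is a dangling node with no outlinks.
outlinks = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4, 5], 4: [1, 5, 6], 5: [2, 4, 6], 6: []}

n = len(outlinks)
rank = {i: 1 / n for i in outlinks}          # r_i^(0) = 1/n for every page

for _ in range(20):
    new_rank = {i: 0.0 for i in outlinks}
    for j, targets in outlinks.items():      # page j passes r_j / |O_j| to each page it links to
        for i in targets:
            new_rank[i] += rank[j] / len(targets)
    rank = new_rank

print(rank)
print(sum(rank.values()))   # less than 1: rank flowing into dangling page 6 is lost each step
```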

The process can be written using matrix notation.
Let $q^{(k)}$ be the PageRank vector at the k-th iteration.
Let T be the transition matrix for the web; then $q^{(k+1)} = T q^{(k)}$.
T is the matrix such that $t_{ij}$ is the probability of moving from page j to page i in one time step, based on the assumption that all outlinks are equally likely to be selected:

$$t_{ij} = \begin{cases} \dfrac{1}{|O_j|} & \text{if there is a link from } j \text{ to } i \\ 0 & \text{otherwise} \end{cases}$$
Using our 6-node sample web, the transition matrix is

$$T = \begin{pmatrix} 0 & 1/2 & 1/4 & 1/3 & 0 & 0 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\ 0 & 0 & 1/4 & 1/3 & 0 & 0 \\ 0 & 0 & 0 & 1/3 & 1/3 & 0 \end{pmatrix}$$
To eliminate dangling nodes and obtain a stochastic matrix, replace each column of zeros with a column of 1/n's, where n is the number of web pages:

$$T = \begin{pmatrix} 0 & 1/2 & 1/4 & 1/3 & 0 & 0 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\ 1/3 & 1/2 & 0 & 0 & 0 & 0 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\ 0 & 0 & 1/4 & 1/3 & 0 & 0 \\ 0 & 0 & 0 & 1/3 & 1/3 & 0 \end{pmatrix} \;\longrightarrow\; T = \begin{pmatrix} 0 & 1/2 & 1/4 & 1/3 & 0 & 1/6 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 1/6 \\ 1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\ 1/3 & 0 & 1/4 & 0 & 1/3 & 1/6 \\ 0 & 0 & 1/4 & 1/3 & 0 & 1/6 \\ 0 & 0 & 0 & 1/3 & 1/3 & 1/6 \end{pmatrix}$$
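A minimal NumPy sketch (not from the slides) of building this matrix from the out-link lists of the sample web and applying the dangling-node fix:

```python
import numpy as np

# Out-link lists read off the 6-page sample web; page 6 has no outlinks.
outlinks = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4, 5], 4: [1, 5, 6], 5: [2, 4, 6], 6: []}
n = len(outlinks)

T = np.zeros((n, n))
for j, targets in outlinks.items():
    for i in targets:
        T[i - 1, j - 1] = 1 / len(targets)   # t_ij = 1/|O_j| when j links to i

# Dangling-node fix: replace any all-zero column with a column of 1/n.
for j in range(n):
    if not T[:, j].any():
        T[:, j] = 1 / n

assert np.allclose(T.sum(axis=0), 1.0)       # T is now column-stochastic
print(T)
```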
The Web's nature is such that T would not be regular.
Brin & Page force the transition matrix to be regular by making sure every entry satisfies $0 < t_{ij} < 1$:
Create a perturbation matrix E having all entries equal to 1/n.
Form the Google matrix

$$\bar{T} = \alpha T + (1 - \alpha) E, \quad \text{for some } 0 \le \alpha \le 1$$

Using $\alpha = 0.85$ for our 6-node sample web:

$$\bar{T} = 0.85\,T + (1 - 0.85)\,E = \begin{pmatrix} 1/40 & 9/20 & 19/80 & 37/120 & 1/40 & 1/6 \\ 37/120 & 1/40 & 19/80 & 1/40 & 37/120 & 1/6 \\ 37/120 & 9/20 & 1/40 & 1/40 & 1/40 & 1/6 \\ 37/120 & 1/40 & 19/80 & 1/40 & 37/120 & 1/6 \\ 1/40 & 1/40 & 19/80 & 37/120 & 1/40 & 1/6 \\ 1/40 & 1/40 & 1/40 & 37/120 & 37/120 & 1/6 \end{pmatrix}$$
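The damping step is a one-line computation; a small sketch (assuming the stochastic T built above and α = 0.85 as on the slide):

```python
import numpy as np

alpha, n = 0.85, 6
T = np.array([[0,   1/2, 1/4, 1/3, 0,   1/6],
              [1/3, 0,   1/4, 0,   1/3, 1/6],
              [1/3, 1/2, 0,   0,   0,   1/6],
              [1/3, 0,   1/4, 0,   1/3, 1/6],
              [0,   0,   1/4, 1/3, 0,   1/6],
              [0,   0,   0,   1/3, 1/3, 1/6]])

E = np.full((n, n), 1 / n)               # perturbation matrix: every entry is 1/n
G = alpha * T + (1 - alpha) * E          # Google matrix T-bar

print(G[0, 0], G[0, 1], G[0, 2])         # 0.025 (=1/40), 0.45 (=9/20), 0.2375 (=19/80)
```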
By calculating powers of the transition matrix, we can determine the stationary vector:

$$\bar{T}^{25} \approx \begin{pmatrix} .2066 & .2066 & .2066 & .2066 & .2066 & .2066 \\ .1770 & .1770 & .1770 & .1770 & .1770 & .1770 \\ .1773 & .1773 & .1773 & .1773 & .1773 & .1773 \\ .1770 & .1770 & .1770 & .1770 & .1770 & .1770 \\ .1314 & .1314 & .1314 & .1314 & .1314 & .1314 \\ .1309 & .1309 & .1309 & .1309 & .1309 & .1309 \end{pmatrix}$$
Stationary vector for our 6-node sample web:

$$s = \begin{pmatrix} .2066 \\ .1770 \\ .1773 \\ .1770 \\ .1314 \\ .1309 \end{pmatrix}$$
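A minimal sketch (not from the slides) of obtaining this stationary vector by power iteration, $q^{(k+1)} = \bar{T} q^{(k)}$, starting from the uniform vector:

```python
import numpy as np

alpha, n = 0.85, 6
T = np.array([[0,   1/2, 1/4, 1/3, 0,   1/6],
              [1/3, 0,   1/4, 0,   1/3, 1/6],
              [1/3, 1/2, 0,   0,   0,   1/6],
              [1/3, 0,   1/4, 0,   1/3, 1/6],
              [0,   0,   1/4, 1/3, 0,   1/6],
              [0,   0,   0,   1/3, 1/3, 1/6]])
G = alpha * T + (1 - alpha) * np.full((n, n), 1 / n)

s = np.full(n, 1 / n)                    # q^(0): equal rank on every page
for _ in range(25):                      # q^(k+1) = G q^(k)
    s = G @ s

print(np.round(s, 4))                    # approx [.2066 .1770 .1773 .1770 .1314 .1309]
```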
How does Google use this stationary vector?
A query requests term 1 or term 2.
The inverted file storage is accessed:
Term 1 → doc 3, doc 2, doc 6
Term 2 → doc 1, doc 3
The relevancy set is {1, 2, 3, 6}.
Ranking by the stationary vector, $s_1 = .2066$, $s_2 = .1770$, $s_3 = .1773$, $s_6 = .1309$, so doc 1 is deemed most important.
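The last step, ranking the relevancy set by stationary value, is a simple sort; a small illustrative sketch (not from the slides):

```python
# Stationary PageRank values for the 6-page sample web (from the slide above).
s = {1: 0.2066, 2: 0.1770, 3: 0.1773, 4: 0.1770, 5: 0.1314, 6: 0.1309}

relevancy_set = [1, 2, 3, 6]                          # pages matching term 1 or term 2
ranked = sorted(relevancy_set, key=lambda page: s[page], reverse=True)
print(ranked)                                         # [1, 3, 2, 6] -> doc 1 is most important
```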
Adding a perturbation matrix seems reasonable, based on the "random jump" idea: the user types in a URL.
This is only the basic idea behind Google, which has many refinements we have ignored.
PageRank as originally conceived and described here ignores the Back button.
PageRank is still undergoing development.
Details of PageRank's operations and the value of $\alpha$ are a trade secret.
Updates to the Google matrix are done periodically.
The Google matrix is HUGE.
Sophisticated numerical methods are used.
Thank you!
