Markov Chains
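If $q^{(k)} \to s$ as $k \to \infty$, then $s$ is called a steady state vector.
Taking limits in $q^{(k+1)} = Tq^{(k)}$ gives $Ts = s$, so $s$ is an eigenvector of $T$ for eigenvalue 1.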
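As a rough sketch (not taken from these notes), the steady state can be computed directly by solving $Ts = s$ together with the normalisation $s_1 + \dots + s_n = 1$. The helper `steady_state` is hypothetical; it uses exact `Fraction` arithmetic with Gauss-Jordan elimination and assumes $T$ is column-stochastic with a unique steady state. The example reuses the regular matrix $B$ that appears later in these notes.

```python
from fractions import Fraction


def steady_state(T):
    """Solve T s = s with sum(s) == 1 for a column-stochastic matrix T.

    Assumes the steady state is unique. Because every column of T sums to 1,
    the rows of T - I add up to the zero vector, so one equation is redundant;
    we replace the last one by the normalisation s_1 + ... + s_n = 1.
    """
    n = len(T)
    # Augmented matrix [T - I | 0] with exact rational entries.
    A = [[Fraction(T[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    A[-1] = [Fraction(1)] * (n + 1)  # normalisation row: sum(s) = 1
    # Gauss-Jordan elimination: clear every column except on the diagonal.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


B = [[Fraction("0.15"), 1],
     [Fraction("0.85"), 0]]
print(steady_state(B))  # [Fraction(20, 37), Fraction(17, 37)]
```

Exact fractions avoid rounding noise for small hand examples; for large matrices a floating-point eigen-solver would be the usual choice.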
In the grocery example, there was a unique steady state vector $s$, and $q^{(k)} = T^k q^{(0)} \to s$. This does not need to be the case:
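$$T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad T^{2k} = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad T^{2k+1} = T, \qquad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} b \\ a \end{pmatrix}$$
So $q^{(k)}$ alternates between $(a, b)^T$ and $(b, a)^T$ and has no limit unless $a = b$.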
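A quick numerical check of this example, as a minimal sketch: iterating $q^{(k+1)} = Tq^{(k)}$ with this $T$ only swaps the two components. The helper names `mat_vec` and `iterate` and the starting vector $(0.3, 0.7)$ are illustrative choices.

```python
def mat_vec(T, q):
    """Multiply the matrix T (a list of rows) by the column vector q."""
    return [sum(t * x for t, x in zip(row, q)) for row in T]


def iterate(T, q0, steps):
    """Return [q^(0), q^(1), ..., q^(steps)] where q^(k+1) = T q^(k)."""
    states = [list(q0)]
    for _ in range(steps):
        states.append(mat_vec(T, states[-1]))
    return states


T = [[0, 1],
     [1, 0]]
for k, q in enumerate(iterate(T, [0.3, 0.7], 4)):
    print(k, q)
# 0 [0.3, 0.7]
# 1 [0.7, 0.3]
# 2 [0.3, 0.7]
# 3 [0.7, 0.3]
# 4 [0.3, 0.7]
```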
How can we guarantee convergence to a unique steady state vector regardless of initial conditions?
One way is to have a regular transition matrix.
A nonnegative matrix is regular if some power of the matrix has only positive (nonzero) entries.
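$$B = \begin{pmatrix} .15 & 1 \\ .85 & 0 \end{pmatrix}, \qquad B^2 = \begin{pmatrix} .8725 & .15 \\ .1275 & .85 \end{pmatrix}$$
$B$ has a zero entry, but every entry of $B^2$ is positive, so $B$ is regular.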
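Regularity can be tested mechanically by computing powers. This sketch (function names are hypothetical) relies on Wielandt's theorem: if some power of a nonnegative $n \times n$ matrix is strictly positive, then the power $(n-1)^2 + 1$ already is, so only finitely many powers need to be checked. Exact `Fraction` entries reproduce $B^2$ without rounding.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_regular(T):
    """True if some power of the nonnegative square matrix T has only positive entries.

    By Wielandt's bound it suffices to check T^1, ..., T^m with m = (n - 1)**2 + 1.
    """
    n = len(T)
    power = T
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in power for x in row):
            return True
        power = mat_mul(power, T)
    return False


B = [[Fraction("0.15"), 1],
     [Fraction("0.85"), 0]]
print(mat_mul(B, B) == [[Fraction("0.8725"), Fraction("0.15")],
                        [Fraction("0.1275"), Fraction("0.85")]])  # True
print(is_regular(B))                 # True: B^2 is strictly positive
print(is_regular([[0, 1], [1, 0]]))  # False: powers alternate between T and I
```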
Digraphs
A directed graph (digraph) is a set of vertices
(nodes) and a set of directed edges (arcs) between
vertices
The arcs indicate relationships between nodes
Digraphs can be used as models, e.g.
cities and airline routes between them
web pages and links
How Matrices, Markov Chains and Digraphs are used by Google
How does Google work?
Robot web crawlers find web pages
Pages are indexed & cataloged
Pages are assigned PageRank values
PageRank is an algorithm that prioritizes pages
Developed by Larry Page & Sergey Brin in 1998
When pages are identified in response to a
query, they are ranked by PageRank value
Why is PageRank important?
Only a few years ago users waited much longer for search
engines to return results to their queries.
When a search engine finally responded, the returned list
had many links to information that was irrelevant, and
useless links invariably appeared at or near the top of the
list, while useful links were deeply buried.
The Web's information is not structured like the information in organized databases and document collections; it is self-organized.
The enormous size of the Web, currently containing
~10^9 pages, completely overwhelmed traditional
information retrieval (IR) techniques.
By 1997 it was clear that IR technology of the
past wasn't well suited for Web search
Researchers set out to devise new approaches.
Two big ideas emerged, each capitalizing on the
link structure of the Web to differentiate between
relevant information and fluff.
One approach, HITS (Hyperlink-Induced Topic Search), was introduced by Jon Kleinberg.
The other, which changed everything, is Google's PageRank, developed by Sergey Brin and Larry Page.
How are PageRank values assigned?
The number of links to and from a page gives information about the importance of the page.
The more inlinks, the more important the page.
Inlinks from good pages carry more weight than inlinks from weaker pages.
If a page points to several pages, its weight is divided equally among them.
Imagine the World Wide Web as a directed
graph (digraph)
Each page is a vertex
Each link is an arc
[Figure: a sample 6-page web (6-vertex digraph) on pages 1 through 6]
PageRank defines the rank of page $i$ recursively by
$$r_i = \sum_{j \in I_i} \frac{r_j}{|O_j|}$$
where
$r_j$ is the rank of page $j$
$I_i$ is the set of pages that point into page $i$
$O_j$ is the set of pages that page $j$ links to (its outlinks), so $|O_j|$ is the number of outlinks from page $j$
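To make the recursive definition concrete, here is a minimal sketch of one application of the formula. The outlink sets are read off the columns of the 6-node transition matrix in these notes (page 6 has none); `OUTLINKS`, `inlinks` and `rank_update` are illustrative names. Starting from equal ranks $1/6$, one sweep gives $r_1 = \tfrac16\left(\tfrac12 + \tfrac14 + \tfrac13\right) = \tfrac{13}{72}$, and the total rank drops to $5/6$ because the dangling page 6 passes nothing on.

```python
from fractions import Fraction

# Outlinks O_j of the sample 6-page web, read off the columns of the
# transition matrix in these notes. Page 6 is dangling (no outlinks).
OUTLINKS = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4, 5},
            4: {1, 5, 6}, 5: {2, 4, 6}, 6: set()}


def inlinks(outlinks):
    """Return I_i, the set of pages that point into page i, for every page i."""
    I = {i: set() for i in outlinks}
    for j, targets in outlinks.items():
        for i in targets:
            I[i].add(j)
    return I


def rank_update(r, outlinks):
    """Apply r_i = sum over j in I_i of r_j / |O_j| once to the ranks r."""
    I = inlinks(outlinks)
    return {i: sum(r[j] / len(outlinks[j]) for j in I[i]) for i in outlinks}


r0 = {i: Fraction(1, 6) for i in OUTLINKS}
r_next = rank_update(r0, OUTLINKS)
print(r_next[1])              # 13/72
print(sum(r_next.values()))   # 5/6: rank leaks out through dangling page 6
```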
Using our 6-node sample web:
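Transition matrix (rows and columns indexed by pages 1 to 6; $t_{ij} = 1/|O_j|$ if page $j$ links to page $i$, and 0 otherwise):
$$T = \begin{pmatrix}
0 & 1/2 & 1/4 & 1/3 & 0 & 0 \\
1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\
1/3 & 1/2 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/4 & 0 & 1/3 & 0 \\
0 & 0 & 1/4 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 1/3 & 0
\end{pmatrix}$$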
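The outlink sets determine $T$ column by column: column $j$ holds $1/|O_j|$ in the rows of the pages that $j$ links to. A minimal sketch, with the hypothetical helper `transition_matrix` and an `OUTLINKS` dictionary read off the matrix above:

```python
from fractions import Fraction

# Outlinks of the sample 6-page web (page 6 is dangling).
OUTLINKS = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4, 5},
            4: {1, 5, 6}, 5: {2, 4, 6}, 6: set()}


def transition_matrix(outlinks):
    """T[i][j] = 1/|O_j| if page j links to page i, else 0 (pages numbered from 1)."""
    n = len(outlinks)
    T = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in outlinks.items():
        for i in targets:
            T[i - 1][j - 1] = Fraction(1, len(targets))
    return T


T = transition_matrix(OUTLINKS)
for row in T:
    print(" ".join(str(x) for x in row))
# first row printed: 0 1/2 1/4 1/3 0 0
```

Columns 1 to 5 sum to 1, while column 6 is all zeros, which is the dangling-node problem addressed next.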
To eliminate dangling nodes (pages with no outlinks, here page 6) and obtain a stochastic matrix, replace each column of zeros with a column whose entries are all $1/n$, where $n$ is the number of web pages.
$$\bar{T} = \begin{pmatrix}
0 & 1/2 & 1/4 & 1/3 & 0 & 1/6 \\
1/3 & 0 & 1/4 & 0 & 1/3 & 1/6 \\
1/3 & 1/2 & 0 & 0 & 0 & 1/6 \\
1/3 & 0 & 1/4 & 0 & 1/3 & 1/6 \\
0 & 0 & 1/4 & 1/3 & 0 & 1/6 \\
0 & 0 & 0 & 1/3 & 1/3 & 1/6
\end{pmatrix}$$
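The dangling-node fix is a simple per-column rule. In this sketch `fix_dangling` is a hypothetical helper and $T$ is entered by hand from the 6-node matrix above; every column of the result sums to 1.

```python
from fractions import Fraction as F

# Transition matrix T of the sample web (column 6 is all zeros: page 6 is dangling).
T = [[0,       F(1, 2), F(1, 4), F(1, 3), 0,       0],
     [F(1, 3), 0,       F(1, 4), 0,       F(1, 3), 0],
     [F(1, 3), F(1, 2), 0,       0,       0,       0],
     [F(1, 3), 0,       F(1, 4), 0,       F(1, 3), 0],
     [0,       0,       F(1, 4), F(1, 3), 0,       0],
     [0,       0,       0,       F(1, 3), F(1, 3), 0]]


def fix_dangling(T):
    """Return a copy of T with every all-zero column replaced by entries 1/n."""
    n = len(T)
    T_bar = [row[:] for row in T]
    for j in range(n):
        if all(T[i][j] == 0 for i in range(n)):
            for i in range(n):
                T_bar[i][j] = F(1, n)
    return T_bar


T_bar = fix_dangling(T)
print([str(T_bar[i][5]) for i in range(6)])                   # ['1/6', '1/6', '1/6', '1/6', '1/6', '1/6']
print([str(sum(row[j] for row in T_bar)) for j in range(6)])  # ['1', '1', '1', '1', '1', '1']
```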
The Web's nature is such that $\bar{T}$ would not be regular.
Brin & Page force the transition matrix to be regular by making sure every entry satisfies $0 < t_{ij} < 1$.
Create perturbation matrix E having all
entries equal to 1/n
Form the Google matrix:
$$\bar{\bar{T}} = \alpha \bar{T} + (1 - \alpha) E, \quad \text{for some } 0 \le \alpha < 1$$
Using $\alpha = 0.85$ for our 6-node sample web:
$$\bar{\bar{T}} = 0.85\,\bar{T} + (1 - 0.85)\,E = \begin{pmatrix}
1/40 & 9/20 & 19/80 & 37/120 & 1/40 & 1/6 \\
37/120 & 1/40 & 19/80 & 1/40 & 37/120 & 1/6 \\
37/120 & 9/20 & 1/40 & 1/40 & 1/40 & 1/6 \\
37/120 & 1/40 & 19/80 & 1/40 & 37/120 & 1/6 \\
1/40 & 1/40 & 19/80 & 37/120 & 1/40 & 1/6 \\
1/40 & 1/40 & 1/40 & 37/120 & 37/120 & 1/6
\end{pmatrix}$$
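The entries above can be checked with exact arithmetic. A minimal sketch, assuming $\bar{T}$ is entered by hand from these notes and using the hypothetical helper `google_matrix`; with $\alpha = 17/20$, each zero of $\bar{T}$ becomes $(1 - \alpha)/6 = 1/40$.

```python
from fractions import Fraction as F

# Stochastic matrix T-bar of the sample web (dangling column 6 replaced by 1/6).
T_BAR = [[0,       F(1, 2), F(1, 4), F(1, 3), 0,       F(1, 6)],
         [F(1, 3), 0,       F(1, 4), 0,       F(1, 3), F(1, 6)],
         [F(1, 3), F(1, 2), 0,       0,       0,       F(1, 6)],
         [F(1, 3), 0,       F(1, 4), 0,       F(1, 3), F(1, 6)],
         [0,       0,       F(1, 4), F(1, 3), 0,       F(1, 6)],
         [0,       0,       0,       F(1, 3), F(1, 3), F(1, 6)]]


def google_matrix(T_bar, alpha):
    """alpha * T_bar + (1 - alpha) * E, where every entry of E equals 1/n."""
    n = len(T_bar)
    return [[alpha * t + (1 - alpha) * F(1, n) for t in row] for row in T_bar]


G = google_matrix(T_BAR, F(85, 100))
print(" ".join(str(x) for x in G[0]))            # 1/40 9/20 19/80 37/120 1/40 1/6
print(all(0 < x < 1 for row in G for x in row))  # True: every entry strictly between 0 and 1
```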
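By calculating powers of the Google matrix, we can determine the stationary vector:
$$\bar{\bar{T}}^{25} = \begin{pmatrix}
.2066 & .2066 & .2066 & .2066 & .2066 & .2066 \\
.1770 & .1770 & .1770 & .1770 & .1770 & .1770 \\
.1773 & .1773 & .1773 & .1773 & .1773 & .1773 \\
.1770 & .1770 & .1770 & .1770 & .1770 & .1770 \\
.1314 & .1314 & .1314 & .1314 & .1314 & .1314 \\
.1309 & .1309 & .1309 & .1309 & .1309 & .1309
\end{pmatrix}$$
Every column is (to four decimals) the stationary vector, so page 1 has the highest PageRank.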
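Instead of forming the full matrix power, one can repeatedly multiply a single probability vector by the Google matrix (the power method); the vector it converges to should closely match the columns of $\bar{\bar{T}}^{25}$. This is a floating-point sketch in which `power_method`, the tolerance `1e-10` and the iteration cap are illustrative choices, not part of the original notes.

```python
ALPHA = 0.85

# Stochastic matrix T-bar of the sample web (dangling column 6 replaced by 1/6).
T_BAR = [[0,   1/2, 1/4, 1/3, 0,   1/6],
         [1/3, 0,   1/4, 0,   1/3, 1/6],
         [1/3, 1/2, 0,   0,   0,   1/6],
         [1/3, 0,   1/4, 0,   1/3, 1/6],
         [0,   0,   1/4, 1/3, 0,   1/6],
         [0,   0,   0,   1/3, 1/3, 1/6]]

N = len(T_BAR)
# Google matrix: ALPHA * T_bar + (1 - ALPHA) * E, with every entry of E equal to 1/N.
G = [[ALPHA * t + (1 - ALPHA) / N for t in row] for row in T_BAR]


def power_method(G, tol=1e-10, max_iter=1000):
    """Iterate r <- G r from the uniform vector until successive iterates agree to tol."""
    n = len(G)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(G[i][j] * r[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


ranks = power_method(G)
for page, value in enumerate(ranks, start=1):
    print(page, f"{value:.4f}")
```

Because $\bar{\bar{T}}$ is column-stochastic, each iterate keeps summing to 1, so no renormalisation step is needed.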