
Link prediction

Supervised Random Walks: Predicting and Recommending Links in Social Networks

Lars Backstrom (Facebook), Jure Leskovec (Stanford University)

Motivation

- Link prediction, link recommendation: predicting future links between existing nodes.

- Applications:
  - For social networks, it has direct business consequences (e.g., recommending new friends).
  - For networks in organizations, it suggests possible new collaborations.

AMAs Reading Group Link prediction



Supervised framework
- For a given node s, positive examples are the nodes to which s is linked and negative examples are all the other nodes.

- This cannot be treated as a plain classification task because the classes are extremely imbalanced.
  - For example, on Facebook a node is linked to about 100 other nodes on average, while Facebook has more than 500 million nodes, so fewer than one candidate in a million is a positive example.

- Solution: rank nodes instead of classifying them. Popular methods for ranking nodes:
  - PageRank,
  - Random walks with restarts.
- The stationary distribution of such random walks assigns each node a score, which ranks the other nodes in the network by how close they are to the considered node.
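The ranking idea above can be sketched as an unsupervised random walk with restarts (personalized PageRank) from a seed node, in which every out-edge is equally likely. This is a minimal sketch, not the authors' code: the toy graph, the restart probability `alpha = 0.15`, the fixed iteration count and the rule that dangling nodes jump back to the seed are all assumptions for illustration.

```python
def random_walk_with_restarts(adj, s, alpha=0.15, iters=200):
    """Personalized PageRank scores for seed s on an unweighted directed graph.

    adj maps each node to a list of out-neighbours. At each step the walker
    restarts at s with probability alpha; otherwise it follows an out-edge
    chosen uniformly at random.
    """
    nodes = list(adj)
    p = {u: (1.0 if u == s else 0.0) for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            out = adj[u]
            if not out:
                # Assumption: a dangling node sends all of its mass back to s.
                nxt[s] += p[u]
                continue
            for v in out:
                nxt[v] += (1.0 - alpha) * p[u] / len(out)
            nxt[s] += alpha * p[u]  # restart mass
        p = nxt
    return p


# Hypothetical toy graph: s is already linked to a and b.
graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c", "d"],
    "c": ["a", "b", "e"],
    "d": ["b"],
    "e": ["c"],
}
scores = random_walk_with_restarts(graph, "s")
# Rank the nodes s is not linked to yet, best candidate first.
candidates = [u for u in graph if u != "s" and u not in graph["s"]]
ranking = sorted(candidates, key=scores.get, reverse=True)
print(ranking)
```

Here every edge counts the same; the supervised method replaces these uniform transition probabilities with learned edge strengths.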


Problem formulation

- Given a directed graph $G(V, E)$, a node $s$ and two sets: the nodes to which $s$ creates edges, $D = \{d_1, \ldots, d_k\}$ (destination nodes), and the nodes to which $s$ does not create edges, $L = \{l_1, \ldots, l_n\}$ (no-link nodes).
- Each edge $(u, v) \in E$ is characterized by a feature vector $\psi_{uv} \in X$ that describes the nodes $u$ and $v$ (age, gender, hometown, etc.) and the interaction attributes (when the edge was created, how many messages $u$ and $v$ exchanged, etc.).
- A function $f_w : X \to \mathbb{R}^+$ estimating edge strengths is then learned from the examples in $D \cup L$. Considered functions:
  - Exponential edge strength: $f_w(\psi_{uv}) = e^{w \cdot \psi_{uv}}$
  - Logistic edge strength: $f_w(\psi_{uv}) = \dfrac{1}{1 + e^{-w \cdot \psi_{uv}}}$
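The two edge-strength functions can be written directly. A minimal sketch, assuming $w$ and $\psi_{uv}$ are plain Python lists of equal length; the helper `dot`, the weights and the feature values below are hypothetical.

```python
import math


def dot(w, psi):
    """Inner product w . psi of two equal-length feature lists."""
    if len(w) != len(psi):
        raise ValueError("w and psi must have the same length")
    return sum(wi * xi for wi, xi in zip(w, psi))


def exponential_strength(w, psi):
    """f_w(psi_uv) = exp(w . psi_uv): always > 0, unbounded above."""
    return math.exp(dot(w, psi))


def logistic_strength(w, psi):
    """f_w(psi_uv) = 1 / (1 + exp(-w . psi_uv)): always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-dot(w, psi)))


# Hypothetical weights and features for one edge (u, v), e.g.
# [messages exchanged (scaled), common friends (scaled)].
w = [0.5, -0.2]
psi_uv = [2.0, 1.0]
print(exponential_strength(w, psi_uv))  # exp(0.8)
print(logistic_strength(w, psi_uv))     # 1 / (1 + exp(-0.8))
```

Both functions are strictly positive, as $f_w : X \to \mathbb{R}^+$ requires; the exponential one is unbounded, while the logistic one stays in $(0, 1)$.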


Problem formulation
- Find parameters $w$ such that $f_w$ assigns edge weights in such a way that the random walk is more likely to visit nodes in $D$ than nodes in $L$.
- That is, if $p$ is the vector of scores, then $\forall d \in D, l \in L: p_l < p_d$.
- The proposed optimization problem, summed over a set $S$ of training seed nodes:

$$\min_w F(w) = \|w\|^2 + \sum_{s \in S} \sum_{d \in D_s,\, l \in L_s} h(p_l - p_d)$$

where $h(p_l - p_d) = 0$ if $p_l < p_d$ and $h(p_l - p_d) > 0$ otherwise.

Remarks:
- The optimization problem is not defined in the traditional ML way: the loss does not compare a per-example prediction with a label.
- Instead, the scores $p$ depend on $w$ only indirectly, through the edge strengths estimated by $f_w$.
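The objective $F(w)$ can be evaluated once the scores $p$ for each seed are known. The slides only require $h$ to be zero when $p_l < p_d$ and positive otherwise; the squared hinge $h(x) = \max(0, x)^2$ used here is one assumed choice (it is also 0 at the tie $p_l = p_d$), and the scores below are hypothetical numbers rather than the output of a real random walk.

```python
def squared_hinge(x):
    """Assumed choice of h: max(0, x)^2, zero whenever p_l <= p_d.

    It is differentiable everywhere, with h'(x) = 2 * max(0, x).
    """
    return max(0.0, x) ** 2


def objective(w, examples, h=squared_hinge):
    """F(w) = ||w||^2 + sum_s sum_{d in D_s, l in L_s} h(p_l - p_d).

    examples: list of (p, D_s, L_s) triples, one per training seed s, where
    p maps node -> random-walk score already computed with strengths f_w.
    """
    reg = sum(wk * wk for wk in w)
    loss = 0.0
    for p, dests, no_links in examples:
        for d in dests:
            for l in no_links:
                loss += h(p[l] - p[d])  # penalize l ranked above d
    return reg + loss


# Hypothetical scores for one seed s with D_s = {d1} and L_s = {l1, l2}.
p = {"d1": 0.30, "l1": 0.10, "l2": 0.40}
F = objective([0.5, -0.2], [(p, ["d1"], ["l1", "l2"])])
print(F)  # 0.29 from ||w||^2 plus 0.01 because l2 outranks d1
```

Only the pair $(d_1, l_2)$ is violated, so it is the only one that contributes to the loss term.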

Dependency between p and w

- Consider the stochastic transition matrix $Q$ of the random walk:

$$\forall u, v, \quad Q_{uv} = (1 - \alpha) \frac{f_w(\psi_{uv})}{\sum_z f_w(\psi_{uz})} + \alpha \, \mathbb{1}(v = s)$$

where the first term is 0 when $(u, v) \notin E$ and the sum runs over the out-neighbors $z$ of $u$.
The proposed interpretation: $Q_{uv}$ is the conditional probability that a walk traverses edge $(u, v)$ given that it is currently at node $u$. $\alpha \in [0, 1]$ is the restart probability: with probability $\alpha$ the random walk jumps back to the seed node $s$ and restarts.
- The score vector $p$ is the stationary distribution of the random walk with restarts, i.e. the solution of:

$$p^T = p^T Q$$
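The dependency of $p$ on $w$ can be made concrete by building $Q$ from the edge strengths and solving $p^T = p^T Q$ by power iteration. A minimal sketch under stated assumptions: the graph, features and weights are hypothetical, $f_w$ is the exponential edge strength, and a node without out-edges is assumed to always jump back to $s$.

```python
import math


def transition_matrix(edges, nodes, s, strength, alpha):
    """Q_uv = (1 - alpha) * f_w(psi_uv) / sum_z f_w(psi_uz) + alpha * 1[v == s].

    edges maps u -> {v: psi_uv}; strength(psi) plays the role of f_w.
    Returns Q as a dict of dicts in which every row sums to 1.
    """
    Q = {u: {v: 0.0 for v in nodes} for u in nodes}
    for u in nodes:
        out = edges.get(u, {})
        total = sum(strength(psi) for psi in out.values())
        for v, psi in out.items():
            Q[u][v] += (1.0 - alpha) * strength(psi) / total
        if not out:
            # Assumption: a node without out-edges always jumps back to s.
            Q[u][s] += 1.0 - alpha
        Q[u][s] += alpha  # restart term alpha * 1[v == s]
    return Q


def stationary_distribution(Q, nodes, s, tol=1e-12, max_iter=10_000):
    """Solve p^T = p^T Q by power iteration, starting with all mass on s."""
    p = {u: (1.0 if u == s else 0.0) for u in nodes}
    for _ in range(max_iter):
        nxt = {v: sum(p[u] * Q[u][v] for u in nodes) for v in nodes}
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


def exponential_strength(psi, w=(1.0, -0.5)):
    """Exponential edge strength exp(w . psi) with hypothetical fixed weights w."""
    return math.exp(sum(wk * x for wk, x in zip(w, psi)))


# Hypothetical 4-node graph; each edge carries a 2-dim feature vector psi_uv.
edges = {
    "s": {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    "a": {"s": [1.0, 0.0], "c": [1.0, 1.0]},
    "b": {"s": [0.0, 1.0], "c": [0.0, 2.0]},
    "c": {"a": [1.0, 1.0], "b": [0.0, 2.0]},
}
nodes = sorted(edges)
Q = transition_matrix(edges, nodes, "s", exponential_strength, alpha=0.2)
p = stationary_distribution(Q, nodes, "s")
print(p)
```

With these weights the edge $(s, a)$ is much stronger than $(s, b)$, so $a$ ends up with a higher score than $b$; changing $w$ moves score mass between nodes, which is the lever that training on $D$ and $L$ adjusts.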


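The optimization problem is solved using a simple gradient descent method:

$$\forall k, \quad \frac{\partial F(w)}{\partial w_k} = 2 w_k + \sum_{s \in S} \sum_{d \in D_s,\, l \in L_s} \frac{\partial h(x_{ld})}{\partial x_{ld}} \left( \frac{\partial p_l}{\partial w_k} - \frac{\partial p_d}{\partial w_k} \right)$$

where $x_{ld} = p_l - p_d$.
As $p$ is the principal (left) eigenvector of the matrix $Q$, $p_u = \sum_j p_j Q_{ju}$, so

$$\frac{\partial p_u}{\partial w_k} = \sum_j \left( Q_{ju} \frac{\partial p_j}{\partial w_k} + p_j \frac{\partial Q_{ju}}{\partial w_k} \right)$$

Remark: $p_u$ and $\frac{\partial p_u}{\partial w_k}$ are recursively entangled: the derivatives $\frac{\partial p_u}{\partial w_k}$ appear on both sides of the equation above, so they are estimated iteratively, applying the chain rule repeatedly until the values converge.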
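The gradient computation can be sketched end to end on a toy graph: compute $p$, iterate the recursion for $\partial p_u / \partial w_k$ starting from zero, and combine it with $h'$. Assumptions, not taken from the slides: exponential edge strength (so $\partial f_w(\psi_{uv}) / \partial w_k = f_w(\psi_{uv})\,\psi_{uv,k}$), squared hinge $h(x) = \max(0, x)^2$, a single seed, fixed iteration counts instead of a convergence test, and every node having at least one out-edge. All function names and data are hypothetical.

```python
import math


def strengths(w, psi):
    """Exponential edge strength f_w(psi_uv) = exp(w . psi_uv) for every edge."""
    return {e: math.exp(sum(wk * x for wk, x in zip(w, feats))) for e, feats in psi.items()}


def transition(w, psi, n, s, alpha):
    """Q_uv = (1 - alpha) f_uv / sum_z f_uz + alpha * 1[v == s] (every node needs an out-edge)."""
    f = strengths(w, psi)
    out = [sum(fe for e, fe in f.items() if e[0] == u) for u in range(n)]
    Q = [[alpha if v == s else 0.0 for v in range(n)] for _ in range(n)]
    for (u, v), fuv in f.items():
        Q[u][v] += (1.0 - alpha) * fuv / out[u]
    return Q


def transition_grad(w, psi, n, alpha, k):
    """dQ_uv/dw_k by the quotient rule; with exp strength, df_uv/dw_k = f_uv * psi_uv[k]."""
    f = strengths(w, psi)
    dQ = [[0.0] * n for _ in range(n)]
    for u in range(n):
        row = [e for e in f if e[0] == u]
        tot = sum(f[e] for e in row)
        dtot = sum(f[e] * psi[e][k] for e in row)
        for e in row:
            dQ[u][e[1]] = (1.0 - alpha) * (f[e] * psi[e][k] * tot - f[e] * dtot) / tot ** 2
    return dQ


def stationary(Q, s, iters=500):
    """Power iteration for p^T = p^T Q, starting with all mass on the seed s."""
    n = len(Q)
    p = [1.0 if u == s else 0.0 for u in range(n)]
    for _ in range(iters):
        p = [sum(p[j] * Q[j][u] for j in range(n)) for u in range(n)]
    return p


def score_grad(Q, dQ, p, iters=500):
    """Iterate dp_u <- sum_j (Q_ju dp_j + p_j dQ_ju), starting from dp = 0."""
    n = len(Q)
    dp = [0.0] * n
    for _ in range(iters):
        dp = [sum(Q[j][u] * dp[j] + p[j] * dQ[j][u] for j in range(n)) for u in range(n)]
    return dp


def loss(w, psi, n, s, D, L, alpha=0.3):
    """F(w) for a single seed s with squared hinge h(x) = max(0, x)^2."""
    p = stationary(transition(w, psi, n, s, alpha), s)
    return sum(wk * wk for wk in w) + sum(max(0.0, p[l] - p[d]) ** 2 for d in D for l in L)


def loss_grad(w, psi, n, s, D, L, alpha=0.3):
    """dF/dw_k = 2 w_k + sum_{d,l} h'(p_l - p_d) (dp_l/dw_k - dp_d/dw_k), h'(x) = 2 max(0, x)."""
    Q = transition(w, psi, n, s, alpha)
    p = stationary(Q, s)
    grad = []
    for k in range(len(w)):
        dp = score_grad(Q, transition_grad(w, psi, n, alpha, k), p)
        g = 2.0 * w[k]
        for d in D:
            for l in L:
                g += 2.0 * max(0.0, p[l] - p[d]) * (dp[l] - dp[d])
        grad.append(g)
    return grad


# Hypothetical graph: node 0 is the seed s; s should link to node 2 (D), not node 3 (L).
psi = {
    (0, 1): [1.0, 0.0], (0, 3): [0.0, 1.0],
    (1, 0): [1.0, 0.0], (1, 2): [0.5, 0.5],
    (2, 1): [0.5, 0.5], (2, 3): [0.2, 0.8],
    (3, 0): [0.0, 1.0], (3, 2): [0.2, 0.8],
}
w = [0.1, -0.1]
grad = loss_grad(w, psi, 4, 0, D=[2], L=[3])
w_next = [wk - 0.05 * gk for wk, gk in zip(w, grad)]  # one gradient-descent step
print(grad, w_next)
```

Comparing `loss_grad` with a finite-difference estimate built from `loss` is a cheap way to check the recursion; the single step with learning rate 0.05 is only illustrative.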

Features $\psi_{uv}$ for the co-authorship network

- Number of papers written by $u$ before $t$
- Number of papers written by $v$ before $t$
- Number of papers $u$ and $v$ co-authored
- Cosine similarity between the titles of papers written by $u$ and the titles of $v$'s papers
- Time since $u$ and $v$ last co-authored a paper
- Number of common friends between $u$ and $v$
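A sketch of how these features might be computed for a candidate pair $(u, v)$ from a list of papers. The record format `(year, title, authors)`, the bag-of-words treatment of titles, the reading of "friends" as co-authors before $t$, and returning `math.inf` when $u$ and $v$ never co-authored are all assumptions; the function names and the paper list are invented for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def coauthor_features(papers, u, v, t):
    """Feature vector psi_uv built only from papers published before time t.

    papers: list of (year, title, authors) tuples (hypothetical format).
    """
    before = [(y, title, set(a)) for y, title, a in papers if y < t]
    pu = [p for p in before if u in p[2]]
    pv = [p for p in before if v in p[2]]
    joint = [p for p in pu if v in p[2]]
    words_u = Counter(word for _, title, _ in pu for word in title.lower().split())
    words_v = Counter(word for _, title, _ in pv for word in title.lower().split())
    # Assumption: "friends" are co-authors on papers before t.
    friends_u = set().union(*(a for _, _, a in pu)) - {u}
    friends_v = set().union(*(a for _, _, a in pv)) - {v}
    last_joint = max((y for y, _, _ in joint), default=None)
    return [
        len(pu),                    # papers by u before t
        len(pv),                    # papers by v before t
        len(joint),                 # papers u and v co-authored
        cosine(words_u, words_v),   # title similarity
        t - last_joint if last_joint is not None else math.inf,  # time since last joint paper
        len(friends_u & friends_v),  # common co-authors
    ]


papers = [
    (2001, "Random walks on graphs", ["ann", "bob"]),
    (2003, "Link prediction in networks", ["ann", "cat"]),
    (2004, "Supervised ranking of graph nodes", ["bob", "cat"]),
    (2009, "Graph mining at scale", ["ann", "bob"]),  # after t, ignored
]
print(coauthor_features(papers, "ann", "bob", t=2005))
```

Restricting every count to papers before $t$ keeps the features from leaking information about the links that are to be predicted.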
