Analyzing Temporal Dynamics in Twitter Profiles for Personalized Recommendations in the Social Web

Fabian Abel, Qi Gao, Geert-Jan Houben, Ke Tao


Web Information Systems, TU Delft
PO Box 5031, 2600 GA Delft, the Netherlands
{f.abel,q.gao,g.j.p.m.houben,k.tao}@tudelft.nl

ABSTRACT
The Social Web describes a new culture of participation on the Web where more and more people actively participate in publishing and organizing Web content. As part of this culture, people leave a variety of traces when interacting with (other people via) Social Web systems. In this paper, we investigate user modeling strategies for inferring personal interest profiles from Social Web interactions. In particular, we analyze individual micro-blogging activities on Twitter. We compare different strategies for creating user profiles based on the Twitter messages a user has published and study how these profiles change over time. Moreover, we evaluate the quality of the user modeling strategies in the context of personalized recommender systems and show that those strategies which consider the temporal dynamics of the individual profiles allow for the best performance.

1. INTRODUCTION
With more than 190 million users and more than 65 million postings per day, Twitter is clearly one of the most popular services on the Social Web¹. To understand this Social Web and the way in which humans are part of it, an analysis of Twitter is an effective instrument for scientists. Twitter poses a simple question to its users, “what’s happening?”, and restricts the answer to this question to 140 characters. People therefore publish short messages (tweets) about their everyday activities on Twitter. Other users follow selected information streams and can react to Twitter messages, e.g. by re-tweeting or posting a reply. Recently, researchers have studied the network structures that evolve from those user interactions on Twitter [4, 13] and investigated how information propagates through the Twitter network [9, 10]. Yet, little research has been done on understanding the semantics of individual Twitter activities and on inferring user interests from these activities. Such interests could, for example, be used to model users as a basis for personalized recommendations in (other) applications on the Web. Given the shortness of tweets, making sense of individual tweets and exploiting them for user modeling are non-trivial problems. In this paper, we describe the research questions that underlie our studies and evaluations. We report on the findings of a set of studies related to personalized news recommendations in which we particularly consider temporal dynamics.

Our ambition is to understand how people behave in the Social Web. In previous work, we analyzed the nature of tagging activities that people perform on Social Web systems like Flickr, Delicious and StumbleUpon and studied the impact of cross-system user modeling on personalization [3].

In this paper, we study characteristics of Twitter-based profiles and investigate how one can leverage Twitter activities for personalization. For this purpose, we developed a library for aggregating and enriching the semantics of individual Twitter activities [2]. Building on this work, we conduct an in-depth analysis of a large Twitter dataset of more than 30 million tweets that were published by more than 20,000 users over a period of four months². We study user modeling on Twitter and answer research questions that concern the temporal evolution of individual user profiles inferred from Twitter activities:

• How can we infer personal interests from individual Twitter interactions, and to what extent do personal interests vary over time?
• How are personal interests and user concerns influenced by public trends? Do inferred interest profiles allow for predicting which trends will be adopted by a user?
• Can we exploit Twitter-based interest profiles to personalize the users’ Social Web experiences? How do temporal dynamics impact the accuracy of personalization?

To answer the above research questions, we develop a framework for enriching and contextualizing Twitter messages. Our framework allows us to identify entities (e.g. Muammar Gaddafi (person), Apple (company)) and topics (e.g. politics, sports) that are mentioned by the users in the individual tweets [2]. Moreover, we are able to relate tweets to news events (e.g. protests in Egypt) and to external Web resources to further contextualize the users’ Twitter activities. The semantically enriched Twitter interactions form the basis for a variety of user modeling strategies that we analyze in this paper to understand users’ individual Twitter interactions over time.

The main contributions of our work can be summarized as follows.

• We develop a framework for modeling the semantics of individual Twitter interactions. Our framework features a variety of user modeling strategies that enrich the semantics of individual Twitter messages, capture personal user interests over time and relate personal user interests to global trends.
• We conduct a large-scale analysis of Twitter-based user modeling and analyze different design dimensions of the user modeling strategies, such as weighting schemes and semantic enrichment strategies, in detail. We particularly study the temporal dynamics of Twitter-based user interest profiles and analyze the relations between personal interests and the adoption of trending topics.
• We demonstrate the success of our strategies in the context of personalized news recommendations on the Social Web. We evaluate the impact of the different design choices on personalization quality in the context of recommending Web sites and show that the discovered temporal features can successfully be exploited to improve the accuracy of personalization.

The rest of the paper is organized as follows. In the next section, we analyze the temporal characteristics of user interests in trending topics. Based on that analysis, we introduce the core model of our user modeling framework in Section 3. Our framework allows for the creation of user modeling strategies that consider the temporal dynamics of user behavior on Twitter. In Section 4, we study how the specifics of the different user modeling strategies impact personalization, before we conclude in Section 5.

¹ http://techcrunch.com/2010/06/08/twitter-190-million-users/
² We make a subset of our dataset publicly available at http://wis.ewi.tudelft.nl/websci11/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WebSci ’11, June 14-17, 2011, Koblenz, Germany.
Copyright 2011 ACM.

2. EVOLUTION OF USER INTERESTS IN TRENDING TOPICS
To better understand the temporal dynamics of the interests and concerns that individual users express on Twitter by posting messages, we monitored more than 20,000 users, starting on November 15th, 2010. Over a period of more than four months, we collected more than 30 million tweets overall. In this section, we analyze how the interests of users in a topic discussed on Twitter change over time. We start with a concrete example topic, the Egyptian revolution³, which started on January 25th, 2011. In this analysis, we aim to answer the following research questions.

1. How can a topic on Twitter be represented?
2. How do topics evolve over time, and how does this affect the representation of a topic?
3. How do the interests of individual users in a topic change over time?

³ http://en.wikipedia.org/wiki/Timeline_of_the_2011_Egyptian_revolution

We base our analysis on previous work in which we studied how to extract trends from Twitter [7]. To represent a trending topic, it is often not sufficient to use just a single concept such as a hashtag (a word starting with “#”). For example, for the Egyptian revolution, a hashtag like “#egypt” could be considered a representative concept for this topic. However, (i) not all tweets that contain the hashtag “#egypt” refer to the revolution in Egypt, and (ii) there are tweets that refer to the revolution but do not mention the hashtag “#egypt”. Instead, other terms that refer to entities such as Mubarak (person) or Cairo (location) may be used. Therefore, we propose to model a topic on Twitter as a set of weighted concepts, where a concept may refer to an arbitrary entity and where the weight indicates how important the concept is for the topic:

Definition 1 (Topic). A topic is a set of weighted concepts where a concept c may be represented via a named entity or hashtag.

$$\mathit{topic}(time, T_{tweets}) = \{(c, w(c, time, T_{tweets})) \mid c \in C_H \cup C_E\} \tag{1}$$

Here, $w(c, time, T_{tweets})$ is a function that computes the weight associated with the concept c for the topic. It is based on the messages $T_{tweets}$ that are (possibly) related to the topic and on a given timestamp. $C_H$ and $C_E$ denote the set of hashtags and the set of entities, respectively.

The above model for creating the representation of a topic expects a timestamp as input because the concepts that relate to a certain topic may change over time. On the one hand, the importance of concepts may vary at different points in time. On the other hand, new concepts may arise, while other concepts that once represented the topic may become entirely useless for describing it. Hence, the representation of a topic depends on the time at which it is requested.

2.1 Evolution of Topics over Time
To analyze how the topic Egyptian revolution evolves over time, we selected popular entities on every day of our observation period. The selection was based on their co-occurrence frequency with hashtags such as “#jan25” or “#tahrir”, which we could almost unambiguously relate to the topic.

Figure 1 illustrates how the occurrence frequency of entities related to the Egyptian revolution changes over time. Some entities, like Cairo and Mubarak, are popular for this topic over a long period of time (see Figure 1(b)). This means that Twitter users continuously refer to these entities when publishing tweets about the topic. The occurrence frequencies of these entities quickly reach their peaks three days after the beginning of the Egyptian revolution (January 25th) and then decrease rather slowly over the next two weeks.
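Definition 1 leaves the weight function w open. The sketch below makes one hypothetical assumption: $w(c, time, T_{tweets})$ is the share of all concept mentions in the topic-related tweets of the requested day that refer to c. That is one plausible reading of the relative occurrence frequencies discussed here. Under this assumption, the time-dependent topic representation of Equation 1 could be computed as follows. The `Tweet` class, the `topic_at` function and the toy tweets are illustrative and are not part of the authors' framework.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tweet:
    day: date                  # publication day of the tweet
    concepts: frozenset[str]   # hashtags (C_H) and named entities (C_E) it mentions


def topic_at(time: date, tweets: list[Tweet]) -> dict[str, float]:
    """Sketch of Equation 1: the topic as a set of weighted concepts on a given day.

    Assumption (not from the paper): w(c, time, T_tweets) is the share of all
    concept mentions on day `time` that refer to c.
    """
    counts = Counter(c for t in tweets if t.day == time for c in t.concepts)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


# Toy topic-related tweets, purely illustrative.
tweets = [
    Tweet(date(2011, 1, 28), frozenset({"#jan25", "SMS", "Mubarak"})),
    Tweet(date(2011, 1, 28), frozenset({"SMS", "Cairo"})),
    Tweet(date(2011, 2, 3), frozenset({"#jan25", "Vodafone", "Mubarak"})),
    Tweet(date(2011, 2, 3), frozenset({"Mubarak", "Cairo"})),
]

print(topic_at(date(2011, 1, 28), tweets)["SMS"])   # 0.4
print("SMS" in topic_at(date(2011, 2, 3), tweets))  # False
```

The weights are recomputed for each requested day. As a result, a burst-like concept such as SMS can dominate the representation on one day and be absent a few days later, while a long-lived concept such as Mubarak remains part of it on both days.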
[Figure 1: Relative occurrence frequencies of entities related to the Egyptian revolution. (a) Entities with short lifespan for the topic. (b) Entities with long lifespan for the topic (Egypt, Mubarak, Cairo, United States).]

In contrast, the entities shown in Figure 1(a) show burst-like spikes and seem to be relevant for the topic only for a short period of time. For example, many messages that were posted on January 28th were related to the entity SMS. They referred to the shutdown of Internet access and short messaging services in Egypt on January 26th, such as the following tweet:

“Again, latest Egypt updates: internet shut down, SMS and Blackberry down, plainclothes police setting cars on fire”

Therefore, the entity SMS became very popular on that day. Similarly, Omar Suleiman was mentioned in many messages on January 30th. He had been sworn in as vice president on January 29th, which resulted in further protests, as reported in the following message:

“Al Jazeera breaking: Protesters loudly condemn the appointment of Omar Suleiman as Vice President”

Likewise, the opposition leader Mohamed ElBaradei became popular for the topic on January 30th. Moreover, the peak for Vodafone on February 3rd is most likely related to the mobile phone company’s announcement that the Egyptian authorities had hijacked its network.

Our analysis presented in Figure 1 thus demonstrates that the importance of entities for a given topic changes over time. Some entities are continuously good representatives for a topic (e.g. Mubarak), while others (e.g. SMS) characterize a topic only for a short period of time. When creating the representation of a topic, it is thus reasonable to consider multiple concepts (e.g. entities and hashtags). The importance of each concept should be computed as a function of the time at which the topic representation is requested.

2.2 Evolution of User Interests in Topics
Having seen how a topic is discussed within a community of users and how the representation of a topic emerges and changes over time, we now analyze how the interests of individual users in a topic evolve over time. We therefore selected a subset of 1619 Twitter users. These are the users for whom we monitored at least 20 Twitter messages in total and at least 10 Twitter messages during the Egyptian revolution; the latter 10 messages were not necessarily related to the events in Egypt. In fact, we discovered that 70% of the sample users showed interest in the Egyptian revolution, i.e. 70% of the users (re-)tweeted a message that mentioned a concept of the corresponding topic representation. While these users were interested in the topic, their individual behavior showed interesting differences. For example, not all users started tweeting about the event from the very beginning (January 25th). Figure 2 shows, for each day, the number of users who published their first tweet about the Egyptian revolution and thereby showed for the first time that they were, to some extent, concerned with the topic.

[Figure 2: User adoption: number of new users per day who become interested in the Egyptian revolution.]

As shown in Figure 2, most people do not join the discussion or dissemination of the event immediately after it happens. The small amplitudes before January 25th can be considered noise and seem to be caused by the modeling of the topic. On the day of the first wave of protest in Egypt, the “Day of Revolt”, slightly fewer than 150 of our sample users joined the discussion on the topic. The Egyptian regime shut down the Internet on January 26th. After that, about 300 users became interested in the protests on the “Friday of Rage”, January 28th, and another 150 users took part in the Twitter discussions for the first time on the following day.

[Figure 3: Daily activities of users who are interested in the Egyptian revolution (y-axis: number of tweets related to the topic). (a) Daily activities of users interested in the topic for a short time period (short-term adopters ST User A, B and C). (b) Daily activities of users interested in the topic for a long time period (long-term adopters LT User A, B and C).]

Having seen when individual users first become interested in a topic, we also examined how long those users remained interested in it. Figure 3 shows the number of Twitter messages that selected users posted on different days. The users whose tweeting activities on the Egyptian revolution are displayed in Figure 3(a) can be characterized as short-term adopters, as they published tweets about the event for less than one week. It is interesting to see that the number of messages these users posted about the topic is fairly high. For example, ST User A adopted the topic two days after the beginning of the revolt and published almost 100 tweets about the revolution on a single day. Nevertheless, she quickly lost interest. The interests of these three example users thus seem to change quickly. Hence, user modeling strategies that aim to capture users’ interests in topics have to adapt quickly as well.

Figure 3(b) displays the Twitter activities of three other users who were concerned with the Egyptian revolts for more than one month; they can therefore be considered long-term adopters. All three long-term adopters became interested in the topic at the very beginning of the revolt and can thus also be described as early adopters. In contrast, the short-term adopters characterized in Figure 3(a) are not among the first users who published about the incidents in Egypt. For the Egyptian revolution, there seems to be a correlation between the time when a user adopts a topic and the duration during which the user is interested in the topic: early adopters overlap more strongly with long-term adopters than with short-term adopters. Furthermore, the Twitter behavior of the short-term adopters regarding the Egyptian revolution is apparently more influenced by public trends than the behavior of the long-term adopters. For example, as depicted in Figure 3(a), ST User A, B and C show a peak after the riot on February 2nd that was called the “Battle of the Camel” and that was heavily discussed in social and mainstream news media. In contrast, the peaks of the long-term adopters, shown in Figure 3(b), occur much more frequently, including on days without major events.

[Figure 4: Standard deviation of the timestamps of related tweets posted by each user (y-axis: standard deviation of the timestamps of each user’s tweets related to the topic, in days; x-axis: users who adopted the topic “Egypt Revolution”).]

Figure 4 gives an overview of the sample users who were interested in the Egyptian revolution with respect to how long each user expressed interest in the topic on Twitter. In particular, it shows for each user the standard deviation of the timestamps of the user’s tweets related to the topic. This measure is similar to the one proposed by Huang et al. [8], who measure the temporal stability of hashtags. For Figure 4, we compute the standard deviation as follows:

$$\sigma(topic, user) = \sqrt{\frac{\sum_{k=1}^{N} \left(time(tweet_k) - \overline{time}\right)^2}{N - 1}} \tag{2}$$

Here, $time(tweet_k)$ is the timestamp of the k-th tweet published by the given user that refers to the given topic. $\overline{time}$ is the average timestamp of the user’s tweets that relate to the topic, and $N$ is the overall number of tweets in which the user refers to the topic.

Figure 4 shows that $\sigma(topic, user)$ is zero for nearly 150 users, which means that those users published just one tweet that we could relate to the happenings in Egypt. Overall, for more than 75% of the users, the standard deviation of the timestamps of their topic-related tweets is less than one week. The fraction of long-term adopters, for whom $\sigma(topic, user)$ is higher than ten days, is rather low at less than 2.5%.

2.3 Findings
In summary, we can answer the research questions raised at the beginning of this section as follows.

1. Topics that are discussed on Twitter can be represented via the concepts that are referenced in the tweets that relate to the topic. These concepts can be arbitrary entities such as persons, organizations or locations, as well as cryptic hashtags like “#jan25”. As different concepts may be of different relevance for a topic, it is desirable to weight the concepts according to their importance for the topic.
2. Topics change over time: the importance of a concept for a given topic varies. For example, concepts such as SMS or Vodafone became important for the Egyptian revolution only for a short time, when the government of Egypt shut down the Internet and took over Vodafone’s telecommunication network. Due to this event-like nature of a Twitter topic, it is helpful to compute the weight of a concept for a topic as a function of time.
3. The interests of individual users in the Egyptian revolution evolve differently over time. Most users who were interested in the topic adopted it within a few days. Hence, people seem to adopt a topic on Twitter rather quickly (cf. [9, 12]). However, the fraction of early adopters who become interested in an event on the day it happens is small. Moreover, the duration during which users are interested in an event-like topic clearly differs among users. We identified long-term adopters, who are interested in a Twitter topic over a long period of time. We also identified short-term adopters, who are concerned with a topic only for a short period of time and are driven more by current trends.

3. USER MODELING WITH TEMPORAL DYNAMICS
Given the findings presented in the previous section, we now introduce a lightweight user modeling framework for creating strategies that infer user interests from the Twitter activities of a user and capture temporal dynamics in these profiles. We implemented our approach as an extension to the Twitter-based user modeling framework introduced in [1]. We also make our strategies available via Web services⁴.

⁴ http://wis.ewi.tudelft.nl/tums/

Our user modeling strategies aggregate and monitor the Twitter messages of an individual user. Each tweet is processed by a semantic enrichment pipeline that extracts hashtags and named entities (e.g. persons, locations or organizations) from it. As Twitter messages are limited to 140 characters, it may be difficult to extract meaningful concepts from the tweets. For example, given a tweet such as “President’s son and family flee: http://fb.me/J6SmQF7q”, it is difficult to understand which president, son and family the user refers to. However, the semantics of the message can be interpreted by following the link posted in the tweet. In this paper, we thus include a semantic enrichment component in the Twitter user modeling process. This component follows hyperlinks, extracts the main content of the linked Web pages and identifies entities mentioned in those pages. We can then represent a tweet via both the concepts extracted from the tweet itself and the concepts extracted from the Web sites it references. Based on the semantically enriched Twitter messages of a user, we create strategies that infer user interest profiles. In this paper, we represent those profiles in the same way as we represent topics.

Definition 2 (User Profile). The profile of a user u is a set of weighted concepts where a concept c may be represented via a named entity or hashtag.

$$P(u, time) = \{(c, w(c, time, T_{tweets,u})) \mid c \in C_H \cup C_E\} \tag{3}$$

Here, $w(c, time, T_{tweets,u})$ is a function that computes the weight associated with the concept c for the given user u. It is based on the messages $T_{tweets,u}$ published by u and on the given timestamp. $C_H$ and $C_E$ denote the set of hashtags and the set of entities, respectively.

$\vec{p}(u, time)$ denotes $P(u, time)$ in its vector space model representation, where the value of the i-th dimension is $w(c_i, time, T_{tweets,u})$. A straightforward approach for computing the weight is to determine the occurrence frequency of the concept c in the set of tweets published by the user u. We compare this baseline strategy, which ignores the time input, with a time-sensitive variant. This variant dampens the occurrence frequency according to the temporal distance between the time at which the concept occurs and the given timestamp:

$$w(c, time, T_{tweets,u}) = \sum_{t \in T_{tweets,u,c}} \left(1 - \frac{|time - time(t)|}{maxtime - mintime}\right)^{d} \tag{4}$$

In Equation 4, $T_{tweets,u,c}$ denotes the set of tweets that have been published by u and refer to the concept c, and $time(t)$ returns the timestamp of a given tweet t. $maxtime$ and $mintime$ denote the highest (youngest) and lowest (oldest) timestamp of a tweet in $T_{tweets,u,c}$; for example, $maxtime = \max(\{time(t) \mid t \in T_{tweets,u,c}\})$. The parameter d adjusts the influence of the temporal distance. The higher d is set, the stronger the penalty for concepts that occur at a large temporal distance from the input time: their scores will be lower than those of concepts for which $|time - time(t)|$ is smaller. In the subsequent sections, we set d = 4. Furthermore, we normalize the weights of a profile $P(u, time)$ so that they sum to 1.

Our hypothesis is that the time-sensitive strategy characterizes the actual demands and concerns of a user better than the non-time-sensitive baseline strategy.

4. TIME-SENSITIVE USER MODELING FOR PERSONALIZED RECOMMENDATIONS
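The strategies compared in this section differ only in the weight function w of Equation 3. The baseline counts concept occurrences, while the time-sensitive variant applies Equation 4 with d = 4. Both profiles are normalized to sum to 1, and candidates are then ranked by cosine similarity (Definition 3, Equation 5).

The following is a minimal sketch under stated assumptions. Timestamps are measured in days, concepts are plain strings, and all function names and toy data are hypothetical. Two behaviors are our own choices and are flagged in the comments:

- a concept mentioned at only one timestamp gets a distance ratio of 0;
- the base of Equation 4 is clamped at 0 when the requested time lies outside [mintime, maxtime], where the literal base can become negative.

```python
import math
from collections import defaultdict

# Hypothetical representation: a tweet is (timestamp in days, concepts it mentions).
Tweet = tuple[float, frozenset[str]]
Profile = dict[str, float]


def normalize(weights: dict[str, float]) -> Profile:
    """Scale weights so that they sum to 1, as done for P(u, time)."""
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total > 0 else {}


def baseline_profile(tweets: list[Tweet]) -> Profile:
    """Baseline strategy: occurrence frequency of each concept, ignoring time."""
    counts: dict[str, float] = defaultdict(float)
    for _, concepts in tweets:
        for c in concepts:
            counts[c] += 1.0
    return normalize(counts)


def time_sensitive_profile(tweets: list[Tweet], time: float, d: int = 4) -> Profile:
    """Equation 4: dampen each occurrence of c by its distance to `time`."""
    stamps_by_concept: dict[str, list[float]] = defaultdict(list)
    for ts, concepts in tweets:
        for c in concepts:
            stamps_by_concept[c].append(ts)
    weights: dict[str, float] = {}
    for c, stamps in stamps_by_concept.items():
        span = max(stamps) - min(stamps)  # maxtime - mintime over T_tweets,u,c
        score = 0.0
        for ts in stamps:
            # Assumption: a zero span (single timestamp) means no dampening.
            ratio = abs(time - ts) / span if span > 0 else 0.0
            # Assumption: clamp the base at 0 if `time` lies outside [mintime, maxtime].
            score += max(0.0, 1.0 - ratio) ** d
        weights[c] = score
    return normalize(weights)


def cosine(p: Profile, q: Profile) -> float:
    """Equation 5 on sparse concept -> weight vectors."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norms = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norms if norms > 0 else 0.0


def rank(user: Profile, candidates: dict[str, Profile]) -> list[str]:
    """Definition 3: order candidate URLs by cosine similarity to the user profile."""
    return sorted(candidates, key=lambda url: cosine(user, candidates[url]), reverse=True)


# Toy data, purely illustrative (timestamps in days).
tweets: list[Tweet] = [
    (0.0, frozenset({"#egypt", "Apple"})),
    (2.0, frozenset({"Apple"})),
    (8.0, frozenset({"#egypt"})),
    (10.0, frozenset({"#egypt", "Mubarak"})),
]
candidates = {"url_old": {"Apple": 1.0}, "url_new": {"Mubarak": 1.0}}

print(rank(baseline_profile(tweets), candidates))              # ['url_old', 'url_new']
print(rank(time_sensitive_profile(tweets, 10.0), candidates))  # ['url_new', 'url_old']
```

In the toy data, Apple is mentioned more often than Mubarak, but only early on. The baseline therefore ranks url_old first, while the time-sensitive profile requested at day 10 ranks url_new first.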
To investigate the above hypothesis, we deploy the user modeling strategies in a personalized recommender system. The recommender provides Web site recommendations to a user based on her user profile. We thus apply the Twitter-based user modeling strategies to personalize the users’ Social Web experience and point them to Web sites that match their interest profiles in their current temporal context. We then study the following research questions.

1. How do semantic enrichment and (time-sensitive) weighting functions of the user modeling framework influence the performance of the recommender system?
2. Are there any correlations between characteristic patterns in the generated Twitter profiles and the achieved recommendation quality? For example, how does the recommendation quality differ between users who tend to be short-term or long-term adopters of a given topic?

4.1 Evaluation Methodology
We examine the user modeling strategies in the context of a recommender system that we developed for providing personalized Web site recommendations to the user. In particular, it recommends fresh Web sites that are referenced in Twitter messages (cf. [5, 6]). Recommending Web sites that are posted on Twitter is a non-trivial task, as the URLs to be recommended often refer to news articles or other types of fresh, news-like content [9]. This makes it difficult to apply collaborative filtering methods and rather calls for content-based or hybrid approaches [11]. Our main goal is to analyze and compare the applicability of the different user modeling strategies in the context of the recommender system. In particular, we analyze how the time-sensitive user modeling strategy introduced in Section 3 influences personalization and how it performs in comparison to non-time-sensitive variants. We do not aim to optimize recommendation quality. Instead, we compare the quality achieved by the same recommendation algorithm when it is given different types of user profiles as input. Therefore, we apply a lightweight content-based algorithm that recommends items according to their cosine similarity with a given user profile. We thus cast the recommendation problem as a search and ranking problem in which the given user profile, constructed by a specific user modeling strategy, is interpreted as the query.

Definition 3 (Recommendation Algorithm). We are given a user profile vector $\vec{p}(u)$ and a set of candidate Web resources (URLs) $R = \{\vec{p}(r_1), \ldots, \vec{p}(r_n)\}$. The candidates are represented via profiles that use the same vector representation as the user profile $\vec{p}(u)$. The recommendation algorithm ranks the candidate items according to their cosine similarity to $\vec{p}(u)$:

$$sim_{cosine}(\vec{p}(u), \vec{p}(r_i)) = \frac{\vec{p}(u) \cdot \vec{p}(r_i)}{\lVert \vec{p}(u) \rVert \cdot \lVert \vec{p}(r_i) \rVert} \tag{5}$$

Given the Twitter dataset, which contains more than 30 million tweets and more than 1.3 million distinct Web sites that

each day of our recommendation period, which is given by the last ten days of January (Jan 20th - Jan 31st). Hence, our recommendation period overlaps with the beginning of the Egyptian revolution. However, the Web sites that are recommended to the users in this period may refer to any topic and are not necessarily related to the revolution in Egypt. The ground truth defines which URLs we consider relevant for a specific user u on a particular day. It is given by those Twitter messages that link to the corresponding Web site and that have been re-tweeted by u on that day. Following this evaluation strategy, we identified, on average, 24.5 relevant URLs for each of the 1619 sample users per day. The candidate set of URLs, which were published on a recommendation day, contained, on average, 24,549 items.

Given the ground truth and candidate sets, we applied the different user modeling strategies together with the above algorithm (see Definition 3) and the set of candidate items. This produced fresh, personalized Web site recommendations for each user on each day. The user modeling strategies were only allowed to exploit tweets published before the start of the recommendation period. We measured the quality of the recommendations with two metrics:

- S@k (Success at rank k), the mean probability that a relevant item occurs within the top k of the ranking;
- MRR (Mean Reciprocal Rank), the mean of the reciprocal rank at which the first relevant item occurs.

For S@k, we focus on S@10, as our recommendation system lists 10 Web site recommendations to a user.

4.2 Results
Figure 5 summarizes the results of our recommendation experiment. In Figure 5(a), we first analyze the impact of the semantic enrichment provided by our user modeling framework. The recommendation quality is positively influenced by the enrichment component, which follows the links in Twitter messages to also extract named entities from the linked Web pages. While MRR increases only slightly, S@10 improves by more than 15%. For the entity-based user modeling strategy, we therefore also apply this link-following enrichment method in the subsequent recommendation experiments.

Figure 5(b) shows the performance of the entity-based and hashtag-based user modeling strategies and illustrates how the time-dependent weighting function (cf. Equation 4) influences the personalization quality. Regarding S@10, the entity-based user modeling strategy performs slightly better than the hashtag-based method (improvement: 5%). However, there is no significant difference in performance between the entity-based and hashtag-based user modeling strategies. In contrast, the time-dependent weighting function clearly increases the recommendation performance. For the hashtag-based user modeling strategy, one variant weights the occurrence frequency according to the time for which a profile is requested (hashtag(time)). It improves the recommendation quality over the baseline strategy (hashtag) by 10.4% regarding S@10 and by 12% regarding MRR. We thus find first evidence for our hypothesis that the time-sensitive
are linked from the tweets, we compute personalized recom- strategy characterizes the actual demands and concerns of a
mendations for each user of our sample (cf. Section 2) on user better than the non-time-sensitive baseline strategy.
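The ranking step of Definition 3 (Equation 5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that user and Web site profiles are sparse dictionaries mapping a concept (hashtag or entity) to its weight, and the names `Profile`, `cosine_similarity`, and `recommend` are hypothetical. Treating an empty profile as having zero similarity is also an assumption.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse profile: concept (hashtag or entity) -> weight.
Profile = Dict[str, float]


def cosine_similarity(p_u: Profile, p_r: Profile) -> float:
    """Equation (5): dot product divided by the product of Euclidean norms."""
    # Only concepts present in both profiles contribute to the dot product.
    dot = sum(w * p_r[c] for c, w in p_u.items() if c in p_r)
    norm_u = math.sqrt(sum(w * w for w in p_u.values()))
    norm_r = math.sqrt(sum(w * w for w in p_r.values()))
    if norm_u == 0.0 or norm_r == 0.0:
        return 0.0  # assumption: an empty profile matches nothing
    return dot / (norm_u * norm_r)


def recommend(p_u: Profile, candidates: Dict[str, Profile],
              k: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate URLs by cosine similarity to the user profile (top k)."""
    scored = [(url, cosine_similarity(p_u, p_r))
              for url, p_r in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With `k = 10` the function returns the ten-item list that S@10 is evaluated on. Because the user profile plays the role of a query, swapping in a differently constructed profile (hashtag-based, entity-based, time-weighted) changes the ranking while the algorithm stays fixed, which is the comparison the experiment relies on.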
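The two quality measures described in Section 4.1, S@k and MRR, can be sketched as follows. This is an illustrative sketch under stated assumptions: each evaluated ranking (for example, one per user and day) is treated as one sample consisting of the ranked item identifiers and the set of items relevant to that user, and a ranking without any relevant item contributes 0 to both measures. The function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple

# Hypothetical sample: (ranked item ids, ids relevant to the user).
Sample = Tuple[Sequence[Hashable], Set[Hashable]]


def first_relevant_rank(ranking: Sequence[Hashable],
                        relevant: Set[Hashable]) -> int:
    """1-based rank of the first relevant item, or 0 if none is ranked."""
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            return position
    return 0


def success_at_k(samples: List[Sample], k: int = 10) -> float:
    """S@k: share of rankings with at least one relevant item in the top k."""
    if not samples:
        raise ValueError("no rankings to evaluate")
    hits = 0
    for ranking, relevant in samples:
        rank = first_relevant_rank(ranking, relevant)
        if 0 < rank <= k:
            hits += 1
    return hits / len(samples)


def mean_reciprocal_rank(samples: List[Sample]) -> float:
    """MRR: mean of 1/rank of the first relevant item (0 if there is none)."""
    if not samples:
        raise ValueError("no rankings to evaluate")
    total = 0.0
    for ranking, relevant in samples:
        rank = first_relevant_rank(ranking, relevant)
        if rank > 0:
            total += 1.0 / rank
    return total / len(samples)
```

For three rankings where the first relevant item sits at rank 2, is missing, and sits at rank 1, S@10 is 2/3 and MRR is (1/2 + 0 + 1)/3 = 0.5. S@10 only asks whether a hit appears among the ten listed recommendations, whereas MRR also rewards placing it near the top.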
[Figure 5: Comparison of user modeling strategies for supporting personalization. Panels: (a) Semantic Enrichment (entity-based profiles), (b) Impact of Temporal Dynamics, (c) Different Types of Users. Metrics: S@10 and MRR.]

Figure 5(c) illustrates the recommendation performance for different types of users: (i) people who are continuously active during our recommendation period and re-tweet at least one Web site on each of the ten days (i.e., for each day there exists at least one relevant item to be recommended) and (ii) people who are sporadically active (on fewer than five days). As depicted in Figure 5(c), the recommendation performance is better for active users than for sporadically active users. Interestingly, the hashtag-based version performs best for the continuously active users but largely fails for the sporadically active users, for whom the entity-based user modeling strategy performs best. Hence, for recommending Web sites on the Social Web, the interests of active users appear to be represented best via hashtags, while the interests of sporadically active users are best modeled via the entity-based strategy.

[Figure 6: Relation between size of profiles and quality of profiles for supporting personalization. Panels: (a) Hashtag-based Profiles, (b) Entity-based Profiles. x-axis: users (percentiles); y-axis: relative profile size & performance (0–100%); curves: size of profile, MRR.]

Figure 6 further relates the recommendation quality to the size of entity-based and hashtag-based profiles. The size of a user's profile is measured by the number of distinct concepts that appear in the profile and is given relative to the size of the biggest profile. The performance is measured via the mean reciprocal rank (MRR) and is also specified in a relative manner. Moreover, the MRR curves show the average performance for the corresponding x% of the users. For example, for the 20% of users whose hashtag-based profiles are smaller than the profiles of the other 80% of users, the recommendation quality is less than 20%. Figure 6(a) can thus be interpreted as follows: the bigger the hashtag-based profiles, the better the recommendations.

For entity-based profiles, we observe different behavior, as depicted in Figure 6(b). The quality of recommendations computed from entity-based profiles does not depend strongly on the size of the profiles. In fact, it remains fairly stable for varying profile sizes.

4.3 Findings
In this section, we showed how the Twitter-based user modeling strategies can be applied in a recommender system to personalize the users' Social Web experience. The research questions raised at the beginning of this section can be answered as follows.

1. When determining the importance of concepts in a user profile, it is beneficial to weight the concepts with respect to the point in time for which the profile is demanded. Concepts that a user has been concerned with recently should be weighted higher than concepts that the user has not referenced for a long time. Moreover, we observed that entity-based user modeling performs best when extracting entities from both the Twitter messages and the Web resources referenced from the corresponding Twitter messages.

2. We also discovered remarkable correlations between the characteristics of the different types of user profiles and the resulting recommendation quality. When modeling users based on hashtags, the personalization performance correlates with the size of the hashtag-based profile: the bigger the profile, the better the performance. In contrast, personalization enabled via entity-based user modeling is largely independent of the size of a profile. Furthermore, we observed that for sporadically active users, who tend to be short-term adopters (cf. Section 2), the entity-based user modeling strategies provided by our framework perform much better than the hashtag-based strategies.

5. CONCLUSIONS
In this paper, we analyzed the temporal dynamics of user profiles inferred from users' Twitter activities. We presented a user modeling framework that allows for the creation of strategies which extract the semantics of individual Twitter messages and generate user interest profiles that specify to which degree a user is interested in a given concept. Given this framework, we first analyzed the characteristics of topics discussed on Twitter and discovered that the representation of a topic changes over time: concepts related to a topic may gain or lose importance. For event-like topics, we identified different groups of users: long-term adopters join the discussion early and contribute to it continuously, while short-term adopters join the discussion later and participate only sporadically, influenced by public trends.

Based on this analysis, we introduced strategies that incorporate those temporal characteristics into user profiles. We defined time-sensitive user modeling strategies (hashtag-based and entity-based) and evaluated these strategies in the context of a recommender system that provides Web site recommendations on the Social Web. Our results show the benefits of user modeling strategies that capture the temporal dynamics of a user's Twitter activities and reveal that semantic enrichment is particularly important for users who participate only sporadically in the discussions on Twitter.

Acknowledgements. This work is partially sponsored by the EU FP7 project ImREAL⁵.

6. REFERENCES
[1] F. Abel, Q. Gao, G.-J. Houben, and K. Tao. Analyzing User Modeling on Twitter for Personalized News Recommendations. In International Conference on User Modeling, Adaptation and Personalization (UMAP), Girona, Spain. Springer, 2011.
[2] F. Abel, Q. Gao, G.-J. Houben, and K. Tao. Semantic Enrichment of Twitter Posts for User Profile Construction on the Social Web. In Extended Semantic Web Conference (ESWC), Heraklion, Greece. Springer, 2011.
[3] F. Abel, E. Herder, G.-J. Houben, N. Henze, and D. Krause. Cross-system user modeling and personalization on the social web. User Modeling and User-Adapted Interaction (UMUAI), Special Issue on Personalization in Social Web Systems, pages 1–42, 2011.
[4] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. In W. W. Cohen and S. Gosling, editors, Proceedings of the Fourth International Conference on Weblogs and Social Media (ICWSM), Washington, DC, USA. The AAAI Press, 2010.
[5] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi. Short and tweet: experiments on recommending content from information streams. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI), pages 1185–1194. ACM, 2010.
[6] A. Dong, R. Zhang, P. Kolari, J. Bai, F. Diaz, Y. Chang, Z. Zheng, and H. Zha. Time is of the essence: improving recency ranking using twitter data. In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 331–340. ACM, 2010.
[7] Q. Gao, F. Abel, G.-J. Houben, and K. Tao. Interweaving trend and user modeling for personalized news recommendations. Technical report, submitted to the International Conference on Web Intelligence (WI), Lyon, France, 2011.
[8] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Conversational Tagging in Twitter. In M. H. Chignell and E. Toms, editors, Proceedings of the 21st ACM Conference on Hypertext and Hypermedia (HT), pages 173–178. ACM, 2010.
[9] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 591–600. ACM, 2010.
[10] K. Lerman and R. Ghosh. Information contagion: an empirical study of spread of news on digg and twitter social networks. In Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM). The AAAI Press, 2010.
[11] J. Liu, P. Dolan, and E. R. Pedersen. Personalized news recommendation based on click behavior. In C. Rich, Q. Yang, M. Cavazza, and M. X. Zhou, editors, Proceedings of the 14th International Conference on Intelligent User Interfaces (IUI), pages 31–40. ACM, 2010.
[12] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web (WWW), pages 851–860. ACM, 2010.
[13] J. Weng, E.-P. Lim, J. Jiang, and Q. He. Twitterrank: finding topic-sensitive influential twitterers. In B. D. Davison, T. Suel, N. Craswell, and B. Liu, editors, Proceedings of the Third International Conference on Web Search and Web Data Mining (WSDM), New York, NY, USA, pages 261–270. ACM, 2010.

⁵ http://imreal-project.eu
