Slides

Transcription

Slides
Aggregating Content and
Network Information to
Curate Twitter User Lists
Derek Greene
Barry Smyth
Gavin Sheridan
Pádraig Cunningham
RecSys-12: Recommender Systems & The Social Web
Clique: Graph & Network Analysis Cluster
School of Computer Science & Informatics
University College Dublin, Ireland
Task: User List Curation
• Twitter allows the grouping of users into topical user lists.
• Storyful maintains lists of users for news stories to monitor
breaking news related to that story.
RSWeb-2012
2
Supporting User List Curation
• Idea: Use data analysis to identify important users that form
the “community” around a news story on Twitter.
• Goal: Recommend new users to expand an embryonic seed
list, helping to find valuable content relating to the story.
RSWeb-2012
3
Curation System Overview
Seed Set
Seed
List
!"#$%&'(
5B"*+C+3DEFD
(0+3&":=&2",&%>
#03:*'"0+%=&%
?&;?0@?0A&%'3
506*!&1*'&
8=,,&%8&&>3
Bootstrap
@0+.,*36A+%13
$%+C&$%*,&H
506*J4)
)*+,-&*.&%
Phase 1
708$&*+801'
#/012%3'
41&506*
Candidate
List
=*:*6>
0;&%*'=01%&@
G%=>(&,,*1@
6:0%*@=0
;&'&<&22%=&3
7089*':*8
43>HI&63
B'&*8KE
4''+86*"0+%=&%
Recommender
#M534%2'
)*+,-&*.&%
Ranked
List
!"#$%&'(
Phase 2
Candidate
List
$%+/&$%*,&K
6*''.%53&7*,8
85+.,*27;+%32
,K33/*6:;&,,
>*''GH01'*F&%G
#&3314&%##*/5;2
26.G,K3/0
/0%12'1*3453.
18*O&:%1/&
<3&957*
Curator
'85%6*3
E8J1;;&''2
9A"*+/+2BCDB
J0&$&*3H*,F&%
=53>/=57&,,
#520*'"5+%1&%
:&'&?&44%1&2
"0%12(*.&357
J16A,;%&/0'9A
#*253",*K75%'0
61,,&%6&&F2
;1,,2/01/F&,
1*0*7F
957*!&3*'&
705%*815
Core
List'
(5+2&"01&4",&%F
M&3'!5%&3253
"G@*3'2
9*(5+2&%&:+;2
<2FKL&72
J56N*'0*6
*%'0+%6261'0
8?*61253
J56$&*+653'
L1/FAH*.3&%
957*P<)
E%1F(&,,*38
@&:@58@5;&%'2
5:&%*'153%&8
M16@&K35,829A
Updater
!*38KGP%&13&%
<''+67*"5+%1&%
>/M13,&K45%957*
A'&*6IC
#*253G!/0+,'Q
4
Recommendation Overview
General Recommender
• Use embryonic seed list as initial training data.
• Compute training data centroid.
• Measure cosine similarity between centroid and
vectors representing candidate non-seed users.
➡ Produce ranked list of top K non-seed users to
present to a human curator.
Which criterion should we use for
recommendation?
Candidate
List
Recommender
Ranked
List
Curator
➡ Propose variety of vectorised approaches, using
criteria based on Twitter network and content
analysis.
RSWeb-2012
5
Content-Based Criteria
• Tweet profiles: Create a term vector for each user, containing
the aggregation of their 50/100/200 most recent tweets.
User
president
@BarackObama
2
RSWeb-2012
health
2
care
2
reform
1
gender
1
costs
1
...
6
Content-Based Criteria
• List names/descriptions: Create a term vector for each user,
containing the aggregation of names and/or descriptions of user lists
to which they have been assigned.
User
@SarahPalinUSA
politics
2
User
elected
@SarahPalinUSA
1
RSWeb-2012
news
1
useless
1
...
politics
1
purpose
1
...
7
Network-Based Criteria
• Followed-by Profiles: Users are similar if they are "co-followed"
by the same set of users. Represent each user as a binary follower
profile vector.
User X is followed by...
User X
@TeamGB
@bradwiggins
@Mo_Farah
@britishswimming
@MichaelPhelps
1
0
0
1
@chrishoy
1
1
1
0
@TomDaley1994
1
1
0
1
...
• Mentioned-by Profiles: Users are similar if they are mentioned
in tweets by the same users (i.e. "co-mentioned)".
• Retweeted-by Profiles: Users are similar if their posts are
retweeted by the same users (i.e. "co-retweeted)".
RSWeb-2012
8
Network-Based Criteria
• Co-Listed Information:
• Other media outlets and individuals will also be simultaneously
curating user lists on topics in the wider Twitosphere.
➡ Would like to “crowdsource” these efforts to support list curation.
Users @BrianODriscoll and @KearneyRob are co-listed on both lists.
9
Experimental Evaluation
• Collected 10 datasets around "Super Tuesday" GOP nominations
in March 2012, with annotated seed users from Storyful.
Dataset
Alaska
Georgia
Idaho
Massachusetts
North Dakota
Ohio
Oklahoma
Tennessee
Vermont
Virginia
Users
948
966
743
821
363
1051
693
979
864
877
Core Users
41
34
20
24
26
97
32
48
36
46
Tweets
185
211
186
209
203
178
205
199
182
200
Friends
208
235
264
244
147
171
178
170
169
160
Followers
269
295
273
293
192
207
211
204
190
199
Listed
89
126
47
122
93
115
109
112
66
115
e 1: Summary of 10 Twitter datasets used in our evaluations, including the number of manually curat
e” users, together with the mean number of tweets, friends, followers, and list memberships per user.
10
RSWeb-2012
Experimental Setup
• Ran 250 X k-fold cross validation experiments per dataset:
- Held out a proportion of the seed set as test data.
- Used the remaining seed users as training data.
- Ranked users in test data using each criterion.
- Calculated precision and recall relative to the test data.
Training Users
Test Users
RSWeb-2012
11
Comparison of Criteria
• Looked at diversity among the
!"1%(2'$3
recommendations from the
different criteria...
4"%%"5$3167
8(2'*3$29#(:'(")2
8(2'*;$#<$3
➡ We see several distinct signals
present across the different views.
8(2'*)&;$2
=$)'(")$3167
>$'5$$'$3167
?5$$'2*@/,A
!"##$%&'(")*+*0
50%
?5$$'2*@.,,A
?5$$'2*@0,,A
?5$$'2*@/,A
>$'5$$'$3167
60%
=$)'(")$3167
!"##$%&'(")*+*,-/
8(2'*)&;$2
70%
8(2'*;$#<$3
80%
8(2'*3$29#(:'(")2
90%
?5$$'2*@.,,A
4"%%"5$3167
100%
?5$$'2*@0,,A
!"1%(2'$3
% Top 3 Placements
for Precision across
!"##$%&'(")*+*,-./
all experiments
40%
30%
➡ No single criterion
20%
10%
-b
te
d
performs consistently
well on all datasets.
R
et
w
ee
ts
ee
y
(5
0
)
s
de
st
Li
Tw
ri
sc
ts
ee
Tw
st
Li
pt
io
n
(1
0
0
)
d
m
er
ge
(2
0
0
)
ts
ee
Tw
C
olis
te
d
n
am
y
Li
st
ed
-b
y
Fo
llo
w
-b
ed
ti
on
M
en
es
0%
12
Aggregating Multiple Rankings
• SVD Aggregation: Combine rankings generated on top 5
individual criteria into a single matrix, apply Singular Value
Decomposition, then rank values in first singular vector.
% Top 3 Placements
for Precision across
all experiments
100%!
90%!
Top 3 Placements!
80%!
70%!
60%!
50%!
40%!
30%!
20%!
10%!
0%!
SVD!
Mentioned-by!
Co-listed!
Followed-by! Tweets (200)! List names!
Comparison of SVD Aggregation V Top 5 Individual Criteria
RSWeb-2012
13
User List Curation System
14
Conclusions
• Summary:
• Proposed variety of criteria for user list building.
• Performed a comprehensive comparison of individual
criteria, demonstrating weaknesses of both content and
network-based strategies.
• Demonstrated that more accurate results can be
achieved using SVD aggregation.
• Future Work:
• Stratification of networks to target recommendations for
niche communities - avoiding the "filter bubble".
• Support for curation across multiple social networks.
RSWeb-2012
15
Any Questions ?
derek.greene@ucd.ie
@derekgreene
http://cliquecluster.org

Similar documents