Slides
Transcription
Slides
Aggregating Content and Network Information to Curate Twitter User Lists Derek Greene Barry Smyth Gavin Sheridan Pádraig Cunningham RecSys-12: Recommender Systems & The Social Web Clique: Graph & Network Analysis Cluster School of Computer Science & Informatics University College Dublin, Ireland Task: User List Curation • Twitter allows the grouping of users into topical user lists. • Storyful maintains lists of users for news stories to monitor breaking news related to that story. RSWeb-2012 2 Supporting User List Curation • Idea: Use data analysis to identify important users that form the “community” around a news story on Twitter. • Goal: Recommend new users to expand an embryonic seed list, helping to find valuable content relating to the story. RSWeb-2012 3 Curation System Overview Seed Set Seed List !"#$%&'( 5B"*+C+3DEFD (0+3&":=&2",&%> #03:*'"0+%=&% ?&;?0@?0A&%'3 506*!&1*'& 8=,,&%8&&>3 Bootstrap @0+.,*36A+%13 $%+C&$%*,&H 506*J4) )*+,-&*.&% Phase 1 708$&*+801' #/012%3' 41&506* Candidate List =*:*6> 0;&%*'=01%&@ G%=>(&,,*1@ 6:0%*@=0 ;&'&<&22%=&3 7089*':*8 43>HI&63 B'&*8KE 4''+86*"0+%=&% Recommender #M534%2' )*+,-&*.&% Ranked List !"#$%&'( Phase 2 Candidate List $%+/&$%*,&K 6*''.%53&7*,8 85+.,*27;+%32 ,K33/*6:;&,, >*''GH01'*F&%G #&3314&%##*/5;2 26.G,K3/0 /0%12'1*3453. 18*O&:%1/& <3&957* Curator '85%6*3 E8J1;;&''2 9A"*+/+2BCDB J0&$&*3H*,F&% =53>/=57&,, #520*'"5+%1&% :&'&?&44%1&2 "0%12(*.&357 J16A,;%&/0'9A #*253",*K75%'0 61,,&%6&&F2 ;1,,2/01/F&, 1*0*7F 957*!&3*'& 705%*815 Core List' (5+2&"01&4",&%F M&3'!5%&3253 "G@*3'2 9*(5+2&%&:+;2 <2FKL&72 J56N*'0*6 *%'0+%6261'0 8?*61253 J56$&*+653' L1/FAH*.3&% 957*P<) E%1F(&,,*38 @&:@58@5;&%'2 5:&%*'153%&8 M16@&K35,829A Updater !*38KGP%&13&% <''+67*"5+%1&% >/M13,&K45%957* A'&*6IC #*253G!/0+,'Q 4 Recommendation Overview General Recommender • Use embryonic seed list as initial training data. • Compute training data centroid. • Measure cosine similarity between centroid and vectors representing candidate non-seed users. ➡ Produce ranked list of top K non-seed users to present to a human curator. Which criterion should we use for recommendation? Candidate List Recommender Ranked List Curator ➡ Propose variety of vectorised approaches, using criteria based on Twitter network and content analysis. RSWeb-2012 5 Content-Based Criteria • Tweet profiles: Create a term vector for each user, containing the aggregation of their 50/100/200 most recent tweets. User president @BarackObama 2 RSWeb-2012 health 2 care 2 reform 1 gender 1 costs 1 ... 6 Content-Based Criteria • List names/descriptions: Create a term vector for each user, containing the aggregation of names and/or descriptions of user lists to which they have been assigned. User @SarahPalinUSA politics 2 User elected @SarahPalinUSA 1 RSWeb-2012 news 1 useless 1 ... politics 1 purpose 1 ... 7 Network-Based Criteria • Followed-by Profiles: Users are similar if they are "co-followed" by the same set of users. Represent each user as a binary follower profile vector. User X is followed by... User X @TeamGB @bradwiggins @Mo_Farah @britishswimming @MichaelPhelps 1 0 0 1 @chrishoy 1 1 1 0 @TomDaley1994 1 1 0 1 ... • Mentioned-by Profiles: Users are similar if they are mentioned in tweets by the same users (i.e. "co-mentioned)". • Retweeted-by Profiles: Users are similar if their posts are retweeted by the same users (i.e. "co-retweeted)". RSWeb-2012 8 Network-Based Criteria • Co-Listed Information: • Other media outlets and individuals will also be simultaneously curating user lists on topics in the wider Twitosphere. ➡ Would like to “crowdsource” these efforts to support list curation. Users @BrianODriscoll and @KearneyRob are co-listed on both lists. 9 Experimental Evaluation • Collected 10 datasets around "Super Tuesday" GOP nominations in March 2012, with annotated seed users from Storyful. Dataset Alaska Georgia Idaho Massachusetts North Dakota Ohio Oklahoma Tennessee Vermont Virginia Users 948 966 743 821 363 1051 693 979 864 877 Core Users 41 34 20 24 26 97 32 48 36 46 Tweets 185 211 186 209 203 178 205 199 182 200 Friends 208 235 264 244 147 171 178 170 169 160 Followers 269 295 273 293 192 207 211 204 190 199 Listed 89 126 47 122 93 115 109 112 66 115 e 1: Summary of 10 Twitter datasets used in our evaluations, including the number of manually curat e” users, together with the mean number of tweets, friends, followers, and list memberships per user. 10 RSWeb-2012 Experimental Setup • Ran 250 X k-fold cross validation experiments per dataset: - Held out a proportion of the seed set as test data. - Used the remaining seed users as training data. - Ranked users in test data using each criterion. - Calculated precision and recall relative to the test data. Training Users Test Users RSWeb-2012 11 Comparison of Criteria • Looked at diversity among the !"1%(2'$3 recommendations from the different criteria... 4"%%"5$3167 8(2'*3$29#(:'(")2 8(2'*;$#<$3 ➡ We see several distinct signals present across the different views. 8(2'*)&;$2 =$)'(")$3167 >$'5$$'$3167 ?5$$'2*@/,A !"##$%&'(")*+*0 50% ?5$$'2*@.,,A ?5$$'2*@0,,A ?5$$'2*@/,A >$'5$$'$3167 60% =$)'(")$3167 !"##$%&'(")*+*,-/ 8(2'*)&;$2 70% 8(2'*;$#<$3 80% 8(2'*3$29#(:'(")2 90% ?5$$'2*@.,,A 4"%%"5$3167 100% ?5$$'2*@0,,A !"1%(2'$3 % Top 3 Placements for Precision across !"##$%&'(")*+*,-./ all experiments 40% 30% ➡ No single criterion 20% 10% -b te d performs consistently well on all datasets. R et w ee ts ee y (5 0 ) s de st Li Tw ri sc ts ee Tw st Li pt io n (1 0 0 ) d m er ge (2 0 0 ) ts ee Tw C olis te d n am y Li st ed -b y Fo llo w -b ed ti on M en es 0% 12 Aggregating Multiple Rankings • SVD Aggregation: Combine rankings generated on top 5 individual criteria into a single matrix, apply Singular Value Decomposition, then rank values in first singular vector. % Top 3 Placements for Precision across all experiments 100%! 90%! Top 3 Placements! 80%! 70%! 60%! 50%! 40%! 30%! 20%! 10%! 0%! SVD! Mentioned-by! Co-listed! Followed-by! Tweets (200)! List names! Comparison of SVD Aggregation V Top 5 Individual Criteria RSWeb-2012 13 User List Curation System 14 Conclusions • Summary: • Proposed variety of criteria for user list building. • Performed a comprehensive comparison of individual criteria, demonstrating weaknesses of both content and network-based strategies. • Demonstrated that more accurate results can be achieved using SVD aggregation. • Future Work: • Stratification of networks to target recommendations for niche communities - avoiding the "filter bubble". • Support for curation across multiple social networks. RSWeb-2012 15 Any Questions ? derek.greene@ucd.ie @derekgreene http://cliquecluster.org