Slides - Tim Althoff
Transcription
TimeMachine: Timeline Generation for Knowledge-Base Entities
Tim Althoff (Stanford CS), Xin Luna Dong, Kevin Murphy, Safa Alai, Van Dang, Wei Zhang (Google)

Learning about new topics is hard

Web Search Results

Structured Information

Our System: TimeMachine
[Figure: example timeline for Robert Downey Jr. (1965-) showing Deborah Falconer, Ben Stiller, Robert Downey, Sr., Chaplin, Fiona Apple, Paramount Pictures, The Party's Over, Ally McBeal, Susan Downey, Gothika, Iron Man, Iron Man 2, Iron Man 3, and The Avengers along a time axis]

Timeline Generation Problem Definition
[Figure: the entity Robert Downey Jr. in the knowledge base, with a 1-hop event (DoB: April 4, 1965) and a 2-hop event (starIn The Avengers, release date May 4, 2012), next to the resulting timeline on a time axis from 1985 to 2015]
• Timeline needs to be rendered on device
• Generation must be fast (zoom + interaction)

Timeline Generation Approach
[Figure: pipeline from Entity to List of events to Timeline, using the same Robert Downey Jr. example]

List of Events
Subject | Related Entity | Time | Description
R. D. Jr | Robert D. Sr | 1988 | In movie directed by
 | Jon Bon Jovi | 2004 | TV show app. with
 | Susan Downey | 2005 | Got married
 | Iron Man | 2008 | Award for movie
 | Avengers | 2010 | Acted in movie
… | … | … | …

1. Event Generation

Candidate Event Generation
[Figure: knowledge-base graph around the subject Robert Downey Jr., with a 1-hop event (DoB, April 4, 1965), a 2-hop event via The Avengers (starIn, then release date, May 4, 2012), and Samuel L Jackson, who is related through the 2-hop event (starIn The Avengers); labels: Subject, Related Entity, Timestamp]

Resulting Candidate Events
Subject | Related Entity | Time | Description
R. D. Jr | - | 1965 | Was born
 | The Avengers | 2012 | Acted in movie
 | Samuel L Jackson | 2012 | Co-starred in The Avengers
… | … | … | …

Event Filtering
• Some non-informative events: any American → nationality: USA → founded 1776
• Frequency Filter: events commonly associated with a large number of subjects are unlikely to be interesting (like IDF) (no “nationality → founded”)
• Existence Filter: filter out events before entity begins to exist (no “parent → DOB”)
[Image credits: wallpaperbase.org / en.wikipedia.org]

Event Filtering Evaluation
• 87% precision (two indep. raters)
• Generate many candidate events (Freebase)

2. Event Selection

Timeline Quality Criteria
1. Correctness X
2. Relevance X

Timeline Quality Criteria (cont.)
3. Content Diversity
Encourage selection of different entities
[Figure: two selections compared: "US Release, Award, EU Release" vs "Award, Award, US Release"]

Timeline Quality Criteria (cont.)
4. Temporal Diversity / Layout
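
The two filters from the Event Filtering slide can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Event` record, the `max_subject_fraction` threshold, the string form of the relation paths, and the example data (including the two placeholder subjects and the parent's birth year) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    subject: str            # entity the timeline is about
    path: str               # relation path (hypothetical string encoding)
    related: Optional[str]  # related entity, if any
    year: int               # timestamp, reduced to a year for simplicity


def frequency_filter(events, max_subject_fraction=0.5):
    """Drop events shared by more than a given fraction of all subjects (IDF-like)."""
    subjects = {e.subject for e in events}
    seen_with = defaultdict(set)
    for e in events:
        seen_with[(e.path, e.related, e.year)].add(e.subject)
    limit = max_subject_fraction * len(subjects)
    return [e for e in events
            if len(seen_with[(e.path, e.related, e.year)]) <= limit]


def existence_filter(events, start_year):
    """Drop events dated before the subject begins to exist."""
    # Subjects with unknown start year keep all of their events.
    return [e for e in events if e.year >= start_year.get(e.subject, e.year)]


# Illustrative data only; "Person B" and "Person C" are placeholders.
rdj = "Robert Downey Jr."
events = [
    Event(rdj, "DoB", None, 1965),
    Event(rdj, "starIn -> releaseDate", "The Avengers", 2012),
    Event(rdj, "nationality -> founded", "USA", 1776),
    Event(rdj, "parent -> DoB", "Robert Downey, Sr.", 1936),
    Event("Person B", "nationality -> founded", "USA", 1776),
    Event("Person C", "nationality -> founded", "USA", 1776),
]
start_year = {rdj: 1965}

kept = existence_filter(frequency_filter(events), start_year)
for e in kept:
    print(e.year, e.path, e.related)
```

In this toy data the "nationality -> founded" event is shared by all three subjects and is removed by the frequency filter, while the "parent -> DoB" event predates the subject and is removed by the existence filter.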

Optimization Problem
Candidate Set:
Objective:
Constraint:
(formulas not preserved in the transcription)

(2) Relevance Signals
• Baseline: global relevance signal
• Based on # search queries for entity
• Biased towards popular but unspecific events

How to improve relevance signal
• Use co-occurrences on web scale
• “Robert Downey Jr” and “May 4, 2012” occur together 173 times on 71 different webpages
• US release date of The Avengers
[Image credit: marvel.com]

Assigning scores to entities/dates
• Run NLP tools across large web corpus of 10B documents (NER + CoRef)
• Extract entity-entity and entity-date co-occurrences within small windows
• Normalize counts using Normalized Pointwise Mutual Information (NPMI)
§ accounts for popular entities/dates

Improvements from New Signal
• Large improvement over global relevance

(3) Content Diversity

(4) Temporal Diversity
• Constraint in optimization problem: “Event boxes cannot overlap.”
• Enforce balanced layout during optimization

Submodular Optimization
• Relevance(T) is submodular
§ “Diminishing returns”
§ Less reward for adding to “bigger set”

Algorithm & Theoretical Results
• Fast approximation through lazy-greedy
§ Allows for zoom and interaction
• Provable approximation guarantee
§ Worst-case: 33% of optimal solution!
§ Still holds with complex constraint!
§ We prove: constraint structure induces independence family that is a p-system (Calinescu et al. 2011)

3. Evaluation

Experimental Evaluation
• User study on Amazon Mechanical Turk
§ Rating of timelines challenging, possibly subjective, and no ground truth available
• Pairwise comparisons using 250 entities
§ Relative judgements
§ Explanations
• Large-scale
§ >1200 raters
§ >6000 tasks
[Figure: pairwise comparison of two timelines, labeled "VS"]

Results
In what fraction of cases do raters prefer our full method over the ablated baselines?
[Figure: dot plot; x-axis: Fraction preferring Full (RPref), from 0.5 to 1.0; rows: vs Base (Baseline: global relevance signal); vs Full−E2D (No Date Rel) and vs Full−E2E (No Entity Rel), grouped as "Removing Date/Entity Cooc"; vs Full−TD (No Temp. Div) and vs Full−CD (No Cont. Div), grouped as "No content or temporal diversity"]
Full = “everything” = Global + Web Cooc + TD + CD

Related Work
• Document summarization: (Allan et al. 2001)
• Submodular optimization: (Krause & Golovin 2014), (Calinescu et al. 2011), (Nemhauser et al. 1978)
• Maps of information: (Shahaf et al. 2013)
• Timelines based on knowledge bases: (Mazeika et al. 2011), (Tuan et al. 2011), (Wang et al. 2010)

Conclusions
• TimeMachine: Automatic timeline generation for knowledge base entities
• Submodular optimization framework
§ Jointly optimizes for relevance, content diversity, and temporal diversity
§ Proved near-optimal performance guarantees
• User studies show
§ Web-based co-occurrence signals improve over baseline model (global importance)
§ Temporal and content diversity are crucial

Thanks!
Contact: @timalthoff, althoff@cs.stanford.edu
Check out the demo! cs.stanford.edu/~althoff/timemachine
Paper / Proofs / Slides: cs.stanford.edu/~althoff

Acknowledgements
Evgeniy Gabrilovich, Arun Chaganty, Stefanie Jegelka, Karthik Raman, Sujith Ravi, Ravi Kumar, Jeff Tamer, Patri Friedman, Danila Sinopalnikov, Alexander Lyashuk, Jure Leskovec, David Hallac, Caroline Suen
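
As a note on the NPMI normalization mentioned on the "Assigning scores to entities/dates" slide, the standard definition can be sketched as below. It is a minimal illustration, not the system's actual scoring code: the function name `npmi`, the use of document-level counts, and the marginal counts in the example are assumptions. Only the 71 co-occurring pages and the 10B-document corpus size come from the slides.

```python
import math


def npmi(count_xy, count_x, count_y, total):
    """Normalized pointwise mutual information, in the range [-1, 1].

    count_xy: documents in which x and y co-occur
    count_x, count_y: documents containing x, respectively y
    total: number of documents in the corpus
    """
    if count_xy == 0:
        return -1.0  # never co-occur: lower bound of NPMI
    if count_xy == total:
        return 0.0   # degenerate case, NPMI undefined; 0.0 chosen here by assumption
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


TOTAL_DOCS = 10_000_000_000  # 10B documents (from the slides)
CO_OCCURRENCES = 71          # pages with both entity and date (from the slides)

# Hypothetical marginal counts, chosen only to illustrate the normalization.
DATE_DOCS = 50_000
score_specific = npmi(CO_OCCURRENCES, 1_000_000, DATE_DOCS, TOTAL_DOCS)
score_popular = npmi(CO_OCCURRENCES, 100_000_000, DATE_DOCS, TOTAL_DOCS)

print(score_specific, score_popular)
```

With the same number of co-occurrences, the more frequently mentioned entity receives the lower score, which is the sense in which the normalization accounts for popular entities and dates.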