Transcription
Recommender Systems
Dr. Nava Tintarev (NT) – Part 1 of 2
Did you do your homework? ;)

Wait, how does this stuff fit together?!
• How do we adapt? (How?)
  – Adaptive hypermedia: content and navigation
• What can we adapt to? (To what?)
  – User model

• Adaptive hypermedia has grown a lot in recent years...

Wait, how does this stuff fit together?!
• What can we adapt? (What?)
  – Domain model
• Why do we need adaptation? (Why?)
  – Adaptation/goal model: goals and tasks
• Where can we apply adaptation? (Where?)
• When can we apply adaptation? (When?)
  – Application and context model

Why use recommender systems?
• Information overload: too many movies, books, webpages, songs, plumbers, etc.
• Searching is difficult
• What's good, what's not?
• Recommender systems: systems that make personalized recommendations of goods, services, and people (Kautz)

Psst, recsys aren't a new thing
• But some factors are...
  – User-generated content
  – Quantity and quality of data
  – New domains
  – More commercial

What IS a recommender system?
• The user identifies one or more objects as being of interest
• The recommender system suggests other objects that are similar (infers liking)

But how does this work?!
• Remember those slides on user modeling?
  – What kind of info can we use?
  – How are we going to get it?
• Suppose our user has rated ten movies:
  A. Jurassic Park
  B. Harry Potter
  C. ET
  D. Lord of the Rings
  E. Alien
  F. Terminator
  G. 101 Dalmatians
  H. Titanic
  I. Sleepless in Seattle
  J. Mr Bean
• Which movie do we recommend next…?

Getting to know a user's opinion
• Implicit (e.g. viewing time) or explicit (e.g. ratings or answering questions)
• Recall vs. recognition…
• Search vs. browse
• Top item, top items...

Example: XLibris
• The user reads text and annotates it
• The system generates links and a further-reading list

Example: MovieLens
• To test out a movie recommender, go to http://movielens.umn.edu/
• The user rates movies
• The system suggests 'best bets'
• Users keep rating movies while checking the best bets

Similar?
• Today: similar in content → content-based filtering
• Next lecture: similar in 'appreciation' by other users → collaborative filtering
• Also: demographic (stereotypes!) and knowledge/utility-based (clever questions!) methods
• Just the tip of the iceberg…

Content-based filtering
• [Table on the slide: movies (Starwars, Pretty Woman, Little Mermaid, 101 Dalmatians, Terminator) described by genre features (Action, Sci-fi, Comedy, Romance, Children), together with Anna's ratings (+ liked, – disliked); the system must predict her missing rating (?) for a movie whose features overlap with the movies she rated.]

But what if Anna really likes Sci-fi but not Action movies?
• [Same table: Terminator is marked as both Action and Sci-fi, so Anna's + and – ratings alone do not immediately tell us whether to recommend it.]
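To make the content-based filtering idea above concrete before the weight-update algorithm described next, here is a minimal Python sketch that builds a per-genre preference profile from Anna-style +/- ratings and scores unseen movies by their genres. The movie-to-genre assignments, the particular ratings, and the scoring rule (summing learned genre weights) are illustrative assumptions, not the exact method on the slides.

```python
# Minimal sketch of content-based filtering over binary genre features.
# Movies, genres, and the +/- ratings loosely mirror the Anna example;
# the concrete values are hypothetical.

GENRES = ["Action", "Sci-fi", "Comedy", "Romance", "Children"]

MOVIES = {
    "Starwars":       {"Action", "Sci-fi"},
    "Pretty Woman":   {"Comedy", "Romance"},
    "Little Mermaid": {"Children"},
    "101 Dalmatians": {"Comedy", "Children"},
    "Terminator":     {"Action", "Sci-fi"},
}

# Anna's explicit ratings: +1 = liked, -1 = disliked (hypothetical).
RATINGS = {"Starwars": +1, "Pretty Woman": -1, "Little Mermaid": -1}

def build_profile(ratings):
    """Accumulate a per-genre preference weight from the rated movies."""
    profile = {g: 0.0 for g in GENRES}
    for movie, rating in ratings.items():
        for genre in MOVIES[movie]:
            profile[genre] += rating
    return profile

def score(movie, profile):
    """Predicted preference = sum of profile weights for the movie's genres."""
    return sum(profile[g] for g in MOVIES[movie])

profile = build_profile(RATINGS)
unseen = [m for m in MOVIES if m not in RATINGS]
for movie in sorted(unseen, key=lambda m: score(m, profile), reverse=True):
    print(movie, score(movie, profile))
```

With these made-up ratings, Terminator scores highest because it shares the Action and Sci-fi features with the liked Starwars, which is exactly the behaviour the table is meant to illustrate.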
Possible algorithm (1)
• Content-based recommenders tend to be 'classifiers'
• Learn weights (w_i) for words/features, so that an item can be scored by summing the w_i of the words occurring in it and comparing the sum against a threshold
• Initially, all weights are 1
• For each rated example, determine the sum
• If the sum is above the threshold and the user did not like the example, then divide the weights by 2

Possible algorithm (2)
• If the sum is below the threshold and the user did like the example, then multiply the weights by 2
• Recommend the items with the highest sum
• (In the worked example below, only the weights of the features present in the rated movie are halved or doubled.)

Example (Step 1)
• Weights: Action 1, Sci-fi 1, Comedy 1, Romance 1, Children 1
• Movie 1 (Action, Sci-fi, Comedy, Romance, Children), rated –
• Threshold = 2; Sum = 5 > 2 and the opinion is negative, so divide the weights by 2

Example (Step 2)
• Weights: Action 0.5, Sci-fi 0.5, Comedy 0.5, Romance 0.5, Children 0.5
• Movie 2 (Sci-fi, Comedy), rated +
• Threshold = 2; Sum = 1 < 2 and the opinion is positive, so multiply by 2

Example (Step 3)
• Weights: Action 0.5, Sci-fi 1, Comedy 1, Romance 0.5, Children 0.5
• Movie 3 (Action, Comedy), rated +
• Threshold = 2; Sum = 1.5 < 2 and the opinion is positive, so multiply by 2

Example (Step 4)
• Weights: Action 1, Sci-fi 1, Comedy 2, Romance 0.5, Children 0.5
• Movie 4 (Action, Sci-fi, Romance), rated –
• Threshold = 2; Sum = 2.5 > 2 and the opinion is negative, so divide by 2

Example (Step 5)
• Weights: Action 0.5, Sci-fi 0.5, Comedy 2, Romance 0.25, Children 0.5
• Movie 5, rated –
• Threshold = 2; Sum = 2.5 > 2 and the opinion is negative, so divide by 2

And so on: repeat for all the ratings, or process them all ten times.
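A small Python sketch of the weight-update classifier just worked through: all feature weights start at 1, each rated movie is scored by summing the weights of its genres, and on a mistake the weights of that movie's genres are halved (scored above the threshold but disliked) or doubled (scored below but liked). The encoding of movies as genre sets and the training loop are assumptions for illustration; one pass over the three example movies reproduces the weights shown at the start of Step 4.

```python
# Sketch of the "possible algorithm" from the slides: start every feature
# weight at 1, score a rated item as the sum of its feature weights, and on
# a mistake halve or double the weights of the features present in the item
# (this matches the worked Steps 1-5; the loop details are assumptions).

GENRES = ["Action", "Sci-fi", "Comedy", "Romance", "Children"]
THRESHOLD = 2.0

def train(rated_items, passes=10):
    """rated_items: list of (set_of_genres, liked: bool) pairs."""
    weights = {g: 1.0 for g in GENRES}
    for _ in range(passes):                      # "repeat ... or do all 10 times"
        for genres, liked in rated_items:
            total = sum(weights[g] for g in genres)
            if total > THRESHOLD and not liked:  # scored high but disliked
                for g in genres:
                    weights[g] /= 2
            elif total < THRESHOLD and liked:    # scored low but liked
                for g in genres:
                    weights[g] *= 2
    return weights

def recommend(candidates, weights):
    """Rank unrated items (sets of genres) by the sum of their feature weights."""
    return sorted(candidates,
                  key=lambda genres: sum(weights[g] for g in genres),
                  reverse=True)

# Reproduce Steps 1-3 of the worked example (one pass, first three movies):
history = [
    ({"Action", "Sci-fi", "Comedy", "Romance", "Children"}, False),  # Step 1
    ({"Sci-fi", "Comedy"}, True),                                    # Step 2
    ({"Action", "Comedy"}, True),                                    # Step 3
]
weights = train(history, passes=1)
print(weights)  # Action 1, Sci-fi 1, Comedy 2, Romance 0.5, Children 0.5
print(recommend([{"Action", "Sci-fi"}, {"Romance", "Children"}], weights))
```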
Observations
• We use our knowledge about the items rated and about other items
• In particular, attributes like the type of movie; multiple attributes are likely to be important

Needs something like…
• A description of items in terms of attributes, for example: type, director, actors, ...
• A description via keywords
• The possibility to look at the content itself, such as the text
  – Synopsis: "Set in late-1930s Arezzo, Italy, Jewish man and poet Guido Orefice (Roberto Benigni) uses cunning wit to win over an Italian schoolteacher, Dora (Nicoletta Braschi), who is set to marry another man. Charming her with 'Buongiorno Principessa'…"

What has been used
• Features can be extracted automatically, e.g. via TF-IDF or matrix factorization
  – For instance, the 100 words with the highest TF-IDF weights: words that occur more frequently than average and distinguish the item from other items
  – For example, for restaurant descriptions: words like "noodle", "shrimp", "basil", "exotic", "salmon"

What has been used
• Build a feature vector for the new item and for the previously rated items
• Common similarity measures: Pearson's r correlation and cosine similarity between feature vectors

Multimedia Information Retrieval
• Images: photo collections, face recognition
• Video: movie recommendation, electronic program guides
• Spoken documents
• Music
• Other sounds

Concept-based image retrieval
• Key: concept-based indexing of images
  – Based on attributes extracted manually
  – Based on logical, high-level features
• Systems for image indexing
  – ICONCLASS, A&AT, …
• What?
  – Time, location, content

Content-based image retrieval
• Key: automatic indexing of images based on low-level features
  – Color
  – Texture
  – Shape
  – Spatial orientation and layout
  – Sketch
• [Figure: an image used as input to the search]

Examples – content-based IR
• QBIC, IBM's Query By Image Content: http://wwwqbic.almaden.ibm.com
• MIT PhotoBook (source of the following examples): http://vismod.media.mit.edu/vismod/demos/photobook/
• Virage: http://www.virage.com
• VisualSeek: http://www.ctr.columbia.edu/VisualSEEk
• [Figure: an image used as input to the search]

Problems with Content-based Filtering (1/2)
• Need to know about item content
  – Requires manual or automatic indexing
• Item features do not capture everything
• "User cold-start" problem
  – The system needs to learn which content features are important for the user, so this takes time

Problems with Content-based Filtering (2/2)
• What if the user's interests change?
• Lack of serendipity
  – [Wikipedia: "the effect by which one accidentally discovers something fortunate, especially while looking for something entirely unrelated"]

Summary
• The user identifies one or more objects as being of interest
• The recommender system suggests other objects that are similar
• Content-based filtering is one method
• … but it is not perfect
• Next week: some solutions!
(Univ. Carlos III de Madrid, 11/06/2009)

TF/IDF (extra)
• Term frequency, inverse document frequency
• A standard weighting is $w_{t,d} = \mathrm{tf}_{t,d} \cdot \log \frac{N}{\mathrm{df}_t}$: the frequency of term t in document d, times the log of the total number of documents over the number of documents containing t

Cosi…wha? (extra)
• Cosine similarity between two vectors of n dimensions: the cosine of the angle between them, $\cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$
• Value between -1 (different) and 1 (similar); 0 usually indicates independence

Pearson correlation (extra)
• Sample Pearson correlation: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$, where $\bar{x}$ is the mean of X and $\bar{y}$ is the mean of Y
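For completeness, a small self-contained sketch of the two "extra" similarity measures, cosine similarity and sample Pearson correlation, applied to two feature vectors. The example vectors are made-up numbers; in a real content-based recommender they would be TF-IDF (or similar) feature vectors for two items, or for an item and a user profile.

```python
# Minimal sketch of the two similarity measures from the "extra" slides:
# cosine similarity and (sample) Pearson correlation between feature vectors.
# Pure Python, no external libraries; the example vectors are hypothetical.
from math import sqrt

def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (||x|| * ||y||); 1 = similar, 0 = orthogonal, -1 = opposite."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_r(x, y):
    """Sample Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Example: TF-IDF-like feature vectors for two items (hypothetical numbers).
item_a = [0.0, 1.2, 0.4, 0.0, 0.9]
item_b = [0.1, 1.0, 0.0, 0.3, 0.7]
print(cosine_similarity(item_a, item_b))
print(pearson_r(item_a, item_b))
```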