Semantic Tags Generation and Retrieval for Online Advertising
Roberto Mirizzi (1, 2), Azzurra Ragone (2), Tommaso Di Noia (2)
(1) Yahoo!, (2) Polytechnic University of Bari, Italy
Presented by Pranjali Rajiv Joshi

Outline
• Introduction
  – Traditional ad generation process
  – Problem statement
• Background technologies
  – DBpedia
  – SPARQL
• Not Only Tags (NOT) system
  – Architecture
  – Algorithm
• Evaluation
• Conclusion

Introduction
• Display relevant and appropriate advertisements.
• Semantic relations among keywords are important. Example: a user searches for "Tiger Woods" and is shown ads about a zoo.

Introduction: traditional process of ad generation
• A keyword is selected to activate an ad.
• This lexicographic approach is not concerned with semantics.
Possible solution: an ontology, which defines the concepts and relationships used to describe and represent an area of knowledge.
Drawbacks? The ontology has to be maintained, and it has to cover every possible domain.

Introduction: problem statement
• Traditional approaches fail to display relevant ads to users.
• An ontology could be used, but it is difficult to maintain.
• Goal: develop a system called Not Only Tags (NOT) that addresses these issues.

Background technologies
• DBpedia: a linked RDF data graph
  – Extracts structured information from Wikipedia
  – Links different datasets on the Web to Wikipedia data (as of 2014: 3 billion triples, 50 million links)
  – Allows sophisticated queries
• SPARQL
  – Performs queries on RDF data sources such as DBpedia

RDF: Resource Description Framework
• Helps to add semantic information to the Web.
• A statement is a triple of subject, predicate, and object, for example Dave (subject) likes (predicate) cookies (object). Resources in a triple are identified by URIs.
• RDF Linked Data: RDF triples connected by RDF links.

RDF Linked Data and DBpedia: DBpedia is one of the main clouds of the RDF Linked Data graph.
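To make the triple model concrete, the following minimal Python sketch stores RDF statements as (subject, predicate, object) tuples and answers a single triple pattern, which is roughly what a one-pattern SPARQL SELECT does against a store such as DBpedia. The `ex:` identifiers, the `Triple` type, and the `match` helper are hypothetical stand-ins: no real DBpedia URIs or SPARQL endpoint are involved, and the RDFa edges only mirror the example used later in the talk.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Toy graph with illustrative ex: identifiers (not real DBpedia URIs).
GRAPH: List[Triple] = [
    Triple("ex:Dave", "ex:likes", "ex:Cookies"),
    Triple("ex:RDFa", "skos:subject", "ex:Semantic_Web"),
    Triple("ex:RDFa", "skos:subject", "ex:XML-based_standards"),
]


def match(graph: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts like a SPARQL variable."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Roughly: SELECT ?o WHERE { ex:RDFa skos:subject ?o }
categories = [t.obj for t in match(GRAPH, s="ex:RDFa", p="skos:subject")]
print(categories)  # ['ex:Semantic_Web', 'ex:XML-based_standards']
```

Leaving a position as None plays the role of a SPARQL variable; a real SPARQL engine additionally joins several patterns, applies filters, and works over URIs and literals rather than plain strings.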
DBpedia is the machine-understandable equivalent of Wikipedia. Example links for the same topic:
• Wikipedia: https://en.wikipedia.org/wiki/RDFa
• DBpedia: http://dbpedia.org/page/RDFa
SPARQL is used to perform queries on DBpedia.

Quiz
• A statement in RDF is called a _______.
  a) Tuple  b) Treble  c) Triple  d) Trouple
• RDF uses _______ to identify resources.
  a) Web identifiers  b) MAC addresses  c) IP addresses  d) Network addresses
• The RDF Linked Data graph consists of _______.
  a) RDF triples  b) SPARQL  c) Both a and d  d) RDF links

Quiz
• Which of the following is true about DBpedia?
  a) It is an RDF Linked Data graph
  b) It consists of structured information from Wikipedia
  c) It is possible to ask complex queries on DBpedia
  d) All of a, b, and c
• SPARQL is used to query RDF data sources.
  a) True  b) False

Not Only Tags (NOT) system
• Demo: http://sisinflab.poliba.it/not-only-tags/
• The interface consists of a text input area, the user's tag bag area, and the tag cloud population.

What is behind NOT? (I)
• Graph exploration and computation of a similarity value.
[Figure: the DBpedia page of RDFa, linked to Semantic_Web and XML-based_standards.]

What is behind NOT? (II)
[Figure: part of the DBpedia graph around RDFa, with nodes such as Semantic_Web, XML-based_standards, Knowledge_representation, Data_management, Internet_architecture, XML, Computer_and_telecommunication_standards, Resource Description Framework, Microformats, Triplestores, Folksonomy, Web_services, User_interface_markup_languages, and Scalable_Vector_Graphics. Legend: Article, Category, skos:subject, skos:broader.]

NOT system architecture
• Linked Data graph exploration
• Ranking of nodes by exploiting external information
• Storage of the results as pairs of nodes together with their similarity

Data structure r
• uri: a DBpedia URI
• hits: the number of times the URI is visited during the exploration
• ranked: a boolean indicating whether the URI has been ranked
• in_context: a boolean stating whether the URI under consideration is within the context

Flow chart: DBpedia Ranker
• The exploration starts from the seed node (RDFa) with a maximum depth max = 2.
• While the current depth d < max, the ranker explores the graph by following the skos:broader and skos:subject properties.
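The data structure r and the depth-limited exploration can be sketched in Python as follows, assuming a toy adjacency list in place of live DBpedia queries. `RankerNode`, `explore`, and the graph are hypothetical; the nodes after RDFa, Semantic_Web, and XML-based_standards (`Shared_Broader`, `Too_Far`) are invented so that one node is reached twice and one lies beyond max. Ranking and the context check are left out, so `ranked` and `in_context` keep their defaults here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RankerNode:
    """The per-URI record r: uri, hits, ranked, in_context."""
    uri: str
    hits: int = 0
    ranked: bool = False      # would be set by the ranking step (not shown)
    in_context: bool = False  # would be set by the context analyzer (not shown)


# Hypothetical toy graph: node -> nodes reachable via skos:subject / skos:broader.
EDGES: Dict[str, List[str]] = {
    "RDFa": ["Semantic_Web", "XML-based_standards"],
    "Semantic_Web": ["Shared_Broader"],
    "XML-based_standards": ["Shared_Broader"],
    "Shared_Broader": ["Too_Far"],
}


def explore(seed: str, edges: Dict[str, List[str]],
            max_depth: int = 2) -> Dict[str, RankerNode]:
    """Breadth-first exploration that expands nodes only while d < max_depth."""
    records: Dict[str, RankerNode] = {}
    frontier = [seed]
    d = 0
    while d < max_depth:
        next_frontier: List[str] = []
        for uri in frontier:
            for neighbor in edges.get(uri, []):
                if neighbor not in records:
                    # First visit: create r and schedule the node for expansion.
                    records[neighbor] = RankerNode(uri=neighbor, hits=1)
                    next_frontier.append(neighbor)
                else:
                    # Revisit: only count the extra hit.
                    records[neighbor].hits += 1
        frontier = next_frontier
        d += 1
    return records


records = explore("RDFa", EDGES, max_depth=2)
for r in records.values():
    print(r.uri, r.hits)
# Semantic_Web 1
# XML-based_standards 1
# Shared_Broader 2
```

With max = 2, `Shared_Broader` is reached at depth 2 and counted twice, but its own neighbor is never expanded, which is how the d < max test bounds the exploration.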
Flow chart: DBpedia Ranker, visiting a node
• Following skos:broader and skos:subject from the RDFa DBpedia page, the ranker reaches Semantic Web and checks whether this node is being explored for the first time.
• If it is, the ranker creates a data structure r for Semantic Web (in the example: hits = 1, ranked = false, in_context = true).
• If the node has already been visited, the ranker instead increments hits in its existing r.
• The ranker then computes the similarity sim(RDFa, Semantic Web) and asks whether Semantic Web is within the context of the search domain. If it is, exploration continues while d < max; if not, exploration stops at that node.

Similarity value (I)
1. Web pages: the number of Web pages that contain, or have been tagged (e.g. on Delicious) with, the rdfs:label values associated with the URIs.

Similarity value (II)
2. Wikilinks: if Wikipedia page w1 links to w2, DBpedia has a dbpprop:wikilink from uri1 to uri2. The score is 0 for no link, 1 for a unidirectional link, and 2 for a bidirectional link.
3. Abstracts: if the rdfs:label of uri1 is contained in the dbpprop:abstract of uri2 (and vice versa), the score is m/n, where n is the number of words composing the label and m is the number of those words that also appear in the abstract.

Similarity value (III): example
• RDFa and Semantic Web are linked in both directions: wikilinkScore(RDFa, Semantic Web) = 2.
• "RDFa" is not present in the abstract of Semantic Web, and "Semantic Web" is not present in the abstract of RDFa: abstractScore(RDFa, Semantic Web) = 0.

Similarity value (IV): a quick recap
The similarity is calculated on the basis of three things:
1. The number of Web pages returned for uri1 and uri2
2. The wikilink between uri1 and uri2
3. Whether the label of uri1 is present in the abstract of uri2 (and vice versa)

Context analyzer (I)
• Example: the advertising agency is centered on "databases and programming languages". This focus defines the context C.

Context analyzer (III)
• C = {c1, c2, …, ck} is the set of top-k resources returned for "database & programming".
• For the current node uri (here Semantic Web), the analyzer accumulates s = s + sim(c.uri, uri) over every c in C, that is, s = sim(c1, Semantic Web) + sim(c2, Semantic Web) + … + sim(ck, Semantic Web).
• If s >= threshold, the node is in context (True); otherwise it is not in context (False). The threshold value is 4.
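The wikilink score, the label/abstract score, and the context test can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the NOT implementation: the Web-page component is omitted because it needs an external service, the slides do not say how the components combine into sim, "and vice versa" is read as summing both directions, the abstracts are placeholder strings rather than real DBpedia abstracts, and the context resources c1 to c3 with their similarity values are made up. Only the threshold of 4 comes from the slides.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple


def wikilink_score(uri1: str, uri2: str, links: Set[Tuple[str, str]]) -> int:
    """0 = no link, 1 = unidirectional, 2 = bidirectional (dbpprop:wikilink)."""
    return int((uri1, uri2) in links) + int((uri2, uri1) in links)


def words(text: str) -> List[str]:
    """Lower-cased word tokens (a simple tokenization assumption)."""
    return re.findall(r"\w+", text.lower())


def label_in_abstract(label: str, abstract: str) -> float:
    """m/n: n = words in the label, m = label words that also occur in the abstract."""
    label_words = words(label)
    if not label_words:
        return 0.0
    abstract_words = set(words(abstract))
    m = sum(1 for w in label_words if w in abstract_words)
    return m / len(label_words)


def abstract_score(label1: str, abstract1: str, label2: str, abstract2: str) -> float:
    # ASSUMPTION: "and vice versa" is read as adding both directions.
    return label_in_abstract(label1, abstract2) + label_in_abstract(label2, abstract1)


def context_score(uri: str, context: Iterable[str],
                  sim: Callable[[str, str], float]) -> float:
    """s = sim(c1, uri) + sim(c2, uri) + ... + sim(ck, uri)."""
    return sum(sim(c, uri) for c in context)


def is_in_context(uri: str, context: Iterable[str],
                  sim: Callable[[str, str], float], threshold: float = 4.0) -> bool:
    """In context iff s >= threshold (the slides use 4)."""
    return context_score(uri, context, sim) >= threshold


# Hypothetical toy data.
LINKS = {("RDFa", "Semantic Web"), ("Semantic Web", "RDFa")}
ABSTRACTS = {
    "RDFa": "Placeholder text about attributes and metadata.",
    "Semantic Web": "Placeholder text about linked data and ontologies.",
}
SIMS = {("c1", "Semantic Web"): 2.0, ("c2", "Semantic Web"): 1.5,
        ("c3", "Semantic Web"): 1.0, ("c1", "Zoo"): 0.5}
CONTEXT = ["c1", "c2", "c3"]


def toy_sim(c: str, uri: str) -> float:
    return SIMS.get((c, uri), 0.0)


print(wikilink_score("RDFa", "Semantic Web", LINKS))  # 2
print(abstract_score("RDFa", ABSTRACTS["RDFa"],
                     "Semantic Web", ABSTRACTS["Semantic Web"]))  # 0.0
print(label_in_abstract("Semantic Web", "A web of documents"))  # 0.5
print(is_in_context("Semantic Web", CONTEXT, toy_sim))  # True (s = 4.5)
print(is_in_context("Zoo", CONTEXT, toy_sim))  # False (s = 0.5)
```

The first two calls reproduce the slide's example values (wikilinkScore = 2, abstractScore = 0); the `toy_sim` table stands in for whatever combined similarity the ranker actually stores.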
Quick recap: graph exploration, similarity value, context analyzer
[Figure: the complete DBpedia Ranker flow chart, starting from RDFa with max = 2.]
• While d < max, explore neighbors via skos:subject and skos:broader (e.g. Semantic Web).
• Exploring the node for the first time? Yes: create its data structure r (uri, hits, ranked, in_context). No: increment hits in r.
• Compute the similarity sim(RDFa, Semantic Web) from Web pages (label), wikilinks, and label-abstract matches.
• In context? Sum the similarities between the context resources c1…ck and Semantic Web into s. If s >= threshold (4), the node is in context; otherwise, it is not in context and exploration stops at that node.

Evaluation (I)
• Comparison of 5 different algorithms
• 50 volunteers, all researchers in the ICT area
• 244 votes collected (on average 5 votes per user)
• Average time to vote: 1 min 40 s

Evaluation (II)
[Figure: DBpedia Ranker compared with Algo 2, Algo 3, Algo 4, and Algo 5.]

Conclusion
• Presented a novel system for semantic tag generation and retrieval
• System architecture: DBpedia, SPARQL
• Algorithms: graph exploration, similarity value, and context analyzer
• Helps advertisers in the process of keyword selection and enhances the ad selection process

That's all for today, have a nice day! Thank you.