Web Mining 2.0 presentation
Mining Music Social Networks for Automating Social Music Services
Claudio Baccigalupo – Enric Plaza
IIIA-CSIC – September 2007

The Goal
To automatically program the music for the channels of a Web radio, with a selection process that emulates the knowledge of an expert (DJ).
This requires domain knowledge about musical associations (which songs and artists should be played one after the other?).
We present how we obtain this knowledge through a data mining process on a large collection of playlists gathered from the Web.

1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions

The Data Set: why playlists?
Playlists are sequences of songs compiled by humans for some purpose, with cultural and social aspects that cannot be found in other sources of musical knowledge (e.g., acoustic-based ones).
Playlists are part of the user-created content that is increasingly available thanks to the social Web phenomenon.
Playlists are easy to gather, analyse, store, and understand.
Playlists have a sequential nature, and the ordering of songs is a relevant feature since our goal is to programme a radio channel.

The Data Set: which playlists?
We have collected 599,565 user-compiled playlists from the Web-based music community MyStrands (http://www.mystrands.com).
Playlists are published using a Web browser or using the MyStrands plug-in.
Playlists can be obtained through the Web API called OpenStrands.
Playlists have an average length of 16.8 songs.
Users are 65% male and 32 years old on average.
MyStrands includes more than 5M songs.

1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions

The Data Mining: what to look for?
While a song X is playing on a radio channel, we wish to know which songs are musically associated with X and are good candidates to be played after X on the channel.
We mine the playlists to learn the song association s(X, Y) for any pair of songs (X, Y) and the artist association s'(A, B) for any pair of artists (A, B), with s(X, Y) ∈ [0, 1] and s'(A, B) ∈ [0, 1].

Data Mining Process
Song X (Artist A)    Song Y (Artist B)    s(X, Y)    s'(A, B)
I Spy (Pulp)         Trash (Suede)        0.9        0.7
I Spy (Pulp)         T.N.T. (AC/DC)       0.3        0.2

The Data Mining: what to consider?
We count the co-occurrences of pairs of songs in the playlists.
Example: I Spy (Pulp) and Trash (Suede) occur together in 4 playlists.
We normalise against the popularity of the songs in the playlists.
Example: I Spy (Pulp) and Basket Case (Green Day) also co-occur 4 times, but this value is not as relevant, since Basket Case (Green Day) occurs in 14,897 playlists, 219 times more than Trash (Suede).
We assign stronger associations when the distance between songs is small and when the ordering is preserved.
Playlist #1: Song 2 (Blur), I Spy (Pulp), Trash (Suede), Wonderwall (Oasis). Contiguous post-occurrence between songs: strong association.
Playlist #2: Basket Case (Green Day), Vertigo (U2), Uno (Muse), Trouble (Coldplay), I Spy (Pulp). Distant pre-occurrence between songs: weak association.

The Data Mining: song associations
We filter out statistically insignificant associations, and co-occurrences between songs from the same artist.
From the playlists of MyStrands we obtain a set of 112,238 songs that have a song association degree with some other song.

Top associated tracks for Strangers In The Night (Frank Sinatra):
Up, Up and Away (The 5th Dimension)
Message To Michael (Dionne Warwick)
Whatever happens, I Love You (Morrissey)
Sugar Baby Love (Rubettes)
Move It On Over (Ray Charles)
It Serves You Right To Suffer (John Lee Hooker)
Blue Angel (Roy Orbison)

Top associated tracks for Smoke On The Water (Deep Purple):
Space Truckin’ (AA.VV.)
Cold Metal (Iggy Pop)
Iron Man (Black Sabbath)
China Grove (The Doobie Brothers)
Crossroads (Eric Clapton)
Sunshine Of Your Love (Cream)
Wild Thing (Jimi Hendrix)

The Data Mining: artist associations
With the same technique, we estimate the artist association degree for 25,881 artists from the playlists of MyStrands.
We count the co-occurrences of pairs of artists in the playlists, normalise by their popularity, and take their distances into account.

Top associated artists for Abba: Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb, Olivia Newton-John
Top associated artists for John Williams: Meco, Danny Elfman, John Carpenter, London Theatre Orchestra, John Barry, Hollywood Studio Orchestra, Elmer Bernstein
Top associated artists for Destiny’s Child: Kelly Rowland, City High, Ciara, Fantasia, Christina Milian, Beyoncé, Ashanti
Top associated artists for Frank Sinatra: Dean Martin, Sammy Davis Jr., Judy Garland, Bing Crosby, The California Raisins, Tony Bennett, Louis Prima

1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5.
Conclusions

The Evaluation: preamble
We compare the top associated tracks and artists that we found with the most similar tracks and artists proposed by different Web sites (e.g., MusicSeer).
The results will differ, since we do not look for similarity (a symmetric measure) but aim to build a good sequence of songs (an asymmetric measure, where the ordering matters).
Still, some observations can be made.

The Evaluation: song association
We assign the highest rankings to songs which are less popular.
If one of these songs is contained in the radio library, it will be played, so the listeners will probably discover new music.
Otherwise, a less associated, more popular song will be played.

Top associated songs for Strangers In The Night (Frank Sinatra), from our playlist mining:
Up, Up and Away (The 5th Dimension)
Message To Michael (Dionne Warwick)
Whatever happens, I Love You (Morrissey)
Sugar Baby Love (Rubettes)
Move It On Over (Ray Charles)
It Serves You Right To Suffer (John Lee Hooker)
Blue Angel (Roy Orbison)

Yahoo!:
Mr. Tambourine Man (The Byrds)
Don’t You Want Me (Human League)
I’m a Believer (The Monkees)
Good Vibrations (The Beach Boys)
Stay (Shakespeare’s Sister)
The House of The Rising Sun (The Animals)
Oh Pretty Woman (Roy Orbison)

The Evaluation: artist association
Some high-ranked associations are shared across sources, although they are inferred with different methods (human experts, playlists, listening habits).
We are able to rank first one of the most strongly associated artists.

Top associated artists for Abba (sources compared: Playlists, MyStrands, AMG, Yahoo!, Last.fm, MusicSeer), in the order the table entries appear on the slide:
Agnetha Faltskog, A-Teens, Chic, Gloria Gaynor, The 5th Dimension, Andy Gibb
MyStrands, AMG, Yahoo!, Last.fm
Olivia Newton-John, Donna Summer, Madonna, Gloria Gaynor, Cyndi Lauper, Blondie, Kool & The Gang, Ace of Base, Gemini, Maywood, Bananarama, Lisa Stansfield, Gary Wright, The Bee Gees, The Carpenters, The Beatles, Foreigner, Whitney Houston, Roxette, Madonna, The Bee Gees, Madonna, Cher, Kylie Minogue, Boney M., Michael Jackson, Elton John
MusicSeer, Playlists
The Bee Gees, Blondie, Cyndi Lauper, Queen, Cat Stevens, Cher, The Beach Boys

1.
The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions

The Application: what is Poolcasting?

The Application: song scheduling
The collection of songs (Music Pool) is open and dynamic.
The music played on each channel cannot be pre-programmed; every channel is automatically scheduled in real time.
Diagram: the last song played X and the Song and Artist Associations feed a Retrieval step over the Music Pool, which yields the subset of candidates musically associated with X.

The Application: retrieval process
The best candidates are songs either associated with X, or associated with songs by A, or associated with songs from artists associated with A, or whose artist is associated with A.
Last song X (A): I Spy (Pulp)
Song and Artist Associations: s(X, Y), s'(A, B)
Music Pool: Cody (Mogwai), Drive (R.E.M.), Uno (Muse), Nikita (Elton John), Noon (Eric Serra), Trash (Suede), Go (Moby), T.N.T. (AC/DC), Pilgrim (Enya), Roxanne (Sting)
Candidates after Retrieval: Uno (Muse), Go (Moby), Drive (R.E.M.), Trash (Suede)

The Application: reuse process
The best candidates are then ranked according to the music preferences of the current listeners, and the best song is played.
Listeners' preferences are inferred by analysing their music libraries.
Last song X (A): I Spy (Pulp)
Song and Artist Associations: s(X, Y), s'(A, B)
Candidates after Retrieval from the Music Pool:
Uno (Muse)
Go (Moby)
Drive (R.E.M.)
Trash (Suede)
Ranking (using Listeners' Preferences and Feedback): the best ranked candidate is played next.

The Application: more details
The higher the rating and the higher the play count of a song in a user library (iTunes), the higher the inferred listener preference.
Listeners can interact via the Web interface to state their explicit preferences for the songs played or to rate the next candidates.
When listeners have diverging preferences in the same channel, fairness is achieved by favouring at each moment those listeners who were less satisfied by the last songs played.

1. The Data Set: gathering playlists from Web users
2. The Data Mining: extracting musical knowledge from playlists
3. The Evaluation: comparing with other similar measures
4. The Application: programming a social Web radio
5. Conclusions

Conclusions
We use knowledge discovered from a Web-based music community to provide a group-customised Web service.
Domain knowledge about which songs and artists are musically associated comes from mining patterns of songs in a large set of playlists compiled by MyStrands users.
The result is a social Web radio where channels are automatically programmed in real time to match both musical association criteria and the preferences of the current listeners.
Future work: evaluate the quality of the associations, and extend the data mining process to include patterns of three or more songs.

Any questions?
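
Illustrative sketch: song associations
The steps listed under "The Data Mining: what to consider?" (count co-occurrences, normalise by popularity, favour short distances and preserved ordering, skip same-artist pairs) could be sketched in Python roughly as follows. This is not the authors' formula, which the slides do not give. The window size MAX_DISTANCE, the discount REVERSE_FACTOR, the 1/distance weighting, the square-root normalisation, and the function name song_associations are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt

MAX_DISTANCE = 5      # assumption: ignore songs more than 5 positions apart
REVERSE_FACTOR = 0.5  # assumption: discount when Y occurs before X


def song_associations(playlists, min_cooccurrences=1):
    """Return {(x, y): score in [0, 1]} for songs given as (title, artist)."""
    popularity = defaultdict(int)     # number of playlists containing a song
    weighted = defaultdict(float)     # summed distance/order weights per pair
    cooccurrences = defaultdict(int)  # number of playlists containing the pair

    for playlist in playlists:
        for song in set(playlist):
            popularity[song] += 1
        best = {}  # strongest weight per ordered pair within this playlist
        for i, x in enumerate(playlist):
            for j, y in enumerate(playlist):
                distance = abs(j - i)
                if distance == 0 or distance > MAX_DISTANCE:
                    continue
                if x[1] == y[1]:  # skip pairs of songs from the same artist
                    continue
                weight = 1.0 / distance  # closer songs weigh more
                if j < i:                # y was played before x
                    weight *= REVERSE_FACTOR
                best[(x, y)] = max(best.get((x, y), 0.0), weight)
        for pair, weight in best.items():
            weighted[pair] += weight
            cooccurrences[pair] += 1

    # Normalise by popularity and drop pairs seen in too few playlists.
    return {
        (x, y): weight / sqrt(popularity[x] * popularity[y])
        for (x, y), weight in weighted.items()
        if cooccurrences[(x, y)] >= min_cooccurrences
    }


playlists = [
    [("Song 2", "Blur"), ("I Spy", "Pulp"),
     ("Trash", "Suede"), ("Wonderwall", "Oasis")],
    [("Basket Case", "Green Day"), ("Vertigo", "U2"), ("Uno", "Muse"),
     ("Trouble", "Coldplay"), ("I Spy", "Pulp")],
]
scores = song_associations(playlists)
print(scores[(("I Spy", "Pulp"), ("Trash", "Suede"))])
print(scores[(("I Spy", "Pulp"), ("Basket Case", "Green Day"))])
```

On the two example playlists from the slides, the contiguous post-occurrence (I Spy, then Trash) scores higher than the distant pre-occurrence (Basket Case, several songs before I Spy), and the score is asymmetric. A real filter for statistical significance would need more than the simple min_cooccurrences threshold used here.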
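
Illustrative sketch: retrieval and ranking
The scheduling steps described under "The Application" (retrieve candidates associated with the last song, infer preferences from ratings and play counts, favour less satisfied listeners) might look roughly like the Python sketch below. It is a simplification under stated assumptions, not the Poolcasting implementation. The function names, the equal weighting of rating and play count, the neutral default of 0.5, and the fairness weight 2 - satisfaction are hypothetical, and retrieve covers only two of the four retrieval criteria (song associated with X, or artist associated with A).

```python
def retrieve(last_song, pool, song_assoc, artist_assoc):
    """Keep pool songs associated with the last song or with its artist."""
    _, last_artist = last_song
    return [
        song for song in pool
        if song != last_song
        and ((last_song, song) in song_assoc
             or (last_artist, song[1]) in artist_assoc)
    ]


def inferred_preference(rating, play_count, max_play_count):
    """Map a 0-5 star rating and a play count to a preference in [0, 1]."""
    plays = play_count / max_play_count if max_play_count else 0.0
    # Assumption: rating and play count contribute equally.
    return 0.5 * (rating / 5.0) + 0.5 * min(plays, 1.0)


def pick_next(candidates, preferences, satisfaction):
    """Pick the candidate with the best fairness-weighted group preference.

    preferences: {listener: {song: preference in [0, 1]}}
    satisfaction: {listener: satisfaction in [0, 1] with the last songs}
    """
    if not candidates:
        return None

    def group_score(song):
        total = 0.0
        weight_sum = 0.0
        for listener, prefs in preferences.items():
            # Less satisfied listeners get a larger say (weight in [1, 2]).
            weight = 2.0 - satisfaction.get(listener, 0.5)
            total += weight * prefs.get(song, 0.5)  # 0.5 = unknown song
            weight_sum += weight
        return total / weight_sum if weight_sum else 0.0

    return max(candidates, key=group_score)


i_spy = ("I Spy", "Pulp")
pool = [("Trash", "Suede"), ("T.N.T.", "AC/DC"), ("Uno", "Muse")]
candidates = retrieve(
    i_spy, pool,
    song_assoc={(i_spy, ("Trash", "Suede")): 0.9},
    artist_assoc={("Pulp", "Muse"): 0.6},
)
preferences = {
    "ann": {("Trash", "Suede"): 0.9, ("Uno", "Muse"): 0.2},
    "bob": {("Trash", "Suede"): 0.2, ("Uno", "Muse"): 0.8},
}
print(pick_next(candidates, preferences, {"ann": 1.0, "bob": 0.0}))
```

In this toy run the listener named bob was less satisfied by the last songs, so the candidate he prefers wins even though ann rates the other one higher. Swapping the two satisfaction values flips the choice.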