Visualisation and Analysis of the Internet Movie Database
Adel Ahmed, School of IT, University of Sydney and NICTA, Australia (e-mail: adel.ahmed@nicta.com.au)
Vladimir Batagelj, Discrete and Computational Mathematics, University of Ljubljana, Slovenia (e-mail: vladimir.batagelj@fmf.uni-lj.si)
Xiaoyan Fu, NICTA, Australia (e-mail: xiaoyan.fu@nicta.com.au)
Seok-Hee Hong, School of IT, University of Sydney and NICTA, Australia (e-mail: shhong@it.usyd.edu.au)
Damian Merrick, School of IT, University of Sydney and NICTA, Australia (e-mail: damian.merrick@nicta.com.au)
Andrej Mrvar, Social Science Informatics, University of Ljubljana, Slovenia (e-mail: andrej.mrvar@fdv.uni-lj.si)

This paper is based on the winning entry of the Graph Drawing Competition 2005 [7] and an invited presentation at the Sunbelt Viszard Session [9].

ABSTRACT

In this paper, we present a case study for the visualisation and analysis of large and complex temporal multivariate networks derived from the Internet Movie DataBase (IMDB). Our approach is to integrate network analysis methods with visualisation in order to address scalability and complexity issues. In particular, we defined new analysis methods such as (p,q)-core and 4-ring to identify important dense subgraphs and short cycles in huge bipartite graphs. We applied island analysis to a specific time slice in order to identify important and meaningful subgraphs. Further, a temporal Kevin Bacon graph and a temporal two mode network are extracted in order to provide insight into and knowledge of the evolution of the network.

Keywords: Large and Complex Networks, Case Study, Visualisation, Network Analysis, IMDB.

Index Terms: H.5.2 [Information Interfaces and Presentation]: User Interfaces - Algorithms; I.3.6 [Computer Graphics]: Methodology and Techniques

1 INTRODUCTION

Recent technological advances have led to the production of large amounts of data, and consequently to many large and complex network models across a number of domains. Examples include:

• Webgraphs: the entities are web pages and the relationships are hyperlinks; these are huge: the whole graph consists of billions of nodes.
• Social networks: These include telephone call graphs (used to trace terrorists), money movement networks (used to detect money laundering), and citation networks or collaboration networks. The size of the network can be medium to very large.
• Biological networks: Protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks and phylogenetic networks are used by biologists to analyse and engineer biochemical materials. In general, they are smaller, with thousands of nodes. However, the relationships in these networks are very complex.

Understanding these networks is a key enabler for many applications. Good analysis methods are needed for these networks, and some are available. However, such methods are not useful unless the results are effectively communicated to humans. Visualisation can be an effective tool for understanding such networks. Good visualisation reveals the hidden structure of the networks and amplifies human understanding, thus leading to new insights, new findings and possible predictions for the future.

We can identify the following challenging research issues for the analysis and visualisation of large and complex networks:

• Scalability: Webgraphs or telephone call graphs gathered by AT&T have billions of nodes. In some cases, it is impossible to visualise the whole graph, or even to load the whole graph into main memory. Hence, the design of new analysis and visualisation methods for huge networks is a key research challenge in areas ranging from databases to computer graphics.
• Complexity: Relationships between actors in a social network, for example, can have a multitude of attributes (for example, observed behavior can be confirmed or unconfirmed, and relationships can be directed or undirected and weighted by probabilities). Also, biological networks are quite complex in nature; for example, metabolic pathways have only a few thousand nodes, but their relationships and interactions are very complex. The data may be given by nature, but some parts of the data may be unknown to human scientists. The design of analysis and visualisation methods to resolve these complexity issues is the second research challenge.
• Network Dynamics: Real world networks are always changing over time. Many social networks, such as webgraphs, evolve relatively slowly over time. In some cases, such as telephone call networks, the data is a very fast-streamed graph. Effective and efficient modeling, analysis and visualisation for dynamic networks are challenging research topics.

One approach to solving these challenging issues is the integration of analysis with visualisation and interaction. Analysis tools for networks are not useful without visualisation, and visualisation tools are not useful unless they are linked to analysis. Further, interaction is necessary to find out more details or gain insights from the visualisation.

In this paper, we present a case study for our approach to integrating analysis, visualisation and interaction using large and complex temporal multivariate networks derived from the IMDB (Internet Movie Data Base). In general, the IMDB is a huge and very rich data set with many attributes. Note that the IMDB data set has become a challenging data set for visualisation researchers [7, 9]. For example, a multi-scale approach for the visualisation of small world networks was used for data sets from IMDB [3]. A visualization approach for dynamic affiliation networks in which events are characterized by a set of descriptors was presented in [6].
A radial ripple metaphor was devised to display the passing of time and conveys relations among the different constituents through appropriate layout. Note that the method is suitable for an egocentric perspective.

Figure 1: Arcs with multiplicity at least 8

As the first step of our approach, we integrate network analysis methods [5, 10] with visualisation. In particular, we defined new analysis methods such as (p,q)-core and 4-ring to identify important dense subgraphs and short cycles in huge bipartite graphs. We applied island analysis to a specific time slice in order to identify important and meaningful subgraphs of the large and complex network.
Further, a temporal Kevin Bacon graph and a temporal two mode network are extracted and visualised in order to provide insight into and knowledge of the evolution of the IMDB data set.

This paper is organised as follows. In the next section, we present a simple analysis of the IMDB data set. In Section 3, we present the integration of network analysis methods with visualisation for large bipartite graphs, including (p,q)-core, 4-ring and island. Section 4 presents visual analysis based on the Kevin Bacon number. Section 5 presents a galaxy metaphor visualisation of a temporal two mode actor-movie network, and a visual analysis of the two mode network with company attributes. Section 6 concludes.

2 BASIC CHARACTERISTICS OF IMDB

The source of the original data is the Internet Movie Database. We transformed the contest data into a temporal network with some additional vectors and partitions describing the properties of vertices. The IMDB network is bipartite (two mode) and has 1324748 = 428440 + 896308 vertices and 3792390 arcs. 9927 of the arcs in the network are multiple (parallel) arcs. The nature of the multiple arcs can be seen in Figure 1, where all arcs with multiplicity of at least 8 are displayed. Note that in the analysis that follows, we treat multiple arcs as single. The IMDB network consists of 132714 weak components.

3 VISUALISATION AND ANALYSIS OF LARGE BIPARTITE NETWORKS

There are few direct specialized methods for analyzing bipartite networks, especially large ones. Because of the size of the IMDB network, the standard reduction of the entire network to one or the other derived 1-mode network was not an option.
This motivated us to design and implement two new methods for the analysis of bipartite networks:

• a bipartite version of cores: (p, q)-cores
• 4-ring weights on lines

Table 1: (p, q : n1, n2) for IMDB

 p    q:   n1   n2 |  p  q:   n1   n2 |  p  q: n1  n2
 1 1590: 1590    1 | 22 24: 1854 1153 | 43 14: 29  83
 2  516:  788    3 | 23 23:   47   56 | 44 14: 29  83
 3  212: 1705   18 | 24 23:   34   39 | 45 13: 30  95
 4  151: 4330  154 | 25 22:   42   53 | 46 13: 29  94
 5  131: 4282  209 | 26 22:   31   38 | 47 12: 29 101
 6  115: 3635  223 | 27 22:   31   38 | 48 12: 28 100
 7  101: 3224  244 | 28 20:   36   53 | 49 12: 26  95
 8   88: 2860  263 | 29 20:   35   52 | 50 11: 27 111
 9   77: 3467  393 | 30 19:   35   59 | 51 11: 26 110
10   69: 3150  428 | 31 19:   35   59 | 52 11: 16  79
11   63: 2442  382 | 32 19:   34   57 | 53 10: 35 162
12   56: 2479  454 | 33 18:   34   62 | 54 10: 35 162
13   50: 3330  716 | 34 18:   34   62 | 55 10: 34 162
14   46: 2460  596 | 35 18:   33   61 | 56 10: 34 162
15   42: 2663  739 | 36 17:   33   65 | 57  9: 35 187
16   39: 2173  678 | 37 16:   33   75 | 58  9: 33 180
17   35: 2791  995 | 38 16:   30   73 | 59  9: 33 180
18   32: 2684 1080 | 39 16:   29   70 | 60  9: 32 178
19   30: 2395 1063 | 40 15:   29   77 | 61  9: 31 177
20   28: 2216 1087 | 41 15:   28   76 | 62  9: 31 177
21   26: 1988 1087 | 42 15:   28   76 | 63  8: 31 202

3.1 (p, q)-core Analysis

The subset of vertices C ⊆ V is a (p, q)-core in a bipartite (2-mode) network N = (V1, V2; L), V = V1 ∪ V2, if and only if

a. in the induced subnetwork K = (C1, C2; L(C)), C1 = C ∩ V1, C2 = C ∩ V2, it holds that ∀v ∈ C1 : deg_K(v) ≥ p and ∀v ∈ C2 : deg_K(v) ≥ q;
b. C is the maximal subset of V satisfying condition a.

The basic properties of bipartite cores are:

• C(0, 0) = V
• K(p, q) is not always connected
• (p1 ≤ p2) ∧ (q1 ≤ q2) ⇒ C(p2, q2) ⊆ C(p1, q1)

Using (p, q)-cores, we can identify important dense structures in large and complex networks. We designed a very efficient O(m) algorithm to find (p, q)-cores, and implemented it in Pajek. Since there are many (p, q)-cores, we must answer the question of how to select the interesting ones among them.
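As an illustration of the definition in Section 3.1, the following Python sketch computes a (p, q)-core by repeatedly deleting vertices whose degree falls below the bound for their mode. It is not the Pajek implementation: the function name `pq_core` and the edge-list input format are assumptions made for this example. Each arc is handled a constant number of times, which is in the spirit of the O(m) bound mentioned above.

```python
from collections import defaultdict, deque


def pq_core(edges, p, q):
    """Return (C1, C2), the two vertex sets of the (p, q)-core.

    edges: iterable of (u, v) pairs with u in V1 and v in V2 (assumed format).
    Vertices of V1 must keep degree >= p, vertices of V2 degree >= q.
    """
    adj1 = defaultdict(set)  # V1 vertex -> neighbours in V2
    adj2 = defaultdict(set)  # V2 vertex -> neighbours in V1
    for u, v in edges:       # sets collapse multiple (parallel) arcs
        adj1[u].add(v)
        adj2[v].add(u)

    # Start with every vertex that already violates its degree bound.
    queue = deque([(1, u) for u in adj1 if len(adj1[u]) < p])
    queue.extend((2, v) for v in adj2 if len(adj2[v]) < q)
    removed = set(queue)

    while queue:
        side, x = queue.popleft()
        if side == 1:
            own, other, bound = adj1, adj2, q
        else:
            own, other, bound = adj2, adj1, p
        for y in own[x]:
            other[y].discard(x)          # deleting x lowers the degree of y
            key = (3 - side, y)
            if key not in removed and len(other[y]) < bound:
                removed.add(key)
                queue.append(key)
        own[x] = set()

    c1 = {u for u in adj1 if (1, u) not in removed}
    c2 = {v for v in adj2 if (2, v) not in removed}
    return c1, c2
```

Calling `pq_core` for a range of (p, q) values and recording the sizes of the two returned sets gives the kind of (p, q : n1, n2) listing shown in Table 1.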
To help the user in these decisions, we implemented a table of the cores' characteristics: n1 = |C1(p, q)|, n2 = |C2(p, q)| and k, the number of components in K(p, q) (see Tables 1 and 2). We look for (p, q)-cores where

• n1 + n2 ≤ a selected threshold
• there are big jumps from C(p − 1, q) and C(p, q − 1) to C(p, q).

For example, we selected the (247,2)-core and the (27,22)-core. From the labels we can see that the corresponding topics are wrestling and pornography. See Figures 2 and 3.

Figure 2: (247,2)-core

Figure 3: (27,22)-core

3.2 4-ring Analysis

A k-ring is a simple closed chain of length k. Using k-rings we can define a weight of edges as

w_k(e) = number of different k-rings containing the edge e ∈ E.

Since for a complete graph K_r, r ≥ k ≥ 3, we have w_k(K_r) = (r − 2)!/(r − k)!, the edges belonging to cliques have large weights. Therefore, these weights can be used to identify the dense parts of a network. For example, all r-cliques of a network belong to the (r − 2)-edge cut for the weight w_3.

The 3-ring weights were already available [8]. However, there are no 3-rings in the IMDB network. The densest substructures are complete bipartite subgraphs K_{p,q}, which contain many 4-rings. This motivated us to design a method to compute 4-ring weights, which we implemented in Pajek.

To identify interesting substructures, we applied the simple islands procedure for the weight w_4. It takes around three minutes to compute the w_4 weights on a 1400 MHz, 1GB RAM computer, and 13 seconds to determine the islands. We obtained 12465 simple line islands on 56086 vertices. Here is their size distribution, where Size is the number of vertices in an island and Freq is the number of islands of that size.

Table 2: (p, q : n1, n2) for IMDB

Size  Freq | Size Freq | Size Freq | Size Freq
-----------------------------------------------
   2  5512 |   20   19 |   38    4 |   59    2
   3  1978 |   21   18 |   39    3 |   61    1
   4  1639 |   22   15 |   40    2 |   64    1
   5   968 |   23    9 |   42    2 |   67    1
   6   666 |   24   13 |   43    3 |   70    1
   7   394 |   25   12 |   45    3 |   73    1
   8   257 |   26    6 |   46    4 |   76    1
   9   209 |   27    6 |   47    5 |   82    1
  10   148 |   28    5 |   48    1 |   86    1
  11   118 |   29    6 |   49    2 |  106    1
  12    87 |   30    3 |   50    2 |  122    1
  13    55 |   31    6 |   51    1 |  135    1
  14    62 |   32    5 |   52    2 |  144    1
  15    46 |   33    3 |   53    1 |  163    1
  16    39 |   34    1 |   54    2 |  269    1
  17    27 |   35    5 |   55    1 |  301    1
  18    28 |   36    4 |   57    1 |  332    2
  19    29 |   37    7 |   58    1 |  673    1
-----------------------------------------------

There are 94 islands of size at least 30, and only 10 of size over 100. The largest island corresponds to wrestling. Each island represents a special topic. We visualized only some of them. For example, see Figures 4, 5, 6, 7 and 8.

Figure 4: Charlie Brown

3.3 Time Slices and Island Analysis

By extracting a time slice from the complete network, we can identify the main groups in selected time periods. Islands can identify important subgraphs of large networks based on the value of attributes [4]. To illustrate this, we extracted the time slice 1935-1950.
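The two steps used here, restricting the network to a time slice and computing the 4-ring weight w_4 of Section 3.2, can be sketched in Python as follows. This is a straightforward quadratic-per-vertex illustration, not the method implemented in Pajek; the function names, the `year_of` dictionary and the edge-list format are assumptions made for the example, and the island computation itself is not shown. The sketch relies on the observation that a 4-ring through the edge (u, v) is determined by a second neighbour v2 of u and a second common neighbour of v and v2.

```python
from collections import defaultdict
from itertools import combinations


def time_slice(edges, year_of, first, last):
    """Keep arcs (actor, movie) whose movie year lies in [first, last]."""
    return [(a, m) for a, m in edges if first <= year_of[m] <= last]


def four_ring_weights(edges):
    """Return {(u, v): w4} for a bipartite graph given as (u, v) pairs.

    u is a V1 vertex and v a V2 vertex; V2 labels must be mutually
    comparable (e.g. all strings). Parallel arcs are treated as single.
    """
    nbrs = defaultdict(set)            # V1 vertex -> set of V2 neighbours
    for u, v in edges:
        nbrs[u].add(v)

    # common[a, b] = number of V1 vertices adjacent to both a and b (a < b)
    common = defaultdict(int)
    for vs in nbrs.values():
        for a, b in combinations(sorted(vs), 2):
            common[a, b] += 1

    weights = {}
    for u, vs in nbrs.items():
        for v in vs:
            # For each other neighbour v2 of u, every common neighbour of
            # v and v2 except u itself closes one 4-ring through (u, v).
            weights[u, v] = sum(
                common[min(v, v2), max(v, v2)] - 1 for v2 in vs if v2 != v
            )
    return weights
```

On a complete bipartite graph K_{p,q} every edge receives the weight (p − 1)(q − 1), which is why such subgraphs stand out under w_4.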
There are 223 simple islands [4] for w_4 on 1774 vertices. For example, we selected island 6, 'Dona Macabra'; see Figure 9.

Figure 5: Mower, Jack and Phelps, Lee

Figure 6: Adult

Figure 7: Shawqi, Farid and El-Meliguy, Mahmoud

Figure 8: Polizeiruf 110 and Starkes Team

4 TEMPORAL CO-STARRING NETWORK: NETWORK KEVIN-BACON

We extracted a small important subset of the actors in the IMDB network and constructed from it a dynamic visualisation of a 1-mode network showing the co-appearance of actors in films. To define a sufficiently small important subgraph, we first considered only nodes in the network with a Kevin Bacon number of 1. The Kevin Bacon number of an actor is a similar concept to the Erdős number of a mathematician; it represents the length of the shortest path in the movie star collaboration network from the actor to Kevin Bacon. The data set was divided into time slices of a decade in length (e.g. 1920s, 1930s, etc.), and the set of actors was reduced in each decade to only those who had co-starred in at least 5 films with another actor with a Kevin Bacon number of 1. The sizes of the graphs for each of these time slices are given in Table 3.

The 1-mode co-starring networks of these reduced sets of actors were constructed for each decade, and a three-dimensional layout was generated for each using the scale-free network layout [2] in GEOMI [1]. Nodes in the force-directed layout were restricted to lie on one of three concentric spheres, depending on the degree of the node [2]. The colouring of each node was also used to indicate the degree. The size of each node was dependent on the number of movies in which the corresponding actor starred in that particular decade. Similarly, the width of an edge was used to represent the number of co-appearances between two actors in a decade.

To effectively illustrate the evolution of the co-starring network, we display smooth animations between the layouts of subsequent decades.
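The extraction steps described in this section, computing Kevin Bacon numbers and then reducing each decade to actors with at least 5 joint films, can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the procedure actually used: the `cast` and `year_of` dictionaries, both function names, and the convention that a decade is identified by its first year (1960 for the 1960s) are all hypothetical choices for the example.

```python
from collections import defaultdict, deque
from itertools import combinations


def bacon_numbers(cast, source):
    """Breadth-first search over co-appearances.

    cast: dict mapping movie -> list of actors (assumed format).
    Returns {actor: shortest co-starring distance from `source`}.
    """
    movies_of = defaultdict(list)
    for movie, actors in cast.items():
        for a in actors:
            movies_of[a].append(movie)

    dist = {source: 0}
    queue = deque([source])
    seen_movies = set()                # each movie is expanded only once
    while queue:
        a = queue.popleft()
        for m in movies_of[a]:
            if m in seen_movies:
                continue
            seen_movies.add(m)
            for b in cast[m]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
    return dist


def decade_costar_graph(cast, year_of, keep, decade, min_films=5):
    """Co-starring network of one decade, restricted to the actor set `keep`.

    An actor is retained if he or she shares at least `min_films` films of
    that decade with another actor in `keep`. Returns (nodes, edges), where
    edges maps an actor pair to its number of joint films in the decade.
    """
    counts = defaultdict(int)
    for movie, actors in cast.items():
        if year_of[movie] // 10 * 10 != decade:
            continue
        for a, b in combinations(sorted(set(actors) & keep), 2):
            counts[a, b] += 1
    strong = [pair for pair, n in counts.items() if n >= min_films]
    nodes = {a for pair in strong for a in pair}
    edges = {pair: n for pair, n in counts.items()
             if pair[0] in nodes and pair[1] in nodes}
    return nodes, edges
```

Here `keep` would be the set of actors whose value in the result of `bacon_numbers` is 1, and the edge counts correspond to the edge widths used in the layout.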
The animations are broken into several parts shown one after the other in time, in order to aid retention of the mental map. First, nodes and edges not present in the first layout are faded out. Nodes present in both first and second layouts are then animated to their new positions in the second layout. Nodes new to the second layout burst out from the centre and come to rest in their calculated positions, and finally new edges are faded in to show the new collaborations in the second decade. The animation is downloadable from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05final.avi

Figure 9: Dona Macabra

Table 3: Graph sizes per decade of co-starring network

                                      V        E
Initial                         1324748  3792390
KB1, all decades, no filtering     2742   336060
KB1, 1910s, ≥ 5 films                16       18
KB1, 1920s, ≥ 5 films                 4        2
KB1, 1930s, ≥ 5 films                25       53
KB1, 1940s, ≥ 5 films                17       17
KB1, 1950s, ≥ 5 films                19       18
KB1, 1960s, ≥ 5 films                16       35
KB1, 1970s, ≥ 5 films                79      411
KB1, 1980s, ≥ 5 films                59       73
KB1, 1990s, ≥ 5 films               207      425
KB1, 2000s, ≥ 5 films               124      208

This process was continued for all decade slices from 1911 through to 2004, and the result can be seen in the downloadable animation. Figures 10, 11, 12, 13, 14 show snapshots of the animation from the 1960s through to the early 2000s.

Figure 10: The co-starring actors visualisation (1960s)

The visualisation revealed a number of interesting facts. One unexpected finding was the substantial number of actors with a Kevin Bacon number of 1 in the early years of the twentieth century, some of whom could clearly not have co-starred in a film with Kevin Bacon. This revealed some problems in the collection of the movie data set. The years of some movies had been recorded incorrectly, while edges to other movies that possessed the same name as a movie of a prior decade were all recorded as belonging to the earlier movie.

In the 1960s (Figure 10), the visualisation shows a clique involving the US president John F. Kennedy. This is due to the assassination of Kennedy in 1963, and the subsequent barrage of documentaries that were produced detailing the event. The other actors in the clique (Jacqueline Kennedy, John and Nellie Connally, etc.) were all present at the assassination.
They are present in this data set since the movie JFK, starring Kevin Bacon, included real archive footage of the assassination. The Kennedys continue through to later decades in the visualisation, illustrating the vast number of documentary films that were based on this event.

The 1970s, shown in Figure 11, see the first large connected group of Hollywood actors that continue as big names to this day. James Earl Jones, Robert Redford, Steve Martin and John Travolta all appear in this group.

Figure 11: The co-starring actors visualisation (1970s)

The visualisation of the 1980s (Figure 12) highlights some particularly close-knit groups of actors. Comedy stars Chevy Chase, Dan Aykroyd and Bill Murray appear due to roles in Saturday Night Live, Caddyshack and Spies Like Us. Also present are Jim Cummings, Jack Angel and Rob Paulsen, who have quite high degrees due to their involvement as voice actors in many short cartoons and episodes.

Figure 12: The co-starring actors visualisation (1980s)

These groups continue into the 1990s, where the groups of actors become much larger and more highly connected (Figure 13). More well-established modern actors like Whoopi Goldberg, Tom Hanks and Dennis Hopper become particularly prominent in this decade.

Figure 13: The co-starring actors visualisation (1990s)

Finally, in the 2000s, we see some particularly interesting and unexpected phenomena (Figure 14). First, music stars such as Britney Spears, Beyoncé Knowles and Sheryl Crow appear with very high degree and connectedness, due to their participation in numerous music award shows. Secondly, on the other side of the visualisation, popular actor Arnold Schwarzenegger links politicians to the movie stars and musicians in the rest of the co-starring network. This was primarily due to Schwarzenegger's entry into politics, in becoming the governor of the US state of California. Following this event, he was in several political documentaries in which Bill Clinton also appeared. Bill Clinton, in turn, is linked through documentaries and archival footage to other famous politicians, such as Ronald Reagan, Richard Nixon and John F. Kennedy.

Figure 14: The co-starring actors visualisation (2000s)

5 A GALAXY OF MOVIE STARS OF TEMPORAL ACTOR MOVIE NETWORK

This section describes a galaxy of movie stars of the temporal actor-movie network with animation (in order to see the overview), and a visualisation of the network of a specific time slice (in order to see the details).

First we consider a "galaxy of stars" metaphor of the movie-actor network. The main idea is to map the "movie stars" into a movie (i.e. an animation) of a galaxy of stars which displays actor-movie interactions. Representing as much information as possible without introducing overwhelming visual complexity has always been a challenge when visualising large data sets. We define important subgraphs to reduce visual complexity as follows. We define the "stars" from the IMDB as follows:

• every star actor must have been in more than 12 movies over the whole time period
• every star movie must have more than 12 actors
• each star actor must have played in between three and six movies in each year

We again use a bipartite (2-mode) network model. There are two types of nodes: actor nodes and movie nodes. Actor nodes are displayed as stars in the night sky, and edges are displayed as faint lines joining up "constellations" of actors (see Figure 15). Edges with bends are displayed between actor and movie nodes; however, movie nodes are hidden, so that collaboration between actors can easily be seen. In this case, the picture not only reduces the visual complexity (especially for edges), but also represents actor-movie and actor-actor interactions at the same time.
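The three "star" criteria listed earlier in this section can be sketched as a filter that produces the arcs to lay out for one year. This is only one possible reading of the criteria, written in Python for illustration: the function name `star_subgraph`, the `year_of` dictionary and the arc-list format are assumptions, as is the choice to count cast sizes and yearly appearances over all movies rather than over star movies only.

```python
from collections import defaultdict


def star_subgraph(arcs, year_of, year, min_movies=12, min_cast=12, lo=3, hi=6):
    """Return the (actor, movie) arcs of the 'star' subgraph for one year.

    arcs: iterable of (actor, movie) pairs; year_of: dict movie -> year.
    The default thresholds follow the three criteria in the text.
    """
    arcs = set(arcs)                   # treat multiple arcs as single
    movies_of = defaultdict(set)
    cast_of = defaultdict(set)
    for a, m in arcs:
        movies_of[a].add(m)
        cast_of[m].add(a)

    # Criterion 1: more than `min_movies` movies over the whole period.
    star_actors = {a for a, ms in movies_of.items() if len(ms) > min_movies}
    # Criterion 2: more than `min_cast` actors in the movie.
    star_movies = {m for m, cs in cast_of.items() if len(cs) > min_cast}

    # Criterion 3: between `lo` and `hi` movies in the given year.
    in_year = defaultdict(int)
    for a, m in arcs:
        if year_of[m] == year:
            in_year[a] += 1
    active = {a for a in star_actors if lo <= in_year[a] <= hi}

    return {(a, m) for a, m in arcs
            if a in active and m in star_movies and year_of[m] == year}
```

Running the filter once per year yields the sequence of yearly subgraphs; keeping the thresholds as parameters makes it easy to see how strongly each criterion reduces the visual complexity.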
To produce an overview of the temporal network dynamics, we computed a layout for each year from 1907 to 2004 and produced an animation. A two-dimensional force-directed layout was generated for each year’s subgraph using GEOMI [1]. The animation is performed between consecutive layouts, in a similar manner to the animation of the co-starring actors network in the previous section. The animation is available from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05-final.avi

Figure 15: A frame from the galaxy of stars animation

Having obtained an overview of the temporal network from the animation, we now focus on individual years of the network to observe interesting patterns in specific time periods. Figure 16 shows part of the layout for the year 1918. The three actors shown co-starred in five movies together and did not appear in any other movies. Only one of the movies includes actors from outside the group. This kind of pattern is common in the early years.

Figure 16: Actor collaboration pattern in early years.

Figures 17 and 18 show a different pattern. Both are taken from the layout for the year 1983. In Figure 17, nineteen actors co-starred in a masterpiece. In Figure 18, the same group of people starred in a series of movies together, whilst also appearing in other movies with actors from outside the group. Comparing Figure 17 with the pattern of the early years in Figure 16, one may gain some insight into the trends of the movie industry.

Figure 17: Many actors co-starring in one movie.

Figure 18: Same group of people in several movies.

Further insights can be discovered when company attributes are combined in the visualisation, as Figures 19 to 22 show. There are two clusters in 1985. To assist with analysis, we display the movie nodes with their labels. The two clusters are normal movies and adult movies.
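Extracting a single year and looking for separated groups in it can be sketched as below. This is only an illustration: the (actor, movie, year) input format and the function names year_slice and clusters are hypothetical, and connected components are used as a stand-in for the clusters that the paper identifies visually in the layout, which need not be fully disconnected.

```python
from collections import defaultdict

def year_slice(credits, year):
    """Return the actor-movie edges of one year.

    credits: iterable of (actor, movie, year) triples (assumed input format).
    """
    return [(a, m) for a, m, y in credits if y == year]

def clusters(edges):
    """Connected components of a bipartite edge list, largest first (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for actor, movie in edges:
        # Tag nodes by type so an actor and a movie sharing a name stay distinct.
        ra, rm = find(("actor", actor)), find(("movie", movie))
        if ra != rm:
            parent[ra] = rm

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)
```

Running clusters(year_slice(credits, 1985)) on suitable data would return one node set per component; comparing the component sizes across years is one possible way to track how separated the groups are.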
Figures 19 to 22 show some patterns in the evolution. Before the 1990s, these two types of movies were clearly separated, meaning that they were produced by different companies with different actors; that is, the two groups seldom collaborated. Later, however, the two groups started to merge into one big group, as actors started to move between different companies to collaborate. For example, in the year 1994 it is difficult to separate the two groups in the picture. This may indicate a change in the movie industry, as well as in the social network of actors. This visualisation can be a useful supplement to formal analysis methods.

Figure 19: Layout of 1985
Figure 20: Layout of 1988
Figure 21: Layout of 1991
Figure 22: Layout of 1994

6 CONCLUSION

Integrating good analysis methods with proper visualisation methods is an effective approach to gaining insight into large and complex networks. Our next step is to further integrate various analysis methods with visualisation on different data sets. A formal evaluation of the insights and knowledge derived then needs to be carried out. Ultimately, appropriate interaction methods need to be integrated in order to complete our visual analysis framework for large and complex networks.

REFERENCES

[1] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S. Hong, D. Koschützki, C. Murray, N. Nikolov, A. Tarassov, R. Taib and K. Xu, GEOMI: GEometry for Maximum Insight, Proc. of Graph Drawing 2006, pp. 468-479, 2006.
[2] A. Ahmed, T. Dwyer, S. Hong, C. Murray, L. Song and Y. Wu, Visualisation and Analysis of Large and Complex Scale-free Networks, Proc. of EuroVis 2005, pp. 18, 2005.
[3] D. Auber, Y. Chiricota, F. Jourdan and G. Melançon, Multiscale Visualization of Small World Networks, Proc. of InfoVis, pp. 75-81, 2003.
[4] V. Batagelj, Analysis of large networks - Islands, Dagstuhl seminar 03361: Algorithmic Aspects of Large and Complex Networks, 2003.
[5] U. Brandes and T. Erlebach, Network Analysis: Methodological Foundations, Springer, 2005.
[6] U. Brandes, M. Hoefer and C. Pich, Affiliation Dynamics with an Application to Movie-Actor Biographies, Proc. of EuroVis 2006, pp. 179-186, 2006.
[7] Graph Drawing 2005 Competition, http://gd2005.org/
[8] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/
[9] Sunbelt XXVI 2006 Viszard Session.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.