Maple – a Web Map Service for Verbal Visualisation Using Tag Clouds Generated from Map Feature Frequencies

Stefan Hahmann and Dirk Burghardt
Dresden University of Technology, Institute for Cartography, Helmholtzstraße 10, 01069 Dresden, Germany
Email: Stefan.Hahmann@tu-dresden.de, Dirk.Burghardt@tu-dresden.de

Abstract Tag cloud visualisation was introduced in the 1970s. Because it is eye-catching and engaging, it is used in many web applications and has become a very popular visualisation technique. This paper presents an approach that uses this technique in combination with maps. Our method augments cartographic representations with additional verbal content, which is one of the strongest instruments available to cartographers for communicating spatial information. The idea is that only a few words, extracted from the semantics contained in the features of the underlying map, suffice to characterise the map section as a whole. To demonstrate the approach we used the OpenStreetMap dataset. To allow a variety of web map clients to use the results of the method, we implemented the prototype as a Web Map Service (WMS) based on the corresponding Open Geospatial Consortium (OGC) specification.

Keywords: tag cloud, word cloud, web mapping, geovisualisation, knowledge representation, exploration, semantics, Web Map Service, OpenStreetMap

1 Introduction: History of Tag Clouds and Related Work

More than 30 years ago, Milgram and Jodelet (1976) introduced the technique of drawing words at different font sizes to present a visual overview of a text corpus that emphasises certain words. The results of their work on a “collective mental map of Paris” are shown in Figure 1. The method they used may be seen as the origin of “tag cloud visualisation”.

Fig. 1 Introducing tag clouds to cartography: Milgram and Jodelet’s “collective mental map of Paris” (Milgram and Jodelet 1976)

Fig. 2 Web application World Explorer (http://tagmaps.research.yahoo.com/worldexplorer.php): landmarks automatically generated from clustered Flickr photo locations and their assigned tags. Bigger labels indicate bigger cluster sizes

In the age of computer visualisation, this technique has become very popular, especially with the rise of Web 2.0. A general overview of different layout algorithms for the generation of tag clouds is given by Viégas and Wattenberg (2008).

Several researchers have recently published on tag cloud visualisations for cartographic contexts. The approach of Ahern et al. (2007) exploits the public Flickr (http://www.flickr.com) photo collection. Tagged images with assigned GPS coordinates are used to compute clusters of popular tags: regions with a disproportionately high density of photos are identified, and the dominant tag of each such region is calculated. The tag label of each cluster is centred on the centroid of the cluster, and the label size is scaled according to the size of the cluster. Figure 2 shows results of the application World Explorer, which implements this approach. A cartographic base map is overlaid with the computed labels, which results in a map of what we would call popular landmarks. We have chosen this name because the labels are based on the places most frequently photographed by the users of the Flickr photo platform. The approach may be used on multiple scales. Results are similar to those of Milgram’s collective mental map, with a bias towards tourist attractions.

Fig. 3 Visualisation of geospatial and non-geospatial context information using tag cloud visualisations generated from geo-referenced German Wikipedia articles, modified after Paelke et al. (2010)

Fig. 4 The Taggram method: a layout algorithm that adapts the shape of a tag cloud to an arbitrary geometric region (Nguyen and Schumann 2010)

Paelke et al. (2010) use content of geo-referenced Wikipedia articles to represent context information on maps.
They compute tag cloud visualisations from articles that can be located within a specified map section via the coordinates given in the article. Figure 3 shows a result of this work. The benefit of this approach is its potential to show geospatial as well as non-geospatial context information. It can be seen, for example, that the terms “Friedhof” (German for “cemetery”) and “Weltkrieg” (German for “World War”) appear in the same tag cloud and will thus be associated with each other by the user of the application.

Nguyen and Schumann (2010) present a layout algorithm for tag clouds that adapts the shape of the cloud to an arbitrary geometric region. Figure 4 shows a result of this so-called Taggram method.

2 The Approach: Using the OGC WMS Standard for On-Map Word Cloud Visualisation

The objective of the approach we present in this paper is to verbally visualise the main semantic information contained within a map. For this purpose we use the word cloud technique as an information analysis method in combination with maps. The method adds verbal content to the map, as this is one of the most powerful communication resources available to cartographers.

Fig. 5 Section of the OpenStreetMap database schema: Node, Way and Relation represent geometric entities that can be annotated with zero to n Tag entities specifying their semantics

Fig. 6 A flowchart of the process that generates the word cloud layer

Fig. 7 A word cloud highlighting frequently used terms within the titles of the presentations held at ICC 2009

In the introductory section we described methods for mapping landmarks and context using tag visualisation techniques. In this section we shift the focus towards the visualisation of semantic information. The approach is to analyse the frequency of the semantics of the map features contained in a map. These frequencies are shown on a map using word clouds.
The idea is that the most frequent semantic information within a specified area is well suited to characterising the semantics of a cartographic dataset within this area as a whole. To the extent that the cartographic model describes the real world, the approach can verbally visualise the characteristics of the real world within this area.

For the demonstration of our approach we use the OpenStreetMap (OSM) dataset. Figure 5 shows the section of the OSM database schema that models geometric objects and their semantics. The OSM tags are, in effect, the semantics of the dataset. Each OSM tag consists of a key-value pair and specifies a map feature. Tags are linked to geometric objects, and each geometric object can carry any number of tags that specify its meaning. Tags can refer to points (table node), lines and polygons (table way) or complex objects (table relation).

Figure 6 shows a flowchart of the implementation of the application that overlays an OSM base map with a word cloud processed for the current map extent. OpenLayers serves as the WMS client and the OSM Mapnik layer as the base map. The map client sends the server a GetMap request that conforms to the Open Geospatial Consortium Web Map Service specification (Open GIS Consortium 2001). The server queries a mirrored OSM database, which yields a list of tags and tag frequencies within the bounding box of the GetMap request. For point objects, the frequency of a tag increases by one with every occurrence of the tag on a point object. For line and polygon objects, the frequency of a tag rises with the number of vertices of the line and polygon objects associated with this tag. OpenLayers, which we used as a WMS client, allows switching between different layers.
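The counting rule for point, line and polygon objects given above can be sketched as follows. This is only an illustration under stated assumptions: the `Feature` type, the function `tag_value_frequencies` and the sample data are hypothetical and operate on in-memory objects, whereas the prototype obtains its frequencies by querying a mirrored OSM database. It is also assumed here that only vertices inside the requested bounding box are counted.

```python
from collections import Counter
from typing import Iterable, Mapping, NamedTuple


class Feature(NamedTuple):
    """Hypothetical in-memory stand-in for an OSM node or way."""
    kind: str                 # "node" (point) or "way" (line or polygon)
    tags: Mapping[str, str]   # OSM key-value pairs attached to the object
    vertex_count: int = 1     # assumed: vertices inside the bounding box


def tag_value_frequencies(features: Iterable[Feature], key: str) -> Counter:
    """Count the values of `key`: +1 per node, +vertex_count per way."""
    freq: Counter = Counter()
    for feature in features:
        value = feature.tags.get(key)
        if value is None:
            continue  # the object does not carry the requested key
        weight = 1 if feature.kind == "node" else feature.vertex_count
        freq[value] += weight
    return freq


# Illustrative sample data, not taken from the OSM database.
features = [
    Feature("node", {"amenity": "pub"}),
    Feature("node", {"amenity": "restaurant"}),
    Feature("node", {"amenity": "pub"}),
    Feature("way", {"amenity": "parking"}, vertex_count=5),
    Feature("way", {"highway": "residential"}, vertex_count=12),
]

print(tag_value_frequencies(features, "amenity").most_common())
# [('parking', 5), ('pub', 2), ('restaurant', 1)]
```

In this sample a single parking area with five vertices outweighs two pubs mapped as points, which shows how strongly the vertex-based weight can influence the resulting frequencies.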
In the communication with the WMS server, switching between layers is realised through the LAYERS parameter of the GetMap request, which allows the client to specify which keys and/or values are presented in the overlaid word cloud. As some keys such as “created_by” and “address” occur with disproportionately high frequency, we added the possibility of removing certain tags from the tag frequency list. The filtered list is then processed by the word cloud layout software. The resulting image is sent back to the client as the GetMap response.

The algorithm used to lay out the word clouds was described by Viégas et al. (2009). An implementation is available as an executable version (available at: http://www.alphaworks.ibm.com/tech/wordcloud/download) of the engine that drives the popular word cloud visualisation website Wordle (Wordle – Beautiful Word Clouds, available at: http://www.wordle.net). Figure 7 shows a word cloud visualisation of the titles of the presentations held at ICC 2009. Maple, the name of the application, is an adaptation of the name Wordle to the context of cartographic maps.

3 Resulting Maps and Discussion

Figures 8, 9, 10 and 11 show the results of the implementation. Figure 8 presents a map and an overlaid word cloud computed for the area of the “Neustadt” district of the German city of Dresden. It shows the frequency of tag values associated with map features having the key “amenity”. That Dresden Neustadt is a nightlife district can be deduced quickly even by a map user who does not know the area, because “pub” and “restaurant” appear in a large font size. Frequencies of values of map features with the key “highway”, which includes all types of streets and footways, are shown in Figure 9 for an area in southern Dresden.
As the tags “footway” and “residential” appear very large, it is evident that this OSM section provides a high level of detail, even for public footpaths, as it represents a mostly residential area.

Figure 10 contains a word cloud processed from the frequencies of the keys within the centre of the German city of Leipzig. It illustrates why the OpenStreetMap project deserves to have “StreetMap” as part of its name: the tags “highway” and “name”, which normally co-occur on street features, are the most prominent keys within this cloud. Note that the keys “created_by” and “address” have been filtered out. The tag “railway” is also large, as Leipzig’s main station lies within this map extent.

Figure 11 shows a map overlaid with the names of the OSM editors that have been used within this area. It is remarkable that there are considerable differences between regions, as local OSM mapping communities seem to prefer different OSM editors. These results may be of interest to the mapping community, especially for the development and documentation of OSM editing tools.

The tag cloud visualisation method allows the analysis of both object types and object values. Figure 10 is an example of object type visualisation, and Figures 8, 9 and 11 are examples of object value visualisation.

Table 1 shows typical computing times for word cloud processing at different scales within an area with a high density of OSM objects. The test environment was a machine equipped with an AMD Dual Core Opteron 2.6 GHz processor and 1 GB RAM. Times for processing the word cloud from the tag frequency list are nearly scale-independent, whereas times for database queries increase steeply with decreasing scale, rising from about 0 s at 1:4500 to 95 s at 1:36000. For the prototype, a copy of the part of the OSM database that covers the area of Germany was used. This currently comprises just 5% of all data in the database.
However, there are still about 40 million entities in table ‘nodes’, 5 million entities in table ‘ways’ and 80,000 entities in table ‘relation’ that need to be queried. Additionally, there are about 8 million entities in table ‘node_tags’, 14 million entities in table ‘way_tags’ and 300,000 entities in table ‘relation_tags’, which need to be analysed for every word cloud request.

Fig. 8 OSM Mapnik base map and an overlaid word cloud computed from the values of the key “amenity” in the area of Dresden Neustadt

Fig. 9 OSM Mapnik base map and an overlaid word cloud computed from the values of the key “highway” in the area of southern Dresden

Fig. 10 OSM Mapnik base map and an overlaid word cloud computed from the OSM keys in the centre of the city of Leipzig. Keys “created_by” and “address” are filtered

Fig. 11 OSM Mapnik base map and an overlaid word cloud computed from the values of the key “created_by”, which indicates the tool that was used to edit data, in the centre of Leipzig

Table 1 Computational time for word cloud processing at different scales, test environment: AMD Dual Core Opteron 2.6 GHz, 1 GB RAM

Scale     Area (km²)   Overall computing time (sec)   Database query time (sec)   Cloud processing time (sec)
1:36000   20           103                            95                          8
1:18000   5            37                             29                          8
1:9000    1.25         10                             3                           7
1:4500    0.32         7                              0                           7
1:2250    0.08         7                              0                           7

Word clouds are intuitively perceptible and by their nature do not suffer from the labelling problem of bar charts, tree maps or bubble charts. Furthermore, they are able to present the gist of a word corpus. Drawbacks are that long words, as well as words with many ascenders and descenders, receive undue attention and that exact values cannot be read. The layout algorithm of a word cloud is more sophisticated than that of a tag cloud, as it uses the typographical whitespace more efficiently.

The big advantage of using a standardised WMS implementation is that a multitude of existing WMS clients can directly integrate the results.
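To illustrate what a generic client would send to such a service, the following is a minimal sketch of how a GetMap request URL could be assembled. It assumes the parameter names of WMS version 1.1.1; the endpoint URL, the layer name "amenity", the bounding box and the image size are hypothetical placeholders and do not describe the actual configuration of the prototype.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height, transparent=True):
    """Assemble a WMS 1.1.1 GetMap URL (parameter set assumed, see above)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,           # selects which keys/values feed the cloud
        "STYLES": "",              # default style
        "SRS": "EPSG:4326",        # assumed reference system (lon/lat)
        "BBOX": ",".join(str(c) for c in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and an approximate extent around Dresden Neustadt.
url = build_getmap_url(
    "http://example.org/maple/wms",
    layer="amenity",
    bbox=(13.73, 51.06, 13.77, 51.08),
    width=512,
    height=512,
)
print(url)
```

Because the request consists only of standard parameters, any client that can issue GetMap requests could in principle display the word cloud layer without knowing how the image was produced.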
Our implementation also supports the TRANSPARENT parameter of the GetMap request, so that an overlay that does not completely hide the examined map is possible.

A disadvantage of displaying the word cloud directly on a map is that map readers are used to associating text shown on a map with the situation directly underneath it. In the case of an overlaid word cloud that describes the whole map section, this can lead to misinterpretation. Hence, the method may be more useful when displaying just one district, whereas for a whole town it might not provide significant insights. The issue could be solved if the word cloud were not overlaid directly on the map but shown in a separate area of the application.

Using the tag frequency in relation to vertices within a map area underestimates the relevance of large objects having few vertices and overestimates the relevance of small objects having many vertices. This affects lines and polygons that occupy much map space with few vertices and, conversely, lines and polygons that occupy little map space with many vertices. This bias is relevant where the overlaid word clouds are intended to visualise the main semantic information of a map section, as in Figures 8 and 9. The estimation of relevant tags can be improved if, instead of tag frequencies, the lengths of lines and the areas of polygons associated with specific tags are used as weights. For use cases in which we only want to present statistics of a dataset, as in Figures 10 and 11, the vertex-related frequency estimation is sufficient.

4 Conclusions and Future Work

We have presented a method that conveys the main semantic information contained in a map section by using a word cloud visualisation technique that displays map feature frequencies on a map.
Up to a certain degree this technique is able to verbally characterise the real-world environment presented in a specific map section. Our demonstration is based on the OSM dataset, but the approach is also applicable to other cartographic databases, e.g. data provided by national mapping agencies. Even datasets that are not primarily cartographic, such as Twitter, could be analysed.

Future work needs to address query performance, especially for the scenario of a huge global OSM dataset and queries on mid- and small-scale map extents. Furthermore, verbal descriptions of the meaning of the OSM tags could be taken from the OSM wiki website to produce more vernacular word clouds. Field (2010) stresses shortcomings in the algorithm of word cloud creation and recommends the use of a more sophisticated layout method. He argues that the relative position and different colours of single words in the word cloud could be used to group words according to certain attribute dimensions inherent in the data. Last but not least, an empirical study needs to be carried out to determine whether map users are able to interpret word clouds overlaid on maps and benefit from this additional information. A study conducted by Lohmann et al. (2009), which compares different tag cloud layouts with a focus on human task-related performance, could serve as a starting point.

References

Ahern S, Naaman M, Nair R, Yang J (2007) World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections. 7th ACM/IEEE-CS Joint Conference on Digital Libraries, Vancouver. ACM Press, New York. pp. 1–10

Field K (2010) Cartographically Wordy but not Necessarily Worthy. The Cartographic Journal 47(3): 195–197

Lohmann S, Ziegler J, Tetzlaff L (2009) Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration. Gross T et al. (eds.): Human-Computer Interaction – INTERACT 2009, LNCS 5726, Springer, Berlin / Heidelberg. pp. 392–404

Milgram S, Jodelet D (1976) Psychological maps of Paris. Proshansky HM, Ittelson WH, Rivlin LG (eds.): Environmental psychology, Second Edition, Holt, Rinehart and Winston, New York. pp. 104–124

Nguyen DQ, Schumann H (2010) Taggram: Exploring Geo-Data on Maps through a Tag Cloud-based Visualization. Information Visualisation (IV), 2010 14th International Conference, London. IEEE Computer Society, Los Alamitos. pp. 322–328

Open GIS Consortium (2001) Web Map Service Implementation Specification (Tech. Rep. OGC 01-068r2), Wayland, MA, USA

Paelke V, Dahinden T, Eggert D, Mondzech J (2010) Location Based Context Awareness Through Tag-Cloud Visualizations. Joint International Conference on Theory, Data Handling and Modelling in GeoSpatial Information Science, Hong Kong. pp. 290–295

Viégas FB, Wattenberg M (2008) Tag Clouds and the Case for Vernacular Visualization. Interactions 15(4): 49–52

Viégas FB, Wattenberg M, Feinberg J (2009) Participatory Visualization with Wordle. IEEE Transactions on Visualization and Computer Graphics 15(6): 1137–1144

Copyright of Advances in Cartography and GIScience, Volume 1 is the property of Springer-Verlag and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder’s express written permission. However, users may print, download, or email articles for individual use.