Methodological approach to a massive destination content analysis of travel blogs and reviews
Dr. Estela Mariné-Roig
Dr. Salvador Anton Clavé
7th World Conference for Graduate Research in Tourism, Hospitality and Leisure
Istanbul, 3-8 June 2014
Department of Geography

Content
1 Aims
2 Travel blogs and reviews
3 Methodology
  Case Study
  Database
  Content analysis
4 Concluding remarks

1 Aims
Propose a methodology to conduct a massive computerized quantitative content analysis of travel blogs and reviews concerning a specific destination.
This should help researchers unveil multiple aspects of a destination's online image as expressed by tourists.

2 Travel blogs and reviews
The Internet has become the main channel for seeking and disseminating information, notably in travel and tourism.
Knowing what tourists say in Web 2.0 user-generated content (UGC) is of major importance for destinations.
Within Travel 2.0, blogs and reviews of travel experiences have rapidly expanded and become popular.
Great amounts of information.

Travel blogs and reviews have great potential as rich and meaningful information sources for destinations:
  They are frequently updated, ordered and classified geographically and chronologically.
  They give insights into destination image and tourists' perceptions.
“Future research needs to explore other frameworks that will be appropriate in maximizing the usefulness of travel blogs to the academe and the industry” (Pan et al., 2007)

3 Methodology – Case Study
Catalonia
  First-order world tourism destination.
  Second top tourist region in the EU-27.
  2013: 15.6 million foreign tourists.
  Barcelona is among the top European tourist capitals.
  9 regional tourist brands.

3 Methodology – Database 1.
Data source selection: specialized travel blog and review hosting websites
Websites need to be chosen objectively in relation to the case study (Catalonia):
  Check former works, bibliographical sources, subject guides, blog search engines, and standard search and metasearch engines using keywords.
  Use a selection criterion: more than 100 entries about the case study.
Selected websites: GetJealous.com, MyTripJournal.com, StaTravel.com, TravelBlog.org, TravelJournals.net, TravellersPoint.com, TravelPod.com, IgoUgo.com, TripAdvisor.com, TravBuddy.com, VirtualTourist.com

3 Methodology – Database 2. Data collection and download:
Most studies gather very small samples of blogs and reviews because of the difficulties of study.
Massive quantitative analyses are needed because of the great volume of information online.
All relevant blogs and reviews about the case study should be downloaded through web copiers.
Manual exploration is used to see the web structure and locate the HTML files related to the case study.
More than 100,000 files were retrieved in the case of Catalonia.
  TP. http://www.travelpod.com/blogs/0/State/destination.html
  TB. http://www.TravelBlog.org/Europe/Spain/Catalonia/
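As a rough illustration of locating the pages that belong to the case study, the sketch below keeps only URLs that fall under a destination prefix such as the TravelBlog.org Catalonia path given in the slides. The function name `is_case_study_url` and the sample URLs are hypothetical. The slides do not say how the web copier was configured, so this is only one possible way to express such a filter.

```python
from urllib.parse import urlsplit

# Destination prefix taken from the slides; everything else is illustrative.
CASE_STUDY_PREFIX = "http://www.TravelBlog.org/Europe/Spain/Catalonia/"


def is_case_study_url(url: str, prefix: str = CASE_STUDY_PREFIX) -> bool:
    """Return True if `url` is on the same host and under the same path as `prefix`."""
    target, base = urlsplit(url), urlsplit(prefix)
    # Host names are case-insensitive; paths are compared as written.
    return (target.netloc.lower() == base.netloc.lower()
            and target.path.startswith(base.path))


# Hypothetical URLs, invented for this example.
sample_urls = [
    "http://www.travelblog.org/Europe/Spain/Catalonia/Barcelona/blog-1.html",
    "http://www.travelblog.org/Europe/Spain/Andalusia/Seville/blog-2.html",
]
case_study_urls = [u for u in sample_urls if is_case_study_url(u)]
print(case_study_urls)
```

A prefix test like this only works for sites whose URLs encode the destination hierarchy. For other sites the relevant pages would have to be identified differently, for example through the manual exploration the slides mention.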
Data collection and download: travel blog and review database

Domain (acronym)          Barcelona  Other towns  Unclassified  Empty        First entry
GetJealous.com (GJ)               0            0         1,164  371 (1)      2001-08-27
IgoUgo.com (IO)               1,073           71             0  -            2000-06-06
MyTripJournal.com (MT)          536           72             0  -            2001-07-25
StaTravel.com (ST)              243           12             0  -            2005-05-30
TravBuddy.com (TY)            1,066           80             0  11 (2)       1985-05-20
TravelBlog.org (TB)           2,348          280           106  -            1997-03-07
TravelJournals.net (TJ)         115            4             0  -            2002-08-01
TravellersPoint.com (TS)          0            0           596  -            1986-05-09
TravelPod.com (TP)              998          481             0  -            1984-12-27
TripAdvisor.com (TA)         67,882       34,519            43  112,698 (3)  2002-10-17
VirtualTourist.com (VT)      10,289        2,192           285  515 (3)      1999-12-08

(1) "This site has now expired ..."; (2) "Sorry, X has not created any entries ..."; (3) The writing body is empty

Travel blogs and reviews per tourism brands

3 Methodology – Database 3. Data arrangement, cleaning and debugging:
Arrangement: structure of folders and files
  root\website\brand\town\entrydate_pagename[_ending].htm
Cleaning: online sources are full of “noise” (Carson, 2008)
  Character encoding problems
  Needless content: identified with WYSIWYG interface and erased using a mass removal utility
  Non-significant words
  Before: 21 KB. After: 3 KB.
  Sample of removed HTML directives:
    <div id="header"> ... </div>
    <div class='blog_breadcrumbs'> ... </div>
    <div class='blognav'> ... </div>
    <div class='ads_leader'>...</div>
    <div id="footer"> ... </div>
Debugging: preliminary word frequency count and identification of misspelled keywords. Especially common in non-English speaking destinations.

Correct noun    Misspellings
Barcelona       Bathelona, Barcellona, Barthelonaaaa, Bar-th-elona, Bar-tha-lona, Bar-the-lona ...
Casa Batlló     Batllo House; Casa Batillo, Batilló, Batlla, Batllao, Batllò, Bátllo, Batlo, Battllo, Battló ...
Antoni Gaudí    Antonio Gaudi; Gaudì, Gaüdi, Gaudie, Gaudii, Goudi, Goudí, Guadi, Gualdi, Gudi ...
Barri Gòtic     Barri Gotico; Bari Gotic; Ghotic Barrio, District, Quarter; Gotic area, neighborhood ...
Parc Güell      Parc Guël, Güel, Guéll, Guelle; Park Gueil, Guel, Güelle, Guelli; Parque Guelle, Güelle ...
Montjuïc        Monjuic, Montjeuic, Montjic, Montjouïc, Montjuîc, Montjuich, Montjuiic, Montjuik ...
!!!             More than 100 ways of misspelling “Sagrada Familia”

3 Methodology – Database 4. Language detection, data mining and dissemination:
Language detection: before content analysis, the language of entries should be detected (naive Bayes classifier)
  Figure: Language of posts
Data mining: extraction of
  Blog titles
  Bloggers’ hometown
  Country of origin of bloggers and reviewers
Dissemination:
  Visibility
  Indexed pages
  Presence in the social media
  Usage
  Geographical distribution of users
  Link-based ranks
  Visit-based ranks

Domain (acronym)                Bing      Google
GetJealous.com (GJ)           20,700     544,000
IgoUgo.com (IO)              221,000   1,470,000
MyTripJournal.com (MT)        12,100     270,000
StaTravel.com (ST)             9,430      33,900
TravBuddy.com (TY)            83,700     194,000
TravelBlog.org (TB)          256,000     888,000
TravelJournals.net (TJ)      119,000   1,400,000
TravellersPoint.com (TS)     175,000     594,000
TravelPod.com (TP)           667,000   9,260,000
TripAdvisor.com (TA)       8,260,000  85,500,000
VirtualTourist.com (VT)    2,100,000   6,870,000

3 Methodology – Content analysis
Researchers are still trying to ascertain the ‘what’ and ‘how’ of analysing travel blogs (Banyai & Glover, 2011)
Content analysis: most suitable technique to conduct massive analyses of blogs and reviews
What makes this technique particularly rich and meaningful is its reliance on coding and categorizing of data (Stemler, 2001).
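The coding-and-categorizing idea can be sketched in a few lines: each category is a group of keywords, and every entry is scored by how often those keywords occur. The category names, keywords and sample texts below are hypothetical. The "density" computed here is simply the category count divided by the total number of words, which is an assumption for illustration and not necessarily the definition used by any particular software.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical word groups; a real study would derive these from its category system.
CATEGORIES: Dict[str, List[str]] = {
    "GROUP A": ["gaudí", "cathedral", "museum"],
    "GROUP B": ["beach", "sun"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def category_counts(text: str, categories: Dict[str, List[str]]) -> Dict[str, int]:
    """Count, for one entry, how many tokens belong to each category."""
    counts = Counter(tokenize(text))
    return {name: sum(counts[w] for w in words) for name, words in categories.items()}


def category_matrix(docs: Dict[str, str],
                    categories: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Build the entry-by-category matrix (one row per file)."""
    return {doc_id: category_counts(text, categories) for doc_id, text in docs.items()}


def category_totals(docs: Dict[str, str],
                    categories: Dict[str, List[str]]) -> Dict[str, Tuple[int, float]]:
    """Return (count, density) per category over the whole collection.

    Density is count / total tokens, an assumed definition for this sketch.
    """
    total_words = sum(len(tokenize(text)) for text in docs.values())
    matrix = category_matrix(docs, categories)
    totals = {}
    for name in categories:
        count = sum(row[name] for row in matrix.values())
        totals[name] = (count, count / total_words if total_words else 0.0)
    return totals


# Invented sample entries.
docs = {
    "blog1": "The beach and the Gaudí museum",
    "blog2": "Sun, sun and beach",
}
matrix = category_matrix(docs, CATEGORIES)
totals = category_totals(docs, CATEGORIES)
print(matrix)
print(totals)
```

Keeping both a per-file matrix and collection-wide totals makes it possible to compare individual entries as well as to describe the corpus as a whole.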
3 Methodology – Content analysis
Receptacle: Text
Approach: Quantitative
Interpretation: Thematic
Categories of analysis:
  Geography: brand regions & region
  Attraction factors
  Feelings and dichotomies
  Cultural identity references
Measuring system: Frequency counts
Software: Site Content Analyzer
  Other software: Java utility to process strings

With this process, data should be organized in two different ways so that different measures can be applied:

Word groups of categories, with reference to the total database
Group or category  Count  Site-Wide Density  Average Weight
Word_a1            ...    ...                ...
Word_a2            ...    ...                ...
Word_a3            ...    ...                ...
Word_a4            ...    ...                ...
GROUP A            ...    ...                ...
Word_b1            ...    ...                ...
Word_b2            ...    ...                ...
GROUP B            ...    ...                ...

Matrix with content categories, file per file
          CATEGORY 1  CATEGORY 2  CATEGORY 3  CATEGORY 4
T-BLOG 1  XXX         XXX         XXX         XXX
T-BLOG 2  XXX         XXX         XXX         XXX
T-BLOG 3  XXX         XXX         XXX         XXX
T-BLOG 4  XXX         XXX         XXX         XXX
…         …           …           …           …

Examples of measures applied to this database:
  Most frequent words
  Study of outstanding elements
  Descriptive statistics and P-correlation
  Cluster analysis
  Spatial indexes

4 Concluding remarks
Objective method to select the most relevant data sources for the case study and establishment of a selection criterion according to research goals.
It includes the analysis of websites’ image dissemination to assess the capacity of the targeted information sources to disseminate the information they convey.
Massive analysis of data: all travel blogs and reviews about our case study on the websites fulfilling the criterion.
Key: creation of a database. Download web pages and entries to the PC and arrange them into a structure of folders and files. Data cleaning and debugging, language detection, data mining.
Quantitative content analysis performed on online texts, based on word counts or frequencies and word grouping into categories, proved to be a useful and appropriate method of analysis to shed light on the projected and perceived images of a destination.
Computerized content analysis through Site Content Analyzer and other software is suitable for dealing with quantitative data and large data sets.
The category system makes it possible to look deeper into certain complex aspects, such as cultural identity and the spatial distribution of tourist image.
The methodological framework could be used for other studies targeting different destinations and different types of online media and tourist websites.
It contributes to the preparation of data and the systematization of procedures.

Thanks for your attention!
Estela Marine-Roig (estela.marine@urv.cat)
Salvador Anton Clavé (salvador.anton@urv.cat)
www.globaltur.org
Acknowledgement: The research that this paper is based on was financed by the Spanish Ministry of Science and Innovation (CSO2011-23004/GEOG).