MaRS Market Insights

The Data Visualizers
Using data visualizations to uncover the true meaning behind a data set

Content Lead and Market Analyst: Neha Khera, MaRS Market Intelligence

Partner & Advisor:

Acknowledgements: We thank the following individuals and organizations for their participation in this report:

Dr. Kamran Khan, CEO and Founder, Bio.Diaspora
Nick Edouard, EVP Business Development & Marketing, BuzzData
Nadia Amoroso, CEO and Co-Founder, DataAppeal
Haim Sechter, COO, DataAppeal
Niall Wallace, CEO and Founder, Infonaut
Lisa Zhang, Co-Founder, Polychart
Faizal Karmali, Director and Co-Founder, Quinzee
Sam Molyneux, CEO and Co-Founder, Sciencescape
Eugene Woo, Founder, Venngage

Disclaimer: The information provided in this report is presented in summary form, is general in nature, current only as of the date of publication and is provided for informational purposes only. Specific advice should be sought from a qualified legal or other appropriate professional.

MaRS Discovery District, © October 2012

Table of Contents

Data Visualization Market
Data, data and more data. What’s all the hype about?
FIGURE 1: Amount of data created daily
What is enabling the big data hype
Extracting value from big data
FIGURE 2: The Digital Intelligence Architecture
The rise of data visualization
FIGURE 4: Search and news reference volume for the word “infographic” on Google
Data visualization tools
The challenge with data visualizations
Investment in the data visualization space
Noteworthy applications of data visualization
Understanding census data
FIGURE 5: GTA Population Change by Municipality 1996-2001
FIGURE 6: GTA Population Change by Municipality 2006-2011
Tracking disease
FIGURE 7: John Snow’s cholera map
FIGURE 8: Map reflecting Sault Ste. Marie mosquito trapping efforts
Improving healthcare
FIGURE 9: Hospital 30-day overall readmission rates by Ontario region, 2009-2010
Supporting decision-making
FIGURE 10: Edward Tufte’s figure on the 1986 Challenger Space Shuttle launch decision
Driving transparency
FIGURE 11: Energy being supplied by renewable sources for US residents
Groups supporting data visualization
Looking ahead
References
Appendix A: The Data Visualizers
Bio.Diaspora
BuzzData
DataAppeal
Infonaut
Polychart
Quinzee
Venngage

Data Visualization Market

Data visualization is not a new concept. It has been used for centuries to distill and communicate information. Think about all the maps, graphs and charts in existence, and the popularity of this form of data analysis quickly becomes clear. However, with advancements in technology, data visualizations are taking on more complex forms than ever before. They are being used to unravel the meaning behind big data sets that would otherwise be too difficult to understand. Highlighted in this piece are eight Ontario-based startups whose innovative applications are shaping the future of data visualization.

Data, data and more data. What’s all the hype about?

To understand the importance of data visualization, let’s take a step back and look at the impact of data on today’s economy. It has been said that we are living through the Industrial Revolution of data: an era in which so much data is produced daily, by people and machines alike, that we no longer have the capacity to store it all. From the billions of mobile phones to the trillions of RFID sensors, we live in a world where our every action and reaction is being captured and stored. And while it may seem eerily intrusive, the capturing of data has the potential to drastically improve the world in which we live.
This is the rise of what’s known as “big data.” The term “big data” was coined to describe data sets whose size and complexity are beyond the ability of typical database software tools to capture, store, manage and analyze.1 This definition is intentionally subjective and is not meant to limit “big” data sets to a certain number of terabytes.1 Just how big a phenomenon big data actually is was eloquently captured in a remark by Google’s Eric Schmidt, who pointed out that we now create as much information every two days as we did from the dawn of civilization up until 2003. On a daily basis, this translates into around 2.5 exabytes of data.2

FIGURE 1: Amount of data created daily

With each coming year, the vastness of the data generated will only intensify. For example, the Square Kilometre Array (SKA) Telescope, the world’s largest telescope, is projected to generate in excess of one exabyte of data per day when it goes live in 2024.3 This is roughly twice the amount of data generated every day on the World Wide Web.3 IBM is working feverishly to develop a supercomputer powerful enough to handle this amount of information.

Big data can and will impact every nation, industry, company and individual around the globe, whether in terms of understanding our galaxy, optimizing healthcare, selecting an ideal retail location or finding the perfect date. A study by the McKinsey Global Institute estimates that big data could add $300 billion of value to the US healthcare system and increase retailers’ operating margins by as much as 60%.1 There is no doubt that those who collect, analyze and act on their data successfully will gain a competitive advantage in their market.

What is enabling the big data hype

The rise of big data springs from two main factors:

1. The increased generation of information.
2. The ability to store this information.

Both of these factors are tied to advancements in technology.
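To put these volumes in perspective, here is a quick back-of-envelope conversion. It is a sketch using only the figures quoted above (roughly 2.5 exabytes created per day, and one exabyte per day projected for the SKA) and assumes the decimal definition of an exabyte, 10^18 bytes.

```python
# Rough scale of daily data creation, using the figures quoted in the text.
# Assumes the decimal (SI) definition: 1 exabyte = 10**18 bytes.

EXABYTE = 10**18
SECONDS_PER_DAY = 86_400

daily_bytes = 2.5 * EXABYTE        # ~2.5 EB of data created per day
per_second_tb = daily_bytes / SECONDS_PER_DAY / 10**12  # terabytes per second

ska_daily_bytes = 1 * EXABYTE      # projected SKA output per day

print(f"~{per_second_tb:.0f} TB of data created every second")
print(f"SKA alone would equal {ska_daily_bytes / daily_bytes:.0%} of today's daily total")
```

In other words, on these figures humanity creates on the order of tens of terabytes every second, and a single instrument like the SKA would add a volume comparable to a large fraction of that total.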
Social media applications have generated huge amounts of sentiment online, where the beliefs, activities and interests of billions of people are being captured in a way never before possible. Mobile devices are used by over six billion people today, of which nearly five billion are in developing countries.4 These devices are capturing data in regions where information was previously difficult to extract. And through the rise of networked sensor technologies such as RFID (radio-frequency identification) tags, more than 30 million articles are being tracked across the transportation, industrial and retail sectors.1

As Moore’s law continues to prevail, we also now have the ability to store all this data that is being generated, and storing vast amounts of data is financially accessible to many. Today, the entire world’s music can be stored on a device that costs less than $600.1 As recently as the turn of this century, storing an average music playlist of 7,000 songs would have cost $500.

Extracting value from big data

The creation and capture of data by itself does not, obviously, benefit anyone; only when analysis is added to the mix is the value of big data unlocked. Unfortunately, this is also an area where significant challenges exist. Big data analysis remains a market in its infancy. As Google’s Chief Economist Hal Varian put it, “Data are widely available; what is scarce is the ability to extract wisdom from them.”5

FIGURE 2: The Digital Intelligence Architecture6

Big data analysis is often hindered by the sheer cost of purchasing tools that can process large volumes of information. Another impediment is the inability to process information quickly enough to extract insights in real time. Waiting two days or two weeks for reports is becoming unacceptable given the fast pace of digital interactions. Perhaps the biggest obstacle, however, is the lack of talent and expertise in the data science field.
The McKinsey Global Institute gauges that by 2018, more than half of all big data jobs, nearly 200,000 of them, will go unfilled because skilled candidates will be in short supply.7 However, as we turn our attention to the field of data visualization, one form of data analysis, we start to see many of these roadblocks disappear. The power of data visualizations lies in their ability to transform the most complex of data sets into a rendering that even novice users can interpret. And through technology innovation, data visualization tools have become increasingly easy to adopt, with intuitive user interfaces and cloud-based access.

The rise of data visualization

At its core, data visualization is the use of abstract, non-representational pictures to show numbers.8 It can include points, lines, symbols, words, shading and colour.8 Data visualizations make it easier to spot trends and patterns amid large amounts of information. They also make it possible for data to tell a story. Just as experts in the field of communication propose the use of stories to better convey information verbally, the same holds true when conveying information through data. And one of the best ways to tell a data story is to use a compelling visual. As industry-renowned data visualization expert Edward Tufte once said about the traditional rows and columns of data tables, “The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to represent the rich visual world of experience and measurement on mere flatland?”9

Data illustration techniques have been in use since as early as 6200 BC, when the oldest known map was drawn. However, it was not until the eighteenth century that data visualizations went beyond mapping and more abstract measures were introduced, including the ever-popular pie and bar charts.
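Tufte’s definition above, numbers shown through abstract marks such as points, lines and shading, can be illustrated with a minimal sketch. The quarterly figures below are hypothetical; the point is simply that mapping each number to the length of a bar makes the pattern visible at a glance.

```python
# Render a tiny data set as a text bar chart: each value is encoded as a
# bar whose length is proportional to the number it represents.
# The sales figures here are hypothetical, for illustration only.

sales = {"Q1": 12, "Q2": 30, "Q3": 21, "Q4": 45}

width = 40                      # length of the longest bar, in characters
peak = max(sales.values())

lines = []
for label, value in sales.items():
    bar = "#" * round(value / peak * width)
    lines.append(f"{label} | {bar} {value}")

print("\n".join(lines))
```

Even this crude rendering shows immediately that Q4 dwarfs Q1, a comparison a reader would otherwise have to compute mentally from a table of numbers.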
The nineteenth century saw the creation of what many have argued to be the world’s best data visualization: Charles Joseph Minard’s 1869 graphic Napoleon’s March, which depicts the movement and losses of Napoleon’s army as it invaded Russia in 1812. After 1975 came the most rapid advancements in data visualization, stemming from the development of software and computer systems. Data visualizations moved beyond pie and bar charts, and more complex formats began to appear and aid us in processing information. For example, through the use of mind maps, our thought patterns can now be visually organized. Apps like Flipboard and Newsmap have completely reinvented the display of news, while tag clouds have provided another way to discover and search for information. And through network graphs, we can now uncover the connectivity between any number of entities, be they our own social circles, groups of companies or globally dispersed cities. Moreover, visualizations no longer adhere to a static format: they can be interactive in nature. This allows a user to drill down on certain data points, or manipulate and change views of the information to reach deeper insights.

Infographics are another popular visualization form. Their growth since 2009 has come with the rise of content marketing, which involves the creation and sharing of content in order to engage with customers.10 Brands and advertisers frequently use infographics as a form of content, as they provide both interesting insights and visual appeal, and are easy for users to share on the web.

FIGURE 4: Search and news reference volume for the word “infographic” on Google

Data visualization tools

Until about 2007, Microsoft Excel was the de facto standard for developing visualizations, whether they were pivot tables or simple graphs.
When analyzing larger data sets or looking for more complex visualizations, knowledge workers would often have to tap into their company’s own business intelligence (BI) unit to access highly skilled data scientists and analysts. Since 2007, however, a new breed of visualization tools has emerged, characterized by simplicity and ease of use. These tools enable non-technical workers to bypass their BI units and model data themselves. This is the rise of what Gartner touts as “data visualization applications,” an industry Gartner predicts will reach $1 billion as early as 2013.11 Tableau Software is one of the fastest-growing data visualization applications on the market today and is in use by over 9,000 organizations around the globe. Tableau’s success is a testament to the rise of the data visualization market, which research firms Gartner and IDC predict won’t slow down any time soon.

The challenge with data visualizations

With the advent of these innovative tools, creating a visualization of a data set is no longer difficult. What remains difficult, however, is creating a good visualization. If we break down the field of data illustration, we see that it is essentially the coming together of two contrasting fields of study: art and science. It requires the harmonious work of both the left and right brain, where the most complex of data sets can be gathered, refined and then organized in a simple yet compelling way. Finding this type of expertise is no easy feat, unless, of course, you’re Google. Google’s “Big Picture” data visualization group is led by Martin Wattenberg, and a quick look at his resume makes you realize he is among a special breed of people. How many people do you know with both a doctorate in mathematics and an exhibition at New York’s MoMA?
Due to the difficulty of finding the right talent and expertise, data visualizations often end up being too complex to interpret, or they distort the information by focusing on the visual rather than the meaning of the data itself. As Tufte explains, “excellent visualizations are those that give the viewer the greatest number of ideas in the shortest amount of time, with the least ink and in the smallest space.”8 In essence, data illustration is about simplifying the complex as much as possible.

Investment in the data visualization space

2011 was a banner year for companies in the field of big data, with an estimated $2.47 billion invested by venture capital firms globally.12 This was a 38% increase over the amount invested in 2010.12 The following chart depicts some of the top data visualization companies and their respective funding to date.

Excluded from this chart is the analytics application Spotfire, which was acquired by TIBCO Software for $195 million.13 Prior to its acquisition in 2007, Spotfire had raised nearly $40 million over the course of ten years.13 Qlik Technologies is another notable software product with powerful visualization techniques. The company went public in July 2010 at a valuation of nearly $900 million.13 Prior to its IPO, the company had raised over $80 million over a ten-year period.13

Noteworthy applications of data visualization

Understanding census data

For over a century, visualizations have been used by governments to better understand census data and decide, for instance, how representation should be apportioned and federal dollars distributed.14 A recent example (below) shows Statistics Canada maps depicting population changes in the Greater Toronto Area from 1996 to 2011. They reflect how population growth is slowing in Toronto and Mississauga and rising in areas north of these cities.
FIGURE 5: GTA Population Change by Municipality 1996-200115

FIGURE 6: GTA Population Change by Municipality 2006-201116

OpenFile, a Toronto-based startup, has used 2011 Canada census data to build its CensusFile application. Through the use of data maps, this application allows anyone to mine the census data and gain insights about their neighbourhood.

Tracking disease

One of the most cited examples of a data visualization success story is John Snow’s cholera map. During an 1854 outbreak of cholera in London, England, Snow used a spot map to illustrate how outbreaks of cholera were centred around the city’s water pumps. This depiction helped prove that cholera was being spread through water and not by air, as was thought at the time.17

FIGURE 7: John Snow’s cholera map17

In 2006, the city of Sault Ste. Marie in Ontario was able to eliminate a potentially serious threat related to West Nile virus. The Sault Ste. Marie Innovation Centre had done a systematic job of enabling the sharing of data sets between various municipal departments within the city. The data sets were then merged using data maps to uncover new insights. Through this activity, the Centre learned about an unusually large collection of mosquitoes within the city’s underground transformer vaults. Due to an absence of draining structures, the vaults had unknowingly become the perfect breeding ground for mosquitoes. Were it not for the use of data visualization, this West Nile virus threat would not have been discovered and mitigated.

FIGURE 8: Map reflecting Sault Ste. Marie mosquito trapping efforts18

Improving healthcare

The Canadian Institute for Health Information (CIHI) has developed the Canadian Hospital Reporting Project (CHRP), which is focused on improving the quality of healthcare across the nation.
Visualizations are being used to increase understanding of mortality rates, readmission rates, costs of hospital stays and other health indicators. The project’s goal is to provide data insights to key decision- and policy-makers, so that improvements can be made and hospitals can collaborate to achieve efficiencies.

FIGURE 9: Hospital 30-day overall readmission rates by Ontario region, 2009-201019

Supporting decision-making

The 1986 destruction of the Space Shuttle Challenger, caused by the failure of an O-ring seal, has been attributed in part to a failure of data analysis. Decision-makers at the US space agency, NASA, were uncertain about whether to launch the space shuttle in below-freezing temperatures, and relied on poorly presented data and short bullet points in making their decision. As data visualization expert Edward Tufte later pointed out, this disaster could have been avoided had the data been more clearly conveyed through the use of a graphic. The sample graphic Tufte later developed makes obvious the risk of O-ring damage in extremely cold temperatures.

FIGURE 10: Edward Tufte’s figure on the 1986 Challenger Space Shuttle launch decision20

Today, NASA is heavily involved in the development of visualizations that explain NASA missions and scientific results.

Driving transparency

General Electric (GE) is one of many companies developing extraordinary visualizations based on the petabytes of data collected through their various technologies. GE is hoping the visualizations will not only help simplify the complex nature of their work, but also drive insights and discoveries that might otherwise be difficult to achieve. For example, GE has developed an interactive visualization to help US residents understand how much of their energy is being supplied by renewable sources.
FIGURE 11: Energy being supplied by renewable sources for US residents21

IBM is also experimenting with data visualization and has developed an application called Many Eyes that invites anyone to upload a data set or to visualize an existing one.

Groups supporting data visualization

Discovery Exhibition is a US-based organization that profiles “visualization impact stories.” Highlights in 2011 included visualizations that helped reveal the mortality rate of African infants, understand traffic patterns in Beijing and optimize car engine injection systems. Information is Beautiful is another organization focused on celebrating beautiful design in data visualization. Among the nominated designs for 2012 is one on the Vancouver Canucks’ franchise history.

Here in Ontario, York University and OCAD University have teamed up to develop the Centre for Innovation in Information Visualization and Data-Driven Design (CIV-DDD), essentially a data visualization research hub. Leveraging computer scientists from York and designers from OCAD, the group is working to develop data visuals that help solve specific problems across healthcare, the arts, social sciences and engineering. Sample projects underway include understanding the impact of social media content and mapping the origins of Africans liberated from the transatlantic slave trade.

MaRS’ very own Data Catalyst team is working with data to provide insights on the innovation economy in Ontario. Its outputs will include visualizations and dashboards representing the impact of innovation support in the province, as well as visualizations that highlight opportunities for market and economic growth in key sectors.

Looking ahead

There is no question about the potential for growth and innovation in the data visualization space. Otherwise hard-to-understand rows and columns of numbers are brought to life through visualization techniques.
Data illustrations not only help to tell a story; they reveal the true meaning behind a data set. However, data visualization is only one of a series of analytics techniques. As we continue to collect more and more data every day, an increasing number of techniques will be required to distill the most complex of data sets down to an easily accessible message. This is an existing gap in the big data market, and an area where entrepreneurs should think about focusing their efforts.

References

1. McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity
2. TechCrunch: Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003
3. CNN Tech: A telescope that generates more data than the whole internet
4. The World Bank: Mobile Phone Access Reaches Three Quarters of Planet’s Population
5. The Economist: Data, data everywhere
6. Forrester report: Welcome to the Era of Digital Intelligence
7. Fast Company: Time To Build Your Big-Data Muscles
8. Edward Tufte: The Visual Display of Quantitative Information
9. Forrester report: Advanced Data Visualization (ADV) Platforms, Q3 2012
10. Content Marketing Institute: What is Content Marketing?
11. Gartner report: Emerging Technology Analysis: Visualization-Based Data Discovery Tools
12. SGMarketwatch: Venture Capital Sees Big Returns in Big Data
13. Dow Jones VentureSource
14. Fast Company: Infographic of the Day: What the Census Said About Us…in 1870
15. Toronto Urban Development Services: Population Growth and Aging
16. Statistics Canada
17. Visual.ly: John Snow Cholera Map
18. ESRI Canada: Case Study: Sault Ste. Marie Innovation Centre
19. Canadian Institute for Health Information: CHRP Key Findings
20. Edward Tufte: Visual Explanations: Images and Quantities, Evidence and Narrative, p. 44
21. GE: Renewable Energy Sources

Appendix A: The Data Visualizers

Data visualizations are being used today to unravel the meaning behind big data sets that would otherwise be too difficult to understand. Highlighted in this piece are eight Ontario-based startups whose innovative applications are shaping the future of data visualization.

Bio.Diaspora

Bio.Diaspora brings together disparate information about global outbreaks, climatic conditions and travel patterns, and synthesizes it to facilitate risk assessments of infectious disease threats around the world. MaRS Market Intelligence spoke with Kamran Khan, Founder of Bio.Diaspora.

How did you come up with the idea for Bio.Diaspora?

I am an infectious disease physician, and have my own clinical practice based at St. Mike’s hospital in Toronto. Back in 2003 when we had the SARS outbreak, I really got a chance to see how a disease can impact a city, not only in terms of health, but also the psychological and economic damage that comes with it. SARS alone took $2 billion out of our local economy here in Toronto. It really got me thinking about the interconnectedness of the global community, and I realized that I was going to be practising medicine in a world where I would require a full understanding of infectious disease activity across the globe. The question, however, was how could one individual possibly know what is happening in cities all around the world, and how their own city is connected to each of them? This is when I began to focus my research on the global airline network, which transports over 2.5 billion travellers every year. Following airline activity was a way to grasp how the world is interconnected and how cities and countries share the risks of infectious diseases.

Where did the name Bio.Diaspora come from?

I realize it’s quite a mouthful, but Bio.Diaspora refers to the scattering of living systems. Its literal meaning is the scattering of life.
It represents how living systems interact in a world where there is so much movement happening.

Where are you able to source data about not only global outbreaks, but also travel patterns?

We get information from official government reporting as well as from online chatter, which can provide early clues about infectious disease outbreaks. We’re pulling information from our colleagues at Harvard Medical School who run HealthMap, from NASA satellite imagery and from a variety of other sources pertaining to human, animal and insect populations. With respect to the airline industry, we work with different agencies to analyze over 2.5 billion travel itineraries every year.

How do you present this information within Bio.Diaspora?

In terms of techniques, we use a combination of maps, charts, tables and word clouds to visualize different types of information. For example, we are using a word cloud to visualize the birthplaces of residents across the United States. People originate from about 200 countries, and portraying this information in a chart or a bar graph is not particularly efficient. One thing that is not often considered is how humans will interact with information. When designing Bio.Diaspora, visualizing the data was very important to me, and, more importantly, visualizing it as accurately as possible was critical. We want to minimize the potential for misinterpretation.

Who are the customers Bio.Diaspora is targeting?

Our customers are currently governments and public health agencies, which have a responsibility to protect their citizens against international infectious disease threats. Going forward, we will include national departments of defence that are concerned about biological threats, as well as companies that are negatively impacted by infectious disease outbreaks, such as insurance agencies. Another target is pharmaceutical companies that manufacture drugs or vaccines for certain diseases.
Do you see this information ever becoming available to the public?

I don’t see this happening anytime soon, because the data is potentially sensitive in its raw format. However, it is possible that sections could be made available to travellers, because getting sick while travelling can be particularly unpleasant and people would value this information. There may be creative ways to utilize some of our information and distill it right down to an individual traveller’s needs.

Looking into the future, what is your vision for the ideal state in which diseases are tracked?

My hope is that we get away from reacting and move more into anticipating. Today, we’re largely firefighters in that we basically wait for fires (that is, outbreaks) to emerge and land on our doorstep, and then we react to them. What we really need is an early warning system. An early warning system could provide any jurisdiction with the ability to look out to the rest of the world, to have situational awareness of what’s occurring in terms of outbreaks, and to understand how people are moving into that particular geographic region at any given time. As a global community, we need to start thinking more proactively and prioritizing prevention, rather than working as a collection of individual countries solely focused on our immediate self-interests. This is a reality of living in a highly interconnected world.

In hindsight, do you think SARS could have been anticipated and prevented in Toronto?

I think there was definitely enough information to indicate SARS was going to land in Toronto, as there was something unusual happening in Guangdong province in China, which is right next door to Hong Kong. Many of the tools that we have today didn’t exist back then, but they would certainly have given us good insights. Looking back, we can see just how predictable the movement of SARS was. It’s amazing how closely the spread of the disease tracked the corridors of people’s movements worldwide.
What are some of your favourite visualizations?

One image that really speaks to me is the image of flight lines in the world. When looking at it, you can see the fabric of how the world is connected today. You can see not only the physical geography of places, but also a depiction of social contexts and relationships. It’s not necessarily an image that would be used for decision-making, but it’s a beautiful rendering of something that’s complex and global.

BuzzData

BuzzData gives people the analysis and visualization tools they need to find the story in a data set, and to communicate it visually through the creation of smart executive summaries. People can set up their own BuzzData Hive, where teams and communities can store and share their files, visualizations and analysis. MaRS Market Intelligence spoke with Nick Edouard of BuzzData.

What is the underlying problem you are trying to solve with BuzzData?

There are many problems with the way data is shared today, particularly in large organizations. Too many people look to share large files such as Excel spreadsheets by email, with massive cover notes. What the intended audience usually really needs is just the key facts and figures: the executive summary, if you will, of that data. People often do not have the time or the skill sets to understand data that is not communicated visually and effectively. And while there are good file-sharing tools such as Dropbox, they are not an optimal way to communicate information, particularly when that data needs engagement and discussion in order for meaningful insights to be extracted.

The visualizations are key to helping users explore and understand the data that’s been uploaded. For example, say you have uploaded a North American sales forecast. BuzzData will offer a suite of tools that asks, “What do you want to do with this data? Do you want to try a visualization? Do you want to complete an infographic?
Do you want to find some structure in this unstructured document?” The tools will then return the output of that manipulation as a new artifact in the data room. These artifacts can then be structured into an executive summary, highlighting the key facts and figures that need to be communicated.

Have you built these visualization tools internally?

No. We are leveraging best-in-class tools from third parties, one of which is the infographic application infogr.am. There is a whole host of visualization tools, applications and products out there that do one thing really well, whether that be mapping, graphing, motion charts or something else. But it’s hard for a user to know what exists. Our goal is to make it easy for a BuzzData user to choose the best tool for the job and to produce the type of analysis and artifact they are looking for.

Who are some of your customers today?

We’re doing a lot of work with some really exciting companies and organizations that regularly produce data and know they need to do better in terms of how they share and use it, both internally and externally: market research and management consulting firms, for example. Often, they are looking at new ways of delivering information to their customers, say by taking their current executive summaries and turning them into something much more visual, engaging and easier to grasp.

We also ran the Best City in the World Contest with the Economist Intelligence Unit earlier this year to crowdsource a new livability index. The Economist Intelligence Unit (EIU) has been publishing the results of its livability analysis for years. One angle they were interested in was readers’ thoughts on whether they were approaching the analysis correctly and what additional factors could be included in the index. So the EIU published the underlying data, and the community engaged with it and produced some really interesting results.
One of the things the winner did was assess the relative proportion of green space to urban sprawl within a city using OpenStreetMap and Google Earth. Another individual produced an app that calculates the best city for an individual based on a user's own preferences and rankings.

How comfortable are your customers with sharing their data publicly?

BuzzData Hives can be either private (locked down and by invitation only) or public (discoverable by Google). At the moment, private Hives outnumber public ones four to one, so we're definitely not just about public or open-data sharing. That said, there are some very interesting things happening on the public side. Some organizations are looking to better inform their communities about specific topics so that they in turn become better advocates, and others are seeking to get their community involved in the development of products and services. A professor in the University of Toronto's faculty of math has recently set up a public NSERC Hive. This Hive gives academics, government officials and anyone interested in NSERC* a way to engage with the data related to its funding so they can better understand how it is being applied, whether it's working and so on.

Our customers are sensitive to issues around data storage, specifically security and jurisdictional considerations. While BuzzData itself is secure and not cloud-based, our customers are increasingly questioning where their data physically resides and who could potentially access it. This is a challenge for us and the SaaS market in general, as we need to build solutions that meet our customers' specific requirements. Fortunately we anticipated this, and we believe we have built the product to be able to accommodate these requirements.

What are some of your favourite visualizations?
I'm a very big fan of Santiago Ortiz's Moebio project and, specifically, his visualization that broke down The Iliad by the number of times each character's name appears in each book of the poem. Having read classics at university, I thought this was a bit of a cheat sheet I could really have done with ten years ago! It was an interesting way of looking at something that you wouldn't necessarily think of as data. It provides structure to what is otherwise unstructured information.

*Natural Sciences and Engineering Research Council of Canada

DataAppeal

DataAppeal is a web-based tool that automatically renders large amounts of data into three-dimensional animated maps. It offers an alternative to the often complex mapping tools available today. MaRS Market Intelligence spoke with Nadia Amoroso and Haim Sechter of DataAppeal.

How did the idea for DataAppeal come about?

Nadia: My background is actually in landscape architecture and creative mapping, and I recently wrote a book called The Exposed City: Mapping the Urban Invisibles. While writing this book, I was looking at various data points within a city — elements such as demographics, crime rates and surveillance cameras. This is information that is not normally visible. I was interested in creating some type of landscape or new topography based on this data, hoping to reveal hidden patterns within a geographic space. So I began manually creating data maps. When presenting these maps at various conferences, I was amazed at the interest they generated. This interest demonstrated to me the importance of creating a tool that allows others to use visualization techniques to help them analyze data.

What about existing geographic information system (GIS) tools, which can also produce data maps?

Haim: I have been in the business intelligence space for 12 years now, and have seen all the issues GIS presents. Namely, GIS tools are very expensive and difficult to install.
So their use ends up limited to only a few individuals within an organization. When speaking with Nadia, it was interesting to learn how engaged people were with her data maps. Our goal with DataAppeal is to overcome the challenges presented by GIS today and make mapping tools something that people within organizations actually want to use so that they can share their data.

What do you think is special about the insights data maps reveal, versus other forms of visualization?

Nadia: Mapping is an ideal way to visualize geographic data. One of the key figures I researched for my book was an architect named Hugh Ferriss. Back in 1916, New York came out with a zoning ordinance, and a lot of people in the area — citizens, architects and even city officials — had a hard time understanding what the numbers and codes of the planning ordinance meant. So Ferriss manually sketched and rendered 3D maps of the form and shape of buildings that could be built based on the zoning bylaws. He took textual information and turned it into works of art, which got a lot of attention and even graced the covers of The New York Times Magazine. The maps provided instant insight into the data.

Why did you choose to use 3D shapes to represent the data in DataAppeal?

Nadia: The use of 3D is fairly new in the data visualization space and comes from my background in urban design and landscape architecture. Because our application is built on the Google Earth platform, you can actually walk through the 3D data itself, as if in street view, and view it from all dimensions. It makes the experience much more immersive. From a more practical standpoint, 3D gives a dimension of height to data, which is an extra level of analysis that you wouldn't normally get from a 2D data map. Often data maps group information in colours, but it is not easy to gauge the amount of variation between two data points with different shades.
It is easier to see variation when one data point is, say, twice the height of another.

What are some of the challenges people run into when using data maps?

Haim: With geo-data, the end user needs to have some overall understanding of what it is they are looking at; otherwise the data can really be misinterpreted. If I'm showing two values that are exactly the same — for example, the number of shootings in Orangeville versus Toronto — it will look much more intense in Orangeville because of the size of the land mass. We have built some training tools right into DataAppeal so that these kinds of errors are not made.

How do you see the DataAppeal product evolving?

Haim: One thing we are focused on creating is a data gallery that enables people to profile their visualizations on our website. Making it easy to share data will play a big role in bringing together people and organizations from around the globe. Our emphasis will not be on the sharing of numbers but, rather, on the sharing of art. That, we feel, is the key to driving transparency.

Nadia: James Corner is a landscape architect who teaches at the University of Pennsylvania. He has created some very poetic mappings by taking aerial photographs and superimposing collages to show elements that would not otherwise be seen in the image, such as an underground ravine or what is hidden within the soil. His work inspired me to create DataAppeal. Another person I admire is Hans Rosling, who has created some amazing animated visualizations. These are spectacular to watch in some of his TED talks.

Infonaut

Infonaut's product, HospitalWatchLive, tracks the interaction of patients, staff and assets in hospital settings, providing evidence to better understand and control the spread of infections. MaRS Market Intelligence spoke with Niall Wallace, Founder of Infonaut.

How did you come up with the idea for this product?
Infonaut was founded in 2006 and was focused at that time on healthcare and data visualization. We got involved in the SARS response by doing things like mapping quarantine cases. We got really good at tracking diseases through this type of work. Fast-forward to 2009, and Infonaut was asked to help a hospital that was experiencing a superbug outbreak. They wanted us to help them get a handle on what was going on inside their hospital building. Up until that point, we had only been working on the movement of disease in the outside world.

So how did you tackle the tracking of disease inside a building?

We focused on the movements and locations of patients, staff and assets. For example, people not washing their hands, people moving around, assets being moved around and so forth. Diseases are essentially spread through these types of human behaviours. When two things come together, that's when you have a chance for a disease to make a leap. Our technology is able to monitor where everything is in a hospital down to about eight inches. For example, we put our tracking technology inside gel dispensers. When a doctor comes in contact with a gel dispenser, we get a positive signal that gel has been dispensed.

Why do you need to track hand washing? Is it not mandatory within hospitals?

Everybody reports 90% hand-washing compliance within a hospital, but our best guess is that it only happens about 40% of the time. If you consider a shift change at 3:00 a.m. on a long weekend, hand-washing rates can drop as low as 10%. It comes down to the fact that while everybody knows what they should be doing to prevent infection, they do not always follow through on it.

Is this seen as being too Big Brother?

This is something we considered early on: that we were delivering a product that could be considered Big Brother, especially since our team has expertise in surveillance systems and in how they overwhelmingly fail.
If people feel like they're under surveillance, they will find ways to defeat the system. With HospitalWatchLive, we focus on preventing infections and protecting hospital staff. We are not interested in analyzing any other types of behaviour with the data we collect. We work on communicating this benefit to the staff and helping them understand that their safety is our first priority. This is what really helps us obtain their support and engagement.

Any challenges with collecting this data?

The biggest challenge around data collection is the privacy requirements associated with personal health information. Part of me feels these privacy policies have created impediments to the building, design and delivery of value-added solutions. The other part of me understands why they are necessary. Overall, I feel the pendulum has swung too far. Patient data has gone from being on a clipboard at the end of a bed, which anybody walking through the room could access, to being part of an enormous system with complex algorithms to protect the information and access to it. However, rather than trying to effect change in the area of privacy, we treat it as a necessary requirement and simply work around it.

The data we collect is visually overlaid on a map of the hospital. This provides evidence of how infectious disease is actually being spread within the building. The visualizations tend to be a bit of eye candy to engage audiences and provide them with an understanding of what the data shows. That being said, I think visualizations by themselves have little value if you do not act on what the data tells you. With HospitalWatchLive, what becomes more important than the data analysis is the ability to drive behavioural change among staff to limit the spread of infectious disease. The data alone will not be able to do that.

Where do you see Infonaut heading as a company?
Our goal is to move beyond the pure health IT play and become more of a knowledge organization by assuming some responsibility for change management. We may also reach the point where we give away our software for free so that we can charge for the knowledge services, which is essentially the "so what" part that follows data visualization. This is where hospitals are going to get the most value out of what we do.

Where is HospitalWatchLive being used today?

It is being used at Toronto General Hospital, on two floors in the multi-organ transplant unit. Patients in this unit are at the highest risk for infection because they are on immunosuppressive drugs and are an older population. Even though staff are a lot more vigilant in keeping these patients protected, without our solution these patients would still have a higher-than-normal incidence of infections.

What are some of your favourite visualizations?

I really like the visualization of Napoleon's march across Russia. It does a great job of conveying complex information in a way that can be easily consumed. I would also have to give credit to my iPhone, which is designed in such a thoughtful and elegant way when it comes to retrieving information. Apple, in general, has done a great job of trying not to overwhelm users with too much information when using their products.

Polychart

Polychart is a web-based application for visually analyzing data and creating charts. Through drag-and-drop functionality, it enables managers, marketers, analysts and other users to understand data visually without having to code or perform statistical analysis. MaRS Market Intelligence spoke with Lisa Zhang, Co-founder of Polychart.

What led to the creation of this technology?

I did a couple of internships on Facebook's data science team, so I've seen some of the trends and opportunities in the data-analysis market. I've also witnessed the rapid growth of the data-analysis software Tableau, which is a great tool.
What we felt was missing from it was the ability to bring that type of analysis to the web. The advantage of being web-based is that we don't have to make assumptions about which operating systems people are working on, or how willing people are to download software and plugins. It's just very accessible.

Why Polychart rather than a more traditional tool like MS Excel?

The best thing about Polychart is the speed at which you can create a chart. I think the ability to iterate quickly is extremely important when you're analyzing data, since you tend to think of ideas as you're working. If there's a lot of friction between when you think of an idea and when it shows up on the screen, then that idea just gets lost. In data analysis, this can mean the difference between having a key business insight and not.

Based on your experience at Facebook, how well do companies exploit their data? In particular, how well do web companies leverage their large amounts of user data?

I think there are a lot of ways in which companies could be using their data but are not, due to a lack of talent in the data space — this is particularly true when you're dealing with big data. There is a Fast Company article I came across which talks about there being 340,000 big data positions in 2012, of which more than half will go unfilled. I think a lot of this is a talent issue, and if we can increase the accessibility of data analysis, then companies can go a lot further.

Why is there such a lack of talent?

Well, in order to be a good data scientist, you need to understand statistics, and you also need programming skills in order to manipulate data. In order to present data visually in an impactful way, you need to understand human perception and how to communicate well. Those are a lot of different skill sets that are difficult to find in one person.

Visualizations can often lead to different interpretations, simply by the way in which the data is displayed.
Does Polychart address this challenge?

This is one thing we take very seriously. There is ample research in the field of perception that tells us what our visual system pays attention to. For example, people are very good at comparing areas, and so it's helpful to start the y-axis of a bar chart at zero. It's also why 3D effects on bar charts and pie charts can distort the data being displayed. 3D effects do a great job of grabbing someone's attention, but when doing data analysis, accuracy is much more important.

The fact that people are good at comparing areas is also why, when representing values using the sizes of objects (say, circles), the area should grow in proportion to the value represented, as opposed to the radius. Say you are representing the numbers 1, 2 and 3, and you use circles with radii of 1, 2 and 3: the third circle will actually look nine times bigger than the first, because people perceive areas more readily than diameters.

Colour is something else that is tricky to use. While colours are great for representing categorical values, they're not very good for representing quantities. We're very bad at judging whether one shade is one-and-a-half times, two times or three times darker than another. In terms of choosing the type of chart to use, there is an interesting flowchart that suggests which visualization to use based on the data that you have and the purpose of the visualization.

Any examples of poorly made visualizations?

The chart titled "Percentage of Comments by Identity" is an example of a visualization that ignores best practices. The 3D effect and the different heights shown give a disproportionate area to "Pseudonyms," and make the area representing "Real Identity" look far more than 10 times smaller than "Anonymous." Similarly, a Gizmodo graphic about the change in iPad battery size tied the increase in battery size to the height of the image rather than to its area, misrepresenting the increase.
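The area-versus-radius point above can be made concrete with a few lines of arithmetic. This is a minimal illustrative sketch (not Polychart code): it compares the naive encoding, where the radius grows with the value, against the perceptually honest encoding, where the area grows with the value.

```python
import math

# Values to encode as circle sizes.
values = [1, 2, 3]

# Naive encoding: radius proportional to the value.
# The biggest circle covers 9x the area of the smallest, not 3x.
naive_areas = [math.pi * v ** 2 for v in values]
ratio_naive = naive_areas[-1] / naive_areas[0]   # 9.0

# Honest encoding: area proportional to the value,
# so the radius grows with the square root of the value.
radii = [math.sqrt(v) for v in values]
honest_areas = [math.pi * r ** 2 for r in radii]
ratio_honest = honest_areas[-1] / honest_areas[0]  # ~3.0

print(ratio_naive, ratio_honest)
```

This square-root scaling is what well-behaved bubble charts apply before drawing, so that a value three times larger reads as roughly three times larger to the eye.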
Fox News is a large source of misleading visualizations! One of its charts on the Bush tax cuts does not start the y-axis at zero, which magnifies the change in tax rates. Another Fox News chart, about unemployment rates, is borderline dishonest: the last data point of 8.6% is shown as a non-change on the graph.

Putting aside all these bad visualizations, what are some of your favourites?

Napoleon's March is a classic visualization and a great example of an effective way to present statistics. More recently, the interactive database We Feel Fine is one of the biggest data visualization projects of the past decade.

Quinzee

Quinzee focuses on helping users be smart about energy consumption. To do so, Quinzee presents data from smart meters in a way that educates, motivates and enables residents to make more intelligent decisions about their energy use. MaRS Market Intelligence spoke with Faizal Karmali, Co-founder of Quinzee.

Why did you choose to launch Quinzee in Ontario?

The regulatory and political environment in Ontario is fostering energy efficiency, particularly through the Green Energy Act. And Ontario is ahead of the curve globally in the adoption of smart meters. There are smart meters on just about every single house and small business in the province. We saw this as an opportunity, given there is so much energy data being collected.

What is being done with this energy data today?

A recent study by Accenture reports that the average North American spends six to nine minutes a year looking at their energy consumption. We're talking about an average of $2,000 spent per person, and it's being looked at for six minutes. Smart meters are collecting data for utilities for the purposes of billing and energy management, but the data is not yet in the hands of consumers for their own energy management. The challenge is significant. Few consumers yet understand what a kilowatt-hour is, although the information is available.
Moreover, the market has a limited attention span and limited interest in energy. Quinzee is creating more value for utilities and their customers from the data.

Why this lack of interest from people in their energy use?

It's because no one really connects their day-to-day behaviour with energy use. It's just so far removed from an average person's life. Sure, you've got pockets of people who say they are very energy-conscious, but, in general, the average person does not even think about it or translate their words into action. It really boils down to our culture of excess and the fact that the effects of our overuse of energy are not visible to us.

How can data visualization help?

The current way in which data is presented requires interpretation and, considering energy use is an area of limited interest, the likelihood of that interpretation taking place is low. For Quinzee, the broader idea behind leveraging data visualization is to ensure people can rapidly interpret our data visuals. For now, we're focused on providing quick nuggets of information that are simple and easy for a user to act on.

What nuggets of information resonate the most with users?

We're finding people respond the most when their energy consumption data is put into context. For example, we provide a household with averages, so that they can place themselves in the context of what's normal for them, their neighbourhood, their city, their country, etc. The natural motivation for humans to be "normal" is what will drive average energy use lower and lower. This phenomenon is similar to what we saw with the blue box recycling program. The driver for adoption was having households put blue boxes outside: neighbouring households then felt pressure to use one as well. There is actually a street in the UK that has adopted a similar methodology for energy consumption. All of the residents on the street have agreed to write their energy meter readings on the sidewalk.
As other residents walk home, they know exactly how much better or worse their consumption is than their neighbours'. This has led to something like a 20% reduction in energy usage on that one street alone.

Will this type of social pressure be enough?

It's only one method we intend to use. Another idea is to help people understand the aggregate impact of their actions. Today, we consume energy like it's an endless resource, but that's just not the reality of the planet. We are so far removed from the aggregate that it's not very relevant to us on a micro level. I think one of the major goals of data visualization is to help us place our day-to-day personal behaviour and choices into greater data contexts, be it the context of a neighbourhood, a city, a province, a country or the world. Connecting individual choices and behaviour to something bigger will help people feel empowered. Eventually, when someone switches off a light, they will feel like they're actually contributing to something positive. They will understand the connection.

Who owns energy data today: utility companies or households?

The way it's set up now, utility companies own the data, but households have full access to it. The utility companies are working toward enabling this access, but not all of them are there yet. In the US, President Obama has undertaken what is called the Green Button initiative, which mandates that all US utilities provide energy data to the customer and enables third-party providers like Quinzee to analyze and present that data to end users. In Ontario, MaRS is actually helping drive a similar Green Button initiative for the province. While Ontario is ahead of the game in having implemented smart meter technology, it has been leapfrogged by the US in terms of driving data transparency.

Is this a policy issue?

Not really. Ontario's Minister of Energy has been talking about getting information back into the hands of Ontarians. It's just been a slower process.
What is your long-term vision for the company?

Resource management is important in many sectors, and our culture of excess and indifference stands in the way of it. Quinzee aims to improve the quality of life of communities, both rural and urban, around the world. By starting in a highly educated, relatively wealthy country like Canada, we hope to prove a model that can be translated to sectors such as food, healthcare and waste. For us, Quinzee is the first of many applications that will use data and information to influence behavioural change.

Let's use healthcare as an example. What if an individual saw the bill every time they went to see the doctor? This actually happened to one of my friends, who accidentally received a bill for his MRI service. He was shocked to see the amount. It has encouraged him to think twice the next time he feels the need to visit the doctor. It's not about changing the healthcare system; it's about using data to drive transparency so that people can feel engaged and make informed decisions.

What are some of your favourite visualizations?

I really connect with one visualization NASA has of the world at night, where the lights are on everywhere. With just one glance you can see how well lit a certain city is, and how much power countries are using. GE has also done some spectacular work with their visualizations. We look to what GE has done as motivation for how we would like to share some of our information.

Venngage

Venngage is building a solution to automatically transform data into visually appealing infographic reports. These reports can be used for a variety of purposes, from content marketing to data analysis and reporting. MaRS Market Intelligence spoke with Eugene Woo, Co-founder of Venngage.

Where did the idea for Venngage stem from?

Our company started out as Vizualize.me, which was a simple tool for visualizing your resume.
Basically, you signed in using LinkedIn, and Vizualize.me converted your LinkedIn profile into an infographic. This tool got a lot of traction and press coverage from outlets such as TechCrunch and Forbes. Even today we get at least 1,000 sign-ups a day, and we have over 200,000 users in total.

The problem with Vizualize.me is that it offers limited engagement. Users only go to the site if they're hiring or looking for a job. This represents something like 10% of the population, and we find a large percentage of our users don't come back to the site because they have no reason to. Nevertheless, Vizualize.me helped us realize the power of infographics, which are a unique form of data visualization. It also created a lot of inbound interest from clients who wanted infographics for their custom data. This is how we came up with the idea to automate the creation of infographics.

Why do you think the use of infographics has become so popular?

I think that, in general, and I know this is a cliché, a picture is worth a thousand words. Would you rather read a one-page article or just look at an infographic report? When done right, an infographic can help you synthesize information very quickly and easily — sometimes in as little as 30 seconds. In today's world, where we are bombarded by millions and millions of messages, we need something like this. If, after looking at an infographic report, you want to dig deeper, well, that's when you can read the actual analysis.

When it comes to content marketing, there are a lot of mediocre data visualizations out there. But the same can be said about images and blog posts. If you take all the blog posts ever written, you will probably find that 99% are not very good and 1% are great. I think the same holds true for infographics. The difference is that infographics tend to get shared more often and tend to receive more press coverage, so people just see them more.
For example, a bad infographic will surface a lot more than a bad blog post.

Which visuals tend to resonate the most with users?

I think one of the simplest things is knowing how to make text stand out. Take a number, for instance. Most people think they have to visualize that one number, whereas sometimes it's just easier to highlight the number on its own, particularly if you're not comparing it with anything else. Venngage tends to stay away from very complex visualizations. For example, something like a network graph can look very nice from afar, but nine out of ten people won't understand it. Our clients sometimes ask for things like a network graph, and we have to convince them to use something simpler. For us, making something simple that is still visually appealing is a much bigger challenge than making a complex visualization.

Apart from being a content marketing tool, how else are infographics being used?

Our hope is to get companies using them internally. Today, the average office worker still uses Excel or PowerPoint to do their data analysis. That really hasn't changed in the last fifteen years. Moreover, data is locked up in people's Excel spreadsheets, which is an inefficient and old-fashioned way of working. We want to provide a tool that allows the average worker to easily convert their data into insights and to share those insights with other people in the organization. This will help free up data and drive a lot more transparency.

As technology advances, how do you see the exploration of data evolving?

When companies talk about analyzing data, it's still very much a domain for data scientists or business intelligence (BI) folks. It's still a very high-tech, difficult process that involves lots of expensive tools and lots of specialized people. I mean, a typical enterprise business analytics tool can cost hundreds of thousands of dollars!
I think data analytics will evolve with the consumerization of IT and become more of a consumer-based offering that everyone can use. With Venngage, we're going to adopt a freemium model like GitHub's. You'll be able to create free visualizations up until the point that you want to use real company data, and then you'll need to convert to a paid account. This will probably be adopted very quickly by, say, marketing departments, who don't necessarily need to analyze a whole data warehouse but just a small set of data. I also see more of what I call the self-service model being used, where the end user can do the work themselves rather than relying on a team of analysts or BI experts.

What are some of your favourite visualizations?

I love the work of Nicholas Felton. I also really like Facebook's timeline and how it visualizes such a large amount of data. The funny thing is that Vizualize.me had a timeline as part of its site. We thought it was super cool and then, maybe two months later, Facebook came out with their timeline! We thought, "Oh no, everyone will think we copied Facebook!" But really — we built ours first.