DV Fraud Lab Report
May 1, 2013

DoubleVerify has uncovered elaborate advertising Fraud activity on over 1,200 suspected websites that together are defrauding online advertisers of an estimated $6.8 million per month¹. This massive Fraud originates from user traffic on websites classified as copyright infringement. The ad impressions from this traffic are ‘laundered’ through a complex series of redirects that make the ads appear as though they originated on legitimate sites containing advertiser-friendly content. Further, the Fraudulent ads use code that hides the ad creative from display in the user’s browser, resulting in advertisers paying for impressions that are never seen.

[Infographic: 10 billion impressions analyzed; 3% of Network/RTB impressions and 0.1% of Publisher impressions affected; estimated $6.8M per month]

DoubleVerify’s Fraud Detection Lab has released new technology to uncover and protect against this new wave of impression laundering Fraud. This new technology is fully deployed into DoubleVerify’s current MRC-accredited Fraud identification and blocking platform in order to give customers the most powerful verification protection available in the market today. The DV Fraud Lab is releasing this report with detailed information about this Fraud so that the industry can better understand the methodology employed by the sites involved, and so that we can work cooperatively to combat this type of activity on an ongoing basis.

Fraud on a Massive Scale

This activity involves by far the largest group of sites and the largest volume of impressions our Fraud Lab has uncovered from a single identification instance since our company was founded five years ago.
DoubleVerify data scientists analyzed over 1,000 advertiser campaigns delivering more than 10 billion impressions on more than 3 million websites in a recent 30-day period to reveal the following results:

• Over 1,200 websites in copyright infringement categories were laundering ad impressions through a set of seemingly legitimate sites with advertiser-friendly content.
• 3.0% of all Network/RTB-purchased impressions were delivered via sites participating in this Fraudulent activity.
• 0.1% of Publisher-purchased impressions were delivered via sites participating in this Fraudulent activity. This likely occurs when publishers purchase third-party inventory for audience extension or other off-site fulfillment programs.
• DoubleVerify estimates this is costing advertisers $6.8 million per month based on the following analysis:
  – Recent IAB studies estimate online display ads generate about $642M per month. DoubleVerify estimates one third of the revenue ($214M) is spent on Networks/RTB platforms and two thirds of the revenue ($428M) is spent on Publisher buys.
  – 3.0% of $214M + 0.1% of $428M = an estimated $6.8M per month in revenue spent on sites identified as participating in this Fraud.

How Fraud Damages the Industry

Traffic fraud has become a hot topic following a recent spate of industry attention related to Botnet Fraud. For DoubleVerify, combatting online advertising Fraud and protecting our clients’ advertising integrity and performance have been top priorities since the company’s founding in 2008. Our capabilities were further enhanced with the establishment of the DV Fraud Lab, where our engineers develop new technologies to uncover the continuously changing methods used to perpetrate Fraud.

The most obvious impact to the online advertising ecosystem is on the buy side, where advertisers lose millions of dollars on impressions that are never delivered to an end user.
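As a check on the monthly estimate derived in the analysis above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch of the report's own calculation: the dollar figures and percentages are the ones quoted in the report, the constant and function names are illustrative, and, as in the report, the share of affected impressions is applied directly to spend.

```python
# Monthly display revenue figures quoted in the report (US dollars).
TOTAL_MONTHLY_REVENUE = 642_000_000
NETWORK_RTB_SPEND = 214_000_000   # one third of the total
PUBLISHER_SPEND = 428_000_000     # two thirds of the total

# Share of impressions delivered via sites participating in the Fraud.
# Assumption carried over from the report: impression share == spend share.
NETWORK_RTB_FRAUD_RATE = 0.030
PUBLISHER_FRAUD_RATE = 0.001


def estimated_monthly_loss() -> float:
    """Return the estimated monthly spend on Fraudulent sites, in dollars."""
    network_loss = NETWORK_RTB_SPEND * NETWORK_RTB_FRAUD_RATE
    publisher_loss = PUBLISHER_SPEND * PUBLISHER_FRAUD_RATE
    return network_loss + publisher_loss


if __name__ == "__main__":
    loss = estimated_monthly_loss()
    print(f"Estimated loss: ${loss / 1e6:.1f}M per month")
```

The Network/RTB term contributes about $6.42M and the Publisher term about $0.43M, so the estimate is dominated by Network/RTB inventory.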
A second damaging effect is that many types of Fraud, especially the impression laundering described in this report, can funnel ad dollars to parties who use them to fund activities that advertisers are actively combatting. For example, an entertainment company would be doubly harmed if it were first bilked out of media spend for creative that was never exposed, and those ill-gotten gains were then used to support illegal distribution of its copyrighted material to consumers, further damaging its core revenue stream.

Finally, fighting online ad Fraud complicates the media planning and delivery process. It costs advertisers time and money both to identify Fraud and to eliminate its impact from their analysis and media plans. Repurposing the money that is wasted on Fraudulent ads, along with the operational cost of combatting them, could significantly benefit the performance realized by the buy side.

Fraud also significantly harms sellers and the overall economics of the ad market. Every media dollar wasted on Fraudulent impressions directly reduces the revenue earned by legitimate sellers that have unsold or under-monetized inventory. Less obvious, but equally concerning, are the indirect ripple effects that billions of Fraudulent impressions have on market economics. For example, billions of Fraudulent impressions add supply to an ecosystem where over-supply conditions already exist and CPMs are being driven down. Additionally, because legitimate and Fraudulent ad impressions are pooled together in marketplaces, the overall ROI of this segment of impressions will suffer (when compared to the performance of legitimate impressions alone). This reduction in ROI drives down the CPM in these marketplaces, which further damages legitimate sellers. For these reasons, online ad Fraud damages the very credibility of the ecosystem in which we all are participants and beneficiaries.
Given these significant impacts across the industry, DoubleVerify believes that fighting Fraud isn’t a problem to be solved by the buy side or the sell side alone. We also do not believe that the industry should pivot away from the existing, well-understood metrics that have facilitated significant online ad revenue growth and try to solve fraud by introducing new ad measures that reward only good behavior. Instead, all parties (buyers, sellers, intermediaries, and measurement and verification companies) have a responsibility to fight Fraud across multiple fronts so that we can build a better industry that benefits all of the parties that participate in it.

Hidden Ads and Impression Laundering Working Together

Hiding online ads is a common technique used by nefarious actors to increase served impression counts by serving ads in conditions under which they are not visible to the user. Sites participating in Fraud utilize many methods of ad serving manipulation, including placing the ads in tiny iframes (1 pixel wide and tall), creating off-page DIV elements in which to serve ads, or stacking ad creative behind content or other advertisements.

Figure 1 - Illustration of Hidden Ad Fraud (diagram labels: Hidden Ad Fraud Site, 1x1 iFrame, Sell-side Ad Server, Buy-side Ad Server, Advertiser Ad Server)

Hidden ad Fraud by itself clearly damages brand advertisers who are paying for impressions that are never seen by the user. This type of Fraud reaches beyond the brand advertiser to performance marketers. For example, CPA advertisers that give credit and pay for impression/view-based conversions may be compensating Fraudulent sites. When a hidden ad is delivered to the user’s browser, that impression is recorded as being served (via an anonymous user cookie) to that browser.
If the user later visits the advertiser’s website, it may appear to attribution systems and ad servers as though the hidden impression influenced this conversion, and they may credit the site where the ad ‘ran.’ Worse yet, if the user saw a legitimate ad impression, then was served a Fraudulent ad impression, then converted on the advertiser site, the attribution systems and ad servers may give some or all of the credit to the Fraudulent impression, damaging the performance of the seller of the legitimate impression.

Sites that have user traffic but are undesirable to advertisers (such as those classified as Copyright Infringement) could not effectively use hidden ads as a standalone mechanism for conducting Fraud. Advertisers using verification services, contextualization and targeting services, blacklists or other techniques to prevent spending money on the sites they find objectionable would easily identify the placement of these ads even if they did not recognize that they were hidden. As a result, these sites add another technical layer to their Fraud through impression laundering.

Impression laundering is almost identical to the well-understood concept of money laundering. Both impression laundering and money laundering use obfuscation processes and seemingly legitimate sources in an attempt to conceal the origin of the ad impression or money. Specifically: Impression laundering is an obfuscation process that conceals the originating website which initiated the advertising impression call and replaces it with a website more appealing to the advertiser.

Impression laundering often works by using a series of technically complex, highly nested ad calls through iframes. We have found cases where individual ad calls have been routed through over 20 iframes as part of this obfuscation process.
This process makes it difficult for decisioning systems (Ad Servers, DSPs and Bidders) to accurately identify the website that actually initiated the ad call. Additionally, this process can fool verification techniques that advertisers rely on to protect their ad spend.

For clarity, we will use the following additional definitions when discussing the impression laundering process:

Originating Site: The website the user visited that is the originating source of traffic that created the ad impression.

Laundering Site: The website that identifies itself as the source of the traffic primarily as a mechanism to legitimize the ad impression.

Figure 2 - Basic Impression Laundering Fraud Chain (diagram labels: Originating Site (with Copyright Infringing Streaming Content), Ad Page on Laundering Site (with a dummy image hiding redirect), Ad Page on Laundering Site (with ad call), 1x1 iFrame, Sell-side Ad Server, Buy-side Ad Server, Advertiser Ad Server)

Some may wonder, if Originating Sites can launder the ad impression, why bother hiding the ad creative from the user? When the laundering process is successful, the advertiser believes their ads are running on the Laundering Site (which they perceive as legitimate).
However, if the ad creative were not hidden, it would appear alongside the Originating Site content (the same content that advertisers find objectionable). Originating Sites such as copyright infringement websites are both highly visited and highly scrutinized, for example in the Ad Transparency Report published by the Annenberg Innovation Lab at the University of Southern California, which highlights advertisers and sellers of inventory on copyright infringement websites. It is therefore likely that someone would notice the advertiser appearing next to the content and alert the advertiser that its ads are running somewhere undesirable.

As a result, combining impression laundering with hidden ads is a sophisticated Fraud technique for copyright infringement sites. It makes the Fraud difficult to identify either by human observation (on the Originating Site or Laundering Site) or by machine analytics designed to identify the location in which the ad is being served, and it allows Fraud perpetrators to divert money from both brand and performance marketers.

Why Impression Laundering is used by Sites classified as Copyright Infringement

DoubleVerify has found that sites typically have two characteristics that drive them to participate in impression laundering Fraud:

• Significant consumer interest that drives substantial site traffic, combined with
• Low advertiser interest in appearing on the site because of objectionable content

Copyright infringement sites clearly exhibit the characteristics above. From the origination of Napster in 1999 through the Megaupload saga, it has been clear that a significant percentage of Internet usage will revolve around the sharing of copyrighted material. Most advertisers have chosen to avoid appearing on sites identified as participating in copyright infringement and utilize verification companies or other media planning techniques to identify and avoid these sites.
The high traffic generated by consumer demand, combined with low advertiser acceptance of appearing on copyright infringement sites, sets up the perfect motivation for impression laundering.

How did this Impression Laundering Fraud activity work?

The websites involved in this impression laundering Fraud use highly complex operations and multiple techniques that made the Fraud difficult to uncover. New technology developed in the DoubleVerify Fraud Lab was key in peeling back the layers of this most recent Fraud activity. We believe it is important to provide the detail of a specific example of the laundering process (albeit one of the simplest obfuscation chains we found) and the sites involved, so that the industry is educated about how this occurs and can work together to combat it.

In this Fraud activity, every laundered ad impression began with an actual user visiting an Originating Site. In our example, we used the Originating Site www.icefilms.info, which is a popular site for downloading or streaming copyrighted movies & TV shows. DoubleVerify had previously classified this site as “Copyright Infringement: Downloads” and would block ads for our customers that appeared directly on the site. Upon arriving at the site, we searched for the most recent episode of “Breaking Bad” and arrived at a page on which it can be downloaded or streamed (www.icefilms.info/ip.php?v=157447&). Figure 3 represents how the page looked when we visited it:

Figure 3 - icefilms.info page with Breaking Bad episode

Despite the suspected copyright infringing content on the site, from an advertising perspective the page looks relatively normal and doesn’t raise immediate concern that Fraud is taking place. However, upon closer examination of the code, we identified abnormalities in the “Play Now / Download Now” image in the lower right hand corner of the page and determined that it is not an actual advertisement. It is an image that overlays and hides the ads, preventing the user from viewing them.
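The hiding techniques named in this report (tiny iframes, off-page elements, and ads stacked behind an overlaying image) all reduce to simple geometric conditions on where an ad is rendered. The sketch below is a hypothetical model, not DoubleVerify's detection method: the `Box` type, the `min_side` threshold and the reason labels are assumptions made for illustration, and a real system would have to obtain rendered geometry from the browser.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Rendered rectangle of an element, in page pixels (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int
    z_index: int = 0


def covers(top: Box, bottom: Box) -> bool:
    """True if `top` is stacked above `bottom` and fully contains it."""
    return (
        top.z_index > bottom.z_index
        and top.x <= bottom.x
        and top.y <= bottom.y
        and top.x + top.width >= bottom.x + bottom.width
        and top.y + top.height >= bottom.y + bottom.height
    )


def hidden_reasons(ad: Box, page_width: int, page_height: int,
                   others: List[Box], min_side: int = 2) -> List[str]:
    """List the reasons, if any, why a user could not see the ad."""
    reasons = []
    # Tiny frame: e.g. an iframe 1 pixel wide and tall.
    if ad.width < min_side or ad.height < min_side:
        reasons.append("tiny frame")
    # Off-page: the ad lies entirely outside the page rectangle.
    off_page = (ad.x + ad.width <= 0 or ad.y + ad.height <= 0
                or ad.x >= page_width or ad.y >= page_height)
    if off_page:
        reasons.append("off-page")
    # Covered: another element sits on top of the whole ad.
    if any(covers(other, ad) for other in others):
        reasons.append("covered")
    return reasons


if __name__ == "__main__":
    ad = Box(100, 600, 728, 90, z_index=1)
    dummy_image = Box(100, 600, 728, 90, z_index=2)
    print(hidden_reasons(ad, 1280, 2000, [dummy_image]))
```

In this model, a 728x90 ad placed directly behind a 728x90 image with a higher stacking order is reported as covered, which corresponds to the overlay observed on the example page.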
Figure 4, below, examines the code behind the ad and provides a technical explanation of how the Fraud is perpetrated and the traffic laundered through the seemingly legitimate site, diychef.com.

Figure 4 - Coding on icefilms.info page demonstrating the hiding of ads behind an image.

• (Figure 4 – Step 1) www.icefilms.info opens an iframe (via a series of redirects) to www.diychef.com, which appears to be a legitimate video recipe site. In this example, diychef.com is the Laundering Site involved in the Fraud, which, prior to this breakthrough, had been classified as a site about Food & Drink. Figure 5 shows the home page of diychef.com with the ads redacted.

Figure 5 - Home page of diychef.com (ads redacted)

It is important to note that the code opening the iframe (Figure 4 – Step 1) to the Laundering Site points to a very specific page on the site, diychef.com/ads/o1/728.php?mn=18, which has special characteristics that minimize detection and maximize Fraudulent revenue:

• (Figure 4 – Step 2) The code prevents search engine crawlers from accessing and indexing the page, minimizing exposure, by using the noindex/nofollow meta tag: <meta name="robots" content="noindex, nofollow">.
• (Figure 4 – Step 3) The code refreshes the page every 30 seconds using a second meta tag: <meta content="30;url=728.php?mn=18" http-equiv="refresh">. This ensures a new ad is called every 30 seconds, allowing the Fraudster to rack up ad impressions and generate more income while the user is downloading or viewing the pirated content.
• (Figure 4 – Step 4) The page HTML further obfuscates the process by overlaying another image designed to look like a 728x90 ad (the “Play Now / Download Now” image). This is not an ad; it is a static image overlaying and hiding the ads behind it. This image is coded <img src="../728.gif"> and the full URL is diychef.com/ads/728.gif.
• (Figure 4 – Step 5) Finally, the code opens the ads (behind the dummy image) by creating an iframe to a different URL on diychef.com. This iframe is coded <iframe src="728.html"…</iframe> and the full URL is diychef.com/ads/o1/728.html.

The ads are being served in the last iframe (Figure 4 – Step 5): within a transparent iframe, behind the dummy image, within the Laundering Site (diychef.com) that appears legitimate, rather than directly on the Originating Site (icefilms.info) that the user was visiting in their browser when the ad serving process was initiated.

Figure 6 - Details of page coding for ad serving page

When examining the HTML coding (Figure 6) of the URL displaying the ads (728.html), note how the code includes expansive keywords, title and description of the page despite a notable lack of content. This data presents the page as content-rich and can be consumed by targeting and contextualization services that will use it to match a content-relevant, high-value advertisement to this page on the Laundering Site. An advertiser that is targeting recipes or cooking content may get matched to this page because of this coding and likely would not question appearing on a site with the diychef.com domain. However, in reality, their ads are served to the user visiting icefilms.info behind a dummy image that hides their creative from view.
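The page-level signals described in Steps 2 through 5 (a robots noindex tag, a timed meta refresh, an overlay image and a nested iframe) can be extracted mechanically from page HTML. The following is a minimal sketch using Python's standard `html.parser` module. It is not DoubleVerify's technology: `AdPageSignals`, `scan`, `looks_like_hidden_ad_page`, the 60-second threshold and the `SAMPLE` markup (a well-formed approximation assembled from the snippets quoted in this report, not the actual page source) are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Approximation of the laundering page, built from the report's snippets.
SAMPLE = """
<html><head>
<meta name="robots" content="noindex, nofollow">
<meta content="30;url=728.php?mn=18" http-equiv="refresh">
</head><body>
<img src="../728.gif">
<iframe src="728.html"></iframe>
</body></html>
"""


class AdPageSignals(HTMLParser):
    """Collects the signals discussed in Steps 2-5 from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.refresh_seconds: Optional[int] = None
        self.images: List[str] = []
        self.iframes: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        a: Dict[str, str] = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            content = a.get("content", "")
            if a.get("name", "").lower() == "robots" and "noindex" in content.lower():
                self.noindex = True
            if a.get("http-equiv", "").lower() == "refresh":
                delay = content.split(";", 1)[0].strip()
                if delay.isdigit():
                    self.refresh_seconds = int(delay)
        elif tag == "img":
            self.images.append(a.get("src", ""))
        elif tag == "iframe":
            self.iframes.append(a.get("src", ""))


def scan(html: str) -> AdPageSignals:
    """Parse the HTML and return the collected signals."""
    parser = AdPageSignals()
    parser.feed(html)
    parser.close()
    return parser


def looks_like_hidden_ad_page(s: AdPageSignals, max_refresh: int = 60) -> bool:
    """Heuristic (an assumption): all four signals present on one page."""
    fast_refresh = s.refresh_seconds is not None and s.refresh_seconds <= max_refresh
    return s.noindex and fast_refresh and bool(s.images) and bool(s.iframes)


if __name__ == "__main__":
    signals = scan(SAMPLE)
    print(signals.noindex, signals.refresh_seconds, signals.images, signals.iframes)
    print(looks_like_hidden_ad_page(signals))
```

None of these signals is conclusive on its own, since each has legitimate uses; the sketch only shows that their combination is easy to surface once the true page URL is known. For scale, at a 30-second refresh a hypothetical 45-minute viewing session would reload the ad page about 90 times.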
Figure 7 - Overview of the entire laundering obfuscation process (diagram labels: Originating Site (with Copyright Infringing Streaming Content), Ad Page on Laundering Site (with a dummy image hiding redirect), Ad Page on Laundering Site (with ad call), 1x1 iFrame, Sell-side Ad Server, Buy-side Ad Server, Advertiser Ad Server). The Originating Site opens an iframe to a Laundering Site, which routes the call through several dummy ad pages before sending the ad call to a Sell-side Ad Server, from which a Buy-side Ad Server will purchase the ad, leading to an ad call to the advertiser ad server and causing an ad creative to be “displayed.” To maximize Fraudulent income, ad pages may refresh every 30 seconds, generating additional Fraudulent ad impressions again and again.

Could a Laundering Site simply claim that, in the process of buying cheap traffic for their site, they were unwittingly participating in this Fraud activity? After all, sites like diychef.com appear to be legitimate businesses, have real content, and generally appear to unsuspecting eyes to be running a legitimate ad-driven website. While an error could be possible, the examples we have examined demonstrate that it is unlikely this obfuscation and ad hiding is simply a mistake. The pages receiving traffic and showing ads contradict best practices in consumer-oriented design. They maximize ad impressions by showing multiple or rotating ads while having little to no content outside of the advertisements. It is our opinion that Laundering Sites likely know that pages receiving redirected traffic from the Originating Sites will not be visible to the user, and therefore those pages are coded to maximize advertising revenue derived from Fraud.
How did DoubleVerify identify the Impression Laundering Fraud?

DoubleVerify’s leading ad verification solution relies on our dual-verification (tandem pixel and crawler) technology to penetrate the transparency gap created by the complex online advertising ecosystem. In many cases, the complexity that creates the transparency gap exists for completely legitimate reasons, such as facilitating inventory fluidity and transfer through a variety of intermediaries that sit between impression buyer and seller. However, as the ecosystem has gotten more complex and the intermediary platforms have become more open, these traits have been utilized to initiate increasing amounts of Fraud.

DoubleVerify has recently made some significant proprietary advances in our dual-verification method, allowing us to more accurately identify all the Originating and Laundering Sites in this impression laundering Fraud and how traffic and ad impression serving chains flow between them. These advances, combined with new algorithms created by our data scientists, enabled DoubleVerify to rapidly identify this series of websites, linked together and executing the large-scale impression laundering Fraud described herein.

Protecting Against Fraud

Finding solutions that protect against all types of Fraud is top of mind for many advertisers. DoubleVerify’s verification solutions are leaders in the two critical elements necessary for protection: penetrating the transparency gap that exists in Fraudulent impressions and identifying Fraud participants. As evidenced by the discovery outlined in this report, DoubleVerify has invested significant resources in identifying Fraud participants and classifying those sites accordingly.
This identification of Fraud activity added over 1,200 sites to DoubleVerify’s Fraud categories in our classification taxonomy. It also enables us to identify new sites participating in impression laundering in a timely manner, which is especially important given how easy it is for Fraudulent parties to simply reroute their traffic through new Laundering Sites. These technology advances have also led us to identify other promising leads that we are actively researching to ensure we maintain protection into the future and continue to proactively identify all types of online display advertising Fraud and the participants engaged in it.

Our track record and the level of resources deployed in this area are strong because DoubleVerify is aware that persistent effort is needed to combat the bad actors in the industry. It is easy for them to establish new websites and techniques to try to perpetuate their revenue stream. Customers that work with DoubleVerify can be confident we will be leveraging all of our resources, expertise and systems to continually stay ahead of and identify sites and methods that perpetrate Fraud.

However, identifying sites participating in online advertising Fraud is only half of the battle in preventing it from impacting display campaigns and media spend. In order to effectively combat it, the technology being used must be able to penetrate the transparency gap caused by nested iframes, a gap that is both inherent in the complex ecosystem and exacerbated by laundering activity that deploys highly obfuscated ad serving chains. No matter how accurate and lengthy the list of Fraudulent sites may be, accurate determination of the URL on which the ad appears is necessary to have an effective Fraud prevention solution.
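The gap created by nested iframes can be illustrated with a toy model in which each frame knows only its immediate parent. This is a deliberately simplified sketch and not a description of how DoubleVerify's technology works: the `Frame` type and function names are hypothetical, the `http://` schemes and the `adserver.example` host are placeholders, and in a real browser same-origin restrictions prevent code inside a nested frame from simply reading the URLs of its ancestors, which is part of why the gap exists.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Frame:
    """One document in a chain of nested iframes (hypothetical model)."""
    url: str
    parent: Optional["Frame"] = None


def domain(url: str) -> str:
    """Return the host name of a URL without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def reported_site(ad_frame: Frame) -> str:
    """Domain a naive system sees: the immediate parent of the ad frame."""
    parent = ad_frame.parent or ad_frame
    return domain(parent.url)


def originating_site(ad_frame: Frame) -> str:
    """Domain of the top-level page, found by walking the whole chain."""
    frame = ad_frame
    while frame.parent is not None:
        frame = frame.parent
    return domain(frame.url)


def build_example_chain() -> Frame:
    """Chain from the report's example; schemes and ad host are placeholders."""
    top = Frame("http://www.icefilms.info/ip.php?v=157447&")
    launder = Frame("http://www.diychef.com/ads/o1/728.php?mn=18", top)
    ad_page = Frame("http://www.diychef.com/ads/o1/728.html", launder)
    return Frame("http://adserver.example/ad", ad_page)


if __name__ == "__main__":
    ad = build_example_chain()
    print("reported:", reported_site(ad))
    print("originating:", originating_site(ad))
```

In this model, a decision based on the reported site would treat the impression as running on the Laundering Site, while resolving the full chain attributes it to the Originating Site. A site list alone cannot close that difference.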
Consider for a moment the limitation of Fraud protection services that operate as a pre-bid, targeting-style solution in the RTB environment, and how they address the transparency gap. Pre-bid Fraud identification relies on the URL identified by the RTB platform to determine whether Fraud is occurring or not. Our data scientists recently examined impressions running through major online ad exchanges, and in that data set we discovered that the RTB transparency gap on the pre-bid URL was nearly 30%.

The RTB transparency gap occurs when the URL the ad is displayed on differs from the URL utilized by the exchange for pre-bid targeting, Fraud prevention, blacklist compliance, or any other URL-based data decisions. A primary legitimate reason for this gap is that impressions are passed through so many intermediaries that the original authentic site data simply gets lost in the transactions. A secondary valid reason for this gap is that some exchanges receive blinded traffic from their suppliers to protect the inventory source. In Fraudulent transactions, the Fraudster could deliberately misidentify the URL, with limited or no ability for the exchange to authenticate its veracity.

As a result, pre-bid solutions, whether they use the most accurate list of sites engaged in Fraud or a complicated predictive Fraud score, simply won’t work: they have not overcome the RTB transparency gap and therefore are often making a flawed decision and failing to protect the advertiser.

The only effective method for Fraud prevention is employing a solution that combines accurate Fraud identification with technology that eliminates the transparency gap on each impression served in any complex serving environment. DoubleVerify offers these solutions to marketers through our verification blocking (BrandShield) and reporting (BrandAssure) products.
Our real-time platform integration solution, BrandShield Connect, provides ad networks and exchanges the technology necessary to penetrate the transparency gap and correctly identify the true URL and safety of the ad impression they are selling. All of the DoubleVerify solutions combine our industry-leading Fraud identification with our superior ability to penetrate the transparency gap, the two critical elements that provide a complete and effective solution against online advertising Fraud.

Moving the Industry Forward

A discovery of Fraud as large as this has significant impact for our customers, and it reinforces the significant negative effects that this Fraudulent activity has on the legitimate online display advertising ecosystem. It also confirms the value of vigilance via independent third-party verification continually driving for transparency on ad campaigns.

DoubleVerify wants to thank our agency partners that helped us aggressively validate this Fraud activity and supported the rapid deployment of this new technology into our systems in order to protect our joint advertising customers. For advertisers and suppliers using our technology, we added all 1,220 sites into our recently enhanced content categories, and our systems are currently reporting and blocking impressions from these sites as determined by the campaign settings.

DoubleVerify remains committed to overcoming the transparency gap in online display advertising and eliminating all types of Fraud that diminish the value and credibility of its participants so that together we can build a better industry.

¹ Fraud Definition and Clarifications

• DoubleVerify’s MRC-accredited Verification solutions utilize two content categories to classify sites as Fraud and provide clarity for our customers:
• Confirmed Ad Impression Fraud: Sites with significant direct evidence confirming that Ad Impression Fraud is taking place.
• Suspected Ad Impression Fraud: Sites with significant circumstantial evidence that Ad Impression Fraud is taking place, such as site data, traffic patterns, viewability data and relationships to sites with Confirmed Ad Impression Fraud.
• Ad Impression Fraud manipulates ad serving, ad display or traffic activity such that ad impression measurements are incremented inappropriately because the ads cannot be seen by a user, are not served within operationally viewable parameters or were displayed as a result of machine-generated traffic.
• For the purposes of this report, any description of Fraud means Ad Impression Fraud. Sites described as Fraudulent means those in this study that we have classified into the “Confirmed Ad Impression Fraud” or “Suspected Ad Impression Fraud” contextual categories. Fraudulent activity or Fraudulent impressions means the advertising impressions on Fraudulent Sites.
• As used by the IAB Verification Guidelines, DoubleVerify contextual category definitions, this release, and related posts and reports, the words Fraud, Fraud Site, Ad Impression Fraud, Fraud activity, Laundering, Laundering Site and other related terms are not intended to represent Fraud or Laundering as defined in various laws, statutes and ordinances or as conventionally used in U.S. Court or other legal proceedings, but rather a custom definition strictly for advertising measurement purposes.
• The statements made in this report regarding Fraud all relate to DoubleVerify’s observations and data collection during April 2013. These statements shall not be construed to imply information or conclusions regarding activity that took place outside of the studied period.

Let’s build a better industry. Contact info@doubleverify.com. Visit us at doubleverify.com to learn more.