Working together to filter automated data-center traffic
Today the Trustworthy Accountability Group (TAG) announced a new pilot blacklist to protect advertisers across the industry. This blacklist comprises data-center IP addresses associated with non-human ad requests. We're happy to support this effort along with other industry leaders—Dstillery, Facebook, MediaMath, Quantcast, Rubicon Project, The Trade Desk, TubeMogul and Yahoo—and contribute our own data-center blacklist. As mentioned to Ad Age and in our recent call to action, we believe that if we work together we can raise the fraud-fighting bar for the whole industry.
Data-center traffic is one of many types of non-human or illegitimate ad traffic. The newly shared blacklist identifies web robots or “bots” that are being run in data centers but that avoid detection by the IAB/ABC International Spiders & Bots List. Well-behaved bots announce that they're bots as they surf the web by including a bot identifier in their declared User-Agent strings. The bots filtered by this new blacklist are different. They masquerade as human visitors by using User-Agent strings that are indistinguishable from those of typical web browsers.
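The distinction above can be sketched in code. This is a minimal, hypothetical illustration of the two complementary filters: checking declared User-Agent strings against a bots list, and checking the request's source IP against a data-center blacklist. The token set and IP range below are placeholders (the IP range is an RFC 5737 documentation block), not the real IAB/ABC list or TAG blacklist.

```python
# Hypothetical sketch of two complementary traffic filters.
# BOT_UA_TOKENS stands in for the IAB/ABC International Spiders & Bots List;
# DATACENTER_IPS stands in for the shared data-center blacklist.
# Both are illustrative placeholders, not the real lists.
import ipaddress

BOT_UA_TOKENS = {"googlebot", "bingbot", "crawler", "spider"}
DATACENTER_IPS = {ipaddress.ip_network("203.0.113.0/24")}  # RFC 5737 example range

def is_declared_bot(user_agent: str) -> bool:
    """Well-behaved bots announce themselves in the User-Agent string."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_UA_TOKENS)

def is_datacenter_ip(ip: str) -> bool:
    """Bots masquerading as browsers can still be caught by source IP."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_IPS)

def should_filter(user_agent: str, ip: str) -> bool:
    # A request is filtered if either signal fires: a declared bot UA,
    # or an origin inside a known data-center range.
    return is_declared_bot(user_agent) or is_datacenter_ip(ip)
```

A request with a browser-like User-Agent slips past the first check entirely, which is why the IP-based blacklist is needed as a second line of defense.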
In this post, we take a closer look at a few examples of data-center traffic to show why it’s so important to filter this traffic across the industry.
Impact of the data-center blacklist
When we examined the traffic generated by the IP addresses on the newly shared blacklist, we found significantly distorted click metrics. On DoubleClick Campaign Manager alone, the blacklist filtered 8.9% of all clicks recorded in May 2015. Had these clicks not been filtered from campaign metrics, advertisers' click-through rates would have been inflated, and for some advertisers the error would have been very large.
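A quick worked example shows how an 8.9% bot-click share inflates click-through rates. The impression and click counts below are made up for illustration, and the sketch simplifies by assuming impression counts are unaffected by filtering (in practice bots generate impressions too); only the 8.9% figure comes from the post.

```python
# Illustrative numbers only: how filtering bot clicks changes measured CTR.
# Simplifying assumption: impressions are unchanged by filtering.
impressions = 1_000_000
clicks_total = 2_000
bot_click_share = 0.089   # 8.9% of clicks came from blacklisted data-center IPs

ctr_unfiltered = clicks_total / impressions
clicks_human = clicks_total * (1 - bot_click_share)
ctr_filtered = clicks_human / impressions

# Relative inflation of the unfiltered CTR over the true (filtered) CTR.
inflation = ctr_unfiltered / ctr_filtered - 1

print(f"unfiltered {ctr_unfiltered:.4%}, filtered {ctr_filtered:.4%}, "
      f"inflated by {inflation:.1%}")
```

Under these assumptions, removing 8.9% of clicks means the unfiltered CTR overstates the true rate by roughly 0.089 / 0.911, i.e. close to 10%, regardless of the absolute click volume.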
[Plot: inflation of May click-through rates across the most impacted of DoubleClick Campaign Manager's larger advertisers.]
An example of fraudulent traffic
Among the destinations of this blacklisted traffic, we found nine webpages with hidden ad slots, meaning that not only was the traffic fake, but the ads could not have been seen even if the visitors had been real people. The page http://vedgre.com/7/gg.html is illustrative of these nine webpages. It has no visible content other than a single 300×250px ad. That visible ad actually sits inside a 300×250px iframe that includes two ads, the second of which is hidden. The page also contains twenty-seven 0×0px hidden iframes, each of which includes two ad slots. In total, the page carries fifty-five hidden ads and one visible ad. Finally, the ads served on http://vedgre.com/7/gg.html appear to advertisers as though they had been served on legitimate websites like indiatimes.com, scotsman.com, autotrader.co.uk, allrecipes.com, dictionary.com and nypost.com, because the tags used on the page to request the ad creatives have been deliberately spoofed.
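The ad-slot arithmetic for this page is easy to verify. The sketch below simply models the iframe layout described above (one visible 300×250px iframe plus twenty-seven hidden 0×0px iframes, two ad slots each) and counts the slots; the counts are taken from the post, not measured independently.

```python
# Model of the ad-slot layout described for http://vedgre.com/7/gg.html
# (all counts taken from the post).
visible_iframes = 1        # one 300x250px iframe that is actually rendered
hidden_iframes = 27        # 0x0px iframes, invisible to any visitor
ads_per_iframe = 2         # every iframe requests two ads

total_ads = (visible_iframes + hidden_iframes) * ads_per_iframe
visible_ads = 1            # only the first ad in the visible iframe shows
hidden_ads = total_ads - visible_ads

print(total_ads, visible_ads, hidden_ads)  # 56 1 55
```

So of the fifty-six ads the page requests, fifty-five can never be seen, yet every one of them is billed as a served impression.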
An example of collateral damage
Not all automated data-center traffic is generated for malicious purposes, yet it can still impact advertising campaigns. An interesting example is an advertising competitive-intelligence company that generates a large volume of undeclared non-human traffic.
This company uses bots to scrape the web to find out which ad creatives are being served on which websites and at what scale. The company’s scrapers also click ad creatives to analyze the landing page destinations. To provide its clients with the most accurate possible intelligence, the scrapers operate at extraordinary scale, and they do so without including bot identifiers in their User-Agent strings.
While the aim of this company is not to cause advertisers to pay for fake traffic, its scrapers do waste advertiser spend. They not only generate non-human impressions; they also distort the metrics that advertisers use to evaluate campaign performance—in particular, click metrics. Across DoubleClick Campaign Manager, this company’s scrapers were responsible for 65% of the automated data-center clicks recorded in the month of May.
Going forward
Google has always invested in preventing this and other types of invalid traffic from entering our ad platforms. By contributing our data-center blacklist to TAG, we hope to help others in the industry protect themselves.
We’re excited by the collaborative spirit we’ve seen working with other industry leaders on this initiative. This is an important, early step toward tackling fraudulent and illegitimate inventory across the industry and we look forward to sharing more in the future. By pooling our collective efforts and working with industry bodies, we can create strong defenses against those looking to take advantage of our ecosystem. We look forward to working with the TAG Anti-fraud working group to turn this pilot program into an industry-wide tool.
Vegard Johnsen, Product Manager, Google Ads Traffic Quality