Monday, October 6, 2008

Comment Spammers

The first arrivals drifted in innocuously, like the stray seagulls lurking during the early minutes of Hitchcock's The Birds. They were mostly phrased as salutations and praise, posted to the reader comments area of many popular blogs. Harmless notes, at face value, but they harbored a secret menace.

The bloggers hit with these strange messages were victims of an insidious new species, now called "comment spam." But this was a strange sort of spam: Why would someone go to the trouble of spamming thousands of blog pages to deliver only glad tidings and hollow compliments?

The answer, oddly enough, is that the spammers weren't trying to win the attention of the bloggers or their readerships. They were trying to win the attention of Google, like the high school bully beating up the class nerd to impress the homecoming queen. The nerd feels violated, but the truth is that it isn't really about him at all.

The spammers were exploiting the fact that open comment forums on blogs let anyone post HTML on someone else's Web pages for free. And not just any Web pages: These are heavily valued by Google's PageRank algorithm, thanks to the chronic interlinking of the blogosphere. If you could convince one of those bloggers to link to your new site, you'd have instant credibility. And if you could persuade dozens of bloggers to place links, you'd be an overnight PageRank sensation. Because so much of the Web's traffic now funnels through Google's search engine, that higher ranking translates directly into more "customers" for the spammer.
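To make the incentive concrete, here is a minimal sketch of the textbook form of PageRank (power iteration with a damping factor of 0.85). It is not Google's production algorithm, and the page names and link graph are invented for illustration. It compares a new site's score before and after three interlinked blogs each gain a link to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new[p] += share
        rank = new
    return rank


# Hypothetical graph: three blogs that link to one another, plus a new site
# that nobody links to yet.
before = pagerank({
    "blog_a": ["blog_b", "blog_c"],
    "blog_b": ["blog_a", "blog_c"],
    "blog_c": ["blog_a", "blog_b"],
    "newsite": [],
})

# The same graph after a link to the new site appears on every blog.
after = pagerank({
    "blog_a": ["blog_b", "blog_c", "newsite"],
    "blog_b": ["blog_a", "blog_c", "newsite"],
    "blog_c": ["blog_a", "blog_b", "newsite"],
    "newsite": [],
})

print(before["newsite"], after["newsite"])
```

In this toy graph the new site's score rises sharply once the blogs link to it, which is the effect the spammers were after.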

The (evil) genius of the comment spammers came when they realized that actually persuading the bloggers was unnecessary. All you have to do is post some benign text in the comment field and include a URL for your gambling site or Viagra emporium in discreet HTML. It doesn't hurt that some popular blog formats - notably Movable Type - have a standardized URL for posting comments, making it much easier to automate spam creation. The ultimate goal, of course, is to win the PageRank arms race against your competitors so the next time some hopeful soul types "penis enlargement" into Google, your site will arrive at the top of the list, having been validated by the sudden flood of links from the blogging community.

Comment spam proliferated throughout the blog sites with amazing speed. One blogger had 120 posts spammed over four days. Thoughtful discussion spaces were besieged by meaningless posts, sometimes in broken English, sometimes with bizarre keywords inserted into otherwise prosaic comments in a sort of spammer Tourette's: "I greatly appreciate atkins diet the comments here."

The bloggers, many of them skilled technicians angered by the incursions, began to fight back. Within weeks of the comment spam explosion of late 2003, the blogosphere had strategies for stopping or neutralizing the invading hordes, most notably a plug-in created by blogger Jay Allen that blocks all comments that include text culled from an ever-expanding blacklist of spammers. In January, Movable Type's creators released a special update that contained fixes designed to thwart comment spam, including one that makes URLs posted in comment threads invisible to PageRank.
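The blacklist idea is simple enough to sketch in a few lines. What follows is not Jay Allen's plug-in, only an illustration of the general technique. The pattern list and the `is_spam` function are hypothetical, and a real blacklist would be far longer and updated constantly.

```python
import re

# Hypothetical blacklist entries, written as regular expressions.
BLACKLIST = [
    r"atkins diet",
    r"viagra",
    r"penis enlargement",
]

# Compile once, ignoring case, so each comment check is a quick scan.
_PATTERNS = [re.compile(entry, re.IGNORECASE) for entry in BLACKLIST]


def is_spam(comment):
    """Return True if any blacklisted pattern appears anywhere in the comment."""
    return any(pattern.search(comment) for pattern in _PATTERNS)


print(is_spam("I greatly appreciate atkins diet the comments here."))
print(is_spam("Thoughtful post, thanks for writing it."))
```

The weakness of the approach is built into the phrase "ever-expanding": the filter only catches what the list already knows about, so the list has to grow as fast as the spammers invent new keywords and URLs.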

In a way, the rise of comment spam confirms what many of us have felt for years: Google has become part of the Web's infrastructure, as central as HTTP or packet switching. The centrality has been a boon: PageRank has let us feel that information we seek is at our fingertips. But in a kind of dialectical progression, that very success has bred its own antithesis. Build a universe where linking determines relevance and where relevance leads to financial reward, and sooner or later people link in bad faith. Can PageRank learn to tell the difference?
