Photo: Jeremy Zawodny (@jzawodn)
Everyone’s mad at Craigslist again! Here’s what happened: Craigslist doesn’t like it when other websites use data from its listings. So it blocks them. A company called 3Taps figured out how to get around that by picking up snippets of Craigslist listings from Google and other search engines.
3Taps claimed yesterday that Craigslist had instructed all search engines to stop indexing it, The Verge noted.
Here’s the thing: The normal mechanism websites use to block search engines is a file called robots.txt.
We just checked Craigslist’s robots.txt file, and it only seems to block specific parts of the site, not all listings. It’s also unchanged from last year, and no one was claiming Craigslist was blocking search engines back then.
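To see how a partial robots.txt block works in practice, here’s a minimal sketch using Python’s standard `urllib.robotparser` module. The file contents and URLs below are hypothetical, not Craigslist’s actual rules; the point is that disallowing a few paths still leaves the rest of the site open to crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only specific paths,
# leaving listing pages crawlable -- the pattern described above.
robots_txt = """\
User-agent: *
Disallow: /reply
Disallow: /forums
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A listing URL is still allowed to crawlers...
print(parser.can_fetch("*", "https://example.org/apa/12345.html"))  # True
# ...while a disallowed path is blocked.
print(parser.can_fetch("*", "https://example.org/reply/abc"))       # False
```

A site that wanted to shut out search engines entirely would instead use `Disallow: /`, which is the kind of change that would have shown up in the file.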
And sure enough, Google displays recent listings from Craigslist right now.
Here’s another theory: Last Thursday, Craigslist engineer Jeremy Zawodny told people on Twitter that he was working on a Web crawler—the same kind of automated software that search engines use to index sites like Craigslist. He went on to explain that he was working on making the site “more searchable.”
On Friday, he noted that he was having to rate-limit his software to prevent it from tripping “throttles”—mechanisms Craigslist uses to prevent Web crawlers from overwhelming the site with requests. (Unlike a robots.txt block, throttling would only have a temporary effect.)
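Rate-limiting a crawler to stay under a server’s throttle is usually just a matter of spacing out requests. Here’s a minimal sketch of the idea; the function name and the stand-in `fetch` callback are illustrative, not anything from Craigslist’s or Zawodny’s actual code.

```python
import time

def rate_limited_fetch(urls, fetch, min_interval=1.0):
    """Fetch each URL in order, sleeping so that requests are at
    least `min_interval` seconds apart -- a simple way for a
    crawler to avoid tripping a server's request throttle."""
    results = []
    last_request = 0.0  # monotonic timestamp of the previous request
    for url in urls:
        wait = min_interval - (time.monotonic() - last_request)
        if wait > 0:
            time.sleep(wait)
        last_request = time.monotonic()
        results.append(fetch(url))
    return results

# Example with a stub fetcher instead of real HTTP requests:
pages = rate_limited_fetch(["/a", "/b"], lambda u: f"fetched {u}",
                           min_interval=0.1)
print(pages)  # ['fetched /a', 'fetched /b']
```

Real crawlers typically go further, backing off exponentially when the server starts returning errors, but the basic mechanism is this fixed delay between requests.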
So here’s a theory—and we stress that this is just a theory: Zawodny’s crawler may have put enough of an extra burden on Craigslist’s own servers that it stopped serving results to Google and other search engines trying to index it. That would have resulted in 3Taps being unable to scrape Craigslist listings—but accidentally, not deliberately.
We asked Craigslist PR for comment and haven’t heard back yet.
Zawodny appears to have been so heads-down in coding that he wasn’t aware of 3Taps’ complaint.
“What reports?” he replied on Twitter when we inquired.
After we filled him in, Zawodny pooh-poohed our notion—but he did concede that his Web crawler got blocked until he made some changes.
So it may just be a crazy coincidence, but it does confirm that Craigslist sometimes blocks search engines to keep its site from getting overloaded. What really happened? Even a top Craigslist employee doesn’t seem to know.