Google today announced an easier way for publishers to keep their sites out of Google News.
Publishers: add a few lines to your robots.txt file, and you won’t have to be bothered with the extra traffic coming from Google News.
Will any publishers actually do this? We doubt it.
While traffic from Google News might be worth less than direct visits to the site, it’s not as if those hits cost anything extra to attract.
Google Blog: Today we are announcing a new user agent for robots.txt called Googlebot-News that gives publishers even more control over their content. In case you haven’t heard of robots.txt, it’s a web-wide standard that has been in use since 1994 and which has support from all major search engines and well-behaved “robots” that process the web. When a search engine checks whether it has permission to crawl and index a web page, the “check if we’re allowed to crawl this page” mechanism is robots.txt.
Publishers could easily contact us via a form if they didn’t want to be included in Google News but did want to be in Google’s web search index. Now, publishers can manage their content in Google News in an even more automated way. Site owners can just add Googlebot-News specific directives to their robots.txt file. Similar to the Googlebot and Googlebot-Image user agents, the new Googlebot-News user agent can be used to specify which pages of a website should be crawled and ultimately appear in Google News.
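For readers wondering what such directives look like in practice, here is a rough sketch. The robots.txt content, the example.com host, and the may_crawl helper are our own illustrative assumptions, not taken from Google's announcement. The check uses the robots.txt parser in Python's standard library, which only approximates how Google's crawlers interpret rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only the Google News crawler,
# and leave every other crawler (including regular Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters for the rules.
    return parser.can_fetch(user_agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot-News", "Googlebot"):
        print(agent, may_crawl(agent, "/2009/12/story.html"))
```

With these sample rules, the standard-library parser reports the page as off limits to Googlebot-News and open to Googlebot, which is the "out of Google News, still in web search" arrangement the announcement describes.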