- Google just reshuffled the deck. Google didn’t manually remove any sites from its index. The search spammers are still in results, just in different places.
- Everybody’s doing it. SEO companies now hire teams of Web surfers to post fake comments with links back to a customer’s Web site. Because Google doesn’t do a good job of filtering out this kind of trick, companies now have to play this kind of game to get on the first page of results.
- It’s getting worse. The problem is getting worse “at an accelerating rate” because people have found out how easy it is to make money by putting up an SEO-optimised page with no information on it, then littering it with ads from ad networks.
Blekko takes a different approach — it indexes the entire Web like Google and Bing, then has curators and users index useful sites and mark bad ones as spam. The really bad sites are kicked out of results entirely.
Here’s an edited transcript of our conversation:
Business Insider: Google’s recent changes to its search algorithm seem to have improved results, don’t you think?
Rich Skrenta: Google’s latest change got a lot of attention, but it wasn’t really a substantial change. They didn’t take anybody out of the index. They simply adjusted the rankings. Mahalo went from #3 to #7 on the first page. eHow actually rose. Numbers one and two traded places. All the same sites are still there.
BI: So how do companies trick Google into appearing higher in results?
RS: There are companies very specifically focused on SEO and link building. You can outsource your search strategy to a company in India, and get 1,000 people in India sitting shoulder to shoulder in cubes posting comments on different blogs with links back. It’s basically human astroturfing, right? It’s human-authored content, but the person will read the post very quickly, then post a comment with a link back to bestnikeshoes.com or something. It’s sort of appalling.
The really sad thing is, if you want to rank in Google, you HAVE to do this. Even if you’re legitimate. That’s what rises in Google search order. If you don’t do this, you won’t be on page one of results.
BI: So how does Blekko do it? You crawl the entire Web like Google, right?
RS: We have a full Web crawl. Come to Blekko, type /date and you can see the sites going in in real time. Our whole crawl is about 3 billion pages. It’s not as big as Google or Bing’s index, but we biased it to what we think are the best pages.
BI: But how do you determine the best pages?
RS: On the positive side, we’re using the Wikipedia model. We have curated slashtags, which are collections of sites related to a topic. For instance, /health has top health sites. We edited that.
We’re also asking members of the public to help us edit them, and to make their own slashtags. A user created one called /colleges and put every college in the U.S. in there. We thought it was great. Users have made 100,000 slashtags since our launch on November 1.
So if you come to Blekko and do a health query like “cure for headaches” or “avoiding swine flu”, we will automatically turn on /health, and push your results from those high quality health sites, like mayoclinic.com, with content written by doctors.
On Google, you’re getting content farms like eHow and WikiAnswers. The answers are not written by MDs, they’re written by people who don’t know anything about the topic at all, people paid 5 cents to write a 100-word article.
BI: How do you get rid of those sites?
RS: On the other side, we have a little “spam” link under each result. You can mark a site as spam and it will take that site out of results for all searches going forward.
We recently used the spam links and tallied up the top 20 most hated sites on the Internet. eHow is actually number three. Experts Exchange is number one. You search on a technical question and it looks like they have the answer. Then you click through and get a pop-up asking you to pay $30.
So we banned it.
But 20 sites is a drop in the ocean. So two weeks ago, we rolled out an algorithmic identification of spam. We looked for the intersection of low-value content and aggressive participation in online ad networks. So if there are a bunch of ads on the page, or a bunch of sites all running the same ads with the same IDs, and the content doesn’t look good to our classifiers, we put you on the list. We got rid of over 1.1 million sites that way.
These aren’t headscratchers. With eHow you can have a debate about whether they’re valuable or not, but you look at this stuff, it’s “oh my god, it’s obviously junk.” The quantity was astonishing to us, so we blew them all out of our index, and we’re very happy with that.
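The rule Skrenta describes, low-value content combined with heavy ad-network use, can be sketched in a few lines of Python. This is only an illustration and not Blekko's actual classifier: the `Site` fields, the `find_spam` function, the example domains, and every threshold are hypothetical, and the quality score is assumed to come from some separate content classifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    domain: str
    quality_score: float  # hypothetical classifier output in [0, 1]; higher is better
    ad_count: int         # ad units found on a sampled page
    ad_ids: frozenset     # ad-network publisher IDs seen on the site


def find_spam(sites, min_quality=0.3, max_ads=8, shared_id_threshold=3):
    """Return domains with low-value content AND aggressive ad usage.

    All thresholds are made-up defaults chosen for illustration.
    """
    # Count how many distinct sites run ads under each publisher ID.
    sites_per_ad_id = defaultdict(set)
    for site in sites:
        for ad_id in site.ad_ids:
            sites_per_ad_id[ad_id].add(site.domain)

    flagged = []
    for site in sites:
        low_value = site.quality_score < min_quality
        heavy_ads = site.ad_count > max_ads
        shared_id = any(
            len(sites_per_ad_id[ad_id]) >= shared_id_threshold
            for ad_id in site.ad_ids
        )
        # Flag only the intersection: poor content plus an ad signal.
        if low_value and (heavy_ads or shared_id):
            flagged.append(site.domain)
    return flagged


sites = [
    Site("good-health.example", 0.9, 2, frozenset({"pub-1"})),
    Site("junk-a.example", 0.1, 3, frozenset({"pub-9"})),
    Site("junk-b.example", 0.2, 3, frozenset({"pub-9"})),
    Site("junk-c.example", 0.1, 3, frozenset({"pub-9"})),
    Site("adheavy.example", 0.2, 15, frozenset({"pub-5"})),
    Site("thin-but-clean.example", 0.2, 1, frozenset({"pub-7"})),
]
flagged = find_spam(sites)
print(flagged)
```

In this sample, the three sites sharing one publisher ID and the ad-heavy site are flagged, while the thin site with few ads is left alone, because weak content by itself is not enough to trigger the rule.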
BI: So how is Blekko doing? Are you getting traffic?
RS: The last announcement we put out said we’re over 1 million queries per day. [Ed: by way of comparison, Google gets more than 1 billion.]
We’re actually getting hit with a lot of traffic, 30 to 50 requests per second. It’s not all human. There are some bots and scrapers that copy all of our data. We go through our logs and try to throw out that stuff and count what’s left. So it’s about 1 million actual queries per day.
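As a rough check on those figures, the per-second rate can be converted to a daily total and compared with the 1 million human queries. This sketch assumes the 30 to 50 per second rate holds around the clock, which the interview does not state; the function and variable names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_day(per_second):
    """Daily total if the per-second rate were sustained for 24 hours (an assumption)."""
    return per_second * SECONDS_PER_DAY


raw_low = requests_per_day(30)
raw_high = requests_per_day(50)
human_queries = 1_000_000  # the filtered figure quoted in the interview

# Share of raw traffic that survives the bot and scraper filtering.
share_low = human_queries / raw_high
share_high = human_queries / raw_low

print(f"{raw_low:,} to {raw_high:,} raw requests per day")
print(f"human share: {share_low:.0%} to {share_high:.0%}")
```

Under that assumption, raw traffic would be about 2.6 million to 4.3 million requests per day, so the 1 million human queries would be somewhere between a quarter and two fifths of the total.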
BI: It seems like you’ll never match deep-pocketed companies like Google or Bing, so what’s the endgame? How do you turn this into a profitable business?
RS: Search is one of the best businesses on the Web. Search traffic monetizes really well. It can be 50 to 100 bucks per 1,000 page views (CPM).
When I was the CEO of Topix, a local news aggregator, we were doing pretty well monetizing local news pages, about three or four bucks per thousand. We were getting one dollar even on forums, and those ads are generally worthless.
If you go to a social network like Facebook, CPM can be 15 cents. People aren’t in buy mode. They don’t possess intent to buy. But if you go to a search engine and type in “Palo Alto Real Estate,” you’ve declared your intention to buy a house in Palo Alto. So real estate agents buy as many clicks as they can possibly get.
Our intent is to put search ads on the site. We can be a profitable company with far less traffic than if we were running a local news site. $50 is much higher than $3.
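The CPM comparison is simple arithmetic, and a short Python sketch makes the gap concrete. The CPM values are the ones quoted in the interview (the low end of the search range, the low end of the local news range, and the Facebook figure). The 1 million page views and the function names are hypothetical, chosen only for illustration.

```python
def revenue(page_views, cpm):
    """Ad revenue in dollars: CPM is dollars per 1,000 page views."""
    return page_views / 1000 * cpm


def page_views_needed(target_revenue, cpm):
    """Page views required to earn target_revenue at a given CPM."""
    return target_revenue / cpm * 1000


LOCAL_NEWS_CPM = 3.0   # "three or four bucks per thousand"
SEARCH_CPM = 50.0      # low end of "50 to 100 bucks"
SOCIAL_CPM = 0.15      # "CPM can be 15 cents"

news_views = 1_000_000  # hypothetical traffic for a local news site
news_revenue = revenue(news_views, LOCAL_NEWS_CPM)

# Traffic other kinds of sites would need to earn the same amount.
search_views = page_views_needed(news_revenue, SEARCH_CPM)
social_views = page_views_needed(news_revenue, SOCIAL_CPM)

print(f"local news: {news_views:,} views -> ${news_revenue:,.0f}")
print(f"search needs about {search_views:,.0f} views for the same revenue")
print(f"social needs about {social_views:,.0f} views for the same revenue")
```

At these rates a search engine needs only 3/50, or 6 percent, of the local news site's page views to earn the same revenue, while a social network would need about 20 times as many.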
BI: And you won’t do an AdWords-type network, right?
RS: Any self-serve ad system is an ATM machine. The cost of putting a URL on the Web is zero, maybe a few cents if you want human-authored content. It’s just a drift net for catching searches. That’s why most of the pages on the Web at this point are just junk, and it’s getting worse at an accelerating rate.