Last week, an entrepreneur and writer named Marco Arment trashed us in a blog post, which was then picked up and forwarded around the Twitter-sphere.
Marco accused us of lots of horrible things, including “scraping” his content without permission and making it look like he wrote for us. He also complained that we didn’t send enough visitors to his web site.
Well, when we read Marco’s post, we were initially confused, because we actually don’t do what he said we did, at least not deliberately.
So we launched an investigation of sorts. And, eventually, we figured out what happened.
We thought that, by now, Marco’s frustration was last week’s news, so we weren’t going to take your time explaining what happened. But then Felix Salmon of Reuters said he wanted to write about it and asked us a bunch of intelligent questions. So we figured we might as well go ahead and explain.
Before we begin, though, we should say that one reason we didn’t explain is that it will take a lot of time to do so (thus the several thousand words below).
In case you’re stretched for time, therefore, here’s the bottom line:
- We don’t “scrape” content, at least not in the way Marco thinks we do. (For an automated feature of the site called “The Tape,” we publish headlines from RSS feeds that are directly linked to the publishers’ sites. We don’t include any publisher or writer in The Tape who doesn’t want to be there.)
- We don’t intentionally “make it look like” writers like Marco write for us
- We don’t create pages with short snippets of content from third-party sites to “game Google.” (Right now, we DO create such pages, because our publishing system needs to create a page to direct-link a headline to the story on the writer’s site, but these pages include links to the original source and therefore don’t help us with SEO.)
- We publish stories from hundreds of outside contributors, with permission and links back, and we make it clear that these stories are written by outside contributors
- We help our contributors reach a much larger and different audience than they would by just publishing on their own sites, thus increasing the impact and influence of what they write (and giving our readers more awesome stuff to read)
- We link directly to stories on tens of thousands of sites, including, in the past, Marco’s
- In aggregate, we send LOTS of readers to these sites—nearly 1 million in August.
- We often test different page layouts, features, and services—which we then modify, adopt, or kill. (Marco encountered one of these last week and assumed, logically, that it was permanent).
- Our contributors and direct-links to third-party sites are only part of what we do: We now have an editorial staff of more than 30 full-time folks who follow news from hundreds of thousands of sources and create original stories and features based on it.
- We allow anyone, including our competitors, to publish up to one of our original stories per day (in full), as long as the story is properly attributed and includes links back to our site and other headlines. This is what we do with contributors’ content, so we think it’s only fair that we allow our contributors to do the same thing with our content.
- We encourage anyone, including our competitors, to “over-aggregate” any story we write, as long as our work is properly attributed. Over the past month, a new perceived sin has emerged in online journalism, one that Marco actually didn’t complain about: “Over-aggregation.” I’ll follow up with some separate thoughts on this. For now, let me say that our attribution policy internally is to “credit and attribute others the way you want to be credited and attributed yourself,” which usually means naming the writer and publication and including links back to the original story. As long as we are credited in this way, we’re actually not really concerned whether all the readers on the other site click through to the original story. We hope some do, of course, but we’re mostly honored and grateful to the writer for highlighting our story and us, and we’re happy to help the writer’s publication get some additional readers as a thank-you. (In the old days, traditional publications used to hire PR firms to get other publications to write about their stories. We’re thankful that the world has progressed and publications like ours no longer need to do that).
- We will never link to anyone who doesn’t want us to link to them. (We don’t know of any of these folks, but if they’re out there, and tell us not to link to them, we won’t).
- We have not yet figured out the perfect way to seamlessly combine external linking, syndicated content, and proprietary content in a way that is always a win-win for everyone involved—but we’re getting much closer to doing so. One thing we are sure of is that, in a world with a million sources of information and awesome content creators only a link away, it would be senseless to create a “walled garden” in which the only content we bring to our readers’ attention is content our employees have created. Our goal is to get closer to this “win-win for everyone—readers, contributors, and external sources” every day.
And with that “pre-amble” out of the way, here are the details on Marco’s complaints. (Grab a cup of coffee—it’s going to take a while).
In his post, Marco included the following screenshot of one of our pages. Marco concluded from this page that we had taken a lot of his content without his permission and made it look like he wrote for us—and also that we totally cluttered the page with ads.
There are a few different things to explain here, so we’ll take them one by one. But first, here’s the screenshot Marco showed:
First, the page Marco showed wasn’t created as a page that we ever expected any of our readers to see. It actually was created to get our publishing system to link directly to the original story on Marco’s site.
In this particular case, the “direct link” we set up was from the “ReadMe” box on the right rail of our site. Earlier this year, we created this box, which we often use to link out to stories that we think our readers should read. It looks like this:
Sometimes in the ReadMe box, we include stories that we have written or syndicated on our site, in which case a click on the headline leads you to a page on our site. Other times, as in Marco’s case, a click on the ReadMe headline takes you directly to the story on the author’s site. We created this box because there are a lot of great writers and articles out there, and we wanted an easy way to bring them to our readers’ attention and send our readers to them with a click.
Anyway, the way the ReadMe box works technically is that we create a post in our system with the headline and author information, and then we tell the ReadMe box to either link to that page or link directly to a third-party site. (This was easier technically than creating a separate headline, sub-headline, and author system just for the ReadMe box). To put something into the box, in other words, we need to have a page with the headline and sub-head and author on our site, even if the page will never be seen by our readers. So, for third-party sites, we manually create “stub” pages like the one Marco showed, and then feature them in the ReadMe box with a direct-link to the author’s site. Again, these pages aren’t designed to be read by our readers—they’re just designed to give the ReadMe box an author, photo, headline, and sub-headline with which to present the story.
The reader who sees the story in the ReadMe box, in other words, never sees the page that Marco showed in his post. This is why the page Marco showed has no “clicks” on it. (See the little “0” next to the flame icon in the by-line? That means that the page had never been loaded until Marco loaded it.) The page’s only “job,” as it were, is to tell the ReadMe box what headline and sub-headline and author to show, and then where to send any reader who clicks on it.
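For the technically curious, the logic described above can be sketched roughly like this in Python. This is an illustration only, not our actual publishing code: the `Post` fields, the `readme_entry` function, and the example URLs are all made up for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """A post in the publishing system. All field names are hypothetical."""
    headline: str
    sub_headline: str
    author: str
    internal_url: str                  # page on our own site (the stub, for third-party stories)
    direct_link: Optional[str] = None  # set only when the story lives on another site


def readme_entry(post: Post) -> dict:
    """Build what the ReadMe box displays and where a click should go."""
    # A direct link, when present, wins: the reader skips the stub page entirely.
    target = post.direct_link if post.direct_link else post.internal_url
    return {
        "headline": post.headline,
        "sub_headline": post.sub_headline,
        "author": post.author,
        "href": target,
    }


# A stub for a third-party story, and a story hosted on our own site.
stub = Post(
    headline="Example headline",
    sub_headline="Example sub-headline",
    author="Example Author",
    internal_url="/example-stub-page",
    direct_link="https://example.org/original-story",
)
own_story = Post(
    headline="Our own story",
    sub_headline="Written in-house",
    author="Staff Writer",
    internal_url="/our-own-story",
)

print(readme_entry(stub)["href"])       # https://example.org/original-story
print(readme_entry(own_story)["href"])  # /our-own-story
```

In this sketch the stub page still exists at its internal URL, which is exactly why someone can stumble onto it, but the box itself never points readers there.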
So why did we include three paragraphs of Marco’s text on the page if we never expected anyone to see it? The editor who runs the ReadMe box for SAI, Ellis Hamburger, says he generally creates stub-pages by including one or two paragraphs of content and a link. The three-paragraph portion of Marco’s original post that Ellis took appeared long because it was a three-paragraph block quote Marco had himself used. In that ReadMe post, Ellis only used two sentences of Marco’s own words.
On a side note, it’s worth mentioning that Marco makes his content available for free to anyone who wants to syndicate it using a Creative Commons licence, so Ellis is technically allowed to take a substantial amount of it, even though he didn’t.
So, bottom line, the page that Marco included was not created for our readers. It was created to send our readers directly to Marco’s story on his blog through our ReadMe box.
How We Treat “Contributor” Content vs. Proprietary Content
This, by the way, is also the reason that the layout of the page Marco showed “makes it look like he writes for us.”
We are privileged to have several hundred excellent writers and publications who syndicate their work through us, including Fred Wilson, Mark Suster, Robert Scoble, Mark Cuban, The Atlantic, The Daily Beast, Gizmodo, US News & World Report, Harvard Business Review, Yahoo Finance, CNBC, and dozens of other high-profile folks. We have worked hard to make this syndication a win-win for both us and our contributors, and, on balance, we think we have succeeded.
To make it clear that these writers and publications are syndicating their own work through us, instead of creating original content for us, we add a second “slug” to their bylines. This slug includes the name of their publication and a link to it.
Here’s what the “publication” slug looks like. We include it whenever we are syndicating work from another publication, or when the writer is writing a guest post for us and wants to show his or her affiliation:
And below the headline and publication slug on our contributors’ work, we also include a “contributor” box that features a bio of the contributor, as well as some recent stories he or she has written. And, below that, we include a “publication” box that includes direct-links to headlines on the writer’s site.
Those look like this:
The goal of including the Contributor bio box and the publication headline links is not just to tell our readers more about who wrote the story and where it came from. It is to give the writer and publication more visibility with our readers and provide links to some other stories that our readers might want to read.
All together, posts that are syndicated by contributors look quite different than those written by staff writers, as you can see in the live version of John Carney’s post from which we took the screenshots above.
If we had actually syndicated Marco’s post, instead of linking directly to it, we would have included Marco’s “publication” in Marco’s byline, with a link back to Marco.org. We also would have included the Contributor bio box and publication headline box, featuring other headlines on Marco’s site.
But What About Those Strange Page Formats And Ads?
And then there’s the third thing we need to explain about the page Marco printed: Why the page looks so strangely formatted and cluttered with ads.
As it happened, the week that Marco decided to trash us on his blog was also a week in which we had begun experimenting with a new service by a company called Perfect Market, which re-formats pages for readers who arrive on them by using search engines.
The goal of this re-formatting is two-fold: First, to increase the revenue we can generate from readers who visit the site via search engines (we appreciate the visits from every one of these readers, but they don’t generate much revenue for us). Second, to increase the number of subsequent pages that these readers click on after they read the story they arrive on.
To begin testing the re-formatted pages using Perfect Market, we had to re-index the whole site. We then re-formatted and saved the story pages on a new URL, www.articles.businessinsider.com. The idea is that, whenever a reader clicks on one of our stories via a search engine, the reader will be served one of the re-formatted pages from www.articles.businessinsider.com. And, if/when the reformatting is out of beta and has been improved, the pages will look nice, and, theoretically, will allow us to generate a bit more revenue.
The initial testing of this new reformatting, which we did for a couple of days last week, did not go particularly well, for a bunch of reasons that are typical in the testing phases of any rollout. So we switched the re-formatted pages off and began tinkering with them.
But not before Marco found one and complained about it.
So that’s why that page looks weird—because it’s a beta page for a new service we’re testing for search referrals.
Regular readers of our site (those who come to the site directly or arrive via any non-search referral link) won’t see these pages. They’re just for folks who arrive via search engines. We like our regular layout the way it is.
So those are the three reasons we were initially confused by Marco’s flame-post. And they are also the three reasons why we hope he might consider revising his view that we’re horrible people who steal his content and try to make it look like he writes for us.
Of course, Marco wasn’t just enraged by the ReadMe stub-page above, which was accidentally re-formatted when we began testing Perfect Market (the ReadMe stub-pages weren’t supposed to be indexed, because they weren’t supposed to be read—another beta glitch).
Marco also blasted us for “scraping” his headlines from Techmeme and including annoying little double-underlined ads in his text.
Here, too, we think there are some mitigating factors that we hope Marco will consider when he next decides to write about how horrible we are. But this, too, unfortunately, requires a fairly lengthy explanation.
First, the “scraping” of headlines that Marco refers to is something we do to create a page called “The Tape.” “The Tape” pulls in headlines from hundreds of RSS feeds and displays them in a timeline, the same way a Bloomberg terminal (or RSS reader) might do.
Here’s what The Tape looks like:
Importantly, the Tape does NOT pull in full stories, except from writers and publications that have given us explicit permission to publish everything they write (which many have).
Equally importantly, The Tape headlines do not link to pages within our site—they link directly to the sites of the writers who published the stories. In other words, if you click on a “Tape” headline, you’ll immediately find yourself on another site—like Techmeme, for example, or Marco.org.
We created the Tape because we didn’t want to bother with RSS readers anymore. We also wanted to create a good experience for readers and make publishers happy that we were including their headlines in the Tape, which is why we made them link out directly. Some of us internally use the Tape religiously, the way others use news terminals or RSS readers. But we have rarely promoted The Tape to readers, so it doesn’t get much external traffic.
(You can always get to it in the navigation bar above. Look for the link in each section header that says “Tape”.)
Now, as with the “ReadMe” boxes I described above, the way our “Tape” works technically is by creating short stub pages that aren’t designed to be seen by readers. These pull in the headline of the post, the originating site, and about 50 words, and the headline and originating site are then displayed on The Tape.
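As a rough illustration of that process, here is a small Python sketch that turns an RSS item into a stub and then into a Tape line. It is not our actual code: the sample feed, the `make_stubs` and `tape_line` functions, and the field names are all invented for the example, and the 50-word cutoff is taken from the "about 50 words" figure above.

```python
import xml.etree.ElementTree as ET

EXCERPT_WORDS = 50  # "about 50 words", per the description above

# A tiny made-up RSS feed with one 60-word item, used only for illustration.
LONG_TEXT = " ".join(f"word{i}" for i in range(60))
SAMPLE_FEED = f"""<rss version="2.0"><channel>
<title>Example Blog</title>
<link>https://example.org/</link>
<item>
<title>An example headline</title>
<link>https://example.org/an-example-post</link>
<description>{LONG_TEXT}</description>
</item>
</channel></rss>"""


def make_stubs(feed_xml: str) -> list:
    """Turn each RSS item into a stub: headline, source, link, short excerpt."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title", default="")
    stubs = []
    for item in channel.findall("item"):
        words = item.findtext("description", default="").split()
        stubs.append({
            "headline": item.findtext("title", default=""),
            "source": source,
            "href": item.findtext("link", default=""),  # the publisher's own page
            "excerpt": " ".join(words[:EXCERPT_WORDS]),
        })
    return stubs


def tape_line(stub: dict) -> str:
    """The Tape shows only the headline and originating site, linked out."""
    return f'{stub["headline"]} ({stub["source"]}) -> {stub["href"]}'


stubs = make_stubs(SAMPLE_FEED)
print(tape_line(stubs[0]))
# An example headline (Example Blog) -> https://example.org/an-example-post
```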
In the past, these pages have been indexed by Google, but because they include a link back to the originating site and page, they do not generate much (if any) SEO value for us. They exist only because it was easier for our developers to use the existing post-headline-author metaphor in our publishing system than to create the Tape entirely from scratch.
But it turns out there’s a problem with this!
The problem is that folks like Marco occasionally find the stub pages and immediately conclude that we’re horrible people who are using the stub pages to secretly game Google and drive up our SEO ranking. Some folks have even concluded that we get most (all?) of our traffic to these Tape stub pages, thus fueling the large reader numbers that we occasionally talk about (12 million uniques last month).
The truth is that those stub pages actually don’t generate much traffic for us. According to our logs, “The Tape” pages got about 8,000 pageviews in August—spread across tens of thousands of headlines. That’s more than nothing, but it’s not much.
But we don’t want anyone like Marco to think that we’re secretly trying to rip them off and game Google, etc. And we also don’t want Google to think that. So we’re going to see if we can add “no follow” links to the stub pages to make sure that Google doesn’t index them. If we can’t do that, we’ll eventually redesign The Tape, so it doesn’t create stub pages at all.
Direct Links To Third-Party Stories
And, finally, we come back to Marco’s second slate of complaints.
The page that Marco showed to illustrate our horrible “scraping” technique, presented below, actually was NOT a “Tape” page. It also was not “scraped.”
The page that Marco showed was a page used to set up another form of “direct-linking” we do, which is to link to third-party sites from special “touts” in our content rivers.
As with the ReadMe box above, these pages are not designed to be read by our readers. They are mechanisms with which we cause our publishing system to link out to other sites from our headlines. On our content “rivers,” these posts look similar to our own posts, except that they include a little black box with an arrow.
Specifically, our direct-links look like this:
If you click on that headline, you go directly to the story on the third-party site—in this case, Doug Short’s. (Doug makes fabulous market and economic charts, by the way, which he graciously allows us to syndicate. Check them out here.)
The page Marco showed was created for this purpose—to create a “direct link” to take readers straight to his story on Marco.org. We liked his story about the problem with Android tablets, and we thought our readers would like it, too, so we created a page to create a “tout” that would send our readers to it.
Of course, given the way our system is designed, when we create a “direct-link,” we also create a stub page on our site. We always include a link to the original post on this stub page, so Google won’t conclude that we produced the original story. But it is possible to find and link to these pages, and occasionally they do get some views.
(The page Marco showed, for example, has been read 158 times in the past three months. These readers likely came to it via Twitter, a search engine, or some other mechanism. Our hope and assumption is that we sent a lot more than 158 readers directly to Marco’s post on Marco.org, via the direct link).
So, in short…
As with the “ReadMe” story above, the second page that convinced Marco we were horrible was not designed to generate traffic for us. It was designed to send readers directly to his site.
We Sent A Million Of Our Readers To Other Sites Last Month
Based on Marco’s own account, our efforts to send readers to his stories have been at least modestly successful: Marco says we’ve sent him about 8,000 readers over the last couple of years. We hope these readers enjoyed reading Marco’s posts (we did).
Now, Marco complains that 8,000 readers is not as many readers as he has gotten from other sources, like Stumbleupon and Google, and, therefore, that we are somehow being horrible or swindling him.
We don’t recall ever telling Marco that we would send him any readers—all we did was link to his stories. But it’s certainly true that we can’t (yet) send as many readers to other sites as Google.
But our hope is that, as we continue to grow, we will be able to send lots and lots more readers to the folks we link to. There’s a lot of great content out there, and we want our readers to discover and enjoy it. And we want the folks who created it to be thankful that we’ve helped spread the word.
Also, it seems worth noting that, although Marco is unhappy with the 8,000 readers we sent to him by linking to his stories, many other sites we have linked to and/or partnered with are not.
In August, we sent 923,121 readers to other sites—nearly 1 million.
We are glad to have been able to do this. These links help our readers find great stuff to read, and they give the folks we link to and/or partner with a lot more traffic and exposure. Again, as we grow, we hope to be able to radically increase both the amount of exposure we can provide and the number of readers we send.
(And this point about “exposure” is also worth mentioning by the way. Although Marco seems deeply offended that we brought some of his stories to our readers’ attention, many other publishers are not. As I mentioned above, we are privileged to have several hundred excellent contributors who share all or some of their content with us. They do this for two reasons: First, to put it in front of our readers, who are often different than their readers, and, second, to get some additional readers to visit their sites. Unlike other sites that syndicate content, we include all of our partners’ original links in our stories, and often these drive significant traffic back to their sites. We also include links to other headlines, which help steer other readers there. And it’s also worth mentioning that many of our syndication partnerships are two-way: We give our partners the right to publish a few of our stories while we publish a few of theirs. And we’re thrilled to have them do it. It’s a big, fragmented web out there, and we like to present our stories to readers wherever we can.)
So, anyway, now you can see why we didn’t respond immediately to Marco’s post last week. First, we had to figure out exactly what had happened. Second, we realised it would take a long, long time to explain.
And what are our takeaways from getting trashed by Marco?
Well, first, we need to do a better job explaining what we do and why we do it, so intelligent folks like Marco don’t conclude that we’re horrible.
Second, we need to address the “stub page” problem in our publishing system, so folks like Marco don’t find the stub pages on Google and conclude that we’re trying to rip them off or game Google. (This will take some time, so please bear with us.)
As a final note, one thing we don’t do—and won’t—is link to the work of writers or publishers who don’t want us to link to them. And having gathered that Marco is one of these folks, we won’t link to him anymore.
So, henceforth, dear readers, if you wish to figure out whether Marco has written anything compelling lately, please visit Marco.org.