There’s been a lot of talk about HTML5 recently and, in some geek circles, there have been snickers when companies have done a poor job of implementing it. But what is the true state of HTML5? To find out, I decided to check whether the top sites on the internet had implemented it and how successful they were in doing so.
One of the first things to do in this effort was to get a decent list of sites. Unfortunately, it has become increasingly difficult to get a sense of which sites are the most popular by number of visits. I eventually settled on Alexa’s Top Sites list because it features most of the sites people think of when considering what large sites are and includes a few non-US sites.
I then ran the W3C Validator against each of the top 25 sites. This gave me three pieces of information:
- Doctype: This is what the site declares as its HTML code version. In other words, how the site identifies what version of HTML it supports.
- Encoding: This is the character encoding the site declares (UTF-8, for example), which gives us a better understanding as to whether they are targeting a particular language or trying to offer a global site.
- Validation: This is how the site validated when tested for errors relating to the HTML version it purported to be offering. It gives us an idea as to how compliant with the standards the site truly is.
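For readers who want a rough local check of the first two items before turning to the validator, here is a minimal Python sketch. It is not how the W3C Validator works internally, and the names `classify_doctype` and `declared_charset` are made up for this illustration. It only reads what the page declares in its markup, and it does not attempt validation at all.

```python
import re
from typing import Optional

# Hypothetical helpers for illustration; the survey itself used the W3C Validator.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)
CHARSET_RE = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def classify_doctype(html: str) -> str:
    """Return a coarse label for the doctype a page declares."""
    match = DOCTYPE_RE.search(html)
    if match is None:
        return "none"
    declaration = match.group(1).strip().lower()
    if declaration == "html":
        # The short form <!DOCTYPE html> is the HTML5 doctype.
        return "HTML5"
    if "xhtml" in declaration:
        return "XHTML"
    if "html" in declaration:
        return "HTML 4 or earlier"
    return "unknown"


def declared_charset(html: str) -> Optional[str]:
    """Return the charset named in a <meta> tag, lowercased, or None."""
    match = CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    sample = '<!DOCTYPE html><html><head><meta charset="UTF-8"></head></html>'
    print(classify_doctype(sample), declared_charset(sample))
```

A server can also announce the encoding in its HTTP Content-Type header, which this sketch ignores, so treat its answer as a hint rather than a verdict.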
Surprisingly, a number of popular Web 2.0 sites were not in Alexa’s Top 25 so I created a separate list for them.
Looking at the top 25, here are the results:
Looking at the data, the first thing that is interesting is how many sites have made the switch to HTML5: 14 of the top 25. This means that in the last year, 56 per cent of the largest sites on the internet have updated their code to declare the new standard. Six sites are still on the old HTML standard and five are sticking to the somewhat more recent XHTML standard.
However, it is also interesting to note that none of the sites which have made the transition comply with proper HTML standards. In fact, of the top 25 sites in the Alexa list, only MSN was found to provide completely valid code. Maybe Microsoft could point the people behind it towards its other properties. Amazon was the worst offender, with 516 errors in its code, showing that disregard for standards compliance does not seem to have an impact on economic performance. eBay and Yahoo came close behind with hundreds of errors in their code, so Amazon may not be such an exception.
Another interesting phenomenon is that most of the large sites have adopted UTF-8, the encoding that supports the most languages, as their default encoding. Once again, over half (56 per cent) of the sites have switched, with Amazon and Google being among the rare exceptions. An interesting aside here is that the W3C Validator may have issues when it comes to validating Chinese sites, as it was not able to finish the job.
Web 2.0 Companies
Looking at Web 2.0 companies, the data was surprising:
I captured the data for companies other than those in the top 25 and a few interesting trends seem to pop up. The first thing that came as a surprise is that fewer of these sites have made the transition to HTML5, with only 5 sites out of 11 (or 45 per cent) having completed it. There still seems to be a strong preference for XHTML as the way to mark up pages.
Also of note is that all sites have plans for globalization, encoding their pages in the UTF-8 format that can support both western and non-western alphabets.
However, none of the sites successfully validates against its preferred standard. It looks like there is still much room for improvement in the world of HTML validation.