Generating HTML5 using XSLT

by on Jan.28, 2011, under Technology, Tutorials


Recently, I have been updating some of my HTML generation tools to output valid HTML5 rather than the XHTML 1.0 I have been using for the last few years. The main advantage from my perspective is the ability to use the new, more semantic block elements such as nav, section and article.

In general this is a fairly straightforward task, as I am already generating clean XHTML using XSLT and my template library works pretty well. However, I ran into some problems whilst validating the output with the W3C Validator.

The first issue is sorting out the DOCTYPE. The XHTML 1.0 Strict doctype looks like this:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

This is easy to generate in XSLT using the following output element.

<xsl:output encoding="UTF-8" indent="yes" method="xml"
    omit-xml-declaration="yes"
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

Unfortunately, this forces the document to be validated against the XHTML 1.0 specification, which does not include any of the lovely new semantic elements, so my new documents are suddenly invalid!

We need to generate:

<!DOCTYPE html>

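which is surprisingly hard to do in XSLT 1.0. The problem is that xsl:output only writes a DOCTYPE when it is given a system identifier (or, with the html output method, a public identifier), and it always includes that identifier in the declaration. I have read a number of articles that suggest writing the DOCTYPE out as escaped text with disable-output-escaping. This is extremely ugly and, as it turns out, not the right approach.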

The correct XSLT incantation is:

<xsl:output
     method="xml"
     doctype-system="about:legacy-compat"
     omit-xml-declaration="yes"
     encoding="UTF-8"
     indent="yes" />

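With the xml output method, this makes the processor write <!DOCTYPE html SYSTEM "about:legacy-compat">. The name after DOCTYPE is taken from the root element, so the root element must be html. about:legacy-compat is not a real DTD. The HTML5 specification defines this "DOCTYPE legacy string" precisely for generators that cannot output the short <!DOCTYPE html> form.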

Now the W3C validator will happily validate against the HTML5 specification rather than the XHTML 1.0 specification.
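To make the two acceptable forms concrete, here is a small Python sketch that checks whether a generated page starts with either the short <!DOCTYPE html> or the legacy-compat form. It is an illustration only and not part of the original tool chain; the function name has_html5_doctype is hypothetical.

The regex follows my reading of the HTML5 DOCTYPE rules:

- DOCTYPE, html and SYSTEM are case-insensitive.
- The quotes may be single or double.
- The about:legacy-compat string must appear exactly.

As a simplifying assumption, it only skips leading whitespace, not a byte order mark or comments.

```python
import re

# The two DOCTYPE forms accepted by HTML5:
#   <!DOCTYPE html>
#   <!DOCTYPE html SYSTEM "about:legacy-compat">   (single quotes also allowed)
# "DOCTYPE", "html" and "SYSTEM" are case-insensitive (scoped (?i:...) groups);
# the about:legacy-compat string itself is matched case-sensitively.
_HTML5_DOCTYPE = re.compile(
    r"<!(?i:doctype)\s+(?i:html)"
    r"(?:\s+(?i:system)\s+(?P<q>[\"'])about:legacy-compat(?P=q))?"
    r"\s*>"
)


def has_html5_doctype(document):
    """Return True if the document starts with an HTML5-compatible DOCTYPE.

    Simplification: only leading whitespace is skipped; a BOM or comments
    before the DOCTYPE are not handled.
    """
    return _HTML5_DOCTYPE.match(document.lstrip()) is not None


if __name__ == "__main__":
    samples = [
        "<!DOCTYPE html>\n<html></html>",
        '<!DOCTYPE html SYSTEM "about:legacy-compat">\n<html></html>',
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html></html>',
        '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE html>\n<html></html>',
    ]
    for sample in samples:
        print(has_html5_doctype(sample))  # True, True, False, False
```

The XHTML 1.0 doctype fails the check, as expected. So does a page that begins with an XML declaration, because the HTML syntax does not allow one before the DOCTYPE. That is why it pays to keep omit-xml-declaration="yes" in the output element.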


12 Comments for this entry

  • Kuldeep

    Hi Mike,
    It's a nice article with good information on XSL & HTML5.

  • Joerka Deen

    Thanks. Just what I needed

  • Brendan

    This causes errors for me as tags are now treated like which is incorrect.

  • Mike

    Hi Brendan,

    Using the method shown does mean the document is serialized as strict XML rather than HTML, so extra care needs to be taken with things like empty tags. Even so, it is the best halfway house I have come up with until the parsers have a flag to indicate that you are working in HTML5 with XML syntax.

    The other alternative I have used is a pure XHTML 1.0 transform followed by a string replace before sending the result to the client (to swap the XHTML doctype for the HTML5 one), but that seems a bit wrong IMHO.

    What are your thoughts?

    Mike

  • Brian Z.

    Mike,

    Please delete my last comment. I typed the tags in literally and the comment form hid them, so it won't make sense. Here's the comment without the tag markup.

    Nice post. How do you deal with elements like textarea? A textarea is often empty, but if the output is treated as XML, the tag gets self-closed, and this causes a problem…

    I found a way to work around this for elements like style and script by putting an xsl:comment in the middle of them, which keeps them from self-closing. But putting a comment inside a textarea causes problems, because the comment then shows up as text in the textarea itself.

    Any thoughts?

  • Giorgos

    Hi Mike,
    I am trying the approach you recommend with no luck, and some other tags still produce errors. I am using the Umbraco CMS, but I can't make it work with HTML5.
    Cheers, Giorgos

  • Jim Michaels

    I get total garbage using this. The last method gives me the text of everything in the page all run together with spaces in between. I will look elsewhere for a solution for now.

  • Jim Michaels

    This is modified as a fix from what you gave. It outputs

    and it doesn't output my HTML5 like a garbage pile; it looks very nicely formatted, as it should.

    The difference was that I changed the output method from xml to html. That fixed it.

  • Mike

    That's interesting. Which XSLT processor are you using? I use LibXML/LibXSLT in Python, which doesn't show that behaviour.

  • Mike

    I'm definitely interested in which parser combination you are using! I have issues with the "html" output method, possibly because I use mixed namespaces in my output, but I would be interested in seeing your transform chain to compare.

    To be honest, I'm still not entirely happy with the "pure" XSL approach to HTML5 transforms. There are a couple of cases where the output isn't right, so I still have a string replace stage in my output code (which doesn't feel like a great solution).

  • Mike

    Hi Giorgos,

    I’m sorry, I’m not really familiar with how Umbraco handles XSLT, although as a .NET application I would imagine the parser would be pretty compliant with W3C specs. I’ll ask around and see if I can work it out.

    Cheers, Mike

  • Tracy

    This is just what I need. Thanks a lot.
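Several of the comments above come down to post-processing the serialized output: Brian's textarea question, and Mike's string replace stage after an XHTML 1.0 transform. Here is a rough Python sketch of such a stage. It is not the author's actual code, and the names html5_postprocess and VOID_ELEMENTS are hypothetical.

The sketch does two things:

- It swaps an XHTML 1.x DOCTYPE for <!DOCTYPE html>.
- It re-opens self-closed elements that are not HTML5 void elements. In the HTML syntax the trailing slash is ignored on non-void elements, so a <textarea/> would swallow the rest of the page.

It assumes attribute values never contain a literal >. It also does not understand comments or CDATA sections, so treat it as a stopgap rather than a real serializer.

```python
import re

# Hypothetical post-processing stage, run on the serialized output of the
# XSLT transform before it is sent to the client.

# An XHTML 1.x DOCTYPE (Strict, Transitional or Frameset), possibly spread
# over several lines like the one at the top of the post.
_XHTML_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML[^"]*"\s+"[^"]*"\s*>',
    re.IGNORECASE,
)

# HTML5 void elements never take an end tag, so <br/> is fine as it is.
# param and keygen are included because they were void in the drafts of the time.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "keygen", "link", "meta", "param", "source", "track", "wbr",
})

# A self-closed tag such as <textarea name="x"/>.
# Assumption: attribute values never contain a literal ">".
_SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^<>]*?)\s*/>")


def _expand(match):
    """Turn <name attrs/> into <name attrs></name> unless name is void."""
    name, attrs = match.group(1), match.group(2)
    if name.lower() in VOID_ELEMENTS:
        return match.group(0)  # keep <br/>, <img .../> and friends untouched
    return f"<{name}{attrs}></{name}>"


def html5_postprocess(markup):
    """Swap an XHTML 1.x DOCTYPE for <!DOCTYPE html> and re-open
    self-closed non-void elements such as <textarea/> and <script/>."""
    markup = _XHTML_DOCTYPE.sub("<!DOCTYPE html>", markup, count=1)
    return _SELF_CLOSED.sub(_expand, markup)


if __name__ == "__main__":
    page = (
        '<!DOCTYPE html\n'
        '    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
        '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        '<html><body><textarea name="msg"/><br/>'
        '<script src="app.js"/></body></html>'
    )
    print(html5_postprocess(page))
```

Running it prints <!DOCTYPE html> on the first line. On the next line, <textarea name="msg"></textarea> and <script src="app.js"></script> are re-opened and <br/> is left alone. Regex rewriting like this is quick, but it is exactly the kind of string replace stage that "doesn't feel like a great solution". Markup inside comments, for example, would be rewritten too.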
