Generating HTML5 using XSLT

by Mike on Jan.28, 2011, under Technology, Tutorials


Recently, I have been updating some of my HTML generation tools to output valid HTML5, rather than the XHTML 1.0 standard I have been using for the last few years. The main advantage from my perspective is the ability to use the more semantic block elements, such as the nav, section and article elements.

In general this is a fairly straightforward task, as I am generating clean XHTML using XSLT and my template library works pretty well, but I ran into some problems whilst validating the output using the W3C Validator.

The first issue is to sort the DOCTYPE out. The XHTML doctype looks like this:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

This is easy to generate in XSLT using the following output element.

<xsl:output encoding="UTF-8" indent="yes" method="xml"
    omit-xml-declaration="yes"
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" />

Unfortunately, this forces the validator to check the document against the XHTML 1.0 Strict DTD, which does not include any of the lovely new semantic elements, so my new documents are suddenly invalid!

We need to generate:

<!DOCTYPE html>

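which is surprisingly hard to do using XSLT, because xsl:output only writes a DOCTYPE when a doctype-system value (or, for the html method, a doctype-public value) is supplied. I have read a number of articles that suggest you output the DOCTYPE as text; however, this is extremely ugly and, as it turns out, incorrect.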
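For reference, the text-output workaround usually looks something like the sketch below. This is my own reconstruction rather than a copy from any particular article, and the page content is a placeholder. One reason it is fragile: disable-output-escaping is an optional feature in XSLT 1.0, so a processor is free to ignore it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No doctype attributes here, so the serializer writes no DOCTYPE itself -->
  <xsl:output method="html" encoding="UTF-8" indent="yes" />

  <xsl:template match="/">
    <!-- Writes the DOCTYPE as raw text. A processor that ignores
         disable-output-escaping will emit the escaped characters instead. -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;</xsl:text>
    <html>
      <head><title>Example</title></head>
      <body><p>Placeholder content</p></body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```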

The correct XSLT incantation is:

<xsl:output
     method="xml"
     doctype-system="about:legacy-compat"
     encoding="UTF-8"
     indent="yes" />

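This produces <!DOCTYPE html SYSTEM "about:legacy-compat"> rather than the bare <!DOCTYPE html>. The HTML5 specification explicitly allows this form for generators, such as XSLT processors, that cannot output the short DOCTYPE. The system identifier about:legacy-compat is a dummy value, not a real DTD, and is the W3C recommended way of not using a standard DTD URI.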

Now the W3C validator will happily validate against the HTML5 specification rather than the XHTML 1.0 specification.
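Putting it together, a minimal stylesheet might look like the sketch below. The input format (a page element with a title attribute and para children) is made up for illustration, and omit-xml-declaration="yes" is my addition here, on the assumption that the result is served as text/html.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Output settings from above, plus omit-xml-declaration (an assumption) -->
  <xsl:output
       method="xml"
       doctype-system="about:legacy-compat"
       omit-xml-declaration="yes"
       encoding="UTF-8"
       indent="yes" />

  <!-- Hypothetical input: <page title="..."><para>...</para></page> -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="@title" /></title>
      </head>
      <body>
        <!-- article is one of the new HTML5 semantic elements -->
        <article>
          <h1><xsl:value-of select="@title" /></h1>
          <xsl:apply-templates select="para" />
        </article>
      </body>
    </html>
  </xsl:template>

  <!-- Each para in the input becomes a paragraph in the output -->
  <xsl:template match="para">
    <p><xsl:value-of select="." /></p>
  </xsl:template>

</xsl:stylesheet>
```

Run through an XSLT 1.0 processor (for example, xsltproc page.xsl page.xml, where both file names are placeholders), the output should begin with <!DOCTYPE html SYSTEM "about:legacy-compat">, followed by the html element.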


5 Comments for this entry

  • Kuldeep

    Hi Mike,
    It's a nice article with good information on XSL & HTML5.

  • Joerka Deen

    Thanks. Just what I needed

  • Brendan

    This causes errors for me as tags are now treated like which is incorrect.

  • Mike

    Hi Brendan,

    Using the method shown does mean the document is treated as strict XML rather than HTML, so extra care needs to be taken with things like empty tags and so on, but it is the best halfway house I have come up with until the parsers have a flag to indicate that you are working in HTML5 with XML syntax.

    The other alternative I have used is to use a pure XHTML 1.0 transform and then run a string replace before sending it to the client (to replace the doctype with the HTML5 one), but that seems a bit wrong IMHO.

    What are your thoughts?

    Mike

  • Brian Z.

    Mike,

    Please delete my last comment. I typed the tags in literally and the site hid them, so that comment won't make sense. Here are the comments without the tag markup.

    Nice post. How do you deal with elements like textarea? When a textarea is empty and the output is treated as XML, the tag gets self-closed, and this causes a problem…

    I found a way to work around this for elements like style and script: putting an xsl:comment in the middle of them keeps them from self-closing. But putting a comment inside a textarea causes problems with the textarea.

    Any thoughts?
