A bit on the formatting of code on this site and HTML/RSS

Win-Vector Blog 2017-01-11

I am running into what looks like a WordPress bug involving the formatting of code blocks. I think this is mostly affecting our RSS subscribers. They have been seeing posts rendered almost entirely in an ugly fixed-width font, with the font error typically starting after the first substantial code block in an article.

I’d like to apologize for any trouble this may be causing. I am looking into it, but I don’t currently have a solution. A work-around would be to not put pre-rendered code blocks into code font, but I would rather wait on a fix. I do have a diagnosis: it is likely a WordPress issue, and not user error, editor weirdness, or an RSS fault. (Edit: please see the comments below for the solution. I was wrong to nest pre inside code, but I still think the WordPress transformations that made things much worse are in fact a bug.) If you are interested in the details (or can help), please read on.
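Since the mistake was nesting pre inside code, the valid order is the reverse: HTML allows code (phrasing content) inside pre, but not pre inside code. Below is a minimal sketch of a helper for preparing a code block before pasting it into a post, assuming plain pre-then-code markup is what you want. The name `to_code_block` is hypothetical (it is not part of WordPress or MarsEdit), and the sketch makes no claim about how WordPress's own filters then treat the markup.

```python
import html

def to_code_block(source: str) -> str:
    """Hypothetical helper: wrap source text as <pre><code>...</code></pre>.

    pre may contain phrasing content such as code, but code may not
    contain pre, so this is the valid nesting order.
    """
    # Escape &, <, > (and quotes) so the code text is not read as markup.
    return "<pre><code>" + html.escape(source) + "</code></pre>"

print(to_code_block("x < 3 && y > 2"))
# <pre><code>x &lt; 3 &amp;&amp; y &gt; 2</code></pre>
```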

I am going to avoid “<code></code>” tags in this note, for reasons that will soon be clear.

The HTML formatting issue I have right now with WordPress is:

  • “<code>y</code>” entered directly into the WordPress HTML editor is rendered as:

    <p><code>y</code></p>

    This is okay, and we mention it only for comparison.

  • However, “<code><pre>x</pre></code>” entered directly into the WordPress HTML editor is rendered as:

     <p><code></p><pre>x</pre><p></code></p>

    And this weird structure is copied to RSS (I originally wondered if RSS conversion was introducing the extra paragraph tags, but they are also present in the HTML article presentation). To be clear: the HTML source is the first form, and the external HTML and RSS presentations are both the second form. Notice that in no sense do the “code” blocks surround the text as intended. Also, it isn’t the editor causing the damage: the correct form is preserved and remains available to view in the editor. The problem is in the rendering step, where input article HTML is converted to output presentation HTML.
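One way to see that the rendered form no longer nests as intended is to walk its tags with a stack and record every close tag that does not match the innermost open tag. The sketch below uses Python's standard `html.parser`; `NestingChecker` and `nesting_problems` are hypothetical names. It treats HTML as strictly nested, XML-style, and deliberately ignores HTML's implied-end-tag rules and void elements such as br, so it is a diagnostic toy rather than a real HTML validator.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Record close tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, innermost last
        self.problems = []   # (expected, found) pairs for mismatched closes

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        expected = self.stack[-1] if self.stack else None
        if expected == tag:
            self.stack.pop()
            return
        self.problems.append((expected, tag))
        # Recover by popping back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

def nesting_problems(fragment):
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.problems, checker.stack

source = "<code><pre>x</pre></code>"
rendered = "<p><code></p><pre>x</pre><p></code></p>"
print(nesting_problems(source))    # ([], [])
print(nesting_problems(rendered))  # ([('code', 'p'), ('p', 'code')], [])
```

The input is cleanly nested, while in the rendered form each code tag ends up paired against a paragraph tag instead of the other code tag.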

This is with current self-hosted WordPress 4.7 running Twenty Fifteen theme.

Now, I normally don’t use the WordPress online HTML editor directly (I use MarsEdit 3 on OSX), but I am using it here (and not the rich-text options) to trace down the likely problem.

To my mind the likely issue is the following: think of the parse tree of the second, damaged HTML form. In a strict, XML-style world (as used in RSS), the parse would have to be:

(Figure: parse tree of the damaged form under strict XML-style parsing.)

And not the intended double nesting:

(Figure: parse tree of the intended nesting.)

The extra paragraph tags were inserted in harmful places even when I was careful to allow no line breaks in the input (which I am usually not so careful to ensure). Neither tree is a refinement of the other, so they cannot be converted into each other. My guess is that in the RSS world the opening code tag stays active because its closing tag is lost in some deep context. In the more forgiving HTML world, the DOM tree likely ends up looking more like the desired second tree, and the closing code tag is not lost to the renderer.
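To check the claim that a strict, XML-style parser cannot make sense of the damaged form, we can hand both fragments to Python's standard `xml.etree.ElementTree`, wrapping each in a made-up root element because XML requires a single root. This is only a stand-in for a strict parser; it is an assumption that it says nothing about what any particular RSS reader actually does, and `parses_as_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(fragment):
    """Return True if the fragment is well-formed XML once wrapped in a root."""
    try:
        # XML needs exactly one root element, so wrap the fragment.
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        # Raised for problems such as a close tag that does not match its open tag.
        return False

print(parses_as_xml("<code><pre>x</pre></code>"))                # True
print(parses_as_xml("<p><code></p><pre>x</pre><p></code></p>"))  # False
```

The damaged form fails at the first `</p>`, where the parser is still expecting `</code>`.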

In fact, using the DOM inspector in Safari on OSX (instead of View Page Source) gives us a third tree structure for the same fragment:

<p><code></code></p><code><pre>x</pre></code><p><code></code></p>

(Figure: Safari DOM inspector view of the same fragment.)

Notice the above DOM tree has usable, matched “<code></code>” pairs throughout, and contains our intended vertical tree as a sub-tree. This is why the HTML view looks okay, at least on Safari. Remember that the DOM tree is a function of both the input HTML (which in this case is malformed) and the browser.
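As a rough check of that observation, the tree Safari reported (written out as markup) is well-formed enough for a strict parser, and searching it for a code element with a pre child finds exactly the intended sub-tree. A minimal sketch with the standard `xml.etree.ElementTree`; the wrapping root element is an assumption added only because XML needs a single root.

```python
import xml.etree.ElementTree as ET

# The tree Safari's DOM inspector reported, written out as markup.
safari_dom = "<p><code></code></p><code><pre>x</pre></code><p><code></code></p>"

root = ET.fromstring("<root>" + safari_dom + "</root>")

# Find code elements that directly contain a pre element: the intended nesting.
intended = [c for c in root.iter("code") if c.find("pre") is not None]

print(len(intended))                                 # 1
print(ET.tostring(intended[0], encoding="unicode"))  # <code><pre>x</pre></code>
```

The other two code elements are the empty pairs inside the extra paragraphs.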

Issue reported as WordPress Trac ticket 39324. I have forwarded this and a brief description to Jetpack support.