weixin_39720510
eternal?
2020-12-29 17:08

Selectively trim and collapse inter-node space

This pull request makes changes to the algorithm for trimming and collapsing space in text nodes.

The goals of the changes are:

  1. Don't mess with space in <pre> tags.
  2. Try to better honor how HTML and CSS present space between contiguous, inline text- and non-text elements.
  3. Keep removing the most obviously pointless, space-only text nodes around child elements.

On point number two, an example:

```javascript
bel`
<p>
  To go back to the homepage
  <a href="/">
    click here
  </a>.
</p>
`
```

... currently produces ...

```html
<p>To go back to the homepage<a href="/">click here</a>.</p>
```

Browsers display no space between "homepage" and "click here".

I believe most folks would expect the "natural-looking" HTML template to display as if it were pasted directly into HTML, equivalent to:

```html
<p>To go back to the homepage <a href="/">click here</a>.</p>
```

There are various workarounds. All the ones I've found look regrettably hackish.

```javascript
bel`
<p>
  To go back to the homepage <a href="/">
    click here
  </a>.
</p>
`
```

```javascript
bel`
<p>
  ${document.createTextNode('To go back to the homepage ')}
  <a href="/">
    click here
  </a>.
</p>
`
```

I'm guessing my good friend has it in for all my beautiful whitespace for very valid and impressive performance reasons. I suppose the situation boils down to something like:

```html
<p><span>some</span> beautiful <code>code</code></p>
```

... is faster to allocate, diff, render, &c. than ...

```html
<p> <span>some</span> beautiful <code>code</code> </p>
```

... due to the pointless text nodes before the <span> and after the <code>.

This patch tries to let template authors have their whitespace, and let the rendering algos eat it, too. Essentially, I've tried to write a loose version of the HTML/CSS inter-element space logic into the bel code that reduces whitespace.
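
A minimal sketch of such a loose rule, covering the three goals listed at the top. It assumes a hypothetical plain-object tree (strings for text nodes, `{ tag, children }` for elements) rather than bel's real internals, and `trimChildren` is an illustrative name, not a function in bel or in this patch:

```javascript
// Hypothetical node shape: a text node is a string,
// an element is { tag: string, children: Array }.
const WHITESPACE_ONLY = /^[ \t\r\n\f]+$/
const WHITESPACE_RUN = /[ \t\r\n\f]+/g

function trimChildren (tag, children) {
  // Goal 1: leave everything inside <pre> untouched.
  if (tag.toLowerCase() === 'pre') return children.slice()

  const last = children.length - 1
  const out = []
  children.forEach((child, index) => {
    if (typeof child !== 'string') {
      // Recurse into child elements.
      out.push({ tag: child.tag, children: trimChildren(child.tag, child.children) })
      return
    }
    const isEdge = index === 0 || index === last
    // Goal 3: drop space-only text nodes at the very start or end.
    if (isEdge && WHITESPACE_ONLY.test(child)) return
    // Goal 2: collapse each run of space to one space instead of deleting it,
    // so the gap between text and an inline sibling survives.
    let text = child.replace(WHITESPACE_RUN, ' ')
    if (index === 0) text = text.replace(/^ /, '')
    if (index === last) text = text.replace(/ $/, '')
    out.push(text)
  })
  return out
}

const trimmed = trimChildren('p', [
  '\n  To go back to the homepage\n  ',
  { tag: 'a', children: ['\n    click here\n  '] },
  '.\n'
])
console.log(JSON.stringify(trimmed))
```

With this rule the text node before the link keeps its trailing space, while the indentation inside `<a>` and after the final period is dropped.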

The hard call is space between non-text elements:

```javascript
bel`
<p>
  <strong>whitespace</strong>
  <em>is beautiful</em>.
</p>
`
```

Unless we want the algorithm to start looking at the kinds of child elements (or parents) to start making educated guesses about when to drop whitespace, and when to keep it, making this work as expected means a lot of pointless text nodes for non-inline content. If the perf gain is really that impressive, perhaps it would be worth exposing two template functions, one that munches all the whitespace it can, like XML's collapse, and one that leaves it alone, for the HTML/CSS renderer to figure out.

This question originates from the open-source project: choojs/nanohtml

8 answers

  • weixin_39640520, 4 months ago

    > Can you give me an example of "keyed reordering of nodes"? I want to make sure I understand. I'm guessing it's a diff-algo heuristic?

    https://github.com/yoshuawuyts/nanomorph/issues/8

    > If the optimization is peculiar to a diff algo, isn't there an argument for making bel as generic as possible, and applying a uniform transform---say, that removes known-pointless text nodes---to bel's output before passing it to diff? That way bel stays simple and clean, and doesn't hide any subtle deviations from well ingrained HTML expectations.

    I think my general feeling is that I'd like bel's output to be as unsurprising as possible. E.g. keep the output as close as possible to what DOM node creation through node.innerHTML = '' looks like. [Screenshot, 2017-05-20]

    > I alluded to, but didn't flesh out, another approach that I like more and more: put the choice to optimize in the API and let users opt into it selectively.

    I'm confident we can get the algorithm good enough that we don't need to implement multiple modes :sparkles:

  • weixin_39720510 (eternal?), 4 months ago

    Thanks for the links.

    IIRC, innerHTML is de facto, but not standardized. I'm not sure whether it applies the browser's CSS logic to differentiate block formatting context from inline formatting context or not. For reference, from the CSS Text Level 3 working draft:

    https://www.w3.org/TR/css-text-3/#white-space-property

    https://www.w3.org/TR/css-text-3/#white-space-processing

  • weixin_39640520, 4 months ago

    > IIRC, innerHTML is de-facto, but not standardized

    Haha, didn't know that - hadn't seen the specs before. But isn't it so that people expect certain space behavior, even if it might not be fixed in a spec? Wouldn't it make sense to implement the behavior people generally expect?

    Or am I missing something? Hope we're not talking past each other haha

  • weixin_39720510 (eternal?), 4 months ago

    > Or am I missing something? Hope we're not talking past each other haha

    Brother, we're cool! Don't sweat it. And big thanks for speaking my language ;-)

    On innerHTML, it looks like I'm partially out of date. innerHTML and outerHTML are in the current HTML5 working draft. The draft spec says setting innerHTML invokes the standard HTML5 parser.

    It's been a while since I was deep down in the specs, but I believe HTML itself pretty much leaves whitespace alone. There are some rare exceptions, like the first newline in a <pre> tag. But it's the styling engine that does most of the collapsing and trimming.
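
As a rough illustration of what the styling engine does under `white-space: normal`, here is a simplified sketch. `collapseLikeCss` is a hypothetical name, and the real CSS rules cover more cases (line-edge trimming during layout, language-specific segment breaks, and so on):

```javascript
// Simplified approximation of CSS `white-space: normal` for one run of text.
function collapseLikeCss (text) {
  return text
    .replace(/[ \t]*\n[ \t]*/g, '\n') // drop spaces and tabs around line breaks
    .replace(/[\t\n]/g, ' ')          // turn tabs and line breaks into spaces
    .replace(/ {2,}/g, ' ')           // collapse runs of spaces into one
}

console.log(JSON.stringify(collapseLikeCss('\n  click here\n  ')))
```

The parser hands over the text node with all of its indentation intact. The collapsing above only affects how the text is presented, not what ends up in the DOM.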

    Oddly enough, I just fired up Chrome 58.0.3029.81 to try your console example. It looks like the parser---if not the pretty printer---retains pretty much all space:

    
    ```
    > var div = document.createElement('DIV')
    undefined
    > div.innerHTML = '<div>\n<div>hello</div>\n<div>hello</div>\n</div>'
    "<div>
    <div>hello</div>
    <div>hello</div>
    </div>"
    > div
    <div>
      <div>
        <div>hello</div>
        <div>hello</div>
      </div>
    </div>
    > div.childNodes[0]
    <div>…</div>
    > div.childNodes[0].childNodes
    (5) [text, div, text, div, text]
    > div.childNodes[0].children
    (2) [div, div]
    > div.childNodes[0].childNodes[0].data
    "
    "
    ```
    

    Note that children returns only Element child nodes, and that the Chrome pretty printer automatically indents trees.
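
To count text nodes rather than eyeball the console, a small recursive walk over `childNodes` is enough. The sketch below checks only `nodeType` and `childNodes`, so it runs against real DOM nodes in a browser. The `text` and `element` helpers are hypothetical mock builders that mirror the console session above without needing a DOM:

```javascript
const TEXT_NODE = 3 // same value as Node.TEXT_NODE in browsers

// Count every text node below `node`, at any depth.
function countTextNodes (node) {
  let count = 0
  for (const child of Array.from(node.childNodes || [])) {
    count += child.nodeType === TEXT_NODE ? 1 : countTextNodes(child)
  }
  return count
}

// Mock nodes shaped like the inner <div> from the console session.
const text = (data) => ({ nodeType: TEXT_NODE, data, childNodes: [] })
const element = (...childNodes) => ({ nodeType: 1, childNodes })
const inner = element(
  text('\n'), element(text('hello')),
  text('\n'), element(text('hello')),
  text('\n')
)

console.log(countTextNodes(inner)) // 5: three space-only nodes plus two "hello" nodes
```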

  • weixin_39964819, 4 months ago

    Thanks for the response re whitespace.

    Once whitespace behaviour is locked in, then a short explanation in the README would be great, including how to visually "force whitespace" if needed. I'd hope 'forcing' can be done in the HTML string (i.e. in the tagged template string itself), rather than using ${document.createTextNode(' ')}. Or if everything just works as expected, then no forcing explanation is needed.

    Regarding 'predictability', it depends on who the target audience is. For library builders (and maybe some developers), a DOM without 'unintended text nodes' is no doubt predictable. But for developers/designers writing straight HTML - who aren't focussed on internal details of rendering libraries - then predictable means "bel HTML renders visually in the browser as if I'd loaded the HTML straight from a static .html file".

    I hope there's a definition of 'predictable whitespace' for bel that works for both groups :-)

  • weixin_39640520, 4 months ago

    ping - are you planning to work on this patch further? Is there any way I can assist? - if not let me know; this is an important issue to fix and we shouldn't sit on it for too long haha. Thanks!

  • weixin_39720510 (eternal?), 4 months ago

    Checking back in. For some reason, I stopped getting e-mail notifications!

    From my point of view, the answers to "guess how many text nodes" were clear. The counts output by the parser spec'd in the standard are the right answers. More practically, you can load the markup in a web browser and traverse with JS.

    As I understand it, the 4.x to 5.x change was performance-motivated, through and through. It's expensive to diff and patch a bunch of text nodes, and there are good, if imperfect, heuristics for deciding when space matters. So it's worth some thought.

    But to your point, HTML is huge. We're unlikely to write a perfect algorithm based on nodeName. In fact, that wouldn't be enough. CSS can change the presentation of various elements, say with display: inline or display: inline-block. So the best we could do is a "should space stay or should it go" algorithm that's mostly good enough, and document it.
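
A "mostly good enough" rule of that kind might look like the sketch below. The `INLINE_BY_DEFAULT` list and the `shouldKeepSpace` name are hypothetical and deliberately incomplete, and the rule knows nothing about CSS, so a `display` override defeats it exactly as described above:

```javascript
// Hypothetical, incomplete list of elements that are inline by default.
const INLINE_BY_DEFAULT = new Set([
  'a', 'abbr', 'b', 'code', 'em', 'i', 'span', 'strong'
])

// Decide whether a space-only text node between two siblings should be kept.
// `before` and `after` are lowercased node names ('#text' for text nodes),
// or null when the space sits at the start or end of its parent.
function shouldKeepSpace (before, after) {
  if (before === null || after === null) return false
  const inlineish = (name) => name === '#text' || INLINE_BY_DEFAULT.has(name)
  return inlineish(before) && inlineish(after)
}

console.log(shouldKeepSpace('strong', 'em')) // true: the space separates two inline words
console.log(shouldKeepSpace('div', 'div'))   // false: space between blocks is not rendered
```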

    Insofar as all of this is really about optimization, and the choice right now boils down to pinning 4.x and preserving too much space or rolling with 5.x and preserving too little, I like the idea of exposing the optimization in the API more and more. I'm going to end up implementing a stripped-down version of that API on top of hyperx for at least one app: the perf gains are just that good, but the whole purpose of the app is to display pages and pages of highly structured text and inline elements. In a way, it reminds me of the various memoization optimizations for tree generation. They usually don't come standard. If you hit a perf wall or your app stabilizes, then you go in and apply them to the hot paths, as an upgrade.

    I am so super grateful for your work here on bel, not to mention all the other places! If you want to try to do the best you can with nodeName or similar, feel free to start with my code if that's helpful. But I won't lie and say I have it in me to turn this patch into that kind of rule. It's an awfully big project for a little module!

  • weixin_39720510 (eternal?), 4 months ago

    Can you give me an example of "keyed reordering of nodes"? I want to make sure I understand. I'm guessing it's a diff-algo heuristic?

    Two thoughts:

    1. If the optimization is peculiar to a diff algo, isn't there an argument for making bel as generic as possible, and applying a uniform transform---say, that removes known-pointless text nodes---to bel's output before passing it to diff? That way bel stays simple and clean, and doesn't hide any subtle deviations from well ingrained HTML expectations.

    2. I alluded to, but didn't flesh out, another approach that I like more and more: put the choice to optimize in the API and let users opt into it selectively.

    For example:

    ```js
    var bel = require('bel')
    var paragraph = bel`
      <p>
        Click
        <a href="/help">here</a>
        for help.
      </p>
    `
    var render = bel.collapse`
      <header>Site Map</header>
      <main>
        <aside>
          ${paragraph}
        </aside>
      </main>
    `
    ```
    

    would render:

    ```html
    <header>Site Map</header><main><aside><p>
        Click
        <a href="/help">here</a>
        for help.
      </p></aside></main>
    ```
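
To make the opt-in behavior concrete, here is a string-based sketch of a tagged template that strips space-only gaps between tags in its own literal parts and passes interpolated values through untouched. It is a hypothetical stand-in: `bel.collapse` is only proposed above and does not exist in bel, and a real version would build DOM nodes rather than strings:

```javascript
// Hypothetical string-based stand-in for the proposed `bel.collapse`.
function collapse (strings, ...values) {
  return strings.reduce((out, literal, i) => {
    const cleaned = literal
      .replace(/>\s+</g, '><') // space-only gaps between adjacent tags
      .replace(/^\s+</, '<')   // gap between the start (or a value) and a tag
      .replace(/>\s+$/, '>')   // gap between a tag and a value (or the end)
    const value = i < values.length ? String(values[i]) : ''
    return out + cleaned + value
  }, '')
}

// The un-collapsed paragraph keeps its own whitespace.
const paragraph = '<p>\n  Click\n  <a href="/help">here</a>\n  for help.\n</p>'
const page = collapse`
  <header>Site Map</header>
  <main>
    <aside>
      ${paragraph}
    </aside>
  </main>
`
console.log(page)
```

Only the template that opted in loses its indentation. The interpolated paragraph, which was built without collapsing, comes through as written.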
    