2020-12-08 18:58

Escaping with Bleach

When escaping Markdown markup with Bleach, > characters are always escaped and replaced with &gt;, making blockquotes impossible.

My first thought was to "teach" Bleach to handle > correctly, but I don't think that is appropriate: Bleach is not Markdown-specific, and it cannot be expected to learn about every markup language that can include HTML.

My next thought is to "teach" the Markdown parser about escaped >'s that would otherwise result in a blockquote.

This would use a regex similar to the one used to transform > into a blockquote, but one that looks for &gt; instead of >, possibly packaged as a Markdown extension: unescape_blockquotes.

- Does this make sense?
- Is there a better way to solve this problem?
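A minimal sketch of the idea, using only the standard library. Everything here is an assumption made for illustration: the function name unescape_blockquotes, the exact regex, and the use of html.escape as a stand-in for Bleach's escaping. It is not wired into a Markdown extension, and it is not taken from either library.

```python
import html
import re

# Hypothetical pattern: up to three leading spaces (four or more would be an
# indented code block), followed by one or more escaped ">" markers, each
# optionally followed by a single space (this covers nested blockquotes).
ESCAPED_QUOTE_RE = re.compile(r'^( {0,3})((?:&gt; ?)+)', re.MULTILINE)


def unescape_blockquotes(text: str) -> str:
    """Turn a leading run of "&gt;" back into ">" on each line.

    Only markers at the start of a line are restored; an escaped ">"
    anywhere else is left untouched.
    """
    def restore(match: re.Match) -> str:
        return match.group(1) + match.group(2).replace('&gt;', '>')

    return ESCAPED_QUOTE_RE.sub(restore, text)


# html.escape stands in for the escaping step here; it is NOT Bleach.
escaped = html.escape("> quoted\n1 > 0", quote=False)
print(unescape_blockquotes(escaped))
```

Only the leading marker is restored; the > in "1 > 0" stays escaped. Whether restoring a line-initial > is always safe (for example inside fenced code) is not addressed by this sketch.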




  • weixin_39664995 (5 months ago)

    Admittedly, things will be easier if Markdown is run before Bleach. This discussion, however, assumes that Bleach must be run first. For example, I might want to store safe Markdown text and render it later.

  • weixin_39994806 (5 months ago)

    I would not try running Bleach on Markdown text. Bleach uses html5lib under the hood, and I would expect the output to be mangled by Bleach. Yes, we recommend Bleach as a way to sanitize Markdown, but only after rendering the Markdown text as HTML.

  • weixin_39804603 (5 months ago)

    Using Bleach to sanitize Markdown is VERY lacking and has a lot of issues, for several reasons:

    - You can't disallow the user from writing HTML tags themselves.
    - You have to specify all the tags and attributes that you want to allow. Is there a list somewhere of all the tags and attributes that can be generated by Markdown?
    - If you use a plugin that outputs a special tag with classes or something similar (for example, to embed a YouTube video), you don't want the user to be able to put arbitrary iframes in the Markdown, so now you have to write a special callable filter for Bleach to allow this.
    - If you process something like <script></script>Hi! with Markdown, it outputs '<script></script>\n\n<p>Hi</p>'. If you now process that with Bleach, allowing <p> and disallowing <script>, the script tags end up outside a paragraph.
