weixin_39985365
weixin_39985365
2020-12-09 03:48

Tidy up bleach sanitizing & add compatibity code for Bleach2

This PR has two commits:

Tidy up bleach sanitizing & markdown parsing

Previously, we sanitized markdown input from the user before we sent it through the markdown parser, removing all html markup. This commit changes this so we now santize the html output generated by python-markdown ( & the extra fedora-flavoured markdown stuff) rather than before. This sanitization now includes a whitelisted set of tags and attributes that are allowed, rather than blocking all tags. Sanitizing before was causing issues like #1656 where things were escaped weirdly.

This commit also changes the regex in ffmarkdown.py to not match with characters directly before them (e.g. ryan) as it was matching email addresses.

Finally, this commit makes the markdown support to use bleach's linkify feature, rather than the link-checking regex that was in ffmarkdown.py. This fixes issues like #1721

Fixes #1656 Fixes #1721

Add compatibility code for Bleach2

The Bleach API differs slightly from bleach1 to bleach2. This commit handles these differences

Fixes #1718

该提问来源于开源项目:fedora-infra/bodhi

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

6条回答

  • weixin_39985365 weixin_39985365 5月前

    Thanks for all your work on this !

    I amended the second commit with your patch, and attributed you in the commit message. However, feel free to take ownership of that patch if you want to, as most of the work was in reality done by you :)

    Feel free to merge away with whatever one you choose if you are happy with this!

    点赞 评论 复制链接分享
  • weixin_39525007 weixin_39525007 5月前

    Here's a patch that writes release notes, adjusts the requirements.txt, and gets the tests to mock bleach versions so we can cover all the code.

    Please split the stylistic changes to the quotes (' vs.") out into their own commit, and if you are satisfied with my patch feel free to merge the result!

    
    diff --git a/bodhi/server/util.py b/bodhi/server/util.py
    index 3f0fbdf..cc86afc 100644
    --- a/bodhi/server/util.py
    +++ b/bodhi/server/util.py
    @@ -296,23 +296,17 @@ def markup(context, text):
             basestring: HTML representation of the markdown text
         """
    
    -    # determine the bleach version installed.
    -    # this is the same approach that Pagure uses to determine the bleach version
    +    # determine the major component of the bleach version installed.
    +    # this is similar to the approach that Pagure uses to determine the bleach version
         # https://pagure.io/pagure/pull-request/2269#request_diff
    -    bleach_v = bleach.__version__.split('.')
    -    for idx, val in enumerate(bleach_v):
    -        try:
    -            val = int(val)
    -        except ValueError:  # pragma: no cover
    -            pass
    -        bleach_v[idx] = val
    +    bleach_major_v = int(bleach.__version__.split('.')[0])
    
         # the only difference in the bleach API that we use between v1 and v2 is
         # the formatting of the attributes parameter. Bleach 1 only allowed you
         # to specify attributes to be whitelisted for all whitelisted tags.
         # Bleach 2 requires you to specify the list of attributes whitelisted for
         # specific tags.
    -    if tuple(bleach_v) >= (2, 0, 0):  # pragma: no cover
    +    if bleach_major_v >= 2:
             markdown_attrs = {
                 "img": ["src", "alt", "title"],
                 "a": ["href", "alt", "title"],
    diff --git a/bodhi/tests/server/test_utils.py b/bodhi/tests/server/test_utils.py
    index 1cc4cc9..c1a500f 100644
    --- a/bodhi/tests/server/test_utils.py
    +++ b/bodhi/tests/server/test_utils.py
    @@ -357,7 +357,7 @@ class TestUtils(unittest.TestCase):
             for element in result:
                 assert isinstance(element, unicode)
    
    -    def test_markup(self):
    +    def test_markup_escapes(self):
             """Ensure we correctly parse markdown & escape HTML"""
             text = (
                 '# this is a header\n'
    @@ -371,6 +371,52 @@ class TestUtils(unittest.TestCase):
                 '<script>alert("pants")</script>
    ' ), html + .patch('bodhi.server.util.bleach.clean') + .patch.object(util.bleach, '__version__', u'1.4.3') + def test_markup_with_bleach_1(self, clean): + """Use mocking to ensure we correctly use the bleach 1 API.""" + text = ( + '# this is a header\n' + 'this is some **text**\n' + '
    点赞 评论 复制链接分享
  • weixin_39525007 weixin_39525007 5月前

    Ugh, so my patch above still doesn't quite work on Rawhide - for some reason the input to

    bleach.clean()
    is different on Rawhide than it is on F25:
    
    ====================================================== FAILURES ====================================================== 
    ________________________________________ TestUtils.test_markup_with_bleach_1 _________________________________________ 
    
    self = <bodhi.tests.server.test_utils.testutils testmethod="test_markup_with_bleach_1">                                  
    clean = <magicmock name="clean" id="140222777514576">      
    
        .patch('bodhi.server.util.bleach.clean')          
        .patch.object(util.bleach, '__version__', u'1.4.3')                                                           
        def test_markup_with_bleach_1(self, clean):            
            """Use mocking to ensure we correctly use the bleach 1 API."""                                                 
            text = (                                           
                '# this is a header\n'                         
                'this is some **text**\n'                      
                '<script>alert("pants")</script>'              
            )                                                  
    
            util.markup(None, text)                            
    
            expected_text = (                                  
                u'<div class="markdown"><h1>this is a header</h1>\n<p>this is some <strong>text'                           
                u'</strong>\n<script>alert("pants")</script></p></div>')                                       
            expected_tags = [                                  
                "h1", "h2", "h3", "h4", "h5", "h6", "b", "i", "strong", "em", "tt", "p", "br", "span",                     
                "div", "blockquote", "code", "hr", "pre", "ul", "ol", "li", "dd", "dt", "img", "a"]                        
            # The bleach 1 API shoudl get these attrs passed.  
            clean.assert_called_once_with(expected_text, tags=expected_tags,                                               
    >                                     attributes=["src", "href", "alt", "title", "class"])                             
    
    bodhi/tests/server/test_utils.py:394:                      
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _  
    /usr/lib/python2.7/site-packages/mock/mock.py:948: in assert_called_once_with                                          
        return self.assert_called_with(*args, **kwargs)        
    /usr/lib/python2.7/site-packages/mock/mock.py:937: in assert_called_with
        six.raise_from(AssertionError(_error_message(cause)), cause)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    
    value = AssertionError('Expected call: clean(u\'<div class="markdown"><h1>this is a he...', \'hr\', \'pre\', \'ul\', \'ol\', \'li\', \'dd\', \'dt\', \'img\', \'a\'])',)
    from_value = None
    
        def raise_from(value, from_value):
    >       raise value
    E       AssertionError: Expected call: clean(u'<div class="markdown"><h1>this is a header</h1>\n<p>this is some <strong>text</strong>\n<script>alert("pants")</script></p></div>', attributes=['src', 'href', 'alt', 'title', 'cla
    ss'], tags=['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'b', 'i', 'strong', 'em', 'tt', 'p', 'br', 'span', 'div', 'blockquote', 'code', 'hr', 'pre', 'ul', 'ol', 'li', 'dd', 'dt', 'img', 'a'])
    E       Actual call: clean(u'<div class="markdown"><h1>this is a header</h1>\n<p>this is some <strong>text</strong>\n<script>alert("pants")</script></p></div>', attributes=['src', 'href', 'alt', 'title', 'class'], tags=['h1', 'h2', 'h3', 
    'h4', 'h5', 'h6', 'b', 'i', 'strong', 'em', 'tt', 'p', 'br', 'span', 'div', 'blockquote', 'code', 'hr', 'pre', 'ul', 'ol', 'li', 'dd', 'dt', 'img', 'a'])
    
    /usr/lib/python2.7/site-packages/six.py:718: AssertionError
    
    
    </h1><p>Since the test isn't there to assert that the input is one way or the other, I will make a new patch that just lets the text be whatever it wants to be. I'm really just trying to assert that the right attributes are used.</p></div></magicmock></bodhi.tests.server.test_utils.testutils>
    点赞 评论 复制链接分享
  • weixin_39525007 weixin_39525007 5月前

    ok ok ok ok ok. This patch allows Bodhi to build on EPEL 7, Fedora 25, and Rawhide:

    
    diff --git a/bodhi/server/util.py b/bodhi/server/util.py
    index 3f0fbdf..cc86afc 100644
    --- a/bodhi/server/util.py
    +++ b/bodhi/server/util.py
    @@ -296,23 +296,17 @@ def markup(context, text):
             basestring: HTML representation of the markdown text
         """
    
    -    # determine the bleach version installed.
    -    # this is the same approach that Pagure uses to determine the bleach version
    +    # determine the major component of the bleach version installed.
    +    # this is similar to the approach that Pagure uses to determine the bleach version
         # https://pagure.io/pagure/pull-request/2269#request_diff
    -    bleach_v = bleach.__version__.split('.')
    -    for idx, val in enumerate(bleach_v):
    -        try:
    -            val = int(val)
    -        except ValueError:  # pragma: no cover
    -            pass
    -        bleach_v[idx] = val
    +    bleach_major_v = int(bleach.__version__.split('.')[0])
    
         # the only difference in the bleach API that we use between v1 and v2 is
         # the formatting of the attributes parameter. Bleach 1 only allowed you
         # to specify attributes to be whitelisted for all whitelisted tags.
         # Bleach 2 requires you to specify the list of attributes whitelisted for
         # specific tags.
    -    if tuple(bleach_v) >= (2, 0, 0):  # pragma: no cover
    +    if bleach_major_v >= 2:
             markdown_attrs = {
                 "img": ["src", "alt", "title"],
                 "a": ["href", "alt", "title"],
    diff --git a/bodhi/tests/server/test_utils.py b/bodhi/tests/server/test_utils.py
    index 1cc4cc9..618cc3a 100644
    --- a/bodhi/tests/server/test_utils.py
    +++ b/bodhi/tests/server/test_utils.py
    @@ -357,7 +357,7 @@ class TestUtils(unittest.TestCase):
             for element in result:
                 assert isinstance(element, unicode)
    
    -    def test_markup(self):
    +    def test_markup_escapes(self):
             """Ensure we correctly parse markdown & escape HTML"""
             text = (
                 '# this is a header\n'
    @@ -371,6 +371,46 @@ class TestUtils(unittest.TestCase):
                 '<script>alert("pants")</script>
    ' ), html + .patch('bodhi.server.util.bleach.clean', return_value='cleaned text') + .patch.object(util.bleach, '__version__', u'1.4.3') + def test_markup_with_bleach_1(self, clean): + """Use mocking to ensure we correctly use the bleach 1 API.""" + text = '# this is a header\nthis is some **text**' + + result = util.markup(None, text) + + self.assertEqual(result, 'cleaned text') + expected_text = ( + u'
    点赞 评论 复制链接分享
  • weixin_39525007 weixin_39525007 5月前

    OK, the above patch is deployed on top of your patches to staging, and I made an example comment there:

    https://bodhi.stg.fedoraproject.org/updates/FEDORA-2017-b0a6f37173

    点赞 评论 复制链接分享
  • weixin_39525007 weixin_39525007 5月前

    On Mon, 2017-07-31 at 18:18 -0700, Ryan Lerch wrote:

    The changing of the way we clean the output meant that the quotes around the class="markdown" changed in the output generated by bleach. which made the tests fail. Do you want me to just change those lines, and do the other quotes style changes on the other lines in a seperate commit

    Oh I didn't realize it was related to test failure. In that case, let's keep the changes. Sorry for the noise!

    点赞 评论 复制链接分享

相关推荐