weixin_39849762
weixin_39849762
2020-12-09 14:34

Fix bleach domain parsing

First PR in order to find a workaround about #2453 ...

  • I had a little line in .gitignore to avoid committing some scripts in a /scripts folder ('cos I use some scripts to facilitate the reinstalls, the repo is here ) ... but I scratched that ;
  • upgrade Bleach package : from 3.1.0 to 3.1.5 ;
  • a little switch in markdown.py when cleaning links, so to include the mailto scheme and adapt it in the href attribute...
  • added a little setting to allow mailto links or not if desired ...
  • as it's a first PR on uData for me I let (for this time only) all my logs/prints so you could see my debug process yourselves... I added the ìn progress` tag to the PR, and we could erase all logs from the PR once you've reviewed the code. WDYT ?

该提问来源于开源项目:opendatateam/udata

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

12条回答

  • weixin_39849762 weixin_39849762 5月前

    I have a problem with this PR. After making a lot of things and then simplifying it, it seems to me that is does not fix what we want. The mailto has never been a problem and was already managed.

    A mail writen in the form [rte-inspire-infos-france.com](mailto:rte-inspire-infos-france.com) will be put in a <a> tag whitout href, which is what we want.

    The problem that led to the creation of the issue was that an email written in the form rte-inspire-infos-france.com will be put in rte-inspire-infos<a href="http://-france.com" rel="nofollow">-france.com</a>.

    This is clearly stated in the issue and after testing this PR does not fix this. It actually tries to fix something not broken

    So I think you might have gotten quite lost here.

    That's a bit true indeed :) cos I tried a bit to find a solution allowing a setting 'ALLOW_MAILTO', but I scratched that today and I missed (again) some errors...

    Also just to prove you the bleaching as it was was far from working ok just try the following, add a text like that in a dataset in the live datagouv website :

    
    test **BLEACH**
    
    link 01 : [rte-inspire-01-france.com](mailto:rte-inspire-infos-france.com)
    link 02 : rte-inspire-02-france.com
    link 03 : [rte-france-03.com](rte-france-03.com)
    link 03b : [rte-france-03b.com/route](rte-france-03b.com/route)
    link 04 : rte-france-04.com
    link 04b : rte-france-04b.com/route
    link 05 : www.rte-france-05.com
    link 05b : www.rte-france-05b.com/route
    link 06 : [www.rte-france-06.com](www.rte-france-06.com)
    link 06b : [www.rte-france-06b.com/route](www.rte-france-06b.com/route)
    link 07 : [local/route/rte-france-07](/local/route/rte-france-07)
    link 08 : [/ ... root-08](/)
    

    datagouv today =>

    Capture d’écran 2020-06-13 à 02 57 12

    This PR =>

    Capture d’écran 2020-06-13 à 02 56 03

    Now I think the original issue is fixed BUT there is perhaps another issue to open (?) : I noticed the text you add in the admin part made different results in the admin frontend and in the public frontend... that's a bit weird...

    You'll have those results quite different if you're either seeing the text on the admin part of the udata, or seeing it from the public part of udata :

    The admin with this PR (still problematic) =>

    Capture d’écran 2020-06-13 à 02 58 03

    I would consider this PR fixes the original issue (the problem on the public rendering of a text, neutralizing mailto: hrefs), but not on the admin part... For now I admit I don't understand why is there two separate ways to display the same text and why it is behaving like that, that's why I'd propose to open a distinct issue to cover that...

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    Are the test all passing locally?

    yep... last time I did the tests too, but only the tests related to markdown... The CI error I had at the time was this (unrelated to my changes, who knows...) =>

    
    self = <udata.tests.dataset.test_dataset_model.resourcemodeltest object at>
    
        def test_ignore_post_save_signal(self):
            resource = ResourceFactory()
            DatasetFactory(resources=[resource])
            unexpected_signals = Dataset.after_save, Dataset.on_update
    
            with assert_not_emit(*unexpected_signals), assert_emit(post_save):
                resource.title = 'New title'
    >           resource.save(signal_kwargs={'ignores': ['post_save']})
    
    udata/tests/dataset/test_dataset_model.py:270: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
    
    self = <resource: resource object>, args = ()
    kwargs = {'signal_kwargs': {'ignores': ['post_save']}}
    
        def save(self, *args, **kwargs):
    >       if not self.dataset:
    E       ReferenceError: weakly-referenced object no longer exists
    
    udata/core/dataset/models.py:359: ReferenceError
    </resource:></udata.tests.dataset.test_dataset_model.resourcemodeltest>

    To avoid bad surprises this time I ran all tests locally to be sure with pytest (took me a while), and all was ok, so only then I decided to push to fork => julien/bleach and watch the CI running to check where it could have been wrong

    But apparently everything is ok now (!?)

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    If that's ok w/ you - now that pytests are ok and that we agreed upon this mattern this morning - I could continue on my path, so :

    • [ ] write down the issues this PR is fixing, but were not documented ;
    • [ ] write some tests and/or make some adjustements to the existing tests relative to markdown.py ;
    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    fix #2453 fix #2496 fix #2497

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    as it's a first PR on uData for me I let (for this time only) all my logs/prints so you could see my debug process yourselves... I added the ìn progress` tag to the PR, and we could erase all logs from the PR once you've reviewed the code. WDYT ?

    you can use draft PRs for that instead of in progress https://github.blog/2019-02-14-introducing-draft-pull-requests/

    right, I gonna use that feature now (I never had to before)

    点赞 评论 复制链接分享
  • weixin_39580124 weixin_39580124 5月前

    I have a problem with this PR. After making a lot of things and then simplifying it, it seems to me that is does not fix what we want. The mailto has never been a problem and was already managed.

    A mail writen in the form [rte-inspire-infos-france.com](mailto:rte-inspire-infos-france.com) will be put in a <a> tag whitout href, which is what we want.

    The problem that led to the creation of the issue was that an email written in the form rte-inspire-infos-france.com will be put in rte-inspire-infos<a href="http://-france.com" rel="nofollow">-france.com</a>.

    This is clearly stated in the issue and after testing this PR does not fix this. It actually tries to fix something not broken

    So I think you might have gotten quite lost here.

    点赞 评论 复制链接分享
  • weixin_39580124 weixin_39580124 5月前

    The bleach upgrade solve the problem by itself?

    点赞 评论 复制链接分享
  • weixin_39870664 weixin_39870664 5月前

    as it's a first PR on uData for me I let (for this time only) all my logs/prints so you could see my debug process yourselves... I added the ìn progress` tag to the PR, and we could erase all logs from the PR once you've reviewed the code. WDYT ?

    you can use draft PRs for that instead of in progress https://github.blog/2019-02-14-introducing-draft-pull-requests/

    点赞 评论 复制链接分享
  • weixin_39870664 weixin_39870664 5月前

    And congrats for your first bug fix ;-) 🙌 🐛

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    I tested locally with a dataset description like :

    
    test **BLEACH**
    
    link 01 : [rte-inspire-01-france.com](mailto:rte-inspire-infos-france.com)
    
    link 02 : rte-inspire-02-france.com
    
    link 03 : [rte-france-03.com](rte-france.com)
    
    link 04 : rte-france-04.com
    
    link 05 : www.rte-france-05.com
    

    seems to get the desired behaviour on mailto links

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    so this fix allows several things, and admin can choose in settings.py to allow mailto hyperlinks or not :

    • if the admin wants udata to allow mailto hrefs in texts, it translates the href into a common mailto:myemail.com

    • if the admin doesn't want any mailto link but wants those links to be only visible/neutralized , every mailto reference - either from plain text like myemail.com or from markdown text like [myemail.com](mailto:myemail.com) - will be translated in an empty html tag like : myemail.com

    点赞 评论 复制链接分享
  • weixin_39849762 weixin_39849762 5月前

    Can you add a test w/ the use case described in #2453?

    gonna do that :) gotta domesticate the test folder now...

    点赞 评论 复制链接分享

相关推荐