weixin_39550587
2021-01-01 16:37

NAS-108666 / 21.02 / prevent netcli from hanging on sockstat calls by redoing get_ui_urls

This fixes a few issues.

  • sockstat can block on FreeBSD getpwuid() calls if the system is joined to a directory service and the connection to that service is not "healthy". I've since patched sockstat so that it no longer tries to resolve numeric UIDs to user names (freenas/os repo), and the patch was pushed upstream (upstream commit).
  • However, I've removed the call to sockstat entirely, since psutil.net_connections() provides the same information.
  • Furthermore (based on an earlier discussion in this PR), there is absolutely no reason to run requests.head() against each IP address returned by psutil.net_connections() and/or interface.ip_in_use. This method has only one consumer (netcli), and all it does is return a list of formatted URLs to display on the console.
  • Running requests.head() doesn't scale either: some customer systems have 50+ IP addresses, so this could mean a HEAD request to every IP over both HTTP and HTTPS, with timeouts of 10 and 15 seconds respectively. This is unacceptable. Running all of these requests in parallel would solve one problem but introduce another: what happens if the total time allotted to the HEAD requests expires? Do we keep only the completed tasks, or none of them? On a system with 50+ IPs, do we scale the total timeout for the parallel tasks to 3*total_ips? Do we have to raise it further above some threshold of IPs: 100? 200? That is an unnecessary problem to take on when the method simply returns formatted URLs for the console.
  • Before these changes, the method didn't account for HA systems, where the VIPs are the only IPs that can actually be used to access the webUI.
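
The psutil replacement for sockstat described above could look roughly like the sketch below. It is an illustration, not the PR's code: listening_ips is a hypothetical helper, and Addr/Conn are stand-in namedtuples with the same laddr and status fields as the records returned by psutil.net_connections(), so the snippet runs without psutil installed. In real code you would pass psutil.net_connections(kind="inet") and compare against psutil.CONN_LISTEN.

```python
from collections import namedtuple
from typing import Iterable, Set

# Stand-ins with the same field names psutil uses for its connection records
# (laddr is an (ip, port) pair, status is a string such as "LISTEN").
Addr = namedtuple("Addr", ["ip", "port"])
Conn = namedtuple("Conn", ["laddr", "status"])

# psutil.CONN_LISTEN has this string value.
LISTEN = "LISTEN"


def listening_ips(conns: Iterable[Conn], ports: Set[int]) -> Set[str]:
    """Return the local IPs that have a listening socket on any of `ports`."""
    found: Set[str] = set()
    for conn in conns:
        # Sockets without a bound local address have an empty laddr; skip them.
        if not conn.laddr:
            continue
        if conn.status == LISTEN and conn.laddr.port in ports:
            found.add(conn.laddr.ip)
    return found


# Hypothetical sample: nginx on the wildcard address for 80, one IP for 443.
sample = [
    Conn(Addr("0.0.0.0", 80), LISTEN),
    Conn(Addr("192.168.1.10", 443), LISTEN),
    Conn(Addr("192.168.1.10", 22), LISTEN),          # sshd, not a web UI port
    Conn(Addr("192.168.1.10", 443), "ESTABLISHED"),  # a client connection
]
print(sorted(listening_ips(sample, {80, 443})))  # ['0.0.0.0', '192.168.1.10']
```

A wildcard entry such as 0.0.0.0 means the socket accepts connections on every local address, so the caller still needs the configured IP list to build the URLs.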

To summarize:
  • remove the sockstat call
  • no longer perform any HTTP HEAD requests against the IPs, which dramatically reduces this method's run time
  • make it work with HA VIPs
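
Without HEAD requests, producing the console output reduces to string formatting. The sketch below assumes the webUI HTTP/HTTPS ports are passed in as plain integers; format_ui_urls, _url, and their parameters are hypothetical names, not the middleware's API. IPv6 literals are wrapped in brackets, as URL syntax requires.

```python
import ipaddress
from typing import Iterable, List, Optional


def _url(scheme: str, host: str, port: int, default_port: int) -> str:
    # Omit the port when it is the scheme's default.
    return f"{scheme}://{host}" if port == default_port else f"{scheme}://{host}:{port}"


def format_ui_urls(ips: Iterable[str], http_port: int = 80,
                   https_port: Optional[int] = 443) -> List[str]:
    """Build console URLs for each IP without contacting the web server."""
    urls: List[str] = []
    for ip in ips:
        # IPv6 literals must be bracketed inside a URL.
        host = f"[{ip}]" if ipaddress.ip_address(ip).version == 6 else ip
        urls.append(_url("http", host, http_port, 80))
        if https_port is not None:  # None means HTTPS is not configured (assumption)
            urls.append(_url("https", host, https_port, 443))
    return urls


print(format_ui_urls(["192.168.1.10", "fd00::10"], http_port=8080))
# ['http://192.168.1.10:8080', 'https://192.168.1.10', 'http://[fd00::10]:8080', 'https://[fd00::10]']
```

The run time is linear in the number of IPs and involves no network I/O, which is the point of dropping the HEAD requests.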

This question originates from the open-source project: freenas/freenas

9 answers

  • weixin_39880621 (4 months ago)

    I don't see any DNS lookups performed in https://github.com/freebsd/freebsd/blob/master/usr.bin/sockstat/sockstat.c, but I think it might be worth investigating and fixing there, because the proposed method of determining the listening ports seems a little crazy to me :) Or am I missing something?

  • weixin_39550587 (4 months ago)

    What specifically is crazy?

    All I'm doing is:
    1. run interface.query and get the VIPs (on an HA system), otherwise the aliases
    2. pass those to a function that runs interface.ip_in_use
    3. generate a list of the addresses that are configured and also show up in interface.ip_in_use
    4. then simply try to make an HTTP/HTTPS HEAD request to each IP, based on what the end user has configured in the webUI
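
    Steps 1 to 3 above amount to a selection plus a set intersection. A minimal sketch, assuming interface.query results are available as a list of dicts and the interface.ip_in_use results have already been reduced to plain address strings; the vips, aliases, and address keys and both function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List


def candidate_ips(interfaces: Iterable[Dict], is_ha: bool) -> List[str]:
    """Steps 1-2: VIPs on an HA system, otherwise the configured aliases."""
    key = "vips" if is_ha else "aliases"  # hypothetical field names
    ips: List[str] = []
    for iface in interfaces:
        ips.extend(alias["address"] for alias in iface.get(key, []))
    return ips


def configured_and_in_use(candidates: Iterable[str], in_use: Iterable[str]) -> List[str]:
    """Step 3: keep only configured addresses that are actually in use."""
    active = set(in_use)
    return [ip for ip in candidates if ip in active]


# Hypothetical sample data standing in for the middleware calls.
interfaces = [
    {"name": "em0",
     "aliases": [{"address": "10.0.0.2"}],
     "vips": [{"address": "10.0.0.1"}]},
]
in_use = ["10.0.0.1", "10.0.0.2", "127.0.0.1"]
print(configured_and_in_use(candidate_ips(interfaces, is_ha=True), in_use))   # ['10.0.0.1']
print(configured_and_in_use(candidate_ips(interfaces, is_ha=False), in_use))  # ['10.0.0.2']
```

    Keeping the candidate list ordered (rather than returning a set) preserves the interface order for the console display.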

  • weixin_39880621 (4 months ago)

    Retrieving listening ports should be as simple as reading some kernel data structures. I especially don't like step 4: you shouldn't need to connect to a local port to check whether we are listening on it. It's like:

    
     - How do you determine if a database exists?
     - if Query.Exec('drop database "name"') then showmsg('database existed').
    

    If you want to stick with that approach for some other reason, let's make this for loop parallel (so that in the worst case it hangs for 5 seconds instead of 5*N seconds).
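
    One way to parallelize the probes under a single overall deadline, using only the standard library. This is a sketch: probe_all and fake_probe are hypothetical, and fake_probe stands in for a real HEAD request so the example runs offline. It picks one answer to the question raised in the PR description by keeping only the probes that finished in time. Worker threads that are still running cannot be killed; they are abandoned, and the interpreter still waits for them at exit.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, Iterable


def probe_all(ips: Iterable[str], probe: Callable[[str], bool],
              deadline: float) -> Dict[str, bool]:
    """Run probe(ip) for every IP concurrently; return only results ready within `deadline` seconds."""
    ips = list(ips)
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(ips)))  # one thread per IP
    futures = {pool.submit(probe, ip): ip for ip in ips}
    done, _ = cf.wait(futures, timeout=deadline)
    # Return without waiting for stragglers; their threads keep running in the background.
    pool.shutdown(wait=False)
    # A probe that raised counts as "not reachable".
    return {futures[f]: f.exception() is None and bool(f.result()) for f in done}


def fake_probe(ip: str) -> bool:
    """Hypothetical stand-in for a HEAD request: 'slow' hangs, 'down' fails, 'err' raises."""
    if ip == "err":
        raise ConnectionError(ip)
    time.sleep(1.0 if ip == "slow" else 0.01)
    return ip != "down"


start = time.monotonic()
results = probe_all(["a", "down", "err", "slow"], fake_probe, deadline=0.3)
print(results, f"{time.monotonic() - start:.1f}s")
# 'slow' is absent because it missed the 0.3 s deadline.
```

    Capping max_workers instead of using one thread per IP bounds resource use on systems with many IPs, but later probes then start late and are more likely to miss the deadline, which is the trade-off the PR description objects to.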

  • weixin_39550587 (4 months ago)

    Before my change, we were serializing requests.head() calls after running sockstat and filtering its output for the IPs nginx is listening on. We did that for the detected IPv4 and IPv6 addresses in separate loops, with different timeouts depending on whether the request was HTTP or HTTPS.
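
    To put a number on the serialized approach: using the figures from the PR description (50+ IPs, 10 s HTTP and 15 s HTTPS timeouts) and assuming, as a worst case, that every IP gets both probes and every probe times out, N = 50 IPs take:

```latex
T_{\text{serial}} = N\,(t_{\text{http}} + t_{\text{https}})
                  = 50 \times (10\,\text{s} + 15\,\text{s})
                  = 1250\,\text{s} \approx 20.8\ \text{min}
\qquad
T_{\text{parallel}} \approx \max(t_{\text{http}}, t_{\text{https}}) = 15\,\text{s}
```

    The parallel figure assumes every request really runs concurrently; with a bounded worker pool it grows again.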

  • weixin_39878247 (4 months ago)

    I do see a call to getpwuid() in sockstat output. I have seen some pathological cases with directory services where this can hang (often because DNS). It would be nice if there was a switch for sockstat to not try this (just display numeric ids).

  • weixin_39550587 (4 months ago)

    I do see a call to getpwuid() in sockstat output. I have seen some pathological cases with directory services where this can hang (often because DNS). It would be nice if there was a switch for sockstat to not try this (just display numeric ids).

    Yeah, I should clarify a little more. I don't see a flag to disable this behavior (unless you see one)

  • weixin_39880621 (4 months ago)

    I don't see a flag to disable this behavior (unless you see one)

    That should be pretty simple to patch

    Before my change, we were serializing request.head()

    Sorry, didn't see it. My proposal to make these requests parallel is still valid then :) Also why don't you remove __get_urls method if no one is using it anymore?

  • weixin_39550587 (4 months ago)

    That should be pretty simple to patch

    So instead of simply working around the issue, you'd rather patch an in-base utility that will almost never get upstreamed, further diverging our FreeBSD OS port? I'm not sure which approach is crazier 😄

    Sorry, didn't see it. My proposal to make these requests parallel is still valid then :) Also why don't you remove __get_urls method if no one is using it anymore?

    Yep, it seems I forgot to remove the unused __get_urls method. I agree, and I'll look into parallelizing the requests.

  • weixin_39880621 (4 months ago)

    you'd rather patch an in-base utility that will almost never get upstreamed, further diverging our FreeBSD OS port?

    If we are doing HTTP requests anyway and no longer using sockstat, I'm fine with not patching it :)
