weixin_39590868
2020-12-02 18:31

Update huge pages KEP for container isolation of huge pages

Propose a new enhancement for huge pages.

  • Add an example of the CRI update
  • Support container isolation of huge pages

More information is available in the issue below: https://github.com/kubernetes/kubernetes/issues/80716

This question comes from the open-source project: kubernetes/enhancements


17 answers

  • weixin_39878401 (5 months ago)

    Welcome -chun!

    It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

    You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

    You can also check if kubernetes/enhancements has its own contribution guidelines.

    You may want to refer to our testing guide if you run into trouble with your tests not passing.

    If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

    Thank you, and welcome to Kubernetes. :smiley:

  • weixin_39878401 (5 months ago)

    Hi -chun. Thanks for your PR.

    I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

    Once the patch is verified, the new status will be reflected by the ok-to-test label.

    I understand the commands that are listed here.

    Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
  • weixin_39590868 (5 months ago)

    /ok-to-test

  • weixin_39878401 (5 months ago)

    -chun: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

    In response to [this](https://github.com/kubernetes/enhancements/pull/1199#issuecomment-527106675):

    > /ok-to-test

    Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
  • weixin_39590868 (5 months ago)

    May I ask you to trigger testing?

  • weixin_39586235 (5 months ago)

    /ok-to-test

  • weixin_39590868 (5 months ago)

    Oh... it seems that CI does not allow updating the table of contents of an existing KEP.

  • weixin_39590868 (5 months ago)

    The PR below updates cAdvisor to discover and store pre-allocated huge pages per NUMA node, as mentioned in the KEP update: https://github.com/google/cadvisor/pull/2304

  • weixin_39590868 (5 months ago)

    It seems that, to set a huge page limit through the CRI message, `LinuxContainerResources` needs to be extended, and the `WithResources` function in containerd also needs to be updated.
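    A minimal Go sketch of what the extended message could carry, assuming one entry per page size, with the size written in hugetlb cgroup form (`2MB`, `1GB`) and the limit in bytes. The `HugepageLimit` type and the trimmed `LinuxContainerResources` here are local stand-ins, not the generated CRI API, and `cgroupPageSize` and `hugepageLimits` are hypothetical helpers showing how kubelet-side code might turn `hugepages-<size>` resource names into such entries.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// HugepageLimit is a hypothetical shape for the extended CRI message:
// one entry per huge page size. Field names are assumptions.
type HugepageLimit struct {
	PageSize string // hugetlb cgroup naming, e.g. "2MB" or "1GB"
	Limit    uint64 // limit in bytes
}

// LinuxContainerResources is trimmed down to the fields used here.
type LinuxContainerResources struct {
	MemoryLimitInBytes int64
	HugepageLimits     []*HugepageLimit
}

// cgroupPageSize converts a Kubernetes resource name such as "hugepages-2Mi"
// into the size string used by the hugetlb cgroup, e.g. "2MB".
func cgroupPageSize(resourceName string) (string, error) {
	if !strings.HasPrefix(resourceName, "hugepages-") {
		return "", fmt.Errorf("%q is not a huge page resource", resourceName)
	}
	size := strings.TrimPrefix(resourceName, "hugepages-")
	units := []struct {
		suffix string
		bytes  uint64
	}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}}
	var bytes uint64
	for _, u := range units {
		if strings.HasSuffix(size, u.suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(size, u.suffix), 10, 64)
			if err != nil {
				return "", err
			}
			bytes = n * u.bytes
			break
		}
	}
	// Pick the largest unit that divides the page size exactly.
	switch {
	case bytes == 0:
		return "", fmt.Errorf("unsupported page size %q", size)
	case bytes%(1<<30) == 0:
		return fmt.Sprintf("%dGB", bytes>>30), nil
	case bytes%(1<<20) == 0:
		return fmt.Sprintf("%dMB", bytes>>20), nil
	default:
		return fmt.Sprintf("%dKB", bytes>>10), nil
	}
}

// hugepageLimits builds the hypothetical CRI field from container limits
// keyed by resource name (values in bytes), ignoring non-huge-page resources.
func hugepageLimits(limits map[string]uint64) ([]*HugepageLimit, error) {
	var out []*HugepageLimit
	for name, limit := range limits {
		if !strings.HasPrefix(name, "hugepages-") {
			continue
		}
		pageSize, err := cgroupPageSize(name)
		if err != nil {
			return nil, err
		}
		out = append(out, &HugepageLimit{PageSize: pageSize, Limit: limit})
	}
	// Map iteration order is random, so sort for a stable message.
	sort.Slice(out, func(i, j int) bool { return out[i].PageSize < out[j].PageSize })
	return out, nil
}

func main() {
	limits, err := hugepageLimits(map[string]uint64{
		"hugepages-2Mi": 1 << 30,
		"hugepages-1Gi": 2 << 30,
		"memory":        100 << 20,
	})
	if err != nil {
		panic(err)
	}
	res := LinuxContainerResources{MemoryLimitInBytes: 100 << 20, HugepageLimits: limits}
	for _, l := range res.HugepageLimits {
		fmt.Printf("%s -> %d bytes\n", l.PageSize, l.Limit)
	}
}
```

    Running it prints the `1GB` entry before the `2MB` entry, since the slice is sorted by the page size string.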

    But I'm not sure whether additional updates beyond containerd are required, that is, changes to other components further down the stack such as containerd-shim or runc.

    I will check this week and post the results here.

  • weixin_40000131 (5 months ago)

    /cc

  • weixin_39590868 (5 months ago)

    I am trying to list the changes container runtimes need in order to support huge pages seamlessly once the CRI message `LinuxContainerResources` is extended to support huge pages.

    Should I add the content below to the KEP update?

    With Docker as the CRI runtime, both dockershim and dockerd need to change.

    
    The path from kubelet to the kernel (cgroup):
    [Kubelet----<cri message>----dockershim]----[dockerd]----[containerd]----[containerd-shim]----[runc]----kernel

    In dockershim, which is part of the kubelet, the function `UpdateContainerResources` needs to be updated to take the page size and limit of each huge page entry from the extended CRI message.

    The `Resources` type also needs a huge pages field, but it seems that both Kubernetes and Docker (Moby?) use an old, vendored version of github.com/docker/docker/api/types/container.

    In dockerd, it seems the function `toContainerdResources` needs to be updated to handle the huge pages field.

    With containerd as the CRI runtime, only the cri-plugin needs to change.

    
    The path from kubelet to the kernel (cgroup):
    Kubelet----<cri message>----[cri-plugin/containerd]----[containerd-shim]----[runc]----kernel

    It seems that containerd requires minimal changes to support the extended CRI message: only one function in the cri-plugin, `WithResources`, needs to be updated to handle the huge page field. I tested a custom build of the cri-plugin to check whether the huge page limit in the container-level cgroup is updated, and it works well.
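    The cri-plugin change amounts to copying the new CRI field into the OCI runtime spec, after which runc writes the cgroup files. The sketch below assumes local stand-in types shaped like the OCI runtime-spec `LinuxHugepageLimit` (a page size string plus a byte limit); `withHugepageLimits` is a hypothetical spec option, not containerd's actual `WithResources`. The file names follow the cgroup v1 hugetlb controller convention `hugetlb.<size>.limit_in_bytes`.

```go
package main

import "fmt"

// criHugepageLimit stands in for the extended CRI field (assumed shape).
type criHugepageLimit struct {
	PageSize string
	Limit    uint64
}

// ociHugepageLimit, ociLinuxResources and ociSpec are trimmed stand-ins
// shaped like the OCI runtime-spec types, defined locally for this sketch.
type ociHugepageLimit struct {
	Pagesize string
	Limit    uint64
}

type ociLinuxResources struct {
	HugepageLimits []ociHugepageLimit
}

type ociSpec struct {
	Resources *ociLinuxResources
}

// withHugepageLimits is a hypothetical spec option: it replaces any huge
// page limits in the spec with the ones received over CRI.
func withHugepageLimits(limits []criHugepageLimit) func(*ociSpec) {
	return func(s *ociSpec) {
		if s.Resources == nil {
			s.Resources = &ociLinuxResources{}
		}
		s.Resources.HugepageLimits = nil
		for _, l := range limits {
			s.Resources.HugepageLimits = append(s.Resources.HugepageLimits,
				ociHugepageLimit{Pagesize: l.PageSize, Limit: l.Limit})
		}
	}
}

// cgroupV1Files maps each limit to the cgroup v1 hugetlb file a low-level
// runtime would write it to.
func cgroupV1Files(s *ociSpec) map[string]uint64 {
	files := make(map[string]uint64)
	if s.Resources == nil {
		return files
	}
	for _, l := range s.Resources.HugepageLimits {
		files[fmt.Sprintf("hugetlb.%s.limit_in_bytes", l.Pagesize)] = l.Limit
	}
	return files
}

func main() {
	spec := &ociSpec{}
	withHugepageLimits([]criHugepageLimit{{PageSize: "2MB", Limit: 1 << 30}})(spec)
	for file, limit := range cgroupV1Files(spec) {
		fmt.Printf("%s = %d\n", file, limit)
	}
}
```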

    With CRI-O as the CRI runtime:

    I haven't dug into the CRI-O code yet, but my team has a use case for Kata Containers with CRI-O, so I will look into it in the coming weeks.

    
    The path from kubelet to the kernel (cgroup):
    Kubelet----<cri message>----[CRI-O]-----[runc]-----kernel
  • weixin_39996496 (5 months ago)

    /assign

  • weixin_39979617 (5 months ago)

    The overall idea of the isolation and of extending the CRI protocol looks OK to me; maybe just restructure the text a bit, similar to what was proposed.

    One thing that might be good to clarify: what would happen to the hugetlbfs volume mount? Will it still be initialised with the aggregated size of all containers?

  • weixin_39590868 (5 months ago)
    
    > One thing that potentially might be good to clarify - what would happen to hugetlbfs volume mount? will it be still initialised with aggregated size of all containers?
    

    => I mentioned it in the KEP update; see below.

    
    The following is an example of a pod that requests multiple sizes of
    huge pages for multiple containers. It requests 1Gi huge pages of size 1Gi and
    2Mi for container1 and 1Gi huge pages of size 2Mi for container2, with
    emptyDir backing. Note that `hugetlbfs` offers the `size` mount option to specify
    the maximum amount of memory for the mount, but the huge pages medium does not
    use this option to set limits, so that each container's huge page usage is
    controlled individually by its own container cgroup sandbox.
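    To make the per-container versus per-pod distinction concrete, the following Go sketch assumes that the pod-level hugetlb limit is the sum of the container limits for each page size, while each container cgroup keeps only its own values. The container names and byte values are illustrative and are not taken from the KEP's example pod.

```go
package main

import (
	"fmt"
	"sort"
)

// containerLimits maps container name -> hugetlb page size ("2MB", "1GB")
// -> limit in bytes for that container's own cgroup.
type containerLimits map[string]map[string]uint64

// podLevelLimits sums the container limits per page size. Treating the pod
// cgroup as the aggregate of its containers is an assumption of this sketch.
func podLevelLimits(c containerLimits) map[string]uint64 {
	total := make(map[string]uint64)
	for _, sizes := range c {
		for size, limit := range sizes {
			total[size] += limit
		}
	}
	return total
}

func main() {
	// Illustrative values only, not the pod spec from the KEP.
	pod := containerLimits{
		"container1": {"1GB": 2 << 30, "2MB": 512 << 20},
		"container2": {"2MB": 256 << 20},
	}
	total := podLevelLimits(pod)
	sizes := make([]string, 0, len(total))
	for size := range total {
		sizes = append(sizes, size)
	}
	sort.Strings(sizes)
	for _, size := range sizes {
		fmt.Printf("pod: hugetlb.%s.limit_in_bytes = %d\n", size, total[size])
	}
	// Each container cgroup gets only its own entries, so container2's 2MB
	// usage is capped at 256Mi even though the pod-level 2MB limit is 768Mi.
	for _, name := range []string{"container1", "container2"} {
		for size, limit := range pod[name] {
			fmt.Printf("%s: hugetlb.%s.limit_in_bytes = %d\n", name, size, limit)
		}
	}
}
```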
    
  • weixin_39590868 (5 months ago)

    Guys, I updated the KEP to focus only on container isolation and spread the content across the whole document. May I ask you to review it again?

  • weixin_39878401 (5 months ago)

    [APPROVALNOTIFIER] This PR is APPROVED

    This pull-request has been approved by: bg-chun, derekwaynecarr

    The full list of commands accepted by this bot can be found here.

    The pull request process is described here.

    Needs approval from an approver in each of these files:

    - ~~[keps/sig-node/OWNERS](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/OWNERS)~~ [derekwaynecarr]

    Approvers can indicate their approval by writing `/approve` in a comment.
    Approvers can cancel approval by writing `/approve cancel` in a comment.
  • weixin_39996496 (5 months ago)

    -chun For hugetlbfs, the writing container is charged for its usage (similar to memory). Assuming more than one container wants to write to hugetlbfs, requests alone should be sufficient, since the limits already constrain each individual container.

