weixin_39755890 · 2021-01-07 08:52

add IPVLAN support

Add IPVLAN support for network configuration; exposes the IPVLAN configuration to the container through devlxd.

This question comes from the open-source project: lxc/lxd


45 replies

  • weixin_39901685 · 4 months ago

    What is incomplete? We were lied to; is there actually someone interested in implementing this feature? Wouldn't it be better to just say, 'Look, we would have to maintain this, and we are not interested'?
  • weixin_39901685 · 4 months ago

    I think they either don't like your work or they are not interested. Just close the PR; I see other PRs get merged while this one stays stale. You might as well close it and take your talent elsewhere.
  • weixin_39688875 · 4 months ago

    It's still on my mind, but I keep having other work to do which takes precedence, and this is the kind of thing where I need a solid day to think about it and make it suitable for upstreaming.
  • weixin_39688875 · 4 months ago

    I do think that having ipvlan will be useful, but we need to do it right. From what I remember, I've been meaning to look into:

    - Dropping the devlxd part of this change entirely, instead having liblxc do the setup on startup so the container is functional when it starts. This also saves us from exposing container configuration through devlxd, which we've been avoiding doing for security reasons.
    - For live add/remove/change, we'll need a new forknet subcommand which modifies the configuration in the running container.
    - Allowing the operating mode (bridge vs private) to be configured; we seem to be getting more requests for private networking lately.
  • weixin_39688875 · 4 months ago

    has some interest around this kind of networking too and will be joining the LXD team next week
  • weixin_39901685 · 4 months ago

    I'm happy
  • weixin_39755890 · 4 months ago

    I do not see how the devlxd part is a security issue; with a macvlan setup, the container has full control over its network setup.
  • weixin_39688875 · 4 months ago

    We try to stay away from exposing container configuration through devlxd. That's why the /config endpoint only shows you the user keys and not all the other config keys, so I'd like to avoid exposing device information too, especially when it's not actually needed.
  • weixin_39755890 · 4 months ago

    I am not entirely sure that network configuration is not needed inside the container.

    Without devlxd we would have to define the list of IPs in two places (super inconvenient):

    - the LXD container config
    - netplan inside the container

    With this information present in devlxd, the netplan configuration could be generated on demand inside the CT by a simple daemon process (see the sketch below).
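
    As an illustration of the idea being argued for here, a daemon inside the container could read devlxd over its Unix socket. This is only a minimal sketch: the /1.0/config endpoint exists in devlxd, but the /1.0/devices endpoint shown is the hypothetical extension this comment proposes, not something the PR guarantees.

    # Query devlxd from inside the container over its Unix socket:
    curl --unix-socket /dev/lxd/sock http://lxd/1.0/config
    # Hypothetical endpoint exposing NIC/IP configuration (assumption):
    curl --unix-socket /dev/lxd/sock http://lxd/1.0/devices
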
  • weixin_39688875 · 4 months ago

    why does netplan need this information?
  • weixin_39755890 · 4 months ago

    For IPVLAN (no DHCP) in L3S mode to work, routing and proxy ARP need to be configured on the hypervisor node. So a list of IPs is allowed for one specific CT; other IPs would not work, since no routing is defined for them.

    Defining the list of configured IPs for a specific CT, and exposing that list to the OS inside the container, helps to unify network configuration and makes the LXD container config the single point of authority.

    There is no way DHCP will work in this mode; there is no L2, so no MAC spoofing.

    L3 IPs are routed to the defined container interface, so no IP spoofing either.
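
    For context, the hypervisor-side plumbing described above would look roughly like the following. This is a hedged sketch with example values (parent interface eth0, container IP 192.168.1.100), not the PR's exact implementation:

    # Allow the host to forward traffic for the container's IPs:
    sysctl -w net.ipv4.ip_forward=1
    # Answer ARP on the parent interface on behalf of the container's IP:
    ip neigh add proxy 192.168.1.100 dev eth0

    Only IPs published this way are reachable; any other address the container configures has no route or proxy-ARP entry on the host, which is what prevents IP spoofing.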
  • weixin_39688875 · 4 months ago

    I don't think you understood what I said.

    I'm saying that we will have liblxc pre-configure the network namespace of the container, so when the container starts it already has its IP addresses set on the NIC and already has the needed routes set up; there is nothing for netplan to do in the container.
  • weixin_39755890 · 4 months ago

    In my understanding, at some point of development you were against such configurations. Then yes, devlxd is not needed. Thanks for clarifying this.
  • weixin_39982269 · 4 months ago

    Started work on adding this to the underlying LXC here: https://github.com/lxc/lxc/pull/2950
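
    For reference, the liblxc side of this would be driven by container configuration keys. A minimal sketch of what such a config might look like; the exact key names and values are assumptions based on the linked PR, not confirmed here:

    # Hypothetical liblxc network configuration for an IPVLAN NIC:
    lxc.net.0.type = ipvlan
    lxc.net.0.ipvlan.mode = l3s
    lxc.net.0.link = eth0
    lxc.net.0.ipv4.address = 192.168.1.100/32
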
  • weixin_39982269 · 4 months ago

    PRs for adding layer 2 proxy and static routes for IPVLAN:

    - https://github.com/lxc/lxc/pull/2964
    - https://github.com/lxc/lxc/pull/2968
  • weixin_39688875 · 4 months ago

    Closing in favor of #5716
  • weixin_39903176 · 4 months ago

    Hey, thanks for the patch. :) But you need to sign off your commits; it should be as simple as git commit -s. Otherwise we can't review it. :)
  • weixin_39755890 · 4 months ago

    done, thanks
  • weixin_39755890 · 4 months ago

    ready for review
  • weixin_39688875 · 4 months ago

    Can you rebase your branch so that the different commits make more sense? Starting with common infrastructure and finishing with the new feature should make this change easier to review.

    I see a lot of things that I'd expect to be handled differently in here, but they're spread across so many commits, which then get modified by later commits, that it feels pointless for me to comment at this point. Hopefully we can get things split better so we can review the different pieces on their own, possibly landing some early as we go before we get to ipvlan itself.
  • weixin_39755890 · 4 months ago

    done
  • weixin_39755890 · 4 months ago

    Any news on this? Maybe we can cherry-pick some commits to master, so development and rebasing will go faster?
  • weixin_39755890 · 4 months ago

    Ready for review
  • weixin_39982269 · 4 months ago

    This is very cool, and similar to something I've been working on in https://github.com/tomponline/ctctl/blob/master/ctctl-netup/ctctl-netup.go to emulate OpenVZ's venet-style networking using routing and proxy ARP/NDP on the hardware node. Great to see official support for this. :)
  • weixin_39982269 · 4 months ago

    Do you think it's worth adding some form of ARP/NDP advertisement when an IP comes online, to allow container migrations? E.g. https://github.com/tomponline/ctctl/blob/master/ctctl-netup/ctctl-netup.go#L204-L214
  • weixin_39755890 · 4 months ago

    Not official support yet; hoping this will be merged sooner rather than later :)

    I do actually have some ARP/NDP code for tests.

    GARP is working great; I have some trouble with the IPv6 counterpart.

    I was planning this for another PR; it would be nice if you joined that effort after IPVLAN is merged :)
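
    The GARP part being tested can be reproduced from the shell. A minimal sketch using iputils arping, with example interface and address; the IPv6 counterpart would be an unsolicited neighbor advertisement, which is the part described above as still troublesome:

    # Send 3 gratuitous ARP announcements for the container's IP,
    # so upstream switches/hosts update their ARP caches after migration:
    arping -U -c 3 -I eth0 192.168.1.100
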
  • weixin_39755890 · 4 months ago

    Any news on the code changes? It would be nice to have some comments before the weekend.
  • weixin_39688875 · 4 months ago

    Unlikely to have comments before the weekend as I'm flying back from a conference today.
  • weixin_39755890 · 4 months ago

    bump
  • weixin_39755890 · 4 months ago

    any news on this?
  • weixin_39903176 · 4 months ago

    Stéphane already has a plan. It's just that we're currently flooded with tasks. He'll get to you soon. Sorry for the delay :)
  • weixin_39690391 · 4 months ago

    Hi, I am also interested in this functionality. Is there any ETA for it?
  • weixin_39607620 · 4 months ago

    IPVLAN support in LXD would also be quite helpful for me; however, I would want it to operate in L2 bridge mode. Could that be made configurable? L2 mode should also avoid a lot of the complexity with proxy ARP etc.

    By the way, the docs don't mention that the host_name attribute is supported by ipvlan, even though it seems to be.
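
    If the operating mode were made configurable as requested here, the device configuration might look something like the following. A hedged sketch only: the nictype=ipvlan device and especially the mode key are the proposed interface being discussed, with example names throughout:

    # Hypothetical: attach an IPVLAN NIC to container c1 in L2 mode
    lxc config device add c1 eth0 nic nictype=ipvlan parent=eth0 \
        mode=l2 ipv4.address=192.168.1.100
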
  • weixin_39982269 · 4 months ago

    I was exploring IPVLAN functionality the other day and tried a basic network namespace test on my home server/router. Communication with the local network devices using proxy ARP, local routes (not shown in the example below, as the device is the default gateway for all hosts on the network) and IPVLAN L3S mode worked great, as in your implementation.

    However, this device has 2 NICs, one is connected to the local LAN, and one is connected to the internet via a PPP connection.

    What I noticed was that packets were always sent outbound via the parent NIC defined for the l3s IPVLAN slave, even if the host's routing table had the default gateway pointing out of the PPP interface.

    This would result in the host sending ARP packets out to the local network for external IPs, e.g. 8.8.8.8 in the example below.

    If I modified the host's routing table and made the default gateway another host on the network, reachable via the local network NIC, then the host's routing table was consulted and ARP resolution was done for the gateway IP rather than the destination IP.

    My setup:

    Routing table:

    
    192.168.1.0/24 dev eth1
    default via ppp0
    

    Setup:

    
    sysctl -w net.ipv4.ip_forward=1
    
    ip netns add ns1
    ip link add name ipv1 link eth0 type ipvlan mode l3s
    ip link set dev ipv1 netns ns1
    ip netns exec ns1 ip addr add 192.168.1.100/32 dev ipv1
    ip netns exec ns1 ip link set ipv1 up
    ip netns exec ns1 ip link set lo up
    ip netns exec ns1 ip -4 r add default dev ipv1
    
    ip netns exec ns1 ping 8.8.8.8
    

    This would result in ARP requests for 8.8.8.8 going out of eth1.

    If I changed the routing table to:

    
    192.168.1.0/24 dev eth1
    default via 192.168.1.254 dev eth1
    

    Then for the same experiment, ARP requests would be sent for 192.168.1.254.

    It seems that IPVLAN is only suitable for single-homed hosts, and not hosts with multiple interfaces/routes. This means I have to either use a bridge or something like https://github.com/tomponline/lxc-routednet

  • weixin_39755890 · 4 months ago

    L2 is pretty simple to implement, but it does not give you the ability to use netfilter hooks like L3S mode does.

    L2 mode is similar to macvlan.

    We could probably do an L2/L3S mode switch, but it all depends on what will be decided.
  • weixin_39755890 · 4 months ago

    I think you need to do policy routing for your setup to work
  • weixin_39607620 · 4 months ago

    The primary reason why I need L2 ipvlan is that I have to operate in environments where external switches/routers (not under my control) have issues with multiple MAC addresses on the same port, for performance and/or security reasons. L3(s) mode, as far as I understand, does not allow multiple containers in the same subnet, which is a requirement for my application. I'm also aware that in L2 mode I cannot use netfilter directly to manipulate the packets going in and out of the containers. However, IIUC it should be able to behave like a regular router, with the container host just being the gateway for the containers for IPv4 traffic that needs to be NATed.

    I've tried to adapt the patches from this pull request to use L2 mode instead, but haven't had success so far. I will do some more experiments later today.
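
    The "regular router" arrangement described here is plain forwarding plus NAT on the host. A minimal sketch with example values (container subnet 192.168.1.0/24, uplink ppp0); whether this works with L2 ipvlan specifically is exactly what gets debated in the following comments:

    # Enable forwarding, then masquerade container traffic out of the uplink:
    sysctl -w net.ipv4.ip_forward=1
    iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o ppp0 -j MASQUERADE
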
  • weixin_39982269 · 4 months ago

    "L3(s) mode, as far as I understand, does not allow for multiple containers in the same subnet, which is a requirement for my application"

    That is not my understanding; I've been able to create multiple namespaces with different IPs in the same subnet, and it works fine.
  • weixin_39982269 · 4 months ago

    "I think you need to do policy routing for your setup to work"

    I was expecting it to consult the host's main routing table, like packets do when using a veth pair between the host and the container. But with IPVLAN the packets seem to bypass the host's routing table and always go directly out of the master interface.

    Because there is no 'endpoint' in the host's namespace, there is nothing to put a policy on, AFAIK; the first time packets 'appear' on the host is as they emerge from the master interface onto the LAN.

    I've done some more digging, and I think the core issue here is that the skb->dev property is set to the master device when the packet crosses the namespace:

    https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/drivers/net/ipvlan/ipvlan_core.c#L589

    https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/drivers/net/ipvlan/ipvlan_core.c#L313

    ...which in turn is used as the output interface when constructing the routing flow:

    https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/drivers/net/ipvlan/ipvlan_core.c#L428
  • weixin_39755890 · 4 months ago

    It works perfectly fine with multiple CTs in the same big subnet; you only need to add an IPv4 /32 per CT, inside the CT.
  • weixin_39755890 · 4 months ago

    In ipvlan/macvlan the packet is always forwarded to the parent (upper) interface; this was done intentionally by the kernel devs to implement super-simple bridging without FDB and STP.

    My best guess is that you should probably use veth pairs and policy routing directly with the CT.

    A second option would be to use a veth pair as the parent of the ipvlan/macvlan interface, so your custom routing would sit on the upper end of the veth pair while the CT pseudo-bridges on the lower end using ipvlan/macvlan.

    The second option looks a bit ugly but should work fine; I've done similar things with QEMU (see the sketch below).
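
    The second option can be expressed directly with iproute2. A hedged sketch with example names (veth-host/veth-ipvl, namespace ns1), reusing the L3S mode from the earlier example:

    # Create a veth pair; the host end becomes the routing/netfilter point:
    ip link add veth-host type veth peer name veth-ipvl
    ip link set veth-host up
    ip link set veth-ipvl up
    ip addr add 10.10.10.1/24 dev veth-host
    # Attach the ipvlan slave to the lower veth end and move it into a netns:
    ip netns add ns1
    ip link add ipvl0 link veth-ipvl type ipvlan mode l3s
    ip link set ipvl0 netns ns1
    # Traffic from ns1 now surfaces on veth-host, where policy routing
    # and iptables rules can be applied before it leaves the box.
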
  • weixin_39607620 · 4 months ago

    "works perfectly fine with multiple CTs in same big subnet, you only need to add IPv4/32 per CT inside CT"

    Turns out the problem was really that with ipvlan the containers are not able to communicate with the host, in both L2 and L3 modes, and intentionally so. So I'll probably have to resort to a local bridge + routing from/to the physical interface.
  • weixin_39982269 · 4 months ago

    "In ipvlan/macvlan packet is always forwarded to parent (upper) interface ... done similar things with QEMU."

    Thanks, I had come to the same conclusion, but it's good to have a second opinion :)

    I was thinking about using the veth pair with IPVLAN as you say; not the cleanest of approaches, but it would be interesting to see whether it actually performs any better than using veth pairs directly for each container, as there would still be some MAC emulation involved, and packets would go through the routing table and iptables twice (which is good if, like me, you need that, but bad if you don't).

    The other thing I noticed about L3S IPVLAN mode is that iptables on the host works for packets going between containers and external hosts, but the host's iptables rules are apparently not checked when packets move between containers. Presumably another purposeful performance decision for IPVLAN.
  • weixin_39755890 · 4 months ago

    iptables would not work between containers because the packet is forwarded inside the 'dumb' bridge before any netfilter hook can act. I never tested this, so it's good that you confirmed it :)
  • weixin_39530839 · 4 months ago

    This pull request didn't trigger Jenkins as its author isn't in the whitelist.

    An organization member must perform one of the following:

    • To have this branch tested by Jenkins, use the "ok to test" command.
    • To have a one-time test done, use the "test this please" command.

    Those commands are simple GitHub comments of the format: "jenkins: COMMAND"
