weixin_39644915
2021-01-07 08:03

LXC container does not start up after isolating user ids - Error: Common start logic: Failed to change ACLs on /var/snap/.../rootfs/var/log/journal

The issue seems to happen only for older containers (created around mid-May 2020). I'm not sure if it is worth the effort to track it down...

Required information

  • Distribution: ubuntu-20.04
  • Distribution version: 20.04
  • The output of "lxc info"

config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIICEDCCAZagAwIBAgIRAPNxCVgOoRn7JjgOIk8IUfAwCgYIKoZIzj0EAwMwODEc
    MBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEYMBYGA1UEAwwPcm9vdEBoZXR6
    bmVyLWRlMB4XDTIwMDYzMDA3NTE0M1oXDTMwMDYyODA3NTE0M1owODEcMBoGA1UE
    ChMTbGludXhjb250YWluZXJzLm9yZzEYMBYGA1UEAwwPcm9vdEBoZXR6bmVyLWRl
    MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEysYaml3x1Sq5TmS3jSIijLAGLXsZNFzB
    DNqe5mOqBh2rxQB4OD/8vv4d5e9gTbvX6MCPuPLTw/FtIPSb9cHgnu75IiM8xx0T
    xZ1B+Jk4mJU1gRXsX0lkSKXIiEsVgez3o2QwYjAOBgNVHQ8BAf8EBAMCBaAwEwYD
    VR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAtBgNVHREEJjAkggpoZXR6
    bmVyLWRlhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUC
    MQCOALzKZuS+sq/wZnBjxgQkjXexg4psEhCPicTE15Hx9heVcOUnWNwXdrtNLvJR
    ee8CMBRjnulfo4k6JroYF8O/sDhGcGyazrbZyuzGlXsfYCamOkdAFtFHvCdVh71T
    lYI4Iw==
    -----END CERTIFICATE-----
  certificate_fingerprint: d2c6bedd778628ed236d7bafc63946dad0ac94d849be4b2c120885262312177d
  driver: lxc
  driver_version: 4.0.3
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.0-40-generic
  lxc_features:
    cgroup2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_notify: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: false
  server_name: hetzner-de
  server_pid: 2469
  server_version: "4.3"
  storage: btrfs
  storage_version: 4.15.1

/etc/subuid:

lxd:100000:1000000000
root:100000:1000000000

/etc/subgid:

lxd:100000:1000000000
root:100000:1000000000

Issue description

I'm setting up a regular ubuntu-2004 container and trying to activate isolated user ids afterwards. Starting the container then produces the error. The issue seems to happen only for older containers (created around mid-May 2020). I'm not sure if it is worth the effort to track it down...

Steps that do not reproduce the issue

Via container property

  1. lxc launch images:ubuntu/focal ubuntu-2004
  2. lxc list ubuntu-2004 --format csv -> ubuntu-2004,RUNNING,10.2.110.23 (eth0),,CONTAINER,0
  3. lxc stop ubuntu-2004
  4. lxc config set ubuntu-2004 security.idmap.isolated true
  5. lxc start ubuntu-2004

Via default profile property

  1. lxc launch images:ubuntu/focal ubuntu-2004
  2. lxc list ubuntu-2004 --format csv -> ubuntu-2004,RUNNING,10.2.110.23 (eth0),,CONTAINER,0
  3. lxc stop ubuntu-2004
  4. lxc profile set default security.idmap.isolated true
  5. lxc start ubuntu-2004
  6. lxc profile unset default security.idmap.isolated

LXC container via export/import

The issue does not happen when I create a fresh container, export its image, delete everything via lxc ... delete ..., import the image and launch the container.
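
One way to do that round trip, as a rough sketch (the stop/publish/export commands and the image alias below are my own illustration of the procedure, not the exact commands used):

lxc stop ubuntu-2004
lxc publish ubuntu-2004 --alias ubuntu-2004-backup
lxc image export ubuntu-2004-backup ./ubuntu-2004-backup   # exported file name/extension may vary
lxc delete ubuntu-2004
lxc image delete ubuntu-2004-backup
lxc image import ./ubuntu-2004-backup.tar.gz --alias ubuntu-2004-backup
lxc launch ubuntu-2004-backup ubuntu-2004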

Steps to reproduce

  1. Import an older export of a 20.04 container image (around mid of May 2020)
  2. Start it up -> OK
  3. Stop it
  4. Activate the idmap isolation
  5. Start it

root-de ~ # lxc start ubuntu-2004
Error: Common start logic: Failed to change ACLs on /var/snap/lxd/common/lxd/storage-pools/default/containers/ubuntu-2004/rootfs/var/log/journal
Try `lxc info --show-log ubuntu-2004` for more info

root-de ~ # lxc info --show-log ubuntu-2004
Name: ubuntu-2004
Location: none
Remote: unix://
Architecture: x86_64
Created: 2020/07/18 07:18 UTC
Status: Stopped
Type: container
Profiles: hostonly

Log:

Information to attach

  • [ ] Any relevant kernel output (dmesg)
  • [ ] Container log (lxc info NAME --show-log)
  • [ ] Container configuration (lxc config show NAME --expanded)
  • [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • [ ] Output of the client with --debug
  • [ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

This question comes from the open source project: lxc/lxd

5 replies

  • weixin_39688875 4 months ago

    Yeah, I believe we fixed the source of the issue in the shifting logic though we can't safely fix existing containers.

    /var/log/journal is the usual culprit so I'd recommend you just delete it on all your old containers. It will get recreated automatically by journald on next container startup.
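
    For a container that can still start, that could be as simple as the following (a hypothetical example; the container name is a placeholder):

    lxc exec ubuntu-2004 -- rm -rf /var/log/journal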

  • weixin_39644915 4 months ago

    Thanks a lot for taking the time to look at the issue.

    I have lots of old containers which I don't want to recreate, since they contain months of data. Do you think things would work out if I copied the ACLs of a new container onto an old one, using something like:

    
    cd ...new-container.../var/log/journal
    getfacl -R . \
      | (cd ...old-container.../var/log/journal && setfacl --restore=-)
    

    Needs some fiddling with the user/group id, I know...

  • weixin_39688875 4 months ago

    Maybe, though that assumes it doesn't keep the broken entries that are causing the shifting issue.

    So you actually care about the accumulated binary journal data in those containers? Personally, I've rarely found much use for data from before the latest boot, beyond what's saved in the plaintext files in /var/log.

  • weixin_39644915 4 months ago

    It seems I don't have a clear understanding of the issue. I don't care about the journal; the latest boot and the plaintext logs are fine for me. I just want to avoid recreating all my "older" containers. Is there any option to achieve this? Thanks!

  • weixin_39688875 4 months ago

    Just delete /var/log/journal in all of them before marking them isolated and restarting them.
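
    A minimal sketch of doing that for all containers in one go, assuming they are currently running so lxc exec works (the name list comes from lxc list; skip any container you don't want isolated):

    for c in $(lxc list -c n --format csv); do
        lxc exec "$c" -- rm -rf /var/log/journal   # recreated by journald on next boot
        lxc config set "$c" security.idmap.isolated true
        lxc restart "$c"
    done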
