2021-01-09 07:58

`buildah from` needs a lock on naming the new container


I'm building images via a build tool, and when it runs in parallel (e.g., -j2), builds fail because of conflicting container names.

Steps to reproduce the issue:

1. `buildah from scratch & buildah from scratch` (may need to be run a few times)

Describe the results you received:

Grabbing a name for the new container is racy.

error creating container: the container name "working-container" is already in use by "b1bd69760d74e629af9d7494a0c2084eb8493243d06a908bcce638a1830c485d". You have to remove that container to be able to reuse that name.: that name is already in use

Describe the results you expected:

I should get two containers named working-container and working-container-2.

Output of rpm -q buildah or apt list buildah:


Output of buildah version:

Version:         1.3
Go Version:      go1.10.3
Image Spec:      1.0.0
Runtime Spec:    1.0.0
CNI Spec:        0.3.1
libcni Version:  v0.6.0
Git Commit:      4888163
Built:           Sun Aug  5 07:31:55 2018
OS/Arch:         linux/amd64

Output of cat /etc/*release:

Fedora release 28 (Twenty Eight)
VERSION="28 (Twenty Eight)"
PRETTY_NAME="Fedora 28 (Twenty Eight)"
Fedora release 28 (Twenty Eight)
Fedora release 28 (Twenty Eight)

Output of uname -a:

Linux rotor 4.18.12-200.fc28.x86_64 #1 SMP Thu Oct 4 15:46:35 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

# This file is the configuration file for all tools
# that use the containers/storage library.
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.

# Default Storage Driver
driver = "overlay"

# Temporary storage location
runroot = "/var/run/containers/storage"

# Primary Read/Write location of container storage
graphroot = "/var/lib/containers/storage"

# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Size is used to set a maximum size of the container image.  Only supported by
# certain container storage drivers.
size = ""

# Path to a helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# OverrideKernelCheck tells the driver to ignore kernel checks based on kernel version
override_kernel_check = "true"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev"

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to UIDs/GIDs as they should appear outside of the container, and
# the length of the range of UIDs/GIDs.  Additional mapped sets can be listed
# and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped container-level ID,
# until all of the entries have been used for maps.
# remap-user = "storage"
# remap-group = "storage"

# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base.
# device.
# mkfsarg = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver 
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"

# If specified, use OSTree to deduplicate files with the overlay backend
ostree_repo = ""

# Set to skip a PRIVATE bind mount on the storage home directory.  Only supported by
# certain container storage drivers
skip_mount_home = "false"



  • weixin_39961943 · 4 months ago

    I thought we had a test to make sure we did not end up with a conflict.

    I like the idea of making this random. Someone even opened a PR for this. WDYT?

  • weixin_39615956 · 4 months ago

    Were you testing with v1.4? A few lock changes for container creation went into v1.4, so I wonder if this has already been cured. The lock monopolization is another matter.

    If you have the time, it would be great to know whether v1.4 made things happier for you. (https://bodhi.fedoraproject.org/updates/FEDORA-2018-b13b48abbe)

  • weixin_40003512 · 4 months ago

    I'm not using btrfs here, but ext4.

    Yep, it seems that 1.4 is much better in this respect (I ended up bumping to F29 anyway for new bits in systemd v239).

  • weixin_39615956 · 4 months ago

    Thanks for checking with v1.4, good info to know.

  • weixin_39945816 · 4 months ago

    Thanks! I entirely missed that this was a v1.3 issue. Happy that it's working.

    There are two things here: the lock monopolization and the naming algorithm.

  • weixin_39998795 · 4 months ago

    The creation slowdown is probably happening because I dropped the logic that checked whether a name was already in use in #1009; store.CreateContainer() doesn't reject our attempt to use a duplicate name until after it has done most of its work, so iterating over that step takes longer.

  • weixin_39961943 · 4 months ago

    Yes, it looks like the last thing CreateContainer does is attempt to reserve the name. But the name is based on the imageID.

    Why not just use random numbers between 1 and 10000, or something like that?

  • weixin_39945816 · 4 months ago

    I've opened a PR to speed up the naming algorithm: https://github.com/containers/buildah/pull/1100

    Curiously, the lock acquisition seems fairer now. I assume that the error path contributed to that symptom.

  • weixin_39961943 · 4 months ago

    After some discussion, the preferred approach is to re-add the check before creating the initial container: pick a name, check the store for its existence, and keep going until you find an unused container name, then attempt to create it. Since this can still race, we would still need to detect duplicates and continue, as the current code does.

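The check-then-create loop described in the comment above can be sketched in Go. This is a hypothetical simulation, not buildah's actual code: the `store` type, its methods, and the suffix scheme are illustrative stand-ins for the containers/storage API.

```go
package main

// Sketch of "check first, then create, and still tolerate losing the
// race": names are probed cheaply before attempting the (expensive)
// create, and a duplicate-name error just advances to the next suffix.

import (
	"errors"
	"fmt"
	"sync"
)

var errDuplicateName = errors.New("name already in use")

// store is a stand-in for containers/storage.
type store struct {
	mu    sync.Mutex
	names map[string]bool
}

func (s *store) HasName(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.names[name]
}

func (s *store) CreateContainer(name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.names[name] {
		return errDuplicateName
	}
	s.names[name] = true
	return nil
}

func createWithName(s *store, base string) string {
	for suffix := 1; ; suffix++ {
		name := base
		if suffix > 1 {
			name = fmt.Sprintf("%s-%d", base, suffix)
		}
		if s.HasName(name) {
			continue // cheap pre-check: name is obviously taken
		}
		// The create itself can still lose a race to a concurrent
		// caller, so a duplicate error moves on to the next name.
		if err := s.CreateContainer(name); err == nil {
			return name
		}
	}
}

func main() {
	s := &store{names: map[string]bool{}}
	fmt.Println(createWithName(s, "working-container")) // working-container
	fmt.Println(createWithName(s, "working-container")) // working-container-2
}
```

The cheap pre-check avoids paying for the expensive create path on obviously taken names, while the duplicate-error fallback covers the window where a concurrent caller grabs the name between the check and the create.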
  • weixin_39961943 · 4 months ago

    I believe this is fixed in buildah 1.5

  • weixin_40003512 · 4 months ago

    Ah, I can use --name to avoid it in my setup, but this might be something nice to fix anyway.

  • weixin_39615956 · 4 months ago

    Yep, you're right: if you can make use of the --name param, that should ease your pain short term. Meanwhile we'll take a look at this; thanks for the report.

  • weixin_39945816 · 4 months ago

    I cannot reproduce on openSUSE (with btrfs). I put my machine under heavy load with a few parallel jobs, but the naming logic (see https://github.com/containers/buildah/blob/master/new.go#L286) seems to work.

    I have two suspicions (please take with a grain of salt):
    - rootless-specific?
    - filesystem-lock-specific?

    Can you have a look?

  • weixin_39945816 · 4 months ago

    One thing I noticed is that the lock acquisition doesn't seem to be fair: one job always seems to monopolize the locks. I made this observation by running a short script multiple times:

    for i in $(seq 1 20); do
        sudo buildah from scratch &
    done
    wait

    Another observation: it takes increasingly more time to create the working containers as we step through the suffix sequence incrementally. If we already have 80 working containers, we try 80 taken names before finding the free suffix "-81". Maybe we could use random suffixes to speed things up. This can also be tested with the bash script above.
