weixin_39760967 2020-11-29 22:01
chore(docker): smaller development distribution

Description

This PR reduces the docker image sizes that we produce by a little less than 50%, and adds a new cloud build configuration such that developers can build their images in the cloud, optionally without having to upload anything at all.

Size

The current official image weighs in at around 500mb (give or take, depending on the actual version). Diving into it, we can see that it is split into these main layers (ignoring layers smaller than 20mb):

  • 114mb for the base image
  • 125mb installing dependencies
  • 102mb copying the distribution tar ball
  • 130mb decompressing the tar ball

One thing to note here is that whenever a developer changes any line and recompiles, the last two layers above are invalidated. This means that most of the time, to test a development image in the cloud, a developer has to rebuild those last two layers and push them out, which on a home DSL/cable connection is typically quite slow (assuming you live in Germany).

The PR addresses this by introducing multi-stage builds, so that instead of copying and un-tarring, the final image only copies the un-tarred distribution, resulting in one layer fewer.

  1. Builder

The builder stage is a simple Alpine-based image which takes in the distribution tar ball and decompresses it. The reason for this stage is to combine the copying of the tar ball and its decompression into a single logical layer, resulting in less data to be pushed to the registry later on.

  2. Zeebe

The last stage is the actual application container, which is now based on a slim build of openjdk:11-jre. It's ordered such that the most commonly changed layers are the last to be created, e.g. the copying of the distribution from the builder is the very last command, and we should strive to keep it that way.

The end result is an ~300mb image with 9 layers, of which only one is an application-specific layer of 80-90mb; this will normally be the only layer that changes. Should you have to push the image, you would only really have to upload that single 80-90mb layer.
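To make the upload savings concrete, the figures quoted above can be run through a quick back-of-the-envelope calculation. This is only a minimal Python sketch: the sizes are the approximate numbers from this description, and the names (old_layers, saved_percent) are made up for illustration. Note also that these are layer sizes as reported for the image; a registry push compresses layers, so the real number of bytes uploaded will differ.

```python
# Approximate layer sizes (in MB) of the current official image, as quoted above.
old_layers = {
    "base image": 114,
    "dependencies": 125,
    "copy distribution tar ball": 102,
    "decompress tar ball": 130,
}

# Layers that are invalidated whenever the code is recompiled (the last two above).
old_changed = ["copy distribution tar ball", "decompress tar ball"]
old_upload = sum(old_layers[name] for name in old_changed)

# With the multi-stage build, only one application layer of roughly 80-90 MB changes.
new_upload_low, new_upload_high = 80, 90


def saved_percent(before: float, after: float) -> float:
    """Percentage of upload volume saved when going from `before` to `after`."""
    return round(100 * (before - after) / before, 1)


print(f"main layers total: {sum(old_layers.values())} MB")
print(f"upload per change before: {old_upload} MB")
print(
    f"upload saved per change: {saved_percent(old_upload, new_upload_high)}% "
    f"to {saved_percent(old_upload, new_upload_low)}%"
)
```

Under these assumptions, a rebuild used to touch about 232mb of layers and now touches 80-90mb, i.e. somewhat more than 60% less per push.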

In order to reduce the distribution size, a new docker maven profile was introduced. It allows us to select a platform-specific version of RocksDB (e.g. linux64), reducing its size from 29mb to 9mb, and packages only a platform-specific zbctl (again for linux), reducing the size of the zbctl variants from 36mb to 12mb. In order to selectively execute antrun tasks, it was necessary to externalize the ant target to a dist/ant.xml file, where we can then define multiple (in)dependent, conditional tasks. This was not possible with the plugin's provided DSL.

Cloud Build

There is a new cloudbuild.yaml which provides a simple setup to build the distribution on Cloud Build. It will build the Go binaries, then the distribution (skipping all checks and tests), and finally the docker image. It's currently configured to use the branch name as the image tag, which we can change later; it supports many pre-configured Git-based variables. If building locally, the developer has to provide the Git branch.
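One caveat when using a branch name as an image tag: Docker tags may only contain letters, digits, underscores, periods and dashes, may not start with a period or dash, and are limited to 128 characters, so a branch such as feature/foo is not a valid tag as-is. This description does not say whether or how branch names are normalised, so the helper below (branch_to_tag) is purely hypothetical, a sketch of one way a locally supplied branch name could be turned into a usable tag.

```python
import re


def branch_to_tag(branch: str) -> str:
    """Map a Git branch name to a string that satisfies Docker's tag rules.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Replace every character that is not allowed in a tag with a dash.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag may not start with a period or a dash.
    tag = tag.lstrip(".-")
    # A tag may be at most 128 characters long.
    tag = tag[:128]
    if not tag:
        raise ValueError(f"cannot derive a tag from branch name {branch!r}")
    return tag


print(branch_to_tag("feature/smaller-docker-image"))
```

Replacing invalid characters with a dash is a lossy choice: two different branches can end up with the same tag, which may or may not matter for short-lived development images.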

One thing to note: building locally means uploading the current context, which is a ~60mb tar ball (done automatically by Cloud Build). You can try it by running gcloud builds submit . (the trailing dot is the directory to upload). It's alright, but not ideal. Other than this, I didn't set up the GitHub integration, so at the moment you can use build triggers made specifically for branches, and it will build on every push to a given branch using a mirrored repository. That said, it uses a token through my account, so this is also just a temporary measure.

All that to say, after looking into Cloud Build, I would be inclined to add credentials to our Jenkins such that it can build and push images rather than do everything on Cloud Build. Let me know what you think.

I ran into an issue with our zbctl build script: it's not exactly usable outside a Git repo. I guess it could be improved by passing in the right variables. So right now you need to provide the correct RELEASE_HASH if you're building locally.

I had trouble getting it not to time out with less powerful but cheaper machines. We could try splitting downloading the dependencies and building the image into separate steps, but I'm hoping this Cloud Build stuff is a temporary measure for when developers work from home. I would propose we find a better long-term solution, which should probably be a Jenkins job that builds and pushes the image.

Makefile

This PR also introduces a Makefile to ease the building of docker images locally, which tries to pick some smart defaults to easily package and build images. make package will package the distribution using the docker profile and skip all checks; make dist will make the distribution docker image using the current project version as tag. make dev will instead use the tar ball sha1 sum as tag, allowing you to get a new tag based on your changes.

I find make dev useful for me, but not sure if it's particularly useful for others :man_shrugging:
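The idea behind make dev, deriving the tag from the content of the tar ball so that any change produces a new tag, can be sketched in a few lines. The Makefile itself is not shown here, so this Python version is only an illustration of the technique; the function names and the repository name in the usage note are assumptions, not what the PR uses.

```python
import hashlib
from pathlib import Path


def sha1_tag(tarball: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with tarball.open("rb") as f:
        # Read in fixed-size chunks so large tar balls are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_reference(repository: str, tarball: Path) -> str:
    """Build a repository:tag reference using the tar ball's SHA-1 as the tag."""
    return f"{repository}:{sha1_tag(tarball)}"
```

For example, image_reference("example/zeebe", Path("path/to/distribution.tar.gz")) yields example/zeebe:<40 hex characters>. Since the tag depends only on the tar ball's bytes, rebuilding without changes gives the same tag, while any change to the distribution gives a different one.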

Pull Request Checklist

  • [x] All commit messages match our commit message guidelines
  • [x] The submitted code follows our code style
  • [x] If submitting code, please run mvn clean install -DskipTests locally before committing

This question originates from the open-source project: zeebe-io/zeebe

  • weixin_39760967 2020-11-29 22:01

    I'm currently running a benchmark with the latest snapshot but using the development Dockerfile with a custom JRE under the np-test namespace. It's been running for some hours now and looks fine. There's a strange 1 hour cycle where performance spikes, then degrades slowly over an hour, then spikes again, but I don't think this is particularly related to the JRE changes (which is really what I am trying to test here). Let's have a look on Monday to see if the custom JRE has any impact.

    Another thing here: I added a new Makefile since this is what is simplest for me, but I've been thinking that our team is becoming more "cross-platform" (for lack of a better word). Are Makefiles and shell scripts the best we can come up with? Is it time to evaluate how to write tooling for devs across multiple platforms? I had a quick look at SCons as a make replacement, but wasn't sold, though Python is more portable than bash, I guess. There's also doit in the Python world, which could be an option. In the Java world I know ant is still used, and it seems a little bit better than maven for these one-off tasks, but I have little experience with it. The please tool also seemed interesting. Anyway, just a thought...

    EDIT: menski, let me know if you don't have any time for this; I could also ask Alastair to take a look at this with some Docker-experienced :eyes:

    EDITEDIT: marked as WIP since I didn't discuss this beforehand, so I'm happy to do so now and change anything. I'm also not recommending this for our production distribution, as I don't feel very confident about distributing a custom JRE :sweat_smile:
