Here's my take: if a system is in "starting" status as reported by systemd, we shouldn't be arbitrarily recycling system services until the system has fully started up. On a properly functioning system, the "starting" status should only be set during the boot phase, and should then yield to "running" or "degraded" depending on the end state of the enabled unit files.
This seems to be a special case that arises when the installer script is run by CloudFormation before the system is fully online. Staging changes to be reloaded later on is fine, but I have concerns about forcing a system service to restart while the rest of the system is still coming online. Consider systemd unit files that explicitly depend on sshd: my first thought was an old terminal-style CRM application where the application daemon is set as a user's shell, so the daemon would be in a race to have sshd restart itself before systemd tried to start the daemon.
Since this is specific to CloudFormation (and provided that the system has left the "starting" state by the time it begins accepting console logins, which it has), my feeling is that the proper fix belongs in the CloudFormation YAML template: it should tell CloudFormation that part of the scripted setup touches the SSH configuration, so that the service is restarted at the end of its execution.
From what you have written, it seems your preferred fix is to invert the logic so that the check against systemctl is-system-running selects its if block unless there is an explicit failure, since that also covers the scenario where CloudFormation is running the installer.
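For comparison, the inverted variant might look roughly like the hypothetical Python sketch below. Which states count as an "explicit failure" isn't spelled out here, so the FAILURE_STATES set and the function name are assumptions for illustration only, not the installer's actual code. The state names themselves are ones systemctl is-system-running can report.

```python
# Hypothetical denylist: states treated as an "explicit failure".
# This set is an assumption; the discussion does not define it.
FAILURE_STATES = {"maintenance", "stopping", "offline", "unknown"}


def restart_allowed_inverted(state: str) -> bool:
    """Allow the sshd restart unless the reported state is a known failure.

    Anything not in FAILURE_STATES passes, including "starting" and
    "initializing", i.e. states seen while the system is still booting.
    """
    return state.strip().lower() not in FAILURE_STATES
```

The design trade-off is that a denylist lets through every early-boot state by default, which is exactly the window where restarting sshd could race with units that depend on it.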
In that case, my suggestion is to add a condition to the existing if test to catch this desired state. It doesn't need a separate else block, since we want to take the same action whether the system is running, degraded, or starting. I'm pushing a patch for this to my fork right now, and if you want, I can re-open the PR for that branch once it's pushed from my development machine.
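A minimal Python sketch of that single if test, assuming the installer's check can be expressed as a membership test on the output of systemctl is-system-running. The names ALLOWED_STATES, system_state, and should_restart_sshd are hypothetical and not taken from the installer script, and the sketch only prints what it would do rather than restarting anything.

```python
import shutil
import subprocess

# States in which restarting sshd is treated as acceptable: "running"
# (Ubuntu after boot), "degraded" (CentOS AMI) and, with the proposed
# change, "starting" (CloudFormation runs the installer during boot).
ALLOWED_STATES = {"running", "degraded", "starting"}


def system_state() -> str:
    """Return the output of `systemctl is-system-running`, or "unknown".

    systemctl exits non-zero for any state other than "running", so the
    exit status is deliberately not treated as an error here.
    """
    if shutil.which("systemctl") is None:
        return "unknown"
    result = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or "unknown"


def should_restart_sshd(state: str) -> bool:
    """One test for all three accepted states; no separate else branch."""
    return state.strip().lower() in ALLOWED_STATES


if __name__ == "__main__":
    state = system_state()
    if should_restart_sshd(state):
        # A real installer would restart the SSH service here; the unit
        # name differs by distribution (e.g. "ssh" vs "sshd").
        print(f"system is {state}: would restart sshd")
    else:
        print(f"system is {state}: leaving sshd alone")
```

Keeping the accepted states in one set means adding "starting" is a one-word change to the existing condition rather than a new branch.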
Just for the record: during my testing on Ubuntu AMIs, running systemctl is-system-running from a console session after boot completes returns "running". CentOS returns "degraded" because of an issue in their AMI with one of their enabled services. That is why the existing if block accepts either state, and we can simply add the "starting" status to that check.