SysV init scripts are inherently racy because creating a PID file takes
a while. In particular, mcollectived needs about 0.6 seconds to daemonize
itself and create its PID file. If the service gets restarted within this
interval, a second instance of the daemon is started without stopping
the previous one. Apparently mcollectived gets restarted very often
during the final phase of IBP (image based provisioning). Hence get rid
of the SysV init script and use an upstart job to manage mcollectived.
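The race can be reproduced with a small shell sketch. Everything in it is
hypothetical: slow_daemon_start stands in for a daemon that writes its PID
file late, and naive_stop for the stop action of a PID-file-based init
script. Neither is taken from the actual package.

```shell
# Hypothetical stand-in for a daemon that writes its PID file only after
# a delay (mcollectived reportedly needs about 0.6 seconds).
slow_daemon_start() {
    pidfile=$1
    delay=$2
    sleep 30 >/dev/null 2>&1 &            # the long-running "daemon"
    daemon_pid=$!
    # The PID file shows up only after $delay seconds.
    ( sleep "$delay"; echo "$daemon_pid" > "$pidfile" ) >/dev/null 2>&1 &
}

# Hypothetical stop action that trusts the PID file, as a SysV script would.
naive_stop() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1         # no PID file yet: nothing is stopped
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}
```

Calling naive_stop right after slow_daemon_start returns non-zero and leaves
the first process running, so a restart issued within the delay ends up with
two instances.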
openstack-ci-packtest-deb is expected to fail without the corresponding
nailgun patches:
https://review.openstack.org/186027
https://review.openstack.org/186028
Some background around this patch:
- MOS 6.0 was released with mcollective 2.3.3, but the repository was not
properly updated (it has 2.3.1);
- that caused a regression: mcollective 2.3.1 was initially imported
into 6.1;
- some work was done on mcollective 2.3.1, mainly to fix init scripts (see
https://review.fuel-infra.org/gitweb?p=packages/trusty/mcollective.git;a=commit;h=176926bf9e3ff5de2ad044776820c064b933b80c
and
https://review.fuel-infra.org/gitweb?p=packages/trusty/mcollective.git;a=commit;h=d3cac30b312dd7c6f5425be4a4bdb0d30d52ceed).
So this package contains mcollective 2.3.3 with the updated SysV init script.
Fix SysV init script so start/restart actions work properly
The config file shipped with the package tells mcollectived to daemonize.
However, the init script assumes mcollectived runs in the foreground: it
passes -b to start-stop-daemon, telling it to put mcollective into the
background and to manage the PID file. So start-stop-daemon forks, records
its PID, and starts mcollective, which forks again. Therefore the PID
recorded in the PID file is wrong. As a result, the restart action spawns
a new instance of mcollective without stopping the old one, and start
launches a new mcollective even if one is already running.
Rewrite the init script and drop the trickery that duplicates what
start-stop-daemon does on its own (provided that the PID file is correct).
While at it, add a dependency on ruby-stomp (mcollectived won't even start
without it).
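A minimal sketch of what such a simplified script could look like, assuming
mcollectived daemonizes and writes the PID file itself. The paths, the
--pidfile/--config options passed to mcollectived, and the SSD override
variable are assumptions for illustration, not taken from the actual package.

```shell
# Assumed locations; adjust to the actual package layout.
DAEMON=/usr/sbin/mcollectived
PIDFILE=/var/run/mcollectived.pid
CONFIG=/etc/mcollective/server.cfg
# Hypothetical override hook so the commands can be dry-run (SSD=echo).
SSD=${SSD:-start-stop-daemon}

do_start() {
    # No --background and no --make-pidfile: the daemon forks and records
    # its own PID. --startas is used instead of --exec because mcollectived
    # is a Ruby script, so the running executable is the interpreter.
    "$SSD" --start --quiet --oknodo --pidfile "$PIDFILE" \
        --startas "$DAEMON" -- --pidfile "$PIDFILE" --config "$CONFIG"
}

do_stop() {
    # Match on the PID file only. --retry 5 sends the default signal, waits
    # up to 5 seconds, then falls back to KILL.
    "$SSD" --stop --quiet --oknodo --retry 5 --pidfile "$PIDFILE"
}

do_restart() {
    do_stop && do_start
}
```

With a correct PID file, start becomes a no-op when the daemon is already
running, and restart stops the old instance before launching a new one.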
Fix the init script so mcollective can be restarted properly
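The check for a running mcollective process is totally wrong and fails
to handle a stale pid file:
    if [ -f $(cat /proc/$(cat $pidfile)/exe > /dev/null) ] ; then
The output of cat goes to /dev/null, so the command substitution always
expands to nothing. Basically it's
    if [ -f ]; then echo "Someone can't write shell scripts"; fi
which is always true, since a one-argument test only checks that the
string "-f" is non-empty.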
This error breaks image based deployment. During the first boot after
image based provisioning, cloud-init reconfigures mcollective to use
RabbitMQ for communication and restarts it. However, the restart gets
suppressed due to the bug in the SysV-style init script, so the OpenStack
deployment does not start (and eventually times out).
The error is non-deterministic since sometimes cloud-init manages to
configure mcollective before it actually gets started (thus the correct
configuration is used despite the skipped restart).
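For comparison, a minimal sketch of a check that does handle a missing or
stale PID file. is_running and broken_check are hypothetical names;
broken_check paraphrases the faulty test quoted above (with stderr silenced),
and kill -0 assumes the caller is allowed to signal the process, which holds
for init scripts run as root.

```shell
# Paraphrase of the faulty check: the substitution is always empty, so this
# is "[ -f ]" and succeeds no matter what the PID file contains.
broken_check() {
    pidfile=$1
    # Deliberately unquoted, as in the original.
    [ -f $(cat "/proc/$(cat "$pidfile" 2>/dev/null)/exe" 2>/dev/null >/dev/null) ]
}

# Succeeds only if the PID file holds the PID of a live process.
is_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1              # no PID file: not running
    pid=$(cat "$pidfile" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;               # empty or garbage contents
    esac
    # Signal 0 only probes for existence. A recycled PID can still give a
    # false positive; this sketch does not guard against that.
    kill -0 "$pid" 2>/dev/null
}
```

With a stale PID file, is_running fails and a restart can proceed, whereas
broken_check still reports a running process.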
The version of mcollective (2.0.0) shipped with Ubuntu 14.04 is unable
to communicate via AMQP and is not suitable for Fuel. Therefore pick
the source from packages/precise/mcollective and adapt it so it works
on Ubuntu 14.04.