From: Miguel Angel Ajo Date: Tue, 3 Nov 2015 09:51:56 +0000 (+0100) Subject: devref: Rolling upgrade mechanism for rpc-callbacks X-Git-Url: https://review.fuel-infra.org/gitweb?a=commitdiff_plain;h=0ddb66f0ed5468f43ff43771a94edfbf12f3b998;p=openstack-build%2Fneutron-build.git devref: Rolling upgrade mechanism for rpc-callbacks Versioned Object push notifications require the server to be aware of supported versions in the agents, since they are subscribed to neutron-vo--. During upgrade time, the server would need to downgrade and serialize the objects across version subset, and send it to the fanout queues for agent consumption. One manual solution could be manual admin pinning, but we can do better than that, making administrator lives easier if we provide a reliable mechanism for remote version auto discovery. Change-Id: I02b694137eb2d58e5f2f3e7631f0e4b90f7c17ad --- diff --git a/doc/source/devref/rpc_callbacks.rst b/doc/source/devref/rpc_callbacks.rst index de79e24a6..bb0aeac2a 100644 --- a/doc/source/devref/rpc_callbacks.rst +++ b/doc/source/devref/rpc_callbacks.rst @@ -65,28 +65,6 @@ versioned_objects serialization/deserialization with the obj_to_primitive(target_version=..) and primitive_to_obj() [#ov_serdes]_ methods is used internally to convert/retrieve objects before/after messaging. -Considering rolling upgrades, there are several scenarios to look at: - -* publisher (generally neutron-server or a service) and subscriber (agent) - know the same version of the objects, so they serialize, and deserialize - without issues. - -* publisher knows (and sends) an older version of the object, subscriber - will get the object updated to latest version on arrival before any - callback is called. - -* publisher sends a newer version of the object, subscriber won't be able - to deserialize the object, in this case (PLEASE DISCUSS), we can think of two - strategies: - - -The strategy for upgrades will be: - During upgrades, we pin neutron-server to a compatible version for resource - fanout updates, and the server sends both the old, and the newer version. - The new agents process updates, taking the newer version of the resource - fanout updates. When the whole system upgraded, we un-pin the compatible - version fanout. - Serialized versioned objects look like:: {'versioned_object.version': '1.0', @@ -101,6 +79,102 @@ Serialized versioned objects look like:: 'name': u'aaa'}, 'versioned_object.namespace': 'versionedobjects'} +Rolling upgrades strategy +------------------------- +In this section we assume the standard Neutron upgrade process, which means +upgrade the server first and then upgrade the agents: + +:doc:`More information about the upgrade strategy `. + +The plan is to provide a semi-automatic method which avoids manual pinning and +unpinning of versions by the administrator which could be prone to error. + +Resource pull requests +~~~~~~~~~~~~~~~~~~~~~~ +Resource pull requests will always be ok because the underlying resource RPC +does provide the version of the requested resource id / ids. The server will +be upgraded first, so it will always be able to satisfy any version the agents +request. + +Resource push notifications +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Agents will subscribe to the neutron-vo-- fanout queue +which carries updated objects for the version they know about. The versions +they know about depend on the runtime Neutron versioned objects they started with. + +When the server upgrades, it should be able to instantly calculate a census of +agent versions per object (we will define a mechanism for this in a later +section). It will use the census to send fanout messages on all the version +span a resource type has. + +For example, if neutron-server knew it has rpc-callback aware agents with +versions 1.0, and versions 1.2 of resource type "A", any update would be sent +to neutron-vo-A_1.0 and neutron-vo-A_1.2. + +TODO(mangelajo): Verify that after upgrade is finished any unused messaging +resources (queues, exchanges, and so on) are released as older agents go away +and neutron-server stops producing new message casts. Otherwise document the +need for a neutron-server restart after rolling upgrade has finished if we +want the queues cleaned up. + + +Leveraging agent state reports for object version discovery ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +We would add a row to the agent db for tracking agent known objects and version +numbers. This would resemble the implementation of the configuration column. + +Agents would report at start not only their configuration now, but also +their subscribed object type / version pairs, that would be stored in the +database and would be available to any neutron-server requesting it:: + + 'subscribed_versions': {'QoSPolicy': '1.1', + 'SecurityGroup': '1.0', + 'Port': '1.0'} + +There's a subset of Liberty agents depending on QoSPolicy that will +require 'QoSPolicy': '1.0' if the qos plugin is installed. We will be able +to identify those by the binary name (included in the report): + +* 'neutron-openvswitch-agent' +* 'neutron-sriov-nic-agent' + +Version discovery ++++++++++++++++++ +With the above mechanism in place and considering the exception of +neutron-openvswitch-agent and neutron-sriov-agent requiring QoSpolicy 1.0, +we could discover the subset of versions to be sent on every push +notification. + +Agents that are in down state would be excluded from this calculation. +We would use an extended timeout for agents in this calculation to make sure +we're on the safe side, specially if deployer marked agents with low +timeouts. + +Starting at Mitaka, any agent interested in versioned objects via this API +should report their resource/version tuples of interest (the resource type/ +version pairs they're subscribed to). + +Caching mechanism +''''''''''''''''' +The version subset per object will be cached to avoid DB requests on every push +given that we assume that all old agents are already registered at the time of +upgrade. + +Cached subset will be re-evaluated (to cut down the version sets as agents +upgrade) after configured TTL. + +As a fast path to update this cache on all neutron-servers when upgraded agents +come up (or old agents revive after a long timeout or even a downgrade), we could +introduce a fanout queue consumed by servers, to additionally notify from one +agent to all neutron-servers about the "versions of interest" in the agent just +comming up. + +All notifications for all calculated version sets must be sent, as non-upgraded +agents would otherwise not receive them. + +It is safe to send notifications to any fanout queue as they will be discarded +if no agent is listening. + Topic names for every resource type RPC endpoint ------------------------------------------------ @@ -205,8 +279,7 @@ The server/publisher side may look like:: References ---------- - -.. [#ov_serdes] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/tests/test_objects.py#L621 -.. [#vo_mkcompat] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/base.py#L460 -.. [#vo_mkcptests] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/tests/test_objects.py#L111 -.. [#vo_versioning] https://github.com/openstack/oslo.versionedobjects/blob/master/oslo_versionedobjects/base.py#L236 +.. [#ov_serdes] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/tests/test_objects.py#L410 +.. [#vo_mkcompat] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/base.py#L474 +.. [#vo_mkcptests] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/tests/test_objects.py#L114 +.. [#vo_versioning] https://github.com/openstack/oslo.versionedobjects/blob/ce00f18f7e9143b5175e889970564813189e3e6d/oslo_versionedobjects/base.py#L248