Option to remove routers from dead l3 agents
Add a configuration-enabled periodic check to examine the
status of all L3 agents with routers scheduled to them and
admin_state_up set to True. If the agent is dead, the router
will be rescheduled to an alive agent.
Neutron considers and agent 'dead' when the server doesn't
receive any heartbeat messages from the agent over the
RPC channel within a given number of seconds (agent_down_time).
There are various false positive scenarios where the agent may
fail to report even though the node is still forwarding traffic.
This is configuration driven because a dead L3 agent with active
namespaces forwarding traffic and responding to ARP requests may
cause issues. If the network backend does not block the dead
agent's node from using the router's IP addresses, there will be
a conflict between the old and new namespace.
This conflict should not break east-west traffic because both
namespaces will be attached to the appropriate networks and
either can forward the traffic without state. However, traffic
being overloaded onto the router's external network interface
IP in north-south traffic will be impacted because the matching
translation for port address translation will only be present
on one router. Additionally, floating IPs associated to ports
after the rescheduling will not work traversing the old
namespace because the mapping will not be present.
DocImpact
Partial-Bug: #
1174591
Change-Id: Id7d487f54ca54fdd46b7616c0969319afc0bb589