Neil Jerram [Wed, 2 Sep 2015 14:14:30 +0000 (15:14 +0100)]
DHCP agent: log when reloading allocations for a new VM port
This will help us to see more clearly if failing DHCP for a new VM -
as seen in failures such as the following traceback from [1] - is
caused by DHCP configuration being too slow, as suggested by the
referenced bug.
Traceback (most recent call last):
  File "/home/aqua/tempest/tempest/test.py", line 125, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 105, in test_server_basicops
    self.verify_ssh()
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 95, in verify_ssh
    private_key=self.keypair['private_key'])
  File "/home/aqua/tempest/tempest/scenario/manager.py", line 310, in get_remote_client
    linux_client.validate_authentication()
  File "/home/aqua/tempest/tempest/common/utils/linux/remote_client.py", line 55, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 150, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 87, in _get_ssh_connection
    password=self.password)
tempest.exceptions.SSHTimeout: Connection to the 172.17.205.21 via SSH timed out.
User: cirros, Password: None
By returning the need for binding, the (no-op) port processing
is skipped in the main sync loop. In fact, the port is already
detected as deleted and avoiding going over it again will save
us one log message, and thus many, many, many, many bytes over
time in logs.openstack.org.
Kyle Mestery [Wed, 21 Oct 2015 17:43:58 +0000 (17:43 +0000)]
Clarify what gerrit repositories can target neutron-specs
It wasn't made clear by the existing documentation that we only want
neutron, neutron-fwaas, neutron-lbaas, and neutron-vpnaas repositories
to target specs at neutron-specs. This makes it 100% crystal clear.
Eugene Nikanorov [Mon, 12 Oct 2015 12:21:02 +0000 (16:21 +0400)]
Spawn dedicated rpc workers for state reports queue
By default, spawn one additional rpc worker to process the
state reports queue.
The state reports queue will also be processed by regular
rpc workers, but if those workers are busy processing
heavy requests, the state reports queue will automatically
be consumed by the dedicated rpc workers.
This change applies to ML2 plugin only.
Other plugins should implement start_rpc_state_reports_listener
to enable additional rpc workers.
Assaf Muller [Tue, 20 Oct 2015 21:42:57 +0000 (17:42 -0400)]
Fix l2pop regression
Patch https://review.openstack.org/#/c/236970/ introduced an issue
where get_agent_by_host can return an arbitrary agent on the host
(including L3, DHCP or metadata agents), not only L2 agents. The caller then
tries to get tunneling_ip, which might not exist on the returned
agent, causing l2pop code to bail out with a WARNING:
'Unable to retrieve the agent ip...'.
The issue was found by manual introspection of the code, and
verified by modifying the l2pop fullstack test to register L3
agents. A unit test was added, and the fullstack connectivity
test was modified to register L3 agents if l2pop
is enabled.
The code will now check for agents with a tunneling_ip key
in their configurations dict, which is required for l2pop
to work correctly, essentially a form of duck typing.
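The duck-typing check described above can be sketched roughly as follows. This is a minimal illustration and not the actual Neutron code: the helper name find_l2pop_agent and the plain-dict agent records are assumptions made for the example.

```python
def find_l2pop_agent(agents, host):
    """Return the first agent on `host` that exposes a tunneling_ip.

    `agents` is assumed to be an iterable of dicts with 'host' and
    'configurations' keys (a hypothetical shape, for illustration only).
    """
    for agent in agents:
        if agent.get("host") != host:
            continue
        configurations = agent.get("configurations") or {}
        # Duck typing: any agent that carries a tunneling_ip qualifies,
        # regardless of its declared agent type.
        if "tunneling_ip" in configurations:
            return agent
    return None


# Hypothetical sample data: an L3 agent and an L2 agent on the same host.
agents = [
    {"host": "node-1", "agent_type": "L3 agent", "configurations": {}},
    {"host": "node-1", "agent_type": "Open vSwitch agent",
     "configurations": {"tunneling_ip": "192.0.2.10"}},
]
```

With this sample data, the L3 agent is skipped because its configurations dict has no tunneling_ip key, and a host with no matching agent yields None.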
John Schwarz [Mon, 12 Oct 2015 12:53:49 +0000 (15:53 +0300)]
Don't remove ip addresses if not master
When setting --admin-state-up=False on an HA router with a gateway set,
standby nodes don't have any ip addresses set on the devices (although
the devices themselves are present). In such cases, Neutron should not
try to un-set those ip addresses before deleting the device itself.
Jakub Libosvar [Tue, 13 Oct 2015 11:18:56 +0000 (13:18 +0200)]
Include alembic versions directory to the package
If the package is built without access to git metadata, none of the
migration scripts are included in the build. We need to explicitly
specify that the scripts be packaged.
The project has been removed in I8def7fc2e92f967785b9ab05f8496de641e8f866
and it's been retired from stackforge [1]. So it's safe to remove it from
the list with the remaining bits.
Remove the port-forwarding sub-project from the list
This project is no longer maintained, there are newer talks
going on to better support the use case [1], possibly in
tree. To this aim, it's safe to remove this project from
the list.
Set security group provider rule for icmpv6 RA in DVR
Security group provider rules for RA are set for the VM ports
when a router interface is added or updated after the VM
instance is created.
In the case of DVR Routers the security group provider rule
to allow the RA packets to flow through the VM port input
chain was missing and so the VM was not able to get a
SLAAC/DHCP address when associated with a DVR Router.
This fix will add the security group rule to the VM port input
chain to allow the RA packets to flow into the VM and hence
the VM will obtain an IP address assigned by the Router.
Thomas Herve [Tue, 20 Oct 2015 13:42:59 +0000 (15:42 +0200)]
Properly handle segmentation_id in OVS agent
The segmentation_id of an OVS VLAN can be None, but a recent change
assumed that it was always an integer. It highlighted the fact that we
try to store None in the OVS database, which got stored as a string.
This fixes the storage, and handles loading the value while keeping
compatibility.
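A compatible loader for such values might look like the sketch below. It is only an illustration under stated assumptions: the function name parse_segmentation_id is hypothetical, and it assumes that older records hold the literal string "None" where no segmentation_id was set.

```python
def parse_segmentation_id(raw):
    """Convert a stored segmentation_id back to an int or None.

    Accepts the literal string "None" (assumed legacy format) as well
    as a real None, and converts everything else with int().
    """
    if raw is None or raw == "None":
        return None
    return int(raw)
```

Values that are neither None, "None", nor integer-like raise ValueError from int(), which keeps unexpected data visible instead of silently hiding it.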
Ihar Hrachyshka [Tue, 13 Oct 2015 16:11:29 +0000 (18:11 +0200)]
ovs: remove several unneeded object attributes from setup_rpc()
It's currently not clear what the interactions and interdependencies
between existing attributes set inside setup_rpc() and other code are.
This makes the code harder to follow. So reduce the complexity a bit by using
local variables instead of object attributes where applicable, and by moving
agent_id calculation out of setup_rpc() since it's not related to AMQP.
Brian Haley [Fri, 16 Oct 2015 22:02:16 +0000 (18:02 -0400)]
Set ip_nonlocal_bind in namespace if it exists
Somewhere in the 3.19 kernel timeframe ip_nonlocal_bind was
changed to be a per-namespace attribute. To be backwards
compatible we need to try that first, then fall back to
setting the one in the root namespace if it fails.
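The try-then-fall-back order can be sketched as below. This is a hedged illustration rather than the agent's real code: set_ip_nonlocal_bind and the run_sysctl callable are hypothetical names, and the runner is injected so the logic can be exercised without touching the kernel.

```python
def set_ip_nonlocal_bind(value, namespace, run_sysctl):
    """Set ip_nonlocal_bind per namespace, falling back to the root namespace.

    `run_sysctl(args, namespace)` is a hypothetical callable that applies a
    sysctl setting inside `namespace` (None means the root namespace) and
    raises RuntimeError on failure.

    Returns the namespace in which the setting was finally applied.
    """
    setting = "net.ipv4.ip_nonlocal_bind=%d" % value
    try:
        # Newer kernels expose the knob per namespace, so try that first.
        run_sysctl(["-w", setting], namespace)
        return namespace
    except RuntimeError:
        # Older kernels only have the global knob in the root namespace.
        run_sysctl(["-w", setting], None)
        return None
```

Passing the runner in as a parameter is a design choice for the example only. It keeps the fallback order testable with a fake that fails for any non-root namespace.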
Romil Gupta [Mon, 19 Oct 2015 13:17:04 +0000 (06:17 -0700)]
Remove SUPPORTED_AGENT_TYPES for l2pop
In a world where agents can be out of tree,
this check seems no longer necessary.
The networking-vsphere project runs an ovsvapp agent inside a service VM
on each ESXi host. These agents talk to a neutron-server that has l2pop
enabled in a multi-hypervisor deployment, for example KVM and ESXi.
The tunnels are not getting established between KVM compute node
and ESXi host. The l2pop mech_driver needs to embrace ovsvapp agent
to form the tunnels.
Hence, this patch-set addresses the issue by removing the
SUPPORTED_AGENT_TYPES for l2pop.
changzhi [Thu, 20 Aug 2015 13:40:42 +0000 (21:40 +0800)]
DVR: Notify specific agent when update floatingip
The L3 agent is already determined when a floatingip is updated,
so notify that specific agent rather than all agents.
This will save some RPC resources. This is only for DVR routers.
Legacy and HA routers notify only the relevant agents.
Provide links to who belongs to what team, and
rename the section to better reflect its content.
Add the Neutron drivers team, which apparently
was missing and only captured on the wiki.o.o.
Carl Baldwin [Fri, 16 Oct 2015 16:16:15 +0000 (16:16 +0000)]
Refactor _populate_ports_for_subnets for testability
I want to add to this function but I found it very difficult to test
the additions that I was making (because this code isn't well covered
by UTs). This patch adds some unit testing so that I can change it
more confidently but does not change any functionality.
Split the FIP Namespace delete in L3 agent for DVR
Right now we are seeing a race condition in the l3 agent
for DVR routers when a floatingip is deleted and created.
The agent tries to delete the floatingip namespace and
while it tries to delete there is another call to add a
namespace. There is a timing window in between these two
calls where sometimes the call to create a namespace succeeds
but executing any command in the namespace then fails,
since the namespace was deleted concurrently.
Since the fip namespace is associated with an external net
and each node has only one fip namespace for an external net,
we would like to only delete the fip namespace when the
external net is deleted.
The first step is to split the delete functionality into two.
The call to fip_ns.cleanup will only remove the dependency that
the fip namespace has on the router namespace, such as the fpr and
rfp veth pairs.
The call to fip_ns.delete will actually delete the
fip namespace and the fg device.
Martin Hickey [Wed, 14 Oct 2015 21:32:49 +0000 (22:32 +0100)]
Add stevedore aliases for interface_driver configuration
Changed the interface_driver configuration for agents from class
imports to stevedore aliases. The loading method needed to be
updated to load as a DriverManager. Backward compatibility
for configuration as a class import is retained.
It is easy to oversubscribe ourselves during a release
cycle. We approve specs only to experience that we fail
to review consistently; code is posted last minute,
we get lost chasing race conditions, the odd dependency
throws the gate under the bus, and the ever-present cosmetic
refactoring patch steers our attention from what matters,
which is *reviewing what we commit to deliver*.
In order to keep ourselves focused on the bigger picture,
this patch proposes/formalizes changes to the way we
register blueprints so that we can improve our ability
to better foresee what we'll deliver at the end of a
release.
We obviously can't predict the future or chain people
to a desk, but a regular pulse check of a well planned
task list should at least help mitigate the feeding frenzy
that usually happens closer to FF. That inevitably
leads to half-baked and buggy solutions.
The patch broke notifications about FIP updates and triggered 100%
gate failures for Ironic gate.
I believe that I0cbe8c51c3714e6cbdc48ca37135b783f8014905 is also
breaking notifications, but for FIP create, which probably was not
utilized in any gate before and hence not caught in time.
The change introduced by the reverted patch made update_floatingip
fetch the router based on the FIP router_id field on every call, which was
not the case before the patch. For some reason unknown at the
moment, we get NotFound from database on this fetch.
The patch does not answer the question why we get NotFound from
database on fetching a FIP router_id, but that's another issue that
should be investigated while Ironic gate is happy.
Fix AttributeError on port_bound for missing ports
If the port is concurrently deleted, db_get_val returns None, and
that causes the Exception to be raised. However, the exception is just
log noise if the port has been deleted concurrently and it does not
lead to failures.
This can happen if port_update and port_delete operations occur
in short sequence and interleave. To prevent this trace from
occurring, this patch checks that the port is being eliminated and
emits an error trace only if the port is indeed to be expected
amongst the list of ports to be updated.
We do not raise an exception to avoid disrupting the agent sync
process, and leave to the admin the investigation of the issue
(should that be chronic rather than transient).
Under normal circumstances, if the port is expected it should
be there, and if it isn't this should be treated as a bug to be
investigated further.
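The check described above can be sketched as follows. This is an illustration only, not the agent's actual code: port_still_present, the expected_ports set, and the db_get_val(table, record, column) callable signature are assumptions made for the example.

```python
import logging

LOG = logging.getLogger(__name__)


def port_still_present(db_get_val, port_name, expected_ports):
    """Return True if the port still exists, False if it has vanished.

    `db_get_val(table, record, column)` is assumed to return None when the
    record was deleted concurrently. No exception is raised, so the agent
    sync loop is not disrupted.
    """
    other_config = db_get_val("Port", port_name, "other_config")
    if other_config is not None:
        return True
    if port_name in expected_ports:
        # The port should have been there: surface it for investigation.
        LOG.error("Expected port %s not found in the database", port_name)
    else:
        # A concurrent delete is normal, so keep this out of the error log.
        LOG.debug("Port %s was deleted concurrently, skipping", port_name)
    return False
```

A caller would skip the binding step when this returns False, so a port_delete that interleaves with a port_update produces no AttributeError.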
changzhi [Thu, 20 Aug 2015 13:40:42 +0000 (21:40 +0800)]
DVR: Notify specific agent when update floatingip
The L3 agent is already determined when a floatingip is updated,
so notify that specific agent rather than all agents.
This will save some RPC resources. This is only for DVR routers.
Legacy and HA routers notify only the relevant agents.
Do not try to delete a veth from a nonexistent namespace
Currently, VethFixture cleanup tries to delete veths from their
namespaces even if the namespaces don't exist anymore (without crashing
the cleanup). This produces an extra trace when an error is raised, which
can be confusing.
This change avoids trying to delete a veth from a nonexistent namespace
in VethFixture.