Kevin Benton [Wed, 3 Jun 2015 05:52:51 +0000 (05:52 +0000)]
Revert "Add VIF_DELETED notification event to Nova"
We need to wait until the nova support is added in
I998b6bb80cc0a81d665b61b8c4a424d7219c666f. Otherwise
this generates a ton of error messages in the nova api
log as well as on the neutron side.
Eugene Nikanorov [Tue, 26 May 2015 16:17:20 +0000 (20:17 +0400)]
Catch broad exception in methods used in FixedIntervalLoopingCall
Unlike other places where it might make sense to catch specific
exceptions, methods that are used to check L3 and DHCP agents
liveness via FixedIntervalLoopingCall should never allow exceptions
to leak to the calling method and interrupt the loop.
Further improvement of FixedIntervalLoopingCall might be needed,
but for the sake of easy backporting it makes sense to fix the issue
in neutron before pushing a refactoring to the 3rd-party library.
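A minimal sketch of the pattern described above. It uses a plain loop as a
stand-in for FixedIntervalLoopingCall; the names safe_periodic, run_loop and
check_agents are hypothetical and do not come from the neutron tree.

```python
import logging

LOG = logging.getLogger(__name__)


def safe_periodic(func):
    """Wrap a periodic callback so that no exception escapes to the loop."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Deliberately broad: one failed iteration must not stop the loop.
            LOG.exception("Periodic check %s failed", func.__name__)
    return wrapper


def run_loop(callback, iterations):
    """Stand-in for a looping call: an escaping exception would end it."""
    completed = 0
    for _ in range(iterations):
        callback()
        completed += 1
    return completed


calls = []


@safe_periodic
def check_agents():
    # Hypothetical liveness check that fails on its second run.
    calls.append(1)
    if len(calls) == 2:
        raise RuntimeError("transient error")


completed = run_loop(check_agents, 3)
```

Without the wrapper, the RuntimeError raised on the second run would
propagate out of run_loop and the third check would never happen.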
Assaf Muller [Tue, 2 Jun 2015 16:21:11 +0000 (12:21 -0400)]
Add devref that explains fullstack testing and its direction
The goal of this doc is to communicate what full stack tests are,
how they benefit you and when you would write such a test.
Additionally I'd like to communicate the way forward, and gather
feedback about any areas in the code that can benefit from full
stack tests, and any additional thoughts!
Assaf Muller [Mon, 1 Jun 2015 21:05:56 +0000 (17:05 -0400)]
Remove get_dhcp_port RPC method
This method was last used in Icehouse. I think we can safely
remove all of its code and tests. Icehouse to Liberty rolling
upgrades are in no way expected to work so I just bumped
the RPC version and removed all traces of the code.
Gal Sagie [Tue, 2 Jun 2015 05:49:10 +0000 (08:49 +0300)]
Update rootwrap.conf to add /usr/local/bin
When working with OVN I found on Fedora 21 that
my ovs-vsctl is installed in /usr/local/bin. Since this wasn't in
rootwrap, DHCP didn't work properly.
This change adds it to rootwrap.
Windows VMs try to resolve the metadata IP 169.254.169.254 as a
local address by default, which results in very slow access
to the metadata URL during boot.
Injecting a direct route to the metadata IP through a subnet's default
gateway helps Windows avoid wasting time on MAC resolution.
So this patch injects a host route for the metadata IP for networks plugged
into a router.
Assaf Muller [Fri, 29 May 2015 23:17:34 +0000 (19:17 -0400)]
Modify ipset functional tests to pass on older machines
Production code uses ipset exclusively in the root namespace,
however functional testing uses ipset in namespace for isolation.
This poses an issue as ipset is not supported in namespaces on
all kernels and distributions (I'm looking at you CentOS/RHEL 7.1).
This patch changes the ipset functional tests to work in the root
namespace while taking care of cleanups.
Kevin Benton [Fri, 1 May 2015 00:14:44 +0000 (17:14 -0700)]
Don't delete port from bridge on delete_port event
Commit d6a55c17360d1aa8ca91849199987ae71e8600ee added
logic to the OVS agent to delete a port from the integration
bridge when a port was deleted on the Neutron side. However,
this led to several races where whoever created the initial
port (e.g. Nova, L3 agent, DHCP agent) would be trying to
remove the port from the bridge at the same time. These
would result in ugly exceptions on one side or the other.
The original commit was trying to address the problem where
the port would maintain connectivity even though it was removed
from the integration bridge.
This patch addresses both cases by removing the iptables rules
for the deleted port and putting it in the dead VLAN so it loses
connectivity. However, it still leaves the port attached to the
integration bridge so the original creator can delete it.
Cedric Brandily [Tue, 26 May 2015 12:29:15 +0000 (14:29 +0200)]
Enable random hash seeds
Neutron tests have been updated to support random hash seeds. This
allows removing PYTHONHASHSEED=0 from tox.ini and removing the hashtest
tox environment.
Joe Gordon [Fri, 29 May 2015 21:28:34 +0000 (14:28 -0700)]
Fix formatting of core-reviewers doc
Fix some RST formatting issues with the core-reviewers policy document.
When reading the RST rendered version of that document at
http://docs.openstack.org/developer/neutron/policies/core-reviewers.html
I noticed a few rendering issues that were bothering me, so I fixed
them.
`contextlib.nested` is deprecated since Python 2.7 and incompatible with
Python 3. This patch removes all its occurrences by using the helper
script at [1].
This is a necessary step to allow us to run all unit tests with
Python 3 (not just a small subset, as is done now).
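For illustration, a small sketch of the kind of rewrite involved: the
multi-item `with` statement works on both Python 2.7 and Python 3 and can
replace `contextlib.nested`. The resource() context manager is a
hypothetical stand-in for the test fixtures.

```python
import contextlib

events = []


@contextlib.contextmanager
def resource(name):
    """Hypothetical fixture that records when it is entered and exited."""
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


# Python 2 only form that the patch removes:
#   with contextlib.nested(resource("net"), resource("subnet")) as (net, subnet):
# Portable form: contexts are entered left to right and exited in reverse.
with resource("net") as net, resource("subnet") as subnet:
    events.append("body %s %s" % (net, subnet))
```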
There are some missing/extra indentations in the tests' source code. This
results in variables being used outside their scope (which remains unnoticed
as long as `with` contexts do not fail), and prevents refactoring scripts
(such as the one for getting rid of `contextlib.nested` [1]) from
performing well.
This simple patch fixes these indentation errors.
[1]: See change I8d1de09ff38ed0af9fb56f423a2c43476408e0fb
Cedric Brandily [Wed, 27 May 2015 06:53:00 +0000 (08:53 +0200)]
Improve test_set_members_deleting_less_than_5
In test_set_members_deleting_less_than_5[1], 3 ips are deleted from
ipset but test_set_members_deleting_less_than_5 only checked that the
first one was deleted because the call ordering was non-trivial.
The test was successful because
assert_has_calls(expected_calls, any_order=False) allows extra calls
before and after expected_calls.
A parent change[2] forces the call ordering, which allows checking that
all 3 ips are deleted.
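The assert_has_calls behaviour mentioned above can be reproduced with a
short sketch. The mock object and the IP addresses are illustrative only and
are not taken from the actual test.

```python
from unittest import mock

ipset = mock.Mock()
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    ipset.delete(ip)

# Passes even though only one of the three deletions is listed, because
# extra calls before and after the expected ones are tolerated.
ipset.delete.assert_has_calls([mock.call("10.0.0.1")])

# Comparing the full call list is strict: every call must be listed, in order.
all_three = [mock.call("10.0.0.1"), mock.call("10.0.0.2"), mock.call("10.0.0.3")]
strict_match = ipset.delete.call_args_list == all_three
partial_match = ipset.delete.call_args_list == [mock.call("10.0.0.1")]
```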
Cedric Brandily [Tue, 26 May 2015 14:38:26 +0000 (14:38 +0000)]
Sort _get_new/deleted_set_ips responses in unittests
This fixes the test_set_members_adding/deleting_less_than_5 unit test
that breaks with a randomized PYTHONHASHSEED (see the bug report).
The test assumed that the _get_new/deleted_set_ips from
neutron.agent.linux.ipset_manager return elements in a particular order.
Found with PYTHONHASHSEED=1.
The fix refactors the test case to force sorted responses from
_get_new/deleted_set_ips during unittests.
Note: There are several other unrelated unit tests that also break with
a randomized PYTHONHASHSEED, but they are not addressed here. They will
be addressed in separate patches.
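A minimal sketch of the idea behind the fix, assuming the helper computes a
set difference. get_deleted_set_ips and force_sorted are hypothetical
stand-ins, not the neutron.agent.linux.ipset_manager code.

```python
def get_deleted_set_ips(current_ips, new_ips):
    # The iteration order of a set of strings may change with the hash seed.
    return list(set(current_ips) - set(new_ips))


def force_sorted(func):
    """Wrap a function so that its result always comes back sorted."""
    def wrapper(*args, **kwargs):
        return sorted(func(*args, **kwargs))
    return wrapper


deleted = force_sorted(get_deleted_set_ips)(
    ["10.0.0.3", "10.0.0.1", "10.0.0.2"], ["10.0.0.2"])
```

With the wrapper patched in during unit tests, assertions on the order of
the resulting calls no longer depend on PYTHONHASHSEED.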
Oleg Bondarev [Thu, 14 May 2015 12:09:24 +0000 (15:09 +0300)]
Cleanup stale metadata processes on l3 agent sync
Currently the l3 agent only cleans up stale namespaces.
The fix adds checking and deleting of stale metadata processes
to the NamespaceManager class, which is responsible for clearing stale
namespaces.
shihanzhang [Tue, 26 May 2015 01:29:58 +0000 (09:29 +0800)]
Fix ovs agent restore local_vlan_map failed
When the ovs agent restarts, it restores the local_vlan_map. However, in
some conditions, if a device has no tag set in ovsdb, the function
'db_get_val("Port", port.port_name, "tag")' returns an empty list, and
'provision_local_vlan' is not needed for this device.
Kevin Benton [Thu, 28 May 2015 23:48:04 +0000 (16:48 -0700)]
Use correct time delta function
The .seconds attribute of a timedelta object cannot be taken in
isolation because it can overflow into days. For example, a -1 second
difference will become -1 day and 86399 seconds.
This became a problem when the agent clock was slightly ahead of
the server clock. When calling (server_time - agent_time).seconds
in this scenario, it would go below 0 in the daily seconds and
wrap around to 86399 seconds and -1 day.
This patch corrects the issue by using a method in timeutils that
ends up calling total_seconds(), which was designed for this use case.
It also restores the formatting that was removed in patch:
Ibfc30444b7a167fb18ae9051a775266236d4ecce
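The wraparound can be reproduced with the standard library alone. The
timestamps below are arbitrary examples.

```python
import datetime

server_time = datetime.datetime(2015, 5, 19, 18, 15, 27)
# Agent clock is one second ahead of the server clock.
agent_time = server_time + datetime.timedelta(seconds=1)

delta = server_time - agent_time

# Normalized representation: -1 day plus 86399 seconds.
days, seconds = delta.days, delta.seconds

# total_seconds() combines days, seconds and microseconds into one value.
total = delta.total_seconds()
```

Here delta.seconds alone reports 86399, which is far above any reasonable
downtime threshold, while total_seconds() reports -1.0.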
Cedric Brandily [Wed, 27 May 2015 18:30:28 +0000 (20:30 +0200)]
Do not assume order of security group rules
This fixes the unit tests[1] that break with a randomized
PYTHONHASHSEED (see the bug report).
The test assumed that the security_group_rules_for_devices method from
neutron.agent.securitygroups_rpc returned security group rules in a
particular order. Found with PYTHONHASHSEED=2.
The fix refactors the test case to handle unsorted security group rules.
Note: There are several other unrelated unit tests that also break with
a randomized PYTHONHASHSEED, but they are not addressed here. They will
be addressed in separate patches.
Ihar Hrachyshka [Thu, 28 May 2015 12:40:25 +0000 (14:40 +0200)]
py34: don't run any tests except unit tests
The py34 job was intended for unit tests only. It's important to distinguish
between different types of tests, because they all have different
requirements for the execution environment. E.g. functional tests are not
expected to run in a restricted env designed for unit tests, and that's
even more valid for fullstack tests.
Otherwise, the job may fail or apply irrecoverable changes to the test
runner system, breaking it.
If we ever want to support py3 for other types of tests, we should add
separate jobs just for that.
Note that the neutron-python3 blueprint was not intended to introduce
changes to support anything but unit test execution with the new Python
version, so strictly speaking, any effort to make other test types work
is out of scope.
Note: There are several other unrelated unit tests that also break with
a randomized PYTHONHASHSEED, but they are not addressed here. They will
be addressed in separate patches.
Gal Sagie [Thu, 28 May 2015 06:17:15 +0000 (09:17 +0300)]
Addressing follow up comments for OVS_LIB fail_mode setting API
Review https://review.openstack.org/#/c/185659/ got merged before I could
see and address the last comment.
This is a follow-up patch to address that change.
Gong Zhang [Wed, 27 May 2015 09:10:17 +0000 (17:10 +0800)]
Move pool dispose() before os.fork
Currently pool dispose() is done after os.fork, but this will
produce shared DB connections in child processes which may lead
to DB errors.
Move pool dispose() before os.fork. This will remove all existing
connections in the parent process and child processes will create
their own new ones.
Kevin Benton [Sat, 16 May 2015 00:10:15 +0000 (17:10 -0700)]
Switch to dictionary for iptables find
The code to find the matching entry was scanning through a
list of all rules for every rule. This became extremely slow
as the number of rules became large, leading to long delays
waiting for firewall rules to be applied.
This patch switches to the use of a dictionary so the cost
becomes a hash lookup instead of a list scan.
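A rough sketch of the difference, not the actual iptables manager code:
build_index and find_last are hypothetical helpers that index rule lines
once so that each later lookup is a dictionary access.

```python
from collections import defaultdict


def build_index(lines):
    """Map each rule string to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, line in enumerate(lines):
        index[line.strip()].append(pos)
    return index


def find_last(index, rule):
    """Return the last position of rule, or None if it is absent."""
    positions = index.get(rule.strip())
    return positions[-1] if positions else None


current = ["-A INPUT -j ACCEPT", "-A FORWARD -j DROP", "-A INPUT -j ACCEPT"]
index = build_index(current)
found = find_last(index, "-A INPUT -j ACCEPT")
missing = find_last(index, "-A OUTPUT -j DROP")
```

Scanning the list for every rule costs time proportional to the square of
the rule count, whereas building the index once and then doing one lookup
per rule grows linearly.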
Kevin Benton [Thu, 28 May 2015 00:38:32 +0000 (17:38 -0700)]
Process port IP requests before subnet requests
When a port requests multiple fixed IPs, process the requests
for specific IP addresses before the ones asking for a subnet.
This prevents an error where the IP that was requested happens
to be the next up for allocation so the subnet request takes it
and causes a DBDuplicateEntry.
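The ordering problem can be shown with a toy allocator. Both functions are
hypothetical sketches, and a ValueError stands in for the DBDuplicateEntry
mentioned above.

```python
def order_fixed_ip_requests(fixed_ips):
    """Put requests naming a specific IP ahead of subnet-only requests."""
    # sorted() is stable and False sorts before True, so the relative
    # order inside each group is preserved.
    return sorted(fixed_ips, key=lambda req: 'ip_address' not in req)


def allocate(fixed_ips, pool):
    """Toy allocator: subnet-only requests take the next free address."""
    available = list(pool)
    allocated = []
    for req in fixed_ips:
        if 'ip_address' in req:
            ip = req['ip_address']
            if ip not in available:
                # Stands in for the duplicate entry error.
                raise ValueError("%s is already allocated" % ip)
            available.remove(ip)
        else:
            ip = available.pop(0)
        allocated.append(ip)
    return allocated


requests = [{'subnet_id': 'subnet-1'},
            {'subnet_id': 'subnet-1', 'ip_address': '10.0.0.2'}]
pool = ['10.0.0.2', '10.0.0.3']
result = allocate(order_fixed_ip_requests(requests), pool)
```

Passing the requests in their original order makes the subnet request take
10.0.0.2 first, so the specific request for that address then fails.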
Kevin Benton [Wed, 27 May 2015 21:52:06 +0000 (14:52 -0700)]
Remove time formatting in agent clock error
This removes time formatting that may be hiding timezone
issues that are leading to a delta being calculated between
the agent and the server even when it shows none. It also
adds logging of the difference so we can see how far off it
thinks they are.
Example message:
during the registration of Open vSwitch agent has a timestamp:
2015-05-19T18:15:27Z. This differs from the current server
timestamp: 2015-05-19T18:15:27Z by more than the threshold agent
downtime: 75.
Note that the timestamps are exactly the same after formatting.
Kevin Benton [Tue, 26 May 2015 01:55:44 +0000 (18:55 -0700)]
Persist DHCP leases to a local database
Due to issues caused by dnsmasq restarts sending DHCPNAKs,
change Ieff0236670c1403b5d79ad8e50d7574c1b694e34 passed the
'dhcp-authoritative' option to dnsmasq. While this solved the
restart issue, it broke the multi-DHCP server scenario because
the dnsmasq instances will NAK requests to a server ID that
isn't their own.
Problem DHCP Request Lifecycle:
Client: DHCPDISCOVER(broadcast)
Server1: DHCPOFFER
Server2: DHCPOFFER
Client: DHCPREQUEST(broadcast with Server-ID=Server1)
Server1: DHCPACK
Server2: DHCPNAK(in response to observed DHCPREQUEST with other Server-ID)
^---Causes issues
This change removes the authoritative option so NAKs are not
sent in response to DHCPREQUESTs to other servers. To handle
the original issue that Ieff0236670c1403b5d79ad8e50d7574c1b694e34
was intended to address, this patch also allows changes to be persisted
to a local lease file.
In order to handle the issue where a DHCP server may be scheduled
to another agent, a fake lease file is generated for dnsmasq to start
with. The contents are populated based on all of the known ports for
a network. This should prevent dnsmasq from NAKing clients renewing
leases issued before it was restarted/rescheduled.
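A hedged sketch of how such an initial lease file could be generated from
known ports. It assumes the dnsmasq lease layout of one line per lease with
expiry time, MAC address, IP address, hostname and client ID, using '*' for
unknown fields. build_lease_file and the port dictionaries are hypothetical
and do not reflect the actual patch.

```python
import time


def build_lease_file(ports, lease_duration, now=None):
    """Return lease file contents covering every known port address."""
    now = int(time.time()) if now is None else int(now)
    expiry = now + lease_duration
    lines = []
    for port in ports:
        for ip in port['fixed_ips']:
            # '*' placeholders for hostname and client ID (assumed layout).
            lines.append('%d %s %s * *' % (expiry, port['mac_address'], ip))
    return '\n'.join(lines) + '\n' if lines else ''


ports = [
    {'mac_address': 'fa:16:3e:00:00:01', 'fixed_ips': ['10.0.0.3']},
    {'mac_address': 'fa:16:3e:00:00:02', 'fixed_ips': ['10.0.0.4']},
]
leases = build_lease_file(ports, lease_duration=86400, now=1432605344)
```

Seeding the file this way means a freshly started or rescheduled server
already knows about addresses handed out earlier, which is what lets it
answer renewals instead of rejecting them.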
Kyle Mestery [Wed, 27 May 2015 17:26:00 +0000 (17:26 +0000)]
Flesh out the new RFE process and set deadlines for its use
The new RFE process is great in concept, but as was discovered in the
first neutron-drivers meeting where we discussed these, there exist
some rough edges. Specifically around deadlines and the conversion to
using RFEs, the gray area was very obvious. This patch attempts to put
a stake in the ground for when we transition fully to this new model,
including distinct timelines.
Given that we will need to work with people during the transition,
what is proposed is a way to let us do that while not blocking existing
specs and work.
Cedric Brandily [Wed, 27 May 2015 17:57:04 +0000 (19:57 +0200)]
Do not assume order of dictionary elements in init_l3
This fixes the test_interface unit tests[1] that break with a
randomized PYTHONHASHSEED (see the bug report).
The test assumed that the init_l3 method from
neutron.agent.linux.interface had dictionary elements in a particular
order. Found with PYTHONHASHSEED=2.
The fix refactors the test case to handle unsorted dictionaries in
init_l3.
Note: There are several other unrelated unit tests that also break with
a randomized PYTHONHASHSEED, but they are not addressed here. They will
be addressed in separate patches.