Kahou Lei [Sun, 13 Sep 2015 01:48:04 +0000 (18:48 -0700)]
Fix adding tap failure if bridge mapping is not provided
When the bridge mapping option is not provided, the tap interface cannot be
added properly. The root cause is that the Linux bridge agent expects the
bridge mapping option to be provided whenever a physical network is present.
However, that is not always the case: Kilo worked fine
without an existing bridge mapping.
Assaf Muller [Thu, 11 Jun 2015 21:13:44 +0000 (17:13 -0400)]
Add l2pop support to full stack tests
Add the l2pop mechanism driver to the ML2 plugin configuration, and set
l2_population = True in the OVS agent configuration.
Each test class can enable or disable l2pop in its environment.
Assaf Muller [Tue, 16 Jun 2015 12:56:41 +0000 (08:56 -0400)]
Add tunneling support to full stack tests
* The EnvironmentDescription class now accepts 'network_type'.
It sets the ML2 segmentation type, passes it to the OVS agents'
configuration files, and sets up the host configuration. If
a tunneling type is selected, it sets up a veth pair with an IP
address from the 240.0.0.1+ range. The addressed end of
this pair is configured as the local_ip for tunneling purposes
in each of the OVS agents. If the network type is not tunneled, it
sets up provider bridges instead and interconnects them.
* For now we run the basic L3 HA test with VLANs and tunneling just
so we have something to show for it.
* I started using scenarios in fullstack tests to run the same test
with VLANs or tunneling. Because test names are used for log
dirs, and testscenarios changes test names to include characters
that are not shell friendly (spaces, parentheses), I 'sanitized'
some of those characters.
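A minimal sketch of the sanitizing step, assuming the characters to replace are exactly spaces and parentheses (the log does not list the full set). The helper name sanitize_log_dir_name is hypothetical and is not the fullstack code itself. testscenarios typically appends the scenario name in parentheses to the test id, e.g. "test_ha_router(VLANs)".

```python
import re

# Assumed set of shell-unfriendly characters; the real list may differ.
_UNSAFE_CHARS = re.compile(r"[ ()]")


def sanitize_log_dir_name(test_name: str) -> str:
    """Return a shell-friendly version of a (possibly scenario-suffixed) test name.

    Spaces and parentheses become underscores, and the trailing underscore
    left behind by a closing parenthesis is stripped.
    """
    return _UNSAFE_CHARS.sub("_", test_name).rstrip("_")
```

Replacing rather than deleting the characters keeps the scenario name readable in the log directory name.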
Stephen Ma [Fri, 28 Aug 2015 14:00:48 +0000 (14:00 +0000)]
Descheduling DVR routers when ports are unbound from VM
When a VM is deleted, the DVR port used by the VM could be unbound
from the compute node. Once it is unbound, it is no longer
in use on the node. Currently the unbind doesn't trigger a check
to determine whether the DVR router can be unscheduled from the
L3-agent running on the compute node. This patch adds that check
and unschedules the router if necessary.
Reduce the chance of random check/gate test failures
As previously implemented, the TestTrackedResource class is designed
to inject random failures into the gate. It generates random numbers
within the range of 0..10000, and will fail if it generates duplicate
random numbers during its run.
This patch creates UUIDs instead of random numbers, which makes the
chance of a collision vanishingly small.
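To compare the two collision risks, here is a birthday-bound calculation. The figure of 100 IDs per test run is a hypothetical number chosen for illustration, not taken from TestTrackedResource.

```python
import math
import uuid


def collision_probability(n_draws: int, space_size: int) -> float:
    """Birthday-problem probability of at least one duplicate among
    n_draws independent uniform samples from space_size values."""
    # P(no duplicate) = prod_{i=0}^{n-1} (1 - i / space_size). Summing logs
    # keeps this stable, and expm1 stops tiny probabilities rounding to 0.
    log_no_dup = sum(math.log1p(-i / space_size) for i in range(n_draws))
    return -math.expm1(log_no_dup)


N_DRAWS = 100  # hypothetical number of IDs generated in one test run

p_random_ints = collision_probability(N_DRAWS, 10001)  # integers 0..10000
p_uuid4 = collision_probability(N_DRAWS, 2 ** 122)      # uuid4 has 122 random bits

example_id = str(uuid.uuid4())  # the kind of value the test generates instead
print(f"random ints: {p_random_ints:.3f}, uuid4: {p_uuid4:.1e}")
```

With 100 draws, the old 0..10000 range produces a duplicate in roughly 39% of runs, while the uuid4 probability is on the order of 1e-33.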
lzklibj [Mon, 2 Mar 2015 10:13:41 +0000 (02:13 -0800)]
Fix dvr update for subnet attach multi subnets
Fix method dvr_update_router_addvm to notify every
router attached to the subnet on which the VM will boot.
In the DVR case, when a subnet is attached to only one router,
the subnet has only one distributed router interface,
whose device_owner is "network:router_interface_distributed".
So in this case, get_ports in this method returns only
one port, and there is no need to break out of the for loop.
But when a subnet is attached to multiple routers, get_ports in
this method returns all distributed router interfaces,
and the routers holding those interfaces should all be notified
when an instance boots on the subnet. So the for loop must not
break in this case either.
Change-Id: I3a5808e5b6e8b78abd1a5b924395844507da0764
Closes-Bug: #1427122
Co-Authored-By: Ryan Moats <rmoats@us.ibm.com>
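A minimal sketch of the fixed loop, assuming ports are plain dicts and that a router interface port's device_id holds the router ID. The function notify_routers_for_vm and the notify callback are hypothetical stand-ins for dvr_update_router_addvm and the L3 agent notification.

```python
from typing import Callable, Iterable

DVR_INTERFACE_OWNER = "network:router_interface_distributed"


def notify_routers_for_vm(subnet_ports: Iterable[dict],
                          notify: Callable[[str], None]) -> None:
    """Notify every DVR router that has an interface on the VM's subnet.

    subnet_ports stands in for the result of get_ports(); notify stands in
    for the per-router L3 agent notification.
    """
    notified = set()
    for port in subnet_ports:
        if port["device_owner"] != DVR_INTERFACE_OWNER:
            continue
        router_id = port["device_id"]
        if router_id not in notified:
            notify(router_id)
            notified.add(router_id)
        # No 'break' here: a subnet attached to several routers has one
        # distributed interface per router, and each router must be told.
```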
Carl Baldwin [Fri, 28 Aug 2015 21:19:40 +0000 (21:19 +0000)]
Make ip rule comparison more robust
I found that ip rules would be added multiple times in the new address
scopes code because the _exists method was unable to reliably
determine if the rule already existed. This commit improves this by
more robustly canonicalizing what it reads from the ip rule command so
that like rules always compare the same.
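A minimal, IPv4-only sketch of what canonicalizing `ip rule show` output can look like, so that "from 10.0.0.5 lookup main" and "from 10.0.0.5/32 table 254" compare equal. The function canonicalize_rule is hypothetical and handles only the from/to/lookup/table keywords; it is not the Neutron implementation. The numeric IDs for the reserved table names (local 255, main 254, default 253) are the standard iproute2 values.

```python
import ipaddress

# Reserved routing table names printed by `ip rule` and their numeric IDs.
TABLE_NAMES = {"local": "255", "main": "254", "default": "253"}


def canonicalize_rule(line: str) -> dict:
    """Turn one IPv4 `ip rule show` line into a comparable dict."""
    # "32766:\tfrom all lookup main" -> ["32766", "from", "all", "lookup", "main"]
    fields = line.replace(":", " ", 1).split()
    rule = {"priority": fields[0]}
    tokens = iter(fields[1:])
    for key in tokens:
        if key in ("from", "to"):
            value = next(tokens)
            if value == "all":
                value = "0.0.0.0/0"
            # A bare address becomes a /32 network, so both spellings match.
            rule[key] = str(ipaddress.ip_network(value, strict=False))
        elif key in ("lookup", "table"):
            value = next(tokens)
            rule["table"] = TABLE_NAMES.get(value, value)
        # Other keywords are ignored in this sketch.
    rule.setdefault("from", "0.0.0.0/0")
    rule.setdefault("to", "0.0.0.0/0")
    return rule
```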
Mike Bayer [Fri, 14 Aug 2015 18:44:28 +0000 (14:44 -0400)]
Add non-model index names to autogen exclude filters
The SQLAlchemy MySQL dialect generates implicit indexes
in the less-common case of an integer column within a composite
primary key where autoincrement is not set to False.
Add a rule to ignore these indexes when performing
autogenerate against a target database.
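Alembic lets env.py pass an include_object(object, name, type_, reflected, compare_to) callable to context.configure(); returning False excludes that object from autogenerate comparison. A minimal sketch of such a filter, assuming a hypothetical set of implicit index names (the real names depend on the schema and are not listed in this log):

```python
# Hypothetical names of indexes MySQL creates implicitly; not from Neutron.
IMPLICIT_INDEX_NAMES = {"example_implicit_idx"}


def include_object(obj, name, type_, reflected, compare_to):
    """Filter for Alembic's include_object hook.

    Skip an index that exists only in the database (reflected, with no
    model counterpart) when its name is one MySQL created implicitly.
    """
    if (type_ == "index" and reflected and compare_to is None
            and name in IMPLICIT_INDEX_NAMES):
        return False
    return True
```

In env.py this would be wired up as `context.configure(..., include_object=include_object)`.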
Mike Bayer [Mon, 20 Jul 2015 22:34:15 +0000 (18:34 -0400)]
Implement expand/contract autogenerate extension
Make use of new Alembic 0.8 features to allow
altering the "alembic revision" stream so
that operations for expand and contract are
directed into separate branches.
Delete FIP agent gateway port with external gw port
FIP agent gateway ports are associated with an external
network and a specific host.
Today FIP agent gateway ports are deleted on
every floating IP associate and disassociate. This
introduces race conditions in the port delete and also
unnecessary database access.
This patch deletes the FIP agent gateway port when
the last gateway port of the external network is deleted.
The child patch linked to this parent patch will clean
up the FIP agent gateway port deletion on associate,
disassociate and delete of a floating IP.
This should also cover the case where an agent was, for some
reason, unable to request deletion of its agent gateway port
(e.g. the agent died).
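A minimal sketch of the "delete with the last gateway port" rule over an in-memory port list. The function delete_gw_port is hypothetical, and the device_owner strings are assumed to match Neutron's router gateway and FIP agent gateway owners.

```python
from typing import List

ROUTER_GW = "network:router_gateway"              # assumed owner string
FIP_AGENT_GW = "network:floatingip_agent_gateway"  # assumed owner string


def delete_gw_port(ports: List[dict], gw_port_id: str) -> List[dict]:
    """Return the port list after removing one router gateway port.

    If it was the last router gateway port on its external network, the
    FIP agent gateway ports on that network are removed as well.
    """
    gw_port = next(p for p in ports if p["id"] == gw_port_id)
    net_id = gw_port["network_id"]
    remaining = [p for p in ports if p["id"] != gw_port_id]
    still_used = any(p["device_owner"] == ROUTER_GW and p["network_id"] == net_id
                     for p in remaining)
    if not still_used:
        remaining = [p for p in remaining
                     if not (p["device_owner"] == FIP_AGENT_GW
                             and p["network_id"] == net_id)]
    return remaining
```

Tying the cleanup to gateway port deletion, rather than to each floating IP associate/disassociate, means the check runs once per router gateway change.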
Previous changes[1] were merged as enablers[2] to fix bug 1274034, but
an alternative solution was chosen, so the introduced code can now be
considered dead code.
This change removes [2], the associated tests and the rootwrap filters.
Kevin Benton [Wed, 26 Aug 2015 05:03:27 +0000 (22:03 -0700)]
Stop device_owner from being set to 'network:*'
This patch adjusts the FieldCheck class in the policy engine to
allow a regex rule. It then leverages that to prevent users from
setting the device_owner field to anything that starts with
'network:' on networks which they do not own.
This policy adjustment is necessary because any ports with a
device_owner that starts with 'network:' will not have any security
group rules applied because it is assumed they are trusted network
devices (e.g. router ports, DHCP ports, etc). These security rules
include the anti-spoofing protection for DHCP, IPv6 ICMP messages,
and IP headers.
Without this policy adjustment, tenants connected to a shared network
with other tenants can abuse this trust by setting their VM port's
device_owner field to 'network:<anything>' and then hijacking other
tenants' traffic via DHCP spoofing or MAC/IP spoofing.
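A simplified sketch of a field check that accepts a regex, assuming the convention that a value starting with "~" is a regular expression. A policy rule along the lines of "field:port:device_owner=~^network:" would then match any 'network:*' owner. The function field_check is hypothetical and is not the FieldCheck class itself; it receives the part of the rule after "field:".

```python
import re


def field_check(match: str, target: dict) -> bool:
    """Evaluate '<resource>:<attr>=<value>' against a target resource dict.

    A value starting with '~' is a regular expression matched at the start
    of the attribute value; any other value must be equal.
    """
    resource_attr, value = match.split("=", 1)
    _resource, attr = resource_attr.split(":", 1)
    actual = target.get(attr)
    if actual is None:
        return False
    if value.startswith("~"):
        return re.match(value[1:], str(actual)) is not None
    return str(actual) == value
```

Anchoring the pattern with "^" matters: without it a value like "compute:network:x" would not be caught by re.match anyway, but a pattern without "^" used with re.search would also match owners that merely contain "network:".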
Aman Kumar [Tue, 17 Mar 2015 10:41:54 +0000 (03:41 -0700)]
ovs agent resync may miss port remove event
In the OVS agent's rpc_loop(), the resync mechanism clears the registered
ports and rescans them, which may cause some "port removed" events to be
missed, so treat_devices_removed is never called for them.
This fix rescans the newly updated ports when the resync mechanism is
triggered, without clearing the currently registered ports.
The registered ports are cleared only after too many
consecutive resyncs, to avoid resyncing forever because of the same
faulty port.
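A minimal sketch of why keeping the registered ports across a resync preserves removal events. The PortTracker class and MAX_CONSECUTIVE_RESYNCS limit are hypothetical; the agent's real bookkeeping is more involved.

```python
MAX_CONSECUTIVE_RESYNCS = 5  # hypothetical limit


class PortTracker:
    """Tracks registered ports across rpc_loop iterations."""

    def __init__(self):
        self.registered = set()
        self.consecutive_resyncs = 0

    def scan(self, current_ports, resync):
        """Return (added, removed) port sets for one iteration."""
        if resync:
            self.consecutive_resyncs += 1
            if self.consecutive_resyncs > MAX_CONSECUTIVE_RESYNCS:
                # Give up on incremental state so one faulty port can't
                # cause endless resyncs; removals in this pass are lost.
                self.registered = set()
                self.consecutive_resyncs = 0
        else:
            self.consecutive_resyncs = 0
        # Because self.registered survives an ordinary resync, ports that
        # disappeared meanwhile still show up in 'removed'.
        added = set(current_ports) - self.registered
        removed = self.registered - set(current_ports)
        self.registered = set(current_ports)
        return added, removed
```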
Retry metadata request on connection refused error
This test case may fail intermittently with a 'Connection refused' error.
This could be because the metadata proxy setup is not yet
complete when the request is issued; there is no
synchronization between the router being up and the metadata request being
issued, which may well explain the rare, intermittent failures.
To rule out this possibility and stabilize the test, let's retry
on connection refused only. If the test still fails, the next step would
be to dump the content of iptables to figure out why the error occurs.
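A minimal sketch of retrying only on a refused connection, assuming the request is wrapped in a zero-argument callable. The helper request_with_retry and its attempts/delay defaults are hypothetical, not the test's actual code.

```python
import time


def request_with_retry(do_request, attempts=5, delay=1.0):
    """Call do_request(), retrying only on ConnectionRefusedError.

    Any other exception propagates immediately, so genuine failures are not
    masked; the last ConnectionRefusedError is re-raised once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return do_request()
        except ConnectionRefusedError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Narrowing the retry to one exception type keeps the test strict: timeouts or HTTP errors still fail it on the first occurrence.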
This patch doesn't change the behaviour of the dhcp-agent,
but adds the ability to use a user-defined config,
which makes the dhcp-agent more flexible
and allows functional tests to run correctly
(without changing the global oslo.config CONF).
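A minimal sketch of the injectable-config pattern. To keep it self-contained, a SimpleNamespace stands in for oslo.config's global CONF, and DhcpAgentSketch, dhcp_driver and its values are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for the process-wide oslo.config CONF object (hypothetical values).
GLOBAL_CONF = SimpleNamespace(dhcp_driver="global.Driver")


class DhcpAgentSketch:
    """Agent that reads its settings from an injected config object."""

    def __init__(self, conf=None):
        # Fall back to the global config only when none is supplied, so a
        # functional test can pass its own object instead of mutating CONF.
        self.conf = conf if conf is not None else GLOBAL_CONF

    def driver_name(self) -> str:
        return self.conf.dhcp_driver
```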
This patch deals with the lock wait timeout and the deadlock errors
observed under high concurrency (api_workers >= 4) with the pymysql
driver. It includes the following changes:
- Stop setting dirty status for resource usage when creating
reservation, as usage of reserved resources is not tracked anymore;
- Add a variable, increasing delay when retrying make_reservation
upon a DBDeadlock error in order to reduce the chances of further
collisions;
- Enable transaction retry upon DBDeadlock errors for set_quota_usage;
- Do not resync quota usage while making reservation. This puts a lot
of stress on the database and is also wasteful since resource usage
is very likely to change again once the transaction is committed;
- Use autonested_transaction to simplify logic around when the
nested flag should be used.
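A minimal sketch of retrying with an increasing delay on deadlock. DBDeadlock here is a local stand-in for oslo.db's exception, and the delay schedule (0.1 s, doubling) is an assumption; the log does not give the patch's exact formula, and a real implementation may also add random jitter.

```python
import time


class DBDeadlock(Exception):
    """Stand-in for oslo.db's DBDeadlock exception."""


def retry_on_deadlock(func, max_retries=5, initial_delay=0.1, backoff=2.0,
                      sleep=time.sleep):
    """Call func(), retrying on DBDeadlock with a growing delay.

    Spreading retries out in time lowers the chance that the same
    concurrent transactions collide again.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except DBDeadlock:
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= backoff
```

The sleep parameter exists only so the behaviour can be observed without actually waiting.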
Moshe Levi [Tue, 18 Aug 2015 05:48:24 +0000 (08:48 +0300)]
Qos SR-IOV: Refactor extension delete to get mac and pci slot
When calling delete, we need the PCI slot details to reset the VF rate. The problem
is that when the VM is deleted, libvirt returns the VF to the hypervisor and the eswitch
manager marks the pci_slot as unassigned, so we can't tell from the MAC which PCI slot (VF)
to reset. Also, newer libvirt versions reset the MAC when deleting a VM, so then it is
not possible at all.
The solution is to keep the PCI slot details locally in the agent, since upon a removal event
you cannot get the pci_slot from the neutron server (as is done for create/update) because
the port has already been removed from neutron.
This patch pairs the MAC and pci_slot for a device (VF) so that when calling the extension
port delete API we have the pci_slot and can reset the VF rate.
It also adds a mapping from MAC to port_id so we can pass the port_id
when calling the extension port delete API.
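A minimal sketch of the agent-side cache, assuming devices are keyed by MAC. The DeviceCache class and its method names are hypothetical.

```python
class DeviceCache:
    """Keeps MAC -> PCI slot and MAC -> port_id locally in the agent."""

    def __init__(self):
        self._pci_slot_by_mac = {}
        self._port_id_by_mac = {}

    def add(self, mac, pci_slot, port_id):
        # Record both mappings while the server still knows about the port.
        self._pci_slot_by_mac[mac] = pci_slot
        self._port_id_by_mac[mac] = port_id

    def pop(self, mac):
        """Return (pci_slot, port_id) for a removed device and forget it.

        Returns (None, None) if the MAC was never recorded.
        """
        return (self._pci_slot_by_mac.pop(mac, None),
                self._port_id_by_mac.pop(mac, None))
```

On a removal event the agent would pop the entry and pass the port_id and pci_slot to the extension's port delete call, without querying the server.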
OVS agent: handle deleted ports on each rpc_loop iteration
Currently the rpc loop processes ports only when polling is required
(a message from the ovsdb monitor), or when there are port_updated notifications
from the server or security group notifications.
With only port_deleted notifications, port processing is not
triggered during the rpc loop.
This may lead to the agent accumulating a large number of deleted ports
and processing all of them at once during the next iteration in which polling is
required or a notification arrives from the server, which can be quite tough on
the agent: it will be unresponsive while processing the deleted
ports.
The patch makes port deletion processing more gradual.
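A minimal sketch of handling deleted ports on every iteration. RpcLoopSketch and its counters are hypothetical and only model the scheduling change, not the agent's real processing.

```python
class RpcLoopSketch:
    """Models when deleted ports get processed in the agent loop."""

    def __init__(self):
        self.deleted_ports = set()
        self.processed_deletions = []
        self.port_processing_runs = 0

    def port_delete(self, port_id):
        # RPC handler: only remember the port; work happens in the loop.
        self.deleted_ports.add(port_id)

    def iteration(self, polling_required=False, updated_ports=()):
        # Deleted ports are drained on every iteration, so they never pile
        # up waiting for an unrelated trigger.
        self.process_deleted_ports()
        if polling_required or updated_ports:
            self.port_processing_runs += 1  # regular port processing

    def process_deleted_ports(self):
        while self.deleted_ports:
            self.processed_deletions.append(self.deleted_ports.pop())
```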
Shweta P [Thu, 27 Aug 2015 20:53:13 +0000 (16:53 -0400)]
Final decomposition of Cisco plugin
This patch follows the previous patch (listed as a dependency) and moves
the remaining Cisco db models from neutron to networking-cisco.
The patch deletes l3_model and cisco_router_plugin and their associated
config and helper files from neutron.
Fixed functional test that validates graceful ovs agent restart
The async_ping function returns a callable that returns True when all ping
futures are done. Since those futures run for 10 secs, there was no
chance that the result of the callable was True.
The test was bailing out without calling bridge reset even a single time,
effectively leaving the feature untested in the gate.
Another thing to note is that, for some reason, the patch fixed oslo rootwrap
errors in the test when executed locally. Since I still don't understand how
it fixes the issue for me, I mark the bug as related only,
and will track logstash after it's merged to see whether it applies the same
unknown magic to gate jobs too.
rossella [Wed, 26 Aug 2015 16:06:25 +0000 (16:06 +0000)]
_bind_devices query only existing ports
If a port is deleted right before _bind_devices is called,
get_ports_attributes will throw an exception since the row
corresponding to the port doesn't exist in the OVS DB.
Avoid that by setting if_exists to True. The port will be
processed as deleted by the agent in the following iteration.
OVS agent: flush firewall rules for all deleted ports at once
In some cases, under high load, the OVS agent has to delete a large number of
ports during rpc_loop. remove_devices_filter() does iptables-save/restore
for IPv4 and IPv6, which is 4 system calls. It is very expensive and
inefficient to call it for each port individually.
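A minimal sketch of batching the flush with a defer-apply context manager. FirewallSketch is hypothetical and only counts the iptables-save/restore calls (4 per apply, as stated above) instead of running them.

```python
from contextlib import contextmanager

SYSCALLS_PER_APPLY = 4  # iptables-save + iptables-restore, for IPv4 and IPv6


class FirewallSketch:
    def __init__(self, ports):
        self.filtered_ports = set(ports)
        self.system_calls = 0
        self._deferred = False

    def _apply(self):
        # Stand-in for one save/restore cycle on both address families.
        self.system_calls += SYSCALLS_PER_APPLY

    def remove_port_filter(self, port_id):
        self.filtered_ports.discard(port_id)
        if not self._deferred:
            self._apply()

    @contextmanager
    def defer_apply(self):
        """Collect rule removals and apply them once on exit."""
        self._deferred = True
        try:
            yield
        finally:
            self._deferred = False
            self._apply()
```

Removing 100 ports one by one costs 400 calls; inside `with fw.defer_apply():` it costs 4.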
* Skip TestWSGIServerWithSSL[1] for Python 3 since the wsgi + ssl +
eventlet setup does not seem to behave correctly for now,
* Skip test_json_with_utf8[2] until we solve unicode/utf8 encode/decode,
* Fix some more tests to pass for py3,
* Replace the print statement with print() in docs/docstrings.
Tu Hong Jun [Thu, 20 Aug 2015 06:08:07 +0000 (14:08 +0800)]
Changed filter field to router_id
The get_sync_interfaces query always returns all router ports
from the database, even though it is supposed to return only those that
belong to specific routers. In a large-scale L3 environment with
many router ports, this slows the response time
for adding a router interface and for router L3 agent binding.
Sergey Vilgelm [Mon, 31 Aug 2015 14:06:48 +0000 (17:06 +0300)]
Fix a wrong condition for the _purge_metering_info function
Fix a condition in the _purge_metering_info function
under which items were never deleted from metering_info.
Delete the metering_info dict and use metering_infos instead.
Fix the problem of changing a dictionary during iteration.
Add unit tests for the _purge_metering_info and
_add_metering_info functions.
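A minimal sketch of purging stale entries without mutating a dict while iterating over it. The function purge_stale_entries and the 'last_update' field are hypothetical; the log does not describe the real purge criterion.

```python
def purge_stale_entries(metering_infos: dict, max_age: float, now: float) -> None:
    """Delete entries whose 'last_update' is more than max_age seconds old.

    Iterating over a snapshot (list(...)) avoids "dictionary changed size
    during iteration" errors when deleting keys inside the loop.
    """
    for label_id, info in list(metering_infos.items()):
        if now - info["last_update"] > max_age:
            del metering_infos[label_id]
```

Iterating over `metering_infos.items()` directly and deleting inside the loop raises RuntimeError in Python 3, which is the class of bug the snapshot avoids.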
Make sure service providers can be loaded correctly
This patch fixes a regression where, if neutron was loaded using
--config-dir, the service_providers option was no longer available.
We bring the logic back (removed by 61121c5f2af), alongside the ability
to load the option automatically. This is especially required for DevStack
deployments as of today, because neutron-server is loaded only by passing
--config-file (...)neutron.conf and --config-file (...)ml2_conf.ini.
Some SR-IOV drivers/devices don't support setting the link state,
meaning that 'ip link' fails like this when trying to set the state:
# ip l set dev p2p1 vf 6 state disable
RTNETLINK answers: Operation not supported
The sriov-nic-agent tries to do that in
SriovNicSwitchAgent.treat_device() and fails because of the non-zero
exit status from 'ip link'; therefore it doesn't reach the code
that updates the actual port status, so the port could hang in BUILD
state even if binding was successful.
This patch fixes the problem of nova not being able to successfully bind
or clean up such a port. It does not fix the case where a user manually
updates admin_state_up for a port via the API; that is the subject of a
separate fix.
Also, replace LOG.exception with LOG.warning for set_device_state()
as the exception would be logged by PciDeviceIPWrapper.set_vf_state()
anyway.
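A minimal sketch of the control flow after the fix: a failed link-state change is logged as a warning and the port status is still reported. The treat_device signature, the callbacks, the "ACTIVE"/"DOWN" values, and the use of subprocess.CalledProcessError as the failure type are all assumptions for illustration, not the agent's real code.

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


def treat_device(set_vf_state, update_port_status, device, admin_state_up):
    """Try to set the VF link state, but always report the port status."""
    try:
        set_vf_state(device, admin_state_up)
    except subprocess.CalledProcessError as exc:
        # The driver may not support link state ("Operation not supported").
        # Warn and carry on instead of leaving the port stuck in BUILD.
        LOG.warning("Failed to set link state for %s: %s", device, exc)
    update_port_status(device, "ACTIVE" if admin_state_up else "DOWN")
```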