Add migration support from agent to NSX dhcp/metadata services
This is a feature patch (3 of 3) that introduces support for
transitioning existing NSX-based deployments from the agent-based
model of providing dhcp and metadata proxy services to the new
agentless mode. In 'combined' mode, existing
networks will still be served by the existing infrastructure,
whereas new networks will be served by the new infrastructure.
Networks may be migrated to the new model using a new CLI tool
provided, called 'neutron-nsx-manage'. Currently the tool
provides two admin-only commands:
neutron-nsx-manage net-report <net-id-or-name>
This checks that the network can be migrated and returns
the resources currently in use. And:
neutron-nsx-manage net-migrate <net-id-or-name>
This will move the network over to the new model and deallocate
resources from the agent. Once a network has been migrated
there is no turning back.
The NSX plugin does not allow reassociating a floating IP with
a different internal IP address on the same port where it's
currently associated.
This patch fixes this behaviour and adds a unit test to ensure
re-association on the same port with a different IP is possible.
A few tweaks to the unit test aux functions were necessary to
accommodate the newly introduced unit test.
Terry Wilson [Fri, 24 Jan 2014 19:34:15 +0000 (13:34 -0600)]
Remove psutil dependency
The version of psutil that was being required is not hosted on
PyPI, which caused some issues. This patch removes the psutil
dependency in favor of using the method that was proposed for
the havana backport of polling minimization.
hyunsun [Wed, 18 Dec 2013 09:03:34 +0000 (18:03 +0900)]
Fix binding:host_id is set to None when port update
When updating a port, 'binding:host_id' is reset if not specified among
the parameters to be updated. As a result, a None value for
'binding:host_id' is sent from the notifier which might potentially
cause consumers to not work properly.
Akihiro Motoki [Thu, 5 Dec 2013 06:55:31 +0000 (15:55 +0900)]
Return request-id in API response
Import RequestIdMiddleware from oslo, which ensures that a request-id is
returned in the API response. CatchErrorsMiddleware is also imported to
ensure all internal exceptions are caught at the outermost layer.
api-paste.ini is updated to use them.
KeystonAuthContext middleware is updated so that it uses
request-id generated by RequestIdMiddleware.
Add middleware to openstack.conf and import all modules
under middleware directory from oslo.
DocImpact UpgradeImpact
This patch adds new WSGI middlewares "request_id" and "catch_errors".
They need to be added to api-paste.ini when upgrading.
Currently updates to security group rules or membership
are handled by immediately triggering a call to refresh_firewall.
This call is quite expensive, and it is often executed with a
very high frequency.
With this patch, the notification handler simply adds devices for
which the firewall should be refreshed to a set, which will then
be processed in another routine. The latter is supposed to
be called in the main agent loop.
This patch for 'provider updates' simply sets a flag for refreshing
the firewall for all devices.
In order to avoid breaking other agents leveraging the security
group RPC mixin, the reactive behaviour is still available, and is
still the default way of handling security group updates.
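The deferred handling described above can be sketched roughly as follows.
This is a minimal illustration, not the agent's actual code: the class
name, method names and the two callbacks are all hypothetical.

```python
class DeferredFirewallRefresher:
    """Hypothetical helper; names are illustrative, not Neutron's API."""

    def __init__(self, refresh_devices, refresh_all):
        self._refresh_devices = refresh_devices  # callable(set of device ids)
        self._refresh_all = refresh_all          # callable()
        self.devices_to_refilter = set()
        self.global_refresh = False

    def on_rule_or_member_update(self, device_ids):
        # Notification handler: only record the work, do not refresh now.
        self.devices_to_refilter |= set(device_ids)

    def on_provider_update(self):
        # Provider updates just flag a refresh for all devices.
        self.global_refresh = True

    def process_pending(self):
        # Meant to be called once per iteration of the main agent loop.
        devices, self.devices_to_refilter = self.devices_to_refilter, set()
        refresh_all, self.global_refresh = self.global_refresh, False
        if refresh_all:
            self._refresh_all()
        elif devices:
            self._refresh_devices(devices)
```

Repeated notifications for the same device between two loop iterations
collapse into a single refresh, which is where the saving comes from.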
Édouard Thuleau [Sat, 8 Feb 2014 17:28:19 +0000 (18:28 +0100)]
ML2 plugin cannot raise NoResultFound exception
The ML2 plugin cannot raise NoResultFound exception because it does not
use the correct sqlalchemy library:
'from sqlalchemy import exc as ...' instead of 'from sqlalchemy.orm
import exc as ...'
Henry Gessau [Fri, 7 Feb 2014 01:56:00 +0000 (20:56 -0500)]
Prepare for multiple cisco ML2 mech drivers
Code tree reorganization in preparation for ML2 mechanism drivers for
other cisco products. The cisco nexus ML2 mechanism driver and its
test cases need to move down into their own subdirectory.
Rich Curran [Thu, 14 Nov 2013 22:20:07 +0000 (17:20 -0500)]
ML2 Cisco Nexus MD: Create pre/post DB event handlers
Split ML2 cisco nexus event handlers for update and delete
into precommit (called during DB transactions) and postcommit
(called after DB transactions) methods.
Also fixes some unit tests that were incorrectly accessing
context managers without using the "with" statement.
Sascha Peilicke [Tue, 19 Nov 2013 08:57:32 +0000 (09:57 +0100)]
Support building wheels (PEP-427)
Universal is used to identify pure-Python modules (by bdist_wheel). For
these, it is sufficient to build a wheel with _any_ Python ABI version
and publish that to PyPI (by whatever means).
NVP plugin:fix delete sec group when backend is out of sync
If a security group does not exist on the NVP backend, an error
should not be raised on deletion of the security group.
This patch changes the plugin behavior by deleting the record
from the database and just logging that the security group
was not found on the NVP backend.
Sylvain Afchain [Thu, 12 Dec 2013 23:12:29 +0000 (00:12 +0100)]
Allow multiple DNS forwarders for dnsmasq
This patch changes the dnsmasq_server configuration option to a ListOpt
so that users can specify multiple DNS forwarders for each
dnsmasq instance.
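As a rough sketch of what a list-valued option means for the dnsmasq
command line: each forwarder becomes its own --server argument. The two
helper functions below are hypothetical and are not taken from the patch;
only the dnsmasq --server flag itself is real.

```python
def parse_list_option(raw):
    """Split a comma-separated option value into a clean list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def dns_forwarder_args(servers):
    """Build one dnsmasq --server argument per configured forwarder."""
    return ["--server=%s" % server for server in servers]


# Example value as it might be written in an ini file (illustrative).
args = dns_forwarder_args(parse_list_option("8.8.8.8, 8.8.4.4"))
```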
Ihar Hrachyshka [Thu, 30 Jan 2014 12:42:29 +0000 (13:42 +0100)]
Fix passing keystone token to neutronclient instance
Neutron client expects the token to be passed as the token= argument, while
neutron-metadata-agent passes auth_token= instead. This effectively makes the
client authenticate against keystone each time it's instantiated. In the
neutron-metadata-agent case, it means 'each time a client sends a metadata
request.'
The issue results in high cpu utilization on keystone side when simultaneously
invoking multiple nova instances with cloud-init.
Fix race condition in network scheduling to dhcp agent
Rarely, the dhcp agent rpc call get_active_networks_info() can interleave
with network scheduling initiated by a create.port.end notification.
In this case scheduling raises and port creation returns 500.
Need to synchronize on the DhcpNetworkBindings table.
Kevin Benton [Tue, 28 Jan 2014 01:26:12 +0000 (17:26 -0800)]
Enables BigSwitch/Restproxy ML2 VLAN driver
Refactors Bigswitch/Restproxy plugin by separating into
reusable libraries that can be used by the plugin as well
as the ml2 driver to proxy calls to the backend controller.
Enables basic unit tests for the ML2 driver.
Removes deprecated separate unplug/plug operations on ports.
Fawad Khaliq [Wed, 5 Feb 2014 18:15:13 +0000 (10:15 -0800)]
Fix error message typo
* Fix error message typo in "_network_admin_state"
function where "Network Admin State Validation Falied"
should be changed to "Network Admin State Validation Failed"
Change the behaviour of the L3 agent in order to set the IP addresses
for the floating IPs on the external gateway interface after the
relevant NAT rules have been applied.
This will avoid a transitory period in which the floating IP exists
and is reachable but is not yet wired to the actual target.
Maru Newby [Tue, 14 Jan 2014 18:43:22 +0000 (18:43 +0000)]
Add an explicit tox job for functional tests
This change is in support of adding a new jenkins job dedicated
to functional testing. Functional tests will no longer be
run as part of the unit tests.
In the majority of cases a port_update notification pertains to
a change in the properties affecting the port filter, and does
not affect port wiring, i.e. the local vlan tag.
This patch simply avoids doing port wiring/unwiring if the
local vlan tag did not change.
The extra overhead for the ovs-db get operation is offset
by the fact that get commands are generally faster than
set commands, and by avoiding executing the ovs-ofctl operation.
Process port_update notifications in the main agent loop
Instead of processing a port update notification directly in
the RPC call, the actual processing is moved into the main
rpc loop, whereas the RPC call just adds the updated port
identifier to a set of updated ports.
In this way, a port_update notification won't compete with the
main rpc loop, causing long delays in its completion under
heavy load. Also, repeated port_update notifications received
within a single iteration of the main agent loop will be
coalesced and processed only once.
This will also avoid the risk of processing notifications out
of order thus ending up with an actual configuration which
differs from the desired one.
This patch still performs L2 wiring for updated ports even if
it is necessary only when the administrative state of a port
changes.
The update_ports method has been renamed to scan_ports as the latter
name appears to be more in line with what the method actually does.
Ralf Haferkamp [Tue, 26 Nov 2013 16:38:44 +0000 (17:38 +0100)]
Reassign IP to vlan interface when deleting a VLAN bridge
When deleting a VLAN bridge that has an IP address assigned to it, don't delete
the VLAN interface, but reassign the IP address back to the underlying VLAN
interface.
Brian Haley [Thu, 30 Jan 2014 20:05:49 +0000 (15:05 -0500)]
Change metadata-agent to have a configurable backlog
The metadata agent currently runs with a default socket backlog
of 128. This isn't enough on a busy network node, even when
spawning multiple worker processes.
This change adds a new "metadata_backlog = XX" to the ini file
to support a configurable value to help improve performance.
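For readers unfamiliar with the term, the backlog is the value handed to
listen() on the server socket. The sketch below is a generic illustration
using a TCP loopback socket; the function name and defaults are
assumptions, and it does not reproduce how the metadata agent itself
creates its socket.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=128):
    """Open a listening socket with a configurable accept backlog."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    # Pending connections beyond the backlog may be refused or dropped.
    sock.listen(backlog)
    return sock
```

Note that on Linux the kernel silently caps the requested backlog at
net.core.somaxconn, so raising the option alone may not be enough.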
Brian Haley [Thu, 30 Jan 2014 19:39:47 +0000 (14:39 -0500)]
Change metadata-agent to spawn multiple workers
There is currently only one metadata-agent per network node,
which could be handling connections from hundreds or thousands
of metadata-namespace-proxy processes.
This change adds a new "metadata_workers = XX" to the ini file
to support creating more workers to help improve performance.
Evgeny Fedoruk [Wed, 29 Jan 2014 07:39:01 +0000 (23:39 -0800)]
Extending quota support for neutron LBaaS entities
Note: This change is a continuation of abandoned
change https://review.openstack.org/#/c/58720/
Previous change was abandoned due to rebase problem.
Extending quota mechanism to support neutron
LBaaS entities. Adding quota for vips, pools, members
and health monitors.
This is one of four changes related to the BP.
This one is for neutron project.
Another one is for python-neutronclient package,
another one for tempest,
and another one for horizon/openstack-dashboard project
See blueprint neutron-quota-extension for another two changes.
Tweak nvp/nsx version validation logic for router operations
This patch improves how the nsx/nvp controller version is validated
prior to some router operations. This is done by greatly simplifying
the boolean condition.
Missing unit tests are also added for increased coverage.
Carl Baldwin [Mon, 18 Nov 2013 23:32:19 +0000 (23:32 +0000)]
Simplify ip allocation/recycling to relieve db pressure
I found that multiple calls to delete_port can pile up on the
_recycle_ip operation. This patch simplifies this operation. It
reduces the _recycle_ip operation to a single row delete in the ip
allocations table and doesn't touch the availability table.
To achieve the recycling of ips in a pool, this code runs a more
complex operation of rebuilding the availability table when it is
exhausted. Only one API process will perform this more expensive
operation and others waiting for allocation will immediately benefit.
The amortized cost of this operation is much less than the cumulative
cost of running the more expensive _recycle_ip operation for every
port delete.
IP allocation behaves a bit differently with this patch. Instead of
giving out the first IP available in a pool, the entire pool will be
allocated before wrapping around and recycling ip addresses that have
been released. This is a desirable feature as it puts ip addresses in
a sort of quarantine after they are released. It is easier to
distinguish newly allocated ips from old ones.
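The allocation behaviour described above can be modelled with a small
in-memory sketch. It is only an illustration under simplifying
assumptions: the Pool class is hypothetical, a set stands in for the ip
allocations table and a list for the availability table, and no database
or locking is involved.

```python
import ipaddress


class Pool:
    """Hypothetical in-memory model of one allocation pool."""

    def __init__(self, first, last):
        self.first = ipaddress.ip_address(first)
        self.last = ipaddress.ip_address(last)
        self.allocated = set()        # stands in for the allocations table
        self.available = self._all()  # stands in for the availability table

    def _all(self):
        size = int(self.last) - int(self.first) + 1
        return [self.first + offset for offset in range(size)]

    def release(self, ip):
        # Recycling is a single delete; availability is not touched.
        self.allocated.discard(ipaddress.ip_address(ip))

    def allocate(self):
        if not self.available:
            # Exhausted: rebuild availability from what is not allocated.
            self.available = [ip for ip in self._all()
                              if ip not in self.allocated]
            if not self.available:
                raise RuntimeError("pool exhausted")
        ip = self.available.pop(0)
        self.allocated.add(ip)
        return ip
```

A released address is therefore not handed out again until the rest of
the pool has been used, which gives the quarantine effect mentioned above.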
Carl Baldwin [Fri, 24 Jan 2014 22:35:48 +0000 (22:35 +0000)]
Reduce severity of log messages in validation methods
I noticed this while reviewing Ic2c87174. When I read through log
files, I don't want to see errors like this that come from validating
bad user input. Info severity is more appropriate.
Stephen Ma [Wed, 29 May 2013 01:52:27 +0000 (18:52 -0700)]
L3 Agent restart causes network outage
When an L3 agent controlling multiple qrouter namespaces
restarts, it destroys all qrouter namespaces even if
some of them are still in use. As a result, network
traffic could be stopped on the VMs that use the
networks associated with these namespaces.
So what is needed is for the L3 agent to preserve those
qrouter namespaces that the L3 agent instance recognizes and to
destroy those it does not know about.
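The selection logic amounts to a set difference, sketched below. The
function name and its arguments are hypothetical; the only detail taken
from the text is that router namespaces are named with a qrouter prefix.

```python
def namespaces_to_destroy(existing_namespaces, known_router_ids,
                          prefix="qrouter-"):
    """Return the router namespaces the agent does not recognize."""
    keep = {prefix + router_id for router_id in known_router_ids}
    return sorted(
        ns for ns in existing_namespaces
        # Namespaces of other kinds are never touched.
        if ns.startswith(prefix) and ns not in keep
    )
```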
Maru Newby [Mon, 20 Jan 2014 19:28:03 +0000 (19:28 +0000)]
Minimize the cost of checking for api worker exit
A recent change to oslo allows the configuration of the interval
that ProcessLauncher waits between checks of child exit. The
default interval of 0.01s resulted in the neutron service consuming
unnecessary cpu cycles checking whether api workers had exited (5%
cpu on idle in a VM). This patch extends the interval to 1s to
minimize the cost of the checks.
Aaron Rosen [Mon, 13 Jan 2014 21:57:04 +0000 (13:57 -0800)]
Remove and recreate interface if already exists
If the dhcp-agent machine restarts, then when openvswitch comes up it logs the
following warning messages for all tap interfaces that do not exist:
bridge|WARN|could not open network device tap2cf7dbad-9d (No such device)
Once the dhcp-agent starts it recreates the interfaces and re-adds them to the
ovs-bridge. Unfortunately, ovs does not reinitialize the interfaces as they
are already in ovsdb and does not assign them an ofport number.
This situation corrects itself, though, the next time a port is added to the
ovs-bridge, which is probably why no one has noticed this issue until now.
In order to correct this we should first remove any interfaces that exist and
then re-add them.
Carl Baldwin [Fri, 17 Jan 2014 19:28:10 +0000 (19:28 +0000)]
Use an independent iptables lock per namespace
Since iptables is independent from namespace to namespace, it makes
sense to use an independent lock per namespace. This improvement is
aimed at improving the parallel performance in the L3 agent.
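One way to picture a lock per namespace is a registry of locks keyed by
namespace name, as in the sketch below. This uses in-process threading
locks purely for illustration; the names are hypothetical and the real
agent may rely on a different locking mechanism.

```python
import threading
from collections import defaultdict

_registry_guard = threading.Lock()
_namespace_locks = defaultdict(threading.Lock)


def iptables_lock(namespace=None):
    """Return the lock dedicated to one namespace (None means the root)."""
    key = namespace or "root"
    # Guard the registry so two threads never create two locks for one key.
    with _registry_guard:
        return _namespace_locks[key]
```

A caller would wrap its iptables work in "with iptables_lock(name):", so
that updates in different namespaces no longer wait on each other.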
In the NVP plugin, metadata processing performs several plugin operations
with the same context (and db session). The first operation might leave
persisted objects in the session instance which then conflict with objects
created in the second operation.
Roman Podoliaka [Thu, 12 Dec 2013 06:20:15 +0000 (08:20 +0200)]
Fix the migration adding a UC to agents table
The migration script mistakenly assumes that all core
plugins use agents extension, which is not true (e.g.
plumgrid and bigswitch don't).
Apply this migration script only for plugins that are
stated in the original migration script adding agents
table (511471cc46b_agent_ext_model_supp.py).
Eugene Nikanorov [Mon, 13 Jan 2014 14:58:59 +0000 (18:58 +0400)]
Fix race condition in delete_port method. Fix update_port method
A port can be gone between
l3plugin.prevent_l3_port_deletion(context, id)
and the port query. This needs to be handled properly.
Also need to handle a non-existing port in update_port properly.
Carl Baldwin [Tue, 12 Nov 2013 22:52:47 +0000 (22:52 +0000)]
Use information from the dnsmasq hosts file to call dhcp_release
Certain situations can cause the DHCP agent's local cache to get out
of sync with the leases held internally by dnsmasq. This method of
detecting when to call dhcp_release is idempotent and not dependent on
the cache. It is more robust.
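A rough sketch of the idea: compare the entries in the previous hosts
file with the ones about to be written, and release whatever disappeared.
The helpers below are hypothetical, and they assume each hosts file line
has the simple form 'mac,hostname,ip'; other line formats are not handled.

```python
def parse_hosts(text):
    """Return a set of (ip, mac) pairs from 'mac,hostname,ip' lines."""
    entries = set()
    for line in text.splitlines():
        parts = line.strip().split(",")
        if len(parts) >= 3:
            entries.add((parts[2], parts[0]))
    return entries


def leases_to_release(old_hosts_text, new_hosts_text):
    """Entries present before but absent now are candidates for release."""
    return sorted(parse_hosts(old_hosts_text) - parse_hosts(new_hosts_text))
```

Because the result depends only on the two file contents, running the
comparison again after the release yields nothing further to do.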
Oleg Bondarev [Mon, 16 Dec 2013 09:17:48 +0000 (13:17 +0400)]
LBaaS: handle NotFound exceptions in update_status callback
LBaaS agent may send update_status requests to server on objects
which were already deleted from db: this is due to creating and
deleting objects at a high rate (like tempest API tests do).
As a result errors and stacktraces appear in server and agent logs.
The proposed solution is to catch NotFound exceptions and print a warning.
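In outline, the handling looks like the sketch below. The NotFound class,
the lookup callable and the dict-based object are all stand-ins invented
for this example, not the plugin's real types.

```python
import logging

LOG = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for the exception raised when an object is missing."""


def update_status(lookup, obj_id, status):
    """Set the status of an object, tolerating concurrent deletion."""
    try:
        obj = lookup(obj_id)
    except NotFound:
        # The object was deleted in the meantime: warn, do not fail.
        LOG.warning("Cannot update status: object %s not found; "
                    "it may have been deleted", obj_id)
        return False
    obj["status"] = status
    return True
```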
Édouard Thuleau [Tue, 24 Dec 2013 10:48:46 +0000 (11:48 +0100)]
[ML2] l2-pop MD handle multi create/delete ports
If more than one port is added or removed simultaneously, the port db
entries have status BUILD or DOWN and move to ACTIVE when the agent has
finished configuring them.
The l2-pop mechanism driver uses the events of a port moving to ACTIVE or
DOWN to send fdb entries. If the port is the first or the last network
port on an agent, the flooding entry needs to be added or removed.
This patch fixes the method that determines how many ports are active on
an agent by adding a filter requiring the port status to be ACTIVE.
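The effect of the status filter can be shown with plain dictionaries. The
functions and field names below are hypothetical, and the sketch assumes
the port list already reflects the state after the event being handled.

```python
def active_ports_on_agent(ports, agent_host, network_id):
    """Count only ACTIVE ports of one network bound to one agent host."""
    return sum(
        1 for port in ports
        if port["host"] == agent_host
        and port["network_id"] == network_id
        and port["status"] == "ACTIVE"
    )


def flooding_entry_change(ports, agent_host, network_id, event):
    """Decide whether the flooding entry must be added or removed."""
    count = active_ports_on_agent(ports, agent_host, network_id)
    if event == "ACTIVE" and count == 1:
        return "add"      # first active port of the network on this agent
    if event == "DOWN" and count == 0:
        return "remove"   # last active port of the network is gone
    return None
```

Without the status filter, a second port still in BUILD would be counted
and the first port going ACTIVE would not be recognized as the first one.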
Sylvain Afchain [Wed, 4 Dec 2013 20:01:06 +0000 (21:01 +0100)]
Dnsmasq uses all agent IPs as nameservers
Add a dhcp option which provides all agent IPs,
which will be used as nameserver entries when
neutron uses multiple dhcp agents per network and
when there is no dns nameserver provided by the
neutron server.
Oleg Bondarev [Thu, 12 Dec 2013 08:13:22 +0000 (12:13 +0400)]
LBaaS: fix handling pending create/update members and health monitors
When the agent requests the loadbalancer logical config from the server,
the server returns only active pool members and health_monitors.
The server needs to also return members and monitors which are in pending states.
Also a small refactoring moving the ACTIVE_PENDING set to a common place.
Fix pip install failure due to missing nvp.ini file
It looks like sdist does not support symlinks, therefore
letting nvp.ini point to nsx.ini is not a good solution.
Since nvp.ini is going away, leave a copy for now, but
add a warning so that users are aware of the switch,
whilst preserving full backward-compatibility.