Ivar Lazzaro [Tue, 14 Jan 2014 19:17:05 +0000 (11:17 -0800)]
Embrane Tempest Compliance
This changeset tracks the changes needed by Embrane's Neutron Plugin
in order to consistently pass tempest tests.
Changes:
- Some db transactions were too long and were causing lock timeout
exceptions. Removed useless transactions (waiting on non-db tasks to complete)
to fix the problem.
- The operation filter was useless and was breaking the tests. Most of the
logic which guarantees the appliance's correct state when an operation is executed
is now in the internal library used for the heleos APIs.
The filter was therefore removed (as well as the corresponding exception).
- Fixed "sync" mode. The behavior was incorrect due to the queue timeout.
Furthermore, parallel requests were not waiting on the correct thread.
- Added missing methods for floating IPs (not all the scenarios were covered).
Édouard Thuleau [Thu, 16 Jan 2014 09:15:07 +0000 (10:15 +0100)]
Update help message of flag 'enable_isolated_metadata'
Thanks to the commit c73b54e50b62c489f04432bdbc5bee678b18226e,
the way the DHCP agent determines whether a subnet is isolated has changed.
But the flag's help message wasn't updated to reflect this change.
shihanzhang [Tue, 18 Feb 2014 01:50:57 +0000 (09:50 +0800)]
Fix invalid facilities documented in rootwrap.conf
The values user0 and user1 do not map to valid facility values, unlike
local0, local1, etc. Using user0 results in a pri value that does not map
back to a facility of the same name in syslog.
RFC 5424 suggests values of local0 through local7. Setting
syslog_log_facility to one of those values results in a message with a
priority that can be mapped back to the original string value.
This fix adjusts the comment in rootwrap.conf to suggest the local
prefix instead of the user prefix.
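
The facility/priority round trip described above can be illustrated with
the facility table shipped in Python's standard library. This is only a
sketch: it uses logging.handlers.SysLogHandler as a stand-in for whatever
table rootwrap consults, and the helper names pri and facility_from_pri
are hypothetical.

```python
from logging.handlers import SysLogHandler

# Facility names known to Python's syslog handler (name -> numeric code).
# Assumption: this table stands in for the one used by rootwrap.
facilities = SysLogHandler.facility_names


def pri(facility_name, severity_name="info"):
    """Return the syslog PRI value: facility * 8 + severity."""
    facility = facilities[facility_name]
    severity = SysLogHandler.priority_names[severity_name]
    return facility * 8 + severity


def facility_from_pri(pri_value):
    """Map a PRI value back to a facility name, or None if there is none."""
    code = pri_value >> 3
    for name, value in facilities.items():
        if value == code:
            return name
    return None


local_names = ["local%d" % i for i in range(8)]
# Every localN facility survives the name -> PRI -> name round trip.
round_trip = [facility_from_pri(pri(name)) for name in local_names]
# "user0" is not a facility name in this table at all.
has_user0 = "user0" in facilities
```

In this table only "user" exists (without a numeric suffix), while local0
through local7 each have their own code.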
This patch replaces regex matching of text output with parsing
of JSON output in ovs_lib.get_vif_port_by_id.
This makes the code more reliable as subtle, possibly even
cosmetic, changes in ovs-vsctl output format could cause the
regular expression match to fail.
Also, this makes the code consistent with ovs_lib.get_vif_port_set
which already uses JSON output.
Finally this patch slightly changes the behaviour of
ovs_lib.get_vif_port_by_id returning None if elements such as
mac address or ofport were not available.
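
A minimal sketch of the JSON-parsing approach follows. The sample text
imitates the table layout that ovs-vsctl emits with --format=json
(a "headings" list plus "data" rows, maps encoded as ["map", [...]] and an
empty column as ["set", []]); the sample values and the function name
find_vif_port are assumptions made for illustration, not the code in
ovs_lib.

```python
import json

# Hypothetical sample imitating `ovs-vsctl --format=json` output.
SAMPLE = json.dumps({
    "headings": ["name", "external_ids", "ofport"],
    "data": [
        ["tap-ready",
         ["map", [["attached-mac", "fa:16:3e:00:00:01"],
                  ["iface-id", "port-1"]]],
         5],
        ["tap-pending",
         ["map", [["attached-mac", "fa:16:3e:00:00:02"],
                  ["iface-id", "port-2"]]],
         ["set", []]],  # ofport not assigned yet
    ],
})


def find_vif_port(raw_json, port_id):
    """Return port details for port_id, or None if they are incomplete."""
    table = json.loads(raw_json)
    for row in table["data"]:
        record = dict(zip(table["headings"], row))
        # external_ids arrives as ["map", [[key, value], ...]].
        ext_ids = dict(record["external_ids"][1])
        if ext_ids.get("iface-id") != port_id:
            continue
        mac = ext_ids.get("attached-mac")
        ofport = record["ofport"]
        # An unassigned ofport is an empty set, not an integer.
        if mac is None or not isinstance(ofport, int):
            return None
        return {"name": record["name"], "mac": mac, "ofport": ofport}
    return None
```

Because the rows are addressed by column heading rather than by position
in a text line, cosmetic changes in the output do not affect the lookup.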
test_router_add_interface_subnet_with_port_from_other_tenant
is causing intermittent failures in unit tests because of
issues related with sql session autoflush.
This patch skips this test, since it is already covered
by another test case in the same module. This should prevent
job failures while the relevant bug is addressed.
Fix request timeout errors during calls to NSX controller
Sometimes two correlated exception traces are observed in
the server log for the Neutron Server backed by NSX:
RequestTimeout (The nsx request has timed out) and
OperationalError (Lock wait timeout exceeded). This is
generally described by Guru Salvatore Orlando as,
and I quote, the "infamous eventlet-mysql deadlock".
This patch tries to address the issue by adding a
cooperative yield in the nsx client code (it’s a good idea
to call sleep(0) occasionally in any case) and also by
avoiding the unnecessary spawning of another Greenthread
within a call that is already executed in a Greenthread
itself.
Carl Baldwin [Wed, 15 Jan 2014 18:46:17 +0000 (18:46 +0000)]
L3 agent fetches the external network id once
Rather than fetching the id of the external network each time that
_process_routers is called, get it once and remember it. If the agent
is ever requested to connect to a different ext-net then it will fetch
the current ext-net to double check for the unlikely event that the
ext-net has changed. If it has then it will remember the new ext-net.
This is only applicable in the case where there is only one ext-net
that has not been configured explicitly in the config file. That was
the only case that would cause an RPC message in the first place.
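
The fetch-once behaviour can be sketched as below. The class name
ExternalNetworkCache and the fetch callable are hypothetical stand-ins for
the agent's RPC call; the sketch only shows the caching rule described in
this message.

```python
class ExternalNetworkCache:
    """Remember the external network id instead of fetching it every time."""

    def __init__(self, fetch_external_network_id):
        # Hypothetical callable standing in for the RPC to the plugin.
        self._fetch = fetch_external_network_id
        self._ext_net_id = None

    def get(self, requested_ext_net_id=None):
        if self._ext_net_id is None:
            # First use: fetch once and remember the result.
            self._ext_net_id = self._fetch()
        elif requested_ext_net_id and requested_ext_net_id != self._ext_net_id:
            # A router asks for a different ext-net: double check with
            # the server and remember whatever it reports now.
            self._ext_net_id = self._fetch()
        return self._ext_net_id


calls = []


def fake_fetch():
    # Test double that records how many times the "RPC" was made.
    calls.append(1)
    return "ext-net-%d" % len(calls)


cache = ExternalNetworkCache(fake_fetch)
first = cache.get("ext-net-1")
second = cache.get("ext-net-1")      # served from memory, no new fetch
changed = cache.get("ext-net-2")     # mismatch triggers one re-fetch
```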
This patch changes get_vif_port_set in order to not return
OVS ports for which the ofport is not yet assigned, thus avoiding
a regex match failure in get_vif_port_by_id.
Because of this failure, treat_vif_port is unable to wire
the port.
As get_vif_port_by_id is also used elsewhere in the agent, it has
been enhanced in order to tolerate situations in which ofport might
have not yet been assigned.
The ofport field is added to the list of those monitored by the
SimpleInterfaceMonitor. This will guarantee an event is generated
when the ofport is assigned to a port. Otherwise there is a risk
a port would never be processed if it was not yet ready the first
time it was detected. This change won't trigger any extra processing
on the agent side.
Finally, this patch avoids fetching device details from the plugin
for ports which have disappeared from the OVS bridge. This is a
little optimization which might be beneficial for short lived ports.
Ensure that session is rolled back on bulk creates
During bulk creates, the session is begun explicitly;
ensure that it gets rolled back before re-raising in
order to avoid triggering InvalidRequestError
exceptions when the session is reused.
This patch introduces DB mappings between Neutron and NSX routers,
so the Neutron router ID no longer needs to be equal to the
NSX one.
This change is needed for enabling asynchronous operations in
the NSX plugin.
This patch also performs NVP/NSX renaming where appropriate, and
fixes delete router logic causing a 500 HTTP error to be returned
when a Neutron internal error occurs.
Related to blueprint nvp-async-backend-communication
Related to blueprint nicira-plugin-renaming
Akihiro Motoki [Wed, 12 Feb 2014 07:38:10 +0000 (16:38 +0900)]
nec plugin: Compare OFS datapath_id as hex int
Previously the NEC plugin compared old and new datapath_ids as
strings, and zero padding in hex notation was not taken into
account in the comparison. This caused unintended deletion and
recreation of a port on the OpenFlow controller. This patch fixes
this issue by comparing datapath_ids as hex ints.
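
The difference between the two comparisons can be seen in a few lines.
The function name same_datapath_id and the sample ids are hypothetical;
this is an illustration of the idea, not the plugin code.

```python
def same_datapath_id(old, new):
    """Compare two datapath ids by value, ignoring zero padding."""
    # int(..., 16) accepts an optional "0x" prefix and leading zeros.
    return int(old, 16) == int(new, 16)


padded = "0x000000000000000a"
unpadded = "0xa"

equal_as_strings = padded == unpadded            # string comparison
equal_as_ints = same_datapath_id(padded, unpadded)  # numeric comparison
```

Here the string comparison reports a change where the numeric comparison
does not, which is the situation that led to ports being deleted and
recreated.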
Mark T. Voelker [Wed, 12 Feb 2014 16:45:32 +0000 (11:45 -0500)]
Lowercase OVS sample config section headers
The "Sample Configurations" section of ovs_neutron_plugin.ini
has uppercased section headers. In Havana the section headers
were normalized to lowercase, but the sample configs were never
updated.
This patch introduces DB mappings between Neutron networks and NSX
logical switches, so the Neutron network ID no longer needs to be
equal to the NSX one.
This change is necessary for enabling asynchronous operations in
the NSX plugin.
This patch also performs NVP/NSX renaming where appropriate.
Related to blueprint nvp-async-backend-communication
Related to blueprint nicira-plugin-renaming
Akihiro Motoki [Mon, 10 Feb 2014 06:24:54 +0000 (15:24 +0900)]
Raise an error from ovs_lib list operations
Previously list operations in ovs_lib returned an empty list
if RuntimeError occurred, so a caller could not distinguish an error
from normal results. This commit changes ovs_lib list operations
(get_vif_port_set, get_vif_ports, get_bridges) to raise an
exception when RuntimeError occurs.
Note: callers of these commands are ovs/nec/ryu-agent and ovs_cleanup.
- plugin agents: these commands are inside a try/except clause
in the daemon loop and there is no need to change.
- ovs_cleanup: there is no error catch logic in main() for now
and it calls commands other than ovs_lib, so it can be cleaned up
later if required.
It also fixes the code to use excutils.save_and_reraise_exception
when reraising an exception.
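
The change in contract can be sketched as follows. The run_vsctl argument
is a hypothetical stand-in for the command runner, and a plain `raise` is
used here in place of excutils.save_and_reraise_exception so that the
sketch has no dependencies.

```python
import logging

LOG = logging.getLogger(__name__)


def get_bridges(run_vsctl):
    """List bridge names; propagate RuntimeError instead of hiding it."""
    try:
        # run_vsctl is a hypothetical callable returning command output.
        return run_vsctl(["list-br"]).split()
    except RuntimeError:
        # Log the failure, then let the caller see the original error
        # rather than an empty list that looks like "no bridges".
        LOG.exception("Unable to retrieve bridges")
        raise
```

With this shape an empty list always means that no bridges exist, and a
failed command surfaces as an exception in the caller.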
Arata Notsu [Fri, 10 Jan 2014 10:54:10 +0000 (19:54 +0900)]
Fix ValueError in ip_lib.IpRouteCommand.get_gateway()
As metric is not necessarily the 5th word of the gateway line, the
method should search the string 'metric' in the line and pick the next
word as the metric value.
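
A minimal sketch of the lookup, assuming a default route line of the form
"default via <gateway> ..." as printed by `ip route`. The function name
parse_default_route and the sample lines are hypothetical.

```python
def parse_default_route(line):
    """Extract gateway and optional metric from a default route line."""
    words = line.split()
    # Assumes the line starts with "default via <gateway>".
    result = {"gateway": words[2]}
    if "metric" in words:
        # The metric value is the word right after 'metric',
        # wherever that keyword appears in the line.
        result["metric"] = int(words[words.index("metric") + 1])
    return result


short_line = "default via 10.0.0.1 dev eth0 metric 100"
long_line = "default via 10.0.0.1 dev eth0 proto static metric 50"
no_metric = "default via 10.0.0.1 dev eth0"
```

Indexing a fixed word position would pick "static" in the second sample,
which cannot be converted to an integer.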
Jakub Libosvar [Mon, 27 Jan 2014 17:09:26 +0000 (18:09 +0100)]
Move db migration of ml2 security groups to havana
The ml2 plugin is a havana feature. Currently the securitygroups tables are
created in the migration chain after the havana release. This causes a db
migration failure when migrating from havana to current head because
creation of the securitygroups tables is attempted although they were already
created by create_all().
Julia Varlamova [Tue, 11 Feb 2014 14:04:01 +0000 (18:04 +0400)]
Sync latest oslo.db code into neutron
Changes that were ported from oslo:
b4f72b2 Don't raise MySQL 2013 'Lost connection' errors
271adfb Format sql in db.sqlalchemy.session docstring
0334cb3 Handle exception messages with six.text_type
eff69ce Drop dependency on log from oslo db code
7a11a04 Automatic retry db.api query if db connection lost
11f2add Clean up docstring in db.sqlalchemy.session
1b5147f Only enable MySQL TRADITIONAL mode if we're running against MySQL
39e1c5c Move db tests base.py to common code
986dafd Fix parsing of UC errors in sqlite 3.7.16+/3.8.2+
9a203e6 Use dialect rather than a particular DB API driver
1779029 Move helper DB functions to db.sqlalchemy.utils
bcf6d5e Small edits on help strings
ae01e9a Transition from migrate to alembic
70ebb19 Fix mocking of utcnow() for model datetime cols
7aa94df Add a db check for CHARSET=utf8
aff0171 Remove "vim: tabstop=4 shiftwidth=4 softtabstop=4" from headers
fa0f36f Fix database connection string is secret
8575d87 Removed copyright from empty files
d08d27f Fix the obsolete exception message
8b2b0b7 Use hacking import_exceptions for gettextutils._
9bc593e Add docstring for exception handlers of session
855644a Removal of _REPOSITORY global variable.
ea6caf9 Remove string.lowercase usage
a33989e Remove eventlet tpool from common db.api
e40903b Database hook enabling traditional mode at MySQL
f2115a0 Replace xrange in for loop with range
c802fa6 SQLAlchemy error patterns improved
1c1f199 Remove unused import
6d0a6c3 Correct invalid docstrings
135dd00 Remove start index 0 in range()
28f8fd5 Make _extra_keys a property of ModelBase
45658e2 Fix violations of H302:import only modules
bb4d7a2 Enables db2 server disconnects to be handled pessimistically
915f8ab db.sqlalchemy.session add [sql].idle_timeout
e6494c2 Use six.iteritems to make dict work on Python2/3
48cfb7b Drop dependency on processutils from oslo db code
4c47d3e Fix locking in migration tests
c2ee282 Incorporating MIT licensed code
c5a1088 Typos fix in db and periodic_task module
fb0e86a Use six.moves.configparser instead of ConfigParser
1dd4971 fix typo in db session docstring
8a01dd8 The ability to run tests at various backend
0fe4e28 Use log.warning() instead of log.warn() in oslo.db
12bcdb7 Remove vim header
4c22556 Use py3kcompat urlutils functions instead of urlparse
ca7a2ab Don't use deprecated module commands
6603e8f Remove sqlalchemy-migrate 0.7.3 patching
274c7e2 Drop dependency on lockutils from oslo db code
97d8cf4 Remove lazy loading of database backend
2251cb5 Do not name variables as builtins
3acd57c Add db2 communication error code when check the db connection
c2dcf6e Add [sql].connection as deprecated opt for db
001729d Modify SQLA session due to dispose of eventlet
4de827a Clean up db.sqla.Models.extra_keys interface
347f29e Use functools.wrap() instead of custom implementation
771d843 Move base migration test classes to common code
9721129 exception: remove
56ff3b3 Use single meta when change column type
3f2f70e Helper function to sanitize db url credentials
df3f2ba BaseException.message is deprecated since Python 2.6
c76be5b Add function drop_unique_constraint()
d4d8126 Change sqlalchemy/utils.py mode back to 644
cf41936 Move sqlalchemy migration from Nova
5758360 Raise ValueError if sort_dir is unknown
31c1995 python3: Add python3 compatibility support
3972c3f Migrate sqlalchemy utils from Nova
1a2df89 Enable H302 hacking check
3f503fa Add a monkey-patching util for sqlalchemy-migrate
7ba5f4b Don't use mixture of cfg.Opt() deprecated args
Add migration support from agent to NSX dhcp/metadata services
This is a feature patch (3 of 3) that introduces support for
transitioning existing NSX-based deployments from the agent
based model of providing dhcp and metadata proxy services
to the new agentless model. In 'combined' mode, existing
networks will still be served by the existing infrastructure,
whereas new networks will be served by the new infrastructure.
Networks may be migrated to the new model using a new CLI tool
provided, called 'neutron-nsx-manage'. Currently the tool
provides two admin-only commands:
neutron-nsx-manage net-report <net-id-or-name>
This will check that the network can be migrated and return
the resources currently in use. And:
neutron-nsx-manage net-migrate <net-id-or-name>
This will move the network over to the new model and deallocate
resources from the agent. Once a network has been migrated
there is no turning back.
The NSX plugin does not allow a floating IP to be re-associated
with a different internal IP address on the same port where it's
currently associated.
This patch fixes this behaviour and adds a unit test to ensure
re-association on the same port with a different IP is possible.
A few tweaks to the unit test aux functions were necessary to
accommodate the newly introduced unit test.
Terry Wilson [Fri, 24 Jan 2014 19:34:15 +0000 (13:34 -0600)]
Remove psutil dependency
The version of psutil that was being required is not hosted on
PyPI, which caused some issues. This patch removes the psutil
dependency in favor of using the method that was proposed for
the havana backport of polling minimization.
hyunsun [Wed, 18 Dec 2013 09:03:34 +0000 (18:03 +0900)]
Fix binding:host_id is set to None when port update
When updating a port, 'binding:host_id' is reset if not specified among
the parameters to be updated. As a result, a None value for
'binding:host_id' is sent from the notifier which might potentially
cause consumers to not work properly.
Akihiro Motoki [Thu, 5 Dec 2013 06:55:31 +0000 (15:55 +0900)]
Return request-id in API response
Import RequestIdMiddleware from oslo, which ensures that a request-id
is returned in API responses. CatchErrorsMiddleware is also imported to
ensure all internal exceptions are caught at the outermost layer.
api-paste.ini is updated to use them.
KeystonAuthContext middleware is updated so that it uses
request-id generated by RequestIdMiddleware.
Add middleware to openstack.conf and import all modules
under middleware directory from oslo.
DocImpact UpgradeImpact
This patch adds new WSGI middlewares "request_id" and "catch_errors".
They need to be added to api-paste.ini when upgrading.
Currently updates to security group rules or membership
are handled by immediately triggering a call to refresh_firewall.
This call is quite expensive, and it is often executed with a
very high frequency.
With this patch, the notification handler simply adds devices for
which the firewall should be refreshed to a set, which will then
be processed in another routine. The latter is supposed to
be called in the main agent loop.
For 'provider updates', this patch simply sets a flag for refreshing
the firewall for all devices.
In order to avoid breaking other agents leveraging the security
group RPC mixin, the reactive behaviour is still available, and is
still the default way of handling security group updates.
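
The deferred mode described above might look roughly like this. The class
name, the two callbacks and the method names are hypothetical; the sketch
only shows the set plus flag bookkeeping, not the RPC mixin itself.

```python
class DeferredFirewallRefresh:
    """Collect refresh requests and apply them from the main agent loop."""

    def __init__(self, refresh_ports, refresh_all):
        # Hypothetical callbacks standing in for the firewall driver calls.
        self._refresh_ports = refresh_ports
        self._refresh_all = refresh_all
        self.devices_to_refilter = set()
        self.global_refresh = False

    def on_rules_or_members_updated(self, devices):
        # Notification handler: only record which devices are affected.
        self.devices_to_refilter |= set(devices)

    def on_provider_updated(self):
        # Provider updates affect every device: just raise a flag.
        self.global_refresh = True

    def process_pending(self):
        """Meant to be called once per iteration of the main agent loop."""
        devices, self.devices_to_refilter = self.devices_to_refilter, set()
        refresh_everything, self.global_refresh = self.global_refresh, False
        if refresh_everything:
            self._refresh_all()
        elif devices:
            self._refresh_ports(devices)


log = []
deferred = DeferredFirewallRefresh(
    refresh_ports=lambda devices: log.append(("ports", sorted(devices))),
    refresh_all=lambda: log.append(("all",)),
)
deferred.on_rules_or_members_updated(["tap1", "tap2"])
deferred.on_rules_or_members_updated(["tap2"])
deferred.process_pending()   # one refresh for both notifications
deferred.on_provider_updated()
deferred.process_pending()   # one refresh for all devices
deferred.process_pending()   # nothing pending, nothing happens
```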
Édouard Thuleau [Sat, 8 Feb 2014 17:28:19 +0000 (18:28 +0100)]
ML2 plugin cannot raise NoResultFound exception
The ML2 plugin cannot raise NoResultFound exception because it does not
use the correct sqlalchemy library:
'from sqlalchemy import exc as ...' instead of 'from sqlalchemy.orm
import exc as ...'
Henry Gessau [Fri, 7 Feb 2014 01:56:00 +0000 (20:56 -0500)]
Prepare for multiple cisco ML2 mech drivers
Code tree reorganization in preparation for ML2 mechanism drivers for
other cisco products. The cisco nexus ML2 mechanism driver and its
test cases need to move down into their own subdirectory.
Rich Curran [Thu, 14 Nov 2013 22:20:07 +0000 (17:20 -0500)]
ML2 Cisco Nexus MD: Create pre/post DB event handlers
Split ML2 cisco nexus event handlers for update and delete
into precommit (called during DB transactions) and postcommit
(called after DB transactions) methods.
Also fixes some unit tests that were incorrectly accessing
context managers without using the "with" statement.
Sascha Peilicke [Tue, 19 Nov 2013 08:57:32 +0000 (09:57 +0100)]
Support building wheels (PEP-427)
Universal is used to identify pure-Python modules (by bdist_wheel). For
these, it is sufficient to build a wheel with _any_ Python ABI version
and publish that to PyPI (by whatever means).
NVP plugin:fix delete sec group when backend is out of sync
If a security group does not exist on the NVP backend, an error
should not be raised on deletion of the security group.
This patch changes the plugin behavior by deleting the record
from the database and just logging that the security group
was not found on the NVP backend.
Sylvain Afchain [Thu, 12 Dec 2013 23:12:29 +0000 (00:12 +0100)]
Allow multiple DNS forwarders for dnsmasq
This patch changes the dnsmasq_server configuration option to a ListOpt
in order to enable users to specify multiple DNS forwarders for each
dnsmasq instance.
Ihar Hrachyshka [Thu, 30 Jan 2014 12:42:29 +0000 (13:42 +0100)]
Fix passing keystone token to neutronclient instance
Neutron client expects token to be passed as token= argument, while
neutron-metadata-agent passes auth_token= instead. This effectively makes the
client authenticate against keystone each time it's instantiated. In
neutron-metadata-agent case, it means 'each time a client sends a metadata
request.'
The issue results in high cpu utilization on keystone side when simultaneously
invoking multiple nova instances with cloud-init.
Fix race condition in network scheduling to dhcp agent
On rare occasions the dhcp agent rpc call get_active_networks_info() can
interleave with network scheduling initiated by a create.port.end notification.
In this case scheduling raises an exception and port creation returns 500.
We need to synchronize on the DhcpNetworkBindings table.
Kevin Benton [Tue, 28 Jan 2014 01:26:12 +0000 (17:26 -0800)]
Enables BigSwitch/Restproxy ML2 VLAN driver
Refactors Bigswitch/Restproxy plugin by separating into
reusable libraries that can be used by the plugin as well
as the ml2 driver to proxy calls to the backend controller.
Enables basic unit tests for the ML2 driver.
Removes deprecated separate unplug/plug operations on ports.
Fawad Khaliq [Wed, 5 Feb 2014 18:15:13 +0000 (10:15 -0800)]
Fix error message typo
* Fix error message typo in "_network_admin_state"
function where "Network Admin State Validation Falied"
should be changed to "Network Admin State Validation Failed"
Change the behaviour of the L3 agent in order to set the IP addresses
for the floating IPs on the external gateway interface after the
relevant NAT rules have been applied.
This will avoid a transitory period in which the floating IP exists
and is reachable but is not yet wired to the actual target.
Maru Newby [Tue, 14 Jan 2014 18:43:22 +0000 (18:43 +0000)]
Add an explicit tox job for functional tests
This change is in support of adding a new jenkins job dedicated
to functional testing. Functional tests will no longer be
run as part of the unit tests.
In the majority of cases a port_update notification pertains to
a change in the properties affecting the port filter, and does
not affect port wiring, i.e. the local vlan tag.
This patch simply avoids doing port wiring/unwiring if the
local vlan tag did not change.
The extra overhead for the ovs-db get operation is offset
by the fact that get commands are generally faster than
set commands, and by avoiding executing the ovs-ofctl operation.
Process port_update notifications in the main agent loop
Instead of processing a port update notification directly in
the RPC call, the actual processing is moved into the main
rpc loop, whereas the RPC call just adds the updated port
identifier to a set of updated ports.
In this way, a port_update notification won't compete with the
main rpc loop, causing long delays in its completion under
heavy load. Also, repeated port_update notifications received
within a single iteration of the main agent loop will be
coalesced and processed only once.
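
The coalescing effect comes from using a set, as in this sketch. The class
and method names are hypothetical and stand in for the agent's RPC
callback and main loop.

```python
class PortUpdateQueue:
    """Record updated port ids in the RPC call, process them in the loop."""

    def __init__(self):
        self.updated_ports = set()

    def port_update(self, port_id):
        # RPC handler: cheap, only remembers that the port changed.
        self.updated_ports.add(port_id)

    def drain(self):
        # Main loop: take everything seen since the last iteration and
        # start a fresh set for notifications arriving from now on.
        pending, self.updated_ports = self.updated_ports, set()
        return pending


queue = PortUpdateQueue()
for port_id in ["p1", "p2", "p1", "p1"]:
    queue.port_update(port_id)
first_batch = queue.drain()
second_batch = queue.drain()
```

Four notifications for two ports result in two ports to process in the
next iteration, and nothing in the one after that.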
This will also avoid the risk of processing notifications out
of order thus ending up with an actual configuration which
differs from the desired one.
This patch still performs L2 wiring for updated ports even if
it is necessary only when the administrative state of a port
changes.
The update_ports method has been renamed to scan_ports as the latter
name appears to be more in line with what the method actually does.
Ralf Haferkamp [Tue, 26 Nov 2013 16:38:44 +0000 (17:38 +0100)]
Reassign IP to vlan interface when deleting a VLAN bridge
When deleting a VLAN bridge that has an IP address assigned to it, don't delete
the VLAN interface, but reassign the IP address back to the underlying VLAN
interface.
Brian Haley [Thu, 30 Jan 2014 20:05:49 +0000 (15:05 -0500)]
Change metadata-agent to have a configurable backlog
The metadata agent currently runs with a default socket backlog
of 128. This isn't enough on a busy network node, even when
spawning multiple worker processes.
This change adds a new "metadata_backlog = XX" option to the ini file
to support a configurable value to help improve performance.
Brian Haley [Thu, 30 Jan 2014 19:39:47 +0000 (14:39 -0500)]
Change metadata-agent to spawn multiple workers
There is currently only one metadata-agent per network node,
which could be handling connections from hundreds or thousands
of metadata-namespace-proxy processes.
This change adds a new "metadata_workers = XX" option to the ini file
to support creating more workers to help improve performance.
Evgeny Fedoruk [Wed, 29 Jan 2014 07:39:01 +0000 (23:39 -0800)]
Extending quota support for neutron LBaaS entities
Note: This change is a continuation of abandoned
change https://review.openstack.org/#/c/58720/
Previous change was abandoned due to rebase problem.
Extending quota mechanism to support neutron
LBaaS entities. Adding quota for vips, pools, members
and health monitors.
This is one of four changes related to the BP.
This one is for neutron project.
Another one is for python-neutronclient package,
another one for tempest,
and another one for horizon/openstack-dashboard project
See blueprint neutron-quota-extension for another two changes.
Tweak nvp/nsx version validation logic for router operations
This patch improves how the nsx/nvp controller version is validated
prior to some router operations. This is done by greatly simplifying
the boolean condition.
Missing unit tests are also added for increased coverage.
Carl Baldwin [Mon, 18 Nov 2013 23:32:19 +0000 (23:32 +0000)]
Simplify ip allocation/recycling to relieve db pressure
I found that multiple calls to delete_port can pile up on the
_recycle_ip operation. This patch simplifies this operation. It
reduces the _recycle_ip operation to a single row delete in the ip
allocations table and doesn't touch the availability table.
To achieve the recycling of ips in a pool, this code runs a more
complex operation of rebuilding the availability table when it is
exhausted. Only one API process will perform this more expensive
operation and others waiting for allocation will immediately benefit.
The amortized cost of this operation is much less than the cumulative
cost of running the more expensive _recycle_ip operation for every
port delete.
IP allocation behaves a bit differently with this patch. Instead of
giving out the first IP available in a pool, the entire pool will be
allocated before wrapping around and recycling ip addresses that have
been released. This is a desirable feature as it puts ip addresses in
a sort of quarantine after they are released. It is easier to
distinguish newly allocated ips from old ones.
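
The allocation behaviour can be modelled in memory as below. This is a
toy sketch under stated assumptions: the class IpPool and its lists stand
in for the availability and allocation tables, and no database or locking
is involved.

```python
class IpPool:
    """Toy model: cheap release, availability rebuilt only when exhausted."""

    def __init__(self, addresses):
        self._all = list(addresses)
        self._available = list(addresses)  # stands in for availability table
        self._allocated = set()            # stands in for allocations table

    def allocate(self):
        if not self._available:
            # Exhausted: rebuild availability from what is not allocated.
            self._available = [ip for ip in self._all
                               if ip not in self._allocated]
            if not self._available:
                raise RuntimeError("no IP addresses available")
        ip = self._available.pop(0)
        self._allocated.add(ip)
        return ip

    def release(self, ip):
        # Recycling only removes the allocation; availability is untouched.
        self._allocated.discard(ip)


pool = IpPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
a = pool.allocate()
pool.release(a)
b = pool.allocate()   # the released address is not handed out again yet
c = pool.allocate()
d = pool.allocate()   # pool wrapped around: released address is reused
```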
Carl Baldwin [Fri, 24 Jan 2014 22:35:48 +0000 (22:35 +0000)]
Reduce severity of log messages in validation methods
I noticed this while reviewing Ic2c87174. When I read through log
files, I don't want to see errors like this that come from validating
bad user input. Info severity is more appropriate.
Stephen Ma [Wed, 29 May 2013 01:52:27 +0000 (18:52 -0700)]
L3 Agent restart causes network outage
When an L3 agent controlling multiple qrouter namespaces
restarts, it destroys all qrouter namespaces even if
some of them are still in use. As a result, network
traffic could be stopped on the VMs that use the
networks associated with these namespaces.
So what is needed is for the L3 agent to preserve those
qrouter namespaces that an L3 agent instance recognizes and to
destroy those it does not know about.
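
The selection rule amounts to a set difference, as in this sketch. The
"qrouter-" prefix, the function name and the sample ids are assumptions
made for illustration.

```python
NS_PREFIX = "qrouter-"  # assumed naming scheme for router namespaces


def namespaces_to_destroy(existing_namespaces, known_router_ids):
    """Return router namespaces that belong to no router the agent knows."""
    known = {NS_PREFIX + router_id for router_id in known_router_ids}
    return sorted(
        ns for ns in existing_namespaces
        # Only router namespaces are candidates; others are left alone.
        if ns.startswith(NS_PREFIX) and ns not in known
    )


existing = ["qrouter-aaa", "qrouter-bbb", "qdhcp-ccc"]
stale = namespaces_to_destroy(existing, known_router_ids=["aaa"])
```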
Maru Newby [Mon, 20 Jan 2014 19:28:03 +0000 (19:28 +0000)]
Minimize the cost of checking for api worker exit
A recent change to oslo allows the configuration of the interval
that ProcessLauncher waits between checks of child exit. The
default interval of 0.01s resulted in the neutron service consuming
unnecessary cpu cycles checking whether api workers had exited (5%
cpu on idle in a VM). This patch extends the interval to 1s to
minimize the cost of the checks.
Aaron Rosen [Mon, 13 Jan 2014 21:57:04 +0000 (13:57 -0800)]
Remove and recreate interface if already exists
If the dhcp-agent machine restarts, when openvswitch comes up it logs the
following warning messages for all tap interfaces that do not exist:
bridge|WARN|could not open network device tap2cf7dbad-9d (No such device)
Once the dhcp-agent starts it recreates the interfaces and re-adds them to the
ovs-bridge. Unfortunately, ovs does not reinitialize the interfaces as they
are already in ovsdb and does not assign them an ofport number.
This situation corrects itself, though, the next time a port is added to the
ovs-bridge, which is why no one has probably noticed this issue till now.
In order to correct this we should first remove interfaces that exist and
then re-add them.
Carl Baldwin [Fri, 17 Jan 2014 19:28:10 +0000 (19:28 +0000)]
Use an independent iptables lock per namespace
Since iptables is independent from namespace to namespace, it makes
sense to use an independent lock per namespace. This improvement is
aimed at improving the parallel performance in the L3 agent.
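
One way to picture a lock per namespace is a registry keyed by namespace
name. This sketch uses threading locks from the standard library; the
registry, the key format and the function name are hypothetical, and the
actual patch may rely on a different locking helper.

```python
import threading

_registry_guard = threading.Lock()
_iptables_locks = {}


def iptables_lock(namespace):
    """Return the lock dedicated to one namespace (None = root namespace)."""
    key = "iptables-" + (namespace or "")  # hypothetical key format
    with _registry_guard:
        # Create the lock on first use, then always hand out the same one.
        return _iptables_locks.setdefault(key, threading.Lock())


lock_a = iptables_lock("qrouter-aaa")
lock_b = iptables_lock("qrouter-bbb")

with lock_a:
    # Holding the lock of one namespace does not block another namespace.
    b_is_free = lock_b.acquire(blocking=False)
    if b_is_free:
        lock_b.release()
```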
In the NVP plugin, metadata processing performs several plugin operations
with the same context (and db session). The first operation might leave
persisted objects in the session instance which then conflict with objects
created in the second operation.