Kevin Benton [Fri, 26 Sep 2014 04:42:39 +0000 (21:42 -0700)]
BSN: Optimistic locking strategy for consistency
Summary:
Adds an optimistic locking strategy for the Big Switch
server manager so multiple Neutron servers wanting to
communicate with the backend do not receive the consistency
hash for use simultaneously.
The bsn-rest-call semaphore is removed because serialization
is now provided by the new locking scheme.
A new DB engine is added because the consistency hashes
need a life-cycle with rollbacks and other DB operations
that cannot impact or be impacted by database operations
happening on the regular Neutron objects.
Unit tests are included for each of the new branches
introduced.
Problem Statement:
Requests to the Big Switch controllers must contain the
consistency hash value received from the previous update.
Otherwise, an inconsistency error will be triggered which
will force a synchronization. Essentially, a new backend
call must be prevented from reading from the consistency
hash table in the DB until the previous call has updated
the table with the hash from the server response.
This can be addressed by a semaphore around the rest_call
function for the single server use case and by a table lock
on the consistency table for multiple Neutron servers.
However, both solutions are inadequate because a single
Neutron server does not scale and a table lock is not
supported by common SQL HA deployments (e.g. Galera).
This issue was previously addressed by deploying servers
in an active-standby configuration. However, that only
prevented the problem for HTTP API calls. All Neutron
servers would respond to RPC messages, some of which would
result in a port update and possible backend call which
would trigger a conflict if it happened at the same time
as a backend call from another server. These unnecessary
syncs are unsustainable as the topology increases beyond
~3k VMs.
Any solution needs to be back-portable to Icehouse so new
database tables, new requirements, etc. are all out of the
question.
Solution:
This patch stores the lock for the consistency hash as a part
of the DB record. The guarantees the database offers around
atomic insertion and constrained atomic updates offer the
primitives necessary to ensure that only one process/thread
can lock the record at once.
The read_for_update method is modified to not return the hash
in the database until an identifier is inserted into the
current record or added as a new record. By using an UPDATE
query with a WHERE clause restricting to the current state,
only one of many concurrent callers to the DB will successfully
update the rows. If a caller sees that it didn't update any
rows, it will start the process over of trying to get the
lock.
If a caller observes that the same ID has the lock for
more than 60 seconds, it will assume the holder has
died and will attempt to take the lock. This is also done
in a concurrency-safe UPDATE call since many other callers
may attempt to do the same thing. If it
fails and the lock was taken by someone else, the process
will start over.
Some pseudo-code resembling the logic:
    read_current_lock
    if no_record:
      insert_lock
      sleep_and_retry if constraint_violation else return
    if current_is_locked and not timer_exceeded:
      sleep_and_retry
    if update_record_with_lock:
      return
    else:
      sleep_and_retry
Russell Bryant [Fri, 7 Nov 2014 15:30:15 +0000 (16:30 +0100)]
Drop usage of RpcProxy from L3PluginApi
Drop the usage of the RpcProxy compatibility class from the
L3PluginApi. The equivalent oslo.messaging APIs are now used
directly instead.
Russell Bryant [Fri, 7 Nov 2014 14:31:15 +0000 (15:31 +0100)]
Fix client side versions in dhcp rpc API
The dhcp rpc API has two versions (1.0 and 1.1). The proper way to use
versioning for this is to only specify '1.1' from the client side when
you require that the remote side implements at least version '1.1' for
the method to work. Otherwise, '1.0' should still be specified. The
previous code specified '1.1' always.
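A small Python sketch of the per-method versioning rule described above. FakeRpcClient, ExampleDhcpApi and the method names old_method and new_method are hypothetical stand-ins so the example runs on its own; they are not the actual DHCP RPC client code. In oslo.messaging the analogous step is RPCClient.prepare(version=...).

```python
class _Prepared:
    """Call context bound to a minimum required server version."""

    def __init__(self, client, version):
        self._client = client
        self._version = version

    def call(self, context, method, **kwargs):
        # Record which version was requested instead of sending anything.
        self._client.calls.append((method, self._version))


class FakeRpcClient:
    """Hypothetical stand-in for an RPC client."""

    def __init__(self):
        self.calls = []

    def prepare(self, version):
        return _Prepared(self, version)


class ExampleDhcpApi:
    def __init__(self, client):
        self.client = client

    def old_method(self, context):
        # Present since 1.0, so do not demand more than 1.0 from the server.
        return self.client.prepare(version="1.0").call(context, "old_method")

    def new_method(self, context):
        # Added in 1.1, so the server must implement at least 1.1.
        return self.client.prepare(version="1.1").call(context, "new_method")
```

Requesting '1.1' everywhere would reject servers that only implement 1.0 even for calls they can handle.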
Russell Bryant [Fri, 7 Nov 2014 14:11:42 +0000 (15:11 +0100)]
Drop usage of RpcProxy from DhcpPluginApi
Drop the usage of the RpcProxy compatibility class from the
DhcpPluginApi RPC client class. The implementation has been updated
to use the appropriate APIs from oslo.messaging directly.
Gary Kotton [Sat, 17 May 2014 06:48:21 +0000 (23:48 -0700)]
Update i18n translation for neutron.agents log msg's
Don't translate debug level logs and enforce log hints
Our translation policy
(https://wiki.openstack.org/wiki/LoggingStandards#Log_Translation) calls
for not translating debug level logs. This is to help prioritize log
translation. Furthermore translation has a performance overhead, even if
the log isn't used (since neutron doesn't support lazy translation yet).
NOTE: this is done on a directory by directory basis to ensure that we
do not have too many conflicts and rebases.
Add a local hacking rule to enforce this.
This patch set enforces the directory neutron/agents
Mark McClain [Thu, 12 Jun 2014 01:23:53 +0000 (21:23 -0400)]
enable F811 check for flake8
This change incorporates the following cleanups that do not change logic:
- Removes the shadowed unused imports by using the proper oslo.config import
mechanism
- duplicate unit tests have been removed
- duplicate unit test names have been corrected to reflect true test
nature
Cleanup and refactor methods in unit/test_security_groups_rpc
We had strings repeating all along the code which already were
in constant form, those have been refactored. Also global configuration
changes are now handled by functions to enhance code readability.
Elena Ezhova [Thu, 18 Sep 2014 07:53:24 +0000 (11:53 +0400)]
Updated policy module from oslo-incubator
Common policy has not been synced with oslo-incubator for a
long time and is seriously outdated.
This change pulls in fresh code from oslo-incubator which
introduces the Enforcer class to replace the old check function.
Rewrite neutron.policy using naming conventions and approach
that was set in Nova and amend related unit tests.
Remove neutron.common.exceptions.PolicyNotAuthorized and switch
to neutron.openstack.common.policy.PolicyNotAuthorized.
Drop Neutron specific policy_file option since now it is defined
in oslo-incubator policy module.
Change log:
4ca5091 Fixes nits in module policy
262fc82 Correct default rule name for policy.Enforcer
9e8b9f6 Minor fixes in policy module
6c706c5 Delete graduated serialization files
5d40e14 Remove code that moved to oslo.i18n
aebb58f Fix typo to show correct log message
bb410d9 Use MultiStrOpt for policy_dirs
33f44bf Add support for policy configration directories
2b966f9 Fix deletion of cached file for policy enforcer
238e601 Make policy debug logging less verbose
fe3389e Improve help strings
15722f1 Adds a flag to determine whether to reload the rules in policy
5d1f15a Documenting policy.json syntax
fcf517d Update oslo log messages with translation domains
e038d89 Fix policy tests for parallel testing
0da5de6 Allow policy.json resource vs constant check
e4b2334 Replaces use of urlutils with six in policy module
8b2b0b7 Use hacking import_exceptions for gettextutils._
0d8f18b Use urlutils functions instead of urllib/urllib2
12bcdb7 Remove vim header
9ef9fec Use six.string_type instead of basestring
4bfb7a2 Apply six for metaclass
1538c80 ConfigFileNotFoundError with proper argument
33533b0 Keystone user can't perform revoke_token
64bb5e2 Fix wrong argument in openstack common policy
b7edc99 Fix missing argument bug in oslo common policy
3626b6d Fix policy default_rule issue
7bf8ee9 Allow use of hacking 0.6.0 and enable new checks
e4ac367 Fix missing argument bug in oslo common policy
1a2df89 Enable H302 hacking check
7119e29 Enable hacking H404 test.
6d27681 Enable H306 hacking check.
1091b4f Reduce duplicated code related to policies
Michael Smith [Mon, 10 Nov 2014 23:49:14 +0000 (15:49 -0800)]
Fix for FIPs duplicated across hosts for DVR
For DVR, FIPs should be hosted on the single node
which hosts the VM assigned with the fixed_ip of the FIP.
The l3_agent should only take action on the correct FIP per
host by filtering the FIPs based on the 'host' value
of the FIP.
A recent refactor on the l3_agent moved the host filtering logic
from process_router_floating_ip_addresses() to
_get_external_device_interface_name(). The local floating_ips var
was not altered as it was before the refactor.
This resulted in network disruption across multiple hosts
since more than one namespace contained the FIP. This problem
would only be seen in a multi-host environment where the same
router hosting FIPs was present on more than one node.
The fix is to restore the host filtering logic by adding a
call to get_floating_ips(). In addition, the unit test
test_process_router_dist_floating_ip_add() was modified to
pass two FIPs instead of one. One FIP matches the host
of the agent, one does not. Only one should be processed,
not two.
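The host filtering described above can be sketched in a few lines of Python. The function name floating_ips_for_host, the host names and the addresses are hypothetical and only illustrate the idea; this is not the l3_agent's actual get_floating_ips() code.

```python
def floating_ips_for_host(floating_ips, host):
    """Keep only the floating IPs whose 'host' value matches this agent's host."""
    return [fip for fip in floating_ips if fip.get("host") == host]


# Two FIPs on the same router: one belongs to this agent's host, one does not.
fips = [
    {"floating_ip_address": "10.0.0.5", "host": "compute-1"},
    {"floating_ip_address": "10.0.0.6", "host": "compute-2"},
]
local_fips = floating_ips_for_host(fips, "compute-1")
```

Only the first FIP ends up in local_fips, which mirrors the updated unit test: two FIPs are passed in and only one is processed.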
Russell Bryant [Thu, 6 Nov 2014 23:21:10 +0000 (00:21 +0100)]
Remove neutron.common.rpc.RPCException
Remove RPCException, which was just mapped directly to
oslo.messaging.MessagingException for the purposes of minimizing the
impact to the code base when moving from openstack.common.rpc to
oslo.messaging.
Paul Michali [Wed, 24 Sep 2014 22:22:04 +0000 (18:22 -0400)]
Cisco VPNaaS and L3 router plugin integration
This completes the blueprint effort to integrate the Cisco VPNaaS
drivers with the Cisco L3 router plugin for a VPN solution that
uses an in-band Cisco CSR (versus out-of-band, as is currently done).
The VPNaaS service driver is modified to no longer read router
configuration from an INI file, but instead, will query the L3
router plugin for this information. The config loading logic is
removed. The router information is transformed and passed to the
device driver for use by the REST client.
The L3 router plugin now provides two methods to provide the needed
router and hosting device info. In addition, the template and
snippets used to provision the Cisco CSR are modified to have the
needed VPN settings and to change the VRF configuration to the new
style to support VPN REST API calls with VRF info.
The VPNaaS device driver was modified to provide the VRF name as
part of the URI for several VPN REST API calls, as needed for this
solution. The device driver uses the additional VRF and interface
information that is provided by the service driver.
The unit tests were updated to reflect the usage of VRF names, and
the new logic for checking router information.
lzklibj [Tue, 21 Oct 2014 06:55:53 +0000 (23:55 -0700)]
fix event_send for re-assign floating ip
Neutron can associate a floating ip with a new port
without first disassociating it from the original instance
port. In this situation a network changed event is sent only
for the new instance port, and that event object contains
the new instance's id.
In this case nova will update the new instance's info
but not the original one's in nova's database table
instance_info_caches, because nova only gets the new
instance's id from the above event. So in table
instance_info_caches, both the original instance and the new
instance will have the same floating ip in their records.
As a result, in most situations, running "nova list" after
re-assigning a floating ip will return incorrect info:
multiple instances have the same floating ip, which may
confuse users.
Nova will sync data in table instance_info_caches, but it
may take dozens of seconds.
The newly added code also sends a network changed event for
the original instance, which makes nova update the
instance_info_caches table in a few seconds.
Sachi King [Sun, 2 Nov 2014 13:35:51 +0000 (00:35 +1100)]
Fix L3 HA network creation to allow user to create router
Update HA Network creation to use an admin context to allow Neutron
to create the tenant-less network required for the HA router when
it does not yet exist and is being created by a non-admin user.
Neutron creates these resources without a tenant so users cannot see
or modify the HA network, ports, etc. Port creation and association
already use elevated admin contexts to allow their function when
a user attempts to create an HA L3 router.
Samer Deeb [Tue, 4 Nov 2014 11:58:19 +0000 (13:58 +0200)]
SRIOV: Fix Wrong Product ID for Intel NIC example
Some examples and default values of the product ID for the Intel NIC contain
the wrong product ID, which belongs to the PF product;
it should be replaced with the product ID of the VF.
Since python2.6, python has a proper ternary construct "A if PRED else
B". The older idiom "PRED and A or B" has a hidden trap - when A is
itself false, the result is (unexpectedly) B.
This change removes all cases of the older construct found using a
trivial git grep " and .* or " - except one case in oslo common
code (fixed in oslo upstream).
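A short Python illustration of the trap described above. The helper names old_idiom and ternary are made up for the example.

```python
def old_idiom(pred, a, b):
    # "PRED and A or B": falls through to B whenever A is falsy.
    return pred and a or b


def ternary(pred, a, b):
    # "A if PRED else B": always returns A when PRED is true.
    return a if pred else b


print(old_idiom(True, 0, 1))  # prints 1, although A (0) was expected
print(ternary(True, 0, 1))    # prints 0
```

The two forms agree whenever A is truthy, which is why the bug tends to stay hidden until A happens to be 0, an empty string, an empty list or None.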
Edgar Magana [Thu, 29 Aug 2013 23:28:26 +0000 (16:28 -0700)]
Include call to delete_subnet from delete_network at DB level
Removes an extra lock in bsn plugin that causes a deadlock
when delete_subnet is invoked from delete_network, agreed
with kevinbenton to remove it
Modifies a unit test to cover this change per reviewer request
Co-author amirosh
Elena Ezhova [Wed, 5 Nov 2014 10:01:51 +0000 (13:01 +0300)]
Replace "nova" entries in iptables_manager with "neutron"
In iptables_manager docstrings there are still some references to
nova left from nova/network/linux_net.py.
Remove these references and update the docstrings.
Brian Haley [Thu, 25 Sep 2014 01:45:06 +0000 (21:45 -0400)]
Make L2 DVR Agent start successfully without an active neutron server
If the L2 Agent is started before the neutron controller
is available, it will fail to obtain its unique DVR MAC
address and fall back to operating in non-DVR mode
permanently.
This fix does two things:
1. Makes the L2 Agent attempt to retry obtaining a DVR MAC
address up to five times on initialization, which should be
enough time for RPC to be successful. On failure, it will
fall back to non-DVR mode, ensuring that basic switching
continues to be functional.
2. Correctly obtains the current operating mode of the
L2 Agent in _report_state(), instead of only reporting
the configured state. This operating mode is carried
in 'in_distributed_mode' attribute of agent state, and
is separate from the existing enable_distributed_routing
static config that is already sent.
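A rough Python sketch of the retry-then-fall-back behaviour from item 1 and the reported operating mode from item 2. The function get_dvr_mac_with_retry, the fetch_mac callable, the delay value and the broad exception handling are assumptions for illustration; the agent itself would catch its specific RPC errors.

```python
import time


def get_dvr_mac_with_retry(fetch_mac, attempts=5, delay=1.0, sleep=time.sleep):
    """Try fetch_mac up to `attempts` times.

    Returns (mac, in_distributed_mode). On repeated failure the caller
    falls back to non-DVR mode, signalled by (None, False).
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch_mac(), True
        except Exception:  # assumption: a real agent would catch RPC errors only
            if attempt < attempts:
                sleep(delay)
    return None, False


def build_agent_state(enable_distributed_routing, in_distributed_mode):
    # Report both the static config and the mode actually in effect.
    return {
        "enable_distributed_routing": enable_distributed_routing,
        "in_distributed_mode": in_distributed_mode,
    }
```

Keeping the configured value and the operating mode as separate keys lets the server see when an agent configured for DVR is actually running in fallback mode.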