John Perkins [Mon, 6 Oct 2014 21:24:57 +0000 (16:24 -0500)]
Fix hostname regex pattern
Current hostname_pattern regex complexity grows exponentially
when given a string of just digits, which can be exploited to
cause neutron-server to freeze.
lzklibj [Tue, 21 Oct 2014 06:55:53 +0000 (23:55 -0700)]
fix event_send for re-assign floating ip
Neutron can associate a floating ip to a new port
without disassociate from original instance port.
This situation will send network changed event only
for new instance port, and that event object contains
the new instance's id.
In this case nova will update new instance's info
but not original one's in nova's database table
instance_info_caches. For nova can get new instance's
id from the above event. So in table instance_info_caches,
both original instance and new instance will have the
same floating ip in their records. And this make it
possible that, in most situation, after your re-assign
floating ip, run "nova list" will return incorrect info,
multiple instances have a same floating ip, and this may
confuse users.
Nova will sync data in table instance_info_caches, but it
may take dozens of seconds.
The new added code will send network changed event for the
original instance, and this will make nova update instance_
_info_caches table in a few seconds.
Numan Siddique [Sat, 11 Oct 2014 12:08:05 +0000 (17:38 +0530)]
Fix KeyError in dhcp_rpc when plugin.port_update raise exception
KeyError exception is seen because of following reasons
* DhcpRpcCallback._port_action() is called by two functions
- DhcpRpcCallback.create_dchp_port()
- DhcpRpcCallback.update_dhcp_port()
* When create_dhcp_port() function calls _port_action(), the
function argument 'port' will have the body as
{'port': {'network_id': foo_network_id, 'fixed_ips': [..] ...}
* When update_dhcp_port() function calls _port_action(), the
function argument 'port' will have the body as
{'id': port_id, 'port': {{'port': {'network_id': foo_network_id,
'fixed_ips': [..] ...}}
* If an exception occurs when _port_action() calls plugin.create_port(),
network id is accessed as
net_id = port['port']['network_id']
* If an exception occurs when _port_action() calls plugin.update_port(),
network id is accessed as
net_id = port['port']['network_id']
which is causing the KeyError. network_id should have been accessed as
net_id = port['port']['port']['network_id']
This patch fixes the issue by making the _port_action() take the
same port body. update_dhcp_port() insteading of passing the port_id
and port information in a single argument, it now adds port_id
in the port body itself.
Sachi King [Sun, 2 Nov 2014 13:35:51 +0000 (00:35 +1100)]
Fix L3 HA network creation to allow user to create router
Update HA Network creation to use an admin context to allow Neutron
to create the tenant-less network required for the HA router when
it does not yet exist and is being created by a non-admin user.
Neutron creates these resources without a tenant so users cannot see
or modify the HA network, ports, etc. Port creation and association
already use elivated admin contexts to allow their function when
an user attempts to create a HA L3 router.
This commit stops ignoring 400 (Bad Request) HTTP codes returned by ODL
in create and update operations. It also modifies sendjson() because it
doesn't ignore any HTTP error code now.
While plugging vif, VIFDriver in Nova follows "ovs_hybrid_plug" and
"port_filter" in "binding:vif_detail" which is passed from Neutron, but
those are always true. This patch make ML2 OVS mech driver set those
param depends on enable_security_group flag. It enables users to avoid
ovs_hybrid plugging.
This patch also fixes the same issue in the following plugins/drivers:
* NEC Plugin
* BigSwitch Plugin
* Ryu Plugin
* ML2 Plugin - OFAgent Mech Driver
Kevin Benton [Fri, 26 Sep 2014 16:40:44 +0000 (09:40 -0700)]
Batch ports from security groups RPC handler
The security groups RPC handler calls get_port_from_device
individually for each device in a list it receives. Each
one of these results in a separate SQL query for the security
groups and port details. This becomes very inefficient as the
number of devices on a single node increases.
This patch adds logic to the RPC handler to see if the core
plugin has a method to lookup all of the device IDs at once.
If so, it uses that method, otherwise it continues as normal.
The ML2 plugin is modified to include the batch function, which
uses one SQL query regardless of the number of devices.
Kevin Benton [Wed, 29 Oct 2014 04:39:04 +0000 (21:39 -0700)]
Big Switch: Fix SSL version on get_server_cert
The ssl.get_server_certificate method uses SSLv3 by default.
Support for SSLv3 was dropped on the backend controller in
response to the POODLE vulnerability. This patch fixes it
to use TLSv1 like the wrap_socket method.
NSX: allow multiple networks with same vlan on different phy_net
Previously, the NSX plugin prevented one from creating multiple networks on
the same vlan even if they were being created on different physical_networks.
This patch corrects this issue and allows this to now occur.
Kevin Benton [Thu, 16 Oct 2014 08:49:19 +0000 (01:49 -0700)]
_update_router_db: don't hold open transactions
This patch prevents the L3 _update_router_db method from
starting a transaction before calling the gateway interface
removal functions. With these port changes now occuring
outside of the L3 DB transaction, a failure to update the
router DB information will not rollback the port deletion
operation.
The 'VPN in use' check had to be moved inside of the DB deletion
transaction now that there isn't an enclosing transaction to undo
the delete when an 'in use' error is raised.
===Details===
The router update db method starts a transaction and calls
the gateway update method with the transaction held open.
This becomes a problem when the update results in an
interface removal which uses a port table lock.
Because the delete_port caller is still holding open a
transaction, other sessions are blocked from getting an
SQL lock on the same tables when delete_port starts
performing RPC notifications, external controller calls,
etc. During those external calls, eventlet will
yield and another thread may try to get a lock on the
port table, causing the infamous mysql/eventlet deadlock.
This separation of L2/L3 transactions is similiar to change
I3ae7bb269df9b9dcef94f48f13f1bde1e4106a80 in nature. Even
though there is a loss in the atomic behavior of the interface
removal operation, it was arguably incorrect to begin with.
The restoration of port DB records during a rollback after some
other failure doesn't undo the backend operations (e.g. REST calls)
that happened during the original deletion. So, having a delete
rollback without corresponding 'create_port' calls to the backend
causes a loss in consistency.
Kevin Benton [Wed, 24 Sep 2014 12:23:32 +0000 (05:23 -0700)]
Improve performance of security group DB query
The _select_ips_for_remote_group method was joining the
IP allocation, port, allowed address pair, and security group tables
together in a single query. Additionally, it was loading all of
the port columns and using none of them. This resulted in a
very expensive query with no benefit.
This patch eliminates the unnecessary use of the port table by joining
the IP allocation table directly to the security groups and allowed
address pairs tables. In local testing of the method, this sped it up
by an order of magnitude.
Within ovs agent daemon loop, prepare_devices_filter will impose heavy workloads
to neutron server in order to retrieve the security groups message to apply
firewall rules. If agent is configured to use Noopfirewall driver or security
groups are disabled, there is no need for loading the rules from server and
refreshing the firewalls. This will reduce the number of db calls and improve
performance for neutron server in this case.
Kevin Benton [Tue, 7 Oct 2014 11:34:41 +0000 (04:34 -0700)]
Big Switch: Don't clear hash before sync
This patch removes the step of clearing the consistency
hash from the DB before a topology sync. This will ensure
that inconsistency will be detected if the topology sync
fails.
This logic was originally there to make sure the hash header
was not present on the topology sync call to the backend.
However, the hash header is ignored by the backend in a sync
call so it wasn't necessary.
With l2pop enabled, race exists in delete_port_postcommit
when both create/update_port and delete_port deal with
different ports on the same host, where such ports are
either the first (or) last on same network for that host.
This race happens outside the DB locking zones in
the respective methods of ML2 plugin.
To fix this, we have moved determination of
fdb_entries back to delete_port_postcommit and removed
delete_port_precommit altogether from l2pop mechanism
driver. In order to accomodate dvr interfaces, we
are storing and re-using the mechanism-driver context
which hold dvr-port-binding information while
invoking delete_port_postcommit. We loop through
dvr interface bindings invoking delete_port_postcommit
similar to delete_port_precommit.
Xu Han Peng [Fri, 20 Jun 2014 06:59:53 +0000 (14:59 +0800)]
Use EUI64 for IPv6 SLAAC when subnet is specified
This commit uses EUI64 for SLAAC and stateless IPv6 address
when subnet id in fixed_ip is specified.
After this patch, all the ports created on a subnet which has
ipv6_address_mod=slaac or ipv6_address_mod=dhcpv6-stateless
will use EUI64 as the address.
This patch also checks if fixed IP address is specified
for a IPv6 subnet with address mode slaac or dhcpv6-stateless
during creating or updating a port. If yes, raise InvalidInput
error to stop the port creation or update.
Remove unit test test_generated_duplicate_ip_ipv6 because
fixed_ip should not be specified for a slaac subnet.
Mark McClain [Wed, 8 Oct 2014 18:49:20 +0000 (18:49 +0000)]
Add database relationship between router and ports
Add an explicit schema relationship between a router and its ports. This
change ensures referential integrity among the entities and prevents orphaned
ports.
Henry Gessau [Wed, 8 Oct 2014 00:38:38 +0000 (20:38 -0400)]
Disable PUT for IPv6 subnet attributes
In Juno we are not ready for allowing the IPv6 attributes on a subnet
to be updated after the subnet is created, because:
- The implementation for supporting updates is incomplete.
- Perceived lack of usefulness, no good use cases known yet.
- Allowing updates causes more complexity in the code.
- Have not tested that radvd, dhcp, etc. behave OK after update.
Therefore, for now, we set 'allow_put' to False for the two IPv6
attributes, ipv6_ra_mode and ipv6_address_mode. This prevents the
modes from being updated via the PUT:subnets API.
Assaf Muller [Tue, 7 Oct 2014 19:45:41 +0000 (22:45 +0300)]
Forbid update of HA property of routers
While the HA property is update-able, and resulting router-get
invocations suggest that the router is HA, the migration
itself fails on the agent. This is deceiving and confusing
and should be blocked until the migration itself is fixed
in a future patch.
Eugene Nikanorov [Sun, 24 Aug 2014 20:59:02 +0000 (00:59 +0400)]
Raise exception if ipv6 prefix is inappropriate for address mode
Address prefix to use with slaac and stateless ipv6 address modes
should be equal to 64 in order to work properly.
The patch adds corresponding validation and fixes unit tests
accordingly.
On systems that start both neutron-server and neutron-l3-agent together,
there is a chance that the first call to neutron will timeout. Retry upto
4 more times to avoid the l3 agent exiting on startup.
This should make the l3 agent a little more robust on startup but still
not ideal, ideally it wouldn't exit and retry periodically.
Ed Bak [Mon, 29 Sep 2014 20:15:52 +0000 (14:15 -0600)]
Don't fail when trying to unbind a router
If a router is already unbound from an l3 agent, don't fail. Log
the condition and go on. This is harmless since it can happen
due to a delete race condition between multiple neutron-server
processes. One delete request can determine that it needs to
unbind the router. A second process may also determine that it
needs to unbind the router. The exception thrown will result
in a port delete failure and cause nova to mark a deleted instance
as ERROR.
Mark McClain [Wed, 24 Sep 2014 04:00:54 +0000 (04:00 +0000)]
remove openvswitch plugin
This changeset removes the openvswitch plugin, but retains the agent for ML2
The database models were not removed since operators will need to migrate the
data.
Fix pid file location to avoid I->J changes that break metadata
Changes in commit 7f8ae630b87392193974dd9cb198c1165cdec93b moved
pid files handled by agent/linux/external_process.py from
$state_path/external/<uuid>.pid to $state_path/external/<uuid>/pid
that breaks the neutron-ns-metadata-proxy respawn after upgrades
becase the l3 or dhcp agent can't find the old pid file so
they try to start a new neutron-ns-metadata-proxy which won't
succeed, because the old one is holding the port already.
Mark McClain [Wed, 24 Sep 2014 01:50:06 +0000 (01:50 +0000)]
remove linuxbridge plugin
This changeset removes the linuxbridge plugin, but retains the agent for ML2.
The database models were not removed since operators will need to migrate the
data.
Additionally, the ml2 migration script was altered to support Juno. For
testing, a user must either run the migration against the icehouse
scheme or run the update, manually change alembic_version to juno and
then run the migration script. Once the juno migration is added, this
manually step will not be required.
Kevin Benton [Tue, 30 Sep 2014 03:21:23 +0000 (20:21 -0700)]
ML2: move L3 cleanup out of network transaction
Move _process_l3_delete out of the delete_network
transaction to eliminate the semaphore deadlock that
occurs when it tries to delete the ports associated
with existing floating IPs.
It makes more sense to live outside of the transaction
anyway because the operations it performs cannot be
rolled back only in the database if the L3 plugin makes
external calls for floating IP creation/deletion.
e.g. if delete_floatingip is successful, it may have
deleted external resources and restoring the DB records
would make things inconsistent.
If a failure to delete the network does occur, any cleanup
done by _process_l3_delete will not be reversed.
John Kasperski [Thu, 25 Sep 2014 15:38:45 +0000 (10:38 -0500)]
Update migration scripts to support DB2
Three of the migration scripts are causing failures with DB2.
- DB2 doesn't support nullable column in primary key
- Hard coded SQL statements which use False/True as Boolean arguments
are not compatible with DB2. In DB2, Boolean columns are created as
small integer with a constraint to allow only 0 & 1.
- Hardcoded update rows from other table sql is not compatible with DB2
Note: There are several other unrelated unit tests that also break with a
randomized PYTHONHASHSEED, but they are not addressed here. They will be
addressed in separate patches.
Kevin Benton [Sat, 20 Sep 2014 07:17:58 +0000 (00:17 -0700)]
Fix broken port query in Extraroute test case
One of the queries in an extra route test case tries
to filter based on the port owner, but the _list_ports
method it calls doesn't take a device_owner parameter.
This can cause failures if a DHCP port is created on
the same subnet.
The patch being reverted here addresses an issue that can no longer be
reproduced, in that under no circumstances, I can make the FIP lie around
before deleting a router (which can only be done after all FIP have been
disassociated or released).
Unless we have more clarity as to what the initial commit was really meant
to fix, there is a strong case for reverting this patch at this point.
Michael Smith [Wed, 24 Sep 2014 17:20:46 +0000 (10:20 -0700)]
fix dvr snat bindings for external-gw-clear
When router_gateway_clear happens, the
schedule_router calls the unbind_snat_servicenode
in the plugin. This will clear the agent binding
from the binding table. But the l3-agent was
expecting the ex_gw_port binding to be present.
The agent needs to check its cache of the
router['gw_host_port'] value now.