Sam Betts [Wed, 29 Apr 2015 15:15:35 +0000 (16:15 +0100)]
Ensure mocks for lla allocator _write in test_agent
The test test_create_dvr_fip_interfaces_for_restart_l3agent_case was
causing a file named fip-linklocal-networks to be created whenever the
tests were run. This patch ensures that the correct part of the
LinkLocalAllocator is patched to prevent this in the test case.
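A minimal sketch of the patching idea, assuming a simplified stand-in
for LinkLocalAllocator (the class body, the allocated address and the
helper run_patched_allocation are illustrative, not the Neutron code).
Patching _write on the class keeps any instance from touching the disk:

```python
import os
import tempfile
from unittest import mock


class LinkLocalAllocator:
    """Stand-in allocator; only the file-writing behaviour matters here."""

    def __init__(self, state_file):
        self.state_file = state_file
        self.allocations = {}

    def allocate(self, key):
        # The address is an arbitrary illustrative value.
        self.allocations[key] = "169.254.31.28/31"
        self._write()
        return self.allocations[key]

    def _write(self):
        with open(self.state_file, "w") as f:
            for key, value in self.allocations.items():
                f.write("%s,%s\n" % (key, value))


def run_patched_allocation():
    state_file = os.path.join(tempfile.mkdtemp(), "fip-linklocal-networks")
    # Patch on the class so every instance created in the test is covered.
    with mock.patch.object(LinkLocalAllocator, "_write") as write_mock:
        LinkLocalAllocator(state_file).allocate("router-1")
    return write_mock.call_count, os.path.exists(state_file)
```

The helper returns how often the mocked _write was called and whether
the state file ended up on disk.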
lijianlj [Thu, 29 Jan 2015 06:41:20 +0000 (14:41 +0800)]
Add icmpv6 to sg_supported_protocols
Support using icmpv6 (protocol number 58) in the protocol option when creating
a security group rule. In this case, port_range_min/port_range_max represent
the icmpv6 type/code, and you can use only port_range_min to specify just one type.
e.g.: neutron security-group-rule-create --direction ingress \
--ethertype ipv6 --protocol icmpv6 --port-range-min 134 SECURITY_GROUP
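As a rough illustration of how the two port range fields are read for
icmpv6, assuming a rule is a plain dict (the helper name and the dict
shape are assumptions made for this sketch):

```python
ICMPV6_PROTOCOL_NUMBER = 58


def icmpv6_type_code(rule):
    """Return (type, code) for an icmpv6 rule; None means 'not specified'."""
    if rule.get("protocol") not in ("icmpv6", "58", ICMPV6_PROTOCOL_NUMBER):
        raise ValueError("not an icmpv6 rule")
    # port_range_min carries the ICMPv6 type, port_range_max the code.
    return rule.get("port_range_min"), rule.get("port_range_max")


# Type 134 is a Router Advertisement; no code is given, so only the type
# is matched.
ra_rule = {"protocol": "icmpv6", "ethertype": "IPv6", "port_range_min": 134}
```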
Gal Sagie [Mon, 30 Mar 2015 07:40:36 +0000 (10:40 +0300)]
Suppress exception when trying to remove non existing device in SNAT redirect
The L3 service plugin first has the L2 OVS agent remove the router interface,
which deletes this port from OVS, and then the service plugin calls
to remove the router interface from the L3 agent.
Catch the exception thrown when deleting the gateway; if it is due to the
device no longer existing, ignore the exception.
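The pattern can be sketched as follows, assuming hypothetical callables
for deleting the gateway and for checking whether the device exists
(DeviceError and remove_snat_gateway are illustrative names, not the
agent's API):

```python
class DeviceError(RuntimeError):
    """Hypothetical error raised when a gateway route cannot be deleted."""


def remove_snat_gateway(delete_gateway, device_exists):
    """Delete the gateway, tolerating a device that was already removed."""
    try:
        delete_gateway()
    except DeviceError:
        if device_exists():
            # The device is still there, so the failure is a real problem.
            raise
        # The device is already gone (e.g. removed via OVS): ignore.
        return False
    return True
```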
Currently an HA router can be successfully created even when
there are not enough active l3 agents. The current code only checks
existing l3 agents but does not check whether an agent is already
down.
This patch fixes this problem by checking only active l3 agents
when getting the number of agents for scheduling HA router.
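A minimal sketch of counting only active agents, assuming agents are
dicts with agent_type, admin_state_up and alive keys and that the
required minimum is passed in (both are assumptions for illustration):

```python
def active_l3_agents(agents):
    """Keep only L3 agents that are administratively up and alive."""
    return [agent for agent in agents
            if agent["agent_type"] == "L3 agent"
            and agent["admin_state_up"]
            and agent["alive"]]


def can_schedule_ha_router(agents, min_agents=2):
    # Agents that exist but are down must not count towards the minimum.
    return len(active_l3_agents(agents)) >= min_agents
```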
lzklibj [Sat, 21 Mar 2015 16:58:15 +0000 (09:58 -0700)]
fix l3-agent restart with last runtime fip for dvr
In a DVR-enabled environment, after we associate a floating
IP with a VM and then restart the L3 agent on the same compute
node, the L3 agent fails to create rtr_fip_subnet for
router_info. The previously associated floating IP still works, but
floating IPs newly associated with VMs related to the same router
on this L3 agent fail to be configured and do not work. This
patch fixes this.
The method create_dvr_fip_interfaces in dvr_router.py
invokes fip_ns.create_rtr_2_fip_link, and the latter
creates rtr_fip_subnet. Since VMs related to the same router
share the same rtr_fip_subnet, the processing here should
run only once for those VMs, when rtr_fip_subnet is created.
The current code checks dist_fip_count and then decides whether to
invoke fip_ns.create_rtr_2_fip_link.
dist_fip_count should be zero if the VMs related to a router have
never been associated with any floating IPs before. But if a
router has floating IPs associated with its related VMs, after
it is restarted, dist_fip_count will be non-zero, and this is
the case this patch tries to fix. In the case where rtr_fip_subnet
has already been created, both dist_fip_count and is_first will be
false, and fip_ns.create_rtr_2_fip_link no longer needs
to be invoked.
Cedric Brandily [Thu, 5 Mar 2015 21:43:09 +0000 (21:43 +0000)]
Replace BaseLinuxTestCase by BaseSudoTestCase
BaseLinuxTestCase provides 2 methods which are used once/three time(s),
this change inlines these methods and removes BaseLinuxTestCase and
replaces it by BaseSudoTestCase.
This change removes a useless cleanup in RecursivePermDirFixture:
previously RecursivePermDirFixture reverts permission changes on
directories, but the cleanup is useless as directories are provided
by TempDir.
Kevin Benton [Fri, 24 Apr 2015 13:52:21 +0000 (06:52 -0700)]
Add missing interface to populate subnets method
Change Ib46f685d72eb61ecbaa2869e28fb173cd6d49552 introduced
an optimization to defer the lookup of interface subnet info
until all of the router interfaces were collected. However,
it didn't add the DVR SNAT interface to the list of interfaces
to populate subnet info, so it broke DVR.
This patch corrects the behavior by adding the DVR SNAT interface
to the list of ports that need subnet info populated.
Elena Ezhova [Tue, 7 Apr 2015 11:54:45 +0000 (14:54 +0300)]
Refactor socket ssl wrapping
Move socket wrapping into a separate method in order to separate
its logic from the other actions performed in _get_socket. Now, ssl wrapping
is applied to the socket returned by the _get_socket method.
Additionally, checks for ssl config options are now performed during
init rather than each time wrap_socket is called.
Kevin Benton [Fri, 24 Apr 2015 07:35:31 +0000 (00:35 -0700)]
Don't resync on DHCP agent setup failure
There are various cases where the DHCP agent will try to
create a DHCP port for a network and there will be a failure.
This has primarily been caused by a lack of available IP addresses
in the allocation pool. Trying to fix all availability corner cases
on the server side will be very difficult due to race conditions between
multiple ports being created, the dhcp_agents_per_network parameter, etc.
This patch just stops the resync attempt on the agent side if a failure
is caused by an IP address generation problem. Future updates to the subnet
will cause another attempt so if the tenant does fix the issue they will
get DHCP service.
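The agent-side behaviour can be sketched roughly like this, assuming a
hypothetical exception type for the IP generation problem and a set that
collects networks to resync (all names here are illustrative):

```python
class IpAddressGenerationFailure(Exception):
    """Hypothetical stand-in for 'no IP address could be generated'."""


def configure_network(setup_dhcp, network_id, needs_resync):
    """Try to set up DHCP; schedule a resync only for unexpected failures."""
    try:
        setup_dhcp(network_id)
    except IpAddressGenerationFailure:
        # Retrying cannot help until the subnet changes, so no resync.
        return False
    except Exception:
        needs_resync.add(network_id)
        return False
    return True
```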
sridhargaddam [Mon, 8 Dec 2014 16:11:38 +0000 (16:11 +0000)]
Neutron to Drop Router Advts from VM ports
As part of the Spoofing filter chain, Neutron drops all outbound
traffic where the MAC/IP does not match the IP address assigned
to the VM ports (including allowed_address_pairs). Along with this,
we also drop traffic associated with a dhcp[v6] server (i.e., we do
not allow a VM to run a dhcp[v6] server). Currently we do not
have any rules to drop Router Advts from VM ports. This can create
issues in the network, as other devices in the network may not have
any protection against such Router Advts.
Even if we allow RAs from the VM ports, because of the Anti-Spoofing
rules that are applied, a VM cannot act as an IPv6 router (i.e., it
cannot forward IPv6 traffic). So there is no point in allowing Router
Advts from VMs assuming that it would be useful in Service VM use-cases.
In order to properly implement IPv6 router as a Service VM, one needs
to use the port_security_extension [1] which allows us to disable
security group rules/anti-spoofing filters on the VM ports.
The test_ha_router_failover tests were not being unmocked. This
is because the same object was being mocked twice, but unmocked
once. The mock.patch.stopall call in the tests base class was rewinding
the value of the object from the second mock to the first mock.
Follow up tests in the same worker were using namespace
names defined via the first mock in the failover test.
This routine in policy.py used to have a backward compatibility
check to ensure proper behaviour even when the policy.json file
did not have a specific 'context_is_admin' policy.
However, this backward compatibility check does not work. It
appears indeed that it has been broken for several release cycles;
it is also possible that actually it never worked.
When the 'context_is_admin' policy is not in the policy.json file
the enforcer simply ends up evaluating whatever is the default
policy configured there.
Therefore this patch:
- Removes the backward compatibility check, since it does not work
- Fails, for safety, check_is_admin if 'context_is_admin' policy is
not specified
- Fixes check_is_advsvc in the same way (the backward compatibility
check never made any sense for this function)
- Fixes unit tests adding appropriate tests for check_is_admin and
check_is_advsvc
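A minimal sketch of the fail-safe behaviour, assuming the loaded
policies are available as a dict of callables (this shape and the
function signature are assumptions, not the real policy engine API):

```python
ADMIN_CTX_POLICY = "context_is_admin"


def check_is_admin(rules, roles):
    """Return True only if an explicit context_is_admin rule allows it."""
    rule = rules.get(ADMIN_CTX_POLICY)
    if rule is None:
        # Fail closed instead of falling through to the default policy.
        return False
    return bool(rule(roles))
```

With this shape, a permissive "default" rule can no longer grant admin
rights by accident when 'context_is_admin' is missing.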
Currently radvd is spawned in all the HA routers irrespective of the
state of the router. This approach has the following issues.
1. While processing the internal router ports (i.e., qr-xxx), ha_router
removes the LLA of the interface and adds it as a VIP to Keepalived conf.
Radvd daemon is spawned after this operation in the router namespace
(if the port is associated with any IPv6 subnets). Radvd notices that
qr-xxx interface does not have the LLA, so does not transmit any Router
Advts. In this state, VMs fail to acquire IPv6 addresses because of the
missing RAs. Radvd does not recover even after keepalived configures the
LLA of the interface. The only solution is to restart/reload radvd daemon.
Currently keepalived-state-change monitor does not do any radvd related
operations when a state transition happens. So we end up in this state
forever.
2. For all the routers in Backup state, qr-xxx interface does not have LLA
as it is managed by keepalived and configured only on the Master HA router.
In such agents syslog is flooded with the messages [1] and this can cause
loss of other useful info.
[1] - resetting ipv6-allrouters membership on qr-2e373555-97
This patch implements the following.
1. If the router is already in the Master state, we configure the LLA as a VIP
in keepalived conf but do not delete the LLA of the internal interface.
2. We spawn radvd only if the router is in the Master State.
3. Keepalived-state-change monitor takes care of enabling/disabling radvd upon
state transitions.
Restrict subnet create/update to avoid DHCP resync
As we know, IPs in a subnet CIDR are used for
1) Broadcast port
2) Gateway port
3) DHCP port if enable_dhcp is True, or is updated to True
4) Others go into allocation_pools
Items 1) to 3) above are created by default, which means that if the CIDR
doesn't have that many IPs, subnet create/update will cause a DHCP resync.
This fix adds some restrictions to address the issue:
A) On subnet create, if enable_dhcp is True, /31 and /32
cidrs are forbidden for IPv4 subnets while /127 and /128 cidrs are
forbidden for IPv6 subnets.
B) On subnet update, if enable_dhcp is changing to True and there are no
more IPs in allocation_pools, the request should be denied.
Change-Id: I2e4a4d5841b9ad908f02b7d0795cba07596c023d
Co-authored-by: Andrew Boik <dboik@cisco.com>
Closes-Bug: #1443798
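Restriction A) can be sketched with the standard ipaddress module. The
function name is hypothetical and the check only covers the prefix
length rule described above:

```python
import ipaddress


def dhcp_allowed_for_cidr(cidr):
    """Reject the two longest prefixes of each IP version for DHCP subnets."""
    network = ipaddress.ip_network(cidr)
    # max_prefixlen is 32 for IPv4 and 128 for IPv6, so this forbids
    # /31 and /32 as well as /127 and /128.
    return network.prefixlen <= network.max_prefixlen - 2
```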
Remove dependency on weak reference for registry callbacks
The use of weakref was introduced as a preventive measure to avoid
potential OOM kills, however that limited our ability to employ
certain functions as callbacks, such as object methods (see [1] for
an example).
Since the adoption of the callback registry, it has been observed that
callbacks are generally long lived (for the entire duration of the
process they belong to), therefore this limitation appears to be too
restrictive at this point in time.
Some might argue that it's better safe than sorry, but until we
have some evidence of actual OOM kills, it's probably best to take
the bolder action of removing the adoption of weak references and
deal with the potential fallout, should it happen.
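The limitation with object methods can be reproduced in a few lines.
This sketch assumes CPython's reference counting; Subscriber and the
helper function are illustrative names only:

```python
import weakref


class Subscriber:
    def __init__(self):
        self.events = []

    def handle(self, event):
        self.events.append(event)


def weak_and_strong_registration():
    subscriber = Subscriber()
    # Each attribute access builds a fresh bound method object. Nothing
    # else references it, so the weak reference is dead right away.
    weak = weakref.ref(subscriber.handle)
    # A strong reference keeps the callback (and its object) alive.
    strong = subscriber.handle
    return weak() is None, strong
```

Storing callbacks strongly avoids this, at the cost of keeping the
subscriber alive for as long as it stays registered.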
As DVR routers use a different type of interface, this patch
amends the DHCP agent code ensuring that a metadata proxy is
spawned when the metadata network feature is enabled on the
DHCP agent.
This is an internal implementation detail. Would admins care
whether internal events are being fired off successfully? What actionable
information does this present?
Brent Eagles [Tue, 17 Feb 2015 17:15:25 +0000 (13:45 -0330)]
Refactor RESOURCE_ATTRIBUTE_MAP cleanup
This patch adds an AttributeMapMemento class that can be used for
restoring the RESOURCE_ATTRIBUTE_MAP on test tear down. Tests containing
their own cleanup code have been modified to use it instead.
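A minimal sketch of the memento idea, assuming a module-level dict
stands in for RESOURCE_ATTRIBUTE_MAP (the class body below is an
illustration, not the class added by the patch):

```python
import copy

# Illustrative stand-in for the real attribute map.
RESOURCE_ATTRIBUTE_MAP = {"networks": {"name": {"allow_post": True}}}


class AttributeMapMemento:
    """Snapshot a mapping on creation and restore it in place later."""

    def __init__(self, attribute_map):
        self._map = attribute_map
        self._snapshot = copy.deepcopy(attribute_map)

    def restore(self):
        # Restore in place so modules holding a reference see the reset.
        self._map.clear()
        self._map.update(copy.deepcopy(self._snapshot))
```

In a test, restore would typically be registered as a cleanup right
after the memento is created.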
Previously the query was fetching an IPAllocation object, incorrectly
relying on the assumption that its port attribute would be
join-loaded when it really is not.
Incorrect query produced by previous code:
SELECT ipallocations.port_id AS ipallocations_port_id,
ipallocations.ip_address AS ipallocations_ip_address,
ipallocations.subnet_id AS ipallocations_subnet_id,
ipallocations.network_id AS ipallocations_network_id
FROM ipallocations, ports
WHERE ipallocations.subnet_id = :subnet_id_1
AND ports.device_owner NOT IN (:device_owner_1)
The query then may have produced results that don't satisfy
the condition intended by the code.
Query produced by the fixed code:
SELECT ipallocations.port_id AS ipallocations_port_id,
ipallocations.ip_address AS ipallocations_ip_address,
ipallocations.subnet_id AS ipallocations_subnet_id,
ipallocations.network_id AS ipallocations_network_id
FROM ipallocations JOIN ports ON ports.id = ipallocations.port_id
WHERE ipallocations.subnet_id = :subnet_id_1
AND ports.device_owner NOT IN (:device_owner_1)
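The difference between the two queries can be reproduced with an
in-memory SQLite database. The table definitions are simplified
assumptions and the column list is shortened for brevity:

```python
import sqlite3


def compare_queries():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ports (id TEXT PRIMARY KEY, device_owner TEXT);
        CREATE TABLE ipallocations (
            port_id TEXT, ip_address TEXT, subnet_id TEXT);
        INSERT INTO ports VALUES ('p1', 'network:dhcp');
        INSERT INTO ports VALUES ('p2', 'compute:nova');
        INSERT INTO ipallocations VALUES ('p1', '10.0.0.2', 's1');
    """)
    params = ("s1", "network:dhcp")
    # Without a join condition every allocation is paired with every port.
    without_join = conn.execute(
        "SELECT ipallocations.port_id FROM ipallocations, ports "
        "WHERE ipallocations.subnet_id = ? "
        "AND ports.device_owner NOT IN (?)", params).fetchall()
    # With the join, device_owner is checked on the allocation's own port.
    with_join = conn.execute(
        "SELECT ipallocations.port_id FROM ipallocations "
        "JOIN ports ON ports.id = ipallocations.port_id "
        "WHERE ipallocations.subnet_id = ? "
        "AND ports.device_owner NOT IN (?)", params).fetchall()
    conn.close()
    return without_join, with_join
```

Here the only allocation belongs to the DHCP port p1. The first query
still returns it, because the unrelated port p2 satisfies the
device_owner filter; the second query returns nothing.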
ARP cache poisoning is not actually prevented by the firewall
driver 'iptables_firewall'. We are adding the use of the ebtables
command - with a corresponding ebtables-driver - in order to create
Ethernet frame filtering rules, which prevent the sending of ARP
cache poisoning frames.
The complete patch is broken into a set of smaller patches for easier review.
This patch here is the first of the series and includes the low-level ebtables
integration, unit and functional tests.
Note:
This commit is based greatly on an original, now abandoned patch,
presented for review here:
The use of the builtin unittest test loader was silently dropping tests
that couldn't be imported.
This change also drops the retargetable path from discovery in the api
path due to a previously-masked configuration problem, and fixes an
invalid import in a functional testing fixture module.
Fullstack tests are also disabled temporarily pending a fix for #1446261.
Kevin Benton [Tue, 21 Apr 2015 09:01:39 +0000 (02:01 -0700)]
Block allowed address pairs on other tenants' net
Don't allow tenants to use the allowed address pairs extension
when they are attaching a port to a network that does not belong
to them.
This is done because allowed address pairs can allow things like
ARP spoofing and all tenants attached to a shared network might not
implicitly trust each other.
tests: confirm that _output_hosts_file does not log too often
I3ad7864eeb2f959549ed356a1e34fa18804395cc didn't include any regression unit
tests to validate that the method won't ever log too often again, which
risks reintroducing the performance drop in later patches. It also didn't
play well with stable backports of the fix: context was lost when doing the
backport, which left the bug unfixed in stable/juno even though the patch
was merged there [1].
The patch adds an explicit note in the code that suggests not to add new
log messages inside the loop to avoid regression, and a unit test was
added to capture it.
Once the test is merged in master, it will be proposed for stable/juno
inclusion, with additional changes that would fix the regression again.
ML2 mech drivers have no direct exposure to security groups,
and they can only infer them from the associated network/ports.
This is problematic as agentless ML2 mech drivers have no way of
intercepting security group events and propagating the information
to their backend, or more generally, reacting to them.
This patch leverages the callback registry to dispatch such events
so that interested ML2 mech drivers (or any interested party like
service plugins) can be notified and react accordingly.
This patch addresses create/update/delete of security groups and
create/delete of security group rules. Other events may be added
over time, if need be.
This patch is only about emitting the events. The actual subscription
and implementation of the event handlers will have to take place where
deemed appropriate.
Kevin Benton [Fri, 17 Apr 2015 11:46:11 +0000 (04:46 -0700)]
L3 DB: Defer port DB subnet lookups
_populate_subnets_for_ports was being called multiple
times for different interface types during the get_routers
process.
This patch eliminates those extra queries by deferring the
subnet information population until after all of the interfaces
have been looked up. Includes a function rename as well to
indicate that a function is only used internally.
Kevin Benton [Tue, 21 Apr 2015 05:26:22 +0000 (22:26 -0700)]
Only update MTU in update code for MTU
The ML2 create_network_db was re-passing the entire network,
with extensions like vlan_transparency present, which was causing
issues in the base update function it was calling.
This corrects the behavior by having it only update the MTU, which
is the only thing it was intending to update in the first place.
Kevin Benton [Fri, 17 Apr 2015 10:53:45 +0000 (03:53 -0700)]
Defer creation of router JSON in get_routers RPC
The get_routers method in the l3 RPC code has a log.debug
statement that formats all of the router data as indented
JSON. This method can be expensive if there are hundreds
of routers being synced and it happens even if debugging
is disabled since the function call result is the parameter
to the debug statement.
This patch adds and leverages a small helper class that takes a
callable and its args and defers calling it until the __str__ method
is called on it, i.e. when it is actually being rendered to a string.
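A minimal sketch of such a helper. The class name DeferredString and the
log_routers function are assumptions made for this example, not
necessarily the names used in the patch:

```python
import json
import logging


class DeferredString:
    """Call `function` only when the object is rendered to a string."""

    def __init__(self, function, *args, **kwargs):
        self.function = function
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return str(self.function(*self.args, **self.kwargs))


def log_routers(logger, routers, dump=json.dumps):
    # `dump` is not called here. The logging module renders the argument
    # (and thus calls __str__) only when the record is actually emitted.
    logger.debug("Routers returned to l3 agent:\n %s",
                 DeferredString(dump, routers, indent=5))
```

When the logger is not enabled for DEBUG, the JSON is never built.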
ovs_lib: Fix a race between get_port_tag_dict and port removal
get_port_tag_dict() gets a list of ports using get_port_name_list()
and then queries the db again for the ports in the list.
It fails if some of the ports disappeared in between.
This change fixes it by ignoring "not exist" errors in the later query.
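The race handling can be sketched as follows, assuming hypothetical
callables for listing port names and for reading a single port's tag
(PortNotFound and collect_port_tags are illustrative names, not the
ovs_lib API):

```python
class PortNotFound(Exception):
    """Hypothetical 'not exist' error from the second lookup."""


def collect_port_tags(list_port_names, get_port_tag):
    tags = {}
    for name in list_port_names():
        try:
            tags[name] = get_port_tag(name)
        except PortNotFound:
            # The port vanished between the two queries: skip it.
            continue
    return tags
```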
network.external is only present if one is using the external_net_db
mixin. This patch just adds a check to see whether the network has the
attribute external, to avoid an AttributeError.
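A minimal sketch of the guard, assuming plain objects stand in for the
network DB model (the helper name is hypothetical):

```python
from types import SimpleNamespace


def is_external_network(network):
    # `external` exists only when the external network mixin is in use,
    # so read it defensively instead of accessing it directly.
    return bool(getattr(network, "external", None))


# Illustrative objects: one without the attribute, one with it set.
plain_network = SimpleNamespace(id="net-1")
external_network = SimpleNamespace(id="net-2", external=object())
```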