Elena Ezhova [Tue, 7 Apr 2015 11:54:45 +0000 (14:54 +0300)]
Refactor socket ssl wrapping
Move socket wrapping into a separate method in order to separate
its logic from other action done in _get_socket. Now, ssl wrapping
is applied to the socket returned by _get_socket method.
Additionally checks for ssl config options are now performed during
init and not each time wrap_socket is called.
Kevin Benton [Fri, 24 Apr 2015 07:35:31 +0000 (00:35 -0700)]
Don't resync on DHCP agent setup failure
There are various cases where the DHCP agent will try to
create a DHCP port for a network and there will be a failure.
This has primarily been caused by a lack of available IP addresses
in the allocation pool. Trying to fix all availability corner cases
on the server side will be very difficult due to race conditions between
multiple ports being created, the dhcp_agents_per_network parameter, etc.
This patch just stops the resync attempt on the agent side if a failure
is caused by an IP address generation problem. Future updates to the subnet
will cause another attempt so if the tenant does fix the issue they will
get DHCP service.
sridhargaddam [Mon, 8 Dec 2014 16:11:38 +0000 (16:11 +0000)]
Neutron to Drop Router Advts from VM ports
As part of Spoofing filter chain Neutron drops all the outbound
traffic where MAC/IP does not match the IP address assigned
to the VM ports (inc' allowed_address_pairs). Along with this,
we also drop traffic associated to dhcp[v6] server (i.e., do
not allow a VM to run dhcp[v6] server). Currently we do not
have any rules to drop Router Advts from VM ports. This can create
issues in the network as other devices in the network may not have
any protection for this kind of stuff.
Even if we allow RAs from the VM ports, because of the Anti-Spoofing
rules that are applied, a VM cannot act as a IPv6 router (i.e., it
cannot forward IPv6 traffic). So there is no point in allowing Router
Advts from VMs assuming that it would be useful in Service VM use-cases.
In order to properly implement IPv6 router as a Service VM, one needs
to use the port_security_extension [1] which allows us to disable
security group rules/anti-spoofing filters on the VM ports.
The test_ha_router_failover tests were not being unmocked. This
is because the same object was being mocked twice, but unmocked
once. The mock.patch.stopall call in the tests base class was rewinding
the value of the object from the second mock to the first mock.
Follow up tests in the same worker were using namespace
names defined via the first mock in the failover test.
Currently radvd is spawned in all the HA routers irrespective of the
state of the router. This approach has the following issues.
1. While processing the internal router ports (i.e., qr-xxx), ha_router
removes the LLA of the interface and adds it as a VIP to Keepalived conf.
Radvd daemon is spawned after this operation in the router namespace
(if the port is associated with any IPv6 subnets). Radvd notices that
qr-xxx interface does not have the LLA, so does not transmit any Router
Advts. In this state, VMs fail to acquire IPv6 addresses because of the
missing RAs. Radvd does not recover even after keepalived configures the
LLA of the interface. The only solution is to restart/reload radvd daemon.
Currently keepalived-state-change monitor does not do any radvd related
operations when a state transition happens. So we endup in this state
forever.
2. For all the routers in Backup state, qr-xxx interface does not have LLA
as it is managed by keepalived and configured only on the Master HA router.
In such agents syslog is flooded with the messages [1] and this can cause
loss of other useful info.
[1] - resetting ipv6-allrouters membership on qr-2e373555-97
This patch implements the following.
1. If the router is already in the Master state, we configure the LLA as a VIP
in keepalived conf but do not delete the LLA of the internal interface.
2. We spawn radvd only if the router is in the Master State.
3. Keepalived-state-change monitor takes care of enabling/disabling radvd upon
state transitions.
Restrict subnet create/update to avoid DHCP resync
As we know, IPs in subnet CIDR are used for
1) Broadcast port
2) Gateway port
3) DHCP port if enable_dhcp is True, or update to True
4) Others go into allocation_pools
Above 1) to 3) are created by default, which means if CIDR doesn't
have that much of IPs, subnet create/update will cause a DHCP resync.
This fix is to add some restricts to the issue:
A) When subnet create, if enable_dhcp is True, /31 and /32
cidrs are forbidden for IPv4 subnets while /127 and /128 cidrs are
forbidden for IPv6 subnets.
B) When subnet update, if enable_dhcp is changing to True and there are no
more IPs in allocation_pools, the request should be denied.
Change-Id: I2e4a4d5841b9ad908f02b7d0795cba07596c023d Co-authored-by: Andrew Boik <dboik@cisco.com>
Closes-Bug: #1443798
Remove dependency on weak reference for registry callbacks
The use of weakref was introduced as a preventive measure to avoid
potential OOM kills, however that limited our ability to employ
certain functions as callbacks, such as object methods (see [1] for
an example).
Since the adoption of the callback registry, it has been observed that
callbacks are generally long lived (for the entire duration of the
process they belong to), therefore this limitation appears to be too
restrictive at this point in time.
Some might argue that it's better safe than sorry, but until we
have some evidence of actual OOM kills, it's probably best to take
the bolder action of removing the adoption of weak references and
deal with the potential fallout, should it happen.
As DVR routers use a different type of interface, this patch
amends the DHCP agent code ensuring that a metadata proxy is
spawned when the metadata network feature is enabled on the
DHCP agent.
This is an internal implementation detail, would admins care
if internal events are being fired off successfully? What actionable
information does this present?
Brent Eagles [Tue, 17 Feb 2015 17:15:25 +0000 (13:45 -0330)]
Refactor RESOURCE_ATTRIBUTE_MAP cleanup
This patch adds a AttributeMapMemento class that can be used for
restoring the RESOURCE_ATTRIBUTE_MAP on test tear down. Tests containing
their own cleanup code have been modified to use it instead.
Previously the query was fetching an IPAllocation object incorrectly
relying on the fact that it has port attribute that should be
join-loaded when it really is not.
Incorrect query produced by previous code:
SELECT ipallocations.port_id AS ipallocations_port_id,
ipallocations.ip_address AS ipallocations_ip_address,
ipallocations.subnet_id AS ipallocations_subnet_id,
ipallocations.network_id AS ipallocations_network_id
FROM ipallocations, ports
WHERE ipallocations.subnet_id = :subnet_id_1
AND ports.device_owner NOT IN (:device_owner_1)
The query then may have produced results that don't satisfy
the condition intended by the code.
Query produced by the fixed code:
SELECT ipallocations.port_id AS ipallocations_port_id,
ipallocations.ip_address AS ipallocations_ip_address,
ipallocations.subnet_id AS ipallocations_subnet_id,
ipallocations.network_id AS ipallocations_network_id
FROM ipallocations JOIN ports ON ports.id = ipallocations.port_id
WHERE ipallocations.subnet_id = :subnet_id_1
AND ports.device_owner NOT IN (:device_owner_1)
ARP cache poisoning is not actually prevented by the firewall
driver 'iptables_firewall'. We are adding the use of the ebtables
command - with a corresponding ebtables-driver - in order to create
Ethernet frame filtering rules, which prevent the sending of ARP
cache poisoning frames.
The complete patch is broken into a set of smaller patches for easier review.
This patch here is th first of the series and includes the low-level ebtables
integration, unit and functional tests.
Note:
This commit is based greatly on an original, now abandoned patch,
presented for review here:
The use of the builtin unittest test loader was silently dropping tests
that couldn't be imported.
This change also drops the retargetable path from discovery in the api
path due to a previously-masked configuration problem, and fixes an
invalid import in a functional testing fixture module.
Fullstack tests are also disabled temporarily pending a fix for #1446261.
Kevin Benton [Tue, 21 Apr 2015 09:01:39 +0000 (02:01 -0700)]
Block allowed address pairs on other tenants' net
Don't allow tenants to use the allowed address pairs extension
when they are attaching a port to a network that does not belong
to them.
This is done because allowed address pairs can allow things like
ARP spoofing and all tenants attached to a shared network might not
implicitly trust each other.
tests: confirm that _output_hosts_file does not log too often
I3ad7864eeb2f959549ed356a1e34fa18804395cc didn't include any regression unit
tests to validate that the method won't ever log too often again,
reintroducing performance drop in later patches. It didn't play well
with stable backports of the fix, where context was lost when doing the
backport, that left the bug unfixed in stable/juno even though the patch
was merged there [1].
The patch adds an explicit note in the code that suggests not to add new
log messages inside the loop to avoid regression, and a unit test was
added to capture it.
Once the test is merged in master, it will be proposed for stable/juno
inclusion, with additional changes that would fix the regression again.
ML2 mech drivers have no direct exposure to security groups,
and they can only infer them from the associated network/ports.
This is problematic as agentless ML2 mech drivers have no way of
intercepting securitygroups events and propagate the information
to their backend, or more generally, react to them.
This patch leverages the callback registry to dispatch such events
so that interested ML2 mech drivers (or any interested party like
service plugins) can be notified and react accordingly.
This patch addresses create/update/delete of security groups and
create/delete of security groups rules. Other events may be added
over time, if need be.
This patch is only about emitting the events. The actual subscription
and implementation of the event handlers will have to take place where
deemed appropriate.
Kevin Benton [Fri, 17 Apr 2015 11:46:11 +0000 (04:46 -0700)]
L3 DB: Defer port DB subnet lookups
_populate_subnets_for_ports was being called multiple
times for different interface types during the get_routers
process.
This patch eliminates those extra queries by deferring the
subnet information population until after all of the interfaces
have been looked up. Includes a function rename as well to
indicate that a function is only used internally.
Kevin Benton [Tue, 21 Apr 2015 05:26:22 +0000 (22:26 -0700)]
Only update MTU in update code for MTU
The ML2 create_network_db was re-passing in the entire network
with extensions like vlan_transparency present that was causing
issues in the base update function it was calling.
This corrects the behavior by having it only update the MTU, which
is the only thing it was intending to update in the first place.
Kevin Benton [Fri, 17 Apr 2015 10:53:45 +0000 (03:53 -0700)]
Defer creation of router JSON in get_routers RPC
The get_routers method in the l3 RPC code has a log.debug
statement that formats all of the router data as indented
JSON. This method can be expensive if there are hundreds
of routers being synced and it happens even if debugging
is disabled since the function call result is the parameter
to the debug statement.
This patch adds and leverages a small helper class that takes a
callable and its args and defers calling it until the __str__ method
is called on it when it's actually trying to be rendered to a string.
ovs_lib: Fix a race between get_port_tag_dict and port removal
get_port_tag_dict() gets a list of ports using get_port_name_list()
and then queries the db again for ports in the list.
It fails if some of ports disappeared in between.
This change fixes it by ignoring "not exist" errors in the later query.
network.external is only present if one is using the external_net_db
mixin. This patch just adds a check to see network has the attribute
external to avoid an Attribute error.
Cedric Brandily [Sun, 1 Mar 2015 22:08:58 +0000 (22:08 +0000)]
Define FakeMachine helper for functional/fullstack tests
The change defines the FakeMachine fixture/helper which emulates a
machine through a namespace with:
* a port bound to a bridge,
* an ip on the port,
* a gateway (if requested).
The FakeMachine class can be used to emulate:
* a VM for testing network features (ex: metadata service),
* an external machine for testing "external" network features (ex:
routing/natting),
* a server for low level tests of network features (ex: iptables).
The change also defines PeerMachines fixture/helper to create some fake
machines bound to a bridge.
Assaf Muller [Mon, 2 Mar 2015 16:29:51 +0000 (11:29 -0500)]
Simplify keepalived.virtual_routes
keepalived.virtual_routes previously held one list of virtual
routes of different kinds, and the HA router class manipulated
that list directly. The list held both the default gateway
virtual route, and any extra routes. This means that when adding
extra routes for example, the HA router would first have to
remove all routes that are not default gateway routes, then add
the extra routes received via RPC.
This is messy because:
a) It's needlessly complicated
b) It's fragile
c) There's zero separation of concerns (HA router should not know
how keepalived maintains its list of virtual routes)
d) It requires changes to the management of the default gateway
and virtual routes just to add another type of extra routes
This patch solves these issues by separating the persistency of
virtual routes according to their role.
Co-Authored-By: gong yong sheng <gong.yongsheng@99cloud.net>
Related-Bug: 1414640
Change-Id: I1406b1876c3a47b110818686b42e5f2f688154fa
Terry Wilson [Fri, 17 Apr 2015 21:13:09 +0000 (16:13 -0500)]
Correct typo for matching non-dict ovsdb rows
As can be seen just above, the correct operator for the equality
test is '=' and not '=='. This match isn't currently being used
in the neutron code, but will be used by the OVN driver.
The previous code would also raise NotImplemented when there was
no match.
Fixes race condition and boosts the scheduling performance
This patch fixes a race-condition that occurs when the
scheduler tries to check for dvr serviceable ports before
it schedules a router when a subnet is associated with
a router.
Sometimes the dhcp port creation is delayed and so the
router is not scheduled to the l3-agent.
Also it boosts the scheduling performance on dvr-snat
node for scheduling a router.
This patch will provide a work around to fix this race
condition and to boost the scheduling performance
by scheduling a router on a dvr-snat when
dhcp is enabled on the provided subnet, instead of checking
all the available ports on the subnet.
mathieu-rohon [Sat, 7 Mar 2015 12:30:49 +0000 (13:30 +0100)]
ML2: Change port status only when it's bound to the host
Currently, nothing prevents the port status to be changed to BUILD
state when get_device_details() is sent by a host that doesn't own
the port.
In some cases the port might stay in BUILD state.
This could happen during a live-migration, or for multi-hosted ports
such as HA ports.
This commit allows the port status modification only if the port
is bound to the host that is asking for it.
Kevin Benton [Fri, 17 Apr 2015 11:28:58 +0000 (04:28 -0700)]
Remove double queries in l3 DB get methods
Two frequently called functions were querying the routerport table
and the corresponding ports just to get the port ID. Then they were
calling get_ports again with those port IDs, resulting in two queries
to the port table when there should have only been one.
This eliminates the second call to get_ports since all of the necessary
data hase been retrieved from the port table.
Kevin Benton [Fri, 17 Apr 2015 11:18:56 +0000 (04:18 -0700)]
Strip unnecessary overrides in extraroute_db mixin
The extra route DB mixin seemed to be overriding the
get_router and get_routers method for no reason. They
both just called the super version of themselves with
the same arguments.
This patch just pulls those functions out. Found in
tracebacks while working on a related bug.
Kevin Benton [Fri, 17 Apr 2015 10:36:50 +0000 (03:36 -0700)]
Set loading strategy to joined for Routerport/Port
The RouterPort model has a relationship to the ports model which
is frequently relied on to get the port IDs of interfaces attached
to a router. However, this defaults to the loading strategy to
'select', which meant a new query was being emitted for every
interface to the ports table just to get the ID.
This patch adjusts the relationship to be 'joined' by default so
one query will fetch the related ports.
Another option would have been not to use the port object at all since
the ID is all that the callers were usually interested in. However,
they would end up using the ID to do a port lookup, which is being
optimized away in another patch anyway so the full port object from
the relationship will end up getting used.