John Griffith [Fri, 24 Oct 2014 14:19:19 +0000 (08:19 -0600)]
Reserve 5 migrations for backports
Reserve 5 migrations incase the need arises to backport any
fixes that require a db migration in stable/juno.
We've never set this up in the past and we did run into a case
last cycle where we had to hack some things around to make it work
without the place holder.
Why 5? Why not? For as little as we touch the DB historically
this number should be more than sufficient.
This patch allows an OpenStack environment to run as a secure NAS
environment from the client and server perspective, including having
root squash enabled and not running file operations as the 'root'
user. This also sets Cinder file permissions as 660: removing
other/world file access.
The "nas_secure_file_permissions" option controls the setting of file
permissions when Cinder volumes are created. The option defaults to
"auto" to gracefully handle upgrade scenarios. When set to "auto",
a check is done during Cinder startup to determine if there are
existing Cinder volumes: no volumes will set the option to 'true',
and use secure file permissions. The detection of existing volumes will
set the option to 'false', and use the current insecure method of
handling file permissions.
The "nas_secure_file_operations" option controls whether file
operations are run as the 'root' user or the current OpenStack
'process' user. The option defaults to "auto" to gracefully handle
upgrade scenarios. When set to "auto", a check is done during Cinder
startup to determine if there are existing Cinder volumes: no volumes
will set the option to 'true', be secure and do NOT run as the 'root'
user. The detection of existing volumes will set the option to 'false',
and use the current method of running operations as the 'root' user.
For new installations, a 'marker file' is written so that subsequent
restarts of Cinder will know what the original determination had been.
This patch enables this functionality only for the NFS driver.
Other similar drivers can use this code to enable the same
functionality with the same config options.
John Griffith [Fri, 17 Oct 2014 04:43:20 +0000 (22:43 -0600)]
Turn on Flake-8 Complexity Checking
Flake8 provides the ability to measure code complexity. There are
a lot of modules in Cinder that are considered "too complex", the
worst being "cinder/tests/test_huawei_hvs.py:110:1:" with a complexity
ranking of 59.
There's some outlyers at the higher end here, but the majority of the
code checks in at under 30, so let's make that our threshold and ignore
the two offenders that are above that for now.
Granted this may or may not be valuable, but it doesn't hurt to try it
and if we all hate it or find there's no value but it makes life difficult
we can always turn it back off.
See flake8.readthedocs for more info on flake8 and McCabe complexity
checking.
Matt Riedemann [Thu, 16 Oct 2014 15:39:07 +0000 (08:39 -0700)]
Log a warning when getting lvs and vgs takes longer than 60 seconds
We know something is causing lvs/vgs commands to block while deleting a
volume and this is causing Tempest to timeout while waiting for the
volume to be deleted. What we don't have right now is very good
(specific) logging when this happens, unless we get messages in syslog
for lvm tasks taking more than 120 seconds, but that doesn't always
happen when we see the volume delete timeout in Tempest.
This patch adds a check for when getting logical volumes and volume
groups takes longer than 60 seconds and logs a warning if that happens.
This is helpful in production also because the default interval for
periodic tasks is 60 seconds so having these take longer than that time
could cause periodic tasks to block up on each other and you'll get
warnings from the FixedIntervalLoopingCall in oslo which is controlling
the task runs.
Stuart McLaren [Fri, 5 Sep 2014 12:48:04 +0000 (12:48 +0000)]
Add client_socket_timeout option
Add a parameter to take advantage of the new(ish) eventlet socket timeout
behaviour. Allows closing idle client connections after a period of
time, eg:
$ time nc localhost 8776
real 1m0.063s
Setting 'client_socket_timeout = 0' means do not timeout.
Vincent Hou [Fri, 10 Oct 2014 07:46:36 +0000 (15:46 +0800)]
IBM Storwize driver: Add local variable assignment to "ctxt"
* The method get_vdisk_params in helpers.py is missing a local variable
assignment for "ctxt", causing "UnboundLocalError: local variable
'ctxt' referenced before assignment. Adding the assignment should
resolve this issue.
* Add the unit tests coverage for get_vdisk_params.
Patrick East [Tue, 14 Oct 2014 22:45:38 +0000 (15:45 -0700)]
Multipath commands with error messages in stdout fail to parse
This change fixes an issue in find_multipath_device() where the command
output of ‘multipath -l <device>’ would sometimes fail to be parsed if
there were error messages in the stdout string in addition to the
expected output. We will now strip out the error messages before we
attempt to parse the lines.
Andrew Kerr [Thu, 29 May 2014 03:16:23 +0000 (08:46 +0530)]
NetApp fix to set non default server port in api
The non default netapp_server_port config option was not
getting set in api even if specified in cinder.conf. Its
made non mandatory and set if specified in the configuration.
Tomoki Sekiyama [Tue, 14 Oct 2014 23:09:44 +0000 (19:09 -0400)]
Fix LVM iSCSI driver tgtadm CHAP authentication
Currently CHAP Authentication in LVM iSCSI driver with tgtadm does not work.
This is because the tgtadm helper creates the target configuration file
with an 'IncomingUser' entry, which is ignored by tgtd.
This patch fixes it to 'incominguser'.
Mitsuhiro Tanino [Tue, 14 Oct 2014 16:41:41 +0000 (12:41 -0400)]
Export cinder volumes only if the status is 'in-use'
Currently, cinder volumes are exported both 'in-use' and 'available'
after restarting cinder-volume service.
This behavior was introduced following commit.
If the volumes are attached to nova instances, they should be exported
via tgtd after restarting cinder-volume.
But the volumes which are not attached to instances must not be exported
because everyone can connect these volumes.
This patch changes volume export behavior that exports a volume only if
the volume status is 'in-use'.
John Griffith [Fri, 10 Oct 2014 01:22:03 +0000 (19:22 -0600)]
Move SolidFire driver from httplib to requests
The SolidFire driver has been pretty static for a number of
years now, this change is to move from httplib for API calls
to requests. There are a number of advantages to this, including
performance, simplicity and ability to add things like ssl support
easily.
In addtion this change removes the confusing looping/retry mechanisms
that were in the issue_api_request method and replaces it with a
retry decorator for the exceptions we're interested in retrying.
Finally, I realize that my unit tests suck! That will be one of the
follow up items after a bit more clean up in the driver.
we need to to check the value of the configuration item glance_num_retries
in the code in order to ensure the " glance_num_retries " is equal or greater
than 0
The WSDL URL of storage policy service is determined and a session is
created using it in do_setup(). This session is later used to initialize
the datastore selector property (ds_sel), which uses the session for all
storage policy related API calls.
After commit a8fa3ceb1e72bac2ab67f569a2ca009f995f59fd (Integrate
OSprofiler and Cinder), the properties defined in vmdk module are called
before do_setup(). As a result, the ds_sel (datastore selector) property
is initialized with a session instance containing a 'None' PBM (storage
policy service) WSDL URL. This results in failures of all storage policy
related APIs invoked using datastore selector. This patch fixes the
problem by re-initializing the property in do_setup().
Fix exception handling on test_delete_nonexistent_volume
test_delete_nonexistent_volume wants to check the exception would be
received when nonexistent volume was specified. But currently, this
test case checks the exception would be received when nonexistent
metadata was specified.
we need to to check the value of the configuration item eqlx_cli_max_retries
in the code in order to ensure the "eqlx_cli_max_retries" is equal to or
greater than 0
DocImpact: The 'retries' is not a configured number of attempts
Change-Id: If9fadda83a855b4bbda6129d3b3a64d296eb2b54
Closes-Bug: #1372454
Jay S. Bryant [Wed, 10 Sep 2014 03:07:48 +0000 (22:07 -0500)]
Remove deprecated use of gettextutils import _
The initial hacking check I wrote allowed users to check in code
using 'from cinder.openstack.common.gettextutils import _' or
'from cinder.i18n import _'. This was to ease the transition for
code that was in flight with the old import.
Now that we have moved Cinder over to using cinder.i18n and most
of the code that was in flight with the old import has been merged
and/or fixed, I am updating the hacking check to enforce the use of
cinder.i18n and fixing the cases that still had the old import.
Mark Sturdevant [Sat, 13 Sep 2014 06:48:16 +0000 (23:48 -0700)]
HP 3PAR: Don't ignore extra-specs snap_cpg when missing cpg
When snap_cpg is specified in extra-specs, it should be used.
For some reason, it was being ignored when extra-specs did not
also specify a user cpg.
When using a volume-type, the snapCPG should come from (in this order
of preference):
1. extra specs snap_cpg,
2. extra specs cpg,
3. config hp3par_cpg_snap,
4. config hp3par_cpg.
Mark Sturdevant [Mon, 9 Jun 2014 02:57:21 +0000 (10:57 +0800)]
3PAR with pool-aware-cinder-scheduler
HP 3PAR support for the pool-aware scheduler.
Adds support for a list of CPGs (pools). Uses
model update when our volume-type CPG is used
instead of the scheduler selected CPG.
In cinder.conf, hp3par_cpg now accepts a list of CPGs.
Currently the display name and description of the target volume
is changed to that in the backup meta-data during backup restore.
This can confuse the end-user, especially in the case of restore
to an existing volume. This patch removes display name and display
description from the list of fields to be copied from the backup
meta-data to update the volume.
During create_backup failure handling, backup_update fails with
DataError ("Data too long for column") if the fail_reason is
greater than 255 characters. As a result, backup status is stuck
in 'creating' state. This patch avoids the problem by truncating
fail_reason to 255 characters before update.
Sean McGinnis [Tue, 7 Oct 2014 15:29:26 +0000 (10:29 -0500)]
Fix eqlx CLI output parsing on bad input
The eqlx driver would identify CLI command completion
by looking for the system name prompt at the end of
the output ("ARRAY_NAME>"). In some cases where there
is bad input the array will print an error, then
prepopulate the command again so it can be edited,
resulting in the output:
"ARRAY_NAME> [bad command]"
The array name prompt only gets printed on command
completion, so the fix is to look for the prompt
anywhere in the CLI output.
Sean McGinnis [Tue, 7 Oct 2014 15:10:57 +0000 (10:10 -0500)]
Eqlx fix NoSuchOptError for volume_name_template on clone
The eqlx driver was referencing the volume_name_template
config setting via self.configuration.volume_name_template.
This option is not imported in self.configuration.
The curent preferred method for volume clone is to reference
the passed in name, avoiding the need for the driver to
know what the naming template is all together.
John Griffith [Tue, 7 Oct 2014 17:49:58 +0000 (11:49 -0600)]
Make sure device support Direct before setting
We added '-t none' option to the qemu-img convert operation
in image_utils.py a while back to accomodate a couple of
backend devices that didn't flush writes on disconnect.
(Change: I7a04f683add8c23b9125fe837c4048ccc3ac224d)
The only problem here is that some backend devices don't
support Direct mode and raise an exception and fail when
setting this option.
This patch adds a simple check using dd to see if the dest
supports the Direct flag and only sets '-t none' if the device
does in fact support it.
Additionally it was brought up that even yet other backends
are using file devices not blk devices. In their case setting
Direct will still work, however it's sub-optimal as qemu-convert
has internal mechanisms to make sure flushing etc are done
correctly and efficiently for those devices. So to accomodate
that particular use case I'm also adding a check if blk dev
that can be used for determining whether to set Direct for the
qemu-convert process.
Tom Barron [Tue, 23 Sep 2014 22:05:40 +0000 (18:05 -0400)]
Eseries warn if multipath is not set for img xfer
Warn at driver startup if the configuration option
"use_multipath_for_image_xfer" has not been set to "True". Eseries filers
require appropriate multipath/DMMP configuration on the host running the
cinder volume process for image transfers to work reliably.
Currently, the Coraid driver requires storing Storage Repository
name in volume type key named 'coraid_repository' or different key name
if a value of the 'coraid_repository_key' option changed in the cinder
configuration file.
In case of creating a volume without volume type associated or creating
a volume with volume type that doesn't have an appropriate key with
the repository name, volume creation fails.
That is an inconvenient behaviour, especially in case of using that with
Tempest that creates new volume types but obviously doesn't know
that it should add keys with the repository name.
In order to fix that, introduce a 'coraid_default_repository' config
option that allows to set a default repository to use if one is not
specified in the volume type key.
In case if neither volume type nor config file provide a repository
name, throw CoraidException with description of the problem.
Windows SMBFS: Handle volume_name in _qemu_img_info
The volume_name is now parsed to the _qemu_img_info wrapper. As
this method is not prone to security issues because this driver
does not support raw images (at least not yet), we don't have to
perform any checks on the backing image file path.
Thus, this method simply ignores this argument that will be parsed
by the base class methods.
Sean McGinnis [Fri, 26 Sep 2014 20:21:35 +0000 (15:21 -0500)]
Handle eqlx SSH connection close on abort.
EqualLogic array CLI operation timeout causes the
SSH thread to be aborted. This would cause SSH
sessions to be orphaned and hit a max connection
limit on the array. This fix catches these aborts
and makes sure the connection is closed.