Management Scripts (“Hammers”)¶
These are the collection of scripts used to fix inconsistencies in the various OpenStack services. They are not particularly bright tools, but they are good for a first pass.
The tools are run in a Python virtualenv to avoid package clashes with the system Python. On the CHI@TACC and CHI@UC controllers, the virtualenv lives at /root/scripts/hammers/venv. The scripts can be called directly, without activating the virtualenv, by providing the full path, e.g. /root/scripts/hammers/venv/bin/conflict-macs info, and that is how the cronjobs do it.
Setup¶
As mentioned in the intro, the scripts are run in a virtualenv. Here’s how to set it up:
- Get code
mkdir -p /root/scripts/hammers
cd /root/scripts/hammers
git clone https://github.com/ChameleonCloud/hammers.git
- Create environment
virtualenv /root/scripts/hammers/venv
/root/scripts/hammers/venv/bin/pip install -r /root/scripts/hammers/hammers/requirements.txt
/root/scripts/hammers/venv/bin/pip install -e /root/scripts/hammers/hammers
Note
Because the hammers repo was installed with -e, some updates in the future can be done by cd-ing into the directory and git pull-ing. Updates that change script entrypoints in setup.py will require a quick pip install...
- Set up credentials for OpenStack and Slack
The Puppet cronjobs have a configuration variable that points to the OS shell var file, for instance /root/adminrc.
There is also a file for Slack vars, e.g. /root/scripts/slack.json. It is a JSON file with a root key "webhook" that is a URL to post to (keep it secret!) and another root key "hostname_names" that is a mapping of FQDNs to pretty names.
Example:
{
  "webhook": "https://hooks.slack.com/services/...super-seekrit...",
  "hostname_names": {
    "m01-07.chameleon.tacc.utexas.edu": "CHI@TACC",
    "m01-03.chameleon.tacc.utexas.edu": "KVM@TACC",
    "admin01.uc.chameleoncloud.org": "CHI@UC"
  }
}
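As an illustration of how the scripts might consume this file, here is a small hypothetical sketch; the payload shape and function names are invented for the example and may differ from the actual hammers code:

```python
import json

# Illustrative slack.json contents (same shape as the example above).
conf = json.loads("""
{
  "webhook": "https://hooks.slack.com/services/...super-seekrit...",
  "hostname_names": {
    "m01-07.chameleon.tacc.utexas.edu": "CHI@TACC",
    "admin01.uc.chameleoncloud.org": "CHI@UC"
  }
}
""")

def pretty_site(fqdn):
    # Fall back to the raw FQDN if the host isn't in the mapping.
    return conf["hostname_names"].get(fqdn, fqdn)

# A notification would be POSTed to conf["webhook"]; here we just build it.
payload = {"text": "neutron-reaper ran on " + pretty_site("admin01.uc.chameleoncloud.org")}
print(payload["text"])  # neutron-reaper ran on CHI@UC
```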
Running¶
You can either source venv/bin/activate to put the scripts into the path, or directly execute them out of the directory, e.g. venv/bin/neutron-reaper.
Common options:
--slack <json-options> - if provided, used to post notifications to Slack
--osrc <rc-file> - alternate way to feed in the OS authentication vars
Script Descriptions¶
Neutron Resource “Reaper”¶
neutron-reaper {info, delete} {ip, port} <grace-days>
Reclaims idle floating IPs and cleans up stale ports.
Required arguments, in order:
- info to display what would be cleaned up, or delete to actually clean it up.
- ip or port to select which kind of resource to consider.
- grace-days: a project needs to be idle for this many days before its resources are reclaimed.
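The grace-period check is in essence a cutoff comparison; below is a minimal sketch of that selection logic on toy data (not the reaper's actual database queries):

```python
from datetime import datetime, timedelta

def idle_projects(last_activity, grace_days, now):
    """Return project IDs whose last recorded activity is at least
    grace_days old. last_activity maps project ID -> datetime."""
    cutoff = now - timedelta(days=grace_days)
    return sorted(p for p, seen in last_activity.items() if seen <= cutoff)

now = datetime(2020, 1, 15)
activity = {
    "proj-a": datetime(2019, 12, 1),   # idle for weeks -> reclaim
    "proj-b": datetime(2020, 1, 14),   # active yesterday -> keep
}
print(idle_projects(activity, grace_days=14, now=now))  # ['proj-a']
```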
Conflicting Ironic/Neutron MACs¶
conflict-macs {info, delete} ( --ignore-from-ironic-config <path to ironic.conf> |
--ignore-subnet <subnet UUID> )
The Ironic subnet must be provided, either directly by ID or determined from a config file; otherwise the script would consider the Ironic ports themselves to be in conflict.
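The check itself amounts to a MAC-set intersection with the Ironic subnet excluded; here is a toy sketch of that logic (the data shapes are invented for illustration and are not the real Ironic/Neutron objects):

```python
def conflicting_macs(ironic_macs, neutron_ports, ignore_subnet):
    """Neutron ports whose MAC collides with an Ironic port's MAC,
    skipping ports on the Ironic subnet (those are expected to match)."""
    conflicts = []
    for port in neutron_ports:
        if port["subnet_id"] == ignore_subnet:
            continue  # Ironic's own provisioning ports are not conflicts
        if port["mac"] in ironic_macs:
            conflicts.append(port["id"])
    return conflicts

ironic_macs = {"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}
ports = [
    {"id": "p1", "mac": "aa:bb:cc:dd:ee:01", "subnet_id": "ironic-subnet"},
    {"id": "p2", "mac": "aa:bb:cc:dd:ee:02", "subnet_id": "tenant-subnet"},
]
print(conflicting_macs(ironic_macs, ports, ignore_subnet="ironic-subnet"))  # ['p2']
```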
Undead Instances¶
Sometimes Nova doesn't seem to tell Ironic when an instance goes away on a node; the next time Ironic deploys to the same node, it fails.
undead-instances {info, delete}
Running with info displays what it thinks is wrong, and delete will clear the offending state from the nodes.
Ironic Node Error Resetter¶
Basic Usage:
ironic-error-resetter {info, reset}
Resets Ironic nodes in an error state with a known, common error. It started out looking for IPMI-related errors, but isn't intrinsically specific to them over any other error that shows up on the nodes. It records the resets it performs in the node metadata (extra field) and refuses to act after some number (currently 3) of accumulated resets.
Currently it watches for a specific set of known errors.
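The reset bookkeeping described above can be sketched like this; the counter key name and data shapes are illustrative, not the script's actual schema:

```python
MAX_RESETS = 3  # per the description: refuses after 3 accumulated resets

def try_reset(node):
    """Increment a reset counter in the node's 'extra' metadata,
    refusing once the cap is reached."""
    count = node.setdefault("extra", {}).get("error_resets", 0)
    if count >= MAX_RESETS:
        return False  # leave the node for a human to inspect
    node["extra"]["error_resets"] = count + 1
    return True

node = {"uuid": "node-1", "extra": {}}
results = [try_reset(node) for _ in range(4)]
print(results)        # [True, True, True, False]
print(node["extra"])  # {'error_resets': 3}
```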
Dirty Ports¶
Basic Usage:
dirty-ports {info, clean}
There was/is an issue where a non-empty value in an Ironic node's port's internal_info field would cause a new instance to fail deployment on the node. This notifies (info) or cleans up (clean) if there is info on said ports on nodes that are in the "available" state.
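A toy sketch of the detection logic on invented data shapes (not the real Ironic node and port objects):

```python
def dirty_ports(nodes, ports):
    """Ports with leftover internal_info belonging to nodes in the
    'available' provision state."""
    available = {n["uuid"] for n in nodes if n["provision_state"] == "available"}
    return [p["uuid"] for p in ports
            if p["node_uuid"] in available and p["internal_info"]]

nodes = [
    {"uuid": "n1", "provision_state": "available"},
    {"uuid": "n2", "provision_state": "active"},
]
ports = [
    {"uuid": "port-1", "node_uuid": "n1", "internal_info": {"cleaning_vlan": 100}},
    {"uuid": "port-2", "node_uuid": "n2", "internal_info": {"cleaning_vlan": 100}},
    {"uuid": "port-3", "node_uuid": "n1", "internal_info": {}},
]
print(dirty_ports(nodes, ports))  # ['port-1']
```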
Orphan Resource Providers¶
orphan-resource-providers {info, update}
Occasionally, compute nodes are recreated in the Nova database with new UUIDs,
but resource providers in the Placement API database are not updated and still
refer to the old UUIDs. This causes failures to post allocations and results in
errors when launching instances. This detects the issue (info) and fixes it (update) by updating the uuid field of resource providers.
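The detection amounts to matching resource providers to compute nodes by name and comparing UUIDs; a hypothetical sketch on toy data:

```python
def orphaned_providers(compute_nodes, resource_providers):
    """Report providers whose UUID no longer matches the compute node
    of the same name (i.e. the node was recreated with a new UUID)."""
    current = {cn["hypervisor_hostname"]: cn["uuid"] for cn in compute_nodes}
    stale = []
    for rp in resource_providers:
        expected = current.get(rp["name"])
        if expected and expected != rp["uuid"]:
            # (name, stale uuid, uuid the provider should be updated to)
            stale.append((rp["name"], rp["uuid"], expected))
    return stale

computes = [{"hypervisor_hostname": "c01", "uuid": "new-uuid"}]
providers = [{"name": "c01", "uuid": "old-uuid"}]
print(orphaned_providers(computes, providers))  # [('c01', 'old-uuid', 'new-uuid')]
```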
Curiouser¶
Note
Not well-tested, may be slightly buggy with Chameleon phase 2 updates.
curiouser
Displays Ironic nodes that are in an error state, but not in maintenance. The Ironic Error Resetter can fix some error states automatically.
Metadata Sync¶
Synchronizes the metadata contained in the G5K API to Blazar’s “extra capabilities”. Keys not in Blazar are created, those not in G5K are deleted.
If using the soft removal option, you could follow up with a manual query to clean up the empty strings:
DELETE FROM blazar.computehost_extra_capabilities WHERE capability_value='';
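The create/delete behavior described above is, at heart, a dictionary diff; a toy sketch with invented capability names (not the actual G5K or Blazar schemas):

```python
def sync_plan(g5k_caps, blazar_caps):
    """What to create, update, and delete so Blazar's extra
    capabilities match the G5K metadata."""
    create = {k: v for k, v in g5k_caps.items() if k not in blazar_caps}
    update = {k: v for k, v in g5k_caps.items()
              if k in blazar_caps and blazar_caps[k] != v}
    delete = [k for k in blazar_caps if k not in g5k_caps]
    return create, update, delete

g5k = {"gpu": "True", "ram_gb": "128"}
blazar = {"gpu": "False", "old_key": "x"}
print(sync_plan(g5k, blazar))
# ({'ram_gb': '128'}, {'gpu': 'True'}, ['old_key'])
```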
GPU Resource “Lease Stacking”¶
Puppet Directives¶
Add cronjob(s) to Puppet. These expect that the above setup is already done.
$slack_json_loc = '/root/scripts/slack.json'
$osrc_loc = '/root/adminrc'
$venv_bin = '/root/scripts/hammers/venv/bin'
cron { 'hammers-neutronreaper-ip':
command => "$venv_bin/neutron-reaper delete ip 14 --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-neutronreaper-ip",
user => 'root',
hour => 5,
minute => 20,
}
cron { 'hammers-ironicerrorresetter':
command => "$venv_bin/ironic-error-resetter info --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-ironicerrorresetter",
user => 'root',
hour => 5,
minute => 25,
}
cron { 'hammers-conflictmacs':
command => "$venv_bin/conflict-macs info --slack $slack_json_loc --osrc $osrc_loc --ignore-from-ironic-conf /etc/ironic/ironic.conf 2>&1 | /usr/bin/logger -t hammers-conflictmacs",
user => 'root',
hour => 5,
minute => 30,
}
cron { 'hammers-undeadinstances':
command => "$venv_bin/undead-instances info --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-undeadinstances",
user => 'root',
hour => 5,
minute => 35,
}
cron { 'hammers-orphanresourceproviders':
command => "$venv_bin/orphan-resource-providers info --slack $slack_json_loc 2>&1 | /usr/bin/logger -t hammers-orphanresourceproviders",
user => 'root',
hour => 5,
minute => 40,
}
cron { 'hammers-gpuleasestacking':
command => "$venv_bin/lease-stack-reaper delete --slack $slack_json_loc 2>&1 | /usr/bin/logger -t hammers-leasestacking",
user => 'root',
hour => 5,
minute => 40,
}