Management Scripts (“Hammers”)

This is a collection of scripts used to fix inconsistencies in the various OpenStack services. They are not particularly bright tools, but they are good for a first pass.

The tools are run in a Python virtualenv to avoid package clashes with the system Python. On the CHI@TACC and CHI@UC controllers, they live at /root/scripts/hammers/venv. The scripts can be called directly without any path shenanigans by providing the full path, e.g. /root/scripts/hammers/venv/bin/conflict-macs info, and that is how the cronjobs do it.

Setup

As mentioned in the intro, the scripts are run in a virtualenv. Here’s how to set it up:

  1. Get code
mkdir -p /root/scripts/hammers
cd /root/scripts/hammers
git clone https://github.com/ChameleonCloud/hammers.git
  2. Create environment
virtualenv /root/scripts/hammers/venv
/root/scripts/hammers/venv/bin/pip install -r /root/scripts/hammers/hammers/requirements.txt
/root/scripts/hammers/venv/bin/pip install -e /root/scripts/hammers/hammers

Note

Because the hammers repo was installed with -e, routine updates can be done by cd-ing into the directory and running git pull. Updates that change script entry points in setup.py will also require re-running the pip install.
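For example, a routine update (using the paths from the setup above) could look like:

cd /root/scripts/hammers/hammers
git pull
# only needed if entry points in setup.py changed:
/root/scripts/hammers/venv/bin/pip install -e /root/scripts/hammers/hammers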

  3. Set up credentials for OpenStack and Slack

The Puppet cronjobs have a configuration variable that points to the OS shell var file, for instance /root/adminrc.
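Such an rc file is a standard OpenStack openrc; a minimal sketch (all values here are placeholders) might look like:

export OS_AUTH_URL=https://keystone.example.org:5000/v3
export OS_USERNAME=admin
export OS_PASSWORD=super-secret
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_REGION_NAME=RegionOne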

There is also a file for Slack vars, e.g. /root/scripts/slack.json. It is a JSON file with a root key "webhook", a URL (keep it secret!) to post to, and another root key "hostname_names" that maps FQDNs to pretty names.

Example:

{
    "webhook": "https://hooks.slack.com/services/...super-seekrit...",
    "hostname_names": {
        "m01-07.chameleon.tacc.utexas.edu": "CHI@TACC",
        "m01-03.chameleon.tacc.utexas.edu": "KVM@TACC",
        "admin01.uc.chameleoncloud.org": "CHI@UC"
    }
}

Running

You can either activate the virtualenv (source venv/bin/activate) to put the scripts on your path, or execute them directly out of the directory, e.g. venv/bin/neutron-reaper.

Common Options:

  • --slack <json-file> - if provided, path to the Slack settings JSON used to post notifications to Slack
  • --osrc <rc-file> - alternate way to feed in the OS authentication vars
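For example, a dry run of one of the scripts with both options, mirroring how the cronjobs below invoke it:

/root/scripts/hammers/venv/bin/ironic-error-resetter info --slack /root/scripts/slack.json --osrc /root/adminrc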

Script Descriptions

Neutron Resource “Reaper”

neutron-reaper {info, delete} {ip, port} <grace-days>

Reclaims idle floating IPs and cleans up stale ports.

Required arguments, in order:

  • info to just display what would be cleaned up, or delete to actually clean it up.
  • ip to consider floating IPs, or port to consider ports.
  • <grace-days>: a project must have been idle for at least this many days before its resources are reclaimed.
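For example (hypothetical invocations, assuming the virtualenv is activated):

neutron-reaper info ip 14      # show floating IPs held by projects idle for 14+ days
neutron-reaper info port 14    # same, but for ports
neutron-reaper delete ip 14    # actually reclaim the idle floating IPs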

Conflicting Ironic/Neutron MACs

conflict-macs {info, delete} ( --ignore-from-ironic-config <path to ironic.conf> | --ignore-subnet <subnet UUID> )

The Ironic subnet must be provided, either directly by its ID or determined from an Ironic config file; otherwise the script would consider the ports on that subnet to be in conflict.
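For example, either of these dry-run forms (matching the usage above) should work, depending on whether you point the script at the Ironic config or name the subnet directly:

conflict-macs info --ignore-from-ironic-config /etc/ironic/ironic.conf
conflict-macs info --ignore-subnet <subnet UUID>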

Undead Instances

Sometimes Nova doesn't tell Ironic that the instance on a node has gone away; the next time Ironic tries to deploy to that node, it fails.

undead-instances {info, delete}

Running with info displays what it thinks is wrong; running with delete clears the offending state from the nodes.

Ironic Node Error Resetter

Basic Usage:

ironic-error-resetter {info, reset}

Resets Ironic nodes that are stuck in an error state with a known, common error. It started out looking for IPMI-related errors, but it isn't intrinsically specific to them over any other error that shows up on the nodes. It records the resets it performs in the node metadata (extra field) and refuses to act after some number (currently 3) of accumulated resets.
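To see the reset history recorded for a node, you can inspect its extra field by hand, e.g. with the standard OpenStack client (the node UUID is a placeholder):

openstack baremetal node show <node UUID> -f value -c extra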

Currently watches out for:

Dirty Ports

Basic Usage:

dirty-ports {info, clean}

There was/is an issue where a non-empty value in an Ironic node port's internal_info field would cause new instance deployments to that node to fail. This script notifies (info) or cleans up (clean) such ports on nodes that are in the "available" state.
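To inspect a suspect port by hand, something like the following (standard OpenStack client; the port UUID is a placeholder) shows the field in question:

openstack baremetal port show <port UUID> -f value -c internal_info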

Orphan Resource Providers

orphan-resource-providers {info, update}

Occasionally, compute nodes are recreated in the Nova database with new UUIDs, but resource providers in the Placement API database are not updated and still refer to the old UUIDs. This causes failures to post allocations and results in errors when launching instances. This detects the issue (info) and fixes it (update) by updating the uuid field of resource providers.
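Conceptually the fix amounts to an UPDATE along these lines against the Placement tables (a sketch only; it assumes the resource providers live in the nova_api database, and the UUIDs are placeholders; the script performs the equivalent for you):

UPDATE nova_api.resource_providers
SET uuid = '<new compute node UUID>'
WHERE uuid = '<old compute node UUID>';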

Curiouser

Note

Not well-tested, may be slightly buggy with Chameleon phase 2 updates.

curiouser

Displays Ironic nodes that are in an error state, but not in maintenance. The Ironic Error Resetter can fix some error states automatically.

Metadata Sync

Synchronizes the metadata contained in the G5K API to Blazar's "extra capabilities". Keys present in G5K but missing from Blazar are created; keys no longer in G5K are deleted.

If using the soft removal option, you could follow up with a manual query to clean up the empty strings:

DELETE FROM blazar.computehost_extra_capabilities WHERE capability_value='';

GPU Resource “Lease Stacking”

Puppet Directives

Add cronjob(s) to Puppet. These expect that the above setup is already done.

$slack_json_loc = '/root/scripts/slack.json'
$osrc_loc = '/root/adminrc'
$venv_bin = '/root/scripts/hammers/venv/bin'

cron { 'hammers-neutronreaper-ip':
    command => "$venv_bin/neutron-reaper delete ip 14 --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-neutronreaper-ip",
    user => 'root',
    hour => 5,
    minute => 20,
}
cron { 'hammers-ironicerrorresetter':
    command => "$venv_bin/ironic-error-resetter info --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-ironicerrorresetter",
    user => 'root',
    hour => 5,
    minute => 25,
}
cron { 'hammers-conflictmacs':
    command => "$venv_bin/conflict-macs info --slack $slack_json_loc --osrc $osrc_loc --ignore-from-ironic-conf /etc/ironic/ironic.conf 2>&1 | /usr/bin/logger -t hammers-conflictmacs",
    user => 'root',
    hour => 5,
    minute => 30,
}
cron { 'hammers-undeadinstances':
    command => "$venv_bin/undead-instances info --slack $slack_json_loc --osrc $osrc_loc 2>&1 | /usr/bin/logger -t hammers-undeadinstances",
    user => 'root',
    hour => 5,
    minute => 35,
}
cron { 'hammers-orphanresourceproviders':
  command => "$venv_bin/orphan-resource-providers info --slack $slack_json_loc 2>&1 | /usr/bin/logger -t hammers-orphanresourceproviders",
  user => 'root',
  hour => 5,
  minute => 40,
}
cron { 'hammers-gpuleasestacking':
  command => "$venv_bin/lease-stack-reaper delete --slack $slack_json_loc 2>&1 | /usr/bin/logger -t hammers-leasestacking",
  user => 'root',
  hour => 5,
  minute => 40,
}