All posts by Josh


Supermicro Motherboard Front Panel Header JF1

I’ve had the chance to build a few AMD-based Supermicro boxes over the years. Supermicro makes server cases, motherboards, and other datacenter equipment. Because they provide both cases and motherboards, they’ve standardized on a front panel header that works, I presume, with all of their products. However, I’ve only ever used alternative ATX rackmount cases, so I’ve been left to wire up the front panel headers by hand.

Unfortunately, the pin diagram for the Supermicro front panel header on some of their boards isn’t what I’ve come to expect from years of building boxes. Most motherboards label Signal+ and Signal- for everything relevant on the front panel, but the front-panel header on the H11SSL Supermicro boards looks like this:

The power and reset buttons are labeled as expected: Power+ and a nearby ground. The LEDs, however, are each listed with an adjacent 3.3V power pin. Typically I would expect to see LED+ and LED-, or LED and ground, so for a while I avoided plugging in the LEDs rather than risk frying a case panel or motherboard part.

A short time ago, I looked up pinouts for other Supermicro boards (since the connector is standard) and found the explanation. Here’s a description of the C7Z97 motherboard header pinout:

This makes it more explicit. Each LED listed is LED- while the 3.3V pins are intended to be LED+.

This article also backs up my findings: https://www.unixgr.com/pinout-for-supermicro-fp836-front-panel-connector/

To switch naming conventions within a diagram seems odd, but now it’s clear. Maybe someone else will find this useful.

Optimizing RandomX: Loop Invariant Extraction

As seen in my last post, most of the RandomX execution time is spent executing the randomly generated program. Each instruction in a loop executes an average of 620 times within a hash round, for roughly 1.2 million executions in total, versus the 2048 executions of every other instruction.

Roughly half, or 128, of the instructions end up in loops.

Invariant Candidates

RandomX instructions are designed to mix inputs and outputs. This means the output of most instructions won’t be invariant, but there are a couple instructions where the inputs and outputs are independent:

  • CFROUND
  • ISTORE

CFROUND sets the floating point rounding mode based on an input register and the immediate instruction bits. If the input register is unchanged between calls to the same instruction, then the same rounding mode is set each time.

ISTORE takes 2 input registers, src and dst, and writes src to a memory location dependent on dst and other bits within the instruction. If src and dst are unchanged between calls to the same ISTORE instruction, the same value is written to the same location.

Invariant Rules

The rules for when an instruction, inst, can leave the loop are roughly as follows:

  • If the output of inst is never used in the loop AND the inputs never change in the loop, then inst can be moved off the back of the loop — this is obviously invariant.
  • If the output of inst is never used in the loop AND the inputs change, but entirely before the instruction AND the outputs are written completely, then inst can be moved off the back of the loop. If all input changes happen before the instruction, then the output will be set multiple times, but only the last one matters because the output doesn’t get used within the loop. This only works, though, if each write completely overwrites the previous. This is important for ISTORE because changing memory addresses means that each write is contributing to the state of the hash rather than just the last one.
  • If the inputs of inst never change AND inst is the only instruction that sets its output AND the output is either never used in the loop OR it’s used entirely after inst, then inst can be moved off the front of the loop.

These rules don’t catch every case in which one of these instructions could move, but they get most of the candidates that I manually confirmed. If you know of something big that I missed, please leave a comment.
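
To make the rules concrete, here’s a rough Python sketch of the checks over a simplified model of a loop body. The Instr structure and register names are mine for illustration; this is not RandomX’s internal representation, and the register-only model glosses over the requirement that each write fully overwrite the previous one (which matters for ISTORE’s memory addressing).

from dataclasses import dataclass, field

# Hypothetical, simplified instruction model: just the registers read and written.
@dataclass
class Instr:
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)

def can_move_to_back(loop, i):
    """Rules 1 and 2: the output is never used in the loop, and any writes to
    the inputs happen strictly before this instruction."""
    inst = loop[i]
    output_used = any(inst.writes & other.reads for other in loop)
    inputs_written_at_or_after = any(inst.reads & other.writes for other in loop[i:])
    return not output_used and not inputs_written_at_or_after

def can_move_to_front(loop, i):
    """Rule 3: the inputs never change, this is the only writer of its outputs,
    and any readers of the outputs come after it."""
    inst = loop[i]
    inputs_change = any(inst.reads & other.writes for other in loop)
    other_writers = any(inst.writes & other.writes
                        for j, other in enumerate(loop) if j != i)
    used_before = any(inst.writes & other.reads for other in loop[:i])
    return not inputs_change and not other_writers and not used_before

# Tiny example: instruction B's input is never written and its output is never
# read inside the loop, so it can be moved out either way.
loop = [Instr("A", reads={"r0"}, writes={"r1"}),
        Instr("B", reads={"r2"}, writes={"mem"})]
print(can_move_to_back(loop, 1), can_move_to_front(loop, 1))  # True True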

Results of Instruction Moving

When applied, these rules move an average of 2.5 instructions per program out of RandomX loops. There’s a lot of variability in these programs, but using the averages stated above, this should reduce hash runtime from (128 x 620 + 128) x 2048 to (125.5 x 620 + 130.5) x 2048. That’s an expected improvement of ~2% in hash program execution time.
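
A quick back-of-the-envelope check of that figure, using only the averages quoted above:

# Expected gain from moving 2.5 of the 128 loop instructions out of their loops.
before = (128 * 620 + 128) * 2048
after = (125.5 * 620 + 130.5) * 2048
print(1 - after / before)  # ~0.019, i.e. roughly 2%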

However, when measured, no noticeable gain is seen. In fact, the runtimes between two versions — one that moves instructions and one that doesn’t — are virtually identical. The variance between runs is less than 0.5%, so a 1-2% change should have been apparent.

This surprising result led me to reconsider what the system hardware might be doing. It’s possible that the hash is bound by memory bandwidth rather than instruction throughput, meaning that between my standard and “optimized” runs the execution time might be the same while the processor might be doing less. This might show up in power consumption measurements, but that measurement is not something I’m set up to do.

The Cost

Unfortunately, the analysis and reordering of the instructions takes some work, and while I probably don’t have a particularly fast implementation, the work required increases program compilation cost by 4-5x. That makes this first optimization an overall loser, even if the expected results had been seen: for a 2% improvement in program execution time, compilation time would need to stay within 2x of the original. However, once the analysis is being done, it can potentially be applied to other things.

A Look At RandomX

I recently dug into the new proof-of-work algorithm powering mining for the Monero crypto-currency, RandomX. It was rolled out to the Monero blockchain early in December of 2019, replacing the CryptoNight algorithm that had been used before it. Several other crypto-currencies are prepping to follow Monero down the RandomX path, so I thought it might be worth investigating. For more background on proof-of-work algorithms, see here.

One of the problems with proof-of-work algorithms and crypto-currencies is that the money reward for faster processing creates a positive feedback loop for the first ones to optimize. The first optimization turns into an increase in income, which then makes the next optimization easier, and so on. This usually results in most of the mining rewards going to a few large miners while being prohibitively expensive for a newcomer to get started.

Several crypto-currencies have made it a point to ensure that consumer hardware is competitive when mining to keep the mining workload distributed, which theoretically makes the crypto-currency more secure and definitely makes it more approachable. Ethereum and Monero have been particularly good at this with Ethash and CryptoNight, respectively, working well via consumer GPU computing and resisting FPGA or ASIC optimizations.

RandomX is the latest attempt, and it initially appears to be resistant even to GPU optimization. It does this by making maximum use of the typical CPU architecture, so that effective hardware optimization virtually requires designing a full processor. This is a costly process which standard CPU manufacturers offset with volume sales.

Performance

Here’s a quick look at the performance of this algorithm on three different systems:

  • AMD Ryzen 2700 w/ 2-channel DDR4 3200 CAS 14
  • AMD Ryzen TR 3970X w/ 4-channel DDR4 3200 CAS 14
  • AMD EPYC 7502P w/ 8-channel DDR4 3200 CAS 24 (Buffered ECC)

I built the xmr-stak-rx 1.0.4 miner from source, which can be found here. Make sure to apply the large memory page optimizations. You’ll need at least 4GB RAM and 2MB L3 cache.

System            Performance (kH/s)    System Power (W)
Ryzen 2700        4.8                   105
Ryzen TR 3970X    24                    355
EPYC 7502P        28                    235

Structure of a Hash

The phases of the hash proceed as:

  • Initialization: AES-based scratchpad (2MB) fill
  • Program generation – AES-based hash (typically hardware accelerated)
  • Program compilation (for JIT-enabled hasher, much slower without)
  • Program execution
  • Results mixing

Here’s a breakdown of where the per-hash time is spent on the Ryzen 2700:

Phase                  Percent of Execution
Program Generation     0.24
Program Compilation    1.96
Program Execution      93.84
Results Mixing         3.95

As we can see, the majority of time is spent in program execution. If we’re going to make significant improvements, it’s likely going to be there.

The number of things happening looks roughly like this:

  • 8 programs generated and executed
    • 2048 iterations of
      • 256 randomly generated instructions including an average of 25 loop instructions
        • Average of 620 iterations per loop instruction (measured)

Each instruction in an inner loop runs an average of 1.2 million times. Every other generated instruction runs 2048 times. If we can find a clever way to move these instructions out of their loops, or remove them entirely, there may be noticeable savings.

In the next couple entries, I’ll discuss the results of attempts at improving hash execution time.

Management Systems and Software Engineering Similarities

I split my time between software development and management system development, and the similarities have never been clearer.

Business is about getting things done in a sustainable way. Improvement is almost always the name of the game. Computer science is all about solving problems with the fewest resources. Software engineering is the process by which software eventually achieves that goal. These things could not fit together much better.

Even the terminology matches. Operating-system procedures and processes match the definitions of their management system counterparts. This terminology was almost certainly adopted deliberately by the original operating systems developers, since their goal was to apply software to business problems. The major distinction is that one executes via teams of people whereas the other executes on machines.

Formal Management Systems focus on establishing documented activities, defining how to change them, and determining how to evaluate them.

There are so many analogues to software engineering. The documentation system is the source control system (like git). The training system is the compilation and deployment process (note: “training” computer systems is much more predictable). Complaint handling and CAPA are your bug tracking system, and the internal audit is basically the regression test suite. The Top Management functions, choosing process direction and evaluating effectiveness, are handled by the Product Owner.

Formal Management System procedures could be treated as software, with some level of unit and regression testing. All of the lessons from the move to micro-services could be very quickly applied to management system processes including: error management, fault tolerance, and monitoring.

All of this is not to overlook the extra complexities of management systems where people are the primary executors. I just happen to be less familiar with those.

Thoughts on ReactJS

I’ve spent some time with ReactJS recently (and in the past), and I thought I would make some notes.

ReactJS is a UI management framework created for managing the presentation and interaction of UI components on a web app (or native mobile applications with ReactNative). It manages the structural document changes behind the scenes and renders them to the browser/screen.

There is a paradigm called Model-View-Controller (MVC) for UI development. The Model represents the data layer, the View represents its display, and the Controller represents the code used to mediate the two. I’ve said in the past that if the Model were rich enough, the Controller would be unnecessary.

ReactJS fills at least the View side of the MVC paradigm — possibly even the View-Controller. There are some standard data-management libraries that frequently accompany it, but they are optional, and I have not used them.

For each ReactJS UI component, you follow the lifecycle through the construction, presentation, maintenance, and teardown of the bits that get rendered to an HTML document.

Despite my love for the functional model, UI design is fundamentally an exercise in state management. Every user interaction comes down to changing small stateful pieces one at a time to achieve some effect. However, I would say React does a good job of delineating where state changes should happen vs. the side-effect-free computations.

The maintenance phase of each component is very unstructured. State updates happen either through internal updates (this.state) or through the owning component (this.props), and it’s up to the developer to wire the downstream effects together via JS code. The one exception to this is the update of the presentation, which always happens if the component state changes (except in rare circumstances).

In the past, I built small UIs of communicating state machines. React would have been a great tool to have to help with the management of adding and removing components, but that’s about where the benefit would end. I was ultimately going for much more granular control of the UI component interactions. I would rather spell out explicit data flows and state changes in a model than have them implicitly buried in blocks of code.

I think React has the potential to be the foundation of a much richer UI management framework. There are frameworks like AngularJS and VueJS that I’m much less familiar with that may already do what I’m looking for. I’ll have to check them out at some point. My preference for the “minimal” option took me down the ReactJS path, and I like it.

Universal Computing Interface

It’s a good exercise to ask, “What’s the smallest thing that would work?” Understanding the limits is a key step in the creation of anything.

In that spirit, what is the minimum description of software? This question has been the domain of computer language designers for decades. What’s the minimum syntax? I think Lisp can be declared the winner. What’s the minimum required description of a language? That’s still being worked out.

What is the minimum required computing interface? That one we might be able to answer. The public clouds are touching on the answer in the form of AWS Lambda or Azure Service Fabric, inspired (I assume) by the general trend toward micro-services. There is a notion of an event that can be accepted, and a pre-defined computation that should be executed with the event. In the case where no event is required, you just have a computation that needs to be run to completion.

Any computational job could be described as an event to respond to and a job to run. Event-driven programming is a powerful paradigm that has been around for a while. It requires an effective way to express kinds of events and some kind of program to accept them.
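
A minimal Python sketch of that "event plus pre-defined computation" interface might look like the following. The names are illustrative, not any particular cloud's API:

from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]
Handler = Callable[[Event], Any]

def run(handler: Handler, event: Optional[Event] = None) -> Any:
    # When no event is required, the computation simply runs to completion.
    return handler(event or {})

def resize_image(event: Event) -> str:
    # A stand-in for some pre-defined computation registered against an event kind.
    return "resized " + event.get("object", "nothing")

print(run(resize_image, {"object": "photo.jpg"}))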

This is a very light definition, so it’s an attractive interface. However, there are some aspects of computing that it doesn’t take into consideration. The biggest one is the run time of the job. For example, it’s often the case that if you can process one event in 1 second, you can process 10 events in 5 seconds. This is common in database or distributed applications where the computation involves creating lots of intermediate sets. There are caching techniques to try to mitigate the problem, but it’s often much more effective to do lots of the same things together. This suggests that the interface needs to be expanded.

OpenStack Single-Node Options

After getting a DevStack node running here, I realized that a DevStack cluster wasn’t going to be as useful as I would like. DevStack is designed for developer testing purposes, and doesn’t recover well in the event of a machine reboot.

I decided to look at some options that would lead to a cluster and, ultimately, a high-availability (HA) configuration.

Canonical conjure-up

A colleague had recommended Canonical’s conjure-up, so I decided to give the Workstation guide a try:

https://ubuntu.com/openstack/install#workstation-deployment

Initially, I hit one of three problems:

  1. neutron-gateway/0 error: hook failed: "config-changed"
    Specifically, neutron-gateway/0 didn’t like the dataport br-ex:eno2, claiming that eno2 didn’t exist. Confirmed. I looked for ways that eno2 might be expected to be generated without much luck.

    This github issue looks like it describes the problem almost perfectly (except for the name of the network interface), but the fix/correction makes no sense to me. It also looks like it was fixed in 2016, so I don’t think this is my problem.

    I took a chance on tweaking the configuration parameters and set other options for the dataport parameter. Specifying eth1 let neutron-gateway/0 complete installation, but then I typically ran into another problem with neutron-api/0. I didn’t get far in investigating this one. If I’m running into these kinds of problems following what should be a straightforward install guide, then I have no idea how deep the issues go.
  2. The machine locked up and became completely unresponsive. This was only seen on VMs (mostly 18.04), but I didn’t try much on bare metal.
  3. System setup hang: all processes were either active, blocked, or waiting — many just waiting on a machine. This was more rare.

There were regularly blocked-service errors in the conjure-up logs, leading to some confusion about the actual source of the installation error(s).

This is not a good start for Canonical. I tried their guide on 16.04 and 18.04, VMs and hardware — all with no success.

I’ve filed an issue with conjure-up: https://github.com/conjure-up/conjure-up/issues/1612

I asked a question: https://askubuntu.com/questions/1157568/how-to-install-openstack-with-novalxd-on-ubuntu-16-04

A week after filing the conjure-up issue, I’ve not heard back, and my question on askubuntu.com has gotten little attention. Other issues have been filed with exactly the same problems. This one is the most active: https://github.com/conjure-up/conjure-up/issues/1618

It seems odd to me that such simple instructions posted by a major OS vendor would be allowed to get to a state where they fail so completely.

Bottom line, I think the conjure-up OpenStack Workstation install is busted. There was a bit of irony in this struggle due to all of the “works by magic” claims with conjure-up and juju. Their magic lacks some potency.

The only good thing to come out of this was that I figured out how to interact with juju — a service that looks a bit like a Puppet/Chef/Ansible installer/manager. It looks like it has support for all kinds of different cloud types and vendors, but if those integrations work like this busted localhost install, I’m not sure that I’ll get much out of them.

There is still the conjure-up cluster installation (which was the original recommendation to me). More on that coming soon.

PackStack

I mentioned this as an option in the first article, and since I was fed up with conjure-up, I decided to look at OpenStack on CentOS.

I’m using CentOS 7 on a 6 core, 24 GB RAM, 128 GB disk VM.

I found this guide: https://linuxhint.com/install_openstack_centos/
The only adjustment needed was to run:

yum install -y yum-utils

so that the yum-config-manager step will work correctly.

I have tried this with both the rocky and stein OpenStack versions, but not queens as indicated in the instructions. Just use rocky or stein wherever queens is mentioned.

Success!

That’s how it should be done. The install takes around 24 minutes, faster than the DevStack install.

The system survives reboots out of the box, which might make it a better candidate for more involved tests than DevStack. It would certainly behave better as a test system on hardware.

Unfortunately, when looking closer into clustering and HA configurations with packstack, it starts to look a little weaker.

Mentions

Throughout my investigation some other OpenStack options have popped up:

  • Mirantis – This is a professional OpenStack (and then some) system. It looks like these guys know what they’re doing, and they charge accordingly. There might have been a free evaluation option, but I prefer to try things that are unencumbered by licenses.
  • MicroStack – This name popped up in the AskUbuntu question. I don’t know much about it, except that it doesn’t sound like it supports a high-availability configuration yet, though they say that’s slated for 2020/2021.

Scripting

A quick test of the scripts I had written to interact with the DevStack install revealed that I had some things hard-coded for that environment. The big adjustment I had to make was in the lookup of OpenStack service endpoints. Given the OpenStack architecture, it makes sense that communication points could differ across clusters. I wrote a bit about that here.

Next Steps

Several months into this project, I have a couple of ways to set up a demonstration installation of OpenStack on a single node, Python scripts to interact with it in some basic ways, and a technique for delivering data for project testing purposes. However, I still don’t have a system that takes advantage of the primary OpenStack capabilities: compute pooling across multiple hardware nodes. More on that to come.

OpenStack Python API Service Catalog

How do you get an OpenStack service catalog via the Python API? It took me way too long with too much code digging to figure it out, so I’ll share.

Note that the following was tested on versions rocky and stein.

The way that works for anyone who can log into the identity node:

from keystoneauth1 import loading, identity, session, exceptions

auth = identity.Password(
    auth_url,
    username=username,
    password=password,
    user_domain_id="default",
    project_domain_id="default",
    project_name=project_name )
sess = session.Session( auth=auth )

# This step was not obvious. It doesn't seem 
# to be directly doable from the Session, 
# which seems like the more obvious approach.
auth_ref = sess.auth.get_auth_ref(sess)
catalog = auth_ref.service_catalog.get_endpoints( interface="public" )
service_endpoints = {}
for s_name in catalog:
    s = catalog[s_name]
    service_endpoints[s_name] = s[0]["url"]

# Here's a dictionary of service-type -> endpoint URL
service_endpoints

This was actually the second technique discovered. The technique below was the first, but it only works for users that have admin privileges to a project and (I think) reader privileges to the project services.

from keystoneauth1 import loading, identity, session, exceptions
import keystoneclient

auth = identity.Password(
    auth_url,
    username=username,
    password=password,
    user_domain_id="default",
    project_domain_id="default",
    project_name=project_name )
sess = session.Session( auth=auth )

keyclient = keystoneclient.client.Client(
    "3.0",
    session=sess )

service_endpoints = {}
service_list = keyclient.services.list()
for s in service_list:
    endpoints = keyclient.endpoints.list( enabled=True, service=s.id, interface="public" )
    service_endpoints[s.type] = endpoints[0].url

# Here's a dictionary of service-type -> endpoint URL
service_endpoints

Note that the URLs from this technique will not include project IDs; instead they include a token to be replaced: %(tenant_id)

I have opened a question on Ask OpenStack that might eventually result in a better option: https://ask.openstack.org/en/question/123404/how-to-get-service-catalog-with-python-api/#123445

Formal Management System Core

Now that I’ve spent a few years working with and thinking about formal management systems like ISO 13485, ISO 27001, and others, I think I would break them down into two general tiers: the Management System Core and Domain Recommendations. In this case, I mean “domain” in the sense of: quality, information security, power systems, vulnerability disclosure, risk, etc.

The Management System Core answers the question: “What is at the heart of management?” It would be comprised of the Top Management, General Documentation, and Feedback/Improvement clauses of the typical ISO document. These are the foundation from which all the others could be derived. Most of the non-core aspects of the standards seem to be little more than best practices, and they’re the parts that change the most between updates.

The Domain Recommendations (requirements) are essentially good practices for the domain — at least at the time of the publication. Quality System Domain Recommendations include things like infrastructure and engineering processes. Security Domain Recommendations include standard security controls.

If your domain doesn’t have an explicit standard, applying the core will eventually get you to a good point. Applying the bare core to domains with existing standards may, with time, produce a result better than the standard itself; technologies and techniques get better.

OpenStack Images and Volumes

I’m starting from where I left off in my last article here. I have a single OpenStack test node running in a virtual environment hiding behind a firewall through which SSH is the only access.

All of the following was done with DevStack commit 984c3ae33b6a55f04a2d64ea9ffbe47e37e89953, which is roughly OpenStack Stein (3.18.0). Note, this version was in development at the time of writing.

Test system details:

  • 2 core
  • 4 GB RAM
  • 80 GB disk

I continue to search for, or build, a set of tools that can quickly set up and tear down VMs for testing purposes. Testing via VMs requires getting a base test image started, scripted configuration, kicking off the test sequence, and pulling the results. Manual steps are a no-go. Speed is preferable, but it needs to work first.

Image Exploration

Ubuntu Cloud Image – Ubuntu Server the Easy Way

Conveniently, Canonical provides a cloud-ready set of Ubuntu images here: http://cloud-images.ubuntu.com/releases/

I haven’t found a smaller version of Ubuntu anywhere else.

The VMDK image, https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.vmdk, works. The image has a virtual size of 10GB.

The first attempt to start a server with the image fails on my test system: it times out in the downloading state after ~3 minutes. The volume eventually gets created. If you remove the instance and volume and try again, the second attempt works in less than 10 seconds. I suspect the originally loaded image gets cached for reuse.

Note that the above timeout issue happens for every image that gets loaded: the first attempt times out, and the second attempt completes quickly provided the first volume was completely constructed.

Create Your Own Ubuntu Server Image

In some cases, the Ubuntu cloud image doesn’t quite do what you need. In that case it’s nice to be able to create your own images. This describes how to do that from the ground up.

There is a writeup of how to create an Ubuntu image suitable for OpenStack here: https://docs.openstack.org/image-guide/ubuntu-image.html

I think it’s noteworthy that they don’t recommend the OpenStack interface for doing this. I opted to do this through virt-manager and KVM.

The virt-sysprep step yielded an error for me due to a missing /var/lib/urandom/random-seed file. Running the following commands on the created volume clears that up:

mkdir /var/lib/urandom
touch /var/lib/urandom/random-seed

Another step missing from the above instructions is how to prepare an image to use cloud-init when cloud-init is already active. The important missing steps are:

  • If the volume has already been booted with cloud-init:
    • Either: run dpkg-reconfigure cloud-init and remove the NoCloud data source. You may also want to remove all other data sources that you know you won’t need.
    • OR: delete the /var/lib/cloud/seed directory so that local configuration data isn’t found. The seed data might be a useful reference, so I use dpkg-reconfigure.
    • Run cloud-init clean
  • Adding the users in the /etc/cloud/cloud.cfg 'users' section doesn’t change the default user for which passwords and SSH keys get set. Do this by changing system_info: default_user: name: …

Note that once all of the above steps are done, the image will not boot again without a configuration service. That makes it a little awkward if you want to adjust the image configuration or upgrade packages. So I did this in two stages:

  1. Install the OS and relevant applications.
  2. Copy image and do the cloud-init prep.

That way I can always easily get back into the original image.

Following the above OpenStack Ubuntu image guide results in an image with a virtual size of 6GB, which is the smallest disk size accepted by the Ubuntu 18.04 Server installer.

Use:

qemu-img convert -O qcow2 <input-volume> <output-volume>

to bring the image size down to less than 3GB.

Note that it is possible to boot from OpenStack volume snapshots, so for the purposes of creating a server, they’re indistinguishable from images. A custom configuration of an Ubuntu cloud image (turned volume) can be snapshotted to function similar to the above.

Create Your Own Server Image – Naive Way

This was the first way I attempted to create a usable Ubuntu Server image, but I mention it last because it was awkward and didn’t work as well for updating the image. However, if all you have is OpenStack, this will work fine.

Creating a new image from an Ubuntu installation ISO is straight-forward from the OpenStack Horizon UI. Go to Compute -> Images, click Create Image and follow the instructions. This has to be done with an admin account to make it a public image.

After that, create a server instance using the new ISO image, create a volume large enough for the installed base (>6GB for Ubuntu 18.04 Server), attach that new volume to the installer instance, and reboot it if necessary to make the volume appear on the instance. Then go through the installation process.

Detach the volume with the installed system, destroy the installation server, and create a server that boots directly from the Ubuntu 18.04 Server volume. This will allow us to do some direct configuration of the volume before turning it into an OpenStack image. cloud-init will have been installed and initially configured by the Ubuntu 18.04 Server installer. Follow the steps in the previous section to prep cloud-init to work with OpenStack on this new image.

Creating an image from a volume is fairly straight-forward. There is an option in the OpenStack UI, Upload to Image. Make sure to specify 6GB as the minimum volume disk size.

Note: If you don’t have enough storage to make any of the above work, OpenStack doesn’t give you much, if any, warning. Neither the UI nor the cinder upload-to-image command gives any notice about running out of space. The operation just silently fails. A little more on that below.

You might hold onto the original Ubuntu 18.04 Server installation volume because iteratively creating volumes from images and images from volumes seems to cause the sizes to grow. By keeping the original volume, you can go back to the original installation to apply package updates.

Create a Desktop Image

For testing of desktop applications, it’s helpful to have Ubuntu Desktop automatically log into a user account and have an autostart script kick off the testing job.

Auto-login via the UI settings is iffy when booting via OpenStack with server configuration enabled. Sometimes it goes through to the user desktop; most of the time it just stops at the login prompt. You can verify that auto-login is enabled, and which account is set up to log in, by looking at /etc/gdm3/custom.conf. Looking through the boot logs, I found in auth.log that gdm-login was failing due to:

no password is available for user.

I chalked this up to a race condition between OpenStack server configuration and the desktop boot sequence. This answer suggests an alternative way to auto-login: https://askubuntu.com/questions/1136457/how-to-restart-gdm3-with-autologin-on-18-04-and-19-04

After many testing iterations, the suggestion looks solid.

The following steps will set up an Ubuntu Desktop cloud image (note, done on KVM through virt-manager):

  • Create a 9 GB volume (smallest allowable with desktop installer)
  • Install Ubuntu Desktop 18.04 on the volume.
  • Boot into the new desktop volume
  • apt install cloud-init
  • Update /etc/cloud/cloud.cfg so that system_info: default_user: name: … points to the user set up during installation
  • vi /etc/gdm3/custom.conf
  • Add the following lines to the [daemon] section:
    TimedLoginEnable = true
    TimedLogin = <user>
    TimedLoginDelay = 1
  • Shutdown VM
  • Use qemu-img command mentioned earlier to shrink the volume.

This creates an image with a virtual size of 9GB and a real size between 6 and 7 GB. With cloud-init, all of the boot configuration gets done. Note that the user to be auto-logged in must have a password set; otherwise the auto-login feature will fail.

On my test OpenStack environment it takes 10-30 min to boot this image to the desktop, with volume cached. Directly on KVM (the system running OpenStack), it takes less than 2 min to boot to desktop. I’ll assume for now that the multi-level virtualization isn’t helping performance and revisit it when I’m working directly on metal. This might also have been causing problems with auto-login, but I’ll leave that alone since I prefer a system that works both virtualized and raw.

On-Demand Volume Creation/Mounting

I need a quick way to wrap up local data and make it available to the remote OpenStack servers.

For example: In a KVM setup, I can rsync local data to a remote directory on the KVM server and mount that directory read-only to the VM instances. OpenStack (or at least DevStack) doesn’t appear to have that capability by default — preferring to work in terms of images (glance service) and volumes (cinder service).

The OpenStack Horizon interface doesn’t allow mounting a volume read-only (or otherwise) to multiple servers. While I’ve seen some mention of multi-mounting (and possible complications), I’m going to assume that this isn’t the OpenStack design and plan to make a volume per server that requires it.

OpenStack volumes aren’t readily cloned; a snapshot or image must be made first. So going this route with a volume would require creating the initial volume, snapshotting it, and then creating a new volume from the snapshot on each server boot. Images can be created directly via the API, while snapshots appear to need a volume for reference. That makes the image workflow a little simpler, so I’ll start with that.

Since we’ll be dealing in volumes and not synchronizing directories, we will need a way to create volumes on the local dev system, preferably without requiring root.

The GuestFish set of tools can be used to create a volume in user-mode on Linux. However, the running kernel will have to be readable by the user of the tools. A script added to /etc/kernel/postinst.d might take care of this. Access to /dev/kvm is also preferred.

This demonstrates how to make GuestFish work: http://libguestfs.org/guestfs-python.3.html. Code to make a volume from a directory, .tar, or .zip file may be forthcoming.
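
In the meantime, here's a rough sketch of what that might look like with the libguestfs Python bindings. The volume path, size, and source directory are illustrative:

import guestfs

# Create a small qcow2 volume, format it EXT4, and copy a local directory in.
g = guestfs.GuestFS(python_return_dict=True)
g.disk_create("data-volume.qcow2", "qcow2", 1024 * 1024 * 1024)  # 1 GB
g.add_drive_opts("data-volume.qcow2", format="qcow2", readonly=0)
g.launch()
g.part_disk("/dev/sda", "mbr")
g.mkfs("ext4", "/dev/sda1")
g.mount("/dev/sda1", "/")
g.copy_in("./test-data", "/")  # the local directory to wrap up
g.umount_all()
g.shutdown()
g.close()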

On a relatively light ultrabook without access to /dev/kvm, creating a qcow2-formatted EXT4 volume of any size with GuestFish takes at least 15s.

Pushing the volume data as a new image works well. Once there, volumes can be created from the images. This seems like a slight twisting of the intent of OpenStack images, but I like that it’s clear that this data SHOULD NOT CHANGE. Also, volumes work in terms of GB whereas images appear to be any size. That made me shy away from volumes as the purpose of this function is to push data on-demand.

Python API calls to create the image:

volume_name = "Chosen Volume Name"
volume_file = "qcow-formatted-volume.qcow2"
glance = glanceclient.client.Client( <server-info-and-login-credentials> )
new_image = glance.images.create(name=volume_name, container_format="bare", disk_format="qcow2", min_disk=1)
with open(volume_file) as f:
    glance.images.upload( new_image.id, f )
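
Once the image is there, a volume can be created from it on demand. A sketch with the cinder client, reusing new_image and volume_name from above (the size and the credential placeholder are illustrative, mirroring the glance call):

import cinderclient.client

cinder = cinderclient.client.Client( "3", <server-info-and-login-credentials> )
# Create a 1GB volume backed by the image uploaded above.
new_volume = cinder.volumes.create( size=1, imageRef=new_image.id, name=volume_name )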

At this point, I’m satisfied that I can push data to OpenStack from a dev system.

An alternative, possibly lighter technique using shared filesystems, called Manila, can be found here: https://docs.openstack.org/manila/pike/install/post-install.html

Investigation of this will have to wait for another day.

Incremental Image/Volume Update

As noted earlier, rsync to a directory on the VM host makes for a quick way to do incremental updates to data that is to be mounted to VMs. It doesn’t look like OpenStack supports anything like this with images or volumes by default, requiring instead a complete re-upload of data.

If Manila for OpenStack works well, that might be an option.

Automated Server Creation

Arbitrary volume configurations can be passed to the server creation API (Python) novaclient.client.Client(…).servers.create( …, block_device_mapping_v2=… )

Note that you must pass a None image for this to take effect. I didn’t find any direct documentation on the format of the parameter, but the API code reveals a structure of at least:

# Note that the 'volume_size' parameter has no effect if
# 'destination_type' is 'local'.
block_device_mapping_v2 = [
    {
        'uuid': image_id,
        'source_type': 'image',
        'destination_type': 'volume',
        'volume_size': required_disk_size,
        'boot_index': 0,
        'delete_on_termination': True
    },
    {
        'uuid': other_image_id,
        'source_type': 'image',
        'destination_type': 'volume',
        'volume_size': other_image_min_disk,
        'boot_index': 1,
        'delete_on_termination': True
    }
]

source_type and destination_type are described pretty well here: https://docs.openstack.org/nova/latest/user/block-device-mapping.html
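
For completeness, a sketch of the corresponding create call. The client version string, session, flavor, and network ID are placeholders; the point here is only image=None plus the block_device_mapping_v2 parameter:

import novaclient.client

nova = novaclient.client.Client( "2.1", session=sess )
# image must be None so the block device mapping supplies the boot volume.
server = nova.servers.create(
    name="test-server",
    image=None,
    flavor=flavor_id,
    block_device_mapping_v2=block_device_mapping_v2,
    nics=[ { "net-id": network_id } ] )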

Script-based Server Removal

When creating servers via Horizon, you can set a switch to remove volumes on server destruction, which does exactly as it suggests.

Removing a server via Horizon where the volumes have been set to delete_on_termination also removes the volumes.

This is also confirmed to work when removing servers via the API
(Python) novaclient.client.Client(…).servers.delete( server )

Problems in my OpenStack Test Environment

Loss of Service Interconnectivity

If the test VM’s IP changes for some reason, all OpenStack services are lost. Service information is stored as IPs throughout the config files. Changing the config files and restarting the node doesn’t seem to correct the problem; in fact, the config files in /etc don’t seem to have any impact at all. This failure mode is possible in my DHCP-based test setup, and it hit me. A fixed IP is a must.

Server Creation Timeouts

First-time creation of a server from any sufficiently large image times out (3 min), requiring the server to be destroyed and re-created. Since servers might be created with scripts, some volume cleanup may also be necessary. The sluggishness of volume creation could just be a problem with my system, but it seems like OpenStack could manage its timeouts better. Volumes that have been cached are created within seconds.

New Problems With OpenStack

Storage Leak

It appears that the OpenStack tools can fall out of sync with the volume backing store. After several rounds of creating and destroying volumes, I find that I can’t create new volumes. The messages associated with the failure usually point to a lack of space, but I can delete everything via the OpenStack UI and the problem still isn’t corrected. From what I can tell, there is still space available on the host.

Running sudo vgs revealed that I had virtually no free space in the logical volume groups.

Running sudo lvscan shows all of the volumes taking up my volume space. If I remove one manually with sudo lvremove, I can create volumes again for a little while. After a few rounds of this, I still end up unable to create volumes, and I just end up rebuilding the test node.

As a test, I created an OpenStack volume, observed it in the lvscan list, removed it, and saw that it got removed from the volume group. Some volumes get left behind. I suspect that those volumes are part of a cache. I would expect a cache to perform better than this in low resource circumstances.

Disappearing Services

The stack test node has hit what looked like low-memory conditions and started swapping. I let the system go for a while and returned to remove the active servers and volumes. It initially appeared to be functioning again.

However, I didn’t realize that the configuration service was either gone or malfunctioning, so no new servers were going to get any bootup configuration data. This was particularly unfortunate because I was experimenting with getting a custom image configured.

Out-of-Space Errors

The messages presented via Horizon regarding why an operation failed are not the most helpful. Most of the time you are at least told there is an error with an operation. The messages tend not to point out the cause, but at least you have a point from which to dig. In some cases, such as uploading an image from a volume, the operation will just silently fail and then you’re left digging through logs to try to find out what happened.

Granted, a production environment would be independently monitoring available system storage, but a little more robustness around error reporting would be helpful.