Creating with Quality

I read a book not too long ago, Lila by Robert Pirsig, where the author describes a system for organizing information and performing work. In it he describes two parts, a change agent and a lock-in mechanism. The change agent could be anything — a person or any kind of random interaction — anything that can bump the system into a new state based on a set of rules. The lock-in mechanism is the way that changes get stored and checked for usefulness. He likens this to a ratchet where a little work can be done, checked, and stored in a state where you can leave and return to make more progress at a later time.

This applies to virtually all creation. The universe is a soup of particles being tossed about, operating under a strict set of rules:

  • Assuming, for discussion, that quantum mechanics is the base
  • Quantum wave/particle interaction yields a stable atomic system
  • Atomic interactions yield a chemical system
  • Chemical interactions yield a protein system
  • Protein interactions yield a DNA system
  • DNA manipulations yield codified social systems

I may have skipped some steps, but I think it can be seen that each layer rests on the foundations of the previous layer. Each of these systems is subject to changes from various sources. Each of these layers has a mechanism for storing and/or replicating those changes to be acted on at a later time. DNA is an incredibly rich system, but it’s nothing compared to the level at which we are (or will be) operating intellectually.

Every factory comes from a blueprint and a list of processes for creation. Stores and shops facilitate resource distribution, and offices are home to countless business-value processes. These are all improving regularly, sometimes like clockwork, at a predictable pace.

If you have a system that allows you to make changes and check them against all the expectations of the system, you can very quickly deliver new features with confidence.

This model easily applies to software development. In fact, we explicitly structure projects in this manner. The engineers and designers are the change agents, for obvious reasons. The revision control, build, and test systems are the lock-in mechanism. The software power-shops have their lock-in time frame down to hours, if that. They can make changes and push them to customers and users extremely rapidly. The trick to making this work well is a high-fidelity ratchet: your tests.
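
To make the lock-in mechanism concrete, here is a toy sketch in Python (the script and its details are my own illustration, assuming git and pytest are on the path) that only lets the ratchet advance when the test suite passes:

    #!/usr/bin/env python3
    """Toy lock-in mechanism: store a change only if the tests pass."""
    import subprocess
    import sys

    def tests_pass() -> bool:
        # A zero exit code from the test runner means every expectation held.
        return subprocess.run(["pytest", "-q"]).returncode == 0

    def lock_in(message: str) -> None:
        if not tests_pass():
            sys.exit("Tests failed; the ratchet does not advance.")
        # Store the verified state so we can leave and return to it later.
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)

    if __name__ == "__main__":
        lock_in(sys.argv[1] if len(sys.argv) > 1 else "verified increment")

Real build servers do the same thing at a larger scale; the point is that a change is only stored once it has been checked.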

Completeness is important in your testing. Missing problems and delivering them is like slipping a tooth in your ratchet. If enough teeth slip, you’ll find it gets very difficult to advance as the expectations of your system grow.

Frequently you’ll find that you need to improve the ratchet itself. Less work per stored change (architecture) or less verification time (testing overhead) will bring the ratchet cycle time down, ultimately making you more productive, and by much more than the reduced overhead alone. When an idle 15 minutes can turn into a productive 15 minutes, the frequent little boosts add up.

I personally prefer to be able to do something useful and verifiable within 30-60 minutes. If that’s not easily doable, I tend to put time into my ratchet.

The small-cycle ratcheting technique can apply to other types of creative work as well. Model simulation and rapid prototyping techniques are quickly turning relatively complex manufacturing into an everyman’s game. I think we’ll be better off for it.

Agile vs. Formal Management Systems

I have spent some time recently navigating the differences between Agile Methods and development under a Formal Management System, ISO 13485.

I’ve spent a lot of time in and leading Scrum teams. While no group I’ve ever worked with felt like they were doing “real” Scrum, they all got the basic gist of small work increments with frequent updates among the team and stakeholders. I think this model has been useful mostly because the developers feel like they know what’s happening (or they have little excuse if they don’t), and the stakeholders feel similarly in the loop. Problems get recognized quickly, and prioritizing them against other goals happens regularly. Thus a project doesn’t go too far off the rails before a correction occurs.

Scrum recognizes that the requirements, and therefore design and testing, are never completely established until the stakeholder accepts the work. Up until that point everything is considered to be in flux to some extent.

Formal Management Systems on the other hand are concerned with demonstrating effectiveness. If something isn’t written down, it doesn’t exist. ISO 13485 is particularly all-encompassing with something to say about virtually every aspect of product development.

ISO 13485 (and others) also breaks development down into phases: requirements, design, verification, validation, and manufacture (it’s a manufacturing standard), with review and approval steps for each. The formality of this sequence reads like a traditional waterfall development process and might seem jarring to an agile development group whose core tenets include “People before Process”.

One of the largest distinctions is that Agile Methods tend to be adopted voluntarily by engineering teams because they appear to provide benefits in terms of team flexibility and delivery throughput to customers. On the other hand, the only instances of Formal Management System adoption that I’ve seen are when an organization wants to demonstrate to another organization that it is using such a system. This might skew the organization’s motivations a bit and will almost certainly feel like a burden being placed on the development teams.

Does it need to be a burden? That’s what I’ve been working on.

Before I continue, note that I can’t guarantee anything here would be accepted by a Formal Management System audit. I’m not an expert on the interpretation of these standards (though I am certified “Competent” on ISO 13485:2016, for what that’s worth). Use your best judgment when implementing your system. Based on observation, auditors aren’t too particular about the techniques you use to achieve the requirements as long as you can justify your decisions to yourself and third parties.

Documentation

First of all, everything needs to be written down. Most engineering organizations have a wiki or another place to store written documents. If it has revision history and backups, you have met most of the document control requirements. In addition, you will need an approval system and a way to see which versions are approved and which is current.

Historically, this requirement would have implied a lot more paper pushing. With electronic systems, it is little more difficult than other aspects of software engineering.
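
As a sketch of the extra bookkeeping (the types and fields here are hypothetical, not prescribed by any standard), a controlled document needs little more than a version history plus an approval mark per version:

    from dataclasses import dataclass, field

    @dataclass
    class Version:
        number: int
        author: str
        approved_by: str | None = None  # None until someone signs off

    @dataclass
    class ControlledDocument:
        """Minimal document control: revision history plus approval state."""
        title: str
        versions: list[Version] = field(default_factory=list)

        def revise(self, author: str) -> Version:
            v = Version(number=len(self.versions) + 1, author=author)
            self.versions.append(v)
            return v

        def approve(self, number: int, approver: str) -> None:
            self.versions[number - 1].approved_by = approver

        def current_approved(self) -> Version | None:
            # The newest version that has been signed off.
            approved = [v for v in self.versions if v.approved_by]
            return approved[-1] if approved else None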

Training

In order to demonstrate that your team is competent to do the work they are doing, you’ll need to keep records of qualifications and training. Résumés are usually collected at interview time. After that, it’s up to management to keep track of internal training and experience. Fortunately, this can be fairly light on the team, depending on how you define competence and required training.

Most agile teams rely on a combination of review and regression testing to determine when things are going wrong. Training new employees usually consists of a “Getting Started” document, asking questions of the team, and relying on the review-and-test system to raise alarms when things are wrong. I believe writing this up as the process might be acceptable to a Formal Management System as long as you can show that you understand the risks.

On the other hand, you could introduce new training software that would be more at home in a multi-national corporation, drag your teams through reams of mundane slide-show courses, and definitively show that your organization meets the requirements. I think it’s clear which approach might grate on the company culture, but this one has the advantage of being readily demonstrable.

Requirements

ISO 13485 treats requirements gathering as a phase, but it allows for modifications provided they go through a review and approval process. Requirements modification is the normal state for an Agile team, so this could be made to fit an Agile development model.

Many Agile teams use Epics. We use Epics to represent our features and then break them down into Stories, verifiable product-change increments. Early in the Epic life-cycle, we collect a set of written requirements for the feature represented by the Epic. Thus, a list of Epics that are intended to be released is also a list of requirements.
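
A sketch of that structure (the type and field names are ours for illustration, not from ISO 13485):

    from dataclasses import dataclass, field

    @dataclass
    class Story:
        """A verifiable product-change increment."""
        summary: str
        verification: str  # how the change will be checked

    @dataclass
    class Epic:
        """A feature, its written requirements, and its breakdown."""
        feature: str
        requirements: list[str] = field(default_factory=list)
        stories: list[Story] = field(default_factory=list)

    def release_requirements(epics: list[Epic]) -> list[str]:
        # A list of Epics slated for release doubles as the requirements list.
        return [req for epic in epics for req in epic.requirements]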

Risk Management

This concept, in the context of project management and engineering, is relatively new to me. We have been using the idea of risk implicitly forever, but I had never considered making it concrete to better communicate the status of the development process and product to stakeholders. In retrospect, that feels like a big oversight.

Risk Management is a requirement of many Formal Management Systems, so there isn’t going to be a way around it. Given that, introducing Risk Management to our Agile development necessarily changes how our teams do things — but it can be Agile, and it definitely adds something valuable.

We implemented risk tracking by creating a new Risk ticket type in our ticket tracking system. Risks, at the moment, are analogous to Bugs that haven’t been observed yet. Review these periodically and raise the particularly bad risks to management, and you should have a system that conforms to at least ISO 13485.

We rolled Risk evaluation into our regular Bug evaluation cycle.
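
As a sketch (the fields and the scoring threshold are hypothetical), a Risk ticket needs little more than a Bug ticket plus estimates of severity and likelihood, so that the worst ones can be surfaced during the evaluation cycle:

    from dataclasses import dataclass

    @dataclass
    class Risk:
        """A Bug that hasn't been observed yet."""
        summary: str
        severity: int    # 1 (nuisance) .. 5 (harm)
        likelihood: int  # 1 (rare) .. 5 (expected)

        def score(self) -> int:
            return self.severity * self.likelihood

    def escalate(risks: list[Risk], threshold: int = 12) -> list[Risk]:
        # Reviewed periodically; the particularly bad risks go to management.
        flagged = [r for r in risks if r.score() >= threshold]
        return sorted(flagged, key=Risk.score, reverse=True)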

Design

Design is treated similarly to requirements and also allows for update given subsequent review and approval. Once again, design update is standard operating procedure for Agile Methods, so relevant documentation should lean on this heavily.

The collection of Stories and Risks that were produced from the Epic breakdown above are effectively our design. They list what needs to be changed, supporting documentation, how those changes should be verified, what Risks were addressed, and (more importantly) which Risks weren’t.

Manufacture

Most Agile software shops are going to be running a build server in Continuous Integration mode. Most teams I’ve worked with have done daily full test cycles, with continuous unit test cycles throughout the day.

Assuming your scripts are in good shape, this is by definition a highly controlled process. It should be relatively easy to document. Make sure you double-check that you’re verifying what you should, and keep records of test results.
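
For example, a nightly full cycle can both run the suite and keep the record (a sketch; the archive directory is made up, and pytest’s --junitxml option produces the machine-readable report):

    """Run the full test cycle and keep a dated record of the results."""
    import datetime
    import pathlib
    import subprocess

    def nightly_run(archive: pathlib.Path = pathlib.Path("test-records")) -> int:
        archive.mkdir(exist_ok=True)
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        report = archive / f"results-{stamp}.xml"
        # The retained XML report is the record of what was verified and when.
        return subprocess.run(["pytest", f"--junitxml={report}"]).returncode

    if __name__ == "__main__":
        raise SystemExit(nightly_run())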

Review

I have been a fan of and have used lightweight changelist review for years. From over-the-shoulder to pull requests, it is a staple of Agile development. I think this fulfills a form of manufacturing monitoring.

Review is a strength of Scrum and Agile Methods in general. Engineering teams meet and discuss issues and design changes (if necessary) daily. Stakeholders meet to discuss potential requirements regularly.

All of these group activities should satisfy the review requirements. Make sure to record that the meetings are happening and who is present, especially when design and requirements changes are being made.

Approval

One aspect of Formal Management Systems that may seem alien to Agile teams, or at least ours, is the requirement for approval: approval of documentation, approval of requirements, approval of design, approval of release.

It didn’t initially occur to us, but by keeping all groups in the loop on a regular basis, we are effectively collecting implied approval regularly.

How this is relevant to ISO 13485:

  • Approval of requirements changes – handled by an internal stakeholder meeting
  • Approval of design changes – handled by the Product Owner Scrum role
  • Release approval – handled by the internal stakeholder meeting

The area where we haven’t been strong on approval is changes to internal procedures. Traditionally we have trusted that our team would do the correct thing when making updates and relied on the change history when things didn’t look right. After all, if someone was doing something wrong, they weren’t likely to get far without tripping over the build process.

Summary

This approach may let Formal Management Systems and Agile Methods form a very potent combination. The responsiveness of the company, team, and product is maintained without too much impact on the cadence of the teams.

The auditors will ultimately tell us whether this approach is acceptable, but it seems reasonable.

Why I Don’t Use Debuggers

Other notable authors have already written good posts about this topic, but I was recently encouraged to write my own.

First, the title isn’t entirely true. I’ll break out gdb when I need to get the backtrace of a native application crash. I’ll do the equivalent for any runtime that doesn’t provide information about the method or function that produced the exception. However, I otherwise avoid them.

Debuggers make it possible to make sense of larger sets of code than you might otherwise be able to. This is helpful, but it can lead you into believing you can deal with more complexity than you actually can. Debuggers are a crutch that gets you past some of your limitations, but when the limitations of the debugger are reached, you may find yourself in a briar patch.

Complications

Threading and multiple processes

Stepping through multiple threads can be a bear. Threading and multi-processing in general can be dangerous to your health. I prefer concurrency models that isolate concurrency primitives to an extreme degree. I’ve not tried it, but I’ve heard good things about Threading Building Blocks.

Runtime Data Models

Investigating data in a debugger may require some familiarity with runtime representations of data structures rather than interacting with your data structures via familiar interfaces.

Additionally, the runtime data structures typically don’t have standard implementations, so they can change from version to version without warning. In fact, depending on the goals of the implementation, the underlying form of the data could change run-to-run or even within a run. I prefer to rely on the documented interfaces.

On the other hand, debuggers provide a good way to get familiar with runtime data structures.

Complex Failure Conditions

When using debuggers in my more youthful days, I found that complicated issues were easier to track down if some code changes were made to provide breakpoints under odd conditions. This seemed antithetical to the purpose of a debugger. Maybe I was doing it wrong…

Not Recorded

I’ve never seen anyone record a debugging session such that they could return to a particular debugger state. I’m sure it could be done, but I don’t know how valid that technique would be if small modifications were made to the original software and retried.

Debugger Unavailability

In some rare circumstances, debuggers aren’t available for the environment you’re using. If you haven’t developed any other techniques for finding problems, you may be stuck.

Preferred Techniques

Unit Testing

I cannot stress this enough: unit testing, done well, forces you to break your code into smaller, functional units. Small functional units are key to solid design. Think about the standard libraries in the language(s) you use: the standalone functionality, the complete independence from the operation of your code, the ease with which one could verify their correct functioning. That’s how you should be designing.
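
A small sketch of what that looks like (the function and tests are made up for illustration; any runner that collects test_* functions, such as pytest, will run them):

    # slug.py: a small, standalone unit with no dependence on the rest of the app.
    def slugify(title: str) -> str:
        """Lowercase a title and join its words with hyphens."""
        return "-".join(title.lower().split())

    # test_slug.py: verifying the unit needs no setup beyond calling it.
    from slug import slugify

    def test_slugify_joins_words():
        assert slugify("Creating with Quality") == "creating-with-quality"

    def test_slugify_collapses_extra_whitespace():
        assert slugify("  two   words ") == "two-words"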

Note that once you know the method/function in which the fault happened, you’ve narrowed the problem down significantly. If you did a good job at functional decomposition, you’re typically only a few steps from the source of the problem. If not, judicious application of the other techniques will tease it out.

If you find that the behavior of your module is too complicated to easily write unit tests, that may be a sign that your module is too big. On rare occasion, the input domain is so rich with varying behaviors that directly testing the interesting input combinations is impractical. Those can be managed with more advanced testing techniques; see QuickCheck.
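
In the same spirit as QuickCheck, here is a sketch using the Hypothesis library for Python, reusing the hypothetical slugify() from above. Rather than enumerating inputs, you state properties that must hold for all of them:

    from hypothesis import given, strategies as st
    from slug import slugify

    @given(st.text())
    def test_slug_never_contains_whitespace(title):
        assert not any(c.isspace() for c in slugify(title))

    @given(st.text())
    def test_slugify_is_idempotent(title):
        assert slugify(slugify(title)) == slugify(title)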

Invariant Checks

When code separation proves difficult because of a lack of regression testing around the parts you’re changing, invariant checks can be used as a temporary shim. These are tests that run inline with the code of interest, checking conditions usually before and after a method/function call or a block of code.
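
A sketch (the function and its invariants are made up for illustration):

    def transfer(accounts: dict[str, int], src: str, dst: str, amount: int) -> None:
        """Move funds between accounts, with inline invariant checks."""
        total_before = sum(accounts.values())
        assert amount > 0, "invariant: transfers are positive"
        assert accounts[src] >= amount, "invariant: no overdrafts"

        accounts[src] -= amount
        accounts[dst] += amount

        # Post-condition: the operation conserves the total balance.
        assert sum(accounts.values()) == total_before, "invariant: money conserved"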

The invariant code can form some of the foundational functions when you do eventually get around to creating the unit tests for your new module.

Print/Log Debugging

The dreaded printf() debugging! This isn’t ideal, but it can give some quick information about a problem without much fuss. If you find you’re adding hundreds of printf() calls to track something down, I might suggest that you’re using some of the other techniques inadequately. If that’s the position you’re in, a debugger might actually be a benefit.
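
A step up from bare print() calls is the standard logging module, which can stay in the code and be silenced later instead of deleted (a sketch; the function is hypothetical):

    import logging

    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger(__name__)

    def parse_record(line: str) -> list[str]:
        fields = line.split(",")
        # Quick visibility into the problem without a debugger; later, raise
        # the level to WARNING rather than deleting the statements.
        log.debug("parsed %d fields from %r", len(fields), line)
        return fields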

Note that, again, all code written to try to weed out your problem may be valuable for one or more tests.

I’ve also used ring buffers embedded in devices that store the last 1000 or so interesting events so that when an obscure failure happens, there exists some record of what the software was doing at the time.
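
In Python, the same idea is a bounded deque (a sketch; the capacity matches the device buffers described above):

    from collections import deque
    from datetime import datetime

    # Keep only the last 1000 interesting events; old entries fall off the end.
    events: deque = deque(maxlen=1000)

    def record(event: str) -> None:
        events.append(f"{datetime.now().isoformat()} {event}")

    def dump_on_failure() -> str:
        # When the obscure failure finally happens, this is the flight recorder.
        return "\n".join(events)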

Where Debuggers Shine

Compiler debugging. This is an absolute pain without a debugger, and I wouldn’t recommend attempting it without one.

Heisenbugs: those bugs that disappear with any change to the code. A debugger is about the only way to attempt to get a look at one. These bugs usually warrant pulling out all the stops. Fortunately, many of the newer languages have eliminated whole classes of these bugs. Good riddance.

Other Tools

I do appreciate other tools like code checkers and profilers. They usually work without much input and communicate their results in terms of the language they’re checking. I’m a fan of this model.

A seemingly close relative of debuggers, REPL tools look promising. I’ve never used them, but they appear to operate almost entirely in terms of the language they’re debugging, which fits the same model.

Summary

I prefer debugging techniques that produce additional useful work and provide support for refactoring if the situation warrants it. Every bug potentially reveals a need to re-implement or re-design. Debuggers feel more like patchwork.