Basics: Revision Control

Engineering disciplines have been using revision control techniques to manage changes to their documents for decades if not centuries.

With computers we get new, automated, and more comprehensive techniques for doing this.

Advantages

You have a history of changes. Where did that file go? Look through the history. What changed between this revision and the last? Some revision control tools will show you exactly what changed.

Revision control can coincide with and support your backups, and some tools do so completely passively. Depending on how frequently you do backups, you may already effectively have a form of revision control, though some of the related features might be difficult to get.

For engineers and other professions, it’s sometimes incredibly important to know what you were seeing at a particular point in time. Some revision control tools give you exactly that.

Disadvantages

Space. Recording revisions usually requires extra disk space. Disk space is relatively cheap, though, and most revision control systems are far more space-efficient than keeping multiple full copies of the same file(s).

Complexity. Controlling revisions isn’t as simple as just writing contiguous bytes to disk. Then again, filesystems aren’t generally as simple as just writing contiguous bytes to disk either. Modern tools and hardware are robust enough that this is usually a minor concern.

Classification

There are probably two broad categories into which revision control tools can be divided, passive and active. Each has its own advantages and disadvantages.

Passive

As the heading suggests, these revision control systems require minimal interaction to make them do their job.

Some passive revision control systems include:

  1. Dropbox – Look at the web UI. You can easily find older revisions of files that you’ve stored.
  2. ownCloud – This is an open source, self-hosted web tool similar to Dropbox. It’s supported on all major operating systems and has apps for every major mobile OS. I use this at home.
  3. Apple Time Machine – Apple provides a way to periodically back up to a secondary drive and keeps the revision history of those files. There are similar tools for Windows.
  4. Copy-on-Write Filesystems (CoW) – Several filesystems offer revisioning as a core capability based on their underlying data model.

Most of these will not record every change. Instead, they catch only changes that occur less often than some roughly defined snapshot frequency, so several quick edits in a row may be collapsed into a single revision. Revision recording throughput is affected by the number of files that have changed and the total amount of extra data that would need to be stored. Because these snapshots are taken without user intervention, there’s really no chance to augment the changes with additional information or to group related changes in meaningful ways.

However, with just a little setup — pointing at a server or drive, entering account credentials, choosing a directory to sync — you can rest assured that changes made to your files will be recorded and available in the event of emergency or curiosity.
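
To make the passive model concrete, here is a minimal Python sketch of a snapshotter. It is not how Dropbox, ownCloud, or Time Machine actually work. The function name `snapshot_changed_files`, the in-memory manifest, and the numbered snapshot directories are all assumptions made up for illustration. Each run copies only the files whose content hash differs from the one recorded on the previous run, so edits made between two runs end up in a single revision with no comment attached.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot_changed_files(source: Path, store: Path, manifest: dict, run_id: int) -> list:
    """Copy files that changed since the last run into store/<run_id>/.

    `manifest` maps relative paths to the last recorded digest and is
    updated in place. Returns the relative paths that were copied.
    Hypothetical layout; `store` is assumed to live outside `source`.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        if manifest.get(rel) == digest:
            continue  # unchanged since the last snapshot
        target = store / str(run_id) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        manifest[rel] = digest
        copied.append(rel)
    return copied
```

Calling this from a timer (say, once an hour) would give roughly the behavior described above: no interaction needed, but anything that happens between two runs is invisible to the history.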

I believe every computer shipped should come with some form of passive revision controlling backup system out-of-the-box.

Active

Active revision control systems offer much more capability at the expense of a steeper learning curve. However, there’s simply no better way of working on digital (is there any other?) projects.

Some active revision control systems include:

  1. Subversion – an old favorite, but slowly deferring to
  2. git – distributed tool that’s consuming the world
  3. CVS – “ancient” predecessor to Subversion

This list is far from exhaustive. I can think of at least five or six others off the top of my head, but I don’t think any others have nearly the significance today.

The features that the various active revision control tools offer vary widely, but they all provide a core set of functionality that distinguishes them from the passive revision control tools.

  • Explicit commits. Every bit of work is committed to the revision control system explicitly. Specific changes can usually be grouped. Comments and other metadata can be included with the commits to provide extra context for the change.
  • Change diffs. Every modification made can be compared to the previous version and changes between versions can be viewed.
  • View history. Every commit ever made can be listed along with its metadata.
  • Checkout previous revisions. It can often be helpful to look back in time to find out why a problem didn’t seem to exist in the past or to determine when it was introduced. In rarer circumstances, you might want to know why a problem seemed to disappear.
  • Revert commits or return to a previous revision. Sometimes changes are committed that turn out to be detrimental and should be removed.
  • Multi-user commits. Virtually all active revision control systems support accepting work from multiple users with techniques for merging changes that can’t be trivially combined.
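
The core features in this list can be sketched in a few lines of Python. This is a toy, in-memory model and not how Subversion, git, or CVS store anything: the `ToyRepo` and `Commit` names are invented for illustration, every commit keeps a full copy of the files, and merging work from multiple users is left out entirely.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Commit:
    message: str  # the comment supplied with the commit
    author: str
    files: dict   # full snapshot: file name -> text content


@dataclass
class ToyRepo:
    commits: list = field(default_factory=list)

    def commit(self, files: dict, message: str, author: str) -> int:
        """Explicitly record a snapshot and return its revision number."""
        self.commits.append(Commit(message, author, dict(files)))
        return len(self.commits) - 1

    def log(self) -> list:
        """View history: (revision, author, message) for every commit."""
        return [(i, c.author, c.message) for i, c in enumerate(self.commits)]

    def checkout(self, revision: int) -> dict:
        """Return a copy of the files as they were at a revision."""
        return dict(self.commits[revision].files)

    def diff(self, old: int, new: int, name: str) -> str:
        """Unified diff of one file between two revisions."""
        a = self.commits[old].files.get(name, "").splitlines(keepends=True)
        b = self.commits[new].files.get(name, "").splitlines(keepends=True)
        return "".join(difflib.unified_diff(a, b, f"{name}@{old}", f"{name}@{new}"))

    def revert_to(self, revision: int, author: str) -> int:
        """Undo later changes by committing an old snapshot as a new revision."""
        return self.commit(self.checkout(revision), f"Revert to revision {revision}", author)
```

Note that reverting here adds a new commit rather than deleting history, so the detrimental change and its removal both stay visible in the log.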

As with the passive revision control systems, not all active revision control systems are also backups. In most cases you would need to take extra steps to back up the revision control system. Pairing an active revision control system with a passive revision control system could be a way to do this.

Few, if any, of the active revision control systems handle binary data well. Binary files can usually be stored, but storage tends to be inefficient and the diff capability is usually absent. This might be their single largest weakness.

No significant (or insignificant) project should be started without one of these revision control tools, and project tools should be structured in a way that allows independent, verifiable revision control.

Visualization

Most of these tools don’t seem to provide much at-a-glance functionality, and I think it’s really useful to have. The things I’m most interested in seeing:

  1. Have any files been modified (active)?
  2. Are there files that could be lost (active)?
  3. Are there any upstream changes that aren’t synced (active, especially useful for multi-user projects)?
  4. Are there any local files that haven’t been recorded (passive)?

For the active revision control questions, tools like GitHub Desktop for Mac and Windows, TortoiseGit and TortoiseSVN for Windows, and RabbitVCS integration for Nautilus on Linux might do the trick. Some active revision control systems provide these features out-of-the-box, but they tend to be pricey.

On (4), I’ve not seen a passive system that provides this information. It seems like it might be useful to know if all local files have synced before shutting down for a while. I’ll keep my eye out for this.

For those with a bash habit, I have a version of ls that provides (1) and (2) above. I plan to make this available shortly.
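
In the meantime, here is a rough Python sketch of how (1) and (2) might be answered for a git working copy. It is not the ls variant mentioned above. It relies on the output of `git status --porcelain`, where each line starts with a two-character status followed by a space and the path, and `??` marks untracked files. Treating untracked files as the ones that "could be lost" is an assumption on my part, and the function names are invented for illustration.

```python
import subprocess


def classify_porcelain(output: str) -> dict:
    """Split `git status --porcelain` (v1) output into changed and untracked paths.

    Simplification: rename entries ("old -> new") and quoted paths are
    kept as git printed them rather than being parsed further.
    """
    result = {"changed": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        if status == "??":
            result["untracked"].append(path)  # never recorded, so could be lost
        elif status != "!!":  # "!!" marks ignored files
            result["changed"].append(path)    # modified, added, deleted, ...
    return result


def repo_status(repo_dir: str) -> dict:
    """Run git in repo_dir and classify its working tree."""
    completed = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return classify_porcelain(completed.stdout)
```

An empty `changed` list answers (1) with "no", and an empty `untracked` list does the same for (2).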