Principled code

Have you ever encountered source code underpinned by a series of principles? I’m not talking about agile principles guiding the way the code was developed, or even software craftsmanship, which extends agile on a few points. I’m not even talking about ethical principles, such as how the code is licensed or how development is funded. I’m talking about principles that run through the code itself.

I’ve tried to make my own code principled, but let’s take a look at a really excellent example of principled code.

bindgen

I encountered this principled codebase a few years back. The underlying principles jump right out of the code overview. What’s bindgen, you may ask?

The opening sentence of the code overview answers that question. It’s a complete, high-level description of what bindgen does:

bindgen takes C and C++ header files as input and generates corresponding Rust #[repr(C)] type definitions and extern foreign function declarations.
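To make that sentence concrete, here is a hand-written sketch of the kind of Rust such bindings contain, for a hypothetical C header declaring struct point and a function manhattan (both names are made up for this illustration). It is not actual bindgen output, which differs in its details. It only shows the two ingredients the quote names: a #[repr(C)] type definition and an extern function declaration.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_int;

// Hand-written approximation of bindings for this hypothetical C header:
//
//     struct point { int x; int y; };
//     int manhattan(struct point a, struct point b);
//
// #[repr(C)] asks rustc to lay the struct out exactly as a C compiler would.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct point {
    pub x: c_int,
    pub y: c_int,
}

// Declaration only: the body lives in a C library that would be linked in
// separately, and calling it from Rust requires an `unsafe` block.
unsafe extern "C" {
    pub fn manhattan(a: point, b: point) -> c_int;
}

fn main() {
    let p = point { x: 3, y: 4 };
    // Field order and padding follow the C ABI because of #[repr(C)].
    println!("{:?}: size = {}, align = {}", p, size_of::<point>(), align_of::<point>());
}
```

The declaration alone compiles and links without the C library, because nothing calls it. The unsafe extern "C" spelling assumes Rust 1.82 or later; the 2024 edition requires it.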

The code overview then goes on to help readers understand the high-level structure of the code:

We walk over libclang's AST and construct our own internal representation (IR). The ir module and submodules (src/ir/*) contain the IR type definitions and libclang AST into IR parsing code.

[...] 

After constructing the IR, we run a series of analyses on it. These analyses do everything from allocate logical bitfields into physical units, compute for which types we can #[derive(Debug)], to determining which implicit template parameters a given type uses. The analyses are defined in src/ir/analysis/*. They are implemented as fixed-point algorithms, using the ir::analysis::MonotoneFramework trait.
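The worklist idea behind such fixed-point analyses can be sketched in a few dozen lines. What follows is a hypothetical, much-simplified analysis in the spirit of the #[derive(Debug)] analysis mentioned in the quote: a type cannot derive Debug if it is marked opaque or if any of its fields cannot. CannotDeriveDebug, TyId and the rest are my own inventions, not bindgen's API, and the "IR" here is just a map from type ids to the type ids of their fields. The key property is that the result set only grows, so re-queuing dependents whenever a node changes must eventually stop.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical, simplified IR: a type id maps to the type ids of its fields.
type TyId = usize;

/// Whether re-examining a node moved its value up the lattice.
#[derive(PartialEq)]
enum Changed { Yes, No }

/// Toy analysis: the set of types that cannot derive Debug.
struct CannotDeriveDebug {
    fields: HashMap<TyId, Vec<TyId>>,
    used_by: HashMap<TyId, Vec<TyId>>, // reverse edges: which types contain me?
    cannot: HashSet<TyId>,             // the lattice value; it only ever grows
}

impl CannotDeriveDebug {
    fn new(fields: HashMap<TyId, Vec<TyId>>, opaque: &[TyId]) -> Self {
        let mut used_by: HashMap<TyId, Vec<TyId>> = HashMap::new();
        for (&ty, fs) in &fields {
            for &f in fs {
                used_by.entry(f).or_default().push(ty);
            }
        }
        CannotDeriveDebug { fields, used_by, cannot: opaque.iter().copied().collect() }
    }

    /// Re-examine one type: it is blocked if any of its fields is blocked.
    fn constrain(&mut self, ty: TyId) -> Changed {
        if self.cannot.contains(&ty) {
            return Changed::No;
        }
        let blocked = self
            .fields
            .get(&ty)
            .map_or(false, |fs| fs.iter().any(|f| self.cannot.contains(f)));
        if blocked {
            self.cannot.insert(ty);
            Changed::Yes
        } else {
            Changed::No
        }
    }

    /// Worklist driver: when a type changes, re-queue the types containing it.
    /// Stops when nothing changes any more, i.e. at the fixed point.
    fn run(mut self) -> HashSet<TyId> {
        let mut worklist: VecDeque<TyId> = self.fields.keys().copied().collect();
        while let Some(ty) = worklist.pop_front() {
            if self.constrain(ty) == Changed::Yes {
                if let Some(deps) = self.used_by.get(&ty) {
                    worklist.extend(deps.iter().copied());
                }
            }
        }
        self.cannot
    }
}

fn main() {
    // 0 is opaque; 1 contains 0; 2 contains 1; 3 has no fields.
    let mut fields = HashMap::new();
    fields.insert(1, vec![0]);
    fields.insert(2, vec![1]);
    fields.insert(3, vec![]);
    let mut v: Vec<TyId> = CannotDeriveDebug::new(fields, &[0]).run().into_iter().collect();
    v.sort();
    println!("{:?}", v); // [0, 1, 2]
}
```

Cycles in the type graph are harmless: a type is re-queued only when something it contains changes, and each type can change at most once.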

The doc comments for ir::analysis::MonotoneFramework give a general introduction to the fixed-point approach:

### Fix-point analyses on the IR using the "monotone framework".

A lattice is a set with a partial ordering between elements, where there is a single least upper bound and a single greatest least bound for every subset. We are dealing with finite lattices, which means that it has a finite number of elements, and it follows that there exists a single top and a single bottom member of the lattice. For example, the power set of a finite set forms a finite lattice where partial ordering is defined by set inclusion, that is a <= b if a is a subset of b. Here is the finite lattice constructed from the set {0,1,2}:

                   .----- Top = {0,1,2} -----.
                  /            |              \
                 /             |               \
                /              |                \
             {0,1} -------.  {0,2}  .--------- {1,2}
               |           \ /   \ /             |
               |            /     \              |
               |           / \   / \             |
              {0} --------'   {1}   `---------- {2}
                \              |                /
                 \             |               /
                  \            |              /
                   `------ Bottom = {} ------'

A monotone function f is a function where if x <= y, then it holds that f(x) <= f(y). It should be clear that running a monotone function to a fix-point on a finite lattice will always terminate: f can only "move" along the lattice in a single direction, and therefore can only either find a fix-point in the middle of the lattice or continue to the top or bottom depending if it is ascending or descending the lattice respectively.

For a deeper introduction to the general form of this kind of analysis, see [Static Program Analysis by Anders Møller and Michael I. Schwartzbach][spa].

[spa]: https://cs.au.dk/~amoeller/spa/spa.pdf
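The lattice and termination argument in that doc comment can be checked on the very example it draws. In this sketch (my own, not bindgen code), the eight subsets of {0,1,2} are 3-bit masks, the ordering is subset inclusion, join (least upper bound) is union and meet (greatest lower bound) is intersection. The function f is an arbitrary monotone function chosen for illustration. Iterating it from Bottom climbs the lattice and must stop at a fixed point after at most three strict increases, the height of this lattice.

```rust
/// Subsets of {0, 1, 2} encoded as 3-bit masks: bit i is set iff i is in the set.
type Set = u8;
const BOTTOM: Set = 0b000; // {}
const TOP: Set = 0b111; // {0, 1, 2}

/// The partial order: a <= b iff a is a subset of b.
fn leq(a: Set, b: Set) -> bool {
    (a & !b) == 0
}

/// Least upper bound (join) is union; greatest lower bound (meet) is intersection.
fn join(a: Set, b: Set) -> Set {
    a | b
}
fn meet(a: Set, b: Set) -> Set {
    a & b
}

/// An arbitrary monotone function: always add 0, and add 1 whenever 0 is already present.
fn f(x: Set) -> Set {
    let mut y = x | 0b001;
    if (x & 0b001) != 0 {
        y |= 0b010;
    }
    y
}

/// Ascending iteration: apply f starting from Bottom until f(x) == x.
/// Returns the fixed point and the number of strict increases taken.
fn fixpoint_from_bottom(f: impl Fn(Set) -> Set) -> (Set, usize) {
    let mut x = BOTTOM;
    let mut steps = 0;
    loop {
        let next = f(x);
        if next == x {
            return (x, steps);
        }
        x = next;
        steps += 1;
    }
}

fn main() {
    // Check the bound properties and the monotonicity of f over all 8 elements.
    for a in BOTTOM..=TOP {
        for b in BOTTOM..=TOP {
            assert!(leq(a, join(a, b)) && leq(b, join(a, b)));
            assert!(leq(meet(a, b), a) && leq(meet(a, b), b));
            if leq(a, b) {
                assert!(leq(f(a), f(b)));
            }
        }
    }
    let (x, steps) = fixpoint_from_bottom(f);
    println!("fixed point {:03b} after {} steps", x, steps); // fixed point 011 after 2 steps
}
```

Starting from Top with a function that only removes elements would be the descending case the doc comment mentions.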

What I love about all this is that the code is based on some clear principles and the authors have made it possible for sufficiently motivated readers to understand those principles.

It’s not an easy codebase to understand. I added a small feature to it and it took a while to get up to speed (with help from some of the contributors). But the longer I spent with the code, the better I understood it.

Other examples

If you’d like a simpler and smaller example of principled code, take a look at Docker/OCI image relocation, which I developed. The principles here derive from Docker/OCI and from the way that interface is supported by the container registry library on which the image relocation code depends.

Or consider Tomcat, a much larger codebase that is still based on clear principles. Some of these principles, such as filters, derive directly from the servlet specification, whereas others, such as engines and valves, are implementation choices.

Then there’s Eclipse Virgo, which I worked on. It has various concepts (principles), most notably its deployment pipeline, shown below.

The pipeline, and each pipeline stage, accepts a tree of install artifacts as input and outputs a possibly modified tree.

where install artifacts include “OSGi bundles” (think JAR files defining self-contained modules of Java code that can import and export packages between them). The definitions of pipelines and pipeline stages make it easy to string pipelines together and to nest pipelines inside pipeline stages, as shown in the diagram above.
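The pipeline principle translates naturally into a type in which a pipeline is itself a stage. This sketch uses Rust rather than Virgo's Java, and every name in it (ArtifactTree, PipelineStage, Pipeline, Tag, node) is hypothetical, not Virgo's actual API. Because Pipeline implements the same trait as its stages, pipelines can be chained or nested without special cases.

```rust
/// Hypothetical stand-in for a tree of install artifacts (not Virgo's real type).
#[derive(Debug, PartialEq)]
struct ArtifactTree {
    name: String,
    children: Vec<ArtifactTree>,
}

fn node(name: &str, children: Vec<ArtifactTree>) -> ArtifactTree {
    ArtifactTree { name: name.to_string(), children }
}

/// The principle, reified: a stage takes a tree and returns a possibly modified tree.
trait PipelineStage {
    fn process(&self, tree: ArtifactTree) -> ArtifactTree;
}

/// A pipeline runs its stages in order and is itself a stage, so it can be
/// chained with other stages or nested inside another pipeline.
struct Pipeline {
    stages: Vec<Box<dyn PipelineStage>>,
}

impl PipelineStage for Pipeline {
    fn process(&self, tree: ArtifactTree) -> ArtifactTree {
        // Feed each stage's output tree into the next stage.
        self.stages.iter().fold(tree, |t, stage| stage.process(t))
    }
}

/// Example stage (hypothetical): append a suffix to the name of every node.
struct Tag(&'static str);

impl PipelineStage for Tag {
    fn process(&self, mut tree: ArtifactTree) -> ArtifactTree {
        tree.name.push_str(self.0);
        let children = std::mem::take(&mut tree.children);
        tree.children = children.into_iter().map(|c| self.process(c)).collect();
        tree
    }
}

fn main() {
    let inner = Pipeline { stages: vec![Box::new(Tag("+a")), Box::new(Tag("+b"))] };
    // The inner pipeline is used as an ordinary stage of the outer one.
    let outer = Pipeline { stages: vec![Box::new(inner), Box::new(Tag("+c"))] };
    let out = outer.process(node("plan", vec![node("bundle", vec![])]));
    println!("{:?}", out);
}
```

This is essentially the Composite pattern: nesting costs nothing extra because a boxed Pipeline fits anywhere a stage does.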

Another example, which I’ve mentioned before in this blog, is the Cloud Foundry Java Client, a Java client for the Cloud Foundry (CF) API. It was led by Ben Hale, who set and maintained an extremely high standard of coding and uniformity. The code itself was a fascinating application of Project Reactor in the functional reactive programming style. I have often described the code as “crystalline” in structure: the structure derived from repeating patterns in the CF API, which were consistently accessed using Reactor primitives. The principles underlying this codebase were, in a sense, superficial in that they derived from the principles underlying the CF API and those underlying Reactor. But the result was maintainable code in which it was relatively easy to add support for new CF API features.

If you know of more examples, I’d love to hear from you.

Kinds of principles

Principles may be vague or clear, implicit or explicit. Indeed, there may be a spectrum from vague/implicit to clear/explicit.

Principles may also be informal (just in the implementer’s head or written down informally), formal (carefully worked out in writing, perhaps even with mathematical properties), or reified¹ in code (as types, subroutines, macros, or whatever is available in the programming language being used).

We can think of this in two dimensions:

vague/implicit -> clear/explicit on the X-axis, informal -> reified on the Y-axis

The kind of principled code I’ve been discussing ideally has explicit principles which are reified in code. It’s much harder to accidentally overlook those kinds of principles, so the code tends to be easier to maintain.
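To illustrate the difference between an informal principle and a reified one, take the monotonicity rule from the bindgen section: analysis results only ever grow. Written as a comment, that rule is easy to break by accident. Reified as a type whose only mutator is insert, it becomes impossible for code outside the defining module to shrink the set. GrowOnlySet is a hypothetical example, not taken from any of the codebases discussed here.

```rust
mod monotone {
    use std::collections::BTreeSet;

    /// A set that can only grow. The "results only ever grow" principle is
    /// enforced by privacy: `items` is private and there is no remove or clear.
    #[derive(Debug)]
    pub struct GrowOnlySet<T: Ord> {
        items: BTreeSet<T>,
    }

    impl<T: Ord> GrowOnlySet<T> {
        pub fn new() -> Self {
            GrowOnlySet { items: BTreeSet::new() }
        }

        /// Adds an item; returns true if the set actually grew.
        pub fn insert(&mut self, item: T) -> bool {
            self.items.insert(item)
        }

        pub fn contains(&self, item: &T) -> bool {
            self.items.contains(item)
        }

        pub fn len(&self) -> usize {
            self.items.len()
        }
    }
}

use monotone::GrowOnlySet;

fn main() {
    let mut blocked = GrowOnlySet::new();
    blocked.insert("struct Opaque");
    blocked.insert("struct Outer");
    // blocked.items.clear(); // would not compile: `items` is private to `monotone`
    println!("{} types cannot derive Debug", blocked.len());
}
```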

How to write principled code

To write principled code, you first need to identify and clarify any principles in the problem domain. For example, Docker/OCI is based on a pretty good set of principles, but clarifying the details of those principles was one of the contributions of the Docker/OCI image relocation project.

Separately from the problem domain, you also have to decide what principles you’ll use to structure the implementation. The general idea is that any piece of code should be understandable at a higher level. For instance, the codebase might be split into modules and types (or something similar, depending on the programming language), and developers need to know the purpose of each module and type and how they fit together to make the whole.

Another approach is to pay attention to the code structure and the way it fits together and then to reify that structure in modules, types, etc.

Some developers like to defer introducing abstractions until it is clear that they will pull their weight. I suspect that when abstractions represent principles in the problem domain, they are more likely to pull their weight.

Determining the principles underlying an implementation is trickier as there is usually more than one way of slicing and dicing any given piece of code. But once an arrangement has been chosen and seems to be working well, I think it’s possible to step back and observe principles underpinning the arrangement.

Conclusion

That’s my view on what makes code principled and how to improve code to make it more principled. What’s your perspective?

Footnotes:

¹ To reify means to make an abstract concept concrete. In the context of software, a concept is reified if it is represented as some construct in a programming language.
