What does it mean to build code? Why are dependency conflicts even a thing? What are build systems made of? Is there a perfect build-system? What are diamonds doing in here?
You may have read “What’s a build system”, which I wrote before I started illustrating some of my ideas with simple diagrams. If you’re more of a visual person, this post is for you.
Who Uses Build Systems, Anyway?
The short answer is everyone: build systems are interesting because they are practically everywhere in the computing world. “What is programming” essentially defines programming as fiddling with the inputs of a special program until it spits out something that behaves as you want it to.
That special program generally is a build system, which is much more than a mere compiler, as we’re about to see.
Illustrating Build Systems
What’s a program (again?)
Let’s just define as a program something that reads an input and gives us an output:
As mentioned above, a build system is a particular kind of program that, taking your input, will output something that can be run and has a particular behavior: the build system spits out other programs.
If we zoom in a little bit, we’ll see that it has the following parts:
The inputs are:
- your code
- dependencies your code needs
- some configuration
The program itself is (generally) composed of one or multiple:
- dependency manager(s)
- compiler(s)
For the end-user, dependencies (and how to manage and retrieve them) are one of the most important aspects of what the build system does for you[1].
Why do we even need to deal with dependencies? Well, programming generally – hopefully! – involves the re-use of existing code and other programs: even the most motivated developer does not have the time to re-invent everything. And even then, they might want to re-use some of the things they already built themselves.
The compiler itself has no clue as to where to find these existing things, and this is where the dependency manager comes into play.
The build system’s main job, then, is to properly orchestrate the dependency manager and the compiler, so that the latter always has what it needs available when compiling your code[2].
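To make the orchestration concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: `BuildInputs`, `fetch_dependencies`, `compile_code` and `build` are made-up names, and the "compiler" merely returns a string. The only point is the ordering: the dependency manager runs first, so the compiler gets handed everything it needs.

```python
from dataclasses import dataclass, field


@dataclass
class BuildInputs:
    """The three inputs listed above: code, dependencies, configuration."""
    code: str
    dependencies: list
    configuration: dict = field(default_factory=dict)


def fetch_dependencies(names):
    """Stand-in for a dependency manager: pretends to retrieve each package."""
    return {name: f"<package {name}>" for name in names}


def compile_code(code, packages, configuration):
    """Stand-in for a compiler: it only works with what it has been handed."""
    mode = configuration.get("mode", "debug")
    return f"program({code}, deps={sorted(packages)}, mode={mode})"


def build(inputs):
    """The build system: run the dependency manager first, then the compiler."""
    packages = fetch_dependencies(inputs.dependencies)
    return compile_code(inputs.code, packages, inputs.configuration)


program = build(BuildInputs(code="main", dependencies=["B", "A"]))
print(program)
```

A real build system does far more than this, but the shape stays the same: inputs go in, the parts are called in the right order, and a program comes out.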
About Trees – Graphs All The Way Down
Let’s get to a less obvious aspect of build systems: graphs. More specifically, action graphs and dependency trees.
From Dependency Trees…
Each thing your code depends on, a dependency, can be seen as a directed edge in a graph, going from the node that represents your code to a node that represents the code or package you depend on. In the context of dependencies, we generally speak of a dependency tree that has your code as the root:
The tree above simply says “your code has dependencies on packages A and B”.
It is important to note that the things you depend on will themselves depend on other things as well! This seems like a minor side-note, but it actually is the source of most of the problems you’ll encounter with build systems. These dependencies of dependencies are called transitive dependencies:
Above, the tree tells us that our dependency on package A additionally makes us depend on packages x and σ. Similarly for the transitive dependencies introduced by package B: y, z, γ. The distinction between direct and transitive (or indirect) dependencies is more than a matter of naming: direct dependencies are the ones you can control, whereas you have no say in what the transitive dependencies will be – unless you happen to be responsible for them. This notion becomes important when dealing with dependency conflicts, a subject that will be kept for another day.
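The tree described above can be written down as a plain mapping, and the transitive dependencies fall out of a simple traversal. This is a sketch, not how any particular dependency manager does it; the `DEPENDENCIES` mapping and the `all_dependencies` helper are names made up for this example.

```python
# The dependency tree from the text: each key depends on the listed packages.
DEPENDENCIES = {
    "your code": ["A", "B"],
    "A": ["x", "σ"],
    "B": ["y", "z", "γ"],
}


def all_dependencies(graph, root):
    """Collect everything reachable from root, direct or not."""
    seen = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Leaf packages have no entry in the mapping, hence the default.
        stack.extend(graph.get(node, []))
    return seen


direct = set(DEPENDENCIES["your code"])
transitive = all_dependencies(DEPENDENCIES, "your code") - direct

print("direct:", sorted(direct))
print("transitive:", sorted(transitive))
```

Only the `"your code"` entry is yours to edit; the entries for A and B belong to whoever maintains those packages, which is the control problem described above.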
…To Action Graphs
Action graphs are mentioned less often than dependency trees, but by taking a step back you’ll see that they are pretty similar. How so?
Given the example above, let us go through the steps that your build system needs to take in order to make something useful out of your code:
- Analyze the direct dependencies
- Download them along with all their transitive dependencies
- Compile your code
- Package your compiled code and its dependencies into a single useful binary.
The steps above involve some actions (analyzing, downloading, compiling, packaging) along with dependencies between these actions, for there won’t be anything to package if nothing was compiled or downloaded. For example, step 3 depends on the output of step 2, while step 4 depends on the outputs of both steps 2 and 3.
Putting this into visual form:
The above isn’t a tree, but a graph. An important aspect of such a graph is that it is directed and acyclic[3], which is what allows you to walk through the nodes in a way that lets you reach the final desired output after having fulfilled everything it depends on.
In more complex projects, “your code” will itself likely be split across several modules that have their own inter-dependencies, and turning it into something useful will involve a more complex action graph, with additional actions such as running tests or generating code.
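The four steps and their dependencies can be sketched with Python's standard `graphlib` module (Python 3.9+), which computes an order in which every action comes after the actions it depends on. The action names are shorthand for the steps listed earlier, and the assumption that downloading depends on analyzing is mine; the text only spells out the dependencies of steps 3 and 4.

```python
from graphlib import TopologicalSorter

# Each action maps to the actions whose output it needs.
ACTION_GRAPH = {
    "analyze": [],
    "download": ["analyze"],             # assumption: step 2 needs step 1
    "compile": ["download"],             # step 3 needs the output of step 2
    "package": ["download", "compile"],  # step 4 needs steps 2 and 3
}

# static_order() walks the graph so that dependencies always come first.
order = list(TopologicalSorter(ACTION_GRAPH).static_order())
print(order)
```

Because the graph is directed and acyclic, such an order always exists; with a cycle in it, there would be no valid place to start.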
Here, A is what you are trying to build, and the other nodes are what you are implicitly depending on: it is your build system’s job to execute the individual actions in the proper order (indicated via the red numbers) so that you get the required output.
It is essential to remember this representation of things; it will let you understand most of what a build system does, for dependencies may not only be declared towards existing code, but towards other actions.
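As a rough sketch of a larger action graph, here is a hypothetical project with code generation, two modules and tests, again using the standard `graphlib` module. The action names and the helpers `run_build` and `is_buildable` are invented for this example. It also shows what happens when the graph is not acyclic: no valid order exists, and `graphlib` reports a `CycleError`.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical project: actions mapped to the actions they depend on.
ACTIONS = {
    "generate code": [],
    "compile core": ["generate code"],
    "compile app": ["compile core"],
    "run tests": ["compile core", "compile app"],
    "package": ["compile app", "run tests"],
}


def run_build(actions):
    """Pretend to execute every action, dependencies first."""
    executed = []
    for action in TopologicalSorter(actions).static_order():
        executed.append(action)  # a real build system would do the work here
    return executed


def is_buildable(actions):
    """A graph with a cycle has no valid execution order."""
    try:
        TopologicalSorter(actions).prepare()
    except CycleError:
        return False
    return True


executed = run_build(ACTIONS)
print(executed)
print(is_buildable(ACTIONS))
print(is_buildable({"a": ["b"], "b": ["a"]}))
```

Note that "run tests" is not existing code at all: it is an action that other actions depend on, which is exactly the kind of dependency the paragraph above describes.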
So, to sum it up:
The build system is what takes the necessary actions, in the correct order, to produce your desired output. It does so by walking through a directed and acyclic action graph[3].
If the topic of build systems and dependency managers is of interest to you and you like to understand things through a visual form, you can check out my Build Systems Illustrated guide, which offers a concise introduction to build-engineering related topics.
Happy building!
[1] Yet this aspect is conspicuously absent from a lot of programming teachings. If I ever organise a course about programming it will be called the logistics of code and will cover everything but programming itself…
[2] Note that this also applies to non-compiled languages. As a developer you’re probably interested in having everything that’s required to be available at runtime, and this too will generally be handled by your build system.
[3] You will often run into the terms directed and acyclic when dealing with build systems, generally packaged in the acronym DAG for Directed Acyclic Graph.