How to Start Every Project: Build Tools
This post is the start of a series I’m calling “What School Doesn't Teach You”, where we dive into topics that typically aren't covered in a university computer science education. Below we'll look at build tools: their history and purpose, what they're good for, and why you should cultivate mastery over your build system.
The Problem
At the beginning of our programming careers, we all find ourselves running arcane strings of commands to build our code. For me, this meant running `javac` directly, then figuring out how to set up the `java` command to use all of my output class files. Obviously this is a huge pain.
Maybe you started differently, working on existing projects. This could mean you just press the build button in an IDE. Possibly someone had documented how to build the code with a long document, or maybe a coworker just let you know "Hey, just run `make`". This sucks a little less, but you have no idea what's going on for the minutes to hours this process takes. You probably remember learning about compilers in school, and maybe even some details about how compilers work, but these tools seem to be doing a lot of things you never learned about.
In my experience, most developers don't know very much about how their code is actually built: the tool they're using, what it does, how it works, or really even how to use it.
This all leads to at least a few questions...
History
Build automation started before build tools were created. For companies like IBM working in the early software industry, the most common way of building software was shell scripts included with the source code. The shell scripts were primarily for creating a final build artifact in a consistent way, and for installing that artifact onto the system.
For developers, this posed some problems that may sound familiar. If they were working on a large software project, but only needed to change one file, why should they wait to rebuild everything? Why not simply recompile the one file they've edited and re-test? This is a perfectly fine idea! So much time saved.
But one day, you've got the worst bug. You've been working through it all day. You know it should be fixed, you're looking at everything and it all seems ok. Then you see it. The modified time on the compiled code hasn't been updated because you never recompiled after making the change!
This is exactly the situation Stuart Feldman found himself (and his colleagues) in at Bell Labs in 1976. He created `Make` one weekend as a simple way of identifying which files needed to be rebuilt, which files are needed to rebuild them, what order to build those files in, and how to actually build those outputs. Thus (as far as I can tell), the first build tool was born.
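The staleness check at the heart of that idea fits in a few lines. Here's a minimal sketch in Python, assuming a timestamp comparison like the one described above; it is an illustration rather than how `Make` is actually implemented, and `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, sources: list[Path]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "main.c"  # hypothetical source file
        obj = Path(tmp) / "main.o"  # hypothetical compiled output
        src.write_text("int main(void) { return 0; }\n")
        print("no output yet:", needs_rebuild(obj, [src]))

        obj.write_bytes(b"")  # pretend we compiled it
        os.utime(src, (1_000, 1_000))  # source last modified at t=1000
        os.utime(obj, (2_000, 2_000))  # output built later, at t=2000
        print("output newer than source:", needs_rebuild(obj, [src]))

        os.utime(src, (3_000, 3_000))  # "edit" the source after compiling
        print("source edited after build:", needs_rebuild(obj, [src]))
```

The forgotten-recompile bug from the story is the last case: the source is newer than the output, so the check says to rebuild.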
What are Build Tools?
Learning from `Make` can tell us a lot about what build tools do. The primary purpose of a build tool is to create a packaged, deployable artifact. Beyond that, the goal is to do so efficiently and repeatedly by understanding the dependency graph of the components of the software and the steps to build the software. So at its core, a build tool can:
1. Compile code
2. Identify changed files to only build what is needed
3. Understand the dependencies between different elements (either pieces of code, or even phases of the build) to only build what is needed
There are exceptions to all of these. For example, most build tools let you completely skip the check for changed files (2), specifically to produce a clean build from the entire project source.
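To make the dependency graph idea concrete, here's a rough sketch in Python using the standard library's `graphlib`. The graph contents and the helper names `build_order` and `affected_by` are hypothetical, and real build tools track far more than this.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: each target maps to the things it depends on.
BUILD_GRAPH = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return all nodes in an order where dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


def affected_by(changed: str, graph: dict[str, set[str]]) -> set[str]:
    """Return every target that directly or transitively depends on `changed`."""
    affected: set[str] = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, deps in graph.items():
            if node in deps and target not in affected:
                affected.add(target)
                frontier.append(target)
    return affected


if __name__ == "__main__":
    print(build_order(BUILD_GRAPH))
    print(sorted(affected_by("util.c", BUILD_GRAPH)))
```

Editing `util.c` only affects `util.o` and `app`, so `main.o` can be left alone. That is the "only build what is needed" part.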
From there, build tools can have a ton of additional features. Just a few examples include:
running tests
retrieving and managing remote packages to use as dependencies
deploying code to various environments
generating code from non-code sources
automatic bug and vulnerability detection
parallelizing commands
documentation generation
configuration management
arbitrary command execution
Not all build tools have all of these features; many of them have more. Most build tools have plugin or extension mechanisms or even features to run arbitrary commands, which makes them quite flexible.
Some build tools feature more scripted elements, giving users even more expression with their build system. In large organizations or complex projects this could revert entirely to sets of custom shell scripts; though this may sound negative, if a large team specifically has a group focused on the build system, it is entirely reasonable to use custom build scripts to successfully meet the goals of the overall project. At smaller scales, most teams benefit from staying close to the publicly available documentation of large, well-used build tools.
Do I need it?
The short answer is that you would probably benefit from a build tool if you aren't already using one. More than that, I'd argue basically every developer would benefit from learning more about their build tool.
There are definitely cases where someone might not need a build tool. The fewer times a project will be built, and the fewer people working on it, the less likely you will benefit from a build tool. Scientists and analysts frequently work on their own and run code in a known environment a relatively small number of times. For exploratory analysis, things like Jupyter Notebooks or RStudio make tons of sense, and there's no need to get a build tool involved.
For most other cases, even with a single developer, some type of build automation system is going to help make your life a little easier. Build tools can protect you from yourself by running tests whenever a build is attempted. You can ensure your new code agrees with your past (or current) expectations. It can help you work more efficiently, allowing you to focus on what really matters.
What else does it do?
Enables Teams
Build tools are nearly a prerequisite for Continuous Integration (CI) or Continuous Deployment (CD), where changes from all developers are regularly tested together (CI) and potentially released to various shared environments (including production!) automatically (CD). These same automation tools can be used to keep a history of built artifacts with various different change sets which can be invaluable for testing and debugging.
Build tools also create a rational environment for including code from other developers. Their dependency management mechanisms help control the chaos of constantly changing code from various other teams, groups, and companies.
Most importantly, using a build tool typically improves the "bus factor" of teams as they will no longer have to rely on a single person to understand their build (this can still happen using build tools... one of the reasons I'm writing this article).
Ensures Quality
Because build tools make it so easy to run tests, and to prevent bad builds and releases from even being created, they are critical to ensuring high-quality software. Because the tedious tasks are removed from the process, human errors are far less likely. Because an automated build runs faster than a manual one, it can be run more often; this means more time running tests and a faster cycle to know a given code change is working. The famous "Joel Test" features a number of items that are solved by build automation tools, primarily "Making a build in one step", "Making daily builds", and "Using the best tools money can buy". If you interview at a company and people are not using a build tool, you may want to be a bit concerned.
Build quality is an issue that even large companies have failed at; in 2012 LinkedIn released non-automated builds that caused site failures, and separately a financial firm went bankrupt in 45 minutes, losing $460 million because of a non-automated software build.
Expands Capabilities
Without using a build tool, understanding how to include code published by another group of people can be a tedious problem to solve. Perhaps you have to download the artifact first, store it somewhere, and be sure to include it at compilation time, runtime, or both. Build tools typically automate this problem out of existence. Simply by knowing what package you'd like to include, the tool can typically download it, manage its location and lifecycle, retrieve new versions when available, include them in a final build, package them for deployment, and include them at runtime. This *Dependency Management* process is a key way to leverage existing code.
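The part that's easy to underestimate is that dependencies have dependencies of their own. Here's a simplified sketch in Python of resolving that transitive set. The `REGISTRY` contents and package names are made up for illustration, and it ignores version constraints entirely, which is where real dependency managers spend most of their effort.

```python
# Hypothetical package index: package name -> direct dependencies.
# A real build tool would fetch this metadata from a remote repository.
REGISTRY = {
    "my-app": ["web-framework", "geo-lib"],
    "web-framework": ["http-core", "json-lib"],
    "geo-lib": ["math-lib"],
    "http-core": [],
    "json-lib": [],
    "math-lib": [],
}


def resolve(package: str, registry: dict[str, list[str]]) -> list[str]:
    """Return `package` and all transitive dependencies, dependencies first."""
    resolved: list[str] = []
    in_progress: set[str] = set()

    def visit(name: str) -> None:
        if name in resolved:
            return  # already handled via another path
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name!r}")
        if name not in registry:
            raise KeyError(f"unknown package {name!r}")
        in_progress.add(name)
        for dep in registry[name]:
            visit(dep)
        in_progress.discard(name)
        resolved.append(name)

    visit(package)
    return resolved


if __name__ == "__main__":
    print(resolve("my-app", REGISTRY))
```

Asking for two packages ends up pulling in five, in an order where each one is available before anything that needs it.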
Amongst the many types of dependencies a project can have, Frameworks typically have an opinionated build system for projects using that framework. If you'd like to use Play Framework for example, you will most likely want to use SBT as that is the tool supported by Play Framework.
Code Generation is another leverage point and can be used in numerous different ways. Various data serialization mechanisms represent data with some type of structure markup language. Code generation tools can turn this markup into code in your language of choice, allowing more expressive interaction with the data. Various software frameworks integrate with build tools to scaffold new components or whole projects. All of these can dramatically reduce the amount of boilerplate code that must be written, and thus decrease the time to complete a project.
How do build tools change the way I work?
Testing
Build tools enable developers to run tests easily and frequently. Typically build tools will prevent artifacts from being created if a testing phase fails. Many tools allow automatic re-testing of affected files since the last change. This creates an environment where developers can iterate quickly on changes with confidence that they haven't caused other problems. Assuming the tests are well designed to be useful to the developers and representative of real production environments, this can dramatically decrease how long it takes to add a feature or fix a bug.
Testing is a large subject, one I'd like to discuss more in the future, and definitely something that belongs in the “What School Doesn't Teach You” series.
Dependencies
In general, you probably don't want to implement your own version of algorithms you've already read about. If you've read about it, more likely than not, someone has already implemented, open sourced, and optimized some version of that algorithm in your language of choice.
Maybe you want to do some geospatial work: compute the distance between two points on the globe, or decide if a point is inside some geofence. You figure you did a lot of math in school and this shouldn't be any problem. You Google "great circle {language of choice}", find a code snippet that's maybe 50 lines long using only the standard library. Great! A little copy-pasta and your code can compute distance between two points.
You go to check the result against Google Maps and you find you're off by an un-ignorable amount. It turns out the world isn't a sphere, it's an ellipsoid, and now you have to copy-paste a different formula. It also turns out that which ellipsoid you want to measure on is a big deal, and may depend on what country you're in, or what other systems you may be using.
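For reference, the kind of snippet I mean usually looks something like the haversine formula below. This is a sketch in Python that assumes a perfectly spherical Earth with the mean radius hard-coded, which is exactly the assumption that gets you into trouble.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # mean radius; a sphere is only an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # A quarter of the way around the equator.
    print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

Nothing in the code is wrong as math. The problem is the single radius constant, which bakes in a shape the Earth doesn't have.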
What if I told you there are whole industries of people working with geospatial information systems (GIS)? And those people share amazing packages with the rest of the world? Take for example GeoTools, a toolkit for JVM languages to work with geospatial data. With these libraries, you can specify your ellipsoid, choose different distance functions, and do much more than you'd ever be able to by copy pasting out of StackOverflow.
Why am I ranting about geospatial data processing in a post about build tools? Build tools allow you to easily include a library like GeoTools in a matter of seconds. Rather than spend time becoming an expert in a given problem area, it's generally OK to trust that there are existing experts who've already done more than you can imagine. Including these libraries, whether from outside your organization or inside, is made significantly easier by your friendly build tool.
In general, you should remember you have the capability of including code from other people, and your build tool is the gateway to accessing that code.
Like testing, there's more to be said about Dependencies in the future...
Deployment
The original, and perhaps most critical function of a build tool is creating deployable artifacts. When you're trying to understand where your software ends up and how it gets there, think of your build tool.
The variety of ways to package code is immense. JavaScript can be minified, Java can be turned into a JAR or WAR, packages can be shared to public repositories, code can be pre-built into binaries for installation on different target architectures, and plenty of things can be turned into Docker images.
Your build tool probably has more deployment options than you could possibly imagine, so keep it in mind.
Code Generation
For a lot of new programmers, starting a project with an empty directory can induce dread. Maybe they've been working at a few different companies but always on existing projects without having to create something completely new, or with a team member who always gets the new projects started.
A decent number of build tools include templating features to initialize new projects. Frequently, community members have built a significant number of project templates using various frameworks and design patterns. Even if you just want a basic "Hello World" program, you might as well use a project template to set up your build files, create basic unit tests, and generate basic documentation.
In general, try to start every project from some kind of template, even if you make your own!
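If you do end up making your own, a template doesn't have to be fancy. Here's a minimal sketch in Python of what a scaffolding step boils down to; the `PROJECT_TEMPLATE` layout and the `scaffold` helper are hypothetical, not the format of any real templating tool.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template: relative path -> file contents with $placeholders.
PROJECT_TEMPLATE = {
    "README.md": "# $name\n\n$description\n",
    "src/main.py": 'def main() -> None:\n    print("Hello from $name")\n',
    "tests/test_main.py": "def test_placeholder() -> None:\n    assert True\n",
}


def scaffold(root: Path, template: dict[str, str], **values: str) -> list[Path]:
    """Render every template file under `root` and return the paths written."""
    written = []
    for relative_path, contents in template.items():
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(contents).substitute(values))
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp), PROJECT_TEMPLATE,
                             name="demo", description="A scaffolded project."):
            print(path.relative_to(tmp))
```

Real project templates add build files, CI configuration, and so on, but the mechanism is the same: a directory layout plus a few substituted values.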
There are plenty of other use cases for code generation. If you're using an API documented with OpenAPI specs, you can generate client libraries for your chosen language and frameworks. You may be integrating with Protobuf or Avro or another serialization format, in which case you can use code generation at build time to generate more expressive objects, classes, services, etc.
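To show the shape of the idea, here's a toy sketch in Python that turns a schema into source code for a class. The `SCHEMA` format is invented for this example and is not Protobuf, Avro, or OpenAPI; real generators handle nesting, optional fields, and much more.

```python
# Hypothetical schema, loosely modeled on what a serialization format might declare.
SCHEMA = {
    "name": "User",
    "fields": [("id", "int"), ("email", "str"), ("active", "bool")],
}


def generate_dataclass(schema: dict) -> str:
    """Return Python source for a dataclass matching the schema."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {schema['name']}:",
    ]
    lines += [f"    {field}: {type_name}" for field, type_name in schema["fields"]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_dataclass(SCHEMA))
```

Hooking a step like this into the build means the generated class is recreated whenever the schema changes, so the two can't drift apart.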
Why didn't anyone teach me this?
I think there are a few reasons people don't learn about build tools. First, academic software tends to be written by a single person, or a very small group of people. The overhead of recompiling/repackaging isn't that high when there's only one person, and frequently this software is being deployed in a nonstandard way, or simply run locally.
In a boot camp environment, the goal is to get you working productively in a short period of time. Though you will likely use a build tool and understand something about it, it's unlikely you had enough time to really dig into what it was doing.
Another contributing factor is that many developers simply don't understand build tools themselves (hence this article). If the people on your team don't understand what's going on, it's hard for them to share that information with you.
Differences in Scripting Languages
If you started programming with scripting languages (Python, JavaScript, Bash, etc.), you may never run into a build tool since the file you type in is also the file that you run.
Using Python as an example, dependencies are controlled through the Python environment itself. The package manager `pip` can be used to install dependencies into a Python runtime environment. When starting out, users tend to only have a single global Python environment before moving towards tools that let them manage multiple separate Python environments (`virtualenv`, `conda`, etc.). Deployment of a Python application can vary as well, but ultimately amounts to copying the Python files to a destination. Running tests in the Python ecosystem is frequently done with a standalone tool.
JavaScript _does_ have build tools with familiar features even though JavaScript doesn't need to be compiled, but the ecosystem is somewhat fragmented with Grunt, Gulp, and Webpack all seeing significant usage, and tools like Yarn for dependency management getting into the mix as well. Since JavaScript is frequently used on both the front and backend, you may find different tools to be more common in each community.
Concluding Thoughts
In short, get familiar with your build tool. When starting new projects, try to start with the build tool. Remember that you have entire suites of additional tooling available to your software project; all you have to do is look. If you're trying to help your team adopt new tools, research indicates that making sure the tool integrates with your existing workflow and simply sharing your build tool experience with teammates can help ease the transition.
If you have questions about build tools, comments on how you use your build tool, or want to add to the conversation, feel free to comment below!
Appendix: Build Tool Project Templating Examples
JVM
Maven
Installation
# Windows
choco install maven
# Ubuntu/WSL
apt install maven
# OSX
brew install maven
Project Generation
For more information, see the Maven archetype documentation.
mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DarchetypeArtifactId=maven-archetype-simple
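# Build and run the new project (assumes you entered groupId com.swengsup and artifactId mvn-example at the prompts)
cd mvn-example
mvn package
java -cp target/mvn-example-1.0-SNAPSHOT.jar com.swengsup.App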
SBT
Installation
# Windows
choco install sbt
# Ubuntu/WSL (sbt is not in the default Ubuntu repositories; add the sbt apt repository first)
apt install sbt
# OSX
brew install sbt
Project Generation
sbt new scala/scala-seed.g8
# Running the new project (first cd into the directory that `sbt new` created)
sbt run
Rust
Installation
# WSL
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Windows
choco install rustup.install
# Ubuntu (cargo is packaged separately from rustc)
apt install rustc cargo
# OSX
brew install rustup-init
Project Generation
cargo new hello-rust
cd hello-rust
cargo run