Reducing testing and building time in Monorepos
Let's call N the number of files that you need to compile/transpile, the number of projects/packages that you need to build, or the number of tests that you need to run.
It does not matter if you have the fastest algorithm if your N is too big. Sometimes you need to reduce N itself, so that the work done on every change does not keep growing with it.
As your project grows, your N grows with it.
Your dependencies will grow in size.
Your codebase will grow.
Your tests will grow (if you care about the long term).
Your build process will slow down as you are building more code.
Your type checking will get slower.
How to make it fast?
One way to solve this is to use a directed acyclic graph (DAG) to avoid testing, type checking, or rebuilding what has not changed since the last run.
In this article we won’t use a DAG. We are going to use a much simpler strategy that can later be extended with a DAG to make it more robust.
The first thing that we want to do is to run only the tests that are affected by the changed files.
How do we know which files were changed?
We are going to use some git commands to make this possible.
CircleCI concepts
We are going to use CircleCI in this article, but you can apply the concepts to any CI you have.
CircleCI is a continuous integration system that lets you automate common software tasks, for instance: linting, tests, type checking, build, deploy.
A new pipeline is triggered when you do some work (push a new commit, for instance).
A pipeline can contain many workflows.
Each workflow has jobs that represent the cycle of your software process (for instance, install → test → deploy).
A job contains a series of steps, each running commands to perform something, like running lint or running tests.
A more concrete example:
You can have one workflow to build your React Native app for Android and another workflow to build it for iOS.
Each workflow will be composed of some jobs like install dependencies, lint, type checking, running tests, building the app, and deploying the app to stores.
Compare URL
To reduce the number of things to build and test, we need to know which files changed inside a pull request and when merging into master.
This range of changes is described by a Git compare URL, which holds two commits: the beginning and the end of the changes of this branch or merge.
CircleCI had the concept of CIRCLE_COMPARE_URL before v2.1, and you can recreate it in v2.1 using pipeline variables, combining the repository URL with the base and current revisions of the pipeline.
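A minimal sketch of that reconstruction, written as a shell function so it can be tried outside CI. The name `build_compare_url` is hypothetical; in a real config the three arguments would come from the pipeline values `pipeline.project.git_url`, `pipeline.git.base_revision`, and `pipeline.git.revision`.

```shell
# Hypothetical helper: assemble a GitHub-style compare URL from its parts.
# $1 = repository URL, $2 = base revision (may be empty), $3 = head revision
build_compare_url() {
  printf '%s/compare/%s...%s\n' "$1" "$2" "$3"
}

# Inside a CircleCI 2.1 `run` step the arguments would be interpolated, e.g.:
#   build_compare_url "<< pipeline.project.git_url >>" \
#     "<< pipeline.git.base_revision >>" "<< pipeline.git.revision >>"
```

When the base revision is empty, the function produces a URL with nothing before the three dots.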
Here are a few examples of a compare URL:
https://github.com/owner/repo/compare/...a2e2709c4
The URL above says that this pipeline contains code from the base branch (master or main) up to the a2e2709c4 commit.
https://github.com/owner/repo/compare/37ac69404a5b...69bc6e3fe337
The URL above contains code from commit 37ac69404a5b to commit 69bc6e3fe337.
Try it yourself here: https://github.com/sibelius/monorepo-101/compare/master...8f8760
Problems with CIRCLE_COMPARE_URL using pipeline variables
The first problem we hit when recreating CIRCLE_COMPARE_URL from pipeline variables was that it only covers the range between the previous commit and the current one.
We want to always test the files affected by the full set of changes in the pull request.
To fix this, we discard the “base revision” (the first commit in the compare URL) and use the base branch instead, so this URL:
https://github.com/owner/repo/compare/37ac69404a5b...69bc6e3fe337
turns into:
https://github.com/owner/repo/compare/master...69bc6e3fe337
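The rewrite above is plain string manipulation. Here is a small sketch of it, assuming the URL always has the `/compare/<base>...<head>` shape; the helper name `use_base_branch` is ours and not part of any tool.

```shell
# Hypothetical helper: swap the base revision of a compare URL for a branch name.
# $1 = compare URL, $2 = base branch (defaults to master)
use_base_branch() {
  repo_url="${1%%/compare/*}"   # everything before "/compare/"
  head_rev="${1##*...}"         # everything after the last "..."
  printf '%s/compare/%s...%s\n' "$repo_url" "${2:-master}" "$head_rev"
}
```

Passing `main` as the second argument would target a repository whose base branch is called main.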
Before getting the files using git diff (spoiler), we need to check out the latest master.
This is needed because CircleCI only does a shallow checkout to make it faster; you don’t need the whole git history or every branch in every pipeline.
Testing only affected test files
After all this setup, we need to get a list of all affected files so that we run only the tests that matter.
We are using Entria Deploy CLI to help us get a list of all files that changed, using the compare URL and a base ref/branch.
Under the hood it is really just running git diff --name-only, which returns a list of all the files that changed, like this:
packages/packageB/src/index.ts
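The step above can be sketched without the CLI. This is not the Entria Deploy CLI implementation, only an illustration of the idea: take the part of the compare URL after `/compare/` and hand it to `git diff --name-only`. Both function names are hypothetical.

```shell
# Hypothetical helper: turn a compare URL into a range that git diff understands.
# e.g. .../compare/master...69bc6e3fe337 becomes master...69bc6e3fe337
compare_url_to_range() {
  printf '%s\n' "${1#*/compare/}"
}

# List the files changed in that range.
# Assumes the local checkout contains both ends of the range.
changed_files() {
  git diff --name-only "$(compare_url_to_range "$1")"
}
```

The three-dot range makes git diff compare the head commit against its merge base with the base branch, which is what we want for a pull request.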
After this, we use jest --findRelatedTests to find which test files to run.
The result for this example is:
packages/packageB/src/__tests__/packageB.spec.ts
Before running these files, we need to check whether there are any test files to run at all. The list can be empty if you only modified README.md or other files that no test depends on.
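A rough sketch of these two steps, assuming Jest is installed in the project. The function names are hypothetical. `--listTests` combined with `--findRelatedTests` makes Jest print the related test files instead of running them.

```shell
# List the test files related to the given source files without running them.
# Assumes Jest is available through npx in the current project.
list_related_tests() {
  npx jest --listTests --findRelatedTests "$@"
}

# Succeed only if stdin contains at least one non-blank line.
has_test_files() {
  grep -q '[^[:space:]]'
}

# Example wiring (variable names are illustrative):
#   TESTS="$(list_related_tests $CHANGED_FILES)"
#   if printf '%s\n' "$TESTS" | has_test_files; then npx jest $TESTS; fi
```

Guarding on an empty list matters because calling Jest with no file arguments would run the whole suite, which is exactly what we are trying to avoid.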
Splitting tests to make them blazing fast
We can use circleci tests split to split the test files and run them in parallel.
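To show what splitting means, here is a stand-in that deals test files out round-robin across containers. It is not how circleci tests split works internally (the real command can also balance by timing data); the function name is hypothetical, and on CircleCI the two arguments would come from the CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL environment variables.

```shell
# Stand-in for test splitting: keep the lines of stdin that belong to one container.
# $1 = index of this container (0-based), $2 = total number of containers
split_round_robin() {
  awk -v idx="$1" -v total="$2" '(NR - 1) % total == idx'
}

# On CircleCI itself you would pipe the list into the real CLI instead, e.g.:
#   printf '%s\n' "$TESTS" | circleci tests split --split-by=timings
```

Each parallel container receives a disjoint slice of the list, so together they cover every test file once.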
Build and Deploy only affected packages
After making testing in pull requests about 3x faster on average, can we also build and deploy only the affected packages? Sure we can.
Our monorepo contains many GraphQL servers and some REST servers (just one, I swear). We also have many frontends and many apps (some white labels).
We used to build and deploy all of them to staging on master after every commit.
To solve this, we used the CIRCLE_COMPARE_URL (which provides the right range on master) to check whether we changed any file that would affect the package to be deployed.
entria-deploy hasChanged is just:
git diff master...8f8760 --name-only packages/shared packages/api
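The same check can be expressed over a list of changed files that we already have. A small sketch, assuming the list arrives on stdin and the package directories are passed as arguments; `affects_paths` is a hypothetical name, and this is a swapped-in equivalent of the git pathspec filter above rather than the CLI's code.

```shell
# Succeed if any changed file (one per line on stdin) lives under one of the
# directories given as arguments.
affects_paths() {
  while IFS= read -r file || [ -n "$file" ]; do
    for dir in "$@"; do
      case "$file" in
        "$dir"/*) return 0 ;;
      esac
    done
  done
  return 1
}

# Example:
#   git diff master...8f8760 --name-only | affects_paths packages/shared packages/api
```

Matching on `"$dir"/*` keeps a directory such as packages/apiv2 from being mistaken for packages/api.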
Stopping the Build
We need to use the circleci-agent step halt command to stop job execution.
This made our pull request test time much shorter, so developers get fast feedback when their changes break some tests.
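Putting the check and the halt together might look like the sketch below. The function name and the HALT_CMD override are hypothetical; the override only exists so the function can be tried outside CircleCI, where circleci-agent is not available.

```shell
# Halt the current CircleCI job when nothing relevant changed.
# $1 = changed files relevant to this package (empty string means no changes)
# HALT_CMD is a hypothetical override for dry runs outside CI.
halt_if_unchanged() {
  if [ -z "$1" ]; then
    echo "No relevant changes, halting the job"
    ${HALT_CMD:-circleci-agent step halt}
  else
    echo "Relevant changes found, continuing"
  fi
}

# Example:
#   halt_if_unchanged "$(git diff master...8f8760 --name-only packages/shared packages/api)"
```

Halting ends the job without marking it as failed, so a skipped deploy does not turn the pipeline red.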
This also reduces the time to release and update our staging and production environments.
This also reduced our CircleCI costs, as we build and test only the needed parts.
What's next? I hope to add machine learning like the Firefox team did (https://hacks.mozilla.org/2020/07/testing-firefox-more-efficiently-with-machine-learning/) to make finding the required tests even more fine-grained.