How to decrease a monolith’s build time by 90% using TeamCity, Terraform, and common sense

Buildium Life
Apr 17, 2020

No one likes to wait. Especially for a build that takes an hour and a half, only to find out that a test failed. Then you discover that when you push a fix for the failing test, you have to wait another hour and a half, because someone else pushed a commit and your fix wasn’t included in that build. Then you discover that the test was really a known flaky test that has lingered in the codebase for years and wasn’t even related to your original change. When all of this happens because you were just trying to ship a key feature at 4:30 PM on a Friday and you end up leaving the office at 8 PM, you know there’s a problem.

Sound familiar to any engineers out there? Thankfully, at Buildium, we’ve been able to engineer our way out of this recurring nightmare with a foundation of infrastructure as code and some common-sense approaches to the everyday process of testing our monolithic application.

Comedy is tragedy plus time.

How did we get here?

The testing pyramid

At Buildium, we love testing. A lot. We run tens of thousands of automated tests against our application every day, to the tune of millions of test runs over the course of a year. The only problem: over time, our testing strategy began to resemble the testing ice-cream cone anti-pattern rather than the testing pyramid.

A testing-unfriendly architecture

At Buildium, we spent years following an Active Record-like pattern for persisting and reading data from our database. Unfortunately, this made unit testing difficult without adopting patterns like the repository pattern, which means that the vast majority of our tests are integration tests. When you have over 10,000 integration tests reaching out to the database, each taking about a second to complete, you have a recipe for a multi-hour test run.

Flaky tests

Flaky tests are a fact of testing life that’s worth accepting. But something that’s not worth accepting is test suite pass rates dropping all the way to 50%. If you have to wait for two complete runs of your test suite to get a passing build, you’ve doubled the runtime of your tests.

What’s the way out?

Sometimes when you’re in this sort of rut, it’s hard to even imagine that things could change fast enough for you to take advantage of it. But it’s always worth trying something to dig your way out of the craziness.

Here’s what didn’t work

Trying to change our non-testable architecture overnight

Architectural foundations don’t just happen overnight, and they don’t change overnight either. Over the last 7 years, we’ve worked hard at making our core engineering patterns testable. And we’ve made a lot of progress: we’ve gone from about 95% of our tests being integration tests to about 70%. But at this rate of change, we’d either have to wait until 2025 to see most of our integration tests rewritten as fast unit tests, or hire/reroute engineering resources to focus exclusively on an aggressive tech debt project. #hardtojustify

Manually trying to partition tests

At some point, we saw that the best way to tackle this problem was to divide and conquer the tests in our monolith. We tried to break tests out into suites based on the core logical parts of our business (rentals, associations, tasks, integrations, etc., plus one suite for everything else) using NUnit attributes. Then we configured separate TeamCity build configurations and integration test databases to run each of these suites in parallel whenever a build started.

Trying to understand how to organize lines of business in your testing code makes sense as a long-term approach, but manual test organization requires constant rebalancing of the tests in your monolith to make sure no single build ends up taking a disproportionate amount of time relative to the other builds running in parallel. This left us at an hour and a half for the longest of our parallel builds before we gave up on the approach.

Asking engineers to choose which tests run and which tests don’t

Our testing time got so bad at one point that we decided to ask engineers to self-select which tests were critical and which weren’t. The critical tests would run on every build as a form of smoke test, while the non-critical tests would run less frequently. This turned out to be a hard program to implement, since it’s genuinely difficult to judge which tests would prevent a critical problem in production. In practice, engineers just waited for all of the tests to complete anyway (critical and non-critical), so the experiment didn’t work.

Here’s what worked

More horsepower

In order to test in parallel to the extent that we needed, we decided to scale the number of build machines horizontally. We started with one AWS EC2 instance, scaled out to 6, and ultimately optimized down to 3. This is where we decided to spend most of our money in solving our build time problem.

Automated, equal test partitioning

We wrote a simple script that gets the names of all the tests that need to be run and divides them alphabetically into 24 partitions. Each partition contains roughly the same number of tests and is handed to a corresponding TeamCity build configuration to run as soon as application compilation completes.
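Here’s a minimal sketch of that idea in Kotlin. Our real script works against our NUnit test assemblies inside TeamCity; the file names, output format, and helper names below are illustrative assumptions, not our actual tooling.

```kotlin
import java.io.File

// Sketch only: read fully qualified test names (one per line), sort them
// alphabetically, and split them into N roughly equal partitions -- one per
// TeamCity test build configuration. File names here are hypothetical.
fun partitionTests(testListFile: String, partitionCount: Int = 24): List<List<String>> {
    val tests = File(testListFile).readLines()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .sorted()

    // Ceiling division so we never produce more than partitionCount chunks;
    // only the final chunk can be slightly smaller than the others.
    val chunkSize = maxOf(1, (tests.size + partitionCount - 1) / partitionCount)
    return tests.chunked(chunkSize)
}

fun main() {
    partitionTests("all-tests.txt").forEachIndexed { index, partition ->
        // Each partition file is handed to its matching TeamCity build.
        val partitionNumber = (index + 1).toString().padStart(2, '0')
        File("partition-$partitionNumber.txt")
            .writeText(partition.joinToString(System.lineSeparator()))
    }
}
```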

One database per build/test partition

We decided to stand up one database for each of our 24 build configurations to eliminate any chance that tests become flaky because of database contention between test runs. We never commit any database transactions in our tests, so our databases look and behave the same after 20,000 tests as they did when they were first created.
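Our integration tests are NUnit/.NET, but the “never commit” pattern is simple enough to sketch in a few lines of Kotlin with plain JDBC. This is illustrative only, not our actual test harness:

```kotlin
import java.sql.Connection
import java.sql.DriverManager

// Illustrative sketch of the pattern: every test runs inside a transaction
// that is always rolled back, so a partition's database looks the same
// after the test as it did before it, no matter how many tests have run.
fun <T> withRolledBackTransaction(jdbcUrl: String, test: (Connection) -> T): T {
    val connection = DriverManager.getConnection(jdbcUrl)
    connection.autoCommit = false
    return try {
        test(connection)
    } finally {
        connection.rollback() // undo everything the test wrote
        connection.close()
    }
}
```

Because nothing is ever committed, the 24 databases can be created once and reused indefinitely without drifting apart.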

Build once, test n times in parallel

When trying to test in parallel, we started off by having each of our 24 parallel build configurations do everything our builds had always done:

  • Restore NuGet packages
  • Build the application
  • Partition tests into 24 chunks
  • Build 24 database configuration files
  • Update each database with any schema changes
  • Run the tests assigned to each build in parallel

What we found when analyzing performance, though, was that our server load was highest while compilation was happening, then leveled off when we began testing. This makes sense: we had 6 AWS EC2 instances for our builds, and each of those machines was running 4 builds at a time, which meant we were running 4 expensive .NET compilations on every server at the same time.

The large spike in load corresponded to .NET compilation, and the steady state after corresponded to 6 NUnit processes running in parallel. The steady state before the spike was when no builds were happening.

Once we understood that, it made sense to pursue an optimization to build once on each build machine, then double the number of parallel test configurations we were running on each agent server from 4 to 8. This enabled us to achieve a much flatter curve for the duration of each test run and bring our overall number of build machines down from 6 to 3, halving that part of our build costs.

With this optimization, we squeeze as much performance out of each machine as possible for the duration of the build (~80% CPU/memory utilization).
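In TeamCity’s Kotlin DSL, the resulting shape looks roughly like the sketch below (assuming the 2019.2 DSL): a single compile configuration publishes its output as an artifact, and each test partition declares snapshot and artifact dependencies on it so compilation happens once per build chain. The IDs, paths, and scripts here are placeholders rather than our actual configuration.

```kotlin
import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.script

version = "2019.2"

project {
    // Compile once and publish the output as an artifact.
    val compile = BuildType {
        id("Compile")
        name = "Compile"
        artifactRules = "output/** => app.zip"
        steps {
            script { scriptContent = "./build.sh" }
        }
    }
    buildType(compile)

    // The 24 test configurations reuse the compiled artifact instead of rebuilding.
    (1..24).forEach { partition ->
        buildType(BuildType {
            id("TestPartition$partition")
            name = "Test partition $partition"
            dependencies {
                snapshot(compile) {}
                artifacts(compile) {
                    artifactRules = "app.zip!** => app"
                }
            }
            steps {
                script { scriptContent = "./run-tests.sh partition-$partition.txt" }
            }
        })
    }
}
```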

Everything as code

Once we committed to running 24 test partitions in parallel, it became clear that we needed to automate everything about our builds to keep long-term maintenance cheap:

  • Test partitioning
  • Build machine infrastructure (Terraform + SaltStack)
  • Build configuration as code (TeamCity using the Kotlin DSL)
  • Database configuration as code
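As one small example of what “database configuration as code” buys us, generating a connection-string file per partition becomes a few lines of code instead of 24 hand-edited files. The server name, database naming scheme, and file format below are hypothetical, purely for illustration:

```kotlin
import java.io.File

// Hypothetical sketch: emit one connection-string file per test partition so
// no database configuration is ever edited by hand.
fun writeDatabaseConfigs(partitionCount: Int = 24, dbServer: String = "test-db.internal") {
    (1..partitionCount).forEach { partition ->
        val dbName = "AppTests_Partition" + partition.toString().padStart(2, '0')
        val connectionString = "Server=$dbServer;Database=$dbName;Integrated Security=true"
        File("db-config-partition-$partition.properties")
            .writeText("connectionString=$connectionString\n")
    }
}

fun main() = writeDatabaseConfigs()
```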

Eradicate flaky tests

All of this sophistication around builds wouldn’t be worth much if our tests still failed every other build because of neglected tests in our application. So we began doing a few things:

  • Track flaky tests in a spreadsheet.
  • Demand that flaky tests be tackled by the team that introduced them.
  • Ruthlessly delete tests that are consistently flaky and ignored.

This part of the rollout just took some doggedness and accountability, but we haven’t found it hard to explain the importance of test hygiene. Engineers naturally get the idea that if a bad test can be traced back to their team, they ought to fix it ASAP. As a result, our test success rate has jumped from ~50% to ~90% within just a few months.

So, what did we learn?

It is possible to bring a build down from over an hour to less than 10 minutes!

In one of our experiments, we ran 48 configurations in parallel and got some builds to finish in 8 to 9 minutes. However…

Bringing build time down significantly costs $$$

Whether you want to pay in time to refactor integration tests to unit tests or in build machine hosting costs, if you’re getting to the point where your build time is unacceptable, be prepared to spend some money. But in our experience, hosting costs <<< refactoring costs. We’ve decided to stay at 24 parallel configurations for the foreseeable future and have accepted build times close to 15 minutes.

Infrastructure as code opens doors

It’s been about a year since we completed phase one of this experiment, and we can’t emphasize enough how much having our infrastructure and builds expressed as code has opened the door to sharing ideas, refining patterns, and mass-producing reusable builds.

What’s next?

In the next few articles in this series, we plan to share a bit more about how we set up the codebases for our infrastructure and builds. Future plans include running more of our builds in parallel using AWS spot fleets or Docker containers to save even more on infrastructure costs and squeeze in even more testing, and partitioning our tests not by test name but by average runtime.

But we want to hear from you! What have you done to bring down build times for your projects? Is this the sort of thing you’d want to hear more about? Let us know in the comments below.

About the author

Peter is an automation architect at Buildium. Since joining Team Buildium in 2014, he has built systems, managed engineering teams, and directed software and testing platforms that have helped establish Buildium’s industry-leading screening, insurance, payments, and core property management products. He’s passionate about all things DevOps, automated testing, software architecture, and generally making customers successful.
