Fixtures and Factories: Balancing Trade-offs (Part 1)

ActiveRecord models are at the heart of many Rails applications, so having a fast and convenient way to create or access instances of ActiveRecord models in tests is important. Fixtures and factories are two possible means to provide such instances to our tests.

Naive approach

One possible way to construct ActiveRecord instances for use in tests is to simply use plain Rails methods, like ActiveRecord::Base.new. For example, if we are writing some unit tests related to email confirmation:

let(:confirmed_user) { User.new(confirmed_at: 1.day.ago) }

This approach has a few drawbacks. One problem is that this user is not very fleshed out and probably isn’t valid (that is, calling valid? on it would return false). This invalid User might cause trouble for the tests that we want to write, and the bare-bones attributes of this user (which lacks a name, email, etc.) run afoul of the principle that we want to write tests exercising application states that our application is reasonably expected to encounter.

So, we could set up the user with some other attributes, enough to make the user pass valid?:

let(:confirmed_user) do
  User.new(name: 'Elon', email: 'elon@tesla.com', password: 'rockets', confirmed_at: 1.day.ago)
end

The problem now is that our test setup distracts from the focus of our tests (which are concerned with email confirmation). We don’t really care what the user’s name, email, and password are; we only added those attributes to make the user valid.

Furthermore, this additional code in the test setup introduces an unnecessary maintenance burden.

Another potential issue here is that this model is not persisted to the database. There are a number of reasons that we might prefer to have database-persisted models to use in our tests. We could fix this issue by using ActiveRecord::Base.create rather than ActiveRecord::Base.new. The problem with this approach is that it’s going to slow down our test suite. Writing a new record to the test database takes time, and this performance penalty might be paid repeatedly: once per test example that references the confirmed_user. With most relational databases, reads are significantly faster than writes, so it would be better if a valid User record already existed in the test database, and our tests could simply retrieve that record each time that it’s needed, rather than repeatedly persisting it to the database.

Fixtures and factories are testing tools that address some of these concerns.

Fixtures (the Rails way)

Test fixtures are natively supported by Rails, and the Rails guides provide a great overview. To quote the Rails guide:

Fixtures is a fancy word for sample data. Fixtures allow you to populate your testing database with predefined data before your tests run. Fixtures are database independent and written in YAML. There is one file per model.

So, we could create a spec/fixtures/users.yml file with the following content:

elon:
 name: Elon
 email: elon@tesla.com
 password: rockets

and have Rails turn that YAML description of a User into a persisted User record before any of our tests are executed. Then, our tests can access that User record via the users method that Rails will define for us in tests:

let(:user) { users(:elon) }

We now have a valid record, without the boilerplate that specifies the user’s name, email, and password 👍, and the record is already persisted to the database, so we don’t pay the performance penalty associated with writing the record to the database “on the fly.” Nice.
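To make the name-to-record mapping concrete, here is a rough plain-Ruby sketch of the lookup idea. It is not how Rails implements fixtures (Rails inserts each entry into the test database and hands back a model instance). The FIXTURES constant and this users method are stand-ins invented for illustration, and the sketch uses only Ruby’s standard yaml library.

```ruby
require 'yaml'

# The same YAML that would live in spec/fixtures/users.yml.
USERS_YAML = <<~YAML
  elon:
    name: Elon
    email: elon@tesla.com
    password: rockets
YAML

# Parse the YAML into a hash keyed by fixture name. This only illustrates
# the name-to-attributes mapping; no database is involved here.
FIXTURES = YAML.safe_load(USERS_YAML)

# Hypothetical stand-in for the `users` helper that Rails defines in tests.
# It returns a plain attribute hash rather than a User record.
def users(name)
  FIXTURES.fetch(name.to_s)
end

elon = users(:elon)
puts elon['email']
```

The point of the sketch is that the test only names the record it wants; the attributes live in one shared place rather than in each test’s setup.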

However, getting back to our tests related to email confirmation, let’s modify Elon so that he has a confirmed_at timestamp:

let(:confirmed_user) do
  users(:elon).tap { |user| user.update!(confirmed_at: 1.day.ago) }
end

The trouble with this approach is that, when using a database with a Multiversion Concurrency Control (MVCC) model like Postgres’s, an UPDATE statement is essentially an INSERT under the hood (it writes a new version of the row and marks the old version as dead), so our update! call will be roughly as costly as creating a brand-new record. If we want to get our tests back to a more performant state, we could avoid the need for this update statement by creating a fixture that we already know to be confirmed:

# spec/fixtures/users.yml
confirmed_user:
 name: Elon
 email: elon@tesla.com
 password: rockets
 confirmed_at: 2018-03-22 18:58:57.114611000 Z

and then in our test:

let(:confirmed_user) { users(:confirmed_user) }

Not bad. However, this approach of manually defining Users in YAML is going to be a pain to maintain when we want to test more complex “states” that a User might be in, and as we change the database schema over time. For example, if our app has a concept of user “approval”, we might have fields like approved_at and approved_by_id, requiring that we specify both of these attributes in order to create a representative example of an approved user. And, down the road, we might even change the approval process so that it can be performed by Admins or by other Users, adding an approved_by_type column in order to support a polymorphic association. When this happens, it would be annoying to have to update all of our fixtures defined in our users.yml fixture file.

Enter factories.

Factories

To quote Wikipedia, a factory is “a function or method that returns objects of a varying prototype or class from some method call, which is assumed to be ‘new’.”

The concept of a factory in programming is not specific to testing, but it’s particularly prevalent in testing, since tests commonly have a need for objects built with certain, known traits in order to then test assertions about those objects.

When it comes to using factories for building ActiveRecord objects in tests, thoughtbot’s factory_bot gem (formerly known as factory_girl) is the go-to library.

factory_bot allows us to write factory definitions using Ruby code, including a concept called “traits” that allows us to conveniently specify certain attributes of our models.

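# spec/factories/users.rb

FactoryBot.define do
  factory :user do
    # Attribute values are given as blocks, evaluated for each new object.
    name { 'Elon' }
    email { 'elon@tesla.com' }
    password { 'rockets' }

    # create(:user, :confirmed) layers this trait on top of the defaults.
    trait :confirmed do
      confirmed_at { 1.day.ago }
    end
  end
end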

factory_bot provides us with helper methods in our tests, like #create, so we can say

let(:confirmed_user) { create(:user, :confirmed) }

This is a nice approach, because if the implementation of what it means to be “confirmed” changes, or if our database schema changes, then we just need to update our factory definition, rather than updating a bunch of fixtures written out in YAML.
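To see why a single definition is enough, it can help to think of a factory as “defaults, plus traits, plus per-test overrides, merged in that order.” The sketch below illustrates that idea in plain Ruby. It is not factory_bot’s implementation or API; the User Struct, USER_DEFAULTS, USER_TRAITS, and build_user are all hypothetical names made up for this example.

```ruby
# Hypothetical stand-in for the User model: a plain Struct, not ActiveRecord.
User = Struct.new(:name, :email, :password, :confirmed_at, keyword_init: true)

# Default attributes, stored as lambdas so each object gets fresh values.
USER_DEFAULTS = {
  name: -> { 'Elon' },
  email: -> { 'elon@tesla.com' },
  password: -> { 'rockets' }
}.freeze

# Each trait is a named bundle of extra (or replacement) attributes.
USER_TRAITS = {
  confirmed: { confirmed_at: -> { Time.now - (24 * 60 * 60) } }
}.freeze

# Merge defaults, then the requested traits, then explicit overrides.
# Later sources win over earlier ones.
def build_user(*traits, **overrides)
  attributes = traits.reduce(USER_DEFAULTS) do |merged, trait|
    merged.merge(USER_TRAITS.fetch(trait))
  end
  User.new(**attributes.transform_values(&:call).merge(overrides))
end

plain_user = build_user
confirmed_user = build_user(:confirmed, name: 'Grace')

puts plain_user.confirmed_at.inspect # prints nil
puts confirmed_user.name             # prints Grace
```

If “confirmed” later required a second column, only the :confirmed entry would need to change; every test that asks for a confirmed user would pick up the new definition.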

However, this example pays the performance penalty of writing to the database on the fly. Fortunately, there’s a gem for that.

FixtureBuilder

At Hired, we use a gem called FixtureBuilder that gives us some of the best of both worlds (fixtures and factories). We get the performance benefit of fixtures (reading from the database in tests, rather than writing to it) with the convenience, clarity, and maintainability of factories. FixtureBuilder allows you to build YAML file fixtures using factories (such as factory_bot). In other words, with FixtureBuilder, we can use factory_bot to generate our fixture YAML files, rather than writing and maintaining the YAML ourselves.

With FixtureBuilder, we can write something like

# spec/support/fixture_builder.rb

FixtureBuilder.configure do |fbuilder|
 fbuilder.factory do
   fbuilder.name(:confirmed_user, FactoryBot.create(:user, :confirmed))
   #
   # ... many other fixture definitions ...
   #
 end
end

We’ll run this FixtureBuilder script before our tests, which will generate our spec/fixtures/users.yml (and other model fixture files) for us, based on the fixtures that we defined using factories in spec/support/fixture_builder.rb. Then, in our tests we’ll write

let(:confirmed_user) { users(:confirmed_user) }

just as we would when using native Rails fixtures derived from manually written YAML fixture definitions.
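Conceptually, the generation step boils down to “take named objects and dump their attributes as YAML keyed by fixture name.” The following is a simplified plain-Ruby sketch of that idea only. It is not FixtureBuilder’s implementation or API (the real tool works from the contents of the test database); the User Struct and the fixtures_yaml method are hypothetical names for illustration.

```ruby
require 'yaml'

# Hypothetical stand-in for a persisted model: a plain Struct, not ActiveRecord.
User = Struct.new(:name, :email, :confirmed_at, keyword_init: true)

# Convert a { fixture_name => record } hash into fixture-style YAML,
# with one top-level key per fixture name.
def fixtures_yaml(named_records)
  named_records
    .transform_values { |record| record.to_h.transform_keys(&:to_s) }
    .to_yaml
end

# In a real setup, a factory would build this object; here we do it by hand.
confirmed_user = User.new(
  name: 'Elon',
  email: 'elon@tesla.com',
  confirmed_at: '2018-03-22 18:58:57 UTC'
)

puts fixtures_yaml({ 'confirmed_user' => confirmed_user })
```

Because the YAML is an output rather than something we edit, a schema or factory change is handled by regenerating the file instead of hand-editing every entry.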

Note that our use of FixtureBuilder to leverage factory_bot for creating fixtures does not preclude us from also creating objects “on the fly” using factory_bot in our tests themselves. We can use either approach in any test (or even a combination of pre-built fixtures and live-generated factory objects). At Hired, neither approach predominates over the other. We’ll discuss how to choose between these approaches in more detail below.

Pros and Cons

We’re looking pretty good at this point!

The good

To recap, here are a few of the “pros” of an approach like this:

  1. We have fixtures that we can read from the database for faster tests.
  2. We are using factory_bot to define our fixtures, which keeps our fixtures DRY and maintainable. If we change the database schema, we just need to update one factory definition, not make changes to many different statically defined fixtures.
  3. Our test setups are concise and highlight the relevant entity/attributes under test.

There’s another significant benefit of using fixtures (whether via manually written YAML files, or generated via a tool like FixtureBuilder) that we haven’t discussed yet. In our production app, we have millions of database records interacting with each other, sometimes in ways that we fail to consider (hence, bugs). In light of our principle that tests should exercise application states that our application is reasonably expected to encounter, and because our ActiveRecord instances in production operate in the context of a varied collection of other associated records, so too should the ActiveRecord instances in our tests exist in the context of other persisted ActiveRecord model instances. Fixtures, by pre-populating our test database with a few thousand records that have somewhat realistic attributes, get us closer to that goal, and give our tests a better chance of surfacing unanticipated interactions between records before those bugs are shipped to production.

Also, although we’ve focused on just a single User record without any associated records in our examples so far, which might not involve too bad of a performance penalty, the performance benefits of fixtures add up even more when they allow us to avoid writing multiple records to the database per test. For example, if a user gets a free ice cream after they have made five purchases, a test of this behavior might want to have a persisted User record, and five associated Order records, too. A test setup that leverages fixtures to write zero records to the database, rather than six, is going to execute significantly more quickly.
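A toy model makes the difference in write counts easy to see. The sketch below uses a hypothetical in-memory FakeDatabase that does nothing but count inserts; it measures no real timings, and the figure of 100 examples is an arbitrary choice for illustration.

```ruby
# Hypothetical in-memory "database" that only counts writes.
class FakeDatabase
  attr_reader :write_count

  def initialize
    @write_count = 0
  end

  def insert(row)
    @write_count += 1
    row
  end
end

# One user plus five orders: six writes in total.
def create_user_with_orders(db)
  user = db.insert({ type: :user, name: 'Elon' })
  5.times { |i| db.insert({ type: :order, user: user, number: i + 1 }) }
  user
end

# On-the-fly setup repeats the six writes for every example.
def writes_for_on_the_fly(example_count)
  db = FakeDatabase.new
  example_count.times { create_user_with_orders(db) }
  db.write_count
end

# Fixture-style setup performs the six writes once, before any example runs.
def writes_for_fixtures
  db = FakeDatabase.new
  create_user_with_orders(db)
  db.write_count
end

puts writes_for_on_the_fly(100) # prints 600
puts writes_for_fixtures        # prints 6
```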

The bad

Fixtures are not without their downsides! Indeed, the downsides can be quite significant.

First, if you’re using FixtureBuilder, then in order to prevent the generated fixture YAML files from going out-of-date, you need to rebuild your fixtures whenever the database schema changes or you or any other developer changes a factory definition. This can take some time. Here at Hired, our FixtureBuilder-generated fixture set consists of over 2,000 records, and on a MacBook Pro it takes about one minute to execute all of our factory_bot factories, write the records to the database, and then dump the test database state into YAML files. Rebuilding the fixture set only needs to be done every so often, and paying this cost allows your tests to execute more quickly thereafter, but the delay of regenerating the fixture set is an unfortunate and annoying price to have to pay every once in a while. (Note that, although this is annoying for local development, the cost is amortized away many times over when running all tests on a CI server, since every single test gets executed and, ideally at least, each fixture that we have defined will be used repeatedly. This won’t be the case in local development, where you’ll typically just run a small subset of known-to-be-relevant tests.)

Second, fixtures will slow down the initial execution of your tests. When you kick off a spec run (e.g. rspec spec/models/user_spec.rb), the first thing that Rails will do before it starts running any of your tests is to delete all records from the test database and then (re-)write your fixtures to the database, based on the definitions in your fixture YAML files. This helps to ensure that our tests execute uniformly from one test run to the next, always starting with the initial database state that is specified by our fixtures. As an experiment, I made some tweaks to the activerecord gem so that it doesn’t destroy and then re-insert all fixtures in between test runs; this reduced the time-to-actual-test-execution on my machine from about 12 seconds to 9 seconds, suggesting that fixture re-insertion is slowing down initial test execution by about 3 seconds. This is another annoying up-front cost that one pays in order to get faster execution of the specs themselves.

Third, your fixtures should/will have associations connecting them together. As mentioned previously, avoiding the need to create multiple, associated records “on the fly” in specs is where you will see some of the largest performance gains from using fixtures. However, this means that you might have a relatively complex set of relationships between your various fixtures, which it might be important yet non-trivial to understand when using any of those fixtures in a test.

Finally, fixtures can be hard to change. Actually, changing a fixture is easy; it’s changing a fixture without breaking a bunch of tests that can be difficult. Once a useful fixture has been in the codebase for a while, many tests will be written using that fixture, and those tests might depend, explicitly or implicitly, on certain attributes of that fixture (and its associated records). By making any change to a fixture or its associated records, you risk violating those assumptions and causing any or all of those tests to fail. Fixing these broken tests can be a real time suck when making changes to fixtures. An alternative to changing an existing fixture would be to simply create a new fixture, but this bloats your fixture set and compounds the performance penalties just discussed.

Balancing trade-offs

That’s a lot to consider — and all just to set up our tests! How is one to choose between using factories “on the fly” in test setups vs. using pre-built fixtures? When is it worthwhile to add yet another fixture to our fixture set, knowing that each additional fixture will incrementally degrade our developer experience and slow down at least certain parts of our workflow (while speeding up other parts)? Some guiding principles might be helpful.

Is it “a thing”?

When deciding whether or not to create a new fixture, I’ll ask myself “Is this entity that I’m testing ‘a thing’?”, in the sense of the slang phrase. In other words, does this entity represent some sort of meaningful type/class of object that exists within the product? For example, if users can have different roles, and I’m building a feature that adds a new type of user role, then I’m definitely going to add a new user fixture with that role. That new role type is “a thing” in the app.

In contrast, if I’m testing logic that depends on when a user record was last updated, I’m almost certainly not going to create new fixtures called user_updated_6_days_ago and user_updated_8_days_ago. Although that distinction might be meaningful for at least some aspect of our product, because it’s not a significant or commonly relevant distinction, it is probably best not to create fixtures that are distinguished by something so relatively trivial.

In other words, if you were to ask a non-engineer at your company “What different types of users do we have in our app?”, “What are some of the most relevant distinguishing features or states that an order can be in?”, etc., the answers to those questions are probably good candidates for fixtures, since presumably those differences will result in lots of different behaviors that we’ll want to have test coverage for. In other cases, I’d lean toward just updating some existing fixture(s) to have whichever attributes I need for a particular test, or just building a brand-new object using factory_bot “on the fly” in the test setup.

How many times will a fixture be used?

This guiding question has a lot of overlap with the above principle. Simply put, the more times that we use a given fixture in our test suite overall, the more worthwhile it is to pay the upfront cost of generating that fixture and inserting it into the database at the beginning of each test run. Inversely, a fixture that is purpose-built to capture an “edge case” or a relatively infrequent or insignificant state is probably not going to be used very much in our tests, and the upfront cost of making every developer spend some amount of time building that fixture might not be worthwhile.
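One rough way to reason about this is as a break-even calculation: a fixture pays for itself once the time it saves across its uses matches the up-front cost of loading it. The sketch below is only a back-of-the-envelope model; the method names are hypothetical, and the millisecond figures in the usage lines are made up for illustration rather than measured.

```ruby
# Smallest number of uses per run at which the time saved matches or exceeds
# the up-front cost of loading the fixture. Both inputs are in milliseconds.
def break_even_uses(upfront_cost_ms:, saved_per_use_ms:)
  raise ArgumentError, 'saved_per_use_ms must be positive' unless saved_per_use_ms.positive?

  (upfront_cost_ms.to_f / saved_per_use_ms).ceil
end

# True when the expected number of uses reaches the break-even point.
def worth_adding?(uses:, upfront_cost_ms:, saved_per_use_ms:)
  uses >= break_even_uses(upfront_cost_ms: upfront_cost_ms,
                          saved_per_use_ms: saved_per_use_ms)
end

# Made-up numbers, purely for illustration.
puts break_even_uses(upfront_cost_ms: 30, saved_per_use_ms: 4)        # prints 8
puts worth_adding?(uses: 3, upfront_cost_ms: 30, saved_per_use_ms: 4) # prints false
```

This ignores the maintenance costs discussed above, so it is better treated as a lower bound on how often a fixture should be used than as a decision rule.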

Happy testing!

Since a lot of these questions ultimately boil down to a subjective judgment call, it’s the sort of thing that we at Hired like to leave to the discretion of individual developers and teams. Ultimately, although it’s good to be aware of these considerations when deciding how to create persisted ActiveRecord models for use in tests, the decision probably isn’t going to make or break your app, or even your test suite.

You don’t need to write “the perfect test”; what matters is that at least you’re writing a test. 🙂

About the Author

David Runger

David Runger is a Support Engineer at Hired and was previously a Software Engineer on Hired’s Client Experience team, working primarily with Ruby on Rails and React to build features that help companies find their next great team member. He passionately believes that automated testing is necessary in order to create reliable software applications and — for better or for worse — writes more lines of test code per line of application code than any other developer at Hired. When David’s not writing tests, you’ll most likely find him riding a bicycle.