Setting up PostgreSQL for running integration tests

thunderbong1 pts0 comments

Setting up PostgreSQL for running integration testsWhen it comes to testing, achieving performance and reliability is crucial. In this article, I'll explain how to set up PostgreSQL for tests and discuss some common pitfalls to avoid.

Isolation as the primary goal

Before we dive into the details, let's define our goals:

Isolation – We want to ensure that each test runs in isolation. At the very least, this means that each test should have its own database. This ensures that tests don't interfere with each other and that you can run tests in parallel without any issues.

performance – We want to ensure that setting up PostgreSQL for tests is fast. A slow solution is going to be cost prohibitive for running tests in CI/CD pipelines. The solution that we come up with must allow us to execute tests without introducing too much overhead.

The rest of this article will focus on what we have tried, what worked, and what didn't work.

What didn't work

Using transactions

The first approach we tried was to use transactions. We would start a transaction at the beginning of each test and roll it back at the end.

The basic idea is illustrated in the following example:

test('calculates total basket value', async () => {<br>await pool.transaction(async (tx) => {<br>await tx.query(sql.unsafe`<br>INSERT INTO basket (product_id, quantity)<br>VALUES (1, 2)<br>`);

const total = await getBasketTotal(tx);

expect(total).toBe(20);<br>});<br>});<br>The transaction approach works well for simple cases (e.g., testing a single function), but it quickly becomes a problem when dealing with tests that test integration between multiple components. Due to connection pooling, nested transactions, and other factroes, the necessary work to make the transaction approach work would have meant that we are not replicating the real-world behavior of our application, i.e. it would not provide the confidence we need.

For consistency, we also want to avoid mixing testing approaches. Even though using transactions would suffice for some tests, we want to have a consistent approach across all tests.

Using SQLite

Another approach we tried was to use SQLite. SQLite is an in-memory database that is fast and easy to set up.

Similar to the transaction approach, SQLite works well for simple cases. However, it quickly becomes a problem when dealing with code paths that use PostgreSQL-specific features. In our case, due to the use of various PostgreSQL extensions, PL/pgSQL functions, and other PostgreSQL-specific features, we couldn't use SQLite for our tests.

pglite provides PostgreSQL packaged as a WASM module that can be used in Node.js. This might be a good alternative, though we haven't tried it yet. Regardless, the current lack of support for extensions would have been a blocker for us.

Using pg_tmp

Another approach we tried was to use pg_tmp. pg_tmp is a tool that creates a temporary PostgreSQL instance for each test.

In theory, pg_tmp is a good solution. It allows a complete isolation of tests. In practice, is a lot slower than we could tolerate. With pg_tmp, it takes a few seconds to start and populate the database, and this overhead quickly adds up when running thousands of tests.

Let's say you have 1000 tests, and each test takes 1 second to run. If you add 2 seconds of overhead for creating a new database, you are looking at an additional 2000 seconds (33 minutes) of overhead.

If you like this approach, you could also probably get away with using Docker containers. Depending on many factors, Docker containers might be even faster than pg_tmp.

integresql is a project that I came across in a HN thread. It seems like a good alternative that reduces the overhead of creating a new database to about 500ms. It has a pooling mechanism that allows you to reduce the overhead even further. We decided against continuing on this path because we were happy with the level of isolation that we got from using template databases.

What worked

After trying various approaches, we settled on combining two approaches: template databases and mounting a memory disk. This approach allowed us to isolate each test at a database level without introducing too much overhead or complexity.

Template Databases

A template databases is a database that serves as a template for creating new databases. When you create a new database from a template database, the new database has the same schema as the template database. The steps to create a new database from a template database are as follows:

Create a template database (ALTER DATABASE is_template=true;)

Create a new database from the template database (CREATE DATABASE TEMPLATE ;)

The key advantage of using template databases is that you do not need to mess with managing multiple PostgreSQL instances. You can create copy databases and have each test run in isolation.

However, on its own, template databases are not fast enough for our use case. The time it takes to create a new database from a template database is still too high for...

database tests template postgresql test approach

Related Articles