Functionally migrating from Firestore to PostgreSQL | by Valentin Mouret | MediumSitemapOpen in appSign up<br>Sign in
Medium Logo
Get app<br>Write
Search
Sign up<br>Sign in
Functionally migrating from Firestore to PostgreSQL
Valentin Mouret
4 min read·<br>Jun 6, 2022
Listen
Share
Press enter or click to view image in full size
La Bourse aux Livres<br>At La Bourse aux Livres, we help our users sell the books they don’t read anymore. They just have to scan them using our app, we let them know if they are worth sending over, sell them, and split the benefits.<br>As you might imagine, we have a reasonable amount of data, between the scans from our users, the market information, and our orders.<br>The database that was chosen at the time the product was built was Firestore.<br>However, we wanted to move to a relational database for several reasons, not the least of which being that our data is highly relational. We chose PostgreSQL because it’s great and open source, but soon came the time of the hard question: how do we migrate our data over?<br>Normalise everything<br>We first tried a naive strategy: create a table for each of the Firestore collections we had, and adding constraints (types, primary keys, foreign keys).<br>The limits of this approach were:<br>We had to put a lot of work upfront to normalise the documents.<br>There were a lot of columns and tables to create and types were inconsistent.<br>Most of those columns made no sense in a relational schema: our documents were messy aggregates we wanted to split up, which yields even more complexity.<br>The code that was reading the data would have to be migrated too, with more complex queries to get the original document.<br>So, we had to invest a lot of work with very little value but a lot of complexity just because we had a messy data structure to start with.<br>Not a good deal, and not good for morale.<br>But then, we came up with an alternative.<br>Keep the documents<br>PostgreSQL can store JSON documents in a column using the jsonb data type.<br>This means that we can trivially mirror any Firestore collection in Postgres two columns:
Simple table to hold Firestore records.Migrating a record from the Firestore to Postgres is then trivial too:
Insert a JSON document in our table.Migrating reads is simple too:
Read a JSON document by its ID.Any code that was pulling a document from the Firestore can run the same with this query as we can restitute the original Firestore document.<br>This shows that we can run the migration very easily using a simple structure.<br>But that’s not all. Using jsonb methods from PostgreSQL, we can directly query the fields inside the documents:
We can refer to the JSON fields, including in the `where` clause.Note: ->> accesses the field of a JSON.
This looks nice, but since there is no index on the document, it will require a full table scan. Which takes a lot of time and resources.<br>There are two ways to go around this limitation:<br>indexing the JSON column (yes, you can)<br>extracting columns<br>Indexing the column<br>PostgreSQL supports indexing jsonb columns using a GIN index.
Create a GIN index on our JSON column.Then, some queries can leverage the index to be faster.<br>However, since we have a lot of records with a lot of fields per record, this had a significant memory footprint. We preferred using the second method, but this can be great if you have simpler documents.<br>Extracting columns<br>Another great feature of PostgreSQL is to generate columns. What do we mean by that? Well, we can tell PostgreSQL to generate a column from another one (or several).<br>For example:
Using a column generated from another one. Yeah, in France you are an adult at 18. :)The is_adult column cannot be written to: it is generated from other fields and stored. It is recomputed when the record changes, but the value is saved on disk.<br>We can combine this with what we saw before to put an index on meaningful fields:
We can add an index on a generated column since it’s physically written to the disk.This way, we can extract columns from our Firestore documents and put an indexes on them.<br>How great?<br>Summing up<br>If you recall, the limits of our first approach were:<br>We had to put a lot of work to normalise the documents
Here, we can migrate documents with only two generic columns at first and extract as we go.<br>We had to maintain two collections, some items were in one collection but not the other
Since our structure is simple, we can migrate all records. Furthermore, since we only derive our columns from our base documents, it’s all functional. Overwriting a document will generate our columns again.<br>Most of those columns made no sense in a relational schema
Here, we can focus on extracting the columns that matter.<br>We have to either rewrite all queries to migrate the code that was expecting the entire Firestore document or rewrite them to use the normalised ones.
Since a column still holds the full document, a piece of code that was expecting the Firestore document can select our document column...