
Database Migrations

So you want to modify the database?

Key considerations

Cloud 2.0 includes two databases, nebula and server.

nebula contains all user management data associated with the Cloud 2.0 platform. The database is used by the nebula and auth services.

server contains all workflow data associated with the Cloud 2.0 platform. The database is used by the server service as well as server loop services (e.g., the Scheduler).

admin

When you change admin's data schema, there are a few components to keep in mind:

  1. The admin database. Data lives in Postgres, so the Postgres database may need to be physically adjusted to incorporate the new changes. We use a tool called alembic to manage such changes, which are called "migrations."
  2. admin SQLAlchemy model definitions, which are stored in src/prefect_cloud/models/orm.py.

logic

When you change logic's data schema, there are a few components to keep in mind:

  1. The logic database. Data lives in Postgres, so the Postgres database may need to be physically adjusted to incorporate the new changes. We use a tool called alembic to manage such changes, which are called "migrations."
  2. Cloud 2.0's logic SQLAlchemy model definitions, which are stored in src/prefect_cloud/logic/database.py.
  3. Prefect 2.0's logic SQLAlchemy model definitions, which are defined in our open source codebase.

Cloud 2.0's migrations for logic are different from the open source migrations.

Migrating the database

CLI tools

All commands for migrating the nebula database are available under prefect-cloud admin database ....

All commands for migrating the server database are available under prefect-cloud logic database .... Note that running prefect logic database ... will not produce the same behavior.

Create a migration

Cloud 2.0 uses SQLAlchemy and Alembic, meaning any schema change must begin with a migration.

admin and Events

Navigate to the nebula repo root directory and create a migration file by executing:

alembic -n admin revision -m "<DESCRIBE YOUR MIGRATION>"

The -n flag indicates which database (or set of migrations) we're working with, either nebula or events. (Note: server is handled differently; see below.) The new file will be created at one of the following locations, depending on the database:

  • src/prefect_cloud/nebula/migrations/versions/<YOUR NEW FILE HERE>
  • src/prefect_cloud/events/storage/postgres/migrations/versions/<YOUR NEW FILE HERE>

This file is pre-populated with metadata, upgrade, and downgrade stubs. Please do not modify the metadata.
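
For reference, a freshly generated file looks roughly like the sketch below (the revision identifiers, date, and message are placeholders, not real values); the revision and down_revision values are the metadata Alembic uses to chain migrations together, which is why they must not be modified.

"""<DESCRIBE YOUR MIGRATION>

Revision ID: 1a2b3c4d5e6f
Revises: 0f9e8d7c6b5a
Create Date: 2023-01-01 00:00:00.000000

"""
import sqlalchemy as sa
from alembic import op

# revision identifiers, used by Alembic -- this is the metadata; don't touch it
revision = "1a2b3c4d5e6f"
down_revision = "0f9e8d7c6b5a"
branch_labels = None
depends_on = None


def upgrade():
    # apply your schema changes here
    pass


def downgrade():
    # reverse exactly what upgrade() does
    pass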

logic

Navigate to the nebula repo root directory and create a migration file by executing:

prefect-cloud logic database revision --message "<DESCRIBE YOUR MIGRATION>"

Navigate to src/prefect_cloud/logic/migrations/versions/<YOUR NEW FILE HERE> to view it. This file is pre-populated with metadata, upgrade, and downgrade stubs. Please do not modify the metadata.

Autogenerating a migration

Alembic has support for automatically detecting changes to database models.

To autogenerate a migration, add the --autogenerate flag to your revision command.

alembic -n <database> revision -m "<DESCRIBE YOUR MIGRATION>" --autogenerate

Any autogenerated migration code should be carefully reviewed; not everything can be autogenerated, so see Alembic's documentation for its limitations.
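
Autogenerated code is only a starting point. As a hedged example (the table and column names are invented here), adding a boolean column to a model might produce something like the following; anything autogenerate cannot infer, such as backfilling existing rows or recognizing a rename (it sees a rename as a drop plus an add), is up to you:

from alembic import op
import sqlalchemy as sa


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column("deployment", sa.Column("paused", sa.Boolean(), nullable=True))
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_column("deployment", "paused")
    # ### end Alembic commands ###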

Writing a migration

An alembic migration consists of two parts: upgrade and downgrade, which are implemented as Python functions.

When we run migrations, each migration's upgrade function is called in sequence. These are called inside a single Postgres transaction, so upgrades are only applied if they all succeed; otherwise, they are rolled back.

Similarly, when downgrading the database, downgrade functions are called in sequence. We would typically not run downgrades on a production database unless something went wrong, but they are critical for enhancing our development experience (letting us cleanly move between different database states, perhaps from different git branches or environments) and helping us ensure that we have a way out of trouble.
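
As an illustration (the table, column, and index names below are invented for this example, not taken from our schema), a filled-in migration pairs an upgrade with a downgrade that undoes it in reverse order:

from alembic import op
import sqlalchemy as sa


def upgrade():
    # create the new table
    op.create_table(
        "artifact",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("key", sa.String(), nullable=False),
        sa.Column("created", sa.DateTime(timezone=True), nullable=False),
    )
    op.create_index("ix_artifact_key", "artifact", ["key"])


def downgrade():
    # undo upgrade() in reverse order
    op.drop_index("ix_artifact_key", table_name="artifact")
    op.drop_table("artifact")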

If you need help, the Alembic tutorial is a good place to start.

Running migrations

admin

The best way to run migrations is prefect-cloud admin database upgrade or prefect-cloud admin database downgrade.

You can pass a few arguments to these commands. The -n value is passed to Alembic to tell it how far to run the migration:

  • prefect-cloud admin database upgrade -n +5 upgrades the next 5 migrations
  • prefect-cloud admin database downgrade -n -2 downgrades the last 2 migrations
  • prefect-cloud admin database downgrade -y skips the confirmation (for both upgrades and downgrades; --yes is also accepted)
  • prefect-cloud admin database upgrade -n abc123xyz upgrades up to the migration with ID abc123xyz
  • prefect-cloud admin database upgrade -n heads upgrades to the latest migration (this is the default behavior, so you don't need to pass it)

As a special case, you can completely downgrade/upgrade the database with prefect-cloud admin database reset.

logic

All of the above commands also apply to prefect-cloud logic database .... See the CLI help for more details.

What happens when you upgrade / downgrade the database?

When you run migrations via prefect-cloud admin database upgrade [or downgrade], Alembic runs all migrations in a transaction. This means that if there's any problem, the database is not affected because the entire transaction is rolled back.

Please try to write migrations that are backwards- and forwards-compatible if possible.

Tips for writing an effective migration

Please note: databases are finicky. Effective migrations often come down to truly understanding what you're asking the database to do and how it's going to do it. The tips here are not universal and may even be detrimental if misapplied.

Releasing migrations

In a best-case scenario, your migration either does not change the schema in a way the application depends on (for example, it only adds or removes an index) or only affects source code. In this case, you can PR your migration and include it in any backend release.

However, destructive or non-backwards-compatible migrations may require special handling, because the various components of the admin architecture update at different times; if they get out of sync, there will be downtime.

A classic example of a destructive migration is renaming a database column. Let's say we want to rename column a to column b on table test. This is a simple migration in the database (ALTER TABLE test RENAME COLUMN a TO b;) and potentially an easy change in source code. However, let's say that in this release, our database migration is grouped with one that takes a very long time. Our new source code, looking for column b, will be released rapidly, but the database will still contain column a until the migration completes. This inconsistent state will cause our application to crash until it is resolved.

A better approach is to split up destructive migrations. First, introduce a migration that adds a new column, b, to test. This can be one PR, which can be released. Next, release source code that reflects column b. Last, introduce a migration that drops column a. Note that if these three steps get combined into a single release, it will defeat the purpose of splitting them up - the goal is to separately release each one so that the application experiences no downtime.
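
A sketch of that sequence using the illustrative test table and columns a and b from the example above, written as two separate migration files released at different times (the backfill statements are an optional extra, not part of the original recipe):

from alembic import op
import sqlalchemy as sa

# --- migration file in release 1: additive change only ---

def upgrade():
    # add the new column alongside the old one; existing code keeps using "a"
    op.add_column("test", sa.Column("b", sa.Integer(), nullable=True))
    # optionally backfill so both columns hold the same data
    op.execute("UPDATE test SET b = a")

def downgrade():
    op.drop_column("test", "b")

# --- release 2: ship source code that reads/writes "b" (no migration) ---

# --- migration file in release 3: destructive change, once nothing reads "a" ---

def upgrade():
    op.drop_column("test", "a")

def downgrade():
    op.add_column("test", sa.Column("a", sa.Integer(), nullable=True))
    op.execute("UPDATE test SET a = b")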

Test with data

It's easy to see if your migration works properly on an empty database - just run the upgrade and downgrade commands in sequence a few times (admin will do this in unit tests!). It's harder to ensure it will work with data intact. admin takes a stab at doing this in a unit test, but you shouldn't rely on it because we can't recreate all situations.

Common gotchas:

  • Changing a column to allow NULL values in an upgrade, and forgetting to handle NULL values when downgrading back to a NOT NULL column (see the sketch below).
  • Deleting data (accidentally or intentionally) when upgrading or downgrading - remember this could run on a production database. You want to make sure any existing data is preserved!
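
For the first gotcha, here is a minimal sketch, assuming an illustrative task_run table with a state column (both names invented for this example), of a downgrade that backfills NULLs before restoring the NOT NULL constraint:

from alembic import op
import sqlalchemy as sa


def upgrade():
    # relax the constraint: the column may now contain NULLs
    op.alter_column("task_run", "state", existing_type=sa.String(), nullable=True)


def downgrade():
    # backfill NULLs first, otherwise restoring NOT NULL fails on existing rows
    op.execute("UPDATE task_run SET state = 'UNKNOWN' WHERE state IS NULL")
    op.alter_column("task_run", "state", existing_type=sa.String(), nullable=False)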

Dropping constraints

The most expensive operation you can do to a database is an UPDATE; this is because updates are secretly a combination of an INSERT and a DELETE. When you INSERT data into a table, all triggers and constraints are checked and all indices have to be updated. These checks and updates can take more time than the actual insert itself. Furthermore, because UPDATE is secretly an INSERT, updating a single un-indexed column STILL requires checking all constraints on all columns (this is a common pitfall!).

Therefore, even though it looks like a lot of extra code, it can be better to drop all constraints/indices/triggers before inserting data into a table and recreate them afterwards. Recreating them in one shot will ALWAYS be faster than having them updated row-by-row.
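
As a rough sketch of that pattern in an Alembic migration (the event table, payload_hash column, and index name are invented for this example, and the backfill shown is a stand-in for whatever bulk change you need):

from alembic import op


def upgrade():
    # drop indices (and any constraints/triggers) that would otherwise be
    # maintained row-by-row during the bulk UPDATE
    op.drop_index("ix_event_payload_hash", table_name="event")

    # one bulk statement instead of many per-row updates
    op.execute("UPDATE event SET payload_hash = md5(payload::text)")

    # recreate the index in one shot afterwards
    op.create_index("ix_event_payload_hash", "event", ["payload_hash"])


def downgrade():
    # the backfill itself isn't reversible; nothing to undo here
    pass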

Tools like Postico can help you see what indices / constraints are in place, and even give you the DDL to recreate them.

Locks

Migrations run in a transaction, so the changes they make are not "visible" to database consumers until and unless the transaction is successfully committed. However, the changes you make could nonetheless require various forms of database locks, and could consequently result in the database denying reads or writes while the transaction is ongoing.

Temporary tables

If you're adding a column to a table, the table will be locked (possibly in SHARE MODE, see above!) for the duration of that manipulation. It may actually be faster to do all your work in a temporary table, then drop the original table and rename the temporary table. This is a matter of trading off speed / annoyance: any references to the original table (foreign keys, views, triggers) will also need to be dropped and recreated.

Here's a tip: create a temporary schema for your temporary table and give the temporary table the same name as your original table. Names are only unique within a schema. When you drop the original table, just change the temporary table's schema and drop the temporary schema. This way you won't have to rename its indices and constraints. For example, a table named xyz_tmp might generate a primary key called xyz_tmp_pkey. If you don't follow this tip, you'll have to change that to xyz_pkey yourself.

Note that the difference will only be noticeable for large changes (>1,000,000 rows).

Here is the naive way, which will lock my_giant_table for the duration of the transaction:

ALTER TABLE my_giant_table ADD COLUMN b INT;

-- drop indices, foreign keys, etc
DROP INDEX ix_my_giant_table_a;

UPDATE my_giant_table
SET b = a + 1
WHERE a > 10;

-- create indices, foreign keys, etc.
CREATE INDEX ix_my_giant_table_a ON my_giant_table (a);
CREATE INDEX ix_my_giant_table_b ON my_giant_table (b);

Here is a more advanced approach, which will minimize database downtime:

-- create temporary schema
CREATE SCHEMA tmp_table;

-- create temporary table by writing directly from original table in one shot
CREATE TABLE tmp_table.my_giant_table AS (
    SELECT
        *,
        CASE WHEN a > 10 THEN a + 1 ELSE NULL END as b
    FROM my_giant_table
);

-- create all indices, foreign keys, etc.
CREATE INDEX ix_my_giant_table_a ON tmp_table.my_giant_table (a);
CREATE INDEX ix_my_giant_table_b ON tmp_table.my_giant_table (b);

-- drop the original table
DROP TABLE my_giant_table;

-- move the new table
ALTER TABLE tmp_table.my_giant_table SET SCHEMA public;

-- drop the schema
DROP SCHEMA tmp_table;

Avoid Sub-transactions

In general, you shouldn't create your own transactions. If your migration includes a transaction, it WILL take effect immediately. For example, if you wrote:

BEGIN;
DROP TABLE xyz;
COMMIT;

table xyz would be dropped even if all migrations (and the enclosing Alembic transaction) hadn't finished -- and, in fact, even if that enclosing transaction later fails.
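
The same point in Alembic terms, as a hedged sketch: never issue your own transaction control (e.g. op.execute("COMMIT")) from a migration; just emit the statements and let Alembic's enclosing transaction commit or roll back the whole batch.

from alembic import op


def upgrade():
    # DON'T: issuing your own COMMIT ends Alembic's enclosing transaction,
    # so everything before it takes effect even if later migrations fail
    # op.execute("COMMIT")

    # DO: just emit the statements and let Alembic commit (or roll back)
    # the whole batch together
    op.drop_table("xyz")


def downgrade():
    # recreating a dropped table requires its full definition; left as a stub
    pass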