PostgreSQL Database Flags

Overview

This document provides recommended PostgreSQL configuration flags for Customer-Managed Prefect databases. These flags are used in Prefect's production environments and are tuned for performance, monitoring, and reliability.

Important: The values provided are examples based on specific instance sizes. You should tune these flags according to your environment's specifications, workload characteristics, and monitoring feedback.

Applicability

These flags apply to all three PostgreSQL databases used by Customer-Managed Prefect:

  • Events - Stores events and automation data
  • Nebula - Stores authorization, RBAC, workspaces, and accounts
  • Orion - Stores flows, tasks, deployments, and run data

For more details on these databases, see Infrastructure Services.

Common Flags (All Databases)

These flags should be applied to all Customer-Managed Prefect databases.

Query Performance Monitoring

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_format = json
auto_explain.log_min_duration = 1000
auto_explain.log_nested_statements = on
auto_explain.log_timing = off
auto_explain.log_triggers = on
auto_explain.log_verbose = on
auto_explain.sample_rate = 1

These flags enable automatic logging of slow queries (over 1000ms). The JSON format makes logs easier to parse and analyze. This is critical for identifying performance bottlenecks in production.

Note: Some managed database services require additional configuration to enable auto_explain. Consult your platform's documentation if these settings don't take effect.
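On self-hosted PostgreSQL, auto_explain is a loadable module rather than a built-in setting. One common way to enable it is sketched below (privileges and platform specifics vary; session-level LOAD requires superuser):

```sql
-- Load auto_explain for every backend (requires a restart);
-- in postgresql.conf:
--   shared_preload_libraries = 'auto_explain'

-- Or load it in a single session to try the settings first:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000;  -- log plans for queries over 1000ms
SET auto_explain.log_format = 'json';
```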

Maintenance and Vacuuming

autovacuum_vacuum_scale_factor = 0.1

This sets the threshold for triggering autovacuum at 10% of table size (down from the default 20%). More aggressive vacuuming helps maintain query performance on high-write tables like flow runs and events.

Warning: Overly aggressive vacuuming can increase I/O load. Monitor your database performance when adjusting this value.
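PostgreSQL triggers autovacuum on a table once its dead-tuple count exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A small sketch of that trigger condition (the 10-million-row table is an illustrative assumption):

```python
def autovacuum_trigger_point(reltuples: int,
                             scale_factor: float = 0.1,
                             base_threshold: int = 50) -> int:
    """Dead-tuple count at which autovacuum starts for a table.

    Mirrors PostgreSQL's formula:
    threshold = autovacuum_vacuum_threshold
              + autovacuum_vacuum_scale_factor * reltuples
    """
    return base_threshold + int(scale_factor * reltuples)

# At the default scale factor (0.2), a 10M-row table accumulates ~2M dead
# rows before vacuuming; at 0.1 it vacuums after ~1M, keeping bloat on
# high-write tables (flow runs, events) lower.
print(autovacuum_trigger_point(10_000_000, scale_factor=0.2))  # 2000050
print(autovacuum_trigger_point(10_000_000, scale_factor=0.1))  # 1000050
```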

Checkpoint and Write-Ahead Log (WAL)

checkpoint_timeout = 900
log_checkpoints = on
max_wal_size = 5120
wal_compression = on
  • checkpoint_timeout - 15 minutes (900 seconds) between checkpoints, balancing crash recovery time with checkpoint overhead
  • log_checkpoints - Enables logging of checkpoint activity for performance monitoring
  • max_wal_size - 5GB maximum WAL size before forcing a checkpoint
  • wal_compression - Reduces disk I/O and storage requirements for WAL files
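These two limits interact: a checkpoint fires at whichever comes first, the checkpoint_timeout timer elapsing or WAL growth approaching max_wal_size. A rough sketch of that interaction (the 10 MB/s WAL rate is an assumed workload, not a measurement):

```python
def seconds_until_checkpoint(wal_mb_per_sec: float,
                             checkpoint_timeout_sec: int = 900,
                             max_wal_size_mb: int = 5120) -> float:
    """Estimate the effective checkpoint interval.

    Checkpoints are forced either by the timer or by WAL volume,
    whichever limit is reached first.
    """
    seconds_to_fill_wal = max_wal_size_mb / wal_mb_per_sec
    return min(checkpoint_timeout_sec, seconds_to_fill_wal)

# At an assumed 10 MB/s of WAL, the 5GB cap forces a checkpoint after
# 512 seconds, before the 900-second timer expires.
print(seconds_until_checkpoint(10.0))  # 512.0
# At 1 MB/s, the timer wins and checkpoints occur every 900 seconds.
print(seconds_until_checkpoint(1.0))   # 900
```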

Connection and Lock Management

max_connections = 4250
max_locks_per_transaction = 500
log_lock_waits = on
  • max_connections - Set based on expected concurrent connections from all Prefect services and workers. The example value (4250) is for large deployments; adjust down for smaller installations to reduce memory overhead.
  • max_locks_per_transaction - Increased from default (64) to handle complex queries with many table locks, especially during migrations or large batch operations
  • log_lock_waits - Enables logging when queries wait for locks, helping identify contention issues

Warning: Each connection consumes memory. Setting max_connections too high on undersized instances can cause out-of-memory conditions.
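As a rough capacity check, the memory floor implied by a connection limit can be estimated from a per-backend overhead. The ~10MB-per-connection figure below is an assumption for illustration; real overhead varies with workload and work_mem usage:

```python
def connection_memory_floor_gb(max_connections: int,
                               per_connection_mb: float = 10.0) -> float:
    """Rough lower bound on memory consumed by backend processes alone,
    before any per-query work_mem allocations are counted."""
    return max_connections * per_connection_mb / 1024

# The example value of 4250 connections implies roughly 41.5GB of backend
# overhead - clearly sized for a large instance, not a small one.
print(round(connection_memory_floor_gb(4250), 1))  # 41.5
```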

Memory Configuration

work_mem = 14336

This sets the per-operation memory budget for sorts and hash tables (14336 kB, i.e. 14MB, in the example; PostgreSQL interprets a bare work_mem value as kilobytes). Higher values reduce disk-based operations but consume more memory per query.

Formula: Tune based on (available_memory - shared_buffers) / max_connections to avoid over-allocation.
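The formula above can be sketched as a quick sizing check; the 64GB instance, 16GB shared_buffers, and 1000 connections below are assumptions for illustration:

```python
def safe_work_mem_kb(available_memory_mb: int,
                     shared_buffers_mb: int,
                     max_connections: int) -> int:
    """Upper bound on work_mem (in kB, PostgreSQL's unit for this setting)
    such that every connection could take one full allocation without
    exhausting memory. A single query may use several work_mem allocations
    (one per sort/hash node), so staying well below this bound is prudent.
    """
    return (available_memory_mb - shared_buffers_mb) * 1024 // max_connections

# Assumed example: 64GB instance, 16GB shared_buffers, 1000 connections.
print(safe_work_mem_kb(65536, 16384, 1000))  # 50331 kB (~49 MB)
```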

Primary Database Flags

These flags apply to the primary (read-write) database instance.

maintenance_work_mem = calculation-based
shared_buffers = calculation-based

maintenance_work_mem

Used for maintenance operations like VACUUM, CREATE INDEX, and ALTER TABLE.

Example calculation for a 13GB instance:

min(floor(13312 / 8), 2048) = min(1664, 2048) = 1664 MB

General formula: min(floor(instance_memory_mb / 8), 2048)
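The formula can be expressed directly (all values in MB):

```python
def maintenance_work_mem_mb(instance_memory_mb: int) -> int:
    """min(floor(instance_memory_mb / 8), 2048): one eighth of instance
    memory, capped at 2GB."""
    return min(instance_memory_mb // 8, 2048)

print(maintenance_work_mem_mb(13312))  # 1664  (the 13GB example)
print(maintenance_work_mem_mb(32768))  # 2048  (cap applies at 16GB and above)
```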

shared_buffers

Sets the size of the database's shared memory buffer pool. This is critical for read performance.

Example calculation for a 13GB instance:

floor((13312 / 1024) * 0.4 * ((1024 * 1024) / 8)) = 681574 8KB buffers, approximately 5325 MB (40% of memory)

As a general formula, set to approximately 25-40% of total instance memory. The 40% value is appropriate for dedicated database servers. Use lower percentages (25-30%) if the database shares resources with other services.

Warning: Changing shared_buffers requires a database restart. Plan this change during a maintenance window.
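The sizing guidance above reduces to a fraction of instance memory; a minimal sketch (in MB):

```python
def shared_buffers_mb(instance_memory_mb: int, fraction: float = 0.4) -> int:
    """shared_buffers as a fraction of instance memory, in MB.

    0.4 suits a dedicated database server; use 0.25-0.3 when the host
    is shared with other services.
    """
    return int(instance_memory_mb * fraction)

print(shared_buffers_mb(13312))        # 5324  (the 13GB example, ~40%)
print(shared_buffers_mb(13312, 0.25))  # 3328  (shared-host sizing)
```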

Replica Database Flags

These flags apply to read replicas.

max_standby_archive_delay = 300000
max_standby_streaming_delay = 300000

Both values are set to 300,000 milliseconds (5 minutes). This controls how long queries on replicas can run before being cancelled due to WAL replay conflicts.

Higher values prioritize long-running read queries on replicas over replication lag. Lower values keep replicas closer to the primary at the cost of potentially cancelling slow queries.

Consider lower values (60,000-120,000) if read replica freshness is critical. Set higher values if you have long-running analytical queries on replicas.
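Before tuning these values, it helps to know how far a replica currently lags; the standard replay-timestamp check (run on the replica) gives an approximation:

```sql
-- Approximate replication delay on a standby:
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;
```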

Implementation Notes

Testing Configuration Changes

  1. Test in non-production first - Always validate flag changes in a development or staging environment
  2. Monitor key metrics - Watch query performance, connection counts, memory usage, and replication lag
  3. Change one flag at a time - This makes it easier to identify which changes improve or degrade performance
  4. Review logs - Check PostgreSQL logs for warnings or errors after applying changes

Monitoring After Changes

After applying these flags, monitor:

  • Query performance (via auto_explain logs and pg_stat_statements)
  • Connection saturation (current vs. max connections)
  • Checkpoint frequency and duration
  • Vacuum activity and table bloat
  • Replication lag (for replicas)
  • Memory usage and swap activity
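A few standard catalog queries cover most of these checks. Note that pg_stat_statements requires the extension to be installed, and the total_exec_time column name applies to PostgreSQL 13+ (older versions call it total_time):

```sql
-- Connection saturation:
SELECT count(*) AS current_connections,
       current_setting('max_connections') AS max_connections
FROM pg_stat_activity;

-- Top queries by total execution time (requires pg_stat_statements):
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Tables with the most dead tuples (vacuum/bloat signal):
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```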