Infrastructure Services¶

For Customer-Managed Prefect, consider the following infrastructure requirements.

Minimum Infrastructure Requirements¶

The following minimum specifications are required for a production Customer-Managed Prefect deployment. Kubernetes, PostgreSQL, and Redis may need to scale beyond these minimums depending on the number of flow runs, event volume, and concurrency in your environment.

Component	Min Version	Nodes	CPU	Memory	Storage
Kubernetes	1.29	3	16 vCPU / node	24 GiB / node	—
PostgreSQL	14 (recommended: 17)	—	8 cores	20 GiB	100 GiB
Redis (single instance)	6.2 (recommended: 7.0)	—	2 cores	8 GiB	—

Redis: A single instance is supported, but three dedicated instances (one per role) are recommended for better isolation and performance.

Instance	CPU	Memory
Cache	1 core	2 GiB
Work	1 core	2 GiB
Streams	1 core	4 GiB
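
The minimums in the tables above can be encoded as data and compared against a planned environment before installation. The following is a minimal sketch, not part of the Prefect tooling: the names Spec, MINIMUMS, and shortfalls are hypothetical, and the figures are copied from the tables above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Spec:
    cpu: int              # vCPU (Kubernetes node) or cores (PostgreSQL, Redis)
    memory_gib: int
    storage_gib: int = 0  # 0 means the table lists no storage minimum


# Minimums copied from the tables above. Kubernetes values are per node;
# the 3-node minimum and the version minimums are not modeled here.
MINIMUMS = {
    "kubernetes_node": Spec(cpu=16, memory_gib=24),
    "postgresql": Spec(cpu=8, memory_gib=20, storage_gib=100),
    "redis_single": Spec(cpu=2, memory_gib=8),
    "redis_cache": Spec(cpu=1, memory_gib=2),
    "redis_work": Spec(cpu=1, memory_gib=2),
    "redis_streams": Spec(cpu=1, memory_gib=4),
}


def shortfalls(component: str, actual: Spec) -> List[str]:
    """Return one message per resource that falls below the documented minimum."""
    minimum = MINIMUMS[component]
    problems = []
    for field in ("cpu", "memory_gib", "storage_gib"):
        have, need = getattr(actual, field), getattr(minimum, field)
        if have < need:
            problems.append(f"{component}: {field} is {have}, minimum is {need}")
    return problems
```

For example, shortfalls("postgresql", Spec(cpu=8, memory_gib=16, storage_gib=100)) reports the memory gap only. Meeting these numbers is a floor, not a sizing recommendation for a given flow run volume.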

Load Balancer¶

A load balancer provides high traffic throughput, multi-zone availability, and SSL termination, and it integrates with an ingress controller. Either a Layer 4 or a Layer 7 load balancer can be used. Prefect requires SSL termination and host-based path routing to the Istio service mesh.

If bringing your own service mesh, ensure you route traffic from the API and UI to the appropriate ingress resources.

DNS¶

Two distinct Fully Qualified Domain Names are required for Customer-Managed Prefect.

api.<domain> - Used for API interactions from workers and other clients.
app.<domain> - Used to access the application from the web.

These domains will be used for:

Common Names and/or Subject Alternative Names for SSL certificates
The CLOUD_UI_URL and CLOUD_API_URL values

Using your existing DNS provider, create DNS records that route traffic to the ingress: one for the api subdomain and one for the app subdomain.
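
Once the records exist, it can help to confirm that both names resolve before continuing with the installation. The sketch below uses only the Python standard library. The function names and the example.com domain are illustrative assumptions, and the check confirms resolution only, not that the addresses point at the intended ingress.

```python
import socket
from typing import Callable, Dict, List


def required_hosts(domain: str) -> List[str]:
    """Build the two FQDNs described above for a given base domain."""
    return [f"api.{domain}", f"app.{domain}"]


def resolve(host: str) -> List[str]:
    """Resolve a host name to a sorted list of unique IP addresses."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check_dns(domain: str,
              resolver: Callable[[str], List[str]] = resolve) -> Dict[str, List[str]]:
    """Map each required host to its addresses; an empty list means no record."""
    results = {}
    for host in required_hosts(domain):
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = []
    return results
```

Calling check_dns("example.com") returns a dictionary keyed by api.example.com and app.example.com. The resolver parameter exists so the check can be exercised without network access.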

SSL Termination¶

SSL Certificates are required to properly secure the client interactions with the API, even in a customer-managed scenario.

Certificates can be requested through a vendor such as DigiCert or Let's Encrypt. A publicly signed SSL certificate is strongly recommended for the Prefect API. A self-signed certificate can be used, but clients will not trust it by default: the Prefect Client (the open-source Prefect package) ships with the certifi package, whose CA bundle contains only publicly trusted certificate authorities. To ensure maximum availability and support, use a publicly signed certificate.

Certificate validity periods should be set in accordance with internal security policies.
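
Whatever validity period is chosen, the remaining lifetime of the deployed certificate can be monitored. The following sketch uses only the Python standard library; the function names are hypothetical. Note that ssl.create_default_context trusts the operating system's CA store rather than the certifi bundle, so a successful handshake here approximates, but does not prove, what the Prefect Client will accept.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until the certificate's notAfter timestamp (negative if expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires_at - current) // 86400)


def fetch_not_after(host: str, port: int = 443, cafile: Optional[str] = None) -> str:
    """Connect over TLS and return the server certificate's notAfter string.

    Pass cafile to trust a private CA or a self-signed certificate. If the
    certificate is not trusted, the handshake raises
    ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call days_until_expiry(fetch_not_after("api.<domain>")) with your own host name and alert when the result drops below a threshold set by your security policy.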

Kubernetes Cluster¶

Kubernetes is the core execution platform required for Prefect. See Minimum Infrastructure Requirements for node sizing.

Istio¶

Istio manages the service mesh networking and the authentication routes so that traffic reaches the correct service. This includes traffic from the ingress to the nebula service, auth service, ui service, orion service, and others.

Istio is configured and deployed through Helm as part of the Prefect installation. No additional configuration should be necessary.

Redis¶

Redis serves three distinct roles in Customer-Managed Prefect: caching, background task queues, and event streaming. In the recommended topology, each role is backed by a separate instance. All data in Redis is transient; durable state lives in PostgreSQL.

Cache¶

Caches frequently accessed data to reduce load on PostgreSQL. The auth service relies heavily on this instance to cache authentication tokens and session data, avoiding a database round-trip on every request. If the cache is empty (e.g., after a restart), these entries are regenerated from PostgreSQL on demand. The result is temporarily increased database load and slightly higher latency until the cache warms up.

Work¶

A dedicated instance for background task queues. Background services (like orion-background and nebula-background) enqueue deferred work here, including scheduling operations, deletion jobs, and push work-pool tasks. Multiple worker processes consume from these queues concurrently. See Background Services for details on individual services.

Streams¶

A dedicated instance for event-driven processing. Services like actions and ladler consume event messages from Redis streams to trigger automations, move events into PostgreSQL for long-term storage, and process other real-time reactions. Events are held for their retention period and then discarded.

Failure Behavior¶

Redis data loss (empty restart): No flow run, task run, or deployment state is lost; all of that lives in PostgreSQL. The practical impact is:

Cache instance: Temporarily higher latency and database load while caches rebuild. Users may see brief re-authentication prompts.
Work instance: Any in-flight background tasks that were queued but not yet processed are lost. These are generally retried or re-enqueued by the services that created them. You may see short delays in operations like scheduled deployments or deletions catching up.
Streams instance: Any unprocessed event messages are lost. Events already written to PostgreSQL by ladler are unaffected. Automation triggers that depended on in-flight events may not fire for that window, but new events will flow normally once Redis is back.

Redis unavailability (outage): While Redis is unreachable, services that depend on it will fail to enqueue or consume work. API endpoints remain functional for reads and writes against PostgreSQL, but background processing (scheduling, automations, push work-pools) will stall until Redis connectivity is restored. No data is lost in PostgreSQL during this period.
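
Because background processing stalls silently while Redis is unreachable, a basic reachability probe per instance can be useful. The sketch below sends a Redis inline PING over a plain TCP socket using only the Python standard library. The function names are hypothetical, and it assumes an instance without TLS or authentication; an instance that requires AUTH answers with an error instead of +PONG and is reported as not healthy here.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if the reply is the simple-string response to PING."""
    return reply.startswith(b"+PONG")


def ping_redis(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the instance answers PING, False on any failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # Redis accepts inline commands
            return is_pong(sock.recv(64))
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False
```

In a three-instance deployment, the probe would be run once per role (cache, work, streams) against whatever host names your environment assigns. For managed Redis services with TLS or authentication, a full client library is the better choice.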

Backup and Persistence¶

Redis backup is optional. All data in Redis is either regenerable from PostgreSQL or ephemeral. Losing it does not cause permanent data loss; the impact is limited to the transient effects described above. The critical backup priority is PostgreSQL, which contains all durable workflow data (flows, runs, deployments, events).

Recovery process: If you need to migrate to new Redis instances or recover from data loss:

Spin down Prefect workloads (stop active workers and deployments)
Wait a few minutes for backend processing to finish
Redeploy with new/empty Redis settings
Caches rebuild and queues repopulate automatically

Your managed Redis service may provide automated snapshots, which can reduce downtime during migrations, but these are supplementary, not required.

PostgreSQL¶

PostgreSQL is used to manage state. Three databases maintain the application:

Database	Purpose
Events	Logs and Actions
Nebula	Authorization, RBAC, Workspaces, Accounts
Orion	Flows, Tasks, Deployments

For recommended PostgreSQL configuration flags to optimize performance and monitoring, see Database Flags.

Events DB¶

The Events database stores event data. Its primary purpose is to provide a performant datastore for recent event data, with most queries returning in under about 200 ms.

Nebula DB¶

The Nebula database stores information about users, accounts, teams, etc. that are used to control access to server data.

Orion DB¶

The Orion database stores information about flows, flow runs, task runs, etc.