Control plane
Multi-node PgDog deployments require synchronization to perform certain tasks, like atomic configuration changes, toggling maintenance mode, resharding, and more. To make this work, PgDog Enterprise comes with a control plane, an application deployed alongside PgDog, to provide coordination and collect and present system telemetry.
How it works
The control plane and PgDog processes communicate over the network using HTTP. PgDog sends metrics and other metadata to the control plane in real time, and the control plane sends commands back to control the behavior of each PgDog process.
Configuration
To connect to the control plane, PgDog needs the control plane's endpoint address and an authentication token, both of which are specified in pgdog.toml:
```toml
[control]
endpoint = "https://control-plane-endpoint.cloud.pgdog.dev"
token = "cff57e5c-7c4f-4ca0-b81c-c8ed22cf873d"
```
The authentication token is generated by the control plane and identifies each PgDog deployment. PgDog nodes that are part of the same deployment should use the same token.
For example, if you're using our Helm chart, you can configure the endpoint and token in values.yaml as follows:
```yaml
control:
  endpoint: https://control-plane-endpoint.cloud.pgdog.dev
  token: cff57e5c-7c4f-4ca0-b81c-c8ed22cf873d
```
Connection flow
The connection to the control plane is initiated by PgDog on startup and happens in the background. Upon connecting, PgDog sends its node identifier (randomly generated, or set in the NODE_ID environment variable) to register with the control plane, then starts uploading telemetry and polling for commands.
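The node identifier rule described above (use `NODE_ID` if set, otherwise generate one at random) can be sketched in Python as follows. This is an illustration only: the helper name `resolve_node_id` is hypothetical, and a UUID stands in for whatever random format PgDog actually uses.

```python
import os
import uuid


def resolve_node_id(env=None) -> str:
    """Return NODE_ID if it is set and non-empty, otherwise a random identifier."""
    env = os.environ if env is None else env
    node_id = env.get("NODE_ID", "").strip()
    if node_id:
        return node_id
    # Assumption: a random UUID stands in for PgDog's own random identifier.
    return str(uuid.uuid4())


print(resolve_node_id({"NODE_ID": "pgdog-node-1"}))
print(resolve_node_id({}))
```

Setting `NODE_ID` explicitly keeps the identifier stable across restarts, whereas a randomly generated one changes each time the process starts.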
Error handling
Since most PgDog functions (including sharding) are configuration-driven, the control plane connection is not required for PgDog to start and serve queries.
If any error is encountered while communicating with the control plane, PgDog will continue operating normally, while attempting to reconnect periodically.
This architecture makes the communication link more resilient to unreliable network conditions.
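The behavior described here (keep serving, retry in the background) follows a common pattern that can be sketched in Python. This is not PgDog's implementation: `session` is a hypothetical callable that connects and exchanges messages until the link drops, and the retry interval is an arbitrary assumption.

```python
import threading


def control_link_loop(session, stop: threading.Event, retry_interval: float = 5.0) -> int:
    """Run `session` repeatedly until `stop` is set; return the number of failures.

    `session` is a hypothetical callable that raises OSError (which includes
    ConnectionError and TimeoutError) when the network link fails.
    """
    failures = 0
    while not stop.is_set():
        try:
            session()
        except OSError:
            # A failed link is counted and retried; it never stops the caller.
            failures += 1
        # Pause before reconnecting; returns immediately once `stop` is set.
        stop.wait(retry_interval)
    return failures


def start_in_background(session, retry_interval: float = 5.0):
    """Start the loop on a daemon thread so the main thread keeps serving queries."""
    stop = threading.Event()
    thread = threading.Thread(
        target=control_link_loop, args=(session, stop, retry_interval), daemon=True
    )
    thread.start()
    return stop, thread
```

Because the loop runs on its own thread and swallows only network errors, a control plane outage in this sketch costs nothing more than a periodic reconnect attempt.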
Telemetry
PgDog transmits the following information to the control plane:
| Telemetry | Description |
|---|---|
| Metrics | The same metrics exposed by the Prometheus endpoint (and the admin database), transmitted at a much higher frequency to allow for real-time monitoring. |
| Active queries | Queries that are currently executing through each PgDog node. |
| Query statistics | Real-time statistics on each query executed through PgDog, like duration, idle-in-transaction time, and more. |
| Errors | Recent errors encountered by clients, e.g. query syntax issues. |
| Query plans | Output of EXPLAIN for slow and sampled queries, collected by PgDog in the background. |
| Configuration | Current PgDog settings and database schema. |
High availability
The control plane itself is backed by a PostgreSQL database, used for storing historical metrics, query statistics, configuration, and other metadata.
This allows multiple instances of the control plane to be deployed in a high-availability setup, since all actions are synchronized by PostgreSQL transactions and locks.
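To illustrate how a shared database can coordinate several instances, here is a minimal Python sketch of one common pattern: an atomic "claim" made with a conditional `UPDATE` inside a transaction. It is not the control plane's actual schema or logic. The `commands` table, its columns, and the instance names are hypothetical, and SQLite from the Python standard library stands in for PostgreSQL so the example runs without a server.

```python
import os
import sqlite3
import tempfile


def claim_command(conn: sqlite3.Connection, command_id: int, instance: str) -> bool:
    """Atomically claim a command; True only for the instance that wins."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE commands SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (instance, command_id),
        )
        # Exactly one row changes for the first claimant, zero for the rest.
        return cur.rowcount == 1


def demo() -> tuple[bool, bool]:
    # Two connections to one database file stand in for two control plane instances.
    path = os.path.join(tempfile.mkdtemp(), "control.db")
    first, second = sqlite3.connect(path), sqlite3.connect(path)
    try:
        with first:
            first.execute(
                "CREATE TABLE commands (id INTEGER PRIMARY KEY, name TEXT, claimed_by TEXT)"
            )
            first.execute("INSERT INTO commands (id, name) VALUES (1, 'reload_config')")
        return claim_command(first, 1, "cp-1"), claim_command(second, 1, "cp-2")
    finally:
        first.close()
        second.close()


print(demo())
```

Only the first claimant succeeds, so the action runs once no matter how many instances are deployed. On PostgreSQL, the same idea is often expressed with row locks or advisory locks instead.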