Metrics
RKE2 provides metrics for monitoring the health and performance of the cluster.
Individual components provide most metrics. See the following component-specific documentation for more information:
Other components may provide additional metrics. Consult the upstream project documentation for any components not listed above.
Supervisor Metrics
When you start RKE2 with supervisor-metrics: true, the RKE2 supervisor exposes metrics. You can access these metrics through the /metrics endpoint on each node at port 9345:
kubectl get --server https://NODENAME:9345 --raw /metrics
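The endpoint returns metrics in the Prometheus text exposition format, so the output of the command above can be filtered with a small script. The following is a minimal sketch, not an official tool: the function name parse_metrics, the sample input, and the label values in it are illustrative assumptions, and optional timestamps are not handled.

```python
import re

# Matches `name{labels} value` or `name value`.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text, prefix="rke2_"):
    """Return (name, labels, value) tuples for samples whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; not captured from a real cluster.
EXAMPLE = """
# TYPE rke2_loadbalancer_server_connections gauge
rke2_loadbalancer_server_connections{name="example-lb",server="10.0.0.1:9345"} 3
go_goroutines 120
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

To use it against a live node, pipe the kubectl output into a file and pass the file contents to parse_metrics.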
Metrics exposed by the RKE2 supervisor process include:
- RKE2 Cluster Management Metrics
- Lasso controller metrics
- Kubernetes client and workqueue metrics
- Go runtime metrics
- If the RKE2 embedded registry is enabled, Spegel metrics and libp2p metrics
RKE2 Cluster Management Metrics
rke2_certificate_expiration_seconds
Remaining lifetime in seconds of the certificate, labeled by certificate subject and usages.
- Type: Gauge
- Labels: subject, usage
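Because this gauge reports the remaining lifetime in seconds, it can be converted to days and compared against a renewal threshold. The sketch below assumes the samples have already been parsed into (labels, seconds) pairs; the helper name expiring_certificates, the 30-day threshold, and the subjects shown are hypothetical.

```python
SECONDS_PER_DAY = 86400


def expiring_certificates(samples, threshold_days=30):
    """Return [(subject, days_left)] for certificates below the threshold, soonest first.

    samples: iterable of (labels, seconds_remaining) pairs.
    """
    threshold_seconds = threshold_days * SECONDS_PER_DAY
    expiring = [
        (labels.get("subject", "<unknown>"), seconds / SECONDS_PER_DAY)
        for labels, seconds in samples
        if seconds < threshold_seconds
    ]
    return sorted(expiring, key=lambda item: item[1])


# Hypothetical values for illustration.
samples = [
    ({"subject": "CN=example-a"}, 10 * SECONDS_PER_DAY),
    ({"subject": "CN=example-b"}, 200 * SECONDS_PER_DAY),
]
for subject, days_left in expiring_certificates(samples):
    print(f"{subject} expires in {days_left:.1f} days")
```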
rke2_loadbalancer_server_connections
Count of current connections to the loadbalancer server, labeled by loadbalancer name and server address.
- Type: Gauge
- Labels: name, server
rke2_loadbalancer_server_health
Current health state of loadbalancer backend servers, labeled by loadbalancer name and server address.
The state is an enum with the following values: 0=INVALID, 1=FAILED, 2=STANDBY, 3=UNCHECKED, 4=RECOVERING, 5=HEALTHY, 6=PREFERRED, 7=ACTIVE.
- Type: Gauge
- Labels: name, server
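The numeric gauge value can be decoded back to a state name using the enum listed above. This is a minimal sketch; the class name ServerHealth and the helper describe_health are illustrative and not part of RKE2.

```python
from enum import IntEnum


class ServerHealth(IntEnum):
    """State values as documented for rke2_loadbalancer_server_health."""
    INVALID = 0
    FAILED = 1
    STANDBY = 2
    UNCHECKED = 3
    RECOVERING = 4
    HEALTHY = 5
    PREFERRED = 6
    ACTIVE = 7


def describe_health(value):
    """Map a gauge value (reported as a float) to its state name."""
    try:
        return ServerHealth(int(value)).name
    except ValueError:
        # Values outside the documented range are reported as-is.
        return f"UNKNOWN({int(value)})"


print(describe_health(7.0))
print(describe_health(1.0))
```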
rke2_loadbalancer_dial_duration_seconds
Time in seconds taken to dial a connection to a backend server, labeled by loadbalancer name and success/failure status.
- Type: Histogram
- Labels: name, status
rke2_etcd_snapshot_save_duration_seconds
Total time in seconds taken to complete the etcd snapshot process, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_save_local_duration_seconds
Total time in seconds taken to save a local snapshot file, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_save_s3_duration_seconds
Total time in seconds taken to upload a snapshot file to S3, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_reconcile_duration_seconds
Total time in seconds taken to sync the list of etcd snapshots, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_reconcile_local_duration_seconds
Total time in seconds taken to list local snapshot files, labeled by success/failure status.
- Type: Histogram
- Labels: status
rke2_etcd_snapshot_reconcile_s3_duration_seconds
Total time in seconds taken to list S3 snapshot files, labeled by success/failure status.
- Type: Histogram
- Labels: status
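Prometheus histograms are exposed as several series per metric: a set of _bucket series plus _sum and _count. Dividing _sum by _count gives the mean observed duration, which is a quick way to read the histogram metrics above. The sketch below assumes samples are already parsed into (name, labels, value) tuples; the helper name mean_duration_by_status, the status label values, and the numbers are hypothetical.

```python
def mean_duration_by_status(samples, metric):
    """Return {status: mean seconds} computed from the metric's _sum and _count series."""
    sums, counts = {}, {}
    for name, labels, value in samples:
        status = labels.get("status", "")
        if name == metric + "_sum":
            sums[status] = sums.get(status, 0.0) + value
        elif name == metric + "_count":
            counts[status] = counts.get(status, 0.0) + value
    # Skip statuses with no observations to avoid dividing by zero.
    return {s: sums[s] / counts[s] for s in sums if counts.get(s)}


# Hypothetical samples for illustration.
samples = [
    ("rke2_etcd_snapshot_save_duration_seconds_sum", {"status": "success"}, 12.0),
    ("rke2_etcd_snapshot_save_duration_seconds_count", {"status": "success"}, 4.0),
]
means = mean_duration_by_status(samples, "rke2_etcd_snapshot_save_duration_seconds")
print(means)
```

Because _sum and _count are cumulative, this yields the mean over the lifetime of the process; a monitoring system would normally compute it over a time window instead.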