Metrics

RKE2 provides metrics for monitoring the health and performance of the cluster.

Individual components provide most metrics. See the following component-specific documentation for more information:

Other components may provide additional metrics. Consult the upstream project documentation for any components not listed above.

Supervisor Metrics

When you start RKE2 with supervisor-metrics: true, the RKE2 supervisor exposes metrics. You can access these metrics through the /metrics endpoint on each node at port 9345:

kubectl get --server https://NODENAME:9345 --raw /metrics
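The endpoint returns metrics in the Prometheus text exposition format. As a minimal sketch of how that output could be processed, the following parses sample lines into (name, labels, value) tuples. The function name `parse_metrics` and the label values in `SAMPLE` are hypothetical and only for illustration; escape sequences inside label values are not decoded.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical output; real subjects, usages, and values will differ.
SAMPLE = """\
# TYPE rke2_certificate_expiration_seconds gauge
rke2_certificate_expiration_seconds{subject="CN=example",usage="ServerAuth"} 86400
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice the input text would be the output of the kubectl command above, saved to a file or piped on standard input.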

Metrics exposed by the RKE2 supervisor process include:

RKE2 Cluster Management Metrics

rke2_certificate_expiration_seconds

Remaining lifetime in seconds of the certificate, labeled by certificate subject and usages.

  • Type: Gauge
  • Labels: subject, usage
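Because the value is a remaining lifetime in seconds, it can be converted to days and compared against a renewal threshold. The sketch below assumes the samples have already been collected as (labels, seconds) pairs; the `expiring_soon` helper and the 30-day default are illustrative choices, not part of RKE2.

```python
SECONDS_PER_DAY = 86400


def expiring_soon(samples, threshold_days=30):
    """Return (subject, days_left) for certificates below the threshold, soonest first.

    samples: iterable of (labels, seconds) pairs, where labels is a dict
    containing at least the "subject" label.
    """
    soon = []
    for labels, seconds in samples:
        days_left = seconds / SECONDS_PER_DAY
        if days_left < threshold_days:
            soon.append((labels.get("subject", ""), days_left))
    return sorted(soon, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    demo = [
        ({"subject": "CN=a"}, 10 * SECONDS_PER_DAY),
        ({"subject": "CN=b"}, 90 * SECONDS_PER_DAY),
    ]
    for subject, days in expiring_soon(demo):
        print(f"{subject}: {days:.1f} days left")
```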

rke2_loadbalancer_server_connections

Count of current connections to the loadbalancer server, labeled by loadbalancer name and server address.

  • Type: Gauge
  • Labels: name, server

rke2_loadbalancer_server_health

Current health state of loadbalancer backend servers, labeled by loadbalancer name and server address.

The state is an enum with the following values: 0=INVALID, 1=FAILED, 2=STANDBY, 3=UNCHECKED, 4=RECOVERING, 5=HEALTHY, 6=PREFERRED, 7=ACTIVE.

  • Type: Gauge
  • Labels: name, server
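Since the gauge reports the state as a number, dashboards or scripts may want to translate it back to the state name. A minimal sketch of that mapping, using the enum values listed above (the class and function names are hypothetical):

```python
from enum import IntEnum


class ServerHealth(IntEnum):
    """Numeric states reported by rke2_loadbalancer_server_health."""
    INVALID = 0
    FAILED = 1
    STANDBY = 2
    UNCHECKED = 3
    RECOVERING = 4
    HEALTHY = 5
    PREFERRED = 6
    ACTIVE = 7


def describe_health(value):
    """Translate a gauge value such as 5.0 into its state name."""
    # Raises ValueError for values outside the documented range.
    return ServerHealth(int(value)).name


if __name__ == "__main__":
    print(describe_health(5.0))
```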

rke2_loadbalancer_dial_duration_seconds

Time in seconds taken to dial a connection to a backend server, labeled by loadbalancer name and success/failure status.

  • Type: Histogram
  • Labels: name, status
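A Prometheus histogram is exposed as cumulative `_bucket` series (one per `le` upper bound) plus `_sum` and `_count` series. The sketch below shows two common calculations on those values: the mean observed duration, and the fraction of observations at or below a given bound. The helper names and the bucket bounds in the example are assumptions for illustration; the same arithmetic applies to any histogram metric listed here.

```python
def mean_duration(sum_seconds, count):
    """Mean observation in seconds, or None if nothing was observed."""
    if count == 0:
        return None
    return sum_seconds / count


def fraction_within(buckets, bound):
    """Fraction of observations with duration <= bound.

    buckets: dict mapping each "le" upper bound (float) to its cumulative
    count, including float("inf") for the "+Inf" bucket.
    bound: must be one of the bucket upper bounds.
    """
    total = buckets[float("inf")]
    if total == 0:
        return None
    return buckets[bound] / total


if __name__ == "__main__":
    # Hypothetical bucket bounds and counts.
    demo_buckets = {0.1: 2, 0.5: 3, float("inf"): 4}
    print(mean_duration(1.5, 3))
    print(fraction_within(demo_buckets, 0.5))
```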

rke2_etcd_snapshot_save_duration_seconds

Total time in seconds taken to complete the etcd snapshot process, labeled by success/failure status.

  • Type: Histogram
  • Labels: status

rke2_etcd_snapshot_save_local_duration_seconds

Total time in seconds taken to save a local snapshot file, labeled by success/failure status.

  • Type: Histogram
  • Labels: status

rke2_etcd_snapshot_save_s3_duration_seconds

Total time in seconds taken to upload a snapshot file to S3, labeled by success/failure status.

  • Type: Histogram
  • Labels: status

rke2_etcd_snapshot_reconcile_duration_seconds

Total time in seconds taken to sync the list of etcd snapshots, labeled by success/failure status.

  • Type: Histogram
  • Labels: status

rke2_etcd_snapshot_reconcile_local_duration_seconds

Total time in seconds taken to list local snapshot files, labeled by success/failure status.

  • Type: Histogram
  • Labels: status

rke2_etcd_snapshot_reconcile_s3_duration_seconds

Total time in seconds taken to list S3 snapshot files, labeled by success/failure status.

  • Type: Histogram
  • Labels: status