Advanced Options and Configuration
This section contains advanced information describing the different ways you can run and manage RKE2.
Certificate Rotation
By default, certificates in RKE2 expire in 12 months.
If the certificates are expired or have fewer than 90 days remaining before they expire, the certificates are rotated when RKE2 is restarted.
As of v1.21.8+rke2r1, certificates can also be rotated manually. To do this, it is best to stop the rke2-server process, rotate the certificates, then start the process up again:
systemctl stop rke2-server
rke2 certificate rotate
systemctl start rke2-server
To renew agent certificates, restart rke2-agent in agent nodes. Agent certificates are renewed every time the agent starts.
systemctl restart rke2-agent
It is also possible to rotate an individual service by passing the --service
flag, for example: rke2 certificate rotate --service api-server
. See Certificate Management for more details.
Auto-Deploying Manifests
Any file found in /var/lib/rancher/rke2/server/manifests
will automatically be deployed to Kubernetes in a manner similar to kubectl apply
.
For information about deploying Helm charts using the manifests directory, refer to the section about Helm.
Configuring containerd
RKE2 will generate the config.toml
for containerd in /var/lib/rancher/rke2/agent/etc/containerd/config.toml
.
For advanced customization of this file you can create another file called config.toml.tmpl
in the same directory and it will be used instead.
The config.toml.tmpl
will be treated as a Go template file, and the config.Node
structure is being passed to the template. See this template for an example of how to use the structure to customize the configuration file.
Configuring an HTTP proxy
If you are running RKE2 in an environment, which only has external connectivity through an HTTP proxy, you can configure your proxy settings on the RKE2 systemd service. These proxy settings will then be used in RKE2 and passed down to the embedded containerd and kubelet.
Add the necessary HTTP_PROXY
, HTTPS_PROXY
and NO_PROXY
variables to the environment file of your systemd service, usually:
/etc/default/rke2-server
/etc/default/rke2-agent
RKE2 will automatically add the cluster internal Pod and Service IP ranges and cluster DNS domain to the list of NO_PROXY
entries. You should ensure that the IP address ranges used by the Kubernetes nodes themselves (i.e. the public and private IPs of the nodes) are included in the NO_PROXY
list, or that the nodes can be reached through the proxy.
HTTP_PROXY=http://your-proxy.example.com:8888
HTTPS_PROXY=http://your-proxy.example.com:8888
NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
If you want to configure the proxy settings for containerd without affecting RKE2 and the Kubelet, you can prefix the variables with CONTAINERD_
:
CONTAINERD_HTTP_PROXY=http://your-proxy.example.com:8888
CONTAINERD_HTTPS_PROXY=http://your-proxy.example.com:8888
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
Node Labels and Taints
RKE2 agents can be configured with the options node-label
and node-taint
which adds a label and taint to the kubelet. The two options only add labels and/or taints at registration time, and can only be added once and not removed after that through rke2 commands.
If you want to change node labels and taints after node registration you should use kubectl
. Refer to the official Kubernetes documentation for details on how to add taints and node labels.
How Agent Node Registration Works
Agent nodes are registered via a websocket connection initiated by the rke2 agent
process, and the connection is maintained by a client-side load balancer running as part of the agent process.
Agents register with the server using the cluster secret portion of the join token, along with a randomly generated node-specific password, which is stored on the agent at /etc/rancher/node/password
. The server will store the passwords for individual nodes as Kubernetes secrets, and any subsequent attempts must use the same password. Node password secrets are stored in the kube-system
namespace with names using the template <host>.node-password.rke2
. These secrets are deleted when the corresponding Kubernetes node is deleted.
Note: Prior to RKE2 v1.20.2 servers stored passwords on disk at /var/lib/rancher/rke2/server/cred/node-passwd
.
If the /etc/rancher/node
directory of an agent is removed, the password file should be recreated for the agent prior to startup, or the entry removed from the server or Kubernetes cluster (depending on the RKE2 version).
Starting the Server with the Installation Script
The installation script provides units for systemd, but does not enable or start the service by default.
When running with systemd, logs will be created in /var/log/syslog
and viewed using journalctl -u rke2-server
or journalctl -u rke2-agent
.
An example of installing with the install script:
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server
systemctl start rke2-server
Disabling Server Charts
The server charts bundled with rke2
deployed during cluster bootstrapping can be disabled and replaced with alternatives. A common use case is replacing the bundled rke2-ingress-nginx
chart with an alternative.
To disable any of the bundled system charts, set the disable
parameter in the config file before bootstrapping. An example of disabling all available system charts is:
# /etc/rancher/rke2/config.yaml
disable:
- rke2-coredns
- rke2-ingress-nginx
- rke2-metrics-server
- rke2-snapshot-controller
- rke2-snapshot-controller-crd
- rke2-snapshot-validation-webhook
It is the cluster operator's responsibility to ensure that components are disabled or replaced with care, as the server charts play important roles in cluster operability. Refer to the architecture overview for more information on the individual system charts role within the cluster.
Installation on classified AWS regions or networks with custom AWS API endpoints
In public AWS regions, to ensure RKE2 is cloud-enabled, and capable of auto-provisioning certain cloud resources, config RKE2 with:
# /etc/rancher/rke2/config.yaml
cloud-provider-name: aws
When installing RKE2 on classified regions (such as SC2S or C2S), there are a few additional pre-requisites to be aware of to ensure RKE2 knows how and where to securely communicate with the appropriate AWS endpoints:
-
Ensure all the common AWS cloud-provider prerequisites are met. These are independent of regions and are always required.
-
Ensure RKE2 knows where to send API requests for
ec2
andelasticloadbalancing
services by creating acloud.conf
file, the below is an example for theus-iso-east-1
(C2S) region:
# /etc/rancher/rke2/cloud.conf
[Global]
[ServiceOverride "ec2"]
Service=ec2
Region=us-iso-east-1
URL=https://ec2.us-iso-east-1.c2s.ic.gov
SigningRegion=us-iso-east-1
[ServiceOverride "elasticloadbalancing"]
Service=elasticloadbalancing
Region=us-iso-east-1
URL=https://elasticloadbalancing.us-iso-east-1.c2s.ic.gov
SigningRegion=us-iso-east-1
Alternatively, if you are using private AWS endpoints, ensure the appropriate URL
is used for each of the private endpoints.
- Ensure the appropriate AWS CA bundle is loaded into the system's root ca trust store. This may already be done for you depending on the AMI you are using.
# on CentOS/RHEL 7/8
cp <ca.pem> /etc/pki/ca-trust/source/anchors/
update-ca-trust
- Configure RKE2 to use the
aws
cloud-provider with the customcloud.conf
created in step 1:
# /etc/rancher/rke2/config.yaml
...
cloud-provider-name: aws
cloud-provider-config: "/etc/rancher/rke2/cloud.conf"
...
-
Install RKE2 normally (most likely in an airgapped capacity)
-
Validate successful installation by confirming the existence of AWS metadata on cluster node labels with
kubectl get nodes --show-labels
Control Plane Component Resource Requests/Limits
The following options are available under the server
sub-command for RKE2. The options allow for specifying CPU requests and limits for the control plane components within RKE2.
--control-plane-resource-requests value (components) Control Plane resource requests [$RKE2_CONTROL_PLANE_RESOURCE_REQUESTS]
--control-plane-resource-limits value (components) Control Plane resource limits [$RKE2_CONTROL_PLANE_RESOURCE_LIMITS]
Values are a comma-delimited list of [controlplane-component]-(cpu|memory)=[desired-value]
. The possible values for controlplane-component
are:
kube-apiserver
kube-scheduler
kube-controller-manager
kube-proxy
etcd
cloud-controller-manager
Thus, an example config may value may look like:
# /etc/rancher/rke2/config.yaml
control-plane-resource-requests:
- kube-apiserver-cpu=500m
- kube-apiserver-memory=512M
- kube-scheduler-cpu=250m
- kube-scheduler-memory=512M
- etcd-cpu=1000m
The unit values for CPU/memory are identical to Kubernetes resource units (See: Resource Limits in Kubernetes)
Extra Control Plane Component Volume Mounts
The following options are available under the server
sub-command for RKE2. These options specify host-path mounting of directories from the node filesystem into the static pod component that corresponds to the prefixed name.
Flag | ENV VAR |
---|---|
--kube-apiserver-extra-mount | RKE2_KUBE_APISERVER_EXTRA_MOUNT |
--kube-scheduler-extra-mount | RKE2_KUBE_SCHEDULER_EXTRA_MOUNT |
--kube-controller-manager-extra-mount | RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_MOUNT |
--kube-proxy-extra-mount | RKE2_KUBE_PROXY_EXTRA_MOUNT |
--etcd-extra-mount | RKE2_ETCD_EXTRA_MOUNT |
--cloud-controller-manager-extra-mount | RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_MOUNT |
RW Host Path Volume Mount
/source/volume/path/on/host:/destination/volume/path/in/staticpod
RO Host Path Volume Mount
In order to mount a volume as read only, append :ro
to the end of the volume mount.
/source/volume/path/on/host:/destination/volume/path/in/staticpod:ro
Multiple volume mounts can be specified for the same component by passing the flag values as an array in the config file.
Prior to April 2024 releases (v1.27.13+rke2r1, v1.28.9+rke2r1, v1.29.4+rke2r1), only directories can be mounted.
# /etc/rancher/rke2/config.yaml
kube-apiserver-extra-mount:
- "/tmp/foo:/root/foo"
- "/tmp/bar.txt:/etc/bar.txt:ro"
Extra Control Plane Component Environment Variables
The following options are available under the server
sub-command for RKE2. These options specify additional environment variables in standard format i.e. KEY=VALUE
for the static pod component that corresponds to the prefixed name.
Flag | ENV VAR |
---|---|
--kube-apiserver-extra-env | RKE2_KUBE_APISERVER_EXTRA_ENV |
--kube-scheduler-extra-env | RKE2_KUBE_SCHEDULER_EXTRA_ENV |
--kube-controller-manager-extra-env | RKE2_KUBE_CONTROLLER_MANAGER_EXTRA_ENV |
--kube-proxy-extra-env | RKE2_KUBE_PROXY_EXTRA_ENV |
--etcd-extra-env | RKE2_ETCD_EXTRA_ENV |
--cloud-controller-manager-extra-env | RKE2_CLOUD_CONTROLLER_MANAGER_EXTRA_ENV |
Multiple environment variables can be specified for the same component by passing the flag values as an array in the config file.
# /etc/rancher/rke2/config.yaml
kube-apiserver-extra-env:
- "MY_FOO=FOO"
- "MY_BAR=BAR"
kube-scheduler-extra-env: "TZ=America/Los_Angeles"
Deploy NVIDIA operator
The NVIDIA operator allows administrators of Kubernetes clusters to manage GPUs just like CPUs. It includes everything needed for pods to be able to operate GPUs.
Depending on the underlying OS, some steps need to be fulfilled
- SLES
- Ubuntu
- RHEL
The NVIDIA operator cannot automatically install kernel drivers on SLES. NVIDIA drivers must be manually installed on all GPU nodes before deploying the operator in the cluster.
The SLES repositories have the nvidia-open packages, which include the open gpu kernel drivers. If your GPU supports the open drivers, you can check that in this list, you can install the drivers by executing:
sudo zypper install -y nvidia-open
If the GPU does not support gpu kernel drivers, you will need to download the proprietary drivers from an NVIDIA repo. In that case, follow these steps:
sudo zypper addrepo --refresh 'https://developer.download.nvidia.com/compute/cuda/repos/sles15/x86_64/' NVIDIA
sudo zypper --gpg-auto-import-keys refresh
sudo zypper install -y –-auto-agree-with-licenses nvidia-gl-G06 nvidia-video-G06 nvidia-compute-utils-G06
Then reboot.
If everything worked correctly, after the reboot, you should see the NVRM and GCC version of the driver when executing the command:
cat /proc/driver/nvidia/version
Finally, create the symlink:
sudo ln -s /sbin/ldconfig /sbin/ldconfig.real
The NVIDIA operator can automatically install kernel drivers on Ubuntu using the nvidia-driver-daemonset
, although not all versions are supported. You can also pre-install them manually and the operator will detect them:
sudo apt install nvidia-driver-550-server
Then reboot.
If everything worked correctly, after the reboot, you should see a correct output when executing the command:
cat /proc/driver/nvidia/version
The NVIDIA operator can automatically install kernel drivers on RHEL using the nvidia-driver-daemonset
. You would only need to create the symlink:
sudo ln -s /sbin/ldconfig /sbin/ldconfig.real
Once the OS is ready and RKE2 is running, install the GPU Operator with the following yaml manifest:
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: gpu-operator
namespace: kube-system
spec:
repo: https://helm.ngc.nvidia.com/nvidia
chart: gpu-operator
targetNamespace: gpu-operator
createNamespace: true
valuesContent: |-
toolkit:
env:
- name: CONTAINERD_SOCKET
value: /run/k3s/containerd/containerd.sock
The NVIDIA operator restarts containerd with a hangup call which restarts RKE2
After one minute approximately, you can make the following checks to verify that everything worked as expected:
1 - Check if the operator detected the driver and GPU correctly:
kubectl get node $NODENAME -o jsonpath='{.metadata.labels}' | jq | grep "nvidia.com"
You should see labels specifying driver and GPU (e.g. nvidia.com/gpu.machine or nvidia.com/cuda.driver.major)
2 - Check if the gpu was added (by nvidia-device-plugin-daemonset) as an allocatable resource in the node:
kubectl get node $NODENAME -o jsonpath='{.status.allocatable}' | jq
You should see "nvidia.com/gpu":
followed by the number of gpus in the node
3 - Check that the container runtime binary was installed by the operator (in particular by the nvidia-container-toolkit-daemonset
):
ls /usr/local/nvidia/toolkit/nvidia-container-runtime
4 - Verify if containerd config was updated to include the nvidia container runtime:
grep nvidia /etc/containerd/config.toml
5 - Run a pod to verify that the GPU resource can successfully be scheduled on a pod and the pod can detect it
apiVersion: v1
kind: Pod
metadata:
name: nbody-gpu-benchmark
namespace: default
spec:
restartPolicy: OnFailure
runtimeClassName: nvidia
containers:
- name: cuda-container
image: nvcr.io/nvidia/k8s/cuda-sample:nbody
args: ["nbody", "-gpu", "-benchmark"]
resources:
limits:
nvidia.com/gpu: 1
env:
- name: NVIDIA_VISIBLE_DEVICES
value: all
- name: NVIDIA_DRIVER_CAPABILITIES
value: all