Taints, Tolerations and Node Affinity in Kubernetes
I am learning Kubernetes and came across a fascinating concept of how pods are scheduled on nodes in Kubernetes. I will explain this in a way that doesn't require any prior understanding of Kubernetes beyond 3 concepts:
A cluster is a set of machines that together run containerized applications.
A node is a single machine (virtual or physical) in the cluster.
A pod is the smallest deployable unit in Kubernetes; it encapsulates an application and runs on a node.
That's all you need to know about Kubernetes to understand the concept of Taints, Tolerations, and Node Affinity.
A few disclaimers:
Let’s get started and understand these concepts with a scenario.
Scenario
We have a Kubernetes cluster where some nodes are provisioned with specialized hardware (GPU, SSD etc.) and some nodes with standard hardware. There are applications that require specialized hardware to run and there are applications for which standard hardware will suffice. Let’s name them for ease of understanding.
SpNode1, SpNode2 and SpNode3 are nodes that have specialized hardware (Sp denotes Specialized).
StNode4, StNode5 are nodes that are provisioned with standard hardware (St denotes Standard).
There are applications encapsulated as pods that will need to be scheduled on these nodes. Let’s name these pods as well.
SpPod1, SpPod2, SpPod3 are the pods that will need nodes with specialized hardware to work.
StPod4, StPod5 are the pods that can run on nodes with standard hardware i.e., these pods do not require specialized hardware nodes to work.
Let’s make the scenario slightly more interesting. The nodes with specialized hardware belong to Team 1. The nodes with standard hardware belong to Team 2. This is a common scenario where teams share a cluster. Let’s assume we are part of Team 1 and hence have access to the nodes/pods belonging to Team 1 and do not have access to nodes/pods belonging to Team 2.
Now that we understood the landscape, the requirement is as follows: the pods SpPod1, SpPod2 and SpPod3 must be scheduled only on the specialized nodes (SpNode1, SpNode2, SpNode3), and the pods StPod4 and StPod5 must be scheduled only on the standard nodes (StNode4, StNode5).
Simple enough and a common requirement, isn’t it? Let’s see if the concept of Taints and Tolerations will help in meeting the requirement.
Solution 1 – Taints and Tolerations
Let’s apply the concept of Taints and Tolerations and see if it meets our requirement. So, what are taints and tolerations?
Tainting a node essentially means setting a property (key=value) on the node. Doing this will ensure the node will not accept any pod that cannot tolerate the taint.
In essence, let’s say, if we taint the node SpNode1 with (specialized=true), the node SpNode1 will only accept the pods that can tolerate the taint (specialized=true). The way to ensure that a pod can tolerate the taint on a node is by setting a toleration level on the pod as well.
Now what does Kubernetes do? The Kubernetes scheduler checks whether a node is tainted and whether a pod has a matching toleration set. If the toleration matches the taint, the pod can be scheduled on the node. If the taint and the toleration do not match, the pod will not be scheduled on the node.
The syntax to set taint on a node using kubectl command is:
kubectl taint nodes <node-name> <key=value>:<taint-effect>
For this specific example, the command we will run is:
kubectl taint nodes SpNode1 specialized=true:NoSchedule
[where “SpNode1” is the name of the node, “specialized” is the key and “true” is the value]
Similarly, the way to specify toleration on pods is using a pod specification file [YAML file]. The below is a snippet from the pod specification file.
tolerations:
- key: "specialized"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
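For context, a minimal complete pod specification carrying this toleration might look like the following. The pod name and container image are illustrative (note that Kubernetes object names must be lowercase):

```yaml
# Hypothetical spec for SpPod1; the image is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: sppod1
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "specialized"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```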
It is important to understand the field "effect". It can take the values "NoSchedule", "PreferNoSchedule", and "NoExecute".
The value "NoSchedule" means that a pod whose toleration does not match the taint will not be scheduled on the node. However, if the mismatch arises after the pod is already running on the node [e.g., due to a change in the pod's toleration], the pod will continue to run on the same node.
The value "PreferNoSchedule" is a softer version: the scheduler tries to avoid placing non-tolerating pods on the node but does not guarantee it.
The value "NoExecute" means that, in addition to not scheduling new non-tolerating pods, any pod already running on the node that does not tolerate the taint will be evicted.
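As a sketch, a "NoExecute" toleration can also carry an optional tolerationSeconds field, which lets a pod remain on a tainted node for a grace period before being evicted:

```yaml
tolerations:
- key: "specialized"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
  tolerationSeconds: 300   # tolerate the taint for 5 minutes, then evict
```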
Now that we understood what taints and tolerations are and how to apply them on nodes and pods respectively, let's get back to our requirement. Let's go ahead and taint the nodes SpNode1, SpNode2 and SpNode3 with specialized=true:NoSchedule and set the matching toleration on the pods SpPod1, SpPod2 and SpPod3 (I am skipping the actual syntax for the sake of simplicity):
We applied taints and tolerations on nodes and pods that belong only to Team 1 as we do not have access to nodes and pods belonging to Team 2.
Will the above solution meet the requirement?
No.
Because there is a chance that the Kubernetes scheduler might schedule SpPod1 on either StNode4 or StNode5. Can you guess why? Because StNode4 and StNode5 do not have any taints, and an un-tainted node will accept any pod, whether or not the pod has a toleration set. A toleration only allows a pod onto a tainted node; it does not restrict the pod to tainted nodes.
So, while it’s a good start, this solution doesn’t guarantee that our requirement will be met.
Important points to remember about Taints and Tolerations:
In essence, Taints on nodes will repel the pods away if the toleration doesn’t match the taint. However, nodes that do not have any taints will accept any pod (with or without toleration set on them).
Solution 2 – Node Affinity
Let’s apply the concept of Node Affinity and see if it meets our requirement. So, what is Node Affinity?
Node affinity is a property of pods that attracts them to nodes. So how do we specify node affinity? It's done in two steps.
In Step 1, we label the nodes by setting a property in key=value format. In Step 2, we specify node affinity property (in the similar key=value format) on the pod in the pod specification YAML file.
In essence, let’s say, if we label the node SpNode1 with (specialized=true) and specify node affinity property on the pod SpPod1 (specialized=true), then the pod SpPod1 will be attracted to the node SpNode1 as the label matches.
The Kubernetes scheduler checks whether the label on the node matches the value specified in the nodeAffinity property in the pod specification file. If they match, the pod can be scheduled on the node. If not, the pod is not scheduled on the node.
The syntax to set labels on nodes is as follows:
kubectl label nodes <node name> <key=value>
For our example, it will be
kubectl label nodes SpNode1 specialized=true
The syntax to specify node affinity property in the pod specification YAML file is as follows:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"
Note: Only the relevant snippet of the pod specification file is shown here.
It is important to understand the nodeAffinity types. There are two nodeAffinity types: "requiredDuringSchedulingIgnoredDuringExecution", a hard requirement where the pod is scheduled only on nodes that satisfy the rule, and "preferredDuringSchedulingIgnoredDuringExecution", a soft preference where the scheduler tries to find a matching node but schedules the pod elsewhere if none is available.
The "IgnoredDuringExecution" part of the names means that, if labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node. In a future version of Kubernetes, there might be a new node affinity type introduced viz. “requiredDuringSchedulingRequiredDuringExecution” which is similar to “requiredDuringSchedulingIgnoredDuringExecution” except that Kubernetes will evict pods from nodes that cease to satisfy the pods' node affinity requirements.
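For comparison, the "preferred" variant expresses a soft preference with a weight; the scheduler favors matching nodes but will fall back to other nodes if none match. A snippet using the same illustrative label might look like this:

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1          # relative preference; higher weights are favored
        preference:
          matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"
```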
Now that we understood what labels and node affinity are and how to apply them on nodes and pods respectively, let's get back to our requirement. Let's go ahead and label the nodes SpNode1, SpNode2 and SpNode3 with specialized=true and specify the matching node affinity in the pod specification files of SpPod1, SpPod2 and SpPod3 (I am skipping the actual syntax for the sake of simplicity):
Like in the previous solution of taints and tolerations, we labeled the nodes and specified affinity on pods that belong only to Team 1 as we do not have access to nodes and pods belonging to Team 2.
Will the above solution meet the requirement?
No.
Because there is a chance that StPod4 or StPod5 could end up being scheduled on one of SpNode1, SpNode2 or SpNode3. Can you guess why? Node affinity only guarantees that a pod with affinity specified is scheduled on a node with the matching label. It does not stop other pods, which have no affinity specified, from being scheduled on the labeled nodes. And since the pods StPod4 and StPod5 do not have any affinity set, they might get scheduled on one of the specialized nodes, which is not what we want.
So, this solution too doesn’t guarantee that our requirement will be met.
Important points to remember about Node Affinity:
In essence, node affinity is a property on a pod that attracts it to a labeled node with the same value. However, pods that do not have any affinity specified might get scheduled on any nodes irrespective of whether the nodes are labeled.
Solution 3 – Taints, Tolerations and Node Affinity
Neither solution 1 (taints and tolerations) nor solution 2 (node affinity) could address the requirement fully. Let’s see if a combination of both the concepts will address the requirement.
Step 1 – Apply taints and tolerations on nodes and pods.
Step 2 – Label the nodes and specify node affinity on pods.
The nodes are both tainted and labeled. Toleration levels and node affinity are specified on the pods.
Will this solution meet the requirement?
Yes.
The situation that we encountered in Solution 1 (pod SpPod1 getting scheduled on StNode4 or StNode5) is not possible here. Why? The pods SpPod1, SpPod2 and SpPod3 will get scheduled only on SpNode1, SpNode2 and SpNode3 because of the labels and the node affinity property.
The situation that we encountered in Solution 2 (pods StPod4 or StPod5 getting scheduled on one of SpNode1, SpNode2 or SpNode3) is not possible either. Why? The pods StPod4 and StPod5 will get scheduled only on StNode4 or StNode5 because the other nodes SpNode1, SpNode2 and SpNode3 are tainted, and they will not accept pods that do not tolerate the taints.
This way, we ensure that the pods that need specialized hardware are only scheduled on specialized nodes and not scheduled on nodes that are running standard hardware. Similarly, the pods that do not need specialized hardware are only scheduled on nodes that are running standard hardware.
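Putting it together, the pod specification for one of the specialized pods would carry both the toleration and the node affinity. The pod name and image below are illustrative (Kubernetes object names must be lowercase):

```yaml
# Hypothetical combined spec for SpPod1; the image is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: sppod1
spec:
  containers:
  - name: app
    image: nginx
  # Toleration: allows this pod onto the tainted specialized nodes.
  tolerations:
  - key: "specialized"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  # Node affinity: restricts this pod to the labeled specialized nodes.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"
```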
Often, either Taints and Tolerations or Node Affinity alone might be enough to schedule the pods on the nodes of our choice. But if your requirement is complex, consider applying both concepts together.
Conclusion
The concept of how Kubernetes scheduling works is vast and complex. This article touched upon the basics of Taints, Tolerations and Node Affinity with an example. Key things to remember:
Taints on nodes repel pods that do not tolerate them, but un-tainted nodes accept any pod.
Node affinity attracts pods to labeled nodes, but it does not keep other pods off those nodes.
Combining both lets you dedicate a set of nodes to a specific set of pods.
Credits: Initial draft feedback: Gowri Satya; Cover image : Kiran Mohan