Taints, Tolerations and Node Affinity in Kubernetes
Credits: Initial draft feedback: Gowri Satya; Cover image: Kiran Mohan


I am learning Kubernetes and came across a fascinating concept: how pods are scheduled on nodes. I will explain this in a way that doesn’t require any prior understanding of Kubernetes beyond 3 concepts:

  • Pods – The smallest deployable unit in Kubernetes, encapsulating one or more containerized applications.
  • Nodes – A virtual or physical machine where pods are deployed and run.
  • Cluster – A collection of nodes that run containerized applications.

That’s all you need to know about Kubernetes to understand the concept of Taints, Tolerations, and Node Affinity.

A few disclaimers:

  • Kubernetes scheduling is a vast and complex subject. The intent of this article is to explain a few basic concepts of scheduling pods on nodes using the concepts of Taints, Tolerations and Node Affinity. The concept of node selectors is deliberately not covered in this article.
  • The word taint in English has a negative connotation. In Kubernetes, it is a powerful concept [I wonder why they chose that word, though].
  • The article is long, detailed, and technical in nature. You have been warned.

Let’s get started and understand these concepts with a scenario.

Scenario

We have a Kubernetes cluster where some nodes are provisioned with specialized hardware (GPU, SSD etc.) and some nodes with standard hardware. There are applications that require specialized hardware to run and there are applications for which standard hardware will suffice. Let’s name them for ease of understanding.

SpNode1, SpNode2 and SpNode3 are nodes that have specialized hardware (Sp denotes Specialized).

StNode4, StNode5 are nodes that are provisioned with standard hardware (St denotes Standard).

There are applications encapsulated as pods that will need to be scheduled on these nodes. Let’s name these pods as well.

SpPod1, SpPod2, SpPod3 are the pods that will need nodes with specialized hardware to work.

StPod4, StPod5 are the pods that can run on nodes with standard hardware i.e., these pods do not require specialized hardware nodes to work.

Let’s make the scenario slightly more interesting. The nodes with specialized hardware belong to Team 1. The nodes with standard hardware belong to Team 2. This is a common scenario where teams share a cluster. Let’s assume we are part of Team 1 and hence have access to the nodes/pods belonging to Team 1 and do not have access to nodes/pods belonging to Team 2.

Now that we understand the landscape, the requirement is as follows:

  • Pods that require specialized hardware should be scheduled only on the nodes that are provisioned with specialized hardware i.e., the pods SpPod1, SpPod2 and SpPod3 must only be scheduled on the nodes SpNode1, SpNode2 or SpNode3.
  • Pods that do not require specialized hardware should be scheduled only on the nodes that are provisioned with standard hardware i.e., the pods StPod4 and StPod5 must only be scheduled on the nodes StNode4 or StNode5.

Simple enough and a common requirement, isn’t it? Let’s see if the concept of Taints and Tolerations will help in meeting the requirement.

Solution 1 – Taints and Tolerations

Let’s apply the concept of Taints and Tolerations and see if it meets our requirement. So, what are taints and tolerations?

Tainting a node essentially means setting a property (key=value, plus an effect) on the node. Doing this ensures the node will not accept any pod that cannot tolerate the taint.

In essence, let’s say, if we taint the node SpNode1 with (specialized=true), the node SpNode1 will only accept the pods that can tolerate the taint (specialized=true). The way to ensure that a pod can tolerate the taint on a node is by setting a toleration level on the pod as well.

Now what does Kubernetes do? Kubernetes scheduler checks if a node is tainted and checks the pods that have the same toleration level set. If they match, the Kubernetes scheduler goes ahead and schedules the pod on the node. If the taint and the toleration do not match, the pod will not be scheduled on the node.

The syntax to set taint on a node using kubectl command is:

kubectl taint nodes <node-name> <key=value>:<taint-effect>         

For this specific example, the command we will run is:

kubectl taint nodes SpNode1 specialized=true:NoSchedule        

[where “SpNode1” is the name of the node, “specialized” is the key, “true” is the value, and “NoSchedule” is the taint effect]

Similarly, tolerations are specified on pods using a pod specification file [YAML file]. Below is a snippet from the pod specification file.

tolerations:
- key: "specialized"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"        

It is important to understand the field “effect”. It can have the values “NoSchedule”, “PreferNoSchedule” (a softer variant of NoSchedule, not covered here), and “NoExecute”.

The value of “NoSchedule” means that a pod without a matching toleration will not be scheduled on the node. However, if the taint and toleration values stop matching after the pod is already running on the node [for example, due to a change in the pod’s toleration value], the pod will continue to run on that node.

The value of “NoExecute” is stricter: a pod without a matching toleration will not be scheduled on the node, and a pod already running on the node will be evicted if the taint and toleration values go out of sync.
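For NoExecute taints, Kubernetes also supports an optional tolerationSeconds field, which lets a pod keep running on a tainted node for a limited time before being evicted. A minimal sketch, reusing this article’s specialized=true key/value:

```yaml
# Toleration for a NoExecute taint. With tolerationSeconds set,
# the pod stays bound to a newly tainted node for up to 300 seconds
# and is evicted afterwards (omit the field to tolerate indefinitely).
tolerations:
- key: "specialized"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
  tolerationSeconds: 300
```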

Now that we understand what taints and tolerations are and how to apply them to nodes and pods respectively, let’s get back to our requirement. Let’s apply taints and tolerations as follows (I am skipping the actual syntax for the sake of simplicity):

  • Apply taint on SpNode1 [specialized = true]
  • Apply taint on SpNode2 [specialized = true]
  • Apply taint on SpNode3 [specialized = true]
  • Apply toleration on SpPod1 [specialized = true]
  • Apply toleration on SpPod2 [specialized = true]
  • Apply toleration on SpPod3 [specialized = true]

We applied taints and tolerations on nodes and pods that belong only to Team 1 as we do not have access to nodes and pods belonging to Team 2.

Will the above solution meet the requirement?

No.

Because there is a chance that the Kubernetes scheduler might schedule SpPod1 on either StNode4 or StNode5. Can you guess why? Because StNode4 and StNode5 do not have any taints. An un-tainted node accepts any pod, regardless of whether the pod has a toleration set.

So, while it’s a good start, this solution doesn’t guarantee that our requirement will be met.

Important points to remember about Taints and Tolerations:

  • Taints are set on Nodes.
  • Tolerations are set on Pods.
  • Tainted nodes will only accept the pods that have a matching toleration set.
  • A pod (with or without a particular toleration value) may be scheduled on an un-tainted node.

In essence, taints on nodes repel pods whose tolerations do not match the taint. However, nodes that do not have any taints will accept any pod (with or without a toleration set).

Solution 2 – Node Affinity

Let’s apply the concept of Node Affinity and see if it meets our requirement. So, what is Node Affinity?

Node affinity is a characteristic of a pod that attracts it to certain nodes. So how do we specify node affinity? It’s done in two steps.

In Step 1, we label the nodes by setting a property in key=value format. In Step 2, we specify node affinity property (in the similar key=value format) on the pod in the pod specification YAML file.

In essence, let’s say, if we label the node SpNode1 with (specialized=true) and specify node affinity property on the pod SpPod1 (specialized=true), then the pod SpPod1 will be attracted to the node SpNode1 as the label matches.

Kubernetes scheduler checks if the label on the node and the value specified in the nodeAffinity property in the pod specification file match. If they do, the pod is scheduled on the node. If not, the pod is not scheduled on the node.

The syntax to set labels on nodes is as follows:

kubectl label nodes <node name> <key=value>         

For our example, it will be

kubectl label nodes SpNode1 specialized=true         

The syntax to specify node affinity property in the pod specification YAML file is as follows:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"        

Note: Only the relevant snippet of the pod specification file is shown here.

It is important to understand the nodeAffinity types. There are two nodeAffinity types:

  • requiredDuringSchedulingIgnoredDuringExecution – a hard rule: the scheduler will place the pod only on a node whose label matches the value specified in the node affinity of the pod specification. If no such node exists, the pod is not scheduled.
  • preferredDuringSchedulingIgnoredDuringExecution – a soft rule: the scheduler prefers a node with a matching label value, but will still schedule the pod elsewhere if no matching node is available.

The "IgnoredDuringExecution" part of the names means that, if labels on a node change at runtime such that the affinity rules on a pod are no longer met, the pod continues to run on the node. In a future version of Kubernetes, there might be a new node affinity type introduced viz. “requiredDuringSchedulingRequiredDuringExecution” which is similar to “requiredDuringSchedulingIgnoredDuringExecution” except that Kubernetes will evict pods from nodes that cease to satisfy the pods' node affinity requirements.
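For completeness, a preferred rule carries a weight (1–100) that the scheduler uses to rank candidate nodes. A sketch of what the snippet above would look like as a soft preference, again using this article’s specialized label:

```yaml
# Soft node affinity: the scheduler favors nodes labeled
# specialized=true but may still pick an unlabeled node
# if no labeled node can accept the pod.
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"
```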

Now that we understand what labels and node affinity are and how to apply them to nodes and pods respectively, let’s get back to our requirement. Let’s label the nodes and specify node affinity values in the pod specification files as follows (I am skipping the actual syntax for the sake of simplicity):

  • Label node SpNode1 [specialized = true]
  • Label node SpNode2 [specialized = true]
  • Label node SpNode3 [specialized = true]
  • Specify affinity on SpPod1 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]
  • Specify affinity on SpPod2 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]
  • Specify affinity on SpPod3 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]

Like in the previous solution of taints and tolerations, we labeled the nodes and specified affinity on pods that belong only to Team 1 as we do not have access to nodes and pods belonging to Team 2.

Will the above solution meet the requirement?

No.

Because there is a chance that StPod4 or StPod5 could end up being scheduled on one of SpNode1, SpNode2 or SpNode3. Can you guess why? Node affinity only guarantees that a pod with affinity specified will be scheduled on a node with the matching label. It does not stop other pods, which have no affinity specified, from being scheduled on the labeled node. And since the pods StPod4 and StPod5 do not have any affinity set, they might get scheduled on one of the specialized nodes, which violates our requirement.

So, this solution too doesn’t guarantee that our requirement will be met.

Important points to remember about Node Affinity:

  • Nodes are labeled.
  • Affinity is a property on a pod specified in the pod specification/manifest file.
  • Pods that have an affinity specified will be scheduled on the nodes that are labeled with the same value.
  • A pod that does not have affinity specified might get scheduled on any node, irrespective of whether the node is labeled.

In essence, node affinity is a property on a pod that attracts it to a node labeled with the same value. However, pods that do not have any affinity specified might get scheduled on any node, irrespective of whether the node is labeled.

Solution 3 – Taints, Tolerations and Node Affinity

Neither solution 1 (taints and tolerations) nor solution 2 (node affinity) could address the requirement fully. Let’s see if a combination of both the concepts will address the requirement.

Step 1 – Apply taints and tolerations on nodes and pods.

  • Apply taint on SpNode1 [specialized = true]
  • Apply taint on SpNode2 [specialized = true]
  • Apply taint on SpNode3 [specialized = true]
  • Apply toleration on SpPod1 [specialized = true]
  • Apply toleration on SpPod2 [specialized = true]
  • Apply toleration on SpPod3 [specialized = true]

Step 2 – Label the nodes and specify node affinity on pods.

  • Label node SpNode1 [specialized = true]
  • Label node SpNode2 [specialized = true]
  • Label node SpNode3 [specialized = true]
  • Specify affinity on SpPod1 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]
  • Specify affinity on SpPod2 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]
  • Specify affinity on SpPod3 [specialized = true; requiredDuringSchedulingIgnoredDuringExecution]

The nodes are both tainted and labeled. Tolerations and node affinity are specified on the pods.
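Putting the two steps together, the relevant parts of a single pod manifest for SpPod1 might look like the sketch below. The pod name is lowercased (Kubernetes object names must be lowercase) and the container image is a placeholder for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sppod1
spec:
  containers:
  - name: app
    image: nginx   # placeholder image for illustration
  # Toleration: allows this pod onto nodes tainted specialized=true:NoSchedule
  tolerations:
  - key: "specialized"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  # Node affinity: restricts this pod to nodes labeled specialized=true
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: specialized
            operator: In
            values:
            - "true"
```

The toleration lets the pod onto the tainted specialized nodes, while the affinity rule prevents it from landing anywhere else.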

Will this solution meet the requirement?

Yes.

The situation that we encountered in Solution 1 (pod SpPod1 getting scheduled on StNode4 or StNode5) is not possible here. Why? The pods SpPod1, SpPod2 and SpPod3 will be scheduled only on SpNode1, SpNode2 or SpNode3 because of the labels and the node affinity property.

The situation that we encountered in Solution 2 (pods StPod4 or StPod5 getting scheduled on one of SpNode1, SpNode2 or SpNode3) is not possible either. Why? The pods StPod4 and StPod5 will be scheduled only on StNode4 or StNode5 because the other nodes SpNode1, SpNode2 and SpNode3 are tainted, and they will not accept pods that do not tolerate the taints.

This way, we ensure that the pods that need specialized hardware are only scheduled on specialized nodes and not scheduled on nodes that are running standard hardware. Similarly, the pods that do not need specialized hardware are only scheduled on nodes that are running standard hardware.

Often, either Taints and Tolerations or Node Affinity alone might be enough to schedule pods on the nodes of our choice. But if your requirement is complex, consider applying both concepts.

Conclusion

Kubernetes scheduling is a vast and complex subject. This article touched upon the basic concepts of Taints, Tolerations and Node Affinity with an example. Key things to remember:

  1. Taints are set on nodes.
  2. Tolerations are specified on pods typically in a pod specification/manifest file.
  3. Pay attention to the taint effect value (NoSchedule / PreferNoSchedule / NoExecute).
  4. Nodes are labeled.
  5. Node affinity is a characteristic of a pod that is typically specified in a pod specification/manifest file.
  6. Pay attention to the node affinity types (requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution).
  7. Taints on nodes will repel the pods away if the toleration doesn’t match the taint. However, nodes that do not have any taints will accept any pod (with or without toleration set on them).
  8. Node affinity is the opposite of taints: it attracts pods to labeled nodes when a matching label is found.


