Managing Multiple EKS Clusters with Terraform

One cluster is easy: the provider configuration is static and never changes. Once you start adding clusters, your Terraform implementation becomes less DRY and messier. The approach I'm going to propose keeps your code a lot more flexible. What you can't do is iterate over multiple providers. Why not?

Why Iterating Over Providers Doesn't Work

As of 2024, providers can be instantiated with an alias. This lets you configure multiple static providers and pass the one you need into a module. The key point is that they are static: a provider has to outlive any of the resources it creates, so a dynamically created provider is dangerous in the sense that it could leave resources dangling out in space with nothing able to manage them. Here's what an alias looks like:

provider "aws" {
  alias  = "infrawest"
  region = "us-west-2"
}        

The alias is declared as a string, but when you're ready to use the aliased provider, Terraform expects a literal provider reference of the form aws.<alias>, not a string or an arbitrary expression:

module "example" {
  source    = "./example"
  providers = {
    aws = aws.infrawest
  }
}        

Because that reference must be static, there is nothing you can iterate over. On top of that, a module that accepts aliased providers has to declare the allowed configuration_aliases, and that argument is itself a static list of provider references:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 2.7.0"
      configuration_aliases = [ aws.infrawest ]
    }
  }
}        

So we can't iterate over providers; Terraform is intentionally designed this way, and that doesn't appear to be changing any time soon. So what can we do?

Iterating Over Providers without Iterating Over Providers

I went down this rabbit hole because I wanted to stamp out an EKS cluster for each VPC I created, each with its own custom configuration and network setup. The specifics aren't important; just know that I wanted to control the number and size of my VPCs and EKS clusters through a global configuration and iterate over it.

There are a few requirements for the approach I'm going to suggest:

  1. There must be two separate root modules, applied separately. I call them "INFRA" and "APP", but you can call them anything you want. INFRA is the VPC/EKS cluster setup and APP is everything else that interacts with the Kubernetes control plane API.
  2. The APP module must instantiate a child module where all the work is done. This keeps the root APP module clean.
  3. There must be a separate provider.tf file in the root APP module.
  4. The APP module must be able to read the remote state of the INFRA module as a data source (a sketch of this follows the list).
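
A minimal sketch of requirement 4, using the terraform_remote_state data source. The bucket, key, and output names here are illustrative (modeled on the backend shown later), not the actual values:

data "terraform_remote_state" "infra" {
  backend = "s3"
  config = {
    bucket = "tf-dev-state-environment"
    key    = "test-infra-terraform.tfstate"
    region = "us-east-2"
  }
}

# Example lookup, assuming INFRA exports an eks_clusters output map.
locals {
  egress_cluster_name = data.terraform_remote_state.infra.outputs.eks_clusters["egress"].eks_cluster_name
}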

Seems simple, and it is, because the main.tf and provider.tf in the APP root module are not hand-written and static; they are auto-generated by the INFRA module.

Auto Generating the APP Module Files

There are some upsides to this approach: it introduces a separation of concerns between the infrastructure automation that provides the control plane services and the automation that interacts with them. I think you can see where I'm going with this.


INFRA main.tf

Call to sub-module aws-eks

# Create EKS Cluster
# TODO Additional validation here to check for public subnet count
#      if public_endpoint_access is true.
# TODO Create Security group for access from each VPC cidr if
#      private endpoint is specified.
module "aws-eks" {
  source       = "../../modules/aws-eks"
  depends_on   = [module.aws-networking]
  for_each     = module.globals.env_eks_config[var.environment].vpcs
  project_name = module.globals.env_global_config.project_name
  environment  = var.environment
  vpc_name     = each.key
  vpc_id       = module.aws-networking.vpcs[each.key].vpc_id
  # ... remaining inputs omitted
}        

Sub-module: aws-eks outputs.tf

output "eks_cluster_name" {
  description = "EKS Cluster Name"
  value       = aws_eks_cluster.eks_cluster.name
}

output "eks_vpc_name" {
  description = "VPC Name"
  value       = var.vpc_name
}

output "eks_arn" {
  description = "EKS Cluster ARN"
  value       = aws_eks_cluster.eks_cluster.arn
}

output "eks_id" {
  description = "EKS Cluster ID"
  value       = aws_eks_cluster.eks_cluster.id
}

output "eks_endpoint" {
  description = "EKS Cluster Endpoint"
  value       = aws_eks_cluster.eks_cluster.endpoint
}

output "eks_certificate_authority_data" {
  description = "EKS Cluster Certificate Authority Data"
  value       = aws_eks_cluster.eks_cluster.certificate_authority[0].data
}

output "eks_oidc_issuer" {
  description = "EKS Cluster OIDC Issuer"
  value       = aws_eks_cluster.eks_cluster.identity[0].oidc[0].issuer
}

output "eks_cluster_auth" {
  description = "EKS Cluster Auth"
  value       = data.aws_eks_cluster_auth.default
}        
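
The eks_cluster_auth output refers to a data source inside the aws-eks sub-module that isn't shown above. For completeness, a minimal sketch of what it looks like (the data source name "default" is taken from the output reference):

# Short-lived authentication token for the cluster, refreshed on every plan/apply.
data "aws_eks_cluster_auth" "default" {
  name = aws_eks_cluster.eks_cluster.name
}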

The INFRA module iterates over my EKS config and calls a child module, aws-eks, that creates the cluster, node group, permissions, OIDC provider, and access configuration. The config below could hold any number of EKS clusters; each entry is keyed by a VPC name ("egress" here), which is also the key into a separate config map for the VPC itself. This is just background on how I iterate over the configs; it isn't essential to the approach, and you can iterate however you like.

EKS Config

env_eks_config = {
    test = {
      vpcs = {
        "egress" = {
          authentication_mode     = "API_AND_CONFIG_MAP"
          private_endpoint_access = false
          public_endpoint_access  = true
          disk_size               = 20
          instance_types          = ["t3.small"]
          ami_type                = "AL2_x86_64"
          capacity_type           = "ON_DEMAND"
          scaling_config = {
            min_size     = 1
            max_size     = 3
            desired_size = 2
          }
          log_types             = ["api", "audit"]
          log_retention_in_days = 3
          access_configuration = [
            {
              principal_arn = "current_user"
              policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
              access_scope = {
                type = "cluster"
              }
            }
          ]
        }
      }
    }
  }        
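
One detail in that config: principal_arn is the placeholder "current_user", which the aws-eks sub-module swaps for the ARN of the identity running Terraform before creating the access entries. That code isn't shown here, so the following is a hedged sketch of how it could be done (the local and resource names are illustrative):

# Resolve the "current_user" placeholder to the identity running Terraform.
data "aws_caller_identity" "current" {}

locals {
  access_configuration = [
    for ac in var.access_configuration : merge(ac, {
      principal_arn = ac.principal_arn == "current_user" ? data.aws_caller_identity.current.arn : ac.principal_arn
    })
  ]
}

# One access entry and policy association per configured principal.
resource "aws_eks_access_entry" "this" {
  count         = length(local.access_configuration)
  cluster_name  = aws_eks_cluster.eks_cluster.name
  principal_arn = local.access_configuration[count.index].principal_arn
}

resource "aws_eks_access_policy_association" "this" {
  count         = length(local.access_configuration)
  cluster_name  = aws_eks_cluster.eks_cluster.name
  principal_arn = local.access_configuration[count.index].principal_arn
  policy_arn    = local.access_configuration[count.index].policy_arn

  access_scope {
    type = local.access_configuration[count.index].access_scope.type
  }
}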

INFRA main.tf

Back in the main.tf of our INFRA root module, we now have a set of Kubernetes clusters available for our automation to interact with. The problem is that we need a provider configuration for each one, so in the INFRA main.tf we iterate over these clusters and dynamically generate the main.tf and provider.tf for our APP root module.

# Generate Provider File for Kubernetes and Helm
resource "local_file" "provider_file" {
  content  = <<EOT
%{for cluster in module.aws-eks~}
provider "kubernetes" {
  host = "${cluster.eks_endpoint}"
  alias = "${cluster.eks_vpc_name}"
  cluster_ca_certificate = <<CONTENT
${base64decode(cluster.eks_certificate_authority_data)}
CONTENT
  token = "${cluster.eks_cluster_auth.token}"
}
provider "helm" {
  alias = "${cluster.eks_vpc_name}"
  kubernetes {
    host = "${cluster.eks_endpoint}"
    cluster_ca_certificate = <<CONTENT
${base64decode(cluster.eks_certificate_authority_data)}
CONTENT
    token = "${cluster.eks_cluster_auth.token}"
  }
}
%{endfor~}
provider "aws" {
  region  = "${var.region}"
}
EOT
  filename = "${path.cwd}/../${var.environment}-app/provider.tf"
}

# Generate main.tf File for the APP root module
resource "local_file" "main_file" {
  content = <<EOT
terraform {
  required_providers {
    kubernetes = {
      source = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source = "hashicorp/helm"
      version = "~> 2.12"
    }
    aws = {
      source = "hashicorp/aws"
      version = "~> 5.0"
    }
    tls = {
      source = "hashicorp/tls"
      version = "~> 3.0"
    }
  }
  backend "s3" {
    encrypt = true
    bucket = "tf-dev-state-environment"
    dynamodb_table = "tf-dev-state-locking"
    key = "test-app-terraform.tfstate"
    region = "us-east-2"
    assume_role = {
      role_arn = "arn:aws:iam::533267038789:role/tf-dev-state"
    }
  }
}

%{for cluster in module.aws-eks~}
# Standup App for ${cluster.eks_vpc_name}
module "${cluster.eks_vpc_name}-app" {
  source      = "../../modules/env-app"
  environment = "${var.environment}"
  region      = "${var.region}"
  vpc_name    = "${cluster.eks_vpc_name}"
  providers = {
    kubernetes = kubernetes.${cluster.eks_vpc_name}
    helm       = helm.${cluster.eks_vpc_name}
  }
}

%{endfor~}
EOT
  filename = "${path.cwd}/../${var.environment}-app/main.tf"
}        

This generates the main.tf and provider.tf for APP.
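
A side note on the generation itself: the inline heredoc templates work, but they get unwieldy as the files grow. The same rendering can be done with templatefile(), keeping the template (with the same %{ for } directives) in its own file. A sketch, where the .tpl path is hypothetical:

resource "local_file" "provider_file" {
  content = templatefile("${path.module}/templates/provider.tf.tpl", {
    clusters = module.aws-eks
    region   = var.region
  })
  filename = "${path.cwd}/../${var.environment}-app/provider.tf"
}

Either way, here is what gets generated for the test environment.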

main.tf

terraform {
  required_providers {
    kubernetes  = {
      source    = "hashicorp/kubernetes"
      version   = "~> 2.0"
    }
    helm = {
      source    = "hashicorp/helm"
      version   = "~> 2.12"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 3.0"
    }
  }
  backend "s3" {
    encrypt        = true
    bucket         = "tf-dev-state-environment"
    dynamodb_table = "tf-dev-state-locking"
    key            = "test-app-terraform.tfstate"
    region         = "us-east-2"
    assume_role = {
      role_arn = "arn:aws:iam::533267038789:role/tf-dev-state"
    }
  }
}

# Standup App for egress
module "egress-app" {
  source      = "../../modules/env-app"
  environment = "test"
  region      = "us-east-2"
  vpc_name    = "egress"
  providers = {
    kubernetes = kubernetes.egress
    helm       = helm.egress
  }
}        

provider.tf

provider "kubernetes" {
  host                   = "https://6A3B0F2954D..."
  alias                  = "egress"
  cluster_ca_certificate = <<CONTENT
-----BEGIN CERTIFICATE-----
MIIDBTCCAe2gAwIBAgIIAdjeuiJdrq8wDQYJKoZIhvcNAQELBQAwFTETMBEGA1UE
AxMKa3ViZXJuZXRlczAeFw0yNDAyMTEwNDMwMDFaFw0zNDAyMDgwNDM1MDFaMBUx
EzARBgNVBAMTCmt1YmVybmV0ZXMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
AoIBAQC3F4eJ++woVqMUxSZ59MjzxAKZicyz/jgYxhh9pIRhjkOWaFKGZNXC22Yd
PguUEEwYMwQ6/CJq7k2eoOSnkzqRlcucetMpy0jQdbootqG2OWJ5ZnZOqblInjzA
Y/sytYzy4t/DCX9CHEdtV83P1oeOnZvyJg4W7XKtYMccWB7G4bLRc7KHjYq+q83K
xQYJqb8aqT1xt1l7+aYlKMK0iH6Y6jxO49+hxyLGAh5apzXrZda9G/9EC9IlifHf
...
-----END CERTIFICATE-----

CONTENT
  token                  = "k8s-aws-v1.aHR0cHM6Ly9zdHMudXMtZWF..."
}
provider "helm" {
  alias                  = "egress"
  kubernetes {
    host                 = "https://6A3B0F2954DEF86..."
    cluster_ca_certificate = <<CONTENT
-----BEGIN CERTIFICATE-----
MIIDBTCCAe2gAwIBAgIIAdjeuiJdrq8wDQYJKoZIhvcNAQELBQAwFTETMBEGA1UE
AxMKa3ViZXJuZXRlczAeFw0yNDAyMTEwNDMwMDFaFw0zNDAyMDgwNDM1MDFaMBUx
EzARBgNVBAMTCmt1YmVybmV0ZXMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK
...
-----END CERTIFICATE-----

CONTENT
    token                = "k8s-aws-v1.aHR0cHM6Ly9zdHMudXMtZWFzdC0yLmFtYXpvbmF3cy5jb20vP0FjdGlvbj1HZXRDYWxsZXJJZGVudGl0eSZWZXJzaW9uPTIwMTEtMDYtMTUmWC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBWFlLSlJJSkNRRk5XQ0FLQ..."
  }
}
provider "aws" {
  region  = "us-east-2"
}        
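
Notice that the generated root module maps the aliased providers onto the default (unaliased) kubernetes and helm providers of each env-app instance, so the child module needs no configuration_aliases of its own. A minimal sketch of what env-app could contain (the namespace and helm_release are purely illustrative, not the module's actual contents):

# modules/env-app/main.tf (sketch) - uses the providers passed in by the root module.
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

variable "environment" { type = string }
variable "region"      { type = string }
variable "vpc_name"    { type = string }

# Everything in here talks to whichever cluster the root module wired in.
resource "kubernetes_namespace" "app" {
  metadata {
    name = "${var.vpc_name}-app"
  }
}

resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = kubernetes_namespace.app.metadata[0].name
}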

I've removed most of the certificate and token information for obvious reasons, but keep in mind that these credentials expire, so you will need to run the INFRA module again to regenerate the provider.tf file. You could split the credential and file generation into its own module so you can target just that piece without touching any other part of INFRA. Well, hope this helps!
