Sandeep

AKS with Terraform

AKS with Terraform

How to create a Kubernetes cluster with Azure Kubernetes Service - AKS using Terraform.

AKS - Azure-Kubernetes-Service


Azure Kubernetes Service (AKS) manages your hosted Kubernetes environment. AKS allows you to deploy and manage containerized applications without container orchestration expertise. AKS also enables you to do many common maintenance operations without taking your app offline. These operations include provisioning, upgrading, and scaling resources on demand.

AKS - Terraform


Terraform provides following Azure Providers to provision infrastructure in Azure Public Cloud.

We will be using azurerm and azuread providers for creating AKS.

AKS - Terraform Prerequisites


  • Azure subscription: Create a free account which to have Azure subscription.
  • Terraform: Install Terraform latest version
  • Azure service principal: Follow the directions in the Create the service principal section. Create an Azure service principal with Azure CLI. Take note of the values for the appId, displayName, password, and tenant. If not, then I will show you how to create service principal using Terraform.

Terraform AKS


Terraforming AKS doesn’t involve lot of moving components like AWS-EKS.

Make sure have terraform installed and Azure service principal details ready.

Terraform Variables used must be put in variables.tf.

#1 - Terraform Environment Variables


terraform.tfvars must be used in case, common or secret variables are getting injected as environment variables.

terraform.tfvarsbase_tags = {
  Tier        = "Internal"
  CostCentre  = "xyzxyz"
  Compliance  = "no"
  Owner       = "azure-sub"
  Escalation  = "azure-subm@me.com"
}
svc_prpl_pwd = ""
ARM_ENVIRONMENT     = "public"
ARM_CLIENT_ID       = ""
ARM_SUBSCRIPTION_ID = ""
ARM_TENANT_ID       = ""

#2 - Terraform Providers


Terraform Providers are responsible for understanding API interactions with given providers resources. They can create any resource, if proper credentials for an account in public cloud is given.

For AKS, we will need 4 providers to run our terraform code successfully.

terraform providers- azurerm
- azuread
- local
- tls

Definition of providers in terraform is shown below. In Azure, with proper permissions, we can get all the 4 variables needed to initiliase AKS azurerm providers terraform code.

You can pin providers to sepecific versions in terraform as shown below.

providers.tfprovider "azurerm" {
  version         = "=2.14.0"
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
  features {}
}

provider "azuread" {
  version = "=0.10.0"
}

provider "local" {
  version = "~> 1.4"
}

provider "tls" {
  version = "~> 2.1" 
}

#2 - Create AD Service Principal


We can create Azure Service Principal and a password with an expiry date. Password can be stored in same repo encrypted with sops or can be also retrive from Azure Vault.

azure-ad.tfresource "azuread_application" "aks_app" {
  name = "aks_rbac"
}

resource "azuread_service_principal" "aks_svc_prnpl"{
  application_id                = azuread_application.aks_app.application_id
  app_role_assignment_required  = false
  tags                          = ["aks", "azure", "team-1"]
}

resource "azuread_service_principal_password" "aks_svc_prnpl_pwd" {
  service_principal_id  = azuread_service_principal.aks_svc_prnpl.id
  value                 = var.svc_prpl_pwd
  end_date              = "2099-01-01T01:02:03Z"
  description           = "My managed password"
}

#3 - Create Resource Group


We will start creating AKS with creating a Resource Group first. All our resources created will be under this Resource Group.

rg.tfresource "azurerm_resource_group" "k8s" {
  name     = var.resource_group_name
  location = var.location
  tags     = local.tags
}

#4 - Create SSH-Key (Optional)


This part is optional. If you want to ssh worker nodes, its better to create a key-pair. Private key can be stored in same repo encrypted using a key with sops or can be put in Azure Vault.

module.tfvariable "public_ssh_key" {
  description = "A custom ssh key to control access to the AKS cluster"
  default     = ""
}

module "ssh-key" {
  source         = "./modules/ssh-key"
  public_ssh_key = var.public_ssh_key == "" ? "" : var.public_ssh_key
}

Module ssh-key looks like as below.

./modules/ssh-key/main.tfvariable "public_ssh_key" {
  description = "An ssh key set in the main variables of the terraform-azurerm-aks module"
  default     = ""
}

resource "tls_private_key" "ssh" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "local_file" "private_key" {
  count    = var.public_ssh_key == "" ? 1 : 0
  content  = tls_private_key.ssh.private_key_pem
  filename = "./aks_private_ssh_key"
}

output "public_ssh_key" {
  # Only output a generated ssh public key
  value = var.public_ssh_key != "" ? "" : tls_private_key.ssh.public_key_openssh
}

#5 - Create AKS


azurerm_kubernetes_cluster is the main resource which manages and creates Azure AKS.

Below azurerm_kubernetes_cluster resource has many arguement blocks and arguements. Lets explore them, one by one.

arguements can be Required or Optional. The names are quite self explanatory here.

arguements AKS  name                      = var.cluster_name
  location                  = azurerm_resource_group.k8s.location
  resource_group_name       = azurerm_resource_group.k8s.name
  dns_prefix                = var.dns_prefix
  kubernetes_version        = "1.16.9"
  private_cluster_enabled   = false
  tags                      = local.tags
  sku_tier                  = "Free"

service_principal block contains Service Principal applicaition id and the secret as show below. Values are coming from the resources created in Step-2.

service_principal block service_principal {
      client_id     = azuread_service_principal.aks_svc_prnpl.application_id
      client_secret = azuread_service_principal_password.aks_svc_prnpl_pwd.value
    }

default_node_pool block contains the worker-nodes details like total node counts, min/max node counts, vm size, disk size, tags, taints on nodes, whether nodes have public ip’s etc.

We are using below arguements for worker nodes.

default_node_pool block default_node_pool {
      name                  = "default"
      enable_node_public_ip = false
      enable_auto_scaling   = false
      node_count            = 2
      min_count             = 2
      max_count             = 6
      vm_size               = var.agent_size
      type                  = "VirtualMachineScaleSets"
      os_disk_size_gb       = 50
      node_taints           = ["vm=OnDemand:NoSchedule"]
      tags                  = local.tags
      node_labels           = {
          Tier        = "internal"
          Team        = "team-1"
          Type        = "OnDemand"
      }
    }

auto_scaler_profile block lets us know about cluster autoscaler needs with in the AKS cluster. If you have already work with cluster-autoscaler, all the arguements in block looks quite similar.

auto_scaler_profile block auto_scaler_profile {
    balance_similar_node_groups       = true
    max_graceful_termination_sec      = 300
    scale_down_delay_after_add        = "10m"
    scale_down_delay_after_delete     = "10s"
    scan_interval                     = "10s"
    scale_down_delay_after_failure    = "3m"
    scale_down_unneeded               = "10m"
    scale_down_unready                = "20m"
    scale_down_utilization_threshold  = 0.5
  }

addon_profile block can create a azure policy, application routing, kubernetes dashboards and most importantly can enable azure monitoring for the cluster.

addon_profile block dynamic addon_profile {
    for_each = var.enable_log_analytics_workspace ? ["log_analytics"] : []
    content {
      oms_agent {
        enabled                    = true
        log_analytics_workspace_id = azurerm_log_analytics_workspace.main[0].id
      }
    }
  }

linux_profile block contains admin username for cluster and the secret key to login inside vm. ssh-key is oming from the module that we created in Step-4.

linux_profile block linux_profile {
    admin_username = var.admin_username
    ssh_key {
      key_data = replace(var.public_ssh_key == "" ? module.ssh-key.public_ssh_key : var.public_ssh_key, "\n", "")
    }
  }

timeouts block allows you to specify timeouts for certain actions

timeouts arguements- create - (Defaults to 90 minutes) Used when creating the Kubernetes Cluster.
- update - (Defaults to 90 minutes) Used when updating the Kubernetes Cluster.
- read   - (Defaults to 5 minutes)  Used when retrieving the Kubernetes Cluster.
- delete - (Defaults to 90 minutes) Used when deleting the Kubernetes Cluster.
timeouts block timeouts {
    create = "2h"
    delete = "2h"
    update = "2h"
    read   = "5m"
  }

Similarly there are other blocks in azurerm_kubernetes_cluster as mentioned below. We can add them as per cluster requirements.

other blocksazure_active_directory{}
azure_policy{}
http_application_routing{}
kube_dashboard{}
oms_agent{}
network_profile{}
role_based_access_control{}
api_server_authorized_ip_ranges = ""
identity {}

Combining all the above arguements and blocks, azurerm_kubernetes_cluster looks like as below.

aks.tfresource "azurerm_kubernetes_cluster" "k8s" {
  name                      = var.cluster_name
  location                  = azurerm_resource_group.k8s.location
  resource_group_name       = azurerm_resource_group.k8s.name
  dns_prefix                = var.dns_prefix
  kubernetes_version        = "1.16.9"
  private_cluster_enabled   = false
  tags                      = local.tags
  sku_tier                  = "Free"

  default_node_pool {
      name                  = "default"
      enable_node_public_ip = false
      enable_auto_scaling   = false
      node_count            = 2
      min_count             = 2
      max_count             = 6
      vm_size               = var.agent_size
      type                  = "VirtualMachineScaleSets"
      os_disk_size_gb       = 50
      node_taints           = ["vm=OnDemand:NoSchedule"]
      tags                  = local.tags
      node_labels           = {
          Tier        = "internal"
          Team        = "team-1"
          Type        = "OnDemand"
      }
    }

  service_principal {
      client_id     = azuread_service_principal.aks_svc_prnpl.application_id
      client_secret = azuread_service_principal_password.aks_svc_prnpl_pwd.value
    }

  linux_profile {
    admin_username = var.admin_username
    ssh_key {
      key_data = replace(var.public_ssh_key == "" ? module.ssh-key.public_ssh_key : var.public_ssh_key, "\n", "")
    }
  }

  dynamic addon_profile {
    for_each = var.enable_log_analytics_workspace ? ["log_analytics"] : []
    content {
      oms_agent {
        enabled                    = true
        log_analytics_workspace_id = azurerm_log_analytics_workspace.main[0].id
      }
    }
  }

  auto_scaler_profile {
    balance_similar_node_groups       = true
    max_graceful_termination_sec      = 300
    scale_down_delay_after_add        = "10m"
    scale_down_delay_after_delete     = "10s"
    scan_interval                     = "10s"
    scale_down_delay_after_failure    = "3m"
    scale_down_unneeded               = "10m"
    scale_down_unready                = "20m"
    scale_down_utilization_threshold  = 0.5
  }

  timeouts {
    create = "2h"
    delete = "2h"
    update = "2h"
    read   = "5m"
  }

}

#6 - AKS Logs


azurerm_log_analytics_workspace manages a Log Analytics (formally Operational Insights) Workspace. azurerm_log_analytics_solution manages a Log Analytics (formally Operational Insights) Solution.

aks-logs.tfresource "random_id" "log_analytics_workspace_name_suffix" {
    byte_length = 8
}

resource "azurerm_log_analytics_workspace" "main" {
  count               = var.enable_log_analytics_workspace ? 1 : 0
  name                = join("-", [var.log_analytics_workspace_name, 
                                  random_id.log_analytics_workspace_name_suffix.dec, 
                                  "workspace"])
  location            = azurerm_resource_group.k8s.location
  resource_group_name = azurerm_resource_group.k8s.name
  sku                 = var.log_analytics_workspace_sku
  retention_in_days   = var.log_retention_in_days
  tags = local.tags
}

resource "azurerm_log_analytics_solution" "main" {
  count                 = var.enable_log_analytics_workspace ? 1 : 0
  solution_name         = "ContainerInsights"
  location              = azurerm_resource_group.k8s.location
  resource_group_name   = azurerm_resource_group.k8s.name
  workspace_resource_id = azurerm_log_analytics_workspace.main[0].id
  workspace_name        = azurerm_log_analytics_workspace.main[0].name

  plan {
    publisher = "Microsoft"
    product   = "OMSGallery/ContainerInsights"
    promotion_code = ""
  }
}

#7 - AKS Create


Run terraform with below commands to create AKS.

aks createterraform init
terraform validate
terraform plan
terraform apply

References