Ensure Node Auto-Repair is enabled for GKE nodes

LOW

Description

Description:

Nodes in a degraded state are an unknown quantity and so may pose a security risk.

Rationale:

Kubernetes Engine's node auto-repair feature helps you keep the nodes in your cluster in a healthy, running state. When it is enabled, Kubernetes Engine periodically checks the health of each node in your cluster. If a node fails consecutive health checks over an extended time period, Kubernetes Engine initiates a repair process for that node.

If multiple nodes require repair, Kubernetes Engine might repair them in parallel. Kubernetes Engine limits the number of parallel repairs based on the size of the cluster (larger clusters have a higher limit) and the number of broken nodes in the cluster (the limit decreases as more nodes are broken).

Node auto-repair is not available on Alpha Clusters.
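To audit this setting from the command line, you can list a cluster's node pools and flag any whose management.autoRepair field is not true. The following is a minimal bash sketch, assuming an authenticated gcloud CLI, bash 4 or later, and a zonal cluster. The function name list_pools_without_autorepair is hypothetical, and because the exact way --format="value(...)" renders booleans is treated as an assumption here, the comparison is case-insensitive and an unset field counts as disabled.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper: prints the name of every node pool in a
# cluster whose auto-repair setting is not enabled.
# Assumes an authenticated gcloud CLI, bash 4+, and a zonal cluster.
list_pools_without_autorepair() {
  local cluster="$1" zone="$2" name autorepair
  # value(...) is assumed to print one tab-separated line per node pool:
  # <pool name> <management.autoRepair>, with an empty field when unset.
  gcloud container node-pools list \
    --cluster "$cluster" --zone "$zone" \
    --format="value(name,management.autoRepair)" |
  while IFS=$'\t' read -r name autorepair; do
    # Anything other than a case-insensitive "true" is reported.
    if [[ "${autorepair,,}" != "true" ]]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `list_pools_without_autorepair my-cluster us-central1-a` (placeholder names) prints one pool name per line; empty output means every node pool in that cluster has auto-repair enabled.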

Remediation

Using Google Cloud Console

  1. Go to Kubernetes Engine by visiting https://console.cloud.google.com/kubernetes/list
  2. Select the Kubernetes cluster for which node auto-repair is disabled.
  3. Click the name of the node pool that requires node auto-repair to be enabled.
  4. Within the Node pool details pane, click EDIT.
  5. Under the 'Management' heading, ensure the 'Enable Auto-repair' box is checked.
  6. Click SAVE.
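After clicking SAVE, you can confirm the change for a single node pool from the command line. This minimal bash sketch assumes an authenticated gcloud CLI, bash 4 or later, and a zonal cluster; the helper name autorepair_enabled is hypothetical. It reads the pool's management.autoRepair field with gcloud container node-pools describe and reports the result through its exit status.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if auto-repair is enabled on the given
# node pool, 1 otherwise (including when the gcloud call itself fails).
# Assumes an authenticated gcloud CLI, bash 4+, and a zonal cluster.
autorepair_enabled() {
  local pool="$1" cluster="$2" zone="$3" value
  value="$(gcloud container node-pools describe "$pool" \
    --cluster "$cluster" --zone "$zone" \
    --format="value(management.autoRepair)")" || return 1
  # Compare case-insensitively; an unset field yields an empty string.
  [[ "${value,,}" == "true" ]]
}
```

For example, `autorepair_enabled default-pool my-cluster us-central1-a && echo enabled` (placeholder names). Returning an exit status rather than printing text keeps the helper easy to use in scripted compliance checks.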

Using Command Line

To enable node auto-repair on an existing node pool, run the following command:

gcloud container node-pools update $POOL_NAME --cluster $CLUSTER_NAME --zone $COMPUTE_ZONE --enable-autorepair
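The command above expects POOL_NAME, CLUSTER_NAME and COMPUTE_ZONE to be set; because the variables are unquoted, an unset one silently disappears from the argument list and produces a malformed command. A minimal bash wrapper that checks the variables first and quotes them might look like the following. The function name enable_autorepair is hypothetical; the gcloud flags are the ones used in the command above. For a regional cluster, gcloud accepts --region in place of --zone.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the remediation command: refuses to run
# unless POOL_NAME, CLUSTER_NAME and COMPUTE_ZONE are all non-empty.
enable_autorepair() {
  local var
  for var in POOL_NAME CLUSTER_NAME COMPUTE_ZONE; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [[ -z "${!var:-}" ]]; then
      echo "enable_autorepair: $var is not set" >&2
      return 1
    fi
  done
  gcloud container node-pools update "$POOL_NAME" \
    --cluster "$CLUSTER_NAME" \
    --zone "$COMPUTE_ZONE" \
    --enable-autorepair
}
```

Validating inputs before calling gcloud turns a confusing CLI error into a clear message naming the missing variable.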