Kubernetes Module

The Chaos Engine Kubernetes Module is able to connect to a Kubernetes cluster and interact with deployed PODs.

Supported Versions

Chaos Engine supports minimum Kubernetes version 1.9 and currently supports up to version 1.15. Support is driven by the Kubernetes Client SDK version compatibility.

SDK

The official Kubernetes Java Client is used to interact with the cluster.

Resource https://github.com/kubernetes-client/java
Version 15.0.1
Maven Repositories https://mvnrepository.com/artifact/io.kubernetes/client-java

Configuration

Environment variables that control how the Chaos Engine interacts with Kubernetes.

Key Name Description Default Value Mandatory
kubernetes The presence of this key enables Kubernetes module. N/A Yes
kubernetes.url Kubernetes server API url e.g. None Yes
kubernetes.token JWT token assigned to service account. You can get the value by running kubectl describe secret name_of_your_secret None Yes
kubernetes.namespaces Comma-separated list of namespaces where experiments should be performed default Yes
kubernetes.debug Enables debug log of Kubernetes java client false No
kubernetes.validateSSL Enables validation of sever side certificates false No

Required Kubernetes Cluster Configuration

A service account with a role binding needs to be created in order to access specific API endpoints required for Kubernetes Experiments

Please replace the {{namespace}} fillers with the appropriate values and apply to your cluster.

Experiments on single namespace

chaos-engine-service-account.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaos-engine-role
  namespace: {{namespace}}
rules:
- apiGroups:
  - apps
  resources:
  - daemonsets
  - daemonsets/status
  - deployments
  - deployments/status
  - replicasets
  - replicasets/status
  - statefulsets
  - statefulsets/status
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - delete


- apiGroups:
  - ""
  resources:
  - pods
  - pods/status
  - replicationcontrollers/status
  verbs:
  - get
  - list

- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
  - get

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-engine-serviceaccount
  namespace: {{namespace}}

---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chaos-engine-rolebinding
  namespace: {{namespace}}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chaos-engine-role
subjects:
- kind: ServiceAccount
  name: chaos-engine-serviceaccount
  namespace: {{namespace}}

You can retrieve the token by running kubectl describe secret chaos-engine -n {{namespace}}

Experiments on multiple namespaces

When your experiment targets are located in multiple namespaces, you need to bind roles allowing access to appropriate namespace to your service account. Or you can simply create a cluster role and binding by running below yaml.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: chaos-engine-crole
rules:
- apiGroups:
  - apps
  resources:
  - daemonsets
  - daemonsets/status
  - deployments
  - deployments/status
  - replicasets
  - replicasets/status
  - statefulsets
  - statefulsets/status
  verbs:
  - get
  - list
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - delete


- apiGroups:
  - ""
  resources:
  - pods
  - pods/status
  - replicationcontrollers/status
  verbs:
  - get
  - list

- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
  - get

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: chaos-engine-rolebinding
roleRef:
  kind: ClusterRole
  name: chaos-engine-crole
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: chaos-engine-serviceaccount
  namespace: {{namespace}}

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-engine-serviceaccount
  namespace: {{namespace}}

Verify Service Account Setting

Run the following sequence of commands to verify service account permissions.

NAMESPACE={{namespace}}
TOKEN=$(kubectl describe secret chaos-engine -n $NAMESPACE)
SERVER_ENDPOINT=https//example.com
kubectl config set-credentials chaos-engine-token --token="$TOKEN"
kubectl config set-cluster chaos-engine-target-cluster --server=$SERVER_ENDPOINT --insecure-skip-tls-verify
kubectl config set-context --cluster=chaos-engine-target-cluster --namespace=$NAMESPACE chaos-engine-context
kubectl config use-context chaos-engine-context 
kubectl --token="$TOKEN" get pods

If the output of the get pods is similar to what you see below. Your configuration is not valid. Please check role binding from the previous section was done properly.

Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:{{SERVICE_ACCOUNT_NAME}}:{{NAMESPACE}}"
 cannot list resource "pods" in API group "" in the namespace "{{NAMESPACE}}"

Node Discovery

Mechanism

API: listNamespacedPod

The Kubernetes Platform generates list of available containers by calling listNamespacedPod API. Filtering is done according to namespace name provided as a platform configuration parameter. Only one namespace can be targeted right now.

Note

Job and Cron Job controllers are not supported by the Engine. Containers managed by those controllers are considered unhealthy and they are automatically skipped by scheduler.

Self Awareness

Not yet implemented

Experiments

Delete Pod

Mechanism

The Engine invokes deleteNamespacedPod API call with zero graceful period. That leads to immediate POD termination. In theory all containers should be backed by a controller so the deleted container should be replaced by brand new container instance created by the controller.

API: deleteNamespacedPod

Health Check

Experiment is finished when originally targeted container is no more present on the platform and the controller is back to desired number of replicas.

API Check pod: listNamespacedPod

API Controllers:

Self Healing

None

Shell based experiments

Mechanism

Shell base experiments is a suite of shell scripts that are randomly selected and transferred to the targeted Kubernetes container and executed.

API: connectGetNamespacedPodExec

Health Check

Same as Delete Pod experiment, actual replicas count and container existence is checked.

Self Healing

Target container is deleted with no graceful period.

API: listNamespacedPod

List of experiments

See Included Script Experiments for a list of experiments included with the engine.