Chaos Engine Full Logo

Chaos Engine is an application for creating random Chaos Events in cloud applications to test resiliency. It follows the Principles of Chaos to create random faults (experiments) that could reasonably occur in a real application deployment.

Chaos Engine makes intelligent decisions in how and when to create experiments. When properly configured, experiments can be restricted to occur only during normal business hours (i.e., no pager alerts).

Chaos Engine currently supports Amazon Web Services, Pivotal Cloud Foundry, and Kubernetes. We have future plans to add support for Google Cloud Platform.

Warning

Running chaos experiments in a non-resilient system can result in significant faults. We highly recommend you use a graduated approach to chaos implementation, and build confidence in your development and staging environments before attempting the same in your production environment.

Supported Experiments

Google Cloud Platform

Google Compute Engine

Experiment Target Description
Simulate Maintenance Event All Instances Performs a simulated maintenance event, to either migrate the VM to another host, or delete and recreate it on another host
Stop Instance Non-Grouped Instances Stops selected instance
Restart Instance Non-Grouped Instances Restarts selected instance
Remove Firewall Tags Non-Grouped Instances Removes all firewall tags from an instance
Recreate Instance in Group Grouped Instances Triggers an instance group to delete and recreate an instance

Amazon Web Services

EC2 Instances

Experiment Target Description
Stop Instance All EC2 Instances Stops selected instance
Restart Instance All EC2 Instances Restarts selected instance
Remove Security Groups All EC2 Instances Removes all assigned security groups
Instance Termination EC2 Instance in ASG only Terminates an instance when it is running in ASG
Deploy Shell Experiment EC2 Instance in ASG only Deploys an experiment from shell experiment suite described below

RDS Instances

Experiment Target Description
Restart Instance All RDS Instances Restarts selected instance
Remove Security Groups All RDS Instances Removes all assigned security groups
Take Snapshot All RDS Instances Takes a snapshot of the DB
Restart Subset of Nodes RDS Cluster only Randomly selects set of nodes and restarts them
Initiate failover RDS Cluster only Initialize failover between nodes

Kubernetes

Experiment Target Description
Delete POD POD Deletes randomly selected pod
Deploy Shell Experiment Container Deploys an experiment from shell experiment suite described below

Pivotal Cloud Foundry

Experiment Target Description
Rescale Application Application Rescales an application to random number of instances
Restage Application Application Redeploys an application
Restart Application Application Restarts all application containers
Restart Instance Container Restarts selected container
Deploy Shell Experiment Container Deploys an experiment from shell experiment suite described below

Shell Experiments Suite

Experiment Target Description
BurnIO EC2, PCF or Kubernetes resource supporting remote command execution Utilize system disk to maximum
CPU Burn EC2, PCF or Kubernetes resource supporting RCE Simulates high CPU usage on all available processing units
DNS Block EC2, PCF or Kubernetes resource supporting RCE Removes all DNS servers from system configuration
Fill Disk EC2, PCF or Kubernetes resource supporting RCE Creates large file on the system root partition
Fork Bomb EC2, PCF or Kubernetes resource supporting RCE Runs endless recursion that corrupts system memory
Memory Consumer EC2, PCF or Kubernetes resource supporting RCE Consumes all free memory
Null Route EC2, PCF or Kubernetes resource supporting RCE Adds an IP table rule that will forward traffic from specific subnet to black hole
Random Generator Starvation EC2, PCF or Kubernetes resource supporting RCE Simulates entropy starvation
Process Termination EC2, PCF or Kubernetes resource supporting RCE Terminates random process