Building for Resilience Is Chaos Engineering

Businesses are increasingly adopting cloud-native deployments to increase developer velocity. The CNCF 2021 annual survey stated, “Kubernetes has crossed the adoption chasm to become a mainstream global technology.” According to the survey’s respondents, 96% of organizations are either using or evaluating Kubernetes. This rapid adoption has created significant complexity and revealed the inadequacy of traditional systems testing.

Chaos engineering has emerged as a new testing discipline and a means to transform the reliability of cloud-native services. According to Gartner, “40% of organizations will implement chaos engineering practices as part of DevOps initiatives by 2023, reducing unplanned downtime by 20%.” Many of the organizations considered early adopters of chaos engineering are still running experiments manually, on a few applications, in a pre-production environment. Very few automate the practice throughout the software delivery lifecycle (SDLC), owing to its complexity and a lack of industry maturity.

In this talk, Matt will discuss how chaos engineering can build resilience into software and how developers, QA engineers, and SREs can work together to improve software reliability.