Building Reliability One Step at a Time
Building reliability into our organizations allows us to keep giving our customers a great user experience when they need it most. Many companies measure their reliability in nines of availability, or system uptime percentage. Site Reliability Engineering has taught us some of the practices we can adopt to reach our availability goals, but we have to keep in mind that reaching those goals will be a learning experience and that we must be proactive about responding to failure. In this talk, Ana will share how she has used Chaos Engineering since 2016 to learn more about the systems she has worked on, and how the practice can be used to decouple our systems' weak points, learn from incidents, and improve monitoring and observability.
Speaker
Ana Margarita Medina
Senior Software Engineer @Gremlin
Ana Margarita is currently working as a Senior Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. Before Gremlin, she worked at companies of various sizes, including Google, Uber, SFEFCU, and a Miami-based startup. Ana is an internationally recognized speaker and has spoken at AWS re:Invent, KubeCon, DockerCon, DevOpsDays, AllDayDevOps, Write/Speak/Code, and many other events. Catch her tweeting at @Ana_M_Medina about traveling, diversity in tech, and mental health.
Sponsors
Learn more about the organizations that joined us on this journey

LaunchDarkly is a feature management platform that empowers all teams to safely deliver and control software through feature flags.

Moogsoft is the AI-driven observability leader that provides intelligent monitoring solutions for smart DevOps. Moogsoft delivers the most advanced cloud-native, self-service platform for software engineers, developers, and operators to instantly see everything, know what’s wrong, and fix things faster.