You are viewing content from a past/completed QCon - March 2021.

Building Reliability One Step at a Time

Building reliability in our organizations allows us to continue giving our customers a great user experience when they need it most. Many companies measure their reliability in nines of availability or as a system uptime percentage. Site Reliability Engineering has taught us some of the practices we can adopt to reach our availability goals, but we have to keep in mind that reaching those goals will be a learning experience and that we have to be proactive about responding to failure. In this talk, Ana will share how she has been using Chaos Engineering since 2016 to learn more about the systems she has worked on, and how this practice can be used to decouple our system’s weak points, learn from incidents, and improve monitoring and observability.


Ana Margarita Medina

Senior Software Engineer @Gremlin

Ana Margarita is currently working as a Senior Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. Before Gremlin, she worked at companies of various sizes, including Google, Uber, SFEFCU, and a Miami-based startup. Ana is an internationally recognized speaker and has spoken at AWS re:Invent, KubeCon, DockerCon, DevOpsDays, AllDayDevOps, Write/Speak/Code, and many others. Catch her tweeting at @Ana_M_Medina about traveling, diversity in tech, and mental health.

