Continuous Lifecycle 2014 » Agenda »
// Postmortems and Human Factors
Our daily work takes place in a myriad of systems composed of software, hardware and humans. And everybody who has worked with complex systems at any scale knows: failure is not an option, it's inevitable.
At Etsy we embrace the fact that failures happen and that the only way to understand how an accident happened is to investigate it without blaming the humans involved. This is why we hold a blameless postmortem for every outage. It is an open meeting: everybody is invited to join and find out what happened and how we can make the system safer.
This talk will explain how postmortems at Etsy are conducted and how we maintain and scale the process as the team grows and new people start. It will go over the tools we built and use to make postmortems efficient and to share the learnings from each one with everyone in the company.
// Speaker
Daniel Schauenberg (@mrtazz) is a Senior Software Engineer on Etsy's infrastructure and development tools team. Automation, documentation and simplicity are his usual tools for improving the status quo. He previously worked in systems and network administration, on connecting chemical plants to IT systems, and as an embedded systems networking engineer. Things he thoroughly enjoys when not writing code include coffee, breakfast, TV shows and basketball.