Finding a Common Language for Incidents with John Allspaw

From Screaming in the Cloud


Length: 32 minutes
Released: Aug 17, 2021
Format: Podcast episode

Description

About John

John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010), as well as the foreword to “The DevOps Handbook.” His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation,” helped start the DevOps movement. John served as CTO at Etsy and holds an MSc in Human Factors and Systems Safety from Lund University.

Links:

The Art of Capacity Planning: https://www.amazon.com/Art-Capacity-Planning-Scaling-Resources/dp/1491939206/


Web Operations: https://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/


The DevOps Handbook: https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations/dp/1942788002/

Adaptive Capacity Labs: https://www.adaptivecapacitylabs.com

John Allspaw Twitter: https://twitter.com/allspaw

Richard Cook Twitter: https://twitter.com/ri_cook

Dave Woods Twitter: https://twitter.com/ddwoods2

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at the Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Cribl Logstream. Cribl Logstream is an observability pipeline that lets you collect, reduce, transform, and route machine data from anywhere, to anywhere. Simple, right? As a nice bonus, it not only helps you improve visibility into what the hell is going on, but also helps you save money almost by accident. Kind of like not putting a whole bunch of vowels and other letters that would be easier to spell in a company name. To learn more, visit cribl.io.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. I’m joined this week by John Allspaw, who’s, well, he’s done a lot of things. He was one of the founders of the DevOps movement (although I’m sure someone’s going to argue with that); he’s also written a couple of books, The Art of Capacity Planning and Web Operations, and the foreword of The DevOps Handbook. But he’s also been the CTO at Etsy and has gotten his Master’s in Human Factors and System Safety from Lund University before it was the cool thing to do. And these days, he is the founder and principal at Adaptive Capacity Labs. John, thanks for joining me.

Corey: And now for something completely different!

John: Thanks for having me. I’m excited to talk with you, Corey.

Corey: So, let’s start at the beginning here. So, what is Adaptive Capacity Labs? It sounds like an experiment in auto-scaling, as is every use of auto-scaling, but that’s neither here nor there. I’m guessing it goes deeper.

John: Yeah. So, I managed to trick, or let’s say convince, some of my heroes, Dr. Richard Cook and Dr. David Woods. These folks are what you would call heavies in the human factors, system safety, and resilience engineering world; Dave Woods is credited with creating the field of resilience engineering. And so what we’ve been doing for the past... since I left Etsy, is bringing perspectives, techniques, approaches to the software world that are, I guess, some of the most progressive practices that saved other safety-critical domains, like aviation, and power plants, and all of the stuff that makes news.

And the way we’ve been doing that is largely through the lens of incidents. And so we do a whole bunch of different things, but the core of what we do is activities and projects for clients that have a concern around incidents; both, are we learning well? Can you tell us that? Or can you tell us how to understand incidents and analyze them in such a way that we can learn from them effectively?

Corey: Generally speaking, my naive guess, based upon the times I spent working in various operations

Titles in the series (100)

Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.