Reflections on Incidents & Resilience with Nick Rockwell #42

From The Engineering Leadership Podcast



Length:
42 minutes
Released:
May 25, 2021
Format:
Podcast episode

Description

Nick Rockwell, SVP of Engineering & Infrastructure @ Fastly, shares his recent reflections on incidents, resiliency, blamelessness, and accountability. You’ll hear why the heroic model of incident response is unsustainable, how to improve reliability by closing the long feedback loop, plus opportunities to maximize post-mortems for process improvement AND emotional processing.

"We started doing a biweekly meeting. We talk about resilience. We revisit everything that has not been closed, whether it's a year old, or it's a day old, we're forced to keep coming back to it. So how to move away from that incident based post-mortem to something that's more like a continual revisiting of every thread or pathway that's been opened until they're not even open anymore. So that's the lines I'm thinking along."

ABOUT NICK ROCKWELL

Nick Rockwell is SVP of Engineering & Infrastructure @ Fastly, helping build the next-generation edge infrastructure for a faster, safer, more resilient Internet. Nick was formerly Chief Technology Officer at The New York Times, overseeing product engineering, infrastructure and R&D. Previously he was Chief Technology Officer of Conde Nast, and Digital CTO at MTV Networks. Throughout his career, Nick has worked at the intersection of media and the Internet, building digital products at scale. Nick graduated from Yale in 1990 with a B.A. in Literary Theory.

SHOWNOTES

Nick’s story of why incidents, resiliency, accountability & blamelessness are top of mind (2:20)
The “heroic model” of incident mitigation and its emotional impact (6:41)
Building a resilient system & transitioning away from heroics to a more mechanistic incident management model (12:12)
“The long feedback loop” of incidents (15:57)
Grappling with the risks of a more process-driven, mechanistic model of incident management (21:27)
Dedicated vs. distributed incident response teams & how incident management evolves over time (24:43)
Balancing individual accountability and a culture of blamelessness (28:37)
Why you need to talk about incidents and process their residual emotions (33:12)
On maximizing post-mortems for process improvement & emotional processing (37:01)
Takeaways (40:15)

Special thanks to our exclusive accessibility partner Mesmer! Mesmer's AI-bots automate mobile app accessibility testing to ensure your app is always accessible to everybody. To jump-start your accessibility and inclusion initiative, visit mesmerhq.com/ELC

Titles in the series (100)

We share the most critical perspectives, habits & examples of great software engineering leaders to help evolve leadership in the tech industry. Join our community of software engineering leaders @ www.sfelc.com!