Discover this podcast and so much more

Podcasts are free to enjoy without a subscription. We also offer ebooks, audiobooks, and so much more for just $11.99/month.

Whiteboard Confessional: Click Here to Break Production

Whiteboard Confessional: Click Here to Break Production

FromAWS Morning Brief


Whiteboard Confessional: Click Here to Break Production

FromAWS Morning Brief

ratings:
Length:
10 minutes
Released:
May 8, 2020
Format:
Podcast episode

Description

About Corey QuinnOver the course of my career, I’ve worn many different hats in the tech world: systems administrator, systems engineer, director of technical operations, and director of DevOps, to name a few. Today, I’m a cloud economist at The Duckbill Group, the author of the weekly Last Week in AWS newsletter, and the host of two podcasts: Screaming in the Cloud and, you guessed it, AWS Morning Brief, which you’re about to listen to.Links
CHAOSSEARCH
@QuinnyPig
TranscriptCorey: Welcome to AWS Morning Brief: Whiteboard Confessional. I’m Cloud Economist Corey Quinn. This weekly show exposes the semi-polite lie that is whiteboard architecture diagrams. You see, a child can draw a whiteboard architecture, but the real world is a mess. We discuss the hilariously bad decisions that make it into shipping products, the unfortunate hacks the real-world forces us to build, and that the best to call your staging environment is “theory”. Because invariably whatever you’ve built works in the theory, but not in production. Let’s get to it.On this show, I talk an awful lot about architectural patterns that are horrifying. Let’s instead talk for a moment about something that isn’t horrifying. CHAOSSEARCH. Architecturally, they do things right. They provide a log analytics solution that separates out your storage from your compute. The data lives inside of your S3 buckets, and you can access it using APIs you’ve come to know and tolerate, through a series of containers that live next to that S3 storage. Rather than replicating massive clusters that you have to care and feed for yourself, instead, you now get to focus on just storing data, treating it like you normally would other S3 data and not replicating it, storing it on expensive disks in triplicate, and fundamentally not having to deal with the pains of running other log analytics infrastructure. Check them out today at CHAOSSEARCH.io.Today on the AWS Morning Brief: Whiteboard Confessional, I'm telling a different story than I normally do. Specifically, this is the tale of an outage from several weeks ago. The person who shared this story with me has requested to remain anonymous and further wishes me to not mention their company at all. This is, incidentally, a common occurrence. Folks don't generally want to jeopardize their relationship with AWS by disclosing a service issue they see, whereas I don't have that particular self-preservation instinct. Then again, I'm not a big AWS customer myself. I'm not contractually bound to AWS in any meaningful way, and I'm not an AWS partner, nor am I an AWS Hero. So, all that AWS really has over me in terms of leverage is the empty threat of taking away my birthday. So, let's dive into this anonymous story. It's a good one. A company was minding its own business, and then had a severity one incident. For those who aren't familiar with that particular designation, you can think of that as being the company's primary service fell over in an embarrassingly public way. Customers noticed, and everyone runs around screaming a whole lot. Now, if we skip past the delightful hair-on-fire diagnosis work, the behavior that was eventually tracked down was that an SNS topic had a critical listener get unsubscribed. That SNS topic invoked said listener, which in turn drove a critical webhook call via API gateway. This is a bad thing, obviously. Fundamentally, customers stopped receiving webhooks that they were expecting, and this caused a nuclear meltdown given the nature of what the company does, which I can't disclose and isn't particularly relevant anyway. But, for those who are not up to date on the latest AWS terminology, service names, and parlance, what this means at a high level is that a thing happens inside of AWS, and whenever that thing happens, it's supposed to fire off an event that notifies this company's paying customers. This broke because something somewhere unsubscribed the firing off dingus from the notification system. Now that we're aw
Released:
May 8, 2020
Format:
Podcast episode

Titles in the series (100)

The latest in AWS news, sprinkled with snark. Posts about AWS come out over sixty times a day. We filter through it all to find the hidden gems, the community contributions--the stuff worth hearing about! Then we summarize it with snark and share it with you--minus the nonsense.