Discover this podcast and so much more

Podcasts are free to enjoy without a subscription. We also offer ebooks, audiobooks, and so much more for just $11.99/month.

The Kinesis Outage

The Kinesis Outage

FromAWS Morning Brief


The Kinesis Outage

FromAWS Morning Brief

ratings:
Length:
28 minutes
Released:
Dec 11, 2020
Format:
Podcast episode

Description

Links
Follow Last Week In AWS on Twitter

AWS Outage Message

"Kinesis Outage" by Ryan Frantz
TranscriptCorey: This episode is sponsored in part by our friends at Linode. You might be familiar with Linode; they’ve been around for almost 20 years. They offer Cloud in a way that makes sense rather than a way that is actively ridiculous by trying to throw everything at a wall and see what sticks. Their pricing winds up being a lot more transparent—not to mention lower—their performance kicks the crap out of most other things in this space, and—my personal favorite—whenever you call them for support, you’ll get a human who’s empowered to fix whatever it is that’s giving you trouble. Visit linode.com/screaminginthecloud to learn more, and get $100 in credit to kick the tires. That’s linode.com/screaminginthecloud.Pete: Hello, everyone. Welcome to the AWS Morning Brief. It's Pete Cheslock again—Jesse: And Jesse DeRose.Pete: We are back to talk about ‘The Kinesis Outage.’Jesse: [singing] bom bom bum.Pete: So, at this point, as you're listening to this, it's been a couple of weeks since the Kinesis outage has happened, and I'm sure there are many, many armchair sysadmins out there speculating at all the reasons why Amazon should not have had this outage. And guess what? You have two more system administrators here to armchair quarterback this as well.Jesse: We are happy to discuss what happened, why it happened. I will try to put on my best announcer voice, but I think I normally fall more into the golf announcer voice than the football announcer voice, so I'm not really sure if that's going to play as well into our story here.Pete: It's going, it's going, it's gone.Jesse: It’s—and it's just down. It's down—Pete: It's just—Jesse: —and it's gone.Pete: No, but seriously, we're not critiquing it. That is not the purpose of this talk today. We're not critiquing the outage because you should never critique other people's outages; never throw shade at another person's outage. That's not only crazy to do because you have no context into their world. It's just, it's not nice either, so just try to be nice out there.Jesse: Yeah, nobody wants to get critiqued when their company has an outage and when they're under pressure to fix something. So, we're not here to do that. We don't want to point any fingers. We're not blaming anyone. We just want to talk about what happened because honestly, it's a fascinating, complex conversation.Pete: It is so fascinating and honestly, loved the detail, a far cry from the early years of Amazon outages that were just, “We had a small percentage of instances have some issues.” This was very detailed. This gave out a lot of information. And the other thing too is that, when it comes to critiquing outages, you have to imagine that there are unlikely to be more than a handful of people even inside Amazon Web Services that fully understand the scope of the size and the interactions of all these different services. There may not even be a single person who truly understands how these dozens of services interact with each other. I mean, it takes teams and teams of people working together to build these things and to have these understandings. So, that being said, let's dive in. So, the Wednesday before Thanksgiving, Kinesis decided to take off early. You know, long weekend coming up, right? But really, what happened was is that there was an addition of capacity to Kinesis, and it caused it to hit an operating system limit causing an outage. But interestingly enough—and what we'll talk about today—are the interesting and downstream effects that occurred via CloudWatch, Cognito, even the status page, and the Personal Health Dashboard. I mean, that's a really interesting contributing factor or a correlating outage. I don't know the words here, but it's interesting to hear that both CloudWatch goes down and the Personal Health Dashboard goes down.Jesse: That's when somebody from the product side says, “Oh, that's a featur
Released:
Dec 11, 2020
Format:
Podcast episode

Titles in the series (100)

The latest in AWS news, sprinkled with snark. Posts about AWS come out over sixty times a day. We filter through it all to find the hidden gems, the community contributions--the stuff worth hearing about! Then we summarize it with snark and share it with you--minus the nonsense.