Norberto Lopes’ Post

VP of Engineering at incident.io 🔥

CrowdStrike’s outage has a lot to unpack for anyone deploying software to production. Lawrence Jones was kind enough to join me to dive into their post-incident review and discuss:

📢 Communications
When it comes to incident comms, this is a perfect storm: a huge audience extending far beyond your direct customer base, in the news all day long, and a public company with the complexities this brings. Critique of CrowdStrike’s comms was fierce, but what does good actually look like here?

📦 Parallel deployment processes
CrowdStrike’s software is installed as a ‘Falcon sensor’ with a well-established gradual rollout process, from dogfooding to canarying and more. So what happened here? Well, it turns out the sensor is both code and config, and config is rolled out through a totally different process. This might sound far removed from your average web app… unless you remember those pesky database migrations.

💆 Handling incidents outside of your control
Incidents where the root cause is a third party, or factors outside your immediate control, can be stressful. It’s horrible sitting there, unable to improve things, waiting to find out what comes next. You can do things, though. And it’s important that you do, as working through contingencies can get you ahead of the worst-case scenarios.

If this sounds like content you’d enjoy, check the comments for a link to the podcast.

#crowdstrike #incidentresponse #incidentmanagement #communication
