Tag incident-management
Lessons learned from modernising a lesser maintained (Spring Boot) service (16 mins read).
What I learned from taking ownership of a lesser maintained service and bringing it up to a better standard.
Post details
my favorite thing about IR is sitting in a call all day bc of a big incident and you cant leave your chair or eat food or go to the bathroom bc if you leave you will miss something and if you miss something you might get blamed for why something burned down so you just sit there
𝗛𝟯𝗞𝗧l𝗖 (@H3KTlC) Tue, 19 Apr 2022 16:37 +0000
When you experience your first production outage
Molly Struve (@molly_struve) Thu, 14 Apr 2022 19:00 +0000
Hot take: Anyone dunking on Atlassian about the fact that an outage like this could happen should not be trusted near production environments because they're either lying or don't have the experience to know what they're talking about.
Mark Imbriaco (@markimbriaco) Fri, 15 Apr 2022 13:00 +0000
my magnum opus
Courtney Wang (@CKWang) Wed, 02 Mar 2022 00:27 GMT
This week we're joined by Nora Jones, founder and CEO at Jeli, where they help teams gain insight and learnings from incidents. Back in December Nora shared her thoughts in a Changelog post titled "Incident" shouldn't be a four-letter word - which got a lot of attention from our readers. Today we're talking with Nora a...
Use (End-to-End) Tracing or Correlation IDs (4 mins read).
Why you should be requesting, and logging, a unique identifier per request for better supportability.
I'm starting to see incidents as essential for knowledge sharing. If you're not experiencing any, it then makes sense to periodically introduce controlled incidents to learn about your infrastructure and how it behaves. Note: Hardly an original thought/realisation.
Lou (@loujaybee) Mon, 02 Aug 2021 12:41 +0000
This is a great post by Shubheksha about the right way to talk about production issues.
Having a blameless culture makes it easier for new/junior engineers to get started with working on production systems, and makes everyone more comfortable working on things, knowing that the blame won't be pointed at them.
I've found that, at work, diagnosing issues in our staging environment has been a great experience. It lets me practise dealing with production-like issues in a non-production environment, which gives me time to breathe, experiment and learn, and a much greater understanding of the end-to-end system.
You're currently viewing page 1 of 1, of 16 posts.