On-call is one of the toughest roles in engineering: it’s where the real problems surface. It’s not just about responding to pages at 3 a.m.; it’s about untangling complex systems under pressure, diagnosing root causes with limited data, and collaborating across teams to keep services running smoothly.
Every incident brings its own challenges:
• Unknown unknowns that require deep dives into unfamiliar codebases.
• High-stakes decisions, where delaying a feature or rolling back a deployment isn’t just technical; it’s political.
• Burnout risk, as you juggle firefighting with long-term projects.
On-call engineers are the unsung heroes who keep the internet running, often in the shadows. Let’s give them the recognition they deserve. 🌟
What’s the hardest challenge you’ve faced on-call?
Resolve AI’s Post
-
If you’re a software engineer, you’ve likely heard of 𝐝𝐞𝐚𝐝𝐥𝐨𝐜𝐤, a situation where two or more processes block each other indefinitely, each waiting for a resource held by another.
𝐇𝐞𝐫𝐞’𝐬 𝐚 𝐪𝐮𝐢𝐜𝐤 𝐛𝐫𝐞𝐚𝐤𝐝𝐨𝐰𝐧:
🔸 𝐃𝐞𝐚𝐝𝐥𝐨𝐜𝐤 𝐂𝐨𝐧𝐝𝐢𝐭𝐢𝐨𝐧𝐬 (𝐂𝐨𝐟𝐟𝐦𝐚𝐧’𝐬 𝐂𝐨𝐧𝐝𝐢𝐭𝐢𝐨𝐧𝐬), all four of which must hold for a deadlock to occur:
𝐌𝐮𝐭𝐮𝐚𝐥 𝐄𝐱𝐜𝐥𝐮𝐬𝐢𝐨𝐧: Resources can’t be shared.
𝐇𝐨𝐥𝐝 𝐚𝐧𝐝 𝐖𝐚𝐢𝐭: Processes hold one resource and wait for another.
𝐍𝐨 𝐏𝐫𝐞𝐞𝐦𝐩𝐭𝐢𝐨𝐧: Resources can’t be forcibly taken.
𝐂𝐢𝐫𝐜𝐮𝐥𝐚𝐫 𝐖𝐚𝐢𝐭: Processes form a closed chain, each waiting on the next.
🔹 𝐄𝐱𝐚𝐦𝐩𝐥𝐞 𝐨𝐟 𝐃𝐞𝐚𝐝𝐥𝐨𝐜𝐤:
𝐓𝐫𝐚𝐧𝐬𝐚𝐜𝐭𝐢𝐨𝐧 𝐀 locks payments and requests orders.
𝐓𝐫𝐚𝐧𝐬𝐚𝐜𝐭𝐢𝐨𝐧 𝐁 locks orders and requests payments.
Both are stuck waiting for the other, causing a circular dependency.
🚀 𝐏𝐫𝐞𝐯𝐞𝐧𝐭𝐢𝐨𝐧 𝐒𝐭𝐫𝐚𝐭𝐞𝐠𝐢𝐞𝐬:
𝐑𝐞𝐪𝐮𝐞𝐬𝐭 𝐫𝐞𝐬𝐨𝐮𝐫𝐜𝐞𝐬 𝐢𝐧 𝐚 𝐟𝐢𝐱𝐞𝐝 𝐨𝐫𝐝𝐞𝐫: Prevent circular wait by enforcing a global lock order.
𝐔𝐬𝐞 𝐭𝐢𝐦𝐞𝐨𝐮𝐭𝐬 𝐟𝐨𝐫 𝐫𝐞𝐬𝐨𝐮𝐫𝐜𝐞 𝐥𝐨𝐜𝐤𝐬: Give up a lock request after a timeout and release anything already held, so the process can retry.
𝐁𝐚𝐧𝐤𝐞𝐫’𝐬 𝐀𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦: Simulate each allocation before granting it, and refuse requests that would lead to an unsafe state.
🛠️ 𝐑𝐞𝐜𝐨𝐯𝐞𝐫𝐲 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬:
𝐒𝐞𝐥𝐞𝐜𝐭 𝐚 𝐯𝐢𝐜𝐭𝐢𝐦: Terminate one process to break the cycle.
𝐑𝐨𝐥𝐥𝐛𝐚𝐜𝐤: Revert a process to a safe state and restart it.
Understanding deadlock and how to handle it is critical for building robust systems. Have you ever encountered a deadlock in your work? Let’s discuss!
Let's connect! 🙂 Follow for more tips and tricks from my tech journey
#SoftwareEngineering #SystemDesign #Concurrency #Deadlock #TechTips
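The fixed-order and timeout strategies from this post can be sketched in Python. This is only an illustration under assumptions: `payments_lock`, `orders_lock`, and the helper functions are hypothetical stand-ins for the two resources in the example, not a database implementation, and sorting by `id()` is just one arbitrary but consistent way to define a global order.

```python
import threading

# Hypothetical stand-ins for the two resources in the example above.
payments_lock = threading.Lock()
orders_lock = threading.Lock()


def in_fixed_order(*locks):
    """Sort locks by a consistent global key (here: id()) before acquiring."""
    return sorted(locks, key=id)


def run_transaction(name, first, second, log):
    # Whatever order the caller asks for, always lock in the global order,
    # so no two threads can each hold one lock while waiting on the other.
    outer, inner = in_fixed_order(first, second)
    with outer:
        with inner:
            log.append(name)


def try_lock(lock, seconds):
    """Timeout strategy: give up instead of waiting forever."""
    return lock.acquire(timeout=seconds)


def demo():
    log = []
    # Transaction A asks for payments then orders; B asks for the reverse.
    a = threading.Thread(target=run_transaction,
                         args=("A", payments_lock, orders_lock, log))
    b = threading.Thread(target=run_transaction,
                         args=("B", orders_lock, payments_lock, log))
    a.start()
    b.start()
    a.join(timeout=5)
    b.join(timeout=5)
    return sorted(log)
```

Without `in_fixed_order`, the two threads could each take their first lock and wait on the other indefinitely. When `try_lock` returns `False`, the caller is expected to release whatever it already holds and retry later.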
-
🎈 Tech Tip of the Day: Strengthening Collaboration Between Security and Software Engineering with SAST 🔒🤝
One of the biggest challenges in software development is the disconnect between security and engineering teams, which often leads to late-stage fixes and inefficiencies. Creating opportunities for collaboration and knowledge sharing between these teams can drive significant improvements in both security and software quality.
📢 Solution: Running automated Static Application Security Testing (SAST) within your CI/CD pipeline fosters collaboration by giving developers feedback on potential security issues with every change. This encourages a more integrated approach, where developers and security experts work together to address vulnerabilities early in the process.
👂 Why it matters: When developers gain hands-on experience with security tools, it reduces the back-and-forth between teams and eliminates the inefficiencies caused by silos. This collaborative approach drives waste out of the development process by preventing last-minute security fixes, improving overall code quality, and delivering better outcomes for the business.
Ready to improve efficiency? Looking to partner with someone who has actually done this work hands-on at scale, not just sold the idea? I’ve been in the trenches, reducing inefficiencies and strengthening development relationships. Let’s bridge the gap between security and engineering today. 📈 https://lnkd.in/g4g6tCMn
👉 How is your team tackling the challenge of security and development silos? Comment below! 👇
#TechTips #DevSecOps #SAST #SoftwareEngineering #SecurityFirst #CollaborationMatters #CI_CD #SoftwareQuality #Observability #OperationalEfficiency #ITConsulting
-
Three Ways to Stand Out as a Support Engineer
1. Master problem-solving skills: showcase your ability to resolve issues efficiently.
2. Develop technical expertise in scripting and automation tools.
3. Build strong communication skills to bridge the gap between tech and non-tech teams.
Which of these strategies are you applying today? Share your thoughts!
#ITSupport #SupportEngineer #TechSupport #HelpDesk #ITCareers
-
Zero Trust development setups - Platform engineering teams enable Zero Trust development setups, where every commit adheres to organisational security best practices. With tools like pre-commit and pre-push hooks, they ensure no sensitive data is committed, and code stays within approved organisational repositories - regardless of the programming language. #platformengineering @ Searce Inc
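As a rough sketch of what such a hook might check, the Python below scans the added lines of a diff for secret-like strings and verifies that the remote belongs to an approved organisation. Everything here is an assumption for illustration: the three patterns are a tiny sample rather than a real scanner's rule set, and `example-org` and the function names are hypothetical.

```python
import re

# Illustrative patterns only; real secret scanners use far larger rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"),
]

# Hypothetical allow-list of remote URL prefixes for the organisation.
APPROVED_REMOTE_PREFIXES = (
    "git@github.com:example-org/",
    "https://github.com/example-org/",
)


def find_secrets(diff_text):
    """Return added lines in a unified diff that match any secret pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines ("+..."), skipping the "+++ b/file" header.
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in SECRET_PATTERNS):
                hits.append(line[1:].strip())
    return hits


def remote_is_approved(remote_url):
    """True when the push target sits under an approved organisation prefix."""
    return remote_url.startswith(APPROVED_REMOTE_PREFIXES)


def check(diff_text, remote_url):
    """Return a list of problems; an empty list means the hook should pass."""
    problems = [f"possible secret: {hit}" for hit in find_secrets(diff_text)]
    if not remote_is_approved(remote_url):
        problems.append(f"remote not approved: {remote_url}")
    return problems
```

In an actual hook, the diff text would typically come from `git diff --cached` and the remote URL from the arguments git passes to a pre-push hook, with the script exiting non-zero whenever `check` returns any problems. Because the checks operate on diff text and URLs, they work regardless of the programming language in the repository.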
-
We’ve all heard about the 10x engineer, but anyone who has worked in a large company knows the 0.1x engineer: the person who somehow manages to do the bare minimum but not get fired. A study from Stanford University casts additional light on this phenomenon. Their research shows that almost 10% of engineers are ‘ghosts’; people who effectively don’t do anything.
Judging software engineering productivity is very hard. Almost all metrics are flawed and at risk of manipulation (see Goodhart's law). But the inverse is very telling: while the difference between one commit a day and two is perhaps style, one commit a week needs looking into.
At MISSION+, we have been experimenting with using LinearB (link in comments) on our projects. Early results have been positive. It is a relatively straightforward visualization of your repositories, along with your task tracking platform, integrated into a dashboard where you can see who is doing what. If ghost employees know that commit velocity is being tracked, it isn’t hard for them to fake this, but it’s still a worthy metric to at least review.
As is often the case, the biggest issue isn’t even the ghost engineer, it’s the impact on overall morale and team cohesion; no-one wants to be on a team that isn’t performing.
One can obviously take this kind of reporting too far, but in many cases it’s as simple as it seems. Some cases may be ghosts. Some cases may be people not following best practices (e.g. huge commits at the end of a sprint). All cases are worth investigating.
Some of my preferred statistics for starting conversations:
1. Only committing a handful of times a month.
2. Only raising a handful of Pull Requests a month (assuming you use PRs).
3. Only a handful of tickets reaching the ‘Closed’ state in a month. This might not be the developer’s fault; it may expose inefficiencies in the overall system.
This is of course in addition to any DORA-style reviews you may be doing, but by taking a monthly snapshot of the above, you might just discover the Ghost in the Machine.
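A monthly snapshot of the first statistic could look something like the Python sketch below. It is not how LinearB works; the function names, the input shape (author and date pairs, parsed from repository history by some other tool), and the threshold of 5 for "a handful" are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def monthly_commit_counts(commits, year, month):
    """Count commits per author for one calendar month.

    `commits` is an iterable of (author, commit_date) pairs.
    """
    return Counter(
        author
        for author, day in commits
        if day.year == year and day.month == month
    )


def needs_conversation(commits, team, year, month, threshold=5):
    """Return team members whose monthly commit count is below `threshold`.

    `team` is passed explicitly so that people with zero commits still appear.
    The default threshold is an arbitrary stand-in for "a handful".
    """
    counts = monthly_commit_counts(commits, year, month)
    return sorted(name for name in team if counts[name] < threshold)
```

The result is a list of names to look into, not a verdict: a low count may point to a ghost, to huge end-of-sprint commits, or to a bottleneck elsewhere in the system.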
-
Ever wondered what it takes to build the perfect software system? As software engineers, we’re on a relentless quest to create robust, efficient, and secure applications. Yet the road is fraught with challenges that continue to test our skills and creativity. From elusive bugs to performance and security issues, many problems remain unsolved or only partially addressed. Here’s a look at some of the most pressing issues we’re tackling today:
🔒 Security Vulnerabilities: Zero-Day Exploits
Zero-day exploits target vulnerabilities that are still unknown to the vendor and the public when attackers discover them, so no patch exists yet. These pose significant risks, making it vital to apply security updates promptly and to use advanced detection systems that can catch suspicious behavior before the flaw is widely exploited.
🔄 Concurrency and Parallelism: Race Conditions and Deadlocks
Handling concurrent access and avoiding deadlocks are core challenges in concurrent programming. Race conditions can lead to unpredictable behavior, while deadlocks leave processes stuck waiting for resources.
🧠 Memory Management: Memory Leaks
Memory leaks occur when allocated memory isn’t properly released, leading to increased memory usage and potential crashes. Detecting and managing these leaks, especially in languages without automatic garbage collection, is crucial for maintaining application performance and stability.
🛠️ Error Handling and Debugging: Dynamic Errors
Dynamic errors manifest only under specific conditions or in production environments, making them difficult to reproduce and debug. Extensive logging, monitoring tools, and thorough testing across varied scenarios help in tracking them down.
⚙️ Performance Optimization: Algorithm Complexity
Algorithm complexity determines how an application’s performance scales. Choosing the right algorithms and optimizing existing code are essential for improving execution speed and reducing resource consumption.
📉 Code Maintainability: Technical Debt
Technical debt is the cost of quick, temporary solutions that lead to future rework. Over time, it makes the codebase harder to maintain and evolve. Regular refactoring, adherence to best practices, and explicitly prioritizing debt repayment are key to managing and reducing its impact.
#SoftwareDevelopment #TechChallenges #Programming
-
🚀 Empowering Network Engineers: 🔑 The Key to Successful Network Automation
I believe that the real success of network automation lies not only in the technology itself but, more importantly, in the empowerment of network engineers. Adopting NetDevOps principles, fostering autonomy, and building expertise in automation tools are essential steps to ensure sustainable and efficient operations.
The Critical Mistake to Avoid
Relying solely on software developers to embed network logic into code can be detrimental, as it effectively takes control away from network engineers. Here’s why this matters:
• 🛠️ Proximity to the Network: Network engineers understand the intricacies, performance requirements, and constraints of the network far better than external developers.
• 🔑 Flexibility and Ownership: Engineers trained in NetDevOps (Ansible, Jinja, Python, Git, CI/CD, Terraform, Sources of Truth, orchestrators, etc.) can adapt and evolve automation solutions as business needs change.
• 🤝 Collaboration, Not Dependency: Relying entirely on developers for automation creates bottlenecks and risks. Fostering cross-disciplinary collaboration instead ensures both agility and sustainability.
A Vision for the Future
By empowering network engineers with the right skills and tools, organizations can build automation solutions that are:
✅ Efficient
✅ Maintainable
✅ Scalable
All without losing sight of the network’s core needs.
Let’s embrace a future where network engineers are NetDevOps leaders 🦸‍♂️, not just automation users 🙎‍♂️.
💬 What are your thoughts on this approach? Are you investing in the right skills to drive network automation forward?
#NetworkAutomationForum #AutoCon2 #NetworkAutomation #NetDevOps #NetworkEngineering #Collaboration #Automation
-
🚨 When a Production Issue Meets Junior Engineer Optimism 🚨
Tech Lead: "Hey, did you see the alert? Production's down."
Junior Engineer: "Oh yeah, I saw it! I'm pretty sure it’s just a caching issue. Five minutes, tops!"
Tech Lead: "Hmm, we've lost half our user sessions."
Junior Engineer: "No worries! A quick server reboot, maybe clear some cookies... it'll be fine."
Tech Lead: "Our servers don’t use cookies."
Junior Engineer: "...Right. So... just... turn it off and on again?"
Tech Lead: "We're past that. We need a root cause analysis."
Junior Engineer: "Great! I'll just... check... all the logs. Like, every log. Every line. This should take... five minutes?"
Tech Lead: "Welcome to production support. We live here now."
Junior Engineer: *[20 minutes later]* "Do... do we actually sleep here? Asking for a friend."
---
Ah, the optimism of new engineers. 😅 But hey, we’ve all been there! It's these little moments that make tech life interesting... and sometimes terrifying.
#LifeInTech #ProductionSupport #OnCallLife #EngineeringHumor #JoysOfDebugging
-
𝗧𝘆𝗽𝗲 𝗔 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀: Identify bugs and promptly inform the manager, who then assigns the task to another engineer.
𝗧𝘆𝗽𝗲 𝗕 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀: Report bugs to the manager and request to be assigned to fix them at a later time.
𝗧𝘆𝗽𝗲 𝗖 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀: Identify bugs, resolve them, and promptly notify the manager with comprehensive details about the solution.
It's widely preferred that all engineers exhibit Type C behavior. Agree?
#linkedin #linkedinpost #linkedinmotivation #dailymotivation #softwareengineers #engineers #mondaythoughts #mondaypost #work #worklife #softwareengineer
-
This is a great article explaining why it is so harmful to interrupt developers with countless meetings. I think that making large blocks of time "focus time", and more importantly ensuring that it is honoured, is really important. Obviously, aligning teams & stakeholders so that their blocks of focus time are more or less aligned goes a long way to helping with this. https://lnkd.in/ezQkkvrP