Fernando Villalba’s Post

Fernando Villalba

Principal Engineer @ Civo

Engineer 1: “The (Java) application is leaking memory and crashing every few days, we need to fix that.”

Engineer 2: “Don’t worry about memory, it’s cheap, we can always buy more.”

This was a real exchange between two developers at a place I worked. The app in question always ballooned in memory usage and got killed by OOM every few days, causing downtime. It never got fixed; engineers and management kept pushing features instead.

This is one reason why it’s important to set Service Level Objectives (SLOs): if you breach them, you have to fix what’s causing the breach, and you can’t just keep pushing new features. In other words, SLOs can help give you time to address technical debt.
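One common way to make the "breach the SLO, stop shipping features" rule concrete is an error budget, a term the post itself does not use. The sketch below is only an illustration under that assumption: the function names, the availability-style SLO, and measuring in minutes are all hypothetical choices, not anything described in the original exchange.

```python
def error_budget_remaining(slo_target: float,
                           total_minutes: float,
                           downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    slo_target is an availability objective such as 0.999 (hypothetical value).
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    # Downtime the SLO tolerates over the measurement window.
    allowed_downtime = (1.0 - slo_target) * total_minutes
    if allowed_downtime <= 0:
        raise ValueError("slo_target must be below 1.0 to leave any budget")
    return 1.0 - downtime_minutes / allowed_downtime


def should_freeze_features(slo_target: float,
                           total_minutes: float,
                           downtime_minutes: float) -> bool:
    """True once the budget is spent, i.e. reliability work takes priority."""
    return error_budget_remaining(slo_target, total_minutes, downtime_minutes) <= 0.0


if __name__ == "__main__":
    # A 30-day window has 43,200 minutes; a 99.9% target tolerates about 43.2 of them.
    window = 30 * 24 * 60
    print(should_freeze_features(0.999, window, downtime_minutes=60))
```

With numbers like these, an app that is OOM-killed every few days and takes several minutes to recover each time would plausibly exhaust the budget well before the window ends, which is the point at which the SLO gives engineers a mandate to fix the leak.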

Nahum Litvin

DevOps Engineer @ Velo Platform Group @ Wix

6mo

DevOps 1: "Let's add a memory check to the liveness probe that will gracefully close the app before it can crash. This will resolve the immediate issue, and the OOM will become tech debt." DevOps 2: "Let them suffer, or they won't fix the leak until the app crashes every other minute."
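The memory check suggested by "DevOps 1" could look roughly like the sketch below. It is an assumption-laden illustration, not the commenter's implementation: the function names and the 90% threshold are hypothetical, and the logic is written in Python for brevity even though the app in question is Java. In a Java service the two inputs would typically come from `Runtime.getRuntime()` (`totalMemory() - freeMemory()` for used heap, `maxMemory()` for the ceiling).

```python
def heap_usage_ratio(used_bytes: int, max_bytes: int) -> float:
    """Share of the maximum heap currently in use."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes / max_bytes


def liveness_status(used_bytes: int, max_bytes: int,
                    threshold: float = 0.9) -> int:
    """HTTP status a liveness endpoint could return.

    503 signals "restart me" once usage reaches the (hypothetical) threshold;
    200 means the process is still healthy.
    """
    if heap_usage_ratio(used_bytes, max_bytes) >= threshold:
        return 503
    return 200


if __name__ == "__main__":
    print(liveness_status(used_bytes=950, max_bytes=1000))
    print(liveness_status(used_bytes=500, max_bytes=1000))
```

For an HTTP liveness probe in Kubernetes, a status outside the 200 to 399 range counts as a failure, and repeated failures make the kubelet restart the container. Whether that restart is actually graceful depends on the app handling the termination signal, and, as "DevOps 2" points out, it hides the leak rather than fixing it.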

Sean R Turner

Chief Information Security Officer at Twinstake, SME and Climate angel investor, dad of four, hairy car nut (hairy me, not hairy cars). Superman doing everything :-P

6mo

cron up a restart 😂

Engineer 1: “The (java) application is leaking memory and crashing every few days, we need to fix that” Engineer 2: "Let's rewrite it in Rust instead" 😁

Suresh Kumar Khemka

Head of Engineering @Atlassian | Platform Engineering | SRE | DevOps | DevEx | Cloud | Performance Engineering

6mo

If the engineer is not the one responding to alerts and fixing the app when it goes down, even with just a manual restart, then it's more of a cultural issue: in this kind of scenario, even if you have SLOs defined and tracked, they will be ignored. By the way, any smart operator in this case will just set up a job to restart the application once a day or something like that. 😁

As always, it depends. How much engineering time would it take to resolve the problem? A day? A week? A month? How much RAM could you buy for that cost? How many features would get pushed back? How much revenue would those features drive? Engineering solutions rarely exist in isolation: they drive value or they don't matter.

Gratus D.

Software Engineering | Futurism

6mo

SLOs really help with this type of thing. To an engineer, a memory leak seems like a "stop what you are doing and solve this first" type of problem, but to a non-engineering business owner it might be indistinguishable from just an "opinion" on code design. SLOs help align that language around the common need to serve our customers. If x amount of data loss (i.e. everything in memory when the app crashed) and y amount of downtime are perfectly fine, then from a non-technical perspective why would you want to fix it? But once SLOs are established, in 2024 when everything is online, I can't imagine someone not prioritizing this (in 2004, even with SLOs, I can see a lot of companies accepting that type of downtime).

Kelvin Meeks

Consulting Architect/CTO - Leadership in Enterprise Architecture and Software Engineering Innovation (US Army Veteran)

6mo

NFRs are a goodness. (this Wikipedia page is a good source for ideas...) https://en.wikipedia.org/wiki/Non-functional_requirement

Sebastian Bergmann

Created PHPUnit. Co-Founded thePHPcc. Helps developers build better software.

6mo

In situations like that I am always thankful that I learned programming (not software development, mind you) in a time and on a platform where I was happy for every byte I was able to shave off.

Raja Nagendra Kumar

Chief Code Doctor - Delivering Clean never exhausts #CodeDoctors

6mo

#NFRs expose Engineering Skills. Ask for funds & time to build an AI-based restart to optimally reset memory... maybe everyone will see that as a feature to be built.
