12 Steps to Technical Troubleshooting for Leaders and Problem Solvers

These are lessons learned from observing some of the masters I have worked with, and from my own hands-on software engineering experience. Whether you are a leader or a problem solver, I hope this helps. Feedback and suggestions for improving these steps are always welcome!

1.     Separate the symptoms from the root cause

  • Don’t spend your time treating the symptom – if that is all you do before going back to bed at 3 AM, you will most likely be woken up again at 5 AM by the same problem
  • Symptoms can often be misleading – so, pay attention to the details
  • Just rebooting an application or a server may provide only temporary relief; the root cause will come back to bite you again!

2.     Start with errors: Review the pop-up messages and the clues in the logs

  • Follow the breadcrumbs the system leaves you – be sure to look for errors in related processes as well
  • On the other hand, if you write software, leave enough breadcrumbs for others to follow during system failures (see Step 12)
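
As a rough illustration of following the breadcrumbs across related processes, here is a minimal Python sketch. It assumes log lines shaped like "2024-05-01 03:02:11 ERROR message"; that format, the process names and the `collect_errors` helper are all hypothetical, so adjust them to whatever your systems actually write.

```python
import re
from datetime import datetime

# Assumed (hypothetical) log line shape: "2024-05-01 03:02:11 ERROR message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def collect_errors(logs, levels=("ERROR", "FATAL")):
    """Return (timestamp, process, message) tuples for error lines, oldest first.

    `logs` maps a process name to the text of its log.
    """
    found = []
    for process, text in logs.items():
        for line in text.splitlines():
            match = LINE_RE.match(line.strip())
            if match is None:
                continue  # skip lines that do not fit the assumed format
            stamp, level, message = match.groups()
            if level in levels:
                when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
                found.append((when, process, message))
    return sorted(found)


# Hypothetical sample: the process that paged you, plus a related one.
sample_logs = {
    "billing": "2024-05-01 03:02:11 ERROR upstream feed missing\n"
               "2024-05-01 03:02:12 INFO retrying",
    "feed-loader": "2024-05-01 02:58:40 ERROR disk quota exceeded",
}

for when, process, message in collect_errors(sample_logs):
    print(when, process, message)
```

Putting the errors from several processes on one timeline makes it easier to see which failure came first, instead of stopping at the one that woke you up.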

3.     Review documentation of similar past failures

  • No need to reinvent the wheel; there may have been trailblazers before you
  • On the other hand, create enough documentation of your own to help others in the future (see Step 12)

4.     Time constraints: Back to business vs. chasing the root cause

  • Make a quick decision as to whether time is on your side to (finally) chase the root cause of an elusive problem,
  • Or whether you need to bring the system back up quickly, e.g., because a lot of system users are waiting for you to fix the problem. In that case, you may want to rule out an environmental or timing problem by simply restarting the process (be sure not to fall into the trap in Step 1)

5.     Communicate, communicate, communicate

  • During system downtime, a lot of folks want frequent updates on the progress being made
  • Designate someone to handle the status updates, so you can concentrate on the resolution, or, if you are handling it yourself, don’t forget to provide periodic updates

6.     Focus on the latest error - the main point of failure

  • Home in on the immediate failure, understand it, and then expand your research outward
  • Was there a recent bump in the volume of transactions? Was there a problem due to additional scanning implemented by your DataSec team? Was there an OS upgrade by your SysAdmin team? Was there a database upgrade? Was there an operational change, with new system operators working on the system? And so on.
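
If your teams keep any kind of change record, the questions above can be turned into a quick filter. The sketch below is only an assumption about how such a record might look: a list of (timestamp, team, description) tuples, with a hypothetical `recent_changes` helper that keeps the entries that landed shortly before the failure.

```python
from datetime import datetime, timedelta


def recent_changes(changes, failure_time, window=timedelta(days=2)):
    """Return the changes that landed within `window` before the failure, newest first.

    `changes` is a list of (timestamp, team, description) tuples.
    """
    start = failure_time - window
    hits = [c for c in changes if start <= c[0] <= failure_time]
    return sorted(hits, reverse=True)


# Hypothetical change record for illustration only.
change_log = [
    (datetime(2024, 4, 20, 22, 0), "DBA", "database minor version upgrade"),
    (datetime(2024, 4, 30, 1, 0), "SysAdmin", "OS patch on app servers"),
    (datetime(2024, 4, 30, 18, 30), "DataSec", "additional file scanning enabled"),
]
failure = datetime(2024, 5, 1, 3, 2)

for when, team, description in recent_changes(change_log, failure):
    print(when, team, description)
```

The two-day window is an arbitrary default; widen it when the failing path only runs weekly or monthly.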

7.     Look at the process that precedes the failed process

  • Watch out! Sometimes the clues are buried in the process that preceded the failed one, e.g., was a change migrated to the preceding application that is now causing the failure in your (subsequent) process?

8.     Compare the failed path against a recently successful one

  • The key to identifying the root cause sometimes lies in relative troubleshooting – compare and contrast
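
One way to make compare and contrast concrete is to capture the same attributes for a good run and a bad run, then list only what differs. This is a minimal sketch; the attribute names and the `diff_runs` helper are made up for illustration.

```python
def diff_runs(good, bad):
    """Return {attribute: (good_value, bad_value)} for every attribute that differs.

    An attribute missing from one run is reported with None on that side.
    """
    missing = object()  # sentinel, so a missing key never equals a real value
    keys = sorted(set(good) | set(bad))
    return {
        key: (good.get(key), bad.get(key))
        for key in keys
        if good.get(key, missing) != bad.get(key, missing)
    }


# Hypothetical attributes captured from a successful and a failed run.
last_good = {"app_version": "4.2.0", "db_driver": "2.1", "batch_size": 500}
failed = {"app_version": "4.2.0", "db_driver": "2.3", "batch_size": 500,
          "scanner": "enabled"}

for name, (good_value, bad_value) in diff_runs(last_good, failed).items():
    print(f"{name}: good={good_value!r} failed={bad_value!r}")
```

Everything that matches drops out, which leaves a short list of suspects to check against the failure.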

9.     Don’t forget the big picture – are there other processes with similar failures?

  • Check for any corporate-wide failures, e.g., the problem you are chasing on a specific database table may be due to a database-wide problem that other teams are experiencing as well – join forces!

10. Due diligence before asking for help from other teams

  • No one wants to be called in to research with a blindfold on – give other teams enough clues from your own research, observations and findings, so they have a head start

11. Ask for help in a timely fashion – waiting too long makes no one happy

  • If you followed the 10 steps above relatively quickly and you still need help, don’t wait too long to seek it
  • Ask for help in a logical order, e.g., do you need the database team or the storage team to do the research first?

12. Document your findings to enable a root-cause fix, or to address future failures until the root cause is fixed

  • A lot of folks complain that there is not enough documentation around, when they themselves have created little to none. Don't be one of them!
  • If it has been a pain for you to solve a problem, why would you want yourself or others to go through the same pain again?

More articles by Prathap (Prat) Shanmugam