Essential Practices For Successful Cyber Incident Response


It’s 5 a.m. on a Tuesday and the phone rings. You answer, wiping the sleep from your eyes, as one of your security analysts says, “We've been hit by a ransomware attack. We need all hands on deck immediately.”

Unfortunately for many security professionals, this scenario is all too real -- and unbearably frequent. It also triggers anxiety, panic and lots of unanswered questions across the organization, including:

  • What caused it?
  • How serious is it?
  • What needs to be done?
  • Are clients impacted?
  • What do we need to do right now?

Are You Prepared for Incident Response?

Cybersecurity incidents are now a matter of when, not if. And their business impacts are becoming worse.

Gartner predicts that between 2022 and 2025, 1 in 3 infrastructure organizations will be the victim of a breach or cyberattack that disables critical systems. And 2022 saw the highest average breach cost in 17 years, at over $9.4M in the US (IBM Data Breach Report).

Despite the elevated level of cyber threats, incident response readiness is alarmingly low.

According to a 2021 IBM Cyber Resilience Study, only 46% of organizations have sufficient incident response (IR) plans in place, and 79% don't have a mature, resilient cybersecurity program.

Low awareness, misjudged strategic priorities and a lack of skilled resources cause many organizations to wait until they have experienced a cyberattack to implement a realistic and functional incident response process. Unfortunately, waiting to prepare until you’re faced with a threat significantly increases the probability of impact to your business.


Having a reliable incident response plan in place before incidents happen reduces the likelihood of harm in the near term and helps shift the overall cybersecurity posture across the organization.

More specifically, cybersecurity incident response provides several strategic benefits:

  • Proactive protection of assets: security incidents happen without warning, so it’s essential to prepare ahead of time.
  • Resiliency: teams can respond in a repeatable manner and restore the business more quickly.
  • Alignment: the plan keeps everyone in sync to control damage and recover during a crisis.
  • Risk reduction: having a plan and exercising it exposes security gaps before attacks occur and helps reduce damage after an attack.
  • Compliance: clear planning and documentation reduce an organization’s liability and provide evidence for compliance auditors and other authorities.


Where Many Organizations Fail in Incident Response Planning


Where a lot of organizations fail is in assuming they have a well-functioning plan in place when, in fact, it may be full of unseen holes. In some cases, organizations opt for off-the-shelf template plans that aren't suited to their IT environment or industry. In other cases, the plan looks good on paper but hasn't been practiced and pressure-tested. In either scenario, a well-intentioned plan may still fall flat when you need it most: the day of your next cyber incident.

Another big reason organizations lack solid incident response planning is staffing and talent constraints. Either they lack the right skill sets, or their IT staff are overextended and bogged down by day-to-day activities.

Another reason might be that leadership hasn't been shown the business value of this investment. While it makes sense in most instances for executives to evaluate the ROI of new IT initiatives, the return on incident response planning is notoriously difficult to measure because, ideally, you will never actually use the plan. And a plan that goes unused is hard to justify financially.

To be clear, a well-planned and well-documented incident response plan will not only save time and money directly related to IT operations; it can also deliver significant value and save considerable effort later on. When disaster strikes, a fast and effective response can make all the difference for your employees, customers, and partners.


How to Evaluate Your Incident Response Plan


As your organization considers implementing a new incident response plan or upgrading an existing one, ask yourselves these questions:

  • Are my plans current? Have they been updated to reflect the current business and threat landscape?
  • Does the plan have goals and measures of success?
  • Does it provide steps for fast and effective resolution?
  • Does it account for meeting regulatory or legal requirements for data disclosure in the case of incidents involving data breaches?
  • Does the plan include proactive practice and testing activities to identify and resolve functional gaps before the day of an attack?
  • Does the plan account for sufficient post-mortem investigation and log-based evidence generation to address control flaws and improve the plan?
  • Are the right people informed and knowledgeable of the plan, their responsibilities, and how it affects them? Can I rely on them to effectively execute their parts?

Auditing your current efforts with these questions will help you uncover gaps that could introduce unexpected risk and delay at the time of an incident. Asking them during the development of a new IR plan also helps ensure the plan is realistic and a good fit for your needs and your team's skill sets.

The best time to evaluate your incident response capabilities isn't during or after an attack -- it is right now, before your plan is put into action.


Essential Elements of a Successful Incident Response Plan


A comprehensive approach to incident response includes identifying an attack, understanding its severity and prioritizing it, investigating and mitigating the attack, restoring operations, and taking action to ensure it won’t recur.

An incident response (IR) plan is a set of documented procedures detailing the steps that should be taken in each phase of incident response. It should include guidelines for roles and responsibilities, communication plans, and standardized response protocols.

Organizational Fit

While just about any organization can be a target for cyberattack, it’s important to recognize that each one is somewhat unique in its readiness for incident response. Structure, operating model, culture and technology adoption all affect how ready the business is to defend against and respond to threats. For this reason, cookie-cutter “template” incident response plans are usually inadequate or, at best, require significant modification to be effective.

Thus, an incident response plan should be specifically tailored to the organization. To achieve organizational fit, start with an assessment of your current cybersecurity program, including the readiness of organizational culture, processes and governance, in addition to technologies and controls.

Part of the fit exercise should also include a review, and possibly a recalibration, of enterprise risk management policies, risk tolerances and business continuity planning. Policy helps outline a framework for incident response and surfaces potential roadblocks to implementation.

Preparation and Scoping

The ultimate goal of an incident response plan is to enable timely, effective response to cyberattacks that minimizes negative consequences. That said, each organization must define what this means in the context of its business environment before initiating an IR plan. This usually includes protecting important information from threats and reacting immediately and appropriately to remove risk and limit scope during any incident.

Each organization will need to choose its level of robustness based on organizational need. According to the NIST incident response methodology, incident response is more than a linear list of steps followed in response to an attack. It is a guiding roadmap for the organization’s incident response program to continually improve itself. In mature security programs, incident response is viewed as a cyclical activity, where continual practice, learning and improvement are used to adapt after each incident and better defend the organization.

For these goals to be acted on without fail in the moments of an incident, they must be precisely connected to specific action steps, with clearly defined roles and responsibilities and technologies in place to support each step.

This section of your plan specifies what is considered a security incident, who is responsible for incident response, roles and responsibilities, documentation and reporting requirements.

Another key step during preparation is to establish an Incident Response Team, who will ultimately oversee and handle response activities before, during and after incidents. This team should include a cross section of business and technical experts with the authority to take action in support of the business. You'll also want to consider the team's structure, physical location and whether the team's function is provided by internal staff and/or augmented with outside cybersecurity providers who have experience in incident response. For instance, if your incident response is mainly handled by a SOC, this could be internal or outsourced to a managed security / SOC partner.

The plan should provide clear and concise guidance for all members of the incident response team to avoid confusion and delays. IR communications should keep every department affected by an incident informed, and everyone with responsibilities should have appropriate decision rights and documentation to guide their actions.

It should also document known assumptions and limitations to clarify the plan's scope: what it intends to do and what it cannot do, at least in the initial iteration. You may also want to plan for future updates to optimize plan performance.

Preparation also includes inventorying the IT environment, including networks, servers, apps, data sources/flows and endpoints, ranked by importance.
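To make "ranked by importance" concrete, here is a minimal sketch of an asset inventory sorted by criticality. The tier names, the `Asset` fields and the tie-break on sensitive data are illustrative assumptions, not a prescribed schema; your own business impact analysis should define them.

```python
from dataclasses import dataclass

# Hypothetical criticality tiers; replace with your own business impact ratings.
CRITICALITY = {"critical": 3, "high": 2, "medium": 1, "low": 0}


@dataclass
class Asset:
    name: str
    kind: str                      # e.g. "server", "app", "network", "endpoint", "data"
    criticality: str               # one of the CRITICALITY keys
    holds_sensitive_data: bool = False


def rank_assets(assets):
    """Sort assets so the most important ones come first.

    Criticality tier is the primary key; within a tier, assets holding
    sensitive data come first; the name gives a stable final ordering.
    """
    return sorted(
        assets,
        key=lambda a: (-CRITICALITY[a.criticality], not a.holds_sensitive_data, a.name),
    )


inventory = [
    Asset("hr-laptop-07", "endpoint", "low"),
    Asset("erp-db", "data", "critical", holds_sensitive_data=True),
    Asset("public-website", "app", "high"),
    Asset("file-server", "server", "high", holds_sensitive_data=True),
]

for asset in rank_assets(inventory):
    print(asset.criticality, asset.name)
```

The same ranking can later drive the order in which systems are monitored, contained and restored.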

Incident Definition and Monitoring

A key benefit of proper IR planning is clear agreement on how to recognize an incident, how to decide whether to activate the plan, and what steps are put into motion immediately after.

Part of this includes defining the following:

  • Who has the authority to invoke the plan
  • Where/how does the incident response team meet and communicate
  • How are incidents sourced and prioritized
  • What are the specific steps for detection, diagnosis and remediation
  • How are incidents and responses evaluated for performance
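The article does not prescribe performance measures, but two metrics commonly used for the last question are mean time to detect (MTTD) and mean time to resolve (MTTR). The sketch below computes both from hypothetical incident records; the field names and timestamps are illustrative assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records: when the attack started, when it was
# detected, and when it was resolved.
incidents = [
    {"started": datetime(2023, 1, 3, 2, 0), "detected": datetime(2023, 1, 3, 5, 0),
     "resolved": datetime(2023, 1, 3, 17, 0)},
    {"started": datetime(2023, 2, 9, 14, 0), "detected": datetime(2023, 2, 9, 15, 0),
     "resolved": datetime(2023, 2, 10, 1, 0)},
]


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def mean_time_to_detect(records):
    """Average hours between an incident starting and being detected."""
    return mean(hours(r["detected"] - r["started"]) for r in records)


def mean_time_to_resolve(records):
    """Average hours between detection and resolution."""
    return mean(hours(r["resolved"] - r["detected"]) for r in records)


print(f"MTTD: {mean_time_to_detect(incidents):.1f} h")  # MTTD: 2.0 h
print(f"MTTR: {mean_time_to_resolve(incidents):.1f} h")  # MTTR: 11.0 h
```

Tracking these numbers across incidents gives the post-incident review a simple trend line to judge whether the plan is improving.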

Once incidents and procedures are defined, you'll need to choose tools and methods to monitor baseline activity and detect potentially harmful events.

Detection and Analysis Playbook

Detection involves collecting data signals from the IT environment, people inside the organization and authoritative information sources outside the organization via security tools to identify precursors (signs that an incident may happen) and indicators (signs that an attack has happened or is happening now).

Analysis involves establishing a baseline of normal activity for the affected systems, correlating related events and determining whether, and how, they deviate from normal behavior.
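As a simplified illustration of baseline-and-deviation analysis, the sketch below summarizes normal activity as a mean and standard deviation and flags values far outside it (a z-score test). The metric (hourly failed logins), the sample data and the threshold of 3 standard deviations are assumptions for illustration; production tools use richer behavioral models.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize 'normal' activity as a mean and sample standard deviation."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # no variation at all: any change is a deviation
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly counts of failed logins on one server during a normal week.
normal_failed_logins = [4, 6, 5, 7, 5, 4, 6, 5, 6, 5]
baseline = build_baseline(normal_failed_logins)

print(is_anomalous(7, baseline))   # False: within normal variation
print(is_anomalous(40, baseline))  # True: far outside the baseline
```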

Effective IR plans will have specific instructions in a concise, actionable playbook format to guide security teams as they detect incidents and step through procedures for reporting, investigating, and containing the threat. We recommend a priority- and scenario-based approach to make detection and analysis more practical and test-friendly. Additional resources such as checklists and more detailed runbooks may also be helpful. The key here is that your IR team understands how to use these resources and has practiced using them before incidents happen (see Testing and Preparedness below).

Most response efforts will depend on the type and scope of an attack. You'll want to be able to quickly and accurately rank severity and importance using a pre-defined scoring mechanism. Once an incident is ranked, the appropriate 'play' can be applied.
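A pre-defined scoring mechanism can be as simple as a few weighted factors and fixed thresholds. The sketch below is one hypothetical example: the factors, weights, P1 to P3 thresholds and play descriptions are all assumptions to be calibrated against your own risk tolerances, not an industry standard.

```python
# Hypothetical scoring weights; calibrate them against your own risk tolerances.
ASSET_IMPACT = {"low": 1, "medium": 2, "high": 3}
DATA_SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2}

# Hypothetical mapping from priority level to the "play" to run.
PLAYS = {
    "P1": "Activate full IR team and executive communications",
    "P2": "Engage on-call IR analysts; notify system owners",
    "P3": "Handle through standard SOC triage queue",
}


def severity_score(asset_impact, data_class, privileged_access, spreading):
    """Combine a few pre-defined factors into a single numeric score."""
    score = 2 * ASSET_IMPACT[asset_impact] + 2 * DATA_SENSITIVITY[data_class]
    if privileged_access:
        score += 3
    if spreading:
        score += 2
    return score


def priority(score):
    """Bucket the score into a priority level using fixed thresholds."""
    if score >= 10:
        return "P1"
    if score >= 6:
        return "P2"
    return "P3"


ransomware = severity_score("high", "confidential", privileged_access=True, spreading=True)
phishing_click = severity_score("low", "internal", privileged_access=False, spreading=False)
print(priority(ransomware), PLAYS[priority(ransomware)])
print(priority(phishing_click), PLAYS[priority(phishing_click)])
```

Keeping the scoring in a small, reviewable table makes it easy to exercise and adjust during tabletop testing.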

Two questions that will help you prepare the right response to an active cyber attack are:

  1. Do the impacted / breached systems contain sensitive data?
  2. Does the threat actor have access to privileged accounts?

The answers to these help determine whether you should pursue an isolation approach (aka pull the plug), or if containment is an option so you can gather forensics to discover motive and possible future attack paths.
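The article leaves the exact mapping from these two answers to a strategy up to each organization. The sketch below encodes one hypothetical rule, assuming that a "yes" to either question makes continued attacker access too risky to allow observation; treat it as a starting point for your own playbook, not a recommendation for every case.

```python
from enum import Enum


class Strategy(Enum):
    ISOLATE = "isolate (pull the plug)"
    CONTAIN_AND_MONITOR = "contain and gather forensics"


def choose_strategy(sensitive_data_exposed: bool, privileged_access: bool) -> Strategy:
    """Hypothetical decision rule for the two triage questions.

    Assumption: if the attacker can reach sensitive data or holds privileged
    accounts, the risk of leaving them connected outweighs the forensic value
    of watching them, so isolate. Only when both answers are "no" does the
    team contain the threat and keep observing it.
    """
    if sensitive_data_exposed or privileged_access:
        return Strategy.ISOLATE
    return Strategy.CONTAIN_AND_MONITOR


for sensitive in (True, False):
    for privileged in (True, False):
        print(f"sensitive={sensitive!s:5} privileged={privileged!s:5} -> "
              f"{choose_strategy(sensitive, privileged).value}")
```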


Read more:

Incident Response Play by Play with Microsoft Defender and Sentinel


Choose the Right Detection and Response Tools

One of the biggest challenges in incident response is the sheer volume of data points, signals, artifacts, reports and related communications overhead to be managed as part of the detection and response process.

Keeping track of indicators of compromise (IOCs) is important because IOCs serve as precursors and post-event markers of attack, as well as forensic clues that help prevent future intrusion attempts and other harmful activity.
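At its simplest, IOC tracking means checking observed activity against a list of known-bad indicators. The sketch below matches hypothetical outbound connection events against a small IOC list of IP networks and domains (including subdomains). The feed contents, event fields and host names are illustrative assumptions; the IPs come from documentation-only ranges (RFC 5737) and the domains use the reserved .example TLD.

```python
import ipaddress

# Hypothetical IOC feed: known-bad networks and domains.
BAD_NETWORKS = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "198.51.100.7/32")]
BAD_DOMAINS = {"malicious.example", "c2.bad.example"}


def domain_matches(domain):
    """True if the domain is a known-bad domain or a subdomain of one."""
    return any(domain == bad or domain.endswith("." + bad) for bad in BAD_DOMAINS)


def match_iocs(events):
    """Return (host, indicator_type, value) for every event that hits an IOC."""
    hits = []
    for event in events:
        ip = ipaddress.ip_address(event["dest_ip"])
        if any(ip in net for net in BAD_NETWORKS):
            hits.append((event["host"], "bad_ip", str(ip)))
        domain = event.get("domain", "").lower()
        if domain and domain_matches(domain):
            hits.append((event["host"], "bad_domain", domain))
    return hits


# Hypothetical outbound connection events pulled from firewall or proxy logs.
events = [
    {"host": "ws-12", "dest_ip": "203.0.113.45", "domain": "update.malicious.example"},
    {"host": "ws-07", "dest_ip": "192.0.2.10", "domain": "intranet.example"},
    {"host": "srv-db", "dest_ip": "198.51.100.7"},
]

for hit in match_iocs(events):
    print(hit)
```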

There are many different classes of tools (SOAR, EDR, XDR) that automate this process with proactive alerts and AI-powered intelligence layered on to improve response. Still, the problem is volume and interpretation, so it is important to choose a tool that helps you separate signals from noise and simplifies, rather than complicates, your IR process. Incident response involves many repetitive day-to-day tasks, and much of this work can be automated with the wide array of IR software solutions and managed services available. With the security talent shortage at the top of the list of IT challenges, improving productivity and preventing burnout become even more important.

Here are some criteria to help you select the right tools:

  • Links from detection to resolution systems
  • Coordination with access rights managers and firewalls
  • Customizable action rules
  • Action logging
  • Live status reports


Security Information & Event Management (SIEM) for Threat Detection

If you’ve spent any time paying attention to security technology, you will undoubtedly have heard the term “SIEM,” or Security Information and Event Management.

A SIEM system works by aggregating logs to capture the forensic information required to uncover security incidents or breaches, then escalating them to a Security Operations Center (SOC) for review and remediation. In many cases, the logs stored in a SIEM will be the difference between a detected breach and an undetected one, provided the SIEM program and process can review the logs and react in an appropriate timeframe.

Industry-leading SIEM solutions automate real-time collection and correlation of information from across your entire environment to generate readily actionable signals for both reactive and preventive threat response. Capabilities include detailed log analysis and correlation, incident reporting, risk management, behavioral threat detection and actionable analytics.

The best SIEM solutions offer intelligent event correlation, allowing sources to be aggregated and reviewed programmatically and detecting relevant patterns and abnormal behavior far faster and more efficiently than a security analyst could manually.

With these features, SOC analysts can review more insightful data and detect threats much earlier and with less alert 'noise' and fatigue, even for extremely complex incidents.
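To show what a correlation rule does, here is a simplified sketch of a classic one: flag a successful login that follows several failed logins for the same account within a short window (a possible brute-force success). It is written in plain Python for illustration and is not the rule syntax of any particular SIEM; the event fields, threshold of 5 failures and 10-minute window are assumptions.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_bruteforce_success(events, threshold=5, window=timedelta(minutes=10)):
    """Flag a successful login preceded by >= `threshold` failures for the
    same account within `window`. Returns (user, success_time, failures)."""
    recent_failures = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for ev in sorted(events, key=lambda e: e["time"]):
        failures = recent_failures[ev["user"]]
        # Forget failures that fall outside the correlation window.
        while failures and ev["time"] - failures[0] > window:
            failures.popleft()
        if ev["outcome"] == "failure":
            failures.append(ev["time"])
        elif ev["outcome"] == "success":
            if len(failures) >= threshold:
                alerts.append((ev["user"], ev["time"], len(failures)))
            failures.clear()
    return alerts


# Hypothetical authentication events.
base = datetime(2023, 5, 1, 3, 0)
events = [{"time": base + timedelta(minutes=i), "user": "alice", "outcome": "failure"} for i in range(6)]
events.append({"time": base + timedelta(minutes=6), "user": "alice", "outcome": "success"})
events += [{"time": base + timedelta(minutes=i), "user": "bob", "outcome": "failure"} for i in range(2)]
events.append({"time": base + timedelta(minutes=3), "user": "bob", "outcome": "success"})
events += [{"time": base + timedelta(minutes=i), "user": "carol", "outcome": "failure"} for i in range(5)]
events.append({"time": base + timedelta(minutes=30), "user": "carol", "outcome": "success"})

for alert in detect_bruteforce_success(events):
    print(alert)
```

Only alice is flagged: bob has too few failures, and carol's failures are older than the window when she finally logs in. Tuning the threshold and window is exactly the kind of ongoing review described below.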

A SIEM is only as effective as the information it alerts on, so properly defining the data sources and correlation goals, and reviewing and tuning them over time, are essential to its success. Once that is in place, consistent deployment and building the SIEM into your deployment lifecycle help prevent gaps from appearing as the environment changes.

As infrastructure becomes increasingly complex and more organizations move to hybrid environments spanning on-premises and cloud deployments, the need for a centralized SIEM grows. Each of these infrastructure components adds a layer of complexity and can introduce additional vulnerabilities in configuration or code.

For this reason, we recommend Microsoft's unified SIEM + XDR solutions.

Optimize Incident Response with Microsoft 365 Defender and Azure Sentinel

Microsoft 365 Defender and Azure Sentinel combine the breadth of a SIEM with the depth of XDR to fight attacks and protect the most complex enterprise environments, across on-premises and multiple clouds. They empower defenders to hunt and resolve critical threats faster, eliminate alert fatigue and boost confidence in remediation actions.



Testing and Preparedness

Having the right plan and processes alone is not enough; the plan itself has to be tested and refined. This is where tabletop exercises come into play -- they connect the plan to practice before it is needed, so it succeeds at the key moment of an incident.

A well-designed tabletop exercise simulates an actual incident and validates that the response plan, as designed, will actually work to address it. At the same time, it highlights gaps that need to be addressed, in a low-risk environment. The exercise takes participants through the newly planned process of dealing with a simulated incident scenario. Tabletop exercises are a low-cost, low-risk way to boost preparedness, offer hands-on training and continually refine incident response.

Cybersecurity tabletop exercises are most effective when structured as an initial scenario (e.g., malware), followed by a series of scenes that add new information to the incident to which participants must react. This structure replicates the uncertainty and evolution of real incidents.
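The scenario-plus-scenes structure can be captured in a small data model so facilitators can reuse and version their exercises. The sketch below is a hypothetical layout: the `Scene` and `TabletopExercise` classes, the example malware scenario and the placeholder responder are illustrative assumptions, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    title: str
    new_information: str
    questions: list[str]


@dataclass
class TabletopExercise:
    scenario: str
    scenes: list[Scene]
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def run(self, respond):
        """Reveal scenes in order and record the team's answer to each question.

        `respond` is any callable taking (scene, question) and returning the
        team's decision as text; in a live exercise, a facilitator enters it.
        """
        for scene in self.scenes:
            print(f"--- {scene.title}: {scene.new_information}")
            for question in scene.questions:
                self.log.append((scene.title, question, respond(scene, question)))
        return self.log


exercise = TabletopExercise(
    scenario="Malware detected on a finance workstation",
    scenes=[
        Scene("Initial alert", "EDR flags a suspicious process on FIN-WS-03.",
              ["Who triages the alert?", "Do we isolate the workstation now?"]),
        Scene("Escalation", "Shared drives used by finance begin to show encrypted files.",
              ["Who has authority to invoke the IR plan?", "Which backups do we verify first?"]),
        Scene("Outside pressure", "A ransom note appears and a reporter calls for comment.",
              ["Who speaks for the company?", "Do any legal or regulatory notification clocks start?"]),
    ],
)

# A placeholder responder for a dry run; replace with real team input.
log = exercise.run(lambda scene, question: "TBD")
print(len(log), "decisions recorded")
```

The recorded log doubles as input for the post-incident style review: each unanswered or slow question points to a gap in the plan.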

We also recommend including leadership and other decision makers across the organization to immerse them into the IR process.


Containment, Eradication and Recovery

Containment means putting a stop to an attack as quickly as possible, before it causes widespread damage. Containment approaches should be adaptable to incident type and severity, while keeping critical services available as a constant priority.

The containment section should outline tangible actions to limit the extent and potential damage of the incident, with a wide range of possible steps. For example, an external ransomware attack will likely be handled very differently from an employee's misuse of admin privileges or a click on a phishing link. Procedures for containing and isolating an attack may include taking affected systems offline to stop the spread, or even isolating unaffected systems to protect them.

A key requirement for containment is to identify the attacking host, validate its IP address and the attack vector. This allows you to block their communications and access methods to prevent further spread.

In the eradication and recovery stage, after the incident has been successfully contained, the team takes steps to remove the incident's sources and effects from your environment. Eradication can include a wide range of actions, such as removing viruses and malware, closing compromised accounts and resetting passwords.

Once the threat is eradicated, the team begins restoring systems to a normal state of operation. As with detection and containment, recovery steps should be prioritized according to the threat type, scope and severity of impact.

In the case of ransomware, recovery by restoring to backup or snapshot may be your best solution. For a malware attack, extraction tools and defensive applications might be all that is needed to update systems and close gaps. In other cases, configuration and access privilege changes may be required to patch vulnerabilities or block repeat attempts.

Whatever your eradication and recovery approach, your allowable tolerances, expressed as recovery time objectives (RTO) and recovery point objectives (RPO), should be defined ahead of time.

RTO is a measure of time: the maximum duration, and service level, within which a business process must be restored to avoid unacceptable consequences. To determine your desired RTO, ask: “What is the maximum time allowable to recover after discovery of an incident?”

RPO is a measure of data quantity: the maximum allowable amount of data loss, up to a threshold or 'tolerance' level, as part of a business continuity plan. To determine RPO, ask: “How much data loss can we tolerate before given levels of operation are disrupted?”

These specific goals and metrics help you quantify risk tolerance, which is an important factor in shaping the overall incident response approach.
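A quick worked check shows how these objectives turn into pass/fail tests for a given system. RPO is commonly expressed as a time window of acceptable data loss, and with periodic backups the worst case is losing one full backup interval (the incident strikes just before the next backup). The sketch assumes that simple model, ignores backup-job duration and replication lag, and uses a hypothetical ERP system with illustrative numbers.

```python
from datetime import timedelta


def check_recovery_objectives(backup_interval, restore_duration, rpo, rto):
    """Compare a system's backup and restore capabilities with its targets.

    Worst-case data loss with periodic backups is one full backup interval.
    `restore_duration` should be measured from discovery of the incident to
    restored service, matching the RTO question above.
    """
    return {
        "worst_case_data_loss": backup_interval,
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_duration <= rto,
    }


# Hypothetical ERP system: nightly backups, 6-hour restore, 4-hour RPO, 8-hour RTO.
result = check_recovery_objectives(
    backup_interval=timedelta(hours=24),
    restore_duration=timedelta(hours=6),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result["rpo_met"], result["rto_met"])  # False True
```

Here the restore is fast enough, but nightly backups could lose up to 24 hours of data against a 4-hour tolerance, so backups would need to run at least every 4 hours.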

Once containment is achieved and the scope of impact is identified, you may determine that it’s safe to monitor the attack and capture evidence for forensics and learning. Take care with containment and eradication, as some approaches may destroy evidence you will need later.

Post Incident Activities

As covered in the Testing and Preparedness section, just having a plan isn't enough; the plan itself should be refined over time, for two key reasons. First, the cyber threat and regulatory landscape is constantly evolving. Second, your organization and IR team can take in learnings and improve with each new incident response. Even failures are an opportunity to learn and improve.

After completion of the eradication and recovery stage, the team should ask, investigate and document answers to the following questions:

  • How well did the IR team respond overall? Did we follow the plan, and if not, why not?
  • Were any wrong actions taken that caused damage or inhibited recovery?
  • What were the key signals of the incident, how and when were they received, and do we have confirming evidence of those?
  • What information, if received sooner, could have improved the response?
  • Did we discover precursors that will help prevent future incidents?
  • What would we do differently next time?
  • Have we learned ways to prevent similar incidents in the future?
  • What additional tools or resources are needed to help prevent or mitigate future incidents?

With each new incident, this post-mortem step will become easier and should build on previous findings to improve the process and refine your incident response policy, plan, and procedures.


Plus+ Cybersecurity Incident Response Solutions


Getting those incident calls in the middle of the night may never change; however, your response to them can.

Our cybersecurity team can expertly develop and implement your incident response plan and process from start to finish, or guide your own efforts for best results. Whether you need an evaluation of your existing process or help with tabletop testing, or are ready to set up an in-house or outsourced SOC for remediation support (detection, forensics, threat hunting and other resources as needed), our comprehensive solutions and managed services have you covered to respond and recover quickly and within risk tolerances.

To get started, speak with one of our cybersecurity advisors today.

Enjoy this article? Get more insights and resources to help you move from aspiration to results in our +Insights Center.
