5th March 2021

An Overview of IT Resilience, Business Continuity and Disaster Recovery

IT resilience is the ability of a business to adapt to planned or unplanned events while keeping services and operations running continuously. With good IT resilience, business information and data remain available, the IT infrastructure stays operational, disruptions are minimized, and service levels are restored quickly.

Information & Communications Technology

We make sure that the ICT (Information & Communications Technology) Infrastructure, Systems and Processes are implemented and supported in such a way that the business will continue to operate effectively when impacted by an event. We work with businesses to identify ICT vulnerabilities and provide practical solutions to remove these weaknesses.

When it comes to Business Continuity (BC), it is important to create plans that consider all business functions and possible events, and to put in place the processes, policies and procedures that prepare and protect the business accordingly. This generally calls for a high-level approach that looks at the business as a whole rather than at individual systems in isolation.

While ICT is only part of the overall mix, it is a significant enabler of business services and operations, so having resilient systems in place helps prevent excessive disruption. In today’s world, where “always-on” has become the norm, downtime is increasingly expensive and unacceptable. Businesses should therefore reinforce their IT resilience to avoid lost revenue, regulatory penalties and damaged reputations.

We work with businesses to maintain the systems and services required to keep operations running and help with IT Business Continuity Management (BCM). We also incorporate Disaster Recovery (DR) as a part of BC, both of which extend beyond the scope provided by everyday Incident Response and Management because the scale and type of events are extraordinary and require special measures.

We typically focus on these core areas to ensure a business is prepared for disruptive events:

  • Reliable Internet Supply
  • Resilient Systems & Services
  • Protecting the wider IT Network
  • Flexible Collaboration & Communication
  • Information Backup, Restore & Disaster Recovery
  • The Impact of Flexible Working
  • Cyber Threat Protection

We work across many sectors and with businesses of all types, and we know that every business is different, so each plan needs to be unique. For each part of the business, and for the systems and services it relies on, we consider the acceptable cost of recovery, the overall impact, and the acceptable Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

Put simply, the RPO is the maximum amount of recent work, measured in time, that you can afford to lose. Think of the last time you saved a document you are working on: if your system crashes and everything since that save is lost, how much work are you willing to lose before it affects you? The RTO, on the other hand, is the timeframe within which applications and systems must be restored after an outage.
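To make the two measures concrete, here is a minimal Python sketch that checks a single incident against its targets. The function name `assess_incident`, the timestamps and the targets of 4 hours (RPO) and 24 hours (RTO) are illustrative assumptions, not figures from a real incident.

```python
from datetime import datetime, timedelta


def assess_incident(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one incident against its RPO and RTO targets (illustrative helper)."""
    # Everything written after the last good backup is lost when the outage starts.
    data_loss_window = outage_start - last_backup
    # Downtime runs from the start of the outage until service is restored.
    downtime = service_restored - outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: nightly backup at 02:00, outage at 09:30, restored at 15:30.
result = assess_incident(
    last_backup=datetime(2021, 3, 5, 2, 0),
    outage_start=datetime(2021, 3, 5, 9, 30),
    service_restored=datetime(2021, 3, 5, 15, 30),
    rpo=timedelta(hours=4),   # assumed target
    rto=timedelta(hours=24),  # assumed target
)
print(result)
```

In this made-up case the service is back well within the RTO, but seven and a half hours of work are lost against a four-hour RPO, so the backup schedule rather than the recovery process would be the thing to change.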

Some businesses may be content with losing one or two days of information and being closed for several days. Others, though, need much faster availability and cannot afford any data loss. Very often, different scenarios apply to different departments within the business. For example, the sales team may need to be operational within a few hours, whereas HR and back-office staff could be given a lower priority. Email, on the other hand, could be considered separately and given the highest priority, being made available almost instantly.

Impact Analysis

We think it is important that each business determines the relative impact of its business-critical and non-critical applications as part of its Impact Analysis. A three-tier model is frequently used to help formulate the plan.

  • Tier-1: Mission-critical services & applications that require a very fast RTO with a Zero RPO
  • Tier-2: Business-critical applications & services that require RTO of 24 hrs & RPO of 4 hrs
  • Tier-3: Non-critical applications & services that require RTO of 48 hrs & RPO of 24 hrs
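The three tiers can be sketched as a small lookup table, which makes it easy to test whether a proposed backup schedule fits a tier. The sketch below is a simplified illustration: the 15-minute figure standing in for a "very fast" Tier-1 RTO, the service names and the helper `backup_interval_ok` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta  # maximum acceptable time to restore
    rpo: timedelta  # maximum acceptable data loss, measured in time


TIERS = {
    # The 15-minute RTO is an assumed stand-in for "very fast".
    1: Tier("Mission-critical", rto=timedelta(minutes=15), rpo=timedelta(0)),
    2: Tier("Business-critical", rto=timedelta(hours=24), rpo=timedelta(hours=4)),
    3: Tier("Non-critical", rto=timedelta(hours=48), rpo=timedelta(hours=24)),
}

# Hypothetical assignment of services to tiers.
SERVICES = {"email": 1, "sales-crm": 2, "hr-portal": 3}


def backup_interval_ok(tier_number, backup_interval):
    """A periodic backup can lose up to one full interval of data,
    so the interval must not exceed the tier's RPO."""
    return backup_interval <= TIERS[tier_number].rpo


for service, tier_number in SERVICES.items():
    ok = backup_interval_ok(tier_number, timedelta(hours=6))
    print(f"{service}: tier {tier_number}, six-hourly backup acceptable: {ok}")
```

Note that no periodic backup interval satisfies a zero RPO, which is why Tier-1 services typically need continuous replication rather than scheduled backups.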

These tiers show the factors that need to be considered: both the acceptable data loss and the time to recover can differ significantly from one tier to the next.

Recovery targets

The various services and applications used by the business will have different demands, and the solutions for each could differ widely. The cost of implementation needs to be weighed against the impact on the business. Achieving a zero RPO and a near-zero RTO for everything would be extremely expensive and might be prohibitive.

A robust plan, unique to the business, should therefore be made to reduce the impact of disruptive incidents. This is where we can help, by identifying vulnerabilities within the ICT infrastructure and providing practical solutions to remove those weaknesses. We also recommend that, once in place, the plan is continually reviewed and tested.