RDR Categories

Resilience and Disaster Recovery

RA | Resilience Architecture
Resilience Architecture ensures the solution can adapt to and recover automatically from unexpected changes in the system, whether caused by a threat or by software or hardware failures. A good design will ensure the solution is resilient and able to withstand or minimize disruptions. A typical assessment will ask:

  • Does the solution have high-availability?
  • Will it use clustering?
  • Will it have load balancing?
  • Will there be network redundancy in terms of network links?
  • What will be the redundancy plan between datacenters?
  • If using AWS, will it use different availability zones?
  • Will it have any geographical constraints?
  • Will it have any resource limitations? (e.g. bottlenecks, bandwidth, single network paths)
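
The clustering and redundancy questions above can be made concrete with a rough availability estimate. The following is a minimal sketch, assuming component failures are independent (which shared power, network, or software faults often violate); the function names are illustrative and not part of any assessment tooling.

```python
from typing import Iterable


def parallel_availability(single: float, replicas: int) -> float:
    """Availability of a redundant group where any one replica is enough.

    Assumes each replica fails independently with the same availability.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    # The group is down only when every replica is down at once.
    return 1.0 - (1.0 - single) ** replicas


def serial_availability(components: Iterable[float]) -> float:
    """Availability of a chain where every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    # Two 99% nodes behind a load balancer, then a single 99.9% network path.
    cluster = parallel_availability(0.99, 2)
    print(f"cluster: {cluster:.6f}")
    print(f"end to end: {serial_availability([cluster, 0.999]):.6f}")
```

Under these assumptions, two 99% nodes give roughly 99.99% for the cluster, but a single network path in series caps the end-to-end figure below that path's own availability, which is why single network paths appear in the resource limitations question.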

BS | Backup Strategy
Backups are a critical piece of any system. In essence, they are duplicate copies of what is stored in the primary working system. Backups can cover data or configuration, both of which are required to rebuild the system if needed. They support continuous availability and reduce downtime. A typical assessment will ask:

  • Will it have automated backups?
  • Will backups be stored in a centralized storage system?
  • Will they be distributed across locations?
  • What will be the backup frequency?
  • Who will have access to them?
  • What will be the testing frequency?
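
One simple form of backup testing is confirming that a backup copy is byte-for-byte identical to its source. The following is a minimal sketch using SHA-256 digests; the function names are hypothetical, and it assumes the source file is not modified between the backup and the check. It does not replace a full restore test.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large backups do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches_source(source: Path, backup: Path) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(source) == sha256_of(backup)
```

A scheduled job could call `backup_matches_source` after each backup run and raise an alert on a mismatch, which gives the testing frequency question a measurable answer.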

IR | Logging and Monitoring
Preventing all attacks is nearly impossible, which is why this category ensures that all relevant and critical components of the solution are integrated into the enterprise logging solution. An organization must detect attacks quickly in order to minimize the damage and initiate any mitigation activities that might be needed. A typical assessment will ask:

  • Will the components be logging to the enterprise logging solution?
  • What will be logged?
  • Who will be monitoring the logs?
  • Who will be notified of any incident?
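
The question of what will be logged is easier to answer when events share a fixed, machine-readable shape. The following is a minimal sketch using Python's standard `logging` module to emit one JSON object per event; the field names and the `build_logger` helper are assumptions for illustration, and forwarding to the enterprise logging solution is out of scope here.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(component: str) -> logging.Logger:
    """Return a logger for `component` that writes JSON lines to stderr."""
    logger = logging.getLogger(component)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    log = build_logger("payments-api")  # hypothetical component name
    log.warning("login failed for %s", "alice")
```

Keeping the component name and severity as separate fields lets whoever monitors the logs filter and route alerts without parsing free text.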

IR | Incident Management
This category refers to the management of events and incidents that could disrupt operations, services or functions. A typical assessment will ask:

  • What will be the overall process when an incident is detected?
  • Who will be involved in this process?
  • Will the process be well defined and documented?

IR | Recovery Process
A fundamental piece of any solution is its ability to recover after a major incident. A robust process is essential to recover the underlying components and continue providing services after a disruption. A typical assessment will ask:

  • What will be the steps to rebuild the system?
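  • Will the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) be defined?
  • Will it be integrated into the overall BCP (Business Continuity Plan) and DR (Disaster Recovery) strategy?
  • What will be the process to test if the rebuild steps are accurate?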
  • Who will be involved in this process?
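
Once RTO and RPO are defined, they can be checked against the backup schedule and against restore times measured in a rebuild test. The following is a minimal sketch, assuming the worst-case data loss equals the interval between backups and that the restore time comes from an actual test; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


def meets_rpo(backup_interval: timedelta, objectives: RecoveryObjectives) -> bool:
    """Worst case, everything written since the last backup is lost."""
    return backup_interval <= objectives.rpo


def meets_rto(measured_restore: timedelta, objectives: RecoveryObjectives) -> bool:
    """Compare a restore time measured in a rebuild test with the RTO."""
    return measured_restore <= objectives.rto


if __name__ == "__main__":
    targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    print(meets_rpo(timedelta(hours=24), targets))   # nightly backups
    print(meets_rto(timedelta(hours=3), targets))    # last rebuild test
```

With these example targets, nightly backups fail the one-hour RPO even though a three-hour rebuild fits inside the four-hour RTO, so the backup frequency would need to change rather than the rebuild steps.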