What has the ability to recover information or systems in the event of catastrophic disasters?

Disaster recovery (DR) is defined as an organization’s method of circumventing or minimizing data loss and business disruptions resulting from catastrophic events. Such events may be human-made or natural, including everything from equipment failures and localized power outages to cyberattacks, civil emergencies, criminal or military attacks, and natural disasters. This article aims to give you a comprehensive understanding of disaster recovery and best practices for disaster recovery in 2021.

What Is Disaster Recovery (DR)?

Disaster recovery (DR) is defined as an organization’s method of regaining access and functionality to its IT infrastructure after events like a natural disaster, cyberattack, or even business disruptions related to the COVID-19 pandemic.

DR consists of IT technologies and best practices designed to circumvent or minimize data loss and business disruption resulting from catastrophic events. Such events may be human-made or natural, including everything from equipment failures and localized power outages to cyberattacks, civil emergencies, criminal or military attacks, and natural disasters such as hurricanes, wildfires, and floods. DR is a crucial aspect of business continuity.

Businesses of all sizes, whether small, mid-sized, or large, can be hit by such disastrous events without warning. However, SMEs tend to neglect this risk and fail to develop a reliable, practicable disaster recovery plan. Without such a plan, they have little protection from the significant damage these disruptive events can do to their core operations.

Infrastructure failure can cost around $100,000 per hour, whereas critical application failure costs range from $500,000 to $1 million per hour. Many businesses cannot recover from such losses. Nearly 40% of small businesses do not reopen after navigating through such a disaster, and about 25% fail within the first year after the crisis. Therefore, having a disaster recovery plan in place can dramatically reduce the risks of such events.
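To make these figures concrete, the arithmetic can be sketched in a few lines of Python. The hourly rates are the ones quoted above; the function name and the four-hour outage are illustrative assumptions, not data from a real incident.

```python
# Hourly cost figures quoted in the article (USD per hour).
INFRASTRUCTURE_COST_PER_HOUR = 100_000
CRITICAL_APP_COST_PER_HOUR_LOW = 500_000
CRITICAL_APP_COST_PER_HOUR_HIGH = 1_000_000


def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Estimate the cost of an outage as duration multiplied by hourly cost."""
    if hours_down < 0 or cost_per_hour < 0:
        raise ValueError("hours_down and cost_per_hour must be non-negative")
    return hours_down * cost_per_hour


if __name__ == "__main__":
    outage_hours = 4  # hypothetical outage length, chosen for illustration
    print("Infrastructure:", downtime_cost(outage_hours, INFRASTRUCTURE_COST_PER_HOUR))
    print("Critical app (low):", downtime_cost(outage_hours, CRITICAL_APP_COST_PER_HOUR_LOW))
    print("Critical app (high):", downtime_cost(outage_hours, CRITICAL_APP_COST_PER_HOUR_HIGH))
```

Under these assumptions, a four-hour infrastructure outage would come to roughly $400,000, and a four-hour critical application outage to between $2 million and $4 million.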

DR planning involves the following steps:

    1. Strategizing
    2. Planning
    3. Deploying appropriate technology
    4. Continuous monitoring and testing

In addition, maintaining backups of an organization’s data is a critical component of disaster recovery planning. Disaster recovery also involves keeping adequate storage and compute available and intact to support robust failover and failback procedures. Failover is the process of offloading workloads to backup systems so that the disruptive impact of the disaster on production processes and end-user experiences is kept to a minimum. Failback involves switching back to the primary systems once they are restored.
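The failover and failback logic can be illustrated with a minimal Python sketch. The `Site` and `FailoverController` names are hypothetical, and the health flag stands in for whatever monitoring an organization actually uses. A real failback would also need to synchronize data written to the backup site before switching back.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A location that can run workloads (hypothetical model for illustration)."""
    name: str
    healthy: bool = True


class FailoverController:
    """Routes workloads to the primary site, or to the backup while the primary is down."""

    def __init__(self, primary: Site, backup: Site) -> None:
        self.primary = primary
        self.backup = backup
        self.active = primary

    def check(self) -> Site:
        """Re-evaluate site health and return the site that should serve workloads."""
        if self.primary.healthy:
            # Failback: return to (or stay on) the primary whenever it is healthy.
            self.active = self.primary
        elif self.backup.healthy:
            # Failover: the primary is down, so offload workloads to the backup.
            self.active = self.backup
        else:
            raise RuntimeError("no healthy site available")
        return self.active


if __name__ == "__main__":
    primary, backup = Site("primary-dc"), Site("backup-dc")
    controller = FailoverController(primary, backup)
    primary.healthy = False
    print(controller.check().name)  # workloads move to the backup site
    primary.healthy = True
    print(controller.check().name)  # workloads return to the primary site
```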

How does DR work?

Disaster recovery largely depends on replicating data and computer processing that can be carried out in an off-premises location, which is not affected by the disaster. In scenarios where the servers go down due to a natural disaster, equipment failure, or cyberattack, a business needs to recover lost data from a secondary location where the data is backed up.

Here, the secondary site is not within the geographic region of the impacting disaster. Thus, an organization can transfer its computer processing to that remote location to continue its business operations.
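One rough way to check that a secondary site lies outside a disaster's reach is to compare the great-circle distance between the two sites with an assumed impact radius. The sketch below uses the haversine formula. The coordinates and radius are illustrative assumptions, and a real assessment would also consider shared power grids, flood plains, and network paths.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in degrees, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_outside_impact_zone(primary, secondary, impact_radius_km: float) -> bool:
    """True if the secondary site is farther from the primary than the assumed impact radius."""
    return great_circle_km(*primary, *secondary) > impact_radius_km


if __name__ == "__main__":
    primary_site = (29.76, -95.37)    # hypothetical primary site (latitude, longitude)
    secondary_site = (32.78, -96.80)  # hypothetical secondary site
    print(is_outside_impact_zone(primary_site, secondary_site, impact_radius_km=100))
```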

Top 5 elements of an effective DR plan

1. Backups

Backup is essentially the spine of any DR plan. Organizations need to determine what needs to be backed up (or relocated), who should perform the backups, and how they should be implemented. The plan should include a recovery point objective (RPO), which states the maximum amount of data loss, measured in time, that the organization can tolerate and therefore sets the minimum frequency of backups, and a recovery time objective (RTO), which states the maximum downtime allowable after a disaster.

These metrics create limits to guide the choice of IT strategy, processes, and procedures that make up an organization’s disaster recovery plan. The disaster recovery strategy is dictated by how much downtime an organization can handle and how frequently the organization backs up data.
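As a minimal sketch of how these two metrics constrain a plan, the Python below checks a backup schedule against an RPO and an estimated restore time against an RTO. The function names and the sample values are assumptions for illustration only.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """In the worst case, everything since the last backup is lost,
    so the gap between backups must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """The time needed to bring systems back must not exceed the RTO."""
    return estimated_restore_time <= rto


if __name__ == "__main__":
    # Hypothetical targets and plan values.
    rpo = timedelta(hours=1)
    rto = timedelta(hours=4)
    print("RPO met:", meets_rpo(timedelta(minutes=15), rpo))
    print("RTO met:", meets_rto(timedelta(hours=6), rto))
```

In this example, backing up every 15 minutes satisfies a one-hour RPO, while a six-hour restore estimate misses a four-hour RTO, so the recovery procedure, not the backup schedule, would need to change.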

2. Business-critical asset assessment

An effective DR plan includes documentation of critical systems, applications, data, and other resources that are significant for business continuity and the essential steps necessary to recover data.

3. DR group

The DR group is a team of specialists responsible for creating, monitoring, and managing the DR plan. The plan should define each team member’s role and responsibilities so that in the event of a disaster, the DR team knows whom to contact and how to communicate with the employees, vendors, and customers, in turn.

4. Risk evaluation

Risk evaluation identifies the potential hazards that can put the organization at risk. Assessment is made depending on the type of event. The organization needs to strategize what measures and resources may be required to resume business and restore its original position.

5. Testing and updating

The DR team should continually monitor, test, and update its recovery strategy to address continuously evolving cyber threats and business needs. By ensuring that it is always equipped and ready to face worst-case scenarios, the organization can successfully navigate such challenges.

Benefits of disaster recovery

Implementing a robust DR solution can assure the following benefits:

1. Reduced restore times with lower RTO and RPO

Disaster recovery enables systems, services, and applications to be restored quickly, thereby achieving a lower RTO and RPO. The parameters and methods of the DR plan can be tuned to reduce restore times substantially, based on an organization’s needs.

2. Contained losses

DR plans for business information systems can limit the losses in terms of revenues and the costs for possible damage caused by downtime and technical assistance expenditure.

3. Safeguarded business operations

Every organization has some critical processes that must always be active and are essential for business continuity. Implementation of DR strategies ensures that such procedures are preserved and possible interruptions (if any) are minimized, thereby resuming operations in a shorter period.

4. Protected business reputation

Business downtime caused by unexpected incidents or disasters can seriously damage a firm’s reputation. A short recovery time can therefore help avoid irreversible damage to the organization’s image and, in turn, add to its business strength.

5. Granular management

A DR solution makes it possible to manage replication in a granular way. This means that data can be monitored, managed, and restored at the file level or in even smaller units, enabling complete recovery of data and services.

6. Consistent performance

Business infrastructure replication on one or more disaster recovery sites or off-premise locations ensures that there is no impact on business performance, thereby allowing all the systems to stay online constantly.

7. Chance to customize your DR

Another benefit of DR is exercising the chance of customizing and monitoring your own disaster recovery plans. You can figure out your own replication frequency based on the business needs of your organization.

Cloud vs. On-Premise 

Making decisions regarding disaster recovery and business continuity (BC) is critical for any organization that wants to restore its operations after an unexpected incident. Planning, implementing, and testing a recovery strategy can sometimes take years.

With the technological developments seen in 2021, many organizations operate on cloud infrastructure or an on-premise framework. Hence, it is crucial to determine which type of DR solution best fits the company’s needs: a cloud or an on-premise approach.

Listed below are some of the significant and noteworthy advantages and disadvantages of the on-premises DR plans and Cloud DR plans: 

1. Advantages of on-premises DR

    • Onsite servers will allow for more control over your server
    • Keeps company data private
    • No third-party associated
    • Data is accessible without internet access

2. Disadvantages of on-premises DR

    • Increased capital investment
    • Limited scalability with the growth of the organization
    • Need for space to build or store hardware
    • Added cost of maintenance, management, and IT support
    • No guaranteed uptime
    • Data loss is more likely to occur during a disaster

3. Advantages of cloud DR

    • No onsite hardware building costs
    • Scalable to the growth of your business
    • You’ll only pay for what you use
    • Easily connect to the cloud from anywhere, using any device
    • Backing data up to the cloud can happen as often as every 15 minutes

4. Disadvantages of cloud DR

    • An internet connection is needed to access company data
    • Trusting a third-party to keep data secure
    • Ongoing cost

Based on these advantages and disadvantages, an organization can decide whether to invest in cloud or on-premise DR. However, it is essential to highlight that the cost savings and benefits of the cloud are substantial compared to the on-premise framework.

Conventional DR solutions are often cost-ineffective, given the expense of the hardware, software, and skilled engineering staff needed to run them. Cloud DR services, by contrast, offer a cost-effective alternative. Cloud DR, whether run by an organization or as a service (DRaaS), makes it easy to replicate data to multiple (remote) locations and restore the business faster in the event of a cyberattack or disaster of another sort.

A cloud-based DR solution provides more scalability and flexibility in terms of what, where, and how much data can be stored offsite at remote locations. Cloud DR also helps free up critical resources (e.g., employees) to shift them to core competency projects. 

Cloud DR allows an organization to configure and construct the architecture that suits the unique needs of its business. Cloud DR offers a unique platform to transform and protect the organization’s business more efficiently with greater agility than the conventional on-premise DR solution. With the cost-effectiveness factor and ease of use and control of the cloud environment, cloud DR solutions edge past the traditional DR solutions by a margin.

In conclusion, while it is important to protect the organization’s business in case of disaster, adopting the right recovery strategy is equally crucial. A common starting point for many organizations is establishing a physical DR policy. 

However, upgrading a company’s DR strategy to the cloud may protect the enterprise from ruin with better recovery options. Some organizations opt for a 3-tier approach that involves two local sites and a third disparate site elsewhere. This 3-tier approach can only be accomplished by leveraging cloud platforms.

Thus, with diverse BC/DR questions posed by organizations of various types, cloud-based disaster recovery platforms inevitably crop up in the answer. Cloud DR solutions are better suited than conventional DR solutions because they offer:

    • Continuous system availability irrespective of private, public, or hybrid cloud framework
    • A lower total cost of ownership 
    • Flexibility to optimize business continuity

It is difficult to predict where technology is headed with its current development and growth, but it seems clear that on-premises DR solutions are now seen as a precursor to cloud-based DR solutions.

Case Study: 5 Disaster Recovery Examples 

Disaster recovery is easy to explain in the abstract but harder to understand at the implementation level, in real-world scenarios. Here are some DR examples in which organizations have taken creative approaches to dealing with disasters:

1. Womble Carlyle Sandridge & Rice PLLC, Winston-Salem, N.C.

About the company: Womble Carlyle is one of the largest law firms in the mid-Atlantic and the Southeast, with approximately 450 lawyers. The company was founded in 1876.

Challenge: The firm includes hundreds of laptop-toting attorneys who travel across the globe and expect to get the same level of reliability as with desktops. When there are issues with a laptop, the IT staff has to step in and immediately restore its functionality.

In the past, the IT staff responded to a laptop crash by talking the attorney through the restoration process via phone or by sending a CD or a new hard drive overnight. Such a process generally took hours to restore the laptop. To avoid the time spent on the recovery task, the firm wanted a faster way that reduces the productivity loss for its lawyers.

Technology: To solve the problem, the firm’s IT department considered several options, including homegrown backup procedures and backup products from Veritas Software Corp. However, the IT staff ultimately decided to use a PC or laptop backup technology called Connected TLM from Connected Corp. in Framingham, Mass.

Compared to the simple backup/restore programs, Connected was able to provide a comprehensive backup of all files, which included registry files, data files, browser favorites, etc. All this was done in the background, so the end-users didn’t even know if it was happening and could essentially carry on with their critical tasks.

After a pilot test, the Connected system was deployed on 400 laptops and 50 desktops.

Result: Laptop restorations that used to take four to six hours now took 45 to 60 minutes. With Connected, Microsoft Word was restored to its original state within eight minutes.

Womble Carlyle says that Connected’s backup is more storage-efficient than anticipated: after backing up the operating system and standard applications once, it only backs up the files that are different on each machine, which reduces the storage the backups require.
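The general idea of storing shared files once and backing up only what differs per machine can be sketched with content hashing. This is an illustrative technique under stated assumptions, not a description of how Connected TLM works internally. The function names and sample file contents are hypothetical.

```python
import hashlib
from typing import Dict, Set


def content_hash(data: bytes) -> str:
    """Identify file content by its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def files_to_back_up(machine_files: Dict[str, bytes], stored_hashes: Set[str]) -> Dict[str, bytes]:
    """Return only the files whose content is not yet in the backup store.

    stored_hashes is updated in place so that later machines skip
    content that has already been backed up.
    """
    new_files = {}
    for path, data in machine_files.items():
        digest = content_hash(data)
        if digest not in stored_hashes:
            new_files[path] = data
            stored_hashes.add(digest)
    return new_files


if __name__ == "__main__":
    store: Set[str] = set()
    laptop_a = {"os/kernel": b"base image", "docs/brief.txt": b"case notes A"}
    laptop_b = {"os/kernel": b"base image", "docs/memo.txt": b"case notes B"}
    print(sorted(files_to_back_up(laptop_a, store)))  # both files are new
    print(sorted(files_to_back_up(laptop_b, store)))  # only the file unique to laptop_b
```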

2. Corio Inc., San Carlos, Calif.

About the company: Corio, which has since been acquired by IBM, is an application service provider that delivers enterprise software over a secure global network for a fixed fee.

Challenge: Corio manages mission-critical data for its customers, which requires real-time security for event monitoring on a per-customer basis.

Technology: Corio uses Counterpane Internet Security Inc. (acquired by BT) as its managed security provider. The firm also brought in software from ArcSight Inc., which monitors and correlates a wide range of security devices, including firewalls and intrusion-detection systems, while also providing reports.

Result: ArcSight software provided a window into the organization’s environment at a central console. It also provided customer-specific views on the console. Corio Inc. says that each customer’s traffic has a signature, a pulse, and ArcSight allowed them to look for anomalies too.

According to Corio Inc., ArcSight collected data from numerous security devices and offered potential labor savings in doing so. “I’d have to have an army of people [to monitor] all of the logs from sensors,” a Corio Inc. representative said. In addition, ArcSight provided high-level executive reports on security activity to its customers.

3. American Tower Corp., Boston

About the company: The company builds, owns, and operates towers for cellular phone companies and has about 14,400 sites in the U.S., Mexico, and Brazil, including about 300 broadcast tower sites.

Challenge: American Tower is an unpopular company because there are so many opponents to building towers, so the goal of the IT staff is to keep hackers, critics, and competitors out of its systems.

Technology: Instead of waiting for vendors to post signature files for new hacker attacks and cleaning up after virus attacks, American Tower wanted something that would stop intruders before they could get in at all. So, they turned to StormWatch software from Okena Inc. in Waltham, Mass. Unlike software that relies on attack signatures, StormWatch focuses on the behavior of critical applications. 

Result: The software has been in production use for several months at a cost of $18,000, and “It’s amazing, the things it has stopped,” American Tower says. According to American Tower, “Most software detects. This software detects and prevents.” The company added that the StormWatch reports list all of the hacker attacks that have been rebuffed.

According to American Tower, the CPU performance hit from StormWatch has been minimal, at just 2% of CPU utilization.

4. Baltimore-based automobile insurance company

Challenge: After a bad experience with their prior disaster recovery provider, a Baltimore-based automobile insurance company sought a new disaster recovery provider who would give them the time and attention they needed. With a high volume of sensitive data processed on their server, the security of their data was a focal concern. The company wanted to secure its in-house code so that it allowed for easy and consistent data transmission and managed its data through a protected outlet. The company also requested frequent tests of its disaster recovery plan.

Solution: The company obtained DP Solutions’ disaster recovery services, including 24x7x365 tech support and a disaster recovery strategy they could depend on along with new hardware for their network servers. DP Solutions also helped the client create their own network and assisted with upgrading them to an enterprise-class network by working closely with their team to understand the structure of their in-house code and understanding their ideal server infrastructure.

Technology: With DP Solutions’ disaster recovery services, the organization receives fully-managed backups and offsite replication of their data. DP Solutions provided them with new switches, firewalls, malware protection, and a Storage Area Network (SAN), making several internet connections accessible to their users. The organization also acquired an enterprise-class network to better allow server permanence and regulate the server so that a single switch failure would not cause a network crash or another switch to fail.

Results: The company now has an entire site protected in the event of a disaster, as well as dependable network infrastructure. DP Solutions’ services have helped increase the availability and uptime of server applications at their corporate headquarters.

5. Gaille Media

Background: In August 2017, Hurricane Harvey hit Southeast Texas, ravaging homes and businesses across the region. Over four days, some areas received more than 40 inches of rain. And by the time the storm cleared, it had caused more than $125 billion in damage.

Challenge: Countless small businesses were devastated by the hurricane. Gaille Media, a small Internet marketing agency, was almost one of them. Despite being located on the second floor of an office building, Gaille’s offices flooded when Lake Houston overflowed. The flooding was so severe that nobody could enter the building for three months. And when Gaille’s staff were finally able to enter the space after water levels receded, any hopes for recovering the space were quickly crushed: the office was destroyed, and mold was rampant. The company never returned. 

Solution: However, its operations were hardly affected. This was because Gaille kept most of its data stored in the cloud, allowing staff to work remotely through the storm and its aftereffects. Even with the office shuttered, they never lost access to their critical documents and records. In fact, when the time came to decide where to relocate, the owner ultimately decided to keep the company decentralized, allowing workers to continue working remotely. Had the company kept all its data stored at the office, the business might never have recovered.

Top 8 Best Practices for Disaster Recovery in 2021 

Disaster recovery is an essential part of keeping data safe and maintaining business continuity. Each organization and its business are different, so it is important to understand all the choices available. This allows organizations to pick the plan that best suits their needs. The top eight best practices for disaster recovery in 2021 are:

1. Backup-as-a-service (BaaS)

Backup-as-a-service is similar to backing up data at a remote location, where a third-party provider backs up an organization’s data. This is the simplest type of disaster recovery and entails storing data off-site or on a removable drive. However, just backing up data provides only minimal business continuity help because the IT infrastructure itself is not backed up.

2. Cold site

In this type of disaster recovery, an organization sets up basic IT infrastructure in a secondary location, which is rarely used but provides a place for employees to work after a natural disaster. It can help with business continuity because daily operations can continue. However, it does not provide a way to protect or recover important data, so a cold site must be combined with other disaster recovery methods.

3. Cloud-based disaster recovery

When using a cloud-based approach, an organization can cut costs by using a cloud provider’s data center as a recovery site, rather than spending on its own data center’s facilities, personnel, and systems. Before committing to this method, the organization needs to determine the challenges providers may face with its business backup and recovery. The provider may be able to assist the organization in fixing those problems before the cloud becomes a part of its DR plan.

4. Data center disaster recovery

The physical elements of a data center can protect data and contribute to faster disaster recovery in certain types of disasters. For instance, fire suppression tools will help data and computer equipment survive a fire and recover from it without much damage. A backup power source will help businesses sail through power outages without grinding operations to a halt. However, none of these physical disaster recovery tools help in the event of a cyberattack.

In this approach, the DR plan is not limited to the computing facility it is housed in; the entire building plays a key role. In a data center DR, all the features and tools within the building, such as physical security, support personnel, backup power, HVAC, utility providers, and fire suppression, affect the recovery strategy. 

In an unexpected outage event, these elements within the building must be in working order and functioning as in normal scenarios. However, it is important to highlight that even if everything is functioning correctly, the data center can still be susceptible to a natural disaster.

5. Disaster recovery-as-a-service (DRaaS)

While disaster recovery-as-a-service (DRaaS) is often based in the cloud, it is not exclusively cloud-based. Some DRaaS providers offer their solutions as a site-to-site service, where they host and run a secondary hot site. 

Additionally, providers can rebuild and transport servers to an organization’s site as a server replacement service. On the other hand, cloud-based DRaaS enables users to orchestrate failback to rebuilt servers and reconnect users through VPN or remote desktop protocol.

In the event of a disaster or ransomware attack, a DRaaS provider moves an organization’s computer processing to its own cloud infrastructure, allowing a business to continue operations seamlessly from the vendor’s location, even if an organization’s servers are down. 

DRaaS plans are available through either subscription or pay-per-use models. There are pros and cons to choosing a local DRaaS provider. For example, DRaaS servers that are closer to an organization’s location offer lower latency after a transfer. However, in the event of a widespread natural disaster, a nearby DRaaS provider may be affected by the same disaster.

6. Hot site

A hot site maintains up-to-date copies of data at all times. Hot sites are time-consuming to set up and more expensive than cold sites, but they dramatically reduce downtime.

7. Point-in-time copies and instant recovery

Point-in-time copies, also known as point-in-time snapshots, make a copy of the entire database at a given time. Data can be restored from this backup, but only if the copy is stored off-site or on a virtual machine unaffected by the disaster.

Instant recovery is similar to point-in-time copies, except that instead of copying a database, instant recovery takes a snapshot of an entire virtual machine.
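The snapshot-and-restore idea can be sketched with a small in-memory example in Python. The `SnapshotStore` class is hypothetical and keeps its copies in the same process purely for illustration. As noted above, a real point-in-time copy only helps if it is stored somewhere the disaster cannot reach.

```python
import copy
from datetime import datetime, timezone


class SnapshotStore:
    """A toy data store that can capture and restore point-in-time copies."""

    def __init__(self) -> None:
        self.data = {}
        self._snapshots = []  # list of (timestamp, deep copy of data)

    def snapshot(self) -> int:
        """Capture the full current state and return the snapshot's id."""
        self._snapshots.append((datetime.now(timezone.utc), copy.deepcopy(self.data)))
        return len(self._snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        """Replace the current state with the state captured in the given snapshot."""
        _, saved = self._snapshots[snapshot_id]
        # Copy again so later changes do not alter the stored snapshot.
        self.data = copy.deepcopy(saved)


if __name__ == "__main__":
    store = SnapshotStore()
    store.data["orders"] = [1, 2]
    snap = store.snapshot()
    store.data["orders"].append(3)  # change made after the snapshot
    store.restore(snap)
    print(store.data)  # state as of the snapshot
```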

8. Virtualization disaster recovery

Virtualization negates the need to reconstruct a physical server in the event of a disaster. The organization may also be able to achieve the targeted recovery time objectives (RTO) more easily by placing a virtual server on reserve capacity or the cloud. 

Organizations can back up certain operations, data, or even a working replica of their entire computing environment on off-site virtual machines, which will remain unaffected by physical disasters. Using virtualization as part of a disaster recovery plan also allows businesses to automate some disaster recovery processes, bringing everything back online faster.

For virtualization to be an effective disaster recovery tool, frequent transfer of data and workloads is essential, as is good communication within the IT team about how many virtual machines are operating within an organization.

Takeaway

Disaster recovery is an area of security planning that aims to protect an organization from the effects of unexpected catastrophic events, such as power outages, cyberattacks, civil emergencies, criminal or military attacks, and natural disasters. Having a disaster recovery strategy enables an organization to maintain or quickly resume its business-critical functions following a disruption.

Creating a comprehensive disaster recovery plan is difficult, but it is not impossible. Find which approach is the right fit for your organization to protect your data from cyberattacks, natural disasters, and simple human error.
