The disaster recovery (DR) mandate in the IT community has long been one of “good enough.” “If we replicate data to a remote location, that is good enough.” “If we buy the DR package from our storage area network (SAN) vendor, it will be good enough.” These options had to be good enough, since few practical alternatives existed. IT administrators settled for access to remotely copied data, since actually implementing comprehensive DR was too complex for most storage and data protection solutions.
That is no longer the case. Organisations need to move beyond the “good enough” paradigm and adopt practices that will truly help them recover full IT business operations after a disaster – primarily from their customers’ perspective.
Any organisation that has had to recover operations after a failure using only copied data volumes knows the drawbacks of that strategy. Without business application recovery, enterprises cannot meet their service level agreements. Instead, they should look for service-oriented data protection that combines remote replication with intelligent awareness of all IT service components: applications, operating systems, servers, networking and storage. When enterprises have reliable recovery automation, they can restore each component of a service in the correct sequence and within a reasonable amount of time, achieving true automated DR.
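To make the idea of restoring components "in the correct sequence" concrete, here is a minimal Python sketch. The dependency map is an illustrative assumption, not taken from any particular product: it simply says which components must be back before another can be restored, and a topological sort turns that into one valid recovery order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each component lists what must be
# restored before it. Real environments will differ.
DEPENDS_ON = {
    "storage": [],
    "networking": [],
    "server": ["storage", "networking"],
    "operating_system": ["server"],
    "application": ["operating_system"],
}


def recovery_order(depends_on):
    """Return components in an order that restores every dependency first."""
    return list(TopologicalSorter(depends_on).static_order())


for step, component in enumerate(recovery_order(DEPENDS_ON), start=1):
    print(f"{step}. restore {component}")
```

Storage and networking have no dependency on each other, so either may come first; the sketch only guarantees that nothing is restored before the components it depends on.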
Where replication excels (and where it fails)
Data replication is a critical requirement of DR. Keeping mission-critical information off-site is a good insurance policy for any business. In the event of a technological or natural disaster, accessing that data will be among IT’s first tasks in bringing business applications back online. However, those off-site data volumes do not automate or replace the numerous other jobs IT staff must perform before the enterprise can resume IT business operations. Instead, data centre staff are forced to take on the complex manual work of rebuilding servers, installing operating systems and applications, configuring networking, assigning storage volumes, and making sure everything starts properly and in the correct order. The industry might use the terms “data replication” and “disaster recovery” interchangeably, but frustrated and stressed IT teams know that these are very different things.
In order to implement a true DR solution that will ensure the full restoration of business applications and services, IT must first adopt the service-oriented data protection model. Service-oriented data protection requires thinking about each IT service and what it contains. A basic Web portal, for instance, is not simply LUN 32; it consists of an Apache server, a SQL database, a content management application and other elements. IT teams need to manage the Web portal service as one integrated unit to be protected in its entirety. In order to do this, data protection must be application-aware, meaning it must know how to copy data from a database or e-mail application in a system- or state-coherent manner. When the time stamps for all key components are the same and the data is properly collected, the entire service can be moved or recovered without error or corruption.
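The coherence requirement described above can be sketched in a few lines of Python. Everything here is hypothetical: the `ComponentSnapshot` type, the component names and the timestamps are illustrative assumptions, and a real application-aware product would do far more than compare timestamps. The sketch only shows the basic check that all components of one service were captured at the same point in time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ComponentSnapshot:
    """One protected component of a service (hypothetical model)."""
    component: str
    taken_at: datetime


def is_coherent(snapshots):
    """True only when every snapshot carries the same timestamp.

    An empty set of snapshots is treated as not coherent.
    """
    return len({s.taken_at for s in snapshots}) == 1


# Illustrative timestamps for the Web portal example.
t0 = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 2, 5, tzinfo=timezone.utc)

web_portal = [
    ComponentSnapshot("apache", t0),
    ComponentSnapshot("sql_database", t0),
    ComponentSnapshot("content_management", t0),
]
drifted_portal = web_portal[:2] + [ComponentSnapshot("content_management", t1)]

print(is_coherent(web_portal))      # all components share one timestamp
print(is_coherent(drifted_portal))  # one component was captured later
```

Treating the portal as a list of snapshots, rather than as a single storage volume, mirrors the point that the service is the unit of protection.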
Once IT is dealing at the service level and implementing data protection systems that map to the way managers actually run IT operations for the entire enterprise, teams can begin to implement data backup, replication, and recovery jobs on a service-by-service basis. Service-oriented data protection is a big idea, and IT cannot redesign all of its data protection systems overnight. However, enterprises can move towards this ideal in discrete, logical steps.
First, IT teams should focus on replacing their “good enough” replication-focused DR with an automated, service-aware approach. In doing so, businesses should seek out a system that supports both physical and virtual servers, works comfortably with any type of storage or networking, and is able to convert physical servers into virtual servers. The best options are open, integrated architectures that support heterogeneous environments and eliminate vendor lock-in. Better DR will let IT define discrete recovery jobs that map to business applications and high-priority elements. Stakeholders also need the ability to conduct non-disruptive testing before finalising DR plans, so that they can verify their recovery jobs and gain confidence in them before an actual disaster or failure occurs.
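As a rough illustration of a discrete recovery job that can be exercised without touching production, consider the Python sketch below. The `RecoveryJob` and `RecoveryStep` classes and the step names are hypothetical, and a dry-run flag is only a stand-in for real non-disruptive testing, which would typically bring the service up in an isolated environment.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryStep:
    """A single named action within a recovery job (hypothetical model)."""
    name: str
    action: Callable[[], None]


@dataclass
class RecoveryJob:
    """Ordered recovery steps for one business service."""
    service: str
    steps: List[RecoveryStep] = field(default_factory=list)

    def run(self, dry_run=True):
        """Run the steps in order; with dry_run=True nothing is executed."""
        log = []
        for step in self.steps:
            if dry_run:
                log.append(f"would run: {step.name}")
            else:
                step.action()
                log.append(f"ran: {step.name}")
        return log


# Placeholder actions; a real job would call provisioning tooling here.
job = RecoveryJob(
    service="web-portal",
    steps=[
        RecoveryStep("assign storage volumes", lambda: None),
        RecoveryStep("rebuild server", lambda: None),
        RecoveryStep("start applications", lambda: None),
    ],
)

for line in job.run(dry_run=True):
    print(line)
```

Defaulting `dry_run` to `True` is a deliberate choice in this sketch: a rehearsal should be the easy path, and an actual failover should require an explicit decision.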
Building DR that exceeds “good enough” expectations
The core work of DR isn’t about replicating data, although that is certainly vital work. Rather, true DR performs the complex tasks of rebuilding servers, reinstalling operating systems and applications, reconfiguring networks and reassigning storage volumes. Today, most IT teams are forced to do that work manually, since their “good enough” DR strategies don’t incorporate automation of such work. When these efforts are performed manually, organisations suffer; manual DR is slower, more difficult and more prone to error.
Now that service-oriented data protection can deliver true DR with automated disaster recovery testing, cloning, failover and failback of data centre operations, IT teams have an attractive alternative to the faux DR for which they previously settled. Mere replication is not equivalent to DR, and enterprises that are serious about business continuity are moving beyond “good enough.”