A Storage Technology Blog by a Technologist


Mirror Mirror on the Wall, who's the fairest of them all…

October 2nd, 2008 by Steven Schwartz
[Image: FedEx Ground delivery truck, via Wikipedia]

Local and Remote Data Protection (NOT a RAID discussion)

 

     Ever since building out the BCDR service at StorageTek, I've been extremely interested in replication technologies.  Off-site data storage is nothing new for IT infrastructure; what varies is how the various vendors accomplish it and how the terminology around those technologies is defined.  I'm going to define technology terms ranging from real-time data transfer through traditional tape vaulting.  Why do I even bother?  Mainly because I think there is a knowledge gap in this area among end-users, including the impact of site-to-site networking requirements.

 

     The following are the basic general approaches to getting data off-site.  I am listing them in RPO (Recovery Point Objective) order, from least potential data loss to most.

 

  1. Synchronous Mirroring
  2. Asynchronous Mirroring
  3. Delta-Based Replication
  4. Tape/Disk Vaulting/FedEx Truck Method

     I will define and explain my interpretation of these technologies, along with some of their caveats.

 

Synchronous Mirroring

    

     The idea of mirroring data really falls into two areas: Synchronous and Asynchronous.  Synchronous Mirroring requires two distinct data sets to be transactional at the same commit phase.  What does this mean?  If you follow a SCSI commit model for a write IO, it means that a data set needs to be confirmed and committed on both sides of the Mirror before the application can move forward.  There are a few ways to achieve this from a storage infrastructure point of view.  From an RPO perspective, it ensures that two unique data copies exist that are perfect copies of each other.  Every transaction is protected.

 

     One way, which allows use of pretty much any storage, is a host-based software package such as Veritas Foundation Suite or DoubleTake.  These applications, in simple terms, create a split-write driver layer in the disk device stack.  This driver allows an application IO to be written down to two different storage devices, and to ensure a synchronous state, both devices must respond that they have received and committed the data set.
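     To make the split-write idea concrete, here is a minimal Python sketch of the synchronous contract.  It is not any vendor's actual driver; the device objects and their write_and_commit method are hypothetical stand-ins:

    import threading

    def synchronous_write(block, primary, mirror):
        # Write one block to both devices and acknowledge the application
        # only after BOTH report a committed write (the synchronous contract).
        results = {}

        def commit(device, name):
            results[name] = device.write_and_commit(block)  # blocks until committed

        threads = [threading.Thread(target=commit, args=(dev, name))
                   for dev, name in ((primary, "primary"), (mirror, "mirror"))]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # application-visible latency is the SLOWER of the two commits

        if not all(results.values()):
            raise IOError("one side failed to commit; the mirror is now out of sync")
        return True  # only now may the application issue its next dependent IO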

 

     A second way, which allows flexibility in a SAN or NAS environment, would be to utilize an In-Band Appliance, or a storage-service-enabled FC switch, for mirroring.  In this case, it is no longer the burden of the OS to handle these storage services, minimizing the impact on the server hardware.  Products such as IBM's SVC or FalconStor's Network Storage Server are examples of In-Band appliances.

 

     A third option is utilizing features within a SAN or NAS storage device.  Many vendors offer Synchronous Mirroring as an option within the storage device, some as simple as mirroring within the same array, others allowing full site-to-site mirroring.  Examples include EMC's SRDF, IBM's PPRC, and NetApp's SyncMirror.

 

     So what are the CONs of a Synchronous Mirror?  Because on the surface, there appear to be many technology options and an RPO that looks great!  The first CON is site distance limitation.  As you move sites farther apart, the reality of Synchronous Mirroring goes away: with increased distance comes increased latency, which in a transactional IO application can mean huge performance impacts.  Also, the costs associated with the point-to-point network typically required for Synchronous Mirroring are still prohibitive.  Corruption and viruses need to be looked at as well.  In a Synchronous Mirror, a corruption in production means a corruption at the secondary site; the same goes for viruses and data deletions.  Lastly, just because the data is there doesn't mean it is accessible: databases will have to go through recovery processes, unless mirroring is being done at the database application level.  An additional requirement, which may be considered a CON, is that identically performing storage configurations are typically required on both sides of the mirror; if the secondary storage system is slower to respond, it will reduce the performance of the primary data and application.
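     To put rough numbers on the distance problem, here is a back-of-the-envelope sketch.  It assumes light in fiber covers about 200 km per millisecond and a hypothetical 0.5 ms local commit time, and it ignores switch and array overhead:

    def sync_write_latency_ms(distance_km, local_commit_ms=0.5):
        # Every synchronous write pays a network round trip of
        # 2 * distance / 200 ms before the application may continue.
        return local_commit_ms + (2 * distance_km) / 200.0

    for km in (10, 100, 1000):
        per_write = sync_write_latency_ms(km)
        print(f"{km:>5} km: {per_write:6.2f} ms per write, "
              f"~{1000 / per_write:,.0f} serialized writes/sec at best")

     At 1,000 km a single-threaded writer is down to roughly 95 writes per second, which is why synchronous mirrors rarely stretch beyond metro distances.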

 

Asynchronous Mirroring

    

     Similar to Synchronous Mirroring, Asynchronous Mirroring maintains a near real-time copy of the data set.  However, there is a HUGE difference between these technologies, the biggest being that the remote/second copy of the data DOES NOT need to be committed for the primary data and application to move forward with IO.  This means that, in theory, latency to the second data set, including network latency, should have little to no impact on the primary data set and application performance.
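     Here is a minimal sketch of that contract, using the same hypothetical device objects as the synchronous example above (real products add ordering guarantees, batching, and write coalescing):

    import queue

    replication_queue = queue.Queue(maxsize=100_000)  # buffers writes while the WAN lags

    def asynchronous_write(block, primary):
        primary.write_and_commit(block)   # commit locally only
        replication_queue.put(block)      # the remote copy happens later
        return True                       # the application continues immediately

    def replicator(mirror):
        # Meant to run on a background thread: drains queued writes toward
        # the remote site in order.  If the queue overflows during a long
        # outage, many products fall back to a full (or delta)
        # resynchronization of the mirror.
        while True:
            mirror.write_and_commit(replication_queue.get())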

 

     So what kinds of solutions work for this type of mirroring?  Believe it or not, pretty much the same ones as for Synchronous Mirroring.  However, the costs associated with asynchronous mirroring are typically lower, including the planning and services involved to get this type of data protection up and running.  Additionally, mid-range storage devices typically offer this as a feature set as well.

 

     There are of course negative implications to using asynchronous mirroring as well.   The same risks of replicating data corruption are present; typically the secondary site is only several transactions behind.  There may need to be a caching mechanism to store transactional data while the network is running with higher latency, and in the event of network failure, complete recreation of the mirror might be needed.  However, the benefits can outweigh the CONs.  The ability to reduce network costs, secondary-site storage costs, and storage feature costs (for most non-financial institutions) far exceeds the potential loss of data (a longer RPO).

 

Delta Based Replication

 

     This method of data transfer is much different from mirroring technologies.  The premise of delta-based replication is that changed and new data is stored locally at the primary site until either a watermark or a time schedule is hit; those writes are then transmitted as a batch to the remote storage location.  Some vendors also include compression and acceleration as part of their delta-based replication models.  The premise is that the RPO requirement is much looser, networks are still very expensive, and delta-based replication can augment traditional off-site storage of tape backups.
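     A minimal sketch of the watermark/schedule idea follows; the thresholds and the remote.send_batch call are hypothetical:

    import time

    changed_blocks = {}        # block address -> latest contents (rewrites coalesce)
    WATERMARK = 10_000         # flush once this many distinct blocks are dirty...
    INTERVAL = 300             # ...or every 5 minutes, whichever comes first
    last_flush = time.monotonic()

    def record_write(addr, data):
        changed_blocks[addr] = data   # only the newest version of a block is kept

    def maybe_flush(remote):
        global last_flush
        if len(changed_blocks) >= WATERMARK or time.monotonic() - last_flush >= INTERVAL:
            remote.send_batch(dict(changed_blocks))  # one (often compressed) WAN batch
            changed_blocks.clear()
            last_flush = time.monotonic()

     Note that a block rewritten a hundred times between flushes crosses the WAN only once, which is where the bandwidth savings over mirroring come from.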

 

     While delta-based replication has some serious limitations from a transactional business application perspective, for many business applications this method works perfectly.  VMware's SRM functionality will also work with this type of DR strategy.  Typically, the remote copy of the data is on a different class of storage, and replication can be accomplished through backup software, deduplication software/appliances, in-band and out-of-band storage appliances, or a native tool within a storage array.  Flexibility and affordability really are the top positive attributes of delta-based replication.  Distance from primary to secondary site also becomes irrelevant; there are customers using delta-based replication across continents today.

 

    So what are the negative impacts of this technology?  Data recovery and recovery point objectives will suffer.  Planning around network utilization is often overlooked because the network requirements are not as stringent.  The ability to recover data at the secondary site is often overlooked as well, as is the ability to restore that data back to the primary site.  While data corruption can be transmitted to the remote copy, it can typically be caught, and if the technology being deployed uses remote snapshot technology, recovering from a replicated corruption is typically easier.
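    A rough sketch of why remote snapshots help here; the snapshot API and retention count are hypothetical:

    snapshots = []   # oldest first; each is a point-in-time image of the remote copy

    def take_snapshot(remote_volume, keep=24):
        # Taken on a schedule (e.g. hourly) at the secondary site.
        snapshots.append(remote_volume.point_in_time_copy())
        if len(snapshots) > keep:
            snapshots.pop(0)   # age out the oldest image

    def recover_from_corruption(is_clean):
        # Walk backward to the newest image that predates the corruption.
        for snap in reversed(snapshots):
            if is_clean(snap):
                return snap
        raise RuntimeError("no clean snapshot retained; fall back to tape")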

 

Remote Tape Vaulting (FedEx Truck method)

 

    Typically, people don't use FedEx for sending tapes off-site anymore.  There are plenty of truck transport companies that allow for secure transfer of tapes (although there is regularly a report in the news of tapes misplaced or stolen).  I've actually even seen local governments store tapes in the evidence room of a police station across town from the city's data center, and have them picked up by the policemen on duty at the beginning of the business day.  This method is FAR from over.  Tape has yet to be killed.

 

     So what is the benefit of tape?  In almost every case, the actual cost of tape media is far less than that of active disk.  Tape allows long-term archiving of data that is no longer active.  Tape uses no electricity when sitting on a shelf!  This is about where the benefits of tape end.

 

     So what are the problems with tape?  Anyone who has had to recover a data set from a tape, or a group of tapes, knows how scary that is.  Even with current technologies that double-check media readability, I never feel “ok” until the data is all restored.  Tapes, because of their portability, can be stolen, misplaced, lost, or destroyed.

 

     Clearly this post was much longer than I intended it to be; I'll most likely revisit some of my thoughts here over the next several days.

 

     Comments always welcome.

 

 

 


Posted in Backup and Recovery, Enterprise, SRM | 4 Comments »

  • http://opensystemsguy.wordpress.com/ open systems storage guy

    Another issue to be considered- when you have any disk based replication, it’s being replicated on the disk level and is not usually “application aware”. This means that if you experience a power failure at the primary site between a data write and a log write on a database file in a synchronous environment, there’s a fairly good chance your remote copy of the database won’t boot without some intervention. Typically the answer to this has been periodically running a script which would flush the app write buffer to disk, flush the writes to the remote site, and then take a “consistent” point in time copy. This turns your recovery point into the time of the last script run.

    There are ways around this- some people do database replications once a day and then just replicate the logs synchronously. This keeps your recovery point low, but increases your recovery time. Also, some vendors offer addons that will manage the scripting process for you.

  • Geoff Mitchell

    Steve,
    Surprised you didn’t include Continuous Data Protection as another option here. I know this falls somewhere on (or between) asynchronous and delta based, but provides some advantages, especially in the area of having to deal with replicated issues from prime to secondary site.

    EMC works around the distance limitation of SRDF by doing a synchronous local hop followed by an asynchronous distance hop, which is great if you can afford three DMXen.

  • Ryan

    Double-Take is in the asynchronous category rather than the synchronous. The Double-Take driver in the I/O stack takes note of writes to disk and passes those operations up to the Double-Take service, which in turn handles the network communication, including all the things needed for asynchronous reliability like queuing, bandwidth management and integration with snapshot technology to address replication of data corruption.

  • http://wwwjoe-daddy.blogspot.com/ Joe Daddy

    Interesting Blog you have Steve, I’ll be dropping by more often. Impressive work.

    Joe Daddy