A Storage Technology Blog by a Technologist

SEARS® Guarantees 40:1 or better garbage to trash bag ratio!

October 9th, 2008 by Steven J. Schwartz

     In recent news, and in reaction to H-D’s announcement of a guaranteed 50% tire reduction, SEARS® released claims that using its trash compactors will reduce the number of trash bags a given household fills.  There are, of course, some limitations and requirements.  First, this will be measured over 30 days.  Second, if you currently fill only a single bag of trash a week, you must increase your trash output to at least 5x that amount.  Last, you must be throwing out certain types of trash.

 

 

    I know I’m cheating here, taking my idea from before and playing it out again, but give me a break guys!  Your customers aren’t this naive!

 

In similar news, SEPATON released its own “quarantine”:

 

SEPATON guarantees that the FastStart Plus Deduplication Package will reduce the capacity of backup data at a 40:1 ratio under the following conditions: Guarantee only applies to currently GA backup applications with Microsoft Exchange 2003 and 2007 Agent, Oracle 10 and 11 using flat files or RMAN. The package must be installed and configured by a SEPATON Professional Services representative. The customer must follow SEPATON best practices including performing full backups of applicable data at least five times per week for thirty days

 

Now to be fair, SEPATON is doing much more than just compression, but requiring customers to go from the traditional practice of weekly full backups to daily full backups is NOT a fair requirement.  I am beginning to question the marketing firms and marketing teams that are working on these programs.  There are times in our industry when it feels like the vendors believe that our customers are stupid…that’s right, I said it.  I would like to hear from customers, end-users, and IT professionals on this…is anyone buying into these marketing programs at the worker level?  Or are these programs being pushed on you by executives who don’t understand technology?
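To see why backup frequency matters so much to a headline ratio, here is a minimal sketch in Python. It is not SEPATON's method: the 1% daily change rate, the function name and the simple model (every full backup is the same size, every changed block is brand new, no compression) are all assumptions made for illustration.

```python
def dedup_ratio(fulls_per_week: float, days: int = 30,
                daily_change_rate: float = 0.01) -> float:
    """Logical data written divided by unique data kept, for one data set.

    Assumes every full backup is the same size and that each day changes a
    fixed fraction of the data into never-before-seen blocks.
    """
    full_backups = days * fulls_per_week / 7     # fulls taken in the window
    logical = full_backups * 1.0                 # in units of one full backup
    unique = 1.0 + daily_change_rate * days      # first full plus all changes
    return logical / unique


weekly = dedup_ratio(fulls_per_week=1)
five_per_week = dedup_ratio(fulls_per_week=5)
print(f"weekly fulls:      {weekly:.1f}:1")
print(f"five fulls a week: {five_per_week:.1f}:1")
```

In this model the data and the change rate are identical in both runs; the only thing that moves the ratio is how many redundant full backups get written.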

 

 

 


Posted in Backup and Recovery, Deduplication, Start-up | 4 Comments »

Dumb and Dumber – Why don’t we have intelligent deduplication?

October 7th, 2008 by Steven J. Schwartz

     So I’ve been following the deduplication market and its products for a good amount of time, and what amazes me is the lack of thought behind today’s solutions and the recent blind adoption of deduplication on primary data sets.  Now, I’m not referring to the application-level deduplication that I’ve discussed before, but to appliance or storage based deduplication products.

 

     The fundamental problem I see with primary volume deduplication on “live” data sets is the complete lack of intelligence in the deduplication service.  It doesn’t matter how often a data set is read, or by how many applications; current products treat all data the same.  So, what is my primary concern?  “Hot spots” on the disk sub-systems.  These have long been a concern, and a reality, for databases, and will continue to grow as an application problem.
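A toy model makes the hot-spot concern concrete. The sketch below is not any vendor's implementation: the `DedupStore` class, the block contents and the 100-node scenario are all invented for illustration. It keeps one physical copy per unique block and counts how often each physical block is read.

```python
import hashlib
from collections import Counter


class DedupStore:
    """Toy block store that keeps one physical copy per unique block."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block contents
        self.volumes = {}         # volume name -> list of fingerprints
        self.reads = Counter()    # fingerprint -> physical read count

    def write(self, volume: str, data: list) -> None:
        fingerprints = []
        for block in data:
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)   # store only the first copy
            fingerprints.append(fp)
        self.volumes[volume] = fingerprints

    def read(self, volume: str) -> bytes:
        out = []
        for fp in self.volumes[volume]:
            self.reads[fp] += 1                 # every reader hits the same copy
            out.append(self.blocks[fp])
        return b"".join(out)


store = DedupStore()
os_image = [b"kernel", b"libc", b"shell"]
for node in range(100):
    # Each node shares the OS blocks and adds one block of its own config.
    store.write(f"node{node}", os_image + [f"config-{node}".encode()])

for node in range(100):                         # all nodes read at once
    store.read(f"node{node}")

hottest = store.reads.most_common(1)[0][1]
print(len(store.blocks), "physical blocks; hottest block read", hottest, "times")
```

The capacity saving and the contention come from the same place: the shared blocks are stored once, so every volume's reads land on the same physical location.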

 

     Where else have I seen disk hot spots recently?  NFSroot in the HPC environment, when clusters are configured with diskless nodes and a single boot image.  The idea is that most of a Linux OS consists of the same binaries, and that you should be able to boot an entire cluster off of a single image, plus some extra space each node needs for swap and configuration information.  Did anyone honestly think that some storage vendor (slide 6) or x86 based virtualization engine came up with the idea of a single boot image?

 

     The problem with this method, and the area that needs to be addressed, is how we handle moments of massive access against the same data set.  In the case of NFSroot, when a cluster is booting, or when applications change, loading the operating system over NFSroot puts a serious strain on the storage sub-system.


     So the primary problem with traditional storage systems is that they only have 2 levels of service for a given volume at a given time: they have cache, and they have the HDDs that the volume is configured on.  Cache is typically limited, due to cost and the need to maintain data state in the event of a power failure.  Locking a volume into a single type of HDD, however, is a thing of the past.  Several years ago SAM-FS was able to utilize its file system for HSM (Hierarchical Storage Management), also known as ILM (Information Lifecycle Management).  However, these technologies were really tied to applications for open systems, and typically offered one-way data movement, from expensive production quality disk to archive disk or active tape.  A few years ago FalconStor had something within the IPStor product called HotZone®, which created a virtual extended disk cache via RAM or SSD; it was more of a caching head than a full tiering solution.  Products like Dell | EqualLogic PS-series storage, with automated load balancing and storage tiering, and Compellent’s Data Progression™ give a new option for data access.  In both cases, when areas of data require higher levels of performance, these solutions have the ability to migrate that data to a higher class of storage, thus eliminating, or minimizing, hot spots within the data set.
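The general idea of moving busy data to a faster class of storage can be sketched in a few lines. This is only an illustration of the concept, not how EqualLogic or Compellent actually decide placement; the `plan_tiers` function, the extent names and the read counts are hypothetical.

```python
from collections import Counter


def plan_tiers(access_counts: Counter, fast_tier_slots: int) -> dict:
    """Place the most-read extents on the fast tier and the rest on the slow one."""
    hot = {extent for extent, _ in access_counts.most_common(fast_tier_slots)}
    return {extent: ("fast" if extent in hot else "slow")
            for extent in access_counts}


# Hypothetical read counts per extent over one sampling window.
counts = Counter({"extent-0": 9500, "extent-1": 40,
                  "extent-2": 7200, "extent-3": 15})
placement = plan_tiers(counts, fast_tier_slots=2)
print(placement)
```

A real system would also have to weigh write activity, migration cost and how quickly access patterns change, none of which this sketch attempts.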

 

      So, it may sound like I’m plugging products based on the previous paragraph; what I’m trying to point out, however, is that deduplication is prone to quickly cause hot spots within applications, and could be especially risky for virtualized OSs running on deduplicated storage.  Enhanced virtualization within the storage layer can help reduce these areas of contention.  So, what is my solution?

  • Products that offer deduplication services for primary production volumes will at some point need to address these hot spots.
  • As virtualization software continues to move towards single image storage (and deduplication at the application layer), storage vendors will have to keep up with the ever-changing storage performance requirements.
  • Deduplication intelligence needs to mature to the point where it is aware that some data, regardless of sameness, might still need to be excluded.
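One way to picture that last point is a deduplication pass that consults access statistics before collapsing identical blocks. The sketch below is purely hypothetical: the `place_blocks` function, the read counts and the threshold are invented, and no shipping product is claimed to work this way.

```python
import hashlib


def place_blocks(blocks, read_counts, hot_threshold):
    """Deduplicate cold blocks; give hot blocks a private copy per writer.

    blocks:       list of (owner, data) pairs
    read_counts:  dict mapping data -> observed reads (hypothetical input)
    Returns a dict mapping a physical key to the stored data.
    """
    physical = {}
    for owner, data in blocks:
        fp = hashlib.sha256(data).hexdigest()
        if read_counts.get(data, 0) >= hot_threshold:
            key = (fp, owner)       # excluded from dedup: one copy per owner
        else:
            key = (fp, None)        # shared: one copy for every owner
        physical[key] = data
    return physical


blocks = [(vm, b"boot-sector") for vm in ("vm1", "vm2", "vm3")]
blocks += [(vm, b"help-files") for vm in ("vm1", "vm2", "vm3")]
reads = {b"boot-sector": 50_000, b"help-files": 3}

stored = place_blocks(blocks, reads, hot_threshold=10_000)
print(len(stored), "physical blocks for", len(blocks), "logical blocks")
```

The heavily read block keeps three copies while the rarely read one collapses to a single copy, trading some capacity savings for less contention.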

     I would love to hear people’s thoughts on this!

 


Posted in Deduplication, SAN and NAS | 1 Comment »

Harley Davidson can Guarantee 50% Less Tires on the Road compared to Ford’s Traditional Cars and Trucks!

September 30th, 2008 by Steven J. Schwartz

 

     In recent news, and as an outstanding announcement for travelers everywhere, Harley-Davidson motorcycles (with some caveats) can guarantee a 50% reduction in tires in use on the road compared to traditional cars and trucks made by the Ford Motor Company.

 

 

 

 

 

Using the following math:

 

50% less tires compared to a baseline of traditional cars and trucks. The baseline is determined from the amount of tires touching the ground on a car/truck and the amount of tires that a vehicle of transport requires. For example, suppose that you need a vehicle to get you from point A to point B. Here’s how we calculate the baseline:

  • A default Harley Davidson motorcycle has only 2 wheels/tires; add on 100% overhead for 4-wheel ability, plus an additional spare tire.
  • Total tires required to get from Point A to Point B: 2. On a traditional auto this number is 4 tires plus a smaller spare tire.
  • 50% less tires means a traveler will only need to purchase 2 tires to get from Point A to Point B.

 

In Similar News, NetApp releases “50% Virtualization Guarantee*”

 

50% less storage compared to a baseline of traditional storage. The baseline is determined from the amount of data to be stored and the amount of storage overhead that a system of similar protection and performance levels typically requires. For example, suppose that you need a system to accommodate 10TB of data. Here’s how we calculate the baseline:
•  Add on 100% overhead for RAID 10 protection; 2.6% overhead for rightsizing and formatting; and two spare drives.
•  Total raw capacity required for 10TB of data on a traditional storage system is roughly 21.75TB.
•  50% less storage means that the customer will need to purchase only 10.75TB of raw space with NetApp.

Obviously, there are other significant savings with NetApp: reduction in hot spares, deduplication, etc.  I just thought it was funny that the baseline chosen for storage wasn’t another RAID6 based configuration, but a RAID10 deployment.
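How much the choice of baseline matters can be shown with some rough arithmetic. The sketch below makes several assumptions that are not in the NetApp text: the two overheads are compounded, the RAID 6 group is 6 data drives plus 2 parity drives, and spare drives are left out entirely because the quoted example does not give a drive size.

```python
def raw_capacity_tb(usable_tb, protection_overhead, rightsizing_overhead=0.026):
    """Raw capacity needed for usable_tb, before any spare drives are added."""
    return usable_tb * (1 + protection_overhead) * (1 + rightsizing_overhead)


usable = 10.0
raid10 = raw_capacity_tb(usable, protection_overhead=1.0)    # mirroring: 100%
raid6 = raw_capacity_tb(usable, protection_overhead=2 / 6)   # assumed 6 data + 2 parity

print(f"RAID 10 baseline: {raid10:.2f} TB raw")
print(f"RAID 6 baseline:  {raid6:.2f} TB raw")
print(f"Half of RAID 10:  {raid10 / 2:.2f} TB raw")
print(f"Half of RAID 6:   {raid6 / 2:.2f} TB raw")
```

Under these assumptions the RAID 10 baseline is roughly 20.5TB raw against roughly 13.7TB for RAID 6, so "50% less" is a much easier target when it is measured against the mirrored configuration.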

 

Also, that URL isn’t a typo, guarantee was in fact spelt wrong.

 

 


Posted in Deduplication, Enterprise, NOT SAN Related, SAN and NAS, virtualization | 14 Comments »

Follow-Up to Deduplication…

September 27th, 2008 by Steven J. Schwartz

     In the past several months I’ve written a couple of times about Deduplication, mainly with regard to my feeling that it is a feature and not a product, and more recently looking at NetApp’s implementation of A-SIS.  I also mentioned the announcement of a newer blog, DEDUPEMATTERS.com, which is run by Data Domain, here.


     It isn’t that I’ve been ahead of the curve with Deduplication; it just continues to come up as a checkbox in every storage discussion.  Why?  Mainly because of the exponential growth of storage, and storage retention requirements for both public and private companies (please ignore the current economic downturn in the United States; while it has a short-term impact, I believe long-term projections will still be accurate).

 

     So where am I going with this train of thought?  I honestly believe, as I’ve stated before, that Deduplication is a feature, and primarily a feature of backup storage/suites and server applications.  This is a personal belief for the following reasons:

 

  • The act of Deduplication on a data set, in both online and post-processing activities, is a compute and storage intensive process.
  • The act of data re-hydration, or re-duplication, is also an intensive process, and above all can have a storage capacity ballooning effect.
  • A data set which has been through Deduplication is forced into a consolidated format which, in certain instances, can cause disk hot spots and lower data access performance.
  • Current methodologies for Deduplication are based only on capacity savings, and not on the importance of data access or application performance.
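The first two points can be illustrated with a small, hedged sketch. It uses fixed 4 KB blocks and SHA-256 fingerprints purely as assumptions (real products differ in chunking and hashing), and the `deduplicate` and `rehydrate` names are made up for this example.

```python
import hashlib


def deduplicate(blocks):
    """Return (unique block table, recipe of fingerprints) for a list of blocks."""
    table, recipe = {}, []
    for block in blocks:
        fp = hashlib.sha256(block).hexdigest()   # fingerprinting is the CPU cost
        table.setdefault(fp, block)
        recipe.append(fp)
    return table, recipe


def rehydrate(table, recipe):
    """Rebuild the original block list; every reference becomes a full copy again."""
    return [table[fp] for fp in recipe]


# 90 identical blocks plus 10 distinct ones, 4 KB each.
blocks = [b"A" * 4096] * 90 + [bytes([i]) * 4096 for i in range(10)]
table, recipe = deduplicate(blocks)

stored = sum(len(b) for b in table.values())
restored = sum(len(b) for b in rehydrate(table, recipe))
print(f"stored {stored} bytes, rehydrated {restored} bytes")
```

Every block has to be fingerprinted on the way in, and anything that needs the data back in its original form (a restore, a migration to a non-deduplicating target) needs the full rehydrated capacity again.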

     What does this all mean regarding features vs. products?  How does this apply to your implementation of Deduplication?  What does this mean for Deduplication of primary storage volumes?  Let’s explore this:

 

Product vs. Feature

 

     I would like to draw a parallel here to Storage Virtualization.  Some time ago, there were a plethora of “Heterogeneous” Storage Virtualization products/appliances.  The biggest issue with these products/appliances was a very common IT dilemma, what I call the IT Triangle!  There is no way to get ALL three without one of the corners suffering.  If you want the highest performance and the highest resiliency, you end up with the HIGHEST COST.  So in order for Virtualization products/appliances to stay cost effective and provide “heterogeneous” storage support, they sacrificed performance and/or reliability.  So the market dictated that this level of functionality should be based within storage devices, and that “the flexibility” of true heterogeneous support would become less of a priority.

 

     Deduplication products/appliances typically have the same problem; however, the requirements differ greatly depending on the target they are deduplicating.  I will touch upon this shortly.  So, the real question is, what sacrifices are you willing to make with a deduplication product/appliance in your environment?  Are you willing to pay extra for an additional product in the IT infrastructure for deduplication of the backup stream?  Do you want to run your NAS environment on a 3rd party solution in order to take advantage of block based deduplication, when file level deduplication might be built into your current file serving solution?  Would you be willing to place an in-data-path appliance between your application servers and your primary storage in order to leverage block based deduplication, knowing that it may bring significant storage savings, but at a cost to application performance?

 

How Deduplication is used in Environments Today!

 

     YOU ARE ALREADY USING DEDUPLICATION TECHNOLOGIES!!!!!  You might not even know it!  There are several technologies that ARE Deduplication technologies present in MOST datacenters today.

 

  1. Are you running Exchange 2000, 2003 or 2007?
  2. Are you utilizing Windows Storage Servers?
    • Well, starting with Windows Storage Server 2003 R2, there is file level Deduplication within volumes, set per volume.
  3. Are you utilizing any pointer based snapshot technology within your storage system, or VSS within Windows?
    • Once again, this is a form of data Deduplication, specifically around data protection.  Storage arrays that utilize a pointer based snapshot technology allow virtual backup copies of a volume set; this is the case when utilizing VSS within Windows as well, just handled at the OS level rather than the disk storage level.  (Some storage providers can utilize VSS functionality to use disk based snapshot technology to take OS and application consistent snapshots at the hardware layer, rather than the default software layer.)
  4. Do you utilize COTS applications running on a Database?
    • Many database applications utilize record linking in order to minimize multiple copies of the same data rows/columns/table spaces.
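To show why a pointer based snapshot counts as a form of deduplication, here is a minimal sketch. It is a toy model, not how VSS or any particular array implements snapshots; the `Volume` class and its methods are hypothetical.

```python
import itertools


class Volume:
    """Toy volume whose snapshots share unchanged blocks by pointer."""

    def __init__(self, blocks):
        self._ids = itertools.count()
        self.store = {}                                 # block id -> data
        self.live = [self._alloc(b) for b in blocks]    # live block pointers
        self.snapshots = {}                             # name -> pointer list

    def _alloc(self, data):
        block_id = next(self._ids)
        self.store[block_id] = data
        return block_id

    def snapshot(self, name):
        self.snapshots[name] = list(self.live)   # copy pointers, not data

    def write(self, index, data):
        # New data goes to a new block; snapshots still point at the old one.
        self.live[index] = self._alloc(data)

    def read(self, index, snapshot=None):
        pointers = self.snapshots[snapshot] if snapshot else self.live
        return self.store[pointers[index]]


vol = Volume([b"mon", b"tue", b"wed", b"thu"])
vol.snapshot("nightly")
vol.write(1, b"TUE")
print(len(vol.store), "blocks stored for two 4-block views of the volume")
```

The snapshot and the live volume together present eight logical blocks, yet only the one changed block costs additional space.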

     So what do the above examples show?  Application/OS based deduplication that is a feature of a larger application set, not a product unto itself, and primary storage features that over several years have become relatively mainstream.  (Note:  NOT ALL snapshots are created equal!)

 

     There are also deduplication features available for most backup packages for helping reduce the footprint of the backup environment.

 

Deduplication of Primary Storage Volumes

 

     So Primary Storage Volumes seem to be the next logical discussion point.  Returning to my earlier questions, virtualization appliances gave heterogeneous storage support and cross platform data services, but with a performance degradation as well as additional cost.  Most customers I’ve come across in recent years are so concerned with performance that detailed application assessments and deep technical dives into storage performance were required in order to drive purchase decisions.  The number of saved perfmon exports and IOSTAT redirects that I’ve looked at and analyzed through tool sets continues to grow.  So, as Stephen Foskett recently put it, “deduplication is not yet ready for prime time in primary storage applications”; it is, however, readily present and ready for production use in other areas.

 

     So, applications with high IO and low latency storage requirements need to be seriously looked at as applications that aren’t ready for deduplication as a “storage hardware feature”.  Applications can typically be more intelligent about deduplication, minimizing performance impact for a very specific data set, which just hasn’t been seen yet in the storage industry’s feature set.

 

Final Thoughts

 


     So I am going to contradict myself.  Several years ago I was actually a very big fan of Virtualization appliances; they were an imperfect stop-gap for the storage industry.  My customers wanted strong storage services, like snapshots, site-to-site replication/mirroring/archiving, and heterogeneous storage pooling.  They were willing to make an investment in products like SANsymphony, IPStor, and SVC in order to gain an agnostic storage approach, re-deploy older storage, and leverage cheaper featureless storage arrays.  The storage vendors caught up, however, and began offering better performance with the same feature sets, and, in general, the virtualization appliance went away.  I believe the same is occurring with deduplication appliances.  They are a good stop-gap until the application providers and storage vendors come up with better native deduplication technologies and support.  So yes, while I STRONGLY feel that Deduplication is a feature of either applications or storage hardware, for the time being deduplication appliances will continue to be prevalent.  Just a stop-gap, though!

 

 


Posted in Backup and Recovery, Deduplication, SAN and NAS | 1 Comment »