Al Franken, thank you for giving me the title for this blog entry. For those of you not familiar, Al Franken was a major comedic contributor to SNL (Saturday Night Live, for those of you living in the dark ages), and that quote was the title of one of his more recent non-slanted liberal musings. This brings me to my topic of the day: benchmarks and benchmarking.
Coming from the world of clustered file systems, specSFS seems to be the standard benchmark: it is readily available, and it is the most widely abused benchmark of all time. Now, before the spec guys get upset with me, the problem isn't with the specSFS tool-set; the problem is with the vendors who use it (I am personally guilty of this vendor deception).
For those of you not familiar with this benchmark, it is specifically used to test NAS IOPs performance. Some of the best performance ever seen on this benchmark was recently posted by Network Appliance, Inc. They were able to engineer a solution that pushed the specSFS benchmark to the next level of performance: the 1,000,000 IOPs solution. Why this is irrelevant I will explain as simply as $. Actually, closer to $,$$$,$$$; insert any 7-8 figure dollar amount you wish, because that is the cost of 1,000,000 IOPs using an ONTap GX clustering configuration.
I am going to break down the top 5 producing solutions in this entry, using the following categories:
- The solution being provided.
- The Posted Result
- Load generation
- Number of Servers Required
- IOPs per load server
- File System(s)
- Number of file systems (anything more than 1 is bad)
- IOPs per file system
- Number of Disk Controllers
- Number of Spindles
- Speed of Spindles
- IOPs per controller
- IOPs per Spindle
- Relative Cost of Solution
- List pricing for components
- IOPs per $
Before I get going, I'm going to try my best to come up with list pricing for these solutions. I haven't ever ordered these products, and every one of these vendors seems to have a pretty complex configuration quoting tool. I'll do my best!
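Every per-category number in the sections below is just a simple ratio off the posted result. Here is a quick sketch of the math; all inputs in the example are made-up placeholders, not any vendor's actual figures.

```python
# Sketch of the derived metrics used throughout this post.
# All example inputs below are hypothetical placeholders.

def derived_metrics(posted_ops, servers, threads_per_server,
                    controllers, spindles, list_price):
    """Break a posted specSFS result down into the ratios compared below."""
    return {
        "ops_per_server": round(posted_ops / servers),
        "ops_per_thread": round(posted_ops / (servers * threads_per_server)),
        "ops_per_controller": round(posted_ops / controllers),
        "ops_per_spindle": round(posted_ops / spindles),
        "dollars_per_op": round(list_price / posted_ops, 2),
    }

# Hypothetical example: 100,000 ops from 20 servers x 24 threads,
# 8 controllers, 200 spindles, $500,000 list price.
print(derived_metrics(100_000, 20, 24, 8, 200, 500_000))
```

Nothing fancy, but it keeps me honest when I quote "ops per spindle" later.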
(This list was based on specSFS results posted prior to 9/01/2007)
- Network Appliance, Inc. – Data ONTAP GX System (24-node FAS6070)
- EMC Corp. – Celerra NSX Cluster 8 X-Blade 60 (1 stdby) 2 DMX
- Panasas, Inc. – ActiveScale storage cluster (60 DirectorBlades)
- Exanet, Inc. – ExaStore EX600FC
- BlueArc Corporation – BlueArc Titan 2200, 2-Node Active/Active Cluster
For those of you who aren’t interested in a LONG read, here is the summary in table format.
ONTap GX is NetApp's final release of what was once SpinServer by Spinnaker. Let me start by saying Spinnaker had a released, production-ready solution 4.5 years ago when NetApp "acquired" them. It has taken NetApp 4 years to turn the product into a "NetApp" solution, which, based on a quick perusal of features and functionality, is no better than the product that was bought so long ago. ONTap GX breaks the NetApp filer tradition by allowing a truly distributed file system, rather than the "active-passive" cluster pairs that everyone has been forced to buy if they want an HA NAS solution. The premise of this solution is that filers are in a true N+1 cluster that allows "unlimited" scalability and very high aggregate NAS performance. The potential for this solution is HUGE among large HPC environments that require file sharing across thousands of cluster nodes and thousands of GigE (or faster) networks. The trouble with "large" clustered/parallel/distributed file system solutions is that file lock management eventually becomes a bottleneck, due to the serial nature of file locking in a clustered solution. Secondarily, there is the problem of file caching and maintaining cache coherency cluster-wide.
The stated performance was: SPECsfs97_R1.v3 = 1032461 Ops/Sec (Overall Response Time = 1.53 msec)
The actual ONTap GX cluster was, at maximum "performance", able to produce 1032461 operations per second with an average latency of 4.6 milliseconds. Interestingly enough, they were able to push 921099 operations per second at almost half that latency (2.7 milliseconds).
NetApp utilized 216 load generation servers to push this result. Each load server ran 24 load threads. This equates to 4780 ops per load server and roughly 200 ops per thread.
Based on the breakdown of the configuration, NetApp only presented a single Global Name Space, so technically they had a single "file system". Under the Global Name Space they were able to mount 24 volumes in sub-directories. Per volume, NetApp was able to reach 43020 ops, but for the purpose of being fair, they really did achieve 1032461 ops per file system.
This is where this solution starts getting silly. NetApp required a physical total of 2016 disks across 48 controllers to get this solution going. Actual data disks were broken down as follows: 5 RAID-DP disk groups of 14+2 disks per controller pair (so 24 storage nodes with 80 data spindles each, for a total of 1920 data spindles). The disks were 15K RPM FC-AL drives. So the breakdown is as follows:
21510 ops per disk controller
538 ops per spindle
Cost for Solution:
These are the models I believe we will require to make this solution work as it was tested.
SW-T4C-GX-BASE Data ONTAP GX Software,Node $23,331
SW-T4C-GX-FV-HPO GX FlexVol HPO Software $13,769
X70015B-ESH2-QS-R5 DS14MK2 14×144GB,15K $35,595
They required 24 storage nodes; each storage node was comprised of 84 disks and was licensed for Flex Volumes and ONTap GX. I figure a single storage node costs $402,135 at list pricing, for a grand total of $9,651,240 for the solution, which works out to approximately $9.35 per IOP.
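If you want to sanity-check my NetApp math, it's a two-line calculation; keep in mind the $402,135 per-node figure is my own list-price estimate, not anything NetApp published.

```python
# Back-of-the-envelope check on the NetApp cost numbers above.
node_list_price = 402_135    # my estimated list price per storage node
nodes = 24
posted_ops = 1_032_461       # the posted SPECsfs97_R1.v3 result

total = node_list_price * nodes
print(total)                          # 9651240
print(round(total / posted_ops, 2))   # 9.35
```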
EMC utilizes a scalable clustered solution, but I was most disappointed with their configuration. The fact that they required 7 unique file systems breaks the model of scalable NAS solutions. Anyone can create 7 separate mountable file systems; the problem is that they then need to be managed separately. A single global name space is really the only way to achieve simple mounting and management. Their deployment of hardware is most similar to Panasas on the front-end, and parallels BlueArc and Exanet with FC SAN back-ends. They clearly win the IOPs-per-controller game because they technically utilized the fewest storage controllers, but DMX is also a class of storage far above what was deployed in any of the other solutions.
The stated performance was: SPECsfs97_R1.v3 = 320064 Ops/Sec (Overall Response Time = 1.64 msec)
The actual EMC NSX cluster was, at maximum "performance", able to produce 320064 operations per second with an average latency of 6.4 milliseconds. Interestingly enough, they were able to push 295460 operations per second at half that latency (3.2 milliseconds).
EMC utilized 42 load generation servers to push this result. Each load server ran 42 load threads. This equates to 7620 ops per load server and roughly 181 ops per thread.
Based on the breakdown of the configuration, EMC presented 7 (seven) file systems for the solution testing. Per file system, EMC was only able to reach 45723 ops.
EMC required a physical total of 772 disks across 14 controllers to get this solution going. Actual data disks were broken down as follows: 55 RAID 1 disk groups per configured volume. The disks were 10K RPM FC-AL drives. So the breakdown is as follows:
22862 ops per disk controller
415 ops per spindle
Cost for Solution:
This is about as rough a quote as I can come up with; I will check with some inside EMC folks I still know and see if I was even close on the correct configuration. My lack of line items makes me believe that I am FAR off, so take this price estimate with several grains of salt.
DMX2000-PRF EMCDISK: $668,003
DMX3000-M23 EMCDISK: $346,153
CEL60462-4AU NSX XB60: $29,705
One DMX2000, one DMX3000, and eight XB60 NSX blades = $1,251,796, which equates to $3.91 per IOP.
Panasas uses a front-end model similar to the one NetApp and EMC utilize; however, they should truly be considered a segmented file system like Exanet. They have captured storage associated with each set of DirectorBlades and use data redirection and data fetching to allow for a single name space. They are able to achieve performance by leveraging parallel writing, similar to what you would see with some of the build-your-own solutions such as Lustre and PVFS2.
The stated performance was: SPECsfs97_R1.v3 = 305805 Ops/Sec (Overall Response Time = 1.76 msec)
The actual Panasas cluster was, at maximum "performance", able to produce 305805 operations per second with an average latency of 7.8 milliseconds. Interestingly enough, they were able to push 274040 operations per second at almost half that latency (4.0 milliseconds).
Panasas utilized 45 load generation servers to push this result. Each load server ran 60 load threads. This equates to 6796 ops per load server and roughly 113 ops per thread.
Based on the breakdown of the configuration, Panasas presented one global namespace, similar to NetApp; however, they had 60 virtual volumes mounted within sub-directories of this global namespace. So per virtual volume, Panasas was able to achieve only 5097 iops, but to be fair, per file system they met the full 305805 iops.
Panasas required a physical total of 326 disks across 18 controllers to get this solution going. Actual data layout was RAID0 striping across the disks, with cluster-based RAID5 for data protection. The disks were 7.2K RPM SATA drives. So the breakdown is as follows:
16989 ops per disk controller
938 ops per spindle
Cost for Solution:
This is about as rough a quote as I can come up with; they call for 270 StorageBlades and 60 DirectorBlades. I might be missing some software licensing here as well. Let's add it up: $605,610 for StorageBlades and $461,400 for DirectorBlades, for a grand total of $1,067,010, which equates to $3.49 per IOP.
DirectorBlade 100: $7,690.00
Store_Blade StorageBlade: $2,243.00
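The Panasas roll-up is straightforward given the blade counts and the two list prices above (again, these are my estimates, not a Panasas quote, and I may be missing software licensing).

```python
# Rolling up the Panasas blade counts against the list prices above.
storage_blades = 270
director_blades = 60
storage_blade_price = 2_243   # StorageBlade list price (my estimate)
director_blade_price = 7_690  # DirectorBlade list price (my estimate)
posted_ops = 305_805          # the posted SPECsfs97_R1.v3 result

total = (storage_blades * storage_blade_price
         + director_blades * director_blade_price)
print(total)                          # 1067010
print(round(total / posted_ops, 2))   # 3.49
```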
Exanet utilizes a scalable commodity-hardware approach to the clustered NAS market. They deploy in clustered pairs of ExaStore nodes, and scale in clustered pairs. The data path for the global name space is then either pre-fetched across the dedicated GigE back-end network, or re-directed through the back-end network. This limits per-file performance to this back-end network. The benefit of an open hardware architecture is the ability to upgrade this network component when it becomes a bottleneck; I have to assume that 10G Ethernet and/or InfiniBand connectivity is on the road map. They require deployment in clustered pairs because each individual storage node (being just an Intel server) is a significant single point of failure.
The stated performance was: SPECsfs97_R1.v3 = 203182 Ops/Sec (Overall Response Time = 1.08 msec)
The actual Exanet cluster was, at maximum "performance", able to produce 203182 operations per second with an average latency of 2.6 milliseconds. Interestingly enough, they were able to push 183032 operations per second at almost half that latency (1.8 milliseconds).
Exanet utilized 24 load generation servers to push this result. Each load server ran 24 load threads. This equates to 8466 ops per load server and roughly 353 ops per thread.
Based on the breakdown of the configuration, Exanet presented one global namespace, similar to NetApp & Panasas; there is no documentation that they utilized virtual volumes or other volume technology. Therefore, per file system they met the full 203182 iops.
Exanet required a physical total of 348 disks across 12 controllers to get this solution going. They utilized 24 RAID5 disk sets. The disks were 15K RPM FC-AL drives. So the breakdown is as follows:
16932 ops per disk controller
584 ops per spindle
Cost for Solution:
While getting list pricing out of Exanet was close to impossible, Exanet did post what they claim to be the list cost per IOP for their specSFS benchmark results on their website. They list the cost of the solution at $851,332, or a $4.19 cost per IOP.
BlueArc utilizes a custom hardware front-end architecture with a required dedicated Fibre Channel SAN disk storage back-end. Until recently, the support list for the storage back-end had been limited to LSI/Engenio disk storage, but now, with the OEM/reseller agreement between BlueArc and HDS, there is an expanded storage support list. They deploy HA configurations in a clustered pair configuration; however, I can assume that they are working towards an N+1 architecture for release in the near future. The HA cluster is a true active-active configuration.
The stated performance was: SPECsfs97_R1.v3 = 195502 Ops/Sec (Overall Response Time = 2.37 msec)
The actual BlueArc cluster was, at maximum "performance", able to produce 195502 operations per second with an average latency of 5.6 milliseconds. Latency didn't improve meaningfully until iops dropped to 162759, where it came down to 3.7 ms.
BlueArc utilized 28 load generation servers to push this result. Each load server ran 32 load threads. This equates to 6982 ops per load server and roughly 218 ops per thread.
Based on the breakdown of the configuration, BlueArc presented one global namespace, similar to NetApp & Panasas & Exanet. They do utilize a rooted sub-namespace technology that was split into 16 sub-volumes, so per-volume performance was 12219 ops. However, as I stated before, they did present a single namespace, and therefore, per file system they met the full 195502 iops.
BlueArc required a physical total of 416 disks across 16 controllers to get this solution going. They utilized 12+1 RAID5 groupings. The disks were 15K RPM FC-AL drives. So the breakdown is as follows:
12218 ops per disk controller
470 ops per spindle
Cost for Solution:
This is going to be a hard configuration to price out. I have taken the list price of a BlueArc Titan 2100 from HDS at $219,758; the disk requirement for these numbers is approximately $144,208 for the disk controllers and $123,712 for the expansion units. This puts the total solution at just about $487,678, which equates to $2.49 per IOP.
SX365024.P HIGH PERFORMANCE NAS 2100 FSX Cluster Bndl: $219,758
LSI 2882 disk controller: $18,026
LSI Storage Expansion Unit: $15,464
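Pulling the five rough estimates together into a cost-per-IOP ranking (every dollar figure here is the list-price guess from the sections above, so treat the ordering, not the decimals, as the takeaway):

```python
# Cost-per-IOP comparison using the rough list prices estimated above.
# Each entry is (posted specSFS ops/sec, estimated list price in dollars).
solutions = {
    "NetApp ONTap GX": (1_032_461, 9_651_240),
    "EMC Celerra NSX": (320_064, 1_251_796),
    "Panasas":         (305_805, 1_067_010),
    "Exanet ExaStore": (203_182, 851_332),
    "BlueArc Titan":   (195_502, 487_678),
}

# Sort cheapest-per-IOP first and print the ranking.
for name, (ops, price) in sorted(solutions.items(),
                                 key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{name:18s} ${price / ops:5.2f} per IOP")
```

BlueArc comes out cheapest per IOP and NetApp most expensive by a wide margin, which is really the whole point of this post.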
So there are a few things that should be looked at when comparing these solutions, and some of those are things I decided to leave out. I specifically didn't take into account disk capacity, since this was all over the place, and it isn't a category that specSFS really cares about. I also didn't look at the IO profile for this write-up, because there is only so much time in the day, and most people never get to that level of detail when researching a benchmark result. I really just wanted to point out that yes, you can get to 1 million IOPs in a NAS solution, but prepare to break out the checkbook. I do hope everyone realizes that comparing list pricing, and most likely BAD configurations, was done purely for the purpose of demonstration, and that no one should base ANY purchase decision on this. Vendors typically drop anywhere from 10% to 75% off of list price when working with customers.