GPFS data protection

GPFS has a fixed filesystem block size that is set by the administrator when the filesystem is created; a commonly used value is 256 kB. On the disk layer, GPFS builds so-called Network Shared Disks (NSDs) out of the LUNs it is presented with. Outside SONAS, quite sophisticated and flexible GPFS configurations are possible. Within SONAS, however, that flexibility has been removed to maximize standardization and ease of support. In SONAS, a LUN is formed out of 8 data disks plus two parity disks (8+P+Q). Please note that no spare disks are used in this configuration. Each of these RAID6 arrays is then made an NSD in GPFS. When a file is written to the filesystem, GPFS splits it into slices according to the filesystem block size and writes them across all NSDs. The following picture visualizes this.
|Fig1: GPFS File striping, NSDs and RAID array configuration|
What we also see here is that the storage arrays cut the writes into 32 kB blocks. So the protection level in SONAS is essentially plain RAID6 without any spare disks, and without flexibility: the only thing the admin can change is to mirror each NSD to another NSD. That needs to be done with care to guarantee that the mirrored NSD resides in another Pool of Disks (POD) – see figure one of my previous blog. Of course, mirroring cuts the capacity efficiency to just below 50%.
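The striping described above can be illustrated with a minimal sketch. This is not GPFS code; the NSD count and the mapping function are hypothetical, but it shows the idea of cutting a file into filesystem-block-sized slices and distributing them round-robin across the NSDs.

```python
BLOCK_SIZE = 256 * 1024  # common GPFS filesystem block size (256 kB)
NUM_NSDS = 4             # hypothetical number of NSDs (RAID6 arrays)

def stripe(file_size: int, block_size: int = BLOCK_SIZE, nsds: int = NUM_NSDS):
    """Map each filesystem block of a file to the NSD it lands on."""
    placement = {}
    num_blocks = (file_size + block_size - 1) // block_size  # round up
    for block in range(num_blocks):
        placement.setdefault(block % nsds, []).append(block)
    return placement

# A 1 MB file becomes four 256 kB blocks, one per NSD in this example.
print(stripe(1024 * 1024))  # → {0: [0], 1: [1], 2: [2], 3: [3]}
```

Below each NSD, the RAID6 array then cuts these writes further into its own 32 kB blocks, as shown in the figure.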
I think it is highly unrealistic that administrators want a single protection level for all their data in a Petabyte-scale filesystem. There might be temporary data, data from test and development, highly critical business data – yet all of it gets the same level of protection (we will talk about ILM capabilities in a later blog).
Isilon Data Protection

Isilon works without storage controllers and does not rely on hardware RAID. OneFS combines the filesystem, LVM, and RAID layers, with OneFS in charge of the whole data path, from the filesystem level down to the physical segment on the disk. Data protection can be set by the administrator on the global, filesystem, and even file level, and is not bound to any hardware configuration or RAID format.
OneFS’s filesystem block size is 8 kB, and a stripe unit (the unit of blocks that gets written to a single node) is built of up to 16 contiguous blocks (128 kB). These stripe units are then spread across all nodes in the cluster. The protection level can be set by the admin even on the directory or file level. Let’s assume we have a cluster of 6 nodes and a protection level of N+2 for a specific directory. OneFS then builds a protection group of 4 (N=4) data stripes and 2 parity stripes. The following picture visualizes this.
|Fig 2: OneFS Data Stripes and Protection Groups|
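A simplified sketch of this layout, under the assumptions stated above (8 kB blocks, 128 kB stripe units, one stripe unit per node). This is not OneFS internals; it only models how a file's stripe units fall into N+M protection groups on a given cluster size.

```python
BLOCK_SIZE = 8 * 1024          # OneFS filesystem block size
STRIPE_UNIT = 16 * BLOCK_SIZE  # up to 16 contiguous blocks = 128 kB

def protection_groups(file_size: int, nodes: int, parity: int):
    """Return per-group (data, parity) stripe-unit counts for an N+M layout."""
    data_width = nodes - parity  # N data stripe units per group
    units = (file_size + STRIPE_UNIT - 1) // STRIPE_UNIT  # round up
    groups = []
    while units > 0:
        d = min(units, data_width)
        groups.append((d, parity))
        units -= d
    return groups

# A 1 MB file on a 6-node cluster at N+2: 8 stripe units -> two groups
# of 4 data units, each with 2 parity units.
print(protection_groups(1024 * 1024, nodes=6, parity=2))  # → [(4, 2), (4, 2)]
```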
What we can see is that N+2 here does not just mean that two disks can fail. Beyond that, two entire nodes can fail, or a larger number of disks spread over two nodes can fail. Whichever two nodes fail in the above picture, there is always enough remaining data to recover the data or parity stripes. And again, the administrator can set the protection level per directory or file. The capacity efficiency above can be calculated as:
Efficiency = 1 – (M / (M+N)) = 1 – (2 / (2+4)) ≈ 67%.
However, if we decide that a protection level of two disk failures or one node failure is sufficient, we can make the protection group larger and fold it twice over the same nodes. See the following picture for illustration.
|Fig 3: OneFS folded protection group allowing 2 disk failures|
We now have a protection level of N+2:1. That is, we can lose two disks or one node. The efficiency equation from above evolves to:
Efficiency = 1 – (M / b(M+N))
Here, b=2 because we fold the protection group over 2 stripes. As before, M=2 and N=4:
Efficiency = 1 – (2 / 2(2+4)) ≈ 83%.
OneFS supports data protection levels of N+1, N+2, N+3, and N+4 (all b=1), as well as N+2:1 (double-drive and single-node tolerant, b=2) and N+3:1 (triple-drive and single-node tolerant, b=3), plus mirroring levels between 2x (doubly mirrored) and 8x (eight-times mirrored). Files smaller than 128 kB are mirrored by default. As you can see, Isilon uses no spare disks. Instead, it uses free capacity to rebuild the data from a failed drive. So as long as enough capacity is in reserve, nothing needs to be done for the rebuild to occur.
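The efficiency formula above generalizes across all these N+M:b levels. A small helper makes the comparison concrete (the function name is my own; the formula is the one derived above):

```python
def efficiency(n: int, m: int, b: int = 1) -> float:
    """Capacity efficiency of an N+M:b layout: 1 - M / (b * (M + N)).
    b is the number of stripes the protection group is folded over (b=1 when unfolded)."""
    return 1 - m / (b * (m + n))

# The two worked examples from the text, on a 6-node cluster (N=4, M=2):
print(f"N+2   (b=1): {efficiency(4, 2):.0%}")       # → 67%
print(f"N+2:1 (b=2): {efficiency(4, 2, b=2):.0%}")  # → 83%
```

Note how folding (b > 1) trades node-failure tolerance for capacity efficiency while keeping the same drive-failure tolerance.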
Summary

SONAS has a fixed protection level based on RAID6 for all data. It uses arrays with 8 data and two parity disks without any spare disk. As a result, SONAS would, for example, not survive a two-node failure (unless all data is mirrored). In case of a drive failure, manual intervention and a disk replacement are required before the rebuild can start. Using large drives of 3 TB or more carries a higher risk, since rebuild times increase to multiple days while the array's performance suffers.
Isilon has a much more flexible protection scheme, which can be tailored to specific business needs on the global, directory, or file level. Rebuilds occur automatically and are much faster, since more drives are involved.