02 September 2014

Challenges and Options for Filesystem Backups at Petabyte Scale

The growth of unstructured data is accelerating, and filesystems that contain one or more petabytes are common. The current version, OneFS 7.1, has tested support for 20 PB of raw capacity; new node types and larger disks will lift that limit going forward. Challenges start even before a filesystem reaches a petabyte. In this post I will identify the challenges that come with such large filesystems. Even though the Isilon data protection mechanisms are very mature [1], you may want to back up your data to protect against logical or physical data loss caused by disasters, or for compliance reasons. We’ll define the attributes of an ideal backup solution and compare some existing technical solutions against this list of attributes and functions. The list of technical solutions is determined by my working experience and by discussions with some renowned colleagues from EMC’s Data Protection Availability Division (see Acknowledgement), and I make no claim of completeness.

 

The challenges

 

The NDMP challenge

Remember that the context of this discussion is LARGE filesystems. The industry standard solution for backing up NAS appliances is the NDMP protocol. I have mentioned several disadvantages of NDMP in a recent post. By far the biggest challenge is that NDMP does not support a progressive incremental forever strategy (there is an exception that I will explain later, but without any 3rd party tools the statement is true). That means you need to perform a full backup every so often. In practice this is not feasible: assume we had a dedicated 10 Gigabit Ethernet link available for backup and could saturate it at 900 MB/s. A full petabyte backup would still take approximately 13 days to complete.
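The 13-day estimate can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a perfectly sustained transfer rate; the function name is made up for illustration.

```python
def full_backup_days(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the number of days needed to move capacity_bytes at a sustained rate."""
    seconds = capacity_bytes / throughput_bytes_per_s
    return seconds / 86_400  # seconds per day

PB = 10**15  # decimal petabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

days = full_backup_days(1 * PB, 900 * MB)
print(f"1 PB at 900 MB/s takes about {days:.1f} days")
```

Any real backup will be slower than this, because the calculation ignores protocol overhead, small-file effects and contention on the source filesystem.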

The treewalk challenge

Incremental backups have challenges too: you need to identify which files have changed since the last snapshot. Traditionally you need to walk the entire filesystem tree to identify which files to back up. Large filesystems can hold billions of files, so the treewalk alone may take days to complete.
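To make the cost of a treewalk concrete, here is a minimal sketch of what a traditional incremental scan does. It is a generic illustration, not the code of any particular backup product: it assumes that changes are detected by comparing each file's modification time with the timestamp of the previous run, and the function name is hypothetical.

```python
import os

def changed_since(root: str, last_backup_epoch: float) -> list[str]:
    """Walk the whole tree under root and return files modified after last_backup_epoch."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # one metadata lookup per file
            except OSError:
                continue  # file disappeared while we were walking
            if st.st_mtime > last_backup_epoch:
                changed.append(path)
    return changed
```

Every file costs at least one metadata lookup, whether it changed or not. With billions of files, this scan dominates the runtime of an incremental backup even when very little data has actually changed.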

The restore challenge

Depending on your business requirements, you may not be able to restore a full (petabyte) filesystem fast enough. At the very least you need predictable SLAs to be able to make a business decision. That means you cannot count on technologies (like tape) that give no guarantee of the achievable restore throughput. There certainly are more challenges, but the aforementioned are my top three.

Requirements for an ideal backup/restore solution

Based on many discussions with customers and engineers, I would distinguish two types of requirements: first the must-have or mandatory requirements, and second the nice-to-have requirements. This is of course a subjective categorization. Whether a requirement is mandatory or nice to have naturally depends on your business.

Mandatory requirements

An ideal solution would meet the following mandatory requirements:
  1. It must be fast. Technically, this means that it needs to
    • support massively parallel movement of data
    • avoid treewalks
  2. It must support a progressive incremental forever strategy so that you only need a full backup at the beginning.
  3. It must support predictable SLAs for restore.
  4. It must support a short Recovery Point Objective (RPO) if your business requires it.
  5. The whole solution must be cost efficient.

Nice to have requirements

Depending on whom you talk to you may get a lot more things on the wish list but here are some prominent ones:
  1. Be able to back up data to different media types (tape, VTL, disk…)
  2. Be able to restore your data to a different NAS system
  3. Be able to restore your data to a different filesystem/location
  4. End users can restore data without admin privileges
  5. Support de-duplication and/or compression

Existing Backup/Restore Solutions

Let’s now look at some prominent backup/restore solutions and how they meet the requirements listed above.

1. Filesystem backup using NDMP

As said earlier, NDMP is the standard protocol for backup/restore of NAS devices. I have already mentioned a number of its shortcomings in this blog, so let’s take a look at the capabilities. Be aware that some requirements cannot be answered with a plain yes or no, so I added a comments column. If a mandatory requirement is not met, I consider the resulting "no" a showstopper for the solution.

Mandatory requirements | Req. met | Comments
Speed (avoid treewalk, massively parallel) | (yes) | Depends on the product and version. With OneFS 7.1.1 and Networker 8.2, snapshot-based backup is possible. Multi-stream NDMP will accelerate backup and restore going forward.
Incremental forever | no |
Predictable SLAs for restore | no | From tape: no; from disk: depends, but will take a long time
Short RPO | no |
Cost efficiency | depends | Tape: yes; Data Domain *): probably, depending on the de-duplication ratio (consider that we have periodic fulls, so the ratio might be low); other disk: no
Nice to have requirements | Req. met | Comments
Backup to different media types | yes |
Restore data to different NAS system | no |
Restore data to different filesystem location | yes |
End users can restore data without admin privileges | no |
Supports de-duplication/compression | yes | Compression is supported by tapes and de-duplication is supported on Data Domain *)
Table 1: NDMP backup/restore vs. mandatory and nice to have requirements
*) Comment on de-duplication on Data Domain: higher de-duplication ratios on Data Domain are typically achieved through incremental backups. However, because NDMP requires periodic fulls, you may not see very high de-duplication ratios. As a rule of thumb for sizing Data Domain, you need the same amount of DD storage as the amount of data you want to back up. The first 50% will be filled by the initial backup; the other 50% must be sized for incremental backups. Considering that DD storage is more expensive than pure Isilon storage, this will not be as cost effective as the next solution (see next section).
Figure 1: Filesystem Backup using NDMP can use different target media like another Isilon (or other filesystems), a Data Domain or Tape
Some comments on the target options:


  • Backing up to another Isilon with NDMP is not cost effective. We need to dump periodic full backups to it, and this will consume much more capacity than synchronizing with SyncIQ (section 4). Also, the post-process de-duplication on Isilon may take a very long time to complete or may never really finish.
  • Backing up to a Data Domain is most efficient when the backup software is EMC Networker, due to the tight integration and use of the Boost protocol. However, that is only cost efficient if the initial backup achieves a de-duplication ratio larger than 2:1. Depending on the data set, this may not happen regularly.
  • Still, some customers do back up to tape. Capex per GB is low, but management can get quite complex. Recovery times are not predictable, and a restore may take weeks at petabyte scale.
As you can see, NDMP not only fails to meet three mandatory requirements, it is also not very flexible when it comes to restore.
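The effect of periodic fulls on the amount of data written to the backup target can be illustrated with a small model. This is a rough sketch under stated assumptions: the dataset size, the daily change rate and the weekly full schedule are made-up example values, de-duplication is ignored, and both function names are hypothetical.

```python
def data_moved_periodic_fulls(dataset_tb: float, daily_change_rate: float,
                              days: int, full_every_days: int) -> float:
    """Total TB written when a full backup is repeated every full_every_days days."""
    total = 0.0
    for day in range(days):
        if day % full_every_days == 0:
            total += dataset_tb                      # full backup day
        else:
            total += dataset_tb * daily_change_rate  # incremental day
    return total

def data_moved_incremental_forever(dataset_tb: float, daily_change_rate: float,
                                   days: int) -> float:
    """Total TB written with one initial full followed by incrementals only."""
    return dataset_tb + dataset_tb * daily_change_rate * (days - 1)

# Example values (assumptions): 1000 TB dataset, 1% daily change, 28 days, weekly fulls
periodic = data_moved_periodic_fulls(1000, 0.01, 28, 7)
forever = data_moved_incremental_forever(1000, 0.01, 28)
print(f"periodic fulls: {periodic:.0f} TB, incremental forever: {forever:.0f} TB")
```

With these example values, the weekly fulls write more than three times as much data as the incremental forever approach, and each full additionally needs the multi-day transfer window discussed earlier.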

2. Filesystem backup using mounted NFS export and CIFS shares

This option may sound strange to some people. I often hear concerned comments from administrators that NFS or CIFS will be too slow. That might be true for traditional systems, but even the slowest Isilon node delivers data via NFS at roughly 500 to 600 MB/s per node (depending on thread counts, network configuration etc.). Throughput is not the issue here, but the treewalk is (see above). On the upside, we are able to implement an incremental forever strategy using this approach.
Figure 2: Multiple backup servers can each mount portions of the filesystem and perform a native backup to different targets.

As we can see from the following table, this solution has somewhat different characteristics. We have implemented it with several customers, thereby avoiding NDMP. You may need to parallelize the workload across multiple backup clients (or servers) by partitioning the namespace. This has proven to be a successful strategy, especially for IBM TSM customers, because TSM support for NDMP is very poor. With TSM you can achieve an incremental forever strategy. OneFS 7.1.1 now provides a new API that will also allow avoiding the treewalk going forward (see my separate post on the OneFS changelist API).

Mandatory requirements | Req. met | Comments
Speed (avoid treewalk, massively parallel) | (yes) | The treewalk will not be fast but can be parallelized. Going forward, backup software will hopefully use the changelist API, which will then avoid the treewalk.
Incremental forever | (yes) | Depends on the backup software. With TSM: yes; with Networker: potentially, but I am not aware of any implementation, so testing is required. Both TSM and Networker still require some scripting to use the changelist API; 3rd party work is in progress for TSM.
Predictable SLAs for restore | (yes) | Target = tape: no; target = Data Domain: yes; target = Isilon: yes
Short RPO/RTO | no |
Cost efficiency | (yes) | Only if you can implement an incremental forever strategy. This requires scripting today but will most probably be available natively once backup software vendors use the changelist API.
Nice to have requirements | Req. met | Comments
Backup to different media types | yes |
Restore data to different NAS system | yes |
Restore data to different filesystem location | yes |
End users can restore data without admin privileges | yes |
Supports de-duplication/compression | yes | Through various ways. If you use the backup software de-duplication supported by Commvault, TSM and others, Isilon might be the best target. Data Domain is a very efficient target if the de-duplication ratio is high enough to justify the cost.
Table 2: Native backup using NFS/CIFS shares from the source filesystem
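The namespace partitioning mentioned for this approach could be sketched as follows. This is only an illustration of one possible strategy (greedy assignment of the largest directories first to the least-loaded backup client); the directory names, sizes and the function name are hypothetical, and in practice you would also want to balance by file count, not only by capacity.

```python
import heapq

def partition_namespace(dir_sizes: dict[str, int], clients: int) -> list[list[str]]:
    """Assign directories to backup clients, largest first, always to the least-loaded client."""
    heap = [(0, i) for i in range(clients)]  # (assigned size, client index)
    heapq.heapify(heap)
    assignment: list[list[str]] = [[] for _ in range(clients)]
    for path, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)      # client with the least data so far
        assignment[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return assignment

# Hypothetical top-level directories with their sizes in TB
sizes = {"/ifs/data/a": 400, "/ifs/data/b": 300, "/ifs/data/c": 200, "/ifs/data/d": 100}
plan = partition_namespace(sizes, clients=2)
for i, dirs in enumerate(plan):
    print(f"backup client {i}: {dirs}")
```

Each backup client then mounts and walks only its own share of the namespace, so the treewalks run in parallel instead of one after another.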

In summary, this strategy might be an option for you if you do not rely on a short RPO and RTO. If your business must continue within 24 hours or so after the loss of the primary filesystem, this is definitely not an option.
One more comment on Data Domain: technically you could mount the DD target directory directly to an Isilon node and perform the backup through rsync. However, it’s not supported and I have not seen anybody testing it to date. But there is a remarkable example which shows that this kind of direct backup could be a good option going forward. The new VMAX³ together with ProtectPoint allows a direct backup to Data Domain without involving any backup node. It is announced to be 10 times faster than traditional backups. I am curious to see similar things for Isilon in the future.


3. Backup using the Avamar Backup Accelerator

Another interesting approach to backing up massive amounts of data is the Avamar Backup Accelerator. It’s technically interesting because the Avamar Backup Accelerator, which is a separate server outside the Isilon cluster, uses NDMP to get the data from Isilon but ‘unpacks’ the NDMP stream on the fly and does a file-based backup to the backend, which has to be a Data Domain system. Don’t confuse it with the Isilon Backup Accelerator, which is just an Isilon node with a Fibre Channel connection for faster 2-way NDMP directly to tape.
Figure 3: The Avamar Backup Accelerator node unpacks the NDMP stream on the fly

This solution combines the good features of snapshot-based fast NDMP with native backup methods, which allows full de-duplication advantages as well as indexing and versioning down to single files. Through the Isilon API, Avamar controls snapshot creation on Isilon. The Avamar Business Edition Server illustrated in Figure 3 is required to manage metadata, index and scheduling. I expect these functions to be integrated into Networker going forward as these products converge. The Backup Accelerator, which moves the data, is available as a hardware appliance or as a VM. At the time of writing, the hardware appliance has only a Gigabit Ethernet adapter. This may change to 10 Gb/s Ethernet going forward, but for the moment you may consider using the VM version so that you can leverage a 10 Gb/s network already in place. As a rule of thumb you can assume 500 MB/s throughput per Avamar Backup Accelerator. For increased throughput, multiple accelerators can be used. Let’s look at the attributes versus our requirements in the following table:

Mandatory requirements | Req. met | Comments
Speed (avoid treewalk, massively parallel) | yes | See above: ~500 MB/s per accelerator node; scales with the number of nodes. It also uses the OneFS changelist API to make treewalks unnecessary.
Incremental forever | yes | No restrictions: full index, versioning, no treewalks
Predictable SLAs for restore | yes |
Short RPO/RTO | no |
Cost efficiency | ? | Following the rule of thumb for DD sizing, you need approximately the same raw capacity on DD as the amount of data you need to back up. Compared to the async mirror with snapshot retention (see next section), this solution will most probably be more expensive.
Nice to have requirements | Req. met | Comments
Backup to different media types | no | Data Domain target only
Restore data to different NAS system | yes |
Restore data to different filesystem location | yes |
End users can restore data without admin privileges | yes |
Supports de-duplication/compression | yes |
Table 3: Backup using the Avamar Backup Accelerator
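Since throughput scales with the number of accelerators, the rule of thumb of roughly 500 MB/s per accelerator can be turned into a quick sizing estimate. The following is a sketch only: the data volume and backup window are example assumptions, linear scaling is assumed, and the function name is made up.

```python
import math

def accelerators_needed(data_bytes: int, window_hours: float,
                        per_accel_bytes_per_s: int = 500 * 10**6) -> int:
    """Number of accelerators needed to move data_bytes within window_hours."""
    required_rate = data_bytes / (window_hours * 3600)  # bytes per second
    return math.ceil(required_rate / per_accel_bytes_per_s)

TB = 10**12  # decimal terabyte (assumption)

# Example (assumption): move 100 TB within a 24 hour window
print(accelerators_needed(100 * TB, 24))
```

An estimate like this is mainly relevant for the initial full backup; once the solution runs incremental forever, the daily change rate determines the required throughput.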

Technically this is the most appealing of the three solutions we have looked at so far, since it combines the (few) advantages of NDMP with native file-level backups, support for full indexing and versioning, de-duplication and compression. It is a good fit for customers who already have a Data Domain infrastructure or who don’t need to back up the whole petabyte filesystem. However, due to the premium price of the de-duplication appliance, it is most probably more expensive than the solution in the next section (SyncIQ).


4. Using parallel asynchronous mirroring and snapshots

I have seen many customers who got rid of the traditional backup solutions mentioned so far. Instead, they use Isilon asynchronous replication and retain snapshots on the source, the target or both sides. Snapshots protect the data against logical corruption or deletion and keep multiple file versions at the same time. Replication protects the data against hardware failures. Replication works very fast since it leverages all nodes and is multi-threaded down to sub-file level. See a previous post on SyncIQ or read the Best Practices for Data Replication with EMC Isilon SyncIQ white paper, which describes the solution in detail.

Figure 4: Isilon parallel replication with SyncIQ.

In case you lose the primary site, you can fail over all your clients to the secondary site and continue working with relatively little impact (compared to the other solutions). You can also reverse the replication direction manually (or by script) so that SiteA becomes the secondary site and SiteB becomes the primary site. The node types on both sides don’t need to be the same, so in a typical scenario the secondary site is equipped with denser node types.

Mandatory requirements | Req. met | Comments
Speed (avoid treewalk, massively parallel) | yes | This is the fastest of all methods: massively parallel, incremental forever, no treewalks
Incremental forever | yes |
Predictable SLAs for restore | yes | Fast restore or failover possible in case of a disaster
Short RPO | yes |
Cost efficiency | yes | Can use denser NL nodes for the secondary site
Nice to have requirements | Req. met | Comments
Backup to different media types | no | You could of course perform additional backups from the primary or secondary site to another medium.
Restore data to different NAS system | (yes) | Not using SyncIQ, but the target filesystem can be read and you can copy data to any other filesystem ‘manually’
Restore data to different filesystem location | (yes) | See above
End users can restore data without admin privileges | yes | For Microsoft users it integrates into VSS; for UNIX users a .snapshot directory is available in every directory, so you can access previous versions of your files as well.
Supports de-duplication/compression | yes | Post-process de-duplication is available. It can be used on both sides or on only one side.
Table 4: Replicating data with Isilon’s SyncIQ solution vs. requirements

Superior RTO and RPO
As you can see in Table 4, this solution is the only one that meets all mandatory requirements. It is not only the fastest method for backup and restore, it also provides the lowest RTO and RPO. RTO: in case of a disaster you can fail over to the secondary site. On Isilon, you just need to push one button; the secondary system will then roll back to the latest snapshot and make the target directory writeable. You also need to point your clients to the new cluster, which can be done by changing the DNS record of the cluster or by changing the links in Microsoft DFS if you use it. None of the other methods provides such a short recovery time: restoring a petabyte filesystem with them will take days or even weeks!
RPO: traditional backups like those in solutions 1 to 3 are typically performed once a day. If you back up your data at 20:00 each day and lose your primary filesystem at 19:00, you lose 23 hours of data. With SyncIQ you can set RPOs down to a few minutes. Since OneFS 7.1.1, even continuous replication is supported, meaning that as soon as the primary filesystem recognizes changes it starts replicating to the other side. However, this process is still snapshot based. If too many changes are going on continuously, the system might not be able to delete older snapshots fast enough. So realistically you would set the RPO to some minutes, depending on the change rate of your data and your business requirements. Nevertheless, this is superior to any other solution mentioned before.
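The RPO example can be expressed as simple date arithmetic. The sketch below reproduces the 23-hour case from the text; the calendar dates and the ten-minute replication case are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

def data_loss_window(last_protected: datetime, failure: datetime) -> timedelta:
    """Time span of changes lost: everything written after the last protected copy."""
    return failure - last_protected

failure = datetime(2014, 9, 2, 19, 0)  # primary filesystem lost at 19:00

# Daily backup at 20:00: the last good copy is from the previous evening
daily_loss = data_loss_window(datetime(2014, 9, 1, 20, 0), failure)

# Replication with the last completed sync ten minutes before the failure (assumed)
sync_loss = data_loss_window(datetime(2014, 9, 2, 18, 50), failure)

print(f"daily backup: {daily_loss}, frequent replication: {sync_loss}")
```

The calculation only covers the schedule. In practice, the time a replication job needs to complete adds to the effective RPO, which is why the change rate of your data matters.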

SyncIQ Integration with Networker and other Backup Software solutions
There is another interesting aspect to mention with respect to hybrid solutions that combine the strengths of SyncIQ with those of backup solutions like EMC Networker. Starting with release 8.2, Networker is able to manage and schedule SyncIQ snapshots and replication. That means you manage data protection from a single point. It gets even more interesting if Networker can index the files that reside in the snapshots. You would then have your traditional backup solution with indexing, versioning etc. without any additional data movement. I am looking forward to seeing exciting new things like this and many others going forward.

Summary and conclusion

Considering the challenges that come with protecting petabyte-scale filesystems against logical and physical disasters, it is apparent that the parallel replication mechanism of Isilon is the fastest, most efficient solution. It is very cost effective since it
  • can use lower-cost, high-density nodes on the secondary site
  • replicates only changes
  • can use de-duplication to reduce consumed capacity
  • provides the shortest RPO and RTO of all solutions
The following table summarizes how the four solution options cover the mandatory requirements.
Mandatory requirements | NDMP | Native SMB/NFS shares | Avamar Accelerator | SyncIQ async replication
Speed of backup | + | + | + | +++
Incremental forever | no | (yes) 1) | yes | yes
Predictable SLAs for restore | no | (no) 2) | (yes) 2) | yes
Short RPO/RTO | no | no | no | yes
Cost efficient | (yes) 3) | ? | yes | yes
1) Currently needs some scripting until supported natively by backup software vendors
2) Depends on the media used: tape: no; disk: yes
3) With tape only
Table 5: Summary of how the four solution options meet the mandatory requirements

In conclusion, asynchronous parallel data replication seems to be the most efficient method, technically and financially. Keep in mind that this is only the view of some data protection experts and myself, and that it is very general. Depending on your business requirements, existing environment and skills, you may come to a different conclusion for your use case. In addition, new developments like the tighter integration of backup solutions with Isilon will change the picture and will evolve into interesting and appealing hybrid solutions.


Acknowledgement

As stated at the beginning of this post, the content shared here is the result of practical experience I gained in a number of projects and of discussions to which the following colleagues contributed their expert skills and experience: Oliver Kustermann (working for Orchestra), Lucian Gravis and Andreas El-Magraby (both working for EMC). Thanks to my brother Matthias, who helped correct my typos and improve the readability of this text.

Discussion

As always, I appreciate it if you leave comments, ideas, corrections or experiences below and connect with me on LinkedIn.








