The challenges
The NDMP challenge
Remember that the context of this discussion is LARGE filesystems. The industry standard solution for backing up NAS appliances is the NDMP protocol. I have mentioned several disadvantages of NDMP in a recent post. By far the biggest challenge is that NDMP does not support a progressive incremental forever strategy (well, there is an exception that I will explain later, but without any 3rd-party tools the statement is true). That means you need to perform a full backup every so often. In practice this is not feasible: assume we had a dedicated 10 Gigabit Ethernet link available for backup and could saturate it with 900 MB/s. A full petabyte backup would still take approximately 13 days to complete.
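For reference, the arithmetic behind the 13-day figure is a quick back-of-envelope calculation:

```python
# Back-of-envelope check of the full-backup duration claimed above.
bytes_total = 10**15            # one petabyte
throughput = 900 * 10**6        # 900 MB/s on a saturated dedicated 10 GbE link

seconds = bytes_total / throughput
print(f"{seconds / 86400:.1f} days")    # -> 12.9, i.e. roughly 13 days
```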
The treewalk challenge
But incremental backups have challenges too: you need to identify which files have changed since the last backup. Traditionally you walk across the filesystem to identify the files to back up. Large filesystems can have billions of files, so the treewalk alone may take days to complete.
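To illustrate where the time goes, here is a minimal Python sketch of what a treewalk-based incremental boils down to. Real backup clients do this in native code, but the cost profile is the same: at least one stat() call per file, which at billions of files adds up to days.

```python
import os

def changed_files(root, last_backup_ts):
    """Yield paths modified since the given timestamp -- one stat() per file."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_backup_ts:
                    yield path
            except OSError:
                pass  # file vanished between the directory listing and stat()
```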
The restore challenge
Depending on your business requirements, you may not be able to restore a full (petabyte) filesystem fast enough. At the very least you need predictable SLAs to be able to make a business decision. That means you cannot count on technologies (like tape) where you have no guarantee of the achievable restore throughput. There certainly are more challenges, but the aforementioned are my top three.
Requirements for an ideal backup/restore solution
Based on many discussions that I have had with customers and engineers, I would articulate two types of requirements: first the must-have or mandatory requirements, and second the nice-to-have requirements. This is of course a subjective categorization; whether a requirement is mandatory or nice to have naturally depends on your business.
Mandatory requirements
An ideal solution would meet the following requirements:
- It must be fast. Technically this means it needs to
  - support massively parallel movement of data
  - avoid treewalks
- It must support a progressive incremental forever strategy, so that you only need a full backup at the beginning.
- It must support predictable SLAs for restore.
- It must support a short Recovery Point Objective (RPO) if your business requires it.
- The whole solution must be cost efficient.
Nice to have requirements
Depending on whom you talk to you may get a lot more items on the wish list, but here are some prominent ones:
- Be able to back up data to different media types (tape, VTL, disk…)
- Be able to restore your data to a different NAS system
- Be able to restore your data to a different filesystem/location
- End users can restore data without admin privileges
- Support de-duplication and/or compression
Existing Backup/Restore Solutions
Let’s now look at some prominent backup/restore solutions and how they meet the requirements listed above.
1. Filesystem backup using NDMP
As said earlier, NDMP is the standard protocol for backup/restore of NAS devices. I already mentioned a number of its shortcomings in this blog, so let’s take a look at the capabilities. Be aware that some requirements cannot be answered with a simple yes/no, therefore I added a comments column. If a mandatory requirement is not met, I marked the “no” in red to indicate that this is a showstopper for the solution in my opinion.

| Mandatory requirements | Req. met | Comments |
| --- | --- | --- |
| Fast backup (avoid treewalk, massively parallel) | (yes) | Depends on the product and version. With OneFS 7.1.1 and Networker 8.2, snapshot-based backup is possible. Multi-stream NDMP will accelerate backups and restores going forward. |
| Incremental forever | no | |
| Predictable SLAs for restore | no | From tape: no; from disk: depends, but it will take a long time |
| Short RPO | no | |
| Cost efficiency | depends | Tape: yes; Data Domain *): probably, depending on the de-duplication ratio. Consider that we have periodic fulls, so the de-dupe ratio might be low. Other disk: no |
| Nice to have requirements | | |
| Backup to different media types | yes | |
| Restore data to different NAS system | no | |
| Restore data to different filesystem location | yes | |
| End user can restore data without admin privileges | no | |
| Supports de-duplication/compression | yes | Compression is supported by tape and de-duplication is supported on Data Domain *) |

Table 1: How filesystem backup using NDMP meets the requirements
*) Comment on de-duplication on Data Domain: higher de-duplication rates on Data Domain are typically achieved through incremental backups. However, since we need to perform periodic fulls with NDMP, you may not see very high de-duplication ratios. As a rule of thumb for sizing Data Domain, you need the same amount of DD storage as the amount of data you want to back up. The first 50% will be filled by the initial backup; the other 50% must be sized for incremental backups. Considering that DD storage is more expensive than pure Isilon storage, this will not be as cost effective as the next solution (see next section).
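To make the rule of thumb concrete, here is a small illustrative calculation (the 1 PB figure is an assumption; real sizing of course depends on achieved de-duplication ratios and retention):

```python
# Illustrative DD sizing per the rule of thumb above: plan the same DD
# capacity as the primary data; the initial full (after de-dupe) fills
# roughly half of it, the rest is headroom for retained incrementals.
primary_tb = 1000                            # assume 1 PB of primary data
dd_total_tb = primary_tb                     # rule of thumb: 1:1
initial_full_tb = 0.5 * dd_total_tb          # consumed by the first full backup
incremental_headroom_tb = 0.5 * dd_total_tb  # reserved for incrementals
print(dd_total_tb, initial_full_tb, incremental_headroom_tb)  # 1000 500.0 500.0
```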
Figure 1: Filesystem backup using NDMP can use different target media, like another Isilon (or other filesystem), a Data Domain, or tape
Some comments on the target options:
- The backup to another Isilon with NDMP is not cost effective. We need to dump periodic full backups to it, which consumes much more capacity than synchronizing with SyncIQ (next section). Also, the post-process de-duplication on Isilon may never really finish, or will take a long time to complete.
- The backup to a Data Domain is most efficient when the backup software is EMC Networker, due to the tight integration and use of the DD Boost protocol. However, it would only be cost efficient if the initial backup achieves a de-duplication ratio larger than 2:1, which may not happen regularly depending on the data set.
- Still, some customers do back up to tape. The capex per GB is low, but management can get quite complex, recovery times are not predictable, and restores may take weeks at petabyte scale.
2. Filesystem backup using mounted NFS exports and CIFS shares
This option may sound strange to some people. I often hear concerned comments from administrators that NFS or CIFS will be too slow. That might be true for traditional systems, but even the slowest Isilon node delivers data via NFS at ~500 to 600 MB/s per node (depending on thread counts, network configuration etc.). Throughput is not an issue here, but the treewalk is (see above). On the upside, we are able to implement an incremental forever strategy with this approach.
Figure 2: Multiple backup servers can each mount portions of the filesystem and perform a native backup to different targets.
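A rough sketch of how the namespace partitioning from Figure 2 could be scripted is shown below; the hostnames, mount path and the `backup-client` command are placeholders for whatever your backup software provides.

```python
# Sketch of namespace partitioning: every top-level directory of the mounted
# export becomes one backup job, spread round-robin across a pool of backup
# servers. Hostnames, the mount path and 'backup-client' are placeholders.
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

EXPORT_ROOT = Path("/mnt/isilon/data")             # NFS-mounted Isilon export
BACKUP_NODES = ["backup01", "backup02", "backup03"]

def run_job(node: str, subtree: Path):
    # Start whatever incremental client your backup software provides on the
    # given node for one subtree (e.g. 'dsmc incremental' in a TSM shop).
    return subprocess.run(["ssh", node, "backup-client", str(subtree)],
                          check=False)

subtrees = sorted(p for p in EXPORT_ROOT.iterdir() if p.is_dir())
with ThreadPoolExecutor(max_workers=len(BACKUP_NODES)) as pool:
    for node, subtree in zip(itertools.cycle(BACKUP_NODES), subtrees):
        pool.submit(run_job, node, subtree)
```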
As we can see from the following table, this solution has somewhat different characteristics. We have established this solution with several customers, thus avoiding NDMP. You may need to parallelize the workload across multiple backup clients (or servers) by partitioning the namespace. This has proven a successful strategy, especially for IBM TSM customers, because the TSM support for NDMP is very poor. With TSM you can achieve an incremental forever strategy. OneFS 7.1.1 now provides a new API that allows avoiding the treewalk as well going forward (see my separate post on the OneFS changelist API).
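To give an idea of what using the changelist API instead of a treewalk could look like, here is a hedged sketch against the OneFS Platform API. The endpoint path and field names are assumptions that vary between OneFS versions, so verify them against the PAPI documentation for your release.

```python
# Hedged sketch: read an OneFS changelist over the Platform API instead of
# walking the tree. The endpoint path and field names are assumptions that
# differ between OneFS versions -- check the PAPI docs for your release.
import requests

CLUSTER = "https://isilon.example.com:8080"   # hypothetical cluster address
AUTH = ("backup_user", "secret")              # hypothetical credentials

# A changelist is produced by a ChangelistCreate job run against two
# snapshots; assume one named 'snap_old__snap_new' already exists.
resp = requests.get(
    f"{CLUSTER}/platform/1/snapshot/changelists/snap_old__snap_new/lins",
    auth=AUTH,
    verify=False,   # lab setting; use proper certificates in production
)
resp.raise_for_status()

for entry in resp.json().get("lins", []):
    # Each entry describes one file/directory changed between the snapshots,
    # so a backup client can feed exactly these paths to its data movers.
    print(entry.get("path"))
```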
| Mandatory requirements | Req. met | Comments |
| --- | --- | --- |
| Fast backup (avoid treewalk, massively parallel) | (yes) | The treewalk will not be fast, but it can be parallelized. Going forward, backup software will hopefully use the changelist API, which avoids the treewalk. |
| Incremental forever | (yes) | Depends on the backup software. With TSM: yes; with Networker potentially, but I am not aware of any implementation; testing required. Both TSM and Networker still require some scripting to use the changelist API; 3rd-party work is in progress for TSM. |
| Predictable SLAs for restore | (yes) | Target = tape: no; target = Data Domain: yes; target = Isilon: yes |
| Short RPO/RTO | no | |
| Cost efficiency | (yes) | Only if you can implement an incremental forever strategy. This requires scripting today but will most probably be available natively once backup software vendors use the changelist API. |
| Nice to have requirements | | |
| Backup to different media types | yes | |
| Restore data to different NAS system | yes | |
| Restore data to different filesystem location | yes | |
| End user can restore data without admin privileges | yes | |
| Supports de-duplication/compression | yes | Through various means. If you use backup-software de-duplication, as supported by Commvault, TSM and others, Isilon might be the best target. Data Domain is a very efficient target if the de-duplication ratio is high enough to justify the cost. |

Table 2: How filesystem backup using mounted NFS exports and CIFS shares meets the requirements
One more comment on Data Domain: technically you could mount the DD target directory directly on an Isilon node and perform the backup with rsync. However, it is not supported, and I have not seen anybody test it to date. But there is a remarkable example showing that this kind of direct backup could be a good option going forward: the new VMAX³ together with ProtectPoint allows a direct backup to Data Domain without involving any backup node, and it is announced to be 10 times faster than traditional backups. I am curious to see similar things for Isilon in the future.
3. Backup using the Avamar Backup Accelerator
Another interesting approach for backing up massive amounts of data is the Avamar Backup Accelerator. It is technically interesting because the Avamar Backup Accelerator, a separate server outside the Isilon cluster, uses NDMP to get the data from Isilon but ‘unpacks’ the NDMP stream on the fly and performs a file-based backup to the backend, which has to be a Data Domain system. Do not confuse it with the Isilon Backup Accelerator, which is just an Isilon node with a Fibre Channel connection for faster 2-way NDMP directly to tape.
Figure 3: The Avamar Backup Accelerator node unpacks the NDMP stream on the fly
This solution combines the good features of snapshot-based fast NDMP with native backup methods, which allows full de-duplication advantages as well as indexing and versioning down to single files. Through the Isilon API, Avamar controls snapshot creation on Isilon. The Avamar Business Edition server illustrated in Figure 3 is required to manage metadata, indexing and scheduling. I expect these functions to be integrated into Networker going forward as these products converge. The Backup Accelerator, which moves the data, is available as a hardware appliance or a VM. At the time of writing, the hardware appliance has only a Gigabit Ethernet adapter. This may change to 10 Gigabit Ethernet going forward, but for the moment you may consider using the VM version so that you can leverage the 10 Gigabit network already in place. As a rule of thumb you can assume ~500 MB/s throughput per Avamar Backup Accelerator; for increased throughput, multiple accelerators can be used (see the sizing sketch after Table 3). Let’s look at the attributes versus our requirements in the following table:
| Mandatory requirements | Req. met | Comments |
| --- | --- | --- |
| Fast backup (avoid treewalk, massively parallel) | yes | See above: ~500 MB/s per accelerator node; scales with the number of nodes. It also uses the OneFS changelist API, which makes treewalks unnecessary. |
| Incremental forever | yes | No restrictions! Full index, versioning, no treewalks |
| Predictable SLAs for restore | yes | |
| Short RPO/RTO | no | |
| Cost efficiency | ? | Following the rule of thumb for DD sizing, you need approximately the same raw capacity on DD as the data size you need to back up. Compared to the async mirror with snapshot retention (see next section), this solution will most probably be more expensive! |
| Nice to have requirements | | |
| Backup to different media types | no | Data Domain target only |
| Restore data to different NAS system | yes | |
| Restore data to different filesystem location | yes | |
| End user can restore data without admin privileges | yes | |
| Supports de-duplication/compression | yes | |

Table 3: How the Avamar Backup Accelerator meets the requirements
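Going back to the per-accelerator rule of thumb, a quick sizing sketch shows how the number of accelerators falls out of the daily change rate and the backup window (both numbers here are illustrative assumptions):

```python
# Sizing sketch for the accelerator rule of thumb: how many Avamar Backup
# Accelerators are needed to move a given daily change within the backup
# window? The change rate and window are illustrative assumptions.
import math

daily_change_tb = 20            # assumed daily change on a ~1 PB filesystem
backup_window_h = 8             # assumed nightly backup window
per_accelerator_mb_s = 500      # rule of thumb from the text

tb_per_accelerator = per_accelerator_mb_s * 3600 * backup_window_h / 1e6
accelerators = math.ceil(daily_change_tb / tb_per_accelerator)
print(f"one accelerator moves ~{tb_per_accelerator:.1f} TB per window; "
      f"need {accelerators} accelerator(s)")   # ~14.4 TB -> 2 accelerators
```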
4. Using parallel asynchronous mirroring and snapshots
I have seen many customers get rid of the traditional backup solutions mentioned so far. Instead, they use Isilon’s asynchronous replication and retain snapshots on the source side, the target side, or both. With snapshots you protect the data against logical corruption or deletion, and you keep multiple file versions at the same time. Replication protects the data against hardware failures, and it works very fast since it leverages all nodes and is multi-threaded down to sub-file level. See a previous post on SyncIQ, or read the Best Practices for Data Replication with EMC Isilon SyncIQ white paper, which describes the solution in detail.
Figure 4: Isilon parallel replication with SyncIQ
In case you lose the primary site, you can fail over all your clients to the secondary site and continue working with relatively little impact (compared to the other solutions). You can also reverse the replication direction manually (or scripted), so that site A becomes the secondary site and site B becomes the primary site. The node types on both sides do not need to be identical, so in a typical scenario the secondary site is equipped with higher-density node types.
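For illustration, here is a hedged sketch of creating such a SyncIQ policy through the OneFS Platform API; the endpoint version, field names and schedule syntax are assumptions to verify against the documentation for your OneFS release.

```python
# Hedged sketch: create a SyncIQ policy via the OneFS Platform API. The
# endpoint version, field names and schedule syntax are assumptions --
# check the PAPI documentation for your OneFS release.
import requests

CLUSTER = "https://isilon-a.example.com:8080"   # hypothetical primary cluster
AUTH = ("syncadmin", "secret")                  # hypothetical credentials

policy = {
    "name": "prod_to_dr",
    "action": "sync",                        # mirror the source to the target
    "source_root_path": "/ifs/data/prod",
    "target_host": "isilon-b.example.com",   # hypothetical secondary cluster
    "target_path": "/ifs/data/prod-dr",
    "schedule": "every 1 hours",             # effectively sets the RPO
}

resp = requests.post(f"{CLUSTER}/platform/1/sync/policies",
                     json=policy, auth=AUTH, verify=False)
resp.raise_for_status()
```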
| Mandatory requirements | Req. met | Comments |
| --- | --- | --- |
| Fast backup (avoid treewalk, massively parallel) | yes | This is the fastest of all methods: massively parallel, incremental forever, no treewalks! |
| Incremental forever | yes | |
| Predictable SLAs for restore | yes | Fast restore or failover is possible in case of a disaster |
| Short RPO | yes | |
| Cost efficiency | yes | Can use denser NL nodes for the secondary site |
| Nice to have requirements | | |
| Backup to different media types | no | Well, you could of course perform additional backups from the primary or secondary site to another medium! |
| Restore data to different NAS system | (yes) | Not with SyncIQ itself, but the target filesystem can be read and you can copy data to any other filesystem ‘manually’ |
| Restore data to different filesystem location | (yes) | See above |
| End user can restore data without admin privileges | yes | For Microsoft users it integrates with VSS, and for UNIX users a .snapshot directory is available in every directory, so you can access previous versions of your files as well. |
| Supports de-duplication/compression | yes | Post-process de-duplication is available; it can be used on both sides or on one side only. |

Table 4: How parallel asynchronous mirroring with snapshots meets the requirements
Superior RTO and RPO
As you can see in Table 4, this solution is the only one that meets all mandatory requirements. It is not only the fastest method for backup and restore; it also provides the lowest RTO and RPO.
RTO: in case of a disaster you can fail over to the secondary site. On Isilon, you just need to push one button and the secondary system will roll back to the latest snapshot and make the target directory writeable. Additionally you need to do a few other things, like pointing your clients to the new cluster, which could be done by changing the DNS record of the cluster or changing the links in Microsoft DFS if you use it. None of the other methods provide such a short time to recovery; restoring a petabyte filesystem would take days or even weeks!
RPO: traditional backups like those in solutions 1 to 3 are typically performed once a day. If you back up your data at 20:00 each day and lose your primary filesystem at 19:00, you lose 23 hours of data. With SyncIQ you can set RPOs down to a few minutes. Since OneFS 7.1.1, even continuous replication is supported, meaning that as soon as the primary filesystem recognizes changes, it starts replicating them to the other side. However, this process is still snapshot based; if too many changes occur continuously, the system might not be able to delete older snapshots fast enough. So realistically you would set the RPO to a few minutes, depending on the change rate of your data and your business requirements. Nevertheless, this is superior to any other solution mentioned before.
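The 23-hour figure follows directly from the timestamps; a quick check (the dates are chosen arbitrarily for illustration):

```python
from datetime import datetime

# Daily backup at 20:00; primary filesystem lost at 19:00 the next day.
last_backup = datetime(2014, 9, 1, 20, 0)
failure = datetime(2014, 9, 2, 19, 0)
hours_lost = (failure - last_backup).total_seconds() / 3600
print(hours_lost)   # -> 23.0 hours of changes lost

# With a SyncIQ schedule of a few minutes, the worst case shrinks to the
# schedule interval itself, e.g. at most ~5 minutes of changes.
```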
SyncIQ Integration with Networker and other Backup Software solutions
There is another interesting aspect to mention with respect to hybrid solutions that combine the strengths of SyncIQ with those of backup solutions like EMC Networker. Starting with release 8.2, Networker is able to manage and schedule SyncIQ snapshots and replication, which means you do data protection from a single point of management. It gets even more interesting if Networker can index the files that reside in the snapshots: you would then have your traditional backup solution with indexing, versioning etc. without any additional data movement. I am looking forward to seeing exciting new things like this, and many others, going forward.
Summary and conclusion
Considering the challenges that come with protecting petabyte-scale filesystems against logical and physical disasters, it is obvious that using the parallel replication mechanism of Isilon is the fastest and most efficient solution. It is very cost effective since it
- can use lower-cost, high-density nodes on the secondary site
- only replicates changes
- can use de-duplication to reduce consumed capacity
- provides the fastest RPO and RTO of all solutions
| Mandatory requirements | NDMP | Native SMB/NFS shares | Avamar Accelerator | SyncIQ async replication |
| --- | --- | --- | --- | --- |
| Fastness of backup | + | + | + | +++ |
| Incremental forever | no | (yes) 1) | yes | yes |
| Predictable SLAs for restore | no | (no) 2) | (yes) 2) | yes |
| Short RPO/RTO | no | no | no | yes |
| Cost efficient | (yes) 3) | ? | yes | yes |

1) Depends on the backup software (yes with TSM; see Table 2)
2) Depends on the media used: tape: no, disk: yes
3) With tape only
Table 5: Summary of how the four solution options meet the mandatory requirements
As a conclusion, we can see that asynchronous parallel data replication seems to be the most efficient method, technically and financially. Keep in mind that this is only the perception of some data protection experts and myself, and it is very general. Depending on your business requirements, existing environment and skills, you may come to a different conclusion for your use case. In addition, new developments like the tighter integration of backup solutions with Isilon will change the picture as well and will evolve into interesting and appealing hybrid solutions.