08 July 2013

Isilon vs. SONAS Part 6: Remote Replication



In this article we’ll look at the remote replication implementations of both systems. The comparison is based on SONAS 1.4 and Isilon OneFS 7.0. Both systems provide asynchronous replication with configurable RPOs.

Use Cases

Asynchronous replication is typically used for large scale filesystems for the following use cases:

  • Disaster Recovery
  • Business continuance
  • Disk-to-Disk backup
  • Remote disk archive

Although asynchronous replication cannot achieve a zero RPO, it is typically acceptable for non-transactional data. The achievable RPO and RTO depend on several factors, such as the change rate of the data, the available bandwidth, latency and others. In practice an RPO of a few minutes can be achieved (for example, on Isilon this can be set as low as one minute, but whether that is realistic depends on the other factors).
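
To make the dependency on change rate and bandwidth more tangible, here is a rough back-of-the-envelope sketch. It is a simplified model of my own and not taken from either vendor: it assumes a fixed replication schedule, a constant change rate, and it ignores latency, protocol overhead and scan time. All function names and numbers are purely illustrative.

```python
def transfer_seconds(changed_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to ship one delta, ignoring latency and protocol overhead (assumption)."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return changed_bytes / bandwidth_bytes_per_s


def worst_case_rpo_seconds(interval_s: float,
                           change_rate_bytes_per_s: float,
                           bandwidth_bytes_per_s: float) -> float:
    """Rough worst-case RPO: one schedule interval plus the time needed to
    transfer the changes that accumulated during that interval."""
    delta_bytes = change_rate_bytes_per_s * interval_s
    return interval_s + transfer_seconds(delta_bytes, bandwidth_bytes_per_s)


def is_sustainable(change_rate_bytes_per_s: float,
                   bandwidth_bytes_per_s: float) -> bool:
    """If data changes faster than the link can carry it, the replica falls
    further behind with every cycle, whatever the schedule says."""
    return change_rate_bytes_per_s < bandwidth_bytes_per_s


if __name__ == "__main__":
    # Illustrative numbers: 1-minute schedule, 10 MB/s of changes, 100 MB/s link.
    print(worst_case_rpo_seconds(60, 10e6, 100e6))
```

With these example numbers a one-minute schedule is sustainable. Once the change rate approaches the link bandwidth, the one-minute setting stops being realistic.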

Performance

Both systems can replicate in parallel, leveraging multiple nodes on source and target. On both systems the number of processes (SONAS) or worker threads (Isilon) can be adjusted to balance throughput against resource utilization. CPU and I/O throttling, as well as subdirectory replication, are supported only on Isilon.
Picture 1: Replication is done in parallel from all or multiple nodes (Isilon). SONAS uses multiple or all of its interface nodes to replicate.
  
For the differences in functionality please refer to the performance section in the table below.

What gets replicated?

The list of things one might want to replicate can be quite long. The basic item is obvious: the data stored in the filesystem. Then there are a couple of things that are needed on the remote site in case of a failover, for example:

  • SMB shares
  • NFS exports
  • User/Group Quotas
  • UID/GID mapping


Then there are a couple of things that you may or may not want to replicate, depending on the environment:

  • Network configuration (IP addresses, routes, etc.). This is only useful if both sites are within the same layer 2 network. It might be more appropriate to re-route the clients using other mechanisms, such as changing the DNS records or making use of DFS.
  • Authentication configuration (only if site B is supposed to use the same AD or LDAP servers). In a DR use case this is quite unlikely, since the original AD or LDAP server may also have been 'failed over' to something else.

The list does not end here (e.g. you could also think about replicating Access Zone configuration, file policies and more), but discussing all the potential use cases and implications would go beyond the scope of this article.

The fact is that both platforms, of course, replicate file system data. Of the additional items mentioned above, Isilon replicates the UID/GID mapping, which is a very important aspect. SONAS does not do that and therefore requires AD authentication with Unix Services installed on the AD. That should be best practice for UID/GID mapping anyway, but I know of many customers who don't have it for various reasons. Having AD SFU installed centralizes the UID/GID to SID mapping and is therefore very helpful in many other respects too.

The other big difference is that SONAS can only replicate the whole filesystem (or at fileset level), whereas Isilon can also replicate at subdirectory level. This may be an important aspect, since you can schedule the replication of different portions of your filesystem at different times. The ability to replicate at directory level also allows you to implement an active/active like concept where clients access data on both sites and the relevant directories are replicated in the other direction (see figure 2). For further differences see the 'What gets replicated' section in the table below.

Failover/Failback

In case of an outage of the primary site you may want to fail over to the secondary site. Typically the following steps have to be performed:
  1. Stop the replication from A to B (if not already stopped by the outage)
  2. Roll back to the last known good snapshot on site B
  3. Make the filesystem (or directory) writable on site B
  4. Once site A is available again, reverse the replication direction (B to A) to replicate the changes that were made on B while A was down
  5. Once sites B and A are at the same level, you may decide to fail back to site A
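
The steps above can be captured as a small checklist that refuses to proceed out of order. This is only a minimal sketch of the idea of a pre-defined routine. The class and step names are hypothetical, and it does not call any SONAS or Isilon command; the actual actions would still be carried out with the respective vendor tools.

```python
from enum import Enum
from typing import List, Optional


class Step(Enum):
    STOP_REPLICATION = 1      # stop the A -> B replication
    ROLL_BACK_SNAPSHOT = 2    # roll B back to the last known good snapshot
    MAKE_TARGET_WRITABLE = 3  # allow clients to write on site B
    REVERSE_REPLICATION = 4   # replicate B -> A once A is available again
    FAIL_BACK = 5             # return to A when both sites are at the same level


class FailoverRunbook:
    """Tracks progress through the steps and rejects out-of-order actions."""

    def __init__(self) -> None:
        self.completed: List[Step] = []

    def next_step(self) -> Optional[Step]:
        # Enum members iterate in definition order, i.e. in runbook order.
        remaining = [s for s in Step if s not in self.completed]
        return remaining[0] if remaining else None

    def complete(self, step: Step) -> None:
        expected = self.next_step()
        if step is not expected:
            raise RuntimeError(f"expected {expected}, got {step}")
        self.completed.append(step)


if __name__ == "__main__":
    runbook = FailoverRunbook()
    runbook.complete(Step.STOP_REPLICATION)
    print(runbook.next_step())
```

The point of such a guard is simply that nobody makes the target writable before the rollback to the last good snapshot has happened.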

In SONAS you have to reverse the replication direction manually, which can be a source of errors if the procedure is not well documented and practiced. Also consider that in outage situations the admins' adrenaline level is usually elevated, so a pre-defined routine can help a lot to avoid mistakes. Isilon provides a one push button solution for this. (Well, it's basically the failover that requires only one push of a button, while the failback is performed in three steps. However, these pre-defined steps and the ability to perform them via the WebUI are very helpful.)

Some comments on automatic failover


Many customers ask for automatic failover/failback capabilities. This is because of their experience with synchronous replication, where that might be appropriate. However, in an unstructured, file based scale-out environment synchronous replication is not really an option. You have to consider that with asynchronous replication a failover will, with high probability, cause some data loss (remember that we have to roll back to the last known good snapshot). Therefore you might want an administrator to decide whether to perform a failover or to initiate other appropriate actions. Nevertheless, automatic failover can be done. I have seen projects where this has been implemented on Isilon using a so-called Automatic Failover Management solution (AFM), which was built by some clever services colleagues.

Topology


Typically the replication is built in a 1:1 relation. SONAS does it at filesystem level, whereas Isilon can be configured to replicate at subdirectory level. Therefore a bi-directional replication can be set up for different directories, as figure 2 illustrates. The target directory (or filesystem) is read-only until the replication relation is stopped. A single system can be the target for multiple sources. In SONAS that requires the target directory to be different from the root /gpfs. For a potential failover this requires modifications at the application (or DFS) level, since the source and target directory paths are not identical.
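
When source and target paths differ, something has to translate between them after a failover. The following sketch shows the basic idea of such a mapping. The directory names in the usage example are made up for illustration, and in a real environment this translation would more likely live in DFS or in the application configuration than in a script.

```python
from pathlib import PurePosixPath


def to_target_path(path: str, source_root: str, target_root: str) -> str:
    """Translate a path below source_root into the matching path below target_root.

    Raises ValueError if the path does not live below source_root.
    """
    relative = PurePosixPath(path).relative_to(source_root)
    return str(PurePosixPath(target_root) / relative)


if __name__ == "__main__":
    # Hypothetical layout: site A's /gpfs is replicated into /gpfs/replica_a on site B.
    print(to_target_path("/gpfs/projects/a.txt", "/gpfs", "/gpfs/replica_a"))
```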


Figure 2: OneFS bi-directional replication on different sub-directories

Bi-directional replication (in support of an active-active like solution) can only be implemented with Isilon, because in SONAS the same system cannot be source and target at the same time (since replication is only possible at filesystem level).
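
Since a cluster can act as source and target only on different directories, a bi-directional design is worth checking for overlaps before it is configured. Below is a minimal sketch of such a check. The function names and the example paths are my own assumptions and not part of any Isilon tooling.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def overlaps(a: str, b: str) -> bool:
    """True if both directories are identical or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def conflicting_pairs(outbound: List[str], inbound: List[str]) -> List[Tuple[str, str]]:
    """Return every (outbound, inbound) directory pair that overlaps on one cluster.

    outbound: directories this cluster replicates to the other site
    inbound:  directories this cluster receives from the other site
    """
    return [(o, i) for o in outbound for i in inbound if overlaps(o, i)]


if __name__ == "__main__":
    # Example layout: site A owns /ifs/site_a and receives /ifs/site_b.
    print(conflicting_pairs(["/ifs/site_a"], ["/ifs/site_b"]))
```

An empty result means that the two replication directions do not touch the same directory tree.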

[update 11.Feb 2014]

Aspera Integration

One important aspect that I forgot to mention in the initial post is the Aspera integration in Isilon. Aspera is a third-party solution for WAN-optimized data replication and synchronization (for more information you may visit their web site http://asperasoft.com/software/synchronization/ ). Isilon has the Aspera solution integrated into the code and can therefore replicate and synchronize at a high performance level. Ironically, IBM has recently acquired Asperasoft, but the solution is not integrated into SONAS.

[/update] 

Management

Both systems allow Web UI and CLI configuration of the replication. Isilon also provides a RESTful API for management, but its support for SyncIQ (that's the name of the replication module in Isilon) is still very limited. Watch future releases for complete support of all functions here.

Failover/Failback is an important aspect in disaster situations. In SONAS you have to reverse the replication direction manually, which can be a source of errors if the procedure is not well documented and practiced. Isilon provides a one push button solution for it (well, in practice only the failover is one push; failback of course requires some additional steps).

Both solutions have documentation that is accessible online. The SONAS information center (google it) seems to me a well structured and complete resource, with HTML and PDF formats and search capabilities. The Isilon PDF documentation can be downloaded from support.emc.com, where you can also find some best practices papers.


                                                    Isilon    SONAS

Performance
Parallel replication using multiple or all nodes    yes       yes 1)
Throttle throughput                                 yes       no
Throttle CPU usage                                  yes       no
Transfer compressed data                            no 6)     yes
Target aware initial replication                    yes       yes
Efficient block based deltas                        yes       yes
Modifiable number of processes or threads           yes       yes
Aspera integration                                  yes       no

What gets replicated
Subdirectory replication                            yes       no 2)
Replicate UID/GID mapping                           yes       no 3)
Replicate shares/quotas                             no        no
Include/Exclude policies                            yes       no

Topology & Security
1:N replication                                     yes 5)    ?
N:1 replication                                     yes       yes
A source cluster can also be a target               yes 5)    no
Cascading replication of same directory             no        no
Can encrypt replication data on wire                no 6)     yes

Management
GUI configuration and management                    yes       yes
CLI configuration and management                    yes       yes
RESTful API for management                          yes 7)    no
Push button Failover/Failback                       yes       no
Performance/throughput monitoring                   yes       no
Failover dry-run support                            yes       no
Additional Snapshots on target                      yes 4)    ?
Rating on available online documentation            +         ++



1) Only interface nodes
2) Can use fileset replication
3) SONAS async replication requires AD with Unix Services installed
4) Requires SyncIQ license
5) Only on different directories
6) Requires external solution
7) Limited in 7.0 and almost complete in 7.1


Summary

Both SONAS and Isilon do asynchronous replication. Although customers often ask for synchronous replication, it is hard to achieve with a scale-out NAS system due to latency and other issues. Both systems can leverage their parallel architecture to move data, and both can incrementally move changed data at block level. Both systems are very limited in replicating configuration data such as shares, quotas and networking configuration, so manual and/or scripted actions are required for failover/failback. In this regard Isilon has more automated failover/failback functionality, while SONAS has advantages in its compression and encryption capabilities.



Disclaimer
As always: this article reflects my own personal view of the facts. At the time of writing, Isilon release 7.02 and SONAS 1.4 were the current releases. Please consult the appropriate manuals for details on the current release. If I got something wrong or something is missing, please feel free to use the comment function to post your comments or send me a mail. See also the general disclaimer.








