# update 01.04.2013: Based on even more existing installations, we typically see more than 400 MB/s throughput with TSM on actual hardware (NL400 nodes). That is in line with our throughput sizing for other workloads. /update
These days many customers look for efficient ways to leverage relatively new features of their backup software, such as de-duplication, node replication and backup of virtual environments. Several of these features cannot be used efficiently when backing up and archiving data to tape. We also see more and more use cases that require regular and frequent access to backup or archive data, which tape is not well suited for. I am currently working with some clever guys (see the Contributions section) who implemented Isilon at several customers to overcome the typical limits of traditional backup-to-tape or backup-to-disk solutions, and I found their ideas worth sharing here. Although this article discusses the challenges and solutions in the context of Tivoli Storage Manager (TSM), similar principles apply to backup solutions from other vendors such as Commvault, Symantec, EMC, CA and others. Over time, disk solutions have become more and more cost effective, and with Isilon we have an easy-to-manage and very cost-effective solution that scales up to approximately 15 PB of capacity (uncompressed and not de-duplicated).
Why backup to disk
As stated above, several use cases do not allow using tape as a backup medium. Backup-to-disk strategies come with several advantages:
- Faster backup, especially for unstructured data
- Much faster access to and restore of data
- Less secondary workload (tape migrations)
- Improved SLAs (have you ever really estimated how long it would take to restore the majority or all of your data from tape?)
- Recovery times not dependent on the number of available tape drives
- No SAN infrastructure is required (with Isilon)
- Lower TCO in many cases. This of course depends on several factors such as capacity, frequency of access, de-duplication ratio and others.
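To put the restore question from the list above into numbers, here is a minimal shell sketch. The function name `restore_hours` and all input figures are assumptions for illustration (100 TB of data, four tape drives at 140 MB/s each, and a disk target sustaining 1,200 MB/s); they are not measurements from this article, so replace them with your own values.

```shell
#!/bin/sh
# Rough restore-time estimate: data volume divided by sustained throughput.
# All numbers used below are hypothetical examples, not measured values.

# restore_hours DATA_TB THROUGHPUT_MBPS
# Prints the restore time in whole hours (rounded up), using 1 TB = 1,000,000 MB.
restore_hours() (
  data_tb=$1
  mbps=$2
  secs=$(( data_tb * 1000000 / mbps ))
  echo $(( (secs + 3599) / 3600 ))
)

# Assumed tape setup: 4 drives x 140 MB/s = 560 MB/s aggregate.
echo "tape: $(restore_hours 100 560) h"
# Assumed disk target: 1,200 MB/s aggregate.
echo "disk: $(restore_hours 100 1200) h"
```

The estimate ignores tape mounts, positioning and database overhead, so it should be read as a best case for tape.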
Issues with traditional disk arrays
Backup to disk is not a new thing, but the management overhead of traditional disk arrays is well known to TSM administrators:
- Traditional filesystems cannot easily be shared among TSM servers
- Many filesystems have a limited size (e.g. 32 TB), which is way too small for a backup environment
- Filesystems cannot grow to accommodate any size without manually reshuffling and re-balancing tons of data through TSM
- The management can be quite complex. Some examples are:
  - Storage array management
  - Dedicated SAN adapters
  - Device drivers
  - Array definitions
  - LUN definitions for each array
  - LUN masking
  - SAN zoning
  - Volume groups
  - Logical volumes
  - Filesystems
  - Device class definitions
  - etc.
- Typically, performance monitoring and management is required.
All these issues are avoided when using Isilon as a backup target. All your TSM servers can mount a single scalable filesystem through NFS. The simplified infrastructure is shown in the following picture.
Figure 1: Isilon and TSM servers/clients using 10 Gigabit Ethernet
But hasn’t NFS been known to be a slow solution for TSM?
Well, that has been true for traditional NAS arrays, but not so with Isilon. As you may know, Isilon’s development started over a decade ago for multimedia streaming, and serving as a sequential file pool for TSM is a well-suited workload for it. Test results provided by Concat and General Storage have shown a throughput of approximately 1,200 MB/s using three TSM instances on a single Linux server and 1,000 local clients that back up data to an Isilon cluster with just four NL400 nodes. The setup has not been tuned, and the throughput results are shown in the following figure.
Figure 2: TSM Backup throughput using local clients
Standard TSM Setup
The setup of the test environment was quite straightforward, with no specific tuning. Here are some of the main steps used to configure the TSM setup:
...
mkdir /tsmisilona/tsm1
mkdir /tsmisilona/tsm1/instance
mkdir /tsmisilona/tsm1/storage
...
dsmicfgx ... instance=/tsmisilon/tsm1/instance ...
...
(TSM) def devcl file devt=file maxcap=10000m dir=/tsmisilon/tsm1/storage
(TSM) def stg backuppool file maxscr=1000000
...
...
(TSM) upd devcl file dir=/tsmisilona/tsm1/storage,/tsmisilonb/tsm1/storage,/tsmisilonc/tsm1/storage
Listing 1: TSM Instance setup steps
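The directory list in the final `upd devcl` step of Listing 1 can be sketched as a small shell helper. The function names `devclass_dirs` and `pool_ceiling_mb` are hypothetical; the mount points, the instance name `tsm1` and the values `maxcap=10000m` and `maxscr=1000000` are taken from the listing. The ceiling is simply the product of the two values and assumes every scratch volume is filled to its maximum capacity.

```shell
#!/bin/sh
# devclass_dirs INSTANCE MOUNTPOINT...
# Prints the comma-separated directory list for the FILE device class.
devclass_dirs() (
  instance=$1
  shift
  out=""
  for mnt in "$@"; do
    out="${out:+$out,}$mnt/$instance/storage"
  done
  printf '%s\n' "$out"
)

# pool_ceiling_mb MAXCAP_MB MAXSCRATCH
# Upper bound of the pool size in MB: volume size times scratch volume count.
pool_ceiling_mb() (
  echo $(( $1 * $2 ))
)

dirs=$(devclass_dirs tsm1 /tsmisilona /tsmisilonb /tsmisilonc)
# Print the administrative command instead of running it (dry run).
echo "upd devcl file dir=$dirs"
echo "pool ceiling: $(pool_ceiling_mb 10000 1000000) MB"
```

Printing the command rather than executing it keeps the sketch free of side effects; the output can be pasted into an administrative session after review.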
Generate massive data on ‘virtual’ TSM Clients and NFS Mount options
Backup tests at scale typically require a heavy I/O workload. To avoid setting up a very large TSM client infrastructure to generate massive throughput, about 1,000 clients were run on the server itself. They were fed via the TSM client API by scripts that generate data with an average file size of 5 MB. This method eliminates the need to read sufficient data from disk and prevents any bottleneck on the client side. The client processes wrote their data via the loopback interface to the local TSM server on the same system.
Using three different mount points (rather than one) has proven to be much more efficient. The following mount options were used:
mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilona
mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilonb
mount -t nfs -o vers=3,tcp,hard,intr,rsize=131072,wsize=131072 isilon01-fast.lab.local:/ifs /tsmisilonc
Listing 2: NFS Mount options
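Because the three mount commands in Listing 2 differ only in the mount point, they can be generated in a loop. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is mounted. The function name `print_mounts` is hypothetical, while the export, options and mount points are the ones from the listing.

```shell
#!/bin/sh
# NFS options as used in Listing 2.
NFS_OPTS="vers=3,tcp,hard,intr,rsize=131072,wsize=131072"

# print_mounts EXPORT MOUNTPOINT...
# Prints one mount command per mount point without executing it.
print_mounts() (
  export_path=$1
  shift
  for mnt in "$@"; do
    echo "mount -t nfs -o $NFS_OPTS $export_path $mnt"
  done
)

print_mounts isilon01-fast.lab.local:/ifs /tsmisilona /tsmisilonb /tsmisilonc
```

Piping the output to `sh` as root would perform the mounts, provided the mount point directories already exist.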
As stated earlier, the setup that provided the results shown in Figure 2 has not been tuned. Using multiple servers may improve the results even further. However, the results already show how effective the solution is. You can expect the throughput to scale almost linearly with the number of nodes that you add to the cluster, while the management effort for storage and infrastructure does not increase.
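The scaling expectation can be turned into a quick back-of-the-envelope calculation. The sketch below derives the per-node throughput from the measured result (approximately 1,200 MB/s on four NL400 nodes) and projects it linearly. Perfectly linear scaling is an assumption that serves as an upper bound, and the function names are hypothetical.

```shell
#!/bin/sh
# per_node_mbps TOTAL_MBPS NODES
# Average throughput per node (integer division).
per_node_mbps() (
  echo $(( $1 / $2 ))
)

# projected_mbps PER_NODE_MBPS NODES
# Linear projection; real clusters are expected to land somewhat below this.
projected_mbps() (
  echo $(( $1 * $2 ))
)

per_node=$(per_node_mbps 1200 4)
echo "per node: $per_node MB/s"
for nodes in 4 8 12; do
  echo "$nodes nodes: up to $(projected_mbps "$per_node" "$nodes") MB/s"
done
```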
One thing I need to mention is that you should not consider putting the TSM database on Isilon. The highly random access patterns that we typically see on the TSM database are something that does not suit Isilon well today. That might change in the future, but today you would ideally use some fast internal disks or an SSD-based array. In the test setup discussed here, two 1 TB SATA disks were used and the TSM database was served from the server’s cache.
Why Isilon provides a more efficient solution
If you are a regular reader of this blog, you already know why Isilon helps to address all the issues mentioned above. Isilon comes with just one filesystem, which does not require managing RAID arrays, aggregates, logical or physical volumes, SAN adapters, drivers and the like. The Isilon filesystem (OneFS) stripes data across all nodes of a disk pool (see this post for more details), auto-balances data blocks to avoid unbalanced utilization of resources, and helps to avoid future data migrations in case of technology refreshes. Expanding the cluster takes just a few seconds of management actions (see this video), and once a node has been added, the space is available to TSM immediately. Here is an example that shows how easily and quickly new capacity (nodes) can be added to the cluster, with the space becoming available to TSM almost instantly:
# Just show the current time:
tsm: GSWARM01>sh time
Current Date and Time on the Server
----------------------------------------
04/28/2013 14:48:39
UTC (GMT) Date/Time is: 04/28/2013 12:48:39 PM
Daylight Savings Time is in effect: YES
# Now let’s look at the available disk space
tsm: GSWARM01>q dirspace
Device Class    Directory                              Estimated         Estimated
Name                                                    Capacity         Available
------------    ---------------------------------    --------------    --------------
ISIDCNORD       /tsmd1isi/gswarm01/data               427,671,368 M     118,849,927 M
# Now we add another node like shown in the video
# Then display again the capacity and available space for TSM
tsm: GSWARM01>q dirspace
Device Class    Directory                              Estimated         Estimated
Name                                                    Capacity         Available
------------    ---------------------------------    --------------    --------------
ISIDCNORD       /tsmd1isi/gswarm01/data               534,589,210 M     225,767,761 M
# As you can see this took not even two minutes
tsm: GSWARM01>sh time
Current Date and Time on the Server
04/28/2013 14:50:08
UTC (GMT) Date/Time is: 04/28/2013 12:50:08 PM
Daylight Savings Time is in effect: YES
Listing 3: Expansion of an Isilon cluster adds capacity for TSM within two minutes
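The numbers in Listing 3 can be checked with a few lines of shell. The sketch below strips the thousands separators from the `q dirspace` values and computes the added capacity and the elapsed time between the two `sh time` outputs. The function names are hypothetical; the input values are copied from the listing.

```shell
#!/bin/sh
# delta_mb NEW OLD
# Subtracts two values that use commas as thousands separators.
delta_mb() (
  new=$(printf '%s' "$1" | tr -d ',')
  old=$(printf '%s' "$2" | tr -d ',')
  echo $(( new - old ))
)

# hms_to_secs HH:MM:SS
# Converts a time of day to seconds; one leading zero per field is stripped
# so that values like 08 are not parsed as octal numbers.
hms_to_secs() (
  IFS=:
  set -- $1
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
)

echo "capacity added:  $(delta_mb 534,589,210 427,671,368) MB"
echo "available added: $(delta_mb 225,767,761 118,849,927) MB"
echo "elapsed: $(( $(hms_to_secs 14:50:08) - $(hms_to_secs 14:48:39) )) s"
```

With the values from the listing, this works out to about 106.9 million MB of additional capacity within 89 seconds.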
As you can see, expanding an Isilon cluster is very easy and the capacity is available to TSM immediately. The data redistribution is performed by Isilon in the background and should not affect the production workload. This is just one example of Isilon’s ease of use and of the reduced complexity for the application layer. Other features such as remote replication, snapshots and flexible data protection help to protect the data with Mean Time to Data Loss (MTDL) values that reach billions of years. For example, an N+3 data protection setting yields a calculated MTDL of about 3 billion years, while the protection overhead on a 10-node cluster for that level of protection is only 30%.
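The 30% figure can be reproduced with a simple ratio. The sketch assumes a simplified model in which each stripe spans all nodes of the cluster, so that 3 of 10 stripe units hold protection data; actual OneFS layouts depend on cluster size and protection policy, and the function name is hypothetical.

```shell
#!/bin/sh
# protection_overhead_pct NODES PROTECTION_UNITS
# Overhead in percent when PROTECTION_UNITS out of NODES stripe units
# hold protection data instead of user data.
protection_overhead_pct() (
  echo $(( $2 * 100 / $1 ))
)

echo "N+3 on 10 nodes: $(protection_overhead_pct 10 3)% overhead"
```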
Summary
Isilon provides a very efficient infrastructure that allows effective deployments of backup-to-disk scenarios with high performance and a very high level of data protection. Software features such as de-duplication, compression and node replication can be used, while SLAs can be improved dramatically, especially for data that needs to be accessed regularly. Even scenarios where HSM is used on a smaller high-performance filesystem can be deployed with Isilon as an effective external archive tier. You may well ask why not use Isilon with its tiering function (SmartPools) without HSM, and that is a valid question. However, the world is complex, and if someone already has an HSM infrastructure in place, it can be a good solution with all the advantages of a disk-based tier over (or in addition to) a tape solution, especially if the data is accessed frequently.
- Reduced complexity for the TSM deployment
- No more SAN components and no SAN management required
- Almost zero management when adding capacity
- Well-suited workload for Isilon, with a measured throughput of approximately 300 MB/s per node on NL400 nodes without any optimization (just one TSM server)
- Read performance is typically even better
- With a shared filesystem and TSM node replication, it is only one step from the monolithic TSM architecture towards an infrastructure that resembles a backup-as-a-service approach.
Contributions
Thanks to Lars Henningsen from General Storage and Stéphane Criachi from Concat for providing input and their test results for this article. These guys have expert knowledge of backup solutions and Isilon, and I would advise you to get in contact with them if you consider implementing the solutions outlined in this article. My colleagues Andrej Kienkov and Frank Krämer from IBM also provided some useful comments.