23 October 2014

Direct I/O considerations for TSM Storage pools on OneFS (NFS)



For several reasons explained in previous posts, EMC's Scale-Out NAS system Isilon is a fantastic target for TSM storage pools: it is easy to use, it scales linearly, and it provides a shared and fast filesystem for TSM instances, which makes a TSM admin's life much easier. During a recent project we learned that the throughput we can achieve with Tivoli Storage Manager (TSM) varies quite significantly depending on whether TSM is configured to perform buffered or un-buffered I/O (direct I/O) on the storage pools that reside on OneFS. This article describes some dependencies between buffered or direct I/O, CPU utilization, I/O latencies and throughput, and the role of Isilon's SmartCache feature. Although we discuss this here in the context of TSM and OneFS, the described aspects should be valid for other applications as well.

Direct I/O

Direct I/O is sometimes also referred to as un-buffered I/O. Direct I/O bypasses the Virtual Memory Manager (VMM) of the operating system and therefore the cache of the file system layer. It is typically used for applications that do not benefit from caching or read-ahead algorithms. For example, database systems like Oracle and DB2 maintain their own application cache and in this case filesystem caching would be redundant and resources would be wasted since copying data to and from memory (for caching) is CPU intensive.
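To illustrate what bypassing the cache means at the system-call level, here is a minimal, hypothetical Python sketch (file name, function name and block size are illustrative, not taken from TSM): on Linux, direct I/O is requested with the O_DIRECT open flag and requires page-aligned buffers and block-sized transfer lengths, which is why applications cannot simply toggle it on without extra bookkeeping.

```python
import mmap
import os

BLOCK = 4096  # typical filesystem block size; O_DIRECT transfers must be aligned

def write_volume(path, data):
    """Write data with O_DIRECT where possible, falling back to buffered
    I/O if the platform or filesystem does not support it.
    Returns the mode that was actually used ("direct" or "buffered")."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    o_direct = getattr(os, "O_DIRECT", 0)   # flag is missing e.g. on macOS
    if o_direct:
        try:
            fd = os.open(path, flags | o_direct, 0o644)
            try:
                # O_DIRECT needs a page-aligned buffer and a length that is
                # a multiple of the block size; an anonymous mmap provides
                # the alignment, and we pad the data up to a block boundary.
                padded = len(data) + (-len(data)) % BLOCK
                buf = mmap.mmap(-1, padded)
                buf.write(data)
                os.write(fd, buf)           # bypasses the page cache
                buf.close()
                return "direct"
            finally:
                os.close(fd)
        except OSError:
            pass                            # e.g. EINVAL on an unsupporting fs
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, data)                  # goes through the page cache
        return "buffered"
    finally:
        os.close(fd)
```

Note that in direct mode the file ends up padded with NUL bytes to the block boundary; this alignment bookkeeping is exactly what the page cache otherwise hides from applications.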

The current default setting for TSM is to use direct I/O on storage pools (controlled by the DIRECTIO option in the server options file dsmserv.opt). Even though direct I/O reduces the CPU utilization on the TSM server, it generally leads to significant performance penalties [1] and longer elapsed times, since writes are synchronous and need to go directly to disk instead of being copied to memory and flushed later. Direct I/O reads likewise cause synchronous reads from disk, whereas with the normal cached policy the reads may be satisfied from the cache. This can result in poor performance if the data would likely have been in memory under the normal caching policy. Direct I/O also bypasses the VMM read-ahead algorithm because the I/Os do not go through the VMM. The read-ahead algorithm is very useful for sequential access to files because the VMM can initiate disk requests and have the pages already resident in memory before the application has requested them [2].

Disable direct I/O for higher throughput

This is most probably the reason why IBM requires disabling direct I/O on the TSM server when using storage pools on NFS on their de-duplication appliance [3]. For storage pools on OneFS it seems to be very beneficial to do the same, although it means more work for the TSM server CPU, which has to copy data to and from the file system buffers in memory (in addition, the sync daemon needs to flush dirty cache pages out to disk periodically). However, disabling direct I/O reduces latencies, and the overall throughput will be much higher. On x86 architectures, CPU resources are not a big cost factor; on more expensive architectures like IBM Power (where adding more CPUs might be a cost concern), one could use the AIX Workload Manager to guarantee enough CPU resources for important services, keeping the system responsive even under extremely high I/O load.
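On the TSM server side this is a one-line change. Assuming the server uses its default options file, the relevant server option is DIRECTIO in dsmserv.opt (the server has to be restarted for the change to take effect; check the documentation for your TSM level):

```
* dsmserv.opt -- let the TSM server use buffered I/O on storage pool volumes
DIRECTIO NO
```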

One of the reasons for the performance degradation we saw with direct I/O enabled is the fact that Linux and Unix then perform all writes synchronously, which has negative side effects on latency and throughput. OneFS has a feature called SmartCache [4] that consists of two components: a coalescer and the Endurant Cache (EC). The coalescer coalesces (and optimizes) writes and sends them to the Endurant Cache, which is an NVRAM-backed cache. As soon as data is written to the EC, the write can be acknowledged to the client, which reduces latencies dramatically and allows higher throughput. With direct I/O enabled on the server, however, all I/Os are synchronous, the coalescer is not allowed to coalesce them, and SmartCache cannot effectively defer I/Os. Figure 1 shows the impact of direct I/O vs. buffered I/O over NFSv3 on OneFS, and you can see that it is quite significant. All I/Os were done with a 1024 KB block size on a 5-node NL400 cluster with a CentOS client (be aware that TSM writes with a 256 KB block size, so you will probably end up with a slightly lower maximum throughput with non-direct I/O, but 400-500 MB/s is a good value to calculate with).
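The benefit of deferring and batching writes can be sketched in plain Python, independent of OneFS (function names and sizes here are illustrative): forcing every write to stable storage, as synchronous direct I/O effectively does, removes any opportunity to batch, while buffered writes can be collected in memory and flushed once.

```python
import os

def write_sync_each(path, chunks):
    """Every write is forced to stable storage before the next one starts,
    as with synchronous/direct I/O: nothing can be coalesced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for c in chunks:
            os.write(fd, c)
            os.fsync(fd)            # one forced disk round trip per write
    finally:
        os.close(fd)

def write_coalesced(path, chunks):
    """Writes are absorbed by a buffer and flushed once at the end,
    as a write-back cache or coalescer would do."""
    with open(path, "wb") as f:
        for c in chunks:
            f.write(c)              # stays in memory for now
        f.flush()
        os.fsync(f.fileno())        # single flush for the whole batch
```

Timing the two functions on a real disk shows the per-write flush variant taking far longer for the same data; acknowledging writes from the EC's NVRAM mitigates exactly this effect on the cluster side.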

Figure 1: Single threaded: Direct I/O vs. buffered I/O in MB/s per NL400 node on OneFS 7.1.0

As the measurements show, it is extremely beneficial to switch off direct I/O on the TSM server (at the price of higher CPU load on the server). I think direct I/O is fine for storage pools on SAN, but on a NAS system it does not seem appropriate.

What if you must stick with Direct I/O

We recently ran into a situation where a customer didn't want to disable direct I/O, since they ran IBM Power CPUs with AIX, which they found to be very expensive. On a standard x86 platform CPU resources are not a big issue because the price/performance ratio is great; you get a lot of power for little money. In this situation, however, CPU resources on the server side were an issue and we were forced to live with direct I/O.

The requirement was to deliver a certain throughput at 1000 threads, assuming that in peak times 1000 TSM clients might be active. For the reasons explained above, these extreme conditions can turn the EC into a bottleneck, since it cannot defer I/Os to reduce latencies (in which case a cache doesn't make sense, of course). As a result, we could not achieve the targeted performance. However, one of the strengths of OneFS is that it is very flexible: you can enable or disable SmartCache for any pool or directory in the filesystem. Disabling the EC for the pool or directory that contains the TSM volumes increased the throughput by about 40% (again with the same I/O workload: direct I/O enabled and 1000 threads). At this point the CPUs on the NL nodes ran at almost 100%, which was expected, because the incoming data had to be moved 1:1 to disk without being coalesced and deferred for more efficient I/O.
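On the OneFS side, write caching can be adjusted per directory with the isi set command. The exact flag values differ between OneFS releases, so treat the following as a sketch (the path is hypothetical) and verify the syntax against the CLI administration guide [4]:

```
# Bypass the Endurant Cache while keeping the coalescer, recursively,
# for the directory holding the TSM volumes (verify for your release)
isi set -R -c coal_only /ifs/tsm/stgpool
```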

In summary, with direct I/O you can save CPU resources on the TSM server, while at the same time more CPU work has to be done on the storage side for the reasons explained. In our specific environment, with direct I/O enabled, we achieved approximately 230 MB/s write and 370 MB/s read throughput per NL400 node at 1000 threads. With direct I/O disabled, or at lower I/O thread counts, the achievable throughput is higher.

Improve throughput with Direct I/O

If, for any reason, you cannot disable direct I/O (like in our situation mentioned above), but you want more throughput, here are some options that will help:
  • Add additional nodes to the cluster. The good news is that Isilon not only scales capacity when you add nodes, but also adds CPU, network and memory resources.
  • Use stronger Isilon nodes such as the X410, which comes with two 8-core Ivy Bridge family processors.
  • Add one or more A100 accelerator nodes to the cluster, which have no disks (no capacity) but a lot of CPU power. These nodes are less expensive and are delivered with two Intel Sandy Bridge hex-core processors that run at 2 GHz. (We may do some tests with A100s quite soon, and I'll post relevant results here.)

Data consistency considerations

With respect to data consistency there should be no issue with direct I/O disabled, since TSM does not commit entries to its database before the file system cache has been flushed to disk.


Summary

Direct I/O performs non-buffered I/O, bypassing the file system cache. This reduces CPU consumption on the TSM server, but limits the throughput you can get with storage pools on OneFS (and most probably on most, if not all, other NAS systems). For higher throughput, TSM allows you to disable direct I/O, which lowers latencies and increases the maximum throughput for storage pools on OneFS significantly. If CPU consumption on the TSM server is a concern on AIX servers, you can limit the resource consumption with the AIX Workload Manager. If you decide not to disable direct I/O, you may consider disabling SmartCache on OneFS, which optimizes throughput in high-load situations at the price of higher CPU load on the Isilon system.



[1] Direct I/O write performance, IBM Information Center.
[2] Direct I/O read performance, IBM Information Center.
[3] IBM ProtecTIER Implementation and Best Practices Guide, Chapter 15.3.1, page 255
[4] Write Caching with SmartCache, Isilon OneFS Version 7.1.1 CLI Administration Guide, p.334
[5] Best practices for NFS client settings, Knowledge Base Article, available on support.emc.com



Thanks to Kip Cranford and Stephan Esche for their comments and the very helpful discussions that helped us understand the described dependencies.
