One of the most important aspects in choosing a scale-out NAS system is the list of
supported protocols. In this post I will focus on the access protocols and give
some notes on the Hadoop Distributed Filesystem (HDFS) support in Isilon OneFS.
The supported authentication protocols were covered in another blog post, and the
management protocols may be covered in a future post.
At the time of writing, the current Isilon operating system version is OneFS 7.1
and IBM’s SONAS version is 1.4.2.
IPv6 Support
Although it is not directly related to the access protocols, it is worth
mentioning that SONAS does not appear to support IPv6: it is mentioned neither in
the IBM SONAS documentation nor on the IBM IPv6 compliance product list.
Considering that many enterprise customers are planning or starting their IPv6
rollout, this is another indication that IBM is not seriously innovating with
SONAS and that it is not a strategic asset in their storage portfolio.
Isilon has supported IPv6 for many years. I was able to trace IPv6 support back
to version 6.5, released in 2011, and the current version 7.1 builds upon this
IPv6 maturity.
The following table lists the
supported file access protocols, as well as RESTful access options and HDFS
support for Hadoop and compatible access methods.
Protocols    | Isilon   | SONAS
SMB1         | Yes      | Yes
SMB2         | Yes      | Partly 2)
SMB 2.1      | Yes      | No
SMB3         | planned  | No
NFS3         | Yes      | Yes 3)
NFS4         | Yes      | No
HTTP         | Yes      | Yes
FTP          | Yes      | Yes
RESTful API  | Yes      | No
HDFS V 1.0   | Yes      | No
HDFS V 2.0   | Yes      | No
HDFS V 2.1   | planned  | No
Table 1: Supported Access Protocols in SONAS 1.4.2 and OneFS 7.1
SONAS SMB Limitations
Regarding SMB support, IBM lists several limitations and considerations in an
official IBM document:
- Alternate Data Streams (ADS) are not supported. ADS were first introduced in Windows NT for compatibility with Apple’s Hierarchical File System (HFS), where file data is stored in two parts, the data fork and the resource fork. But ADS are used for other purposes as well. For example, metadata for Office documents (which you can view and modify via the File Properties menu) is stored in ADS. Since SONAS doesn’t support ADS, this information is not accessible to clients and cannot be used for indexing.
- SMB 2.1 is not supported at all.
- Level 2 oplocks are not supported. Client requests for Level 2 oplocks are therefore not granted, which limits the clients’ ability to cache data locally and increases network traffic.
Besides the fact that SONAS does not currently support SMB 2.1, IBM also lists
significant technical limitations in the documentation. Some of these are
precautionary considerations, but others are real product limitations.
SONAS NFS Limitations
SONAS limitations for NFS are documented in an IBM article. Some of the comments
are just considerations (e.g. one should not mount the same data via different
paths or exports on the same client; the potential data corruption results from
how NFSv3 handles data and is not a SONAS issue). However, the following
limitations are relevant and do not exist in Isilon OneFS:
- NFS version 4.0 is not supported
- Clients should mount IBM SONAS NFS exports using IP addresses only. Do not mount an IBM SONAS NFS export using a DNS round-robin (RR) entry name. If you mount an IBM SONAS NFS export using a host name, ensure that the name is unique and remains unique. This restriction prevents data corruption and data unavailability, as the lock manager on the IBM SONAS system is not "clustered-system-aware". In other words, IBM calls it a scale-out cluster, yet its lock manager has no clustered-system awareness. Well done…
- Files created on an NFSv3 mount on a Linux client are visible only through CIFS clients mounted on the same server node. CIFS clients mounted on different server nodes cannot view these files.
The last two points mean that you cannot properly use a load balancer or DNS round robin to distribute SMB and NFS mounts evenly across the interface nodes. This static mapping is very inflexible and adds administration overhead. Isilon shines here with SmartConnect, an intelligent DNS server that answers client queries with an IP address from the relevant pool and can balance client connections based on CPU load, interface throughput or connection count, or simply use round-robin. So there is no need to take care of SMB or NFS clients individually; OneFS is fully cluster-aware.
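To make the SmartConnect behaviour more concrete, here is a minimal Python sketch of a DNS-style responder that hands out one IP address from a pool per query, using either round-robin or a least-loaded policy. It is an illustration under stated assumptions, not OneFS code: the class names, policy names and sample IP addresses are hypothetical, and a real SmartConnect zone involves more, such as IP failover between nodes.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class PoolAddress:
    """One IP address in a hypothetical SmartConnect-style pool."""
    ip: str
    connections: int = 0          # current client connections on this IP
    cpu_load: float = 0.0         # node CPU utilisation, 0.0 to 1.0
    throughput_mbps: float = 0.0  # current interface throughput


class PoolResolver:
    """Toy DNS responder: each query returns one IP from the pool.

    Policy names mirror the balancing options listed in the text; they are
    hypothetical and not OneFS configuration keywords.
    """

    def __init__(self, addresses, policy="round_robin"):
        self.addresses = list(addresses)
        self.policy = policy
        self._next_rr = cycle(self.addresses)

    def resolve(self) -> str:
        if self.policy == "round_robin":
            chosen = next(self._next_rr)
        elif self.policy == "connection_count":
            chosen = min(self.addresses, key=lambda a: a.connections)
        elif self.policy == "cpu_load":
            chosen = min(self.addresses, key=lambda a: a.cpu_load)
        elif self.policy == "throughput":
            chosen = min(self.addresses, key=lambda a: a.throughput_mbps)
        else:
            raise ValueError(f"unknown policy: {self.policy}")
        chosen.connections += 1  # the client will now mount via this IP
        return chosen.ip


if __name__ == "__main__":
    pool = [PoolAddress("10.0.0.11"), PoolAddress("10.0.0.12"), PoolAddress("10.0.0.13")]
    resolver = PoolResolver(pool)
    print([resolver.resolve() for _ in range(4)])
```

The least-loaded policies are a simple `min()` over current per-node metrics, which only works if the responder has a cluster-wide view of every node; that is exactly the awareness the SONAS lock manager lacks.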
RESTful Access to Namespace
In today’s world, mobile devices are used daily to access data. Typically these
devices access data via HTTP rather than NFS or CIFS. The same is true for many
applications. Therefore, Isilon has introduced a REST API called RESTful Access
to Namespace (RAN) [1]. RAN enables applications to create, delete and modify
data through HTTP/1.1 requests. Over time you may also see other RESTful API
functionality integrated into OneFS. If you would like to use existing REST APIs
such as Swift or S3, you can already do so by using ViPR.
Hadoop Distributed Filesystem (HDFS) Support
Data analytics is a very hot topic these days, as companies are starting to
explore the value in their data on a massive scale. Classic Business
Intelligence (BI) workloads have done this for decades, but were traditionally
focused on structured data stored in large, monolithic databases.
However, since the introduction of MapReduce algorithms in BI that work on
unstructured data (i.e. files/objects) residing in a Hadoop Distributed File
System (HDFS), a whole new set of analytics opportunities has appeared. In the
meantime, a number of Hadoop distributions have become established in the
market, such as Apache, Cloudera, Pivotal HD, Hortonworks and others. What they
all have in common is that they can analyse massive amounts of data residing in
an HDFS file system. All Hadoop cluster nodes typically have a compute component
and a storage component (providing the HDFS layer). The storage is typically
implemented with internal disks attached to the compute nodes. Most Hadoop
projects start small, so this is the most cost-effective solution (in terms of
CAPEX, but not operational cost). Figure 1 illustrates the components (compute,
storage, IO path) of a traditional Hadoop cluster.
Figure 1: Traditional Hadoop cluster where all nodes contain both the compute
and storage components (typically DAS). Data must be copied into the HDFS
cluster, and results out of it, before POSIX clients using NFS/SMB/FTP can
access them.
HDFS is optimized for this purpose, but it has some drawbacks:
- By default, data protection stores each block three times (HDFS replication factor 3)
- Many distributions have a single point of failure with a lone primary name node
- Missing enterprise storage features like remote replication, snapshots, backup APIs
- HDFS is not POSIX compliant. Existing applications cannot access the data without special ‘gateway’ tools
- Existing data that resides in a traditional filesystem must be imported into the HDFS namespace. This is a challenge of time, bandwidth, computation, and concurrent capacity. Imagine you need to copy 200 TB over a 10 gigabit link into an HDFS namespace. Even if the link is dedicated to the task, the copy would take about 44 hours at full line rate (200 × 10^12 bytes × 8 bits ÷ 10^10 bit/s = 160,000 s), and in practice, with protocol overhead, even longer.
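The copy-time estimate in the last bullet follows from simple arithmetic: bits to move divided by the effective link rate. The helper below assumes decimal units (1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bit/s); the `efficiency` parameter (the fraction of line rate actually achieved) is a hypothetical knob, not a measured value.

```python
def copy_time_hours(size_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes over a link of link_gbit_s Gbit/s.

    Decimal units: 1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bit/s.
    efficiency is the fraction of line rate actually achieved (hypothetical).
    """
    bits = size_tb * 10**12 * 8
    rate = link_gbit_s * 10**9 * efficiency
    return bits / rate / 3600


if __name__ == "__main__":
    print(f"line rate:        {copy_time_hours(200, 10):.1f} h")
    print(f"80% of line rate: {copy_time_hours(200, 10, 0.8):.1f} h")
```

At full line rate, 200 TB over 10 Gbit/s takes about 44.4 hours; at a hypothetical 80% of line rate it grows to about 55.6 hours, so roughly two days is a realistic order of magnitude before any analysis starts.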
Isilon has HDFS integrated as a protocol
EMC has engineered HDFS as a built-in protocol in Isilon OneFS. Isilon speaks HDFS to the compute nodes but stores the data internally in OneFS with POSIX semantics. As a result, the Hadoop cluster is split into two parts: compute and storage. The following figure illustrates this.
Figure 2: Hadoop cluster with Isilon as an HDFS storage backend. There is no need
to copy data into the HDFS cluster: data can be accessed from HDFS compute nodes
as well as from traditional SMB/NFS/FTP clients.
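With Isilon as the storage backend, the compute/storage split shows up mainly in configuration: the compute nodes point their default filesystem at the Isilon cluster instead of a local NameNode. The sketch below generates a minimal Hadoop `core-site.xml` fragment with Python's `xml.etree.ElementTree`. The host name `hdfs.isilon.example.com` (standing in for a SmartConnect zone name) is hypothetical, port 8020 is assumed as the usual NameNode RPC port, and `fs.defaultFS` is the Hadoop 2 property name (Hadoop 1 used `fs.default.name`).

```python
import xml.etree.ElementTree as ET


def core_site_xml(default_fs: str) -> str:
    """Return a minimal core-site.xml body that sets fs.defaultFS."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"  # Hadoop 2 key
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical SmartConnect zone name; 8020 assumed as the HDFS RPC port.
    print(core_site_xml("hdfs://hdfs.isilon.example.com:8020"))
```

Because the compute nodes only hold this pointer, swapping Hadoop distributions or adding compute servers does not touch the data itself.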
This has
some significant advantages:
- You do not need to dedicate silos of compute and storage to Hadoop analytics. You can analyse the data directly where other applications access it through other protocols, and you can snapshot it and replicate it elsewhere. This shared platform avoids the time-consuming process of copying data in and out of a siloed HDFS filesystem.
- Isilon stores the data much more efficiently than native HDFS (see my other blog post on data protection). Roughly speaking, we achieve 80% usable-to-raw disk efficiency rather than about 33% with the native HDFS approach of three replicas. This efficiency cannot be matched by IBM, NetApp, or any other major storage vendor.
- You can utilize your compute farm flexibly with different Hadoop distributions and HDFS versions with the same data sets. This is very nice for migrations and (as far as I know) not possible with the native implementations.
- You can use enterprise storage features that you get with Isilon such as file system snapshots, replication, enhanced kerberized security, access zones, and more.
- You can scale your compute farm independently from storage. If you need more compute capability, add more multicore physical or virtual servers; if you need more storage for your data sets, add Isilon storage nodes. All HDFS data on Isilon can be accessed with traditional existing tools that follow POSIX semantics.
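The efficiency figures in the list above come down to simple ratios. With HDFS's default of three replicas, only one third of raw capacity is usable. OneFS instead protects data with N+M forward error correction across nodes, where N data units share M protection units. The stripe widths below are illustrative assumptions; the actual ratio on a cluster depends on node count and the configured protection level.

```python
from fractions import Fraction


def replication_efficiency(copies: int) -> Fraction:
    """Usable/raw ratio when every block is stored `copies` times."""
    return Fraction(1, copies)


def fec_efficiency(data_units: int, protection_units: int) -> Fraction:
    """Usable/raw ratio for an N+M stripe: N data units plus M protection units."""
    return Fraction(data_units, data_units + protection_units)


if __name__ == "__main__":
    # Stripe widths are illustrative; real values depend on node count and protection level.
    for label, ratio in [("HDFS, 3 replicas", replication_efficiency(3)),
                         ("N+M = 8+2", fec_efficiency(8, 2)),
                         ("N+M = 16+2", fec_efficiency(16, 2))]:
        print(f"{label:>16}: {float(ratio):.1%}")
```

An 8+2 stripe gives 80% usable capacity and a 16+2 stripe about 88.9%, against 33.3% for triple replication, while both N+M layouts still tolerate two simultaneous failures within a stripe.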
The performance figures I have seen so far are quite similar for Isilon and the
direct-attached, non-virtualized compute+storage model. Some workloads are
slower, but many are faster. However, this only covers the compute time to
result. As mentioned before, you save most of the time by not being forced to
copy data between HDFS and a POSIX filesystem.
Conclusion
If you consider using SONAS, you need to carefully check your use cases and
environment, since many basic functions and protocols such as IPv6, SMB 2.1 and
NFSv4 are not supported. Even for SMB 2 and NFSv3 there are several restrictions
that can cause problems if you want to use them in a heterogeneous environment.
Isilon has a much greater set of supported protocols, and all of them are
supported in a cluster-aware manner. Furthermore, Isilon directly supports the
Hadoop Distributed Filesystem with key improvements over classic architectures.