About two decades ago, a number of parallel and distributed file systems were
developed. The impetus was that, when data began growing exponentially, it
became clear that scale-out storage was the paradigm to follow for large data
sets. Some examples of good scale-out file systems are WAFL (not really
scale-out), IBM Spectrum Scale (aka GPFS), Lustre, ZFS and OneFS. All these
systems have something in common: they had their "first boot" sometime around
the year 2000. They also all have their strengths and weaknesses. Some of these
systems are not really scale-out; others are difficult to install and operate;
some require special hardware or don't support common NAS protocols; they may
have scalability limits, or innovate slowly.
Just the fact that these systems were designed 20 years ago is a problem. Many
important Internet technology trends such as DevOps, big data, converged
infrastructure, containers, IoT or virtual everything emerged much later than
2000, so these file systems are now used in situations they were never designed
to handle. It is clearly time for a new approach to file storage.
Recently, I
became aware of a modern file storage system: Qumulo File Fabric (QF2).
Gartner recently named Qumulo the only new visionary vendor in the 2017 Magic Quadrant for distributed file systems and
object storage. QF2 was designed by several of the same engineers who built
Isilon roughly 15 years ago, and obviously their experiences led them to a very
modern and flexible solution.
This article highlights some of QF2's main features that I think are worth
sharing here.
QF2 is hardware independent
Several vendors say their product is independent of hardware-specific
requirements. They may have used the term "software defined." According to
Wikipedia, two qualities of a software-defined product are that it operates
independently of any hardware-specific dependencies and that it is
programmatically extensible.
QF2 fulfils both requirements admirably. You can run QF2 on standard hardware provided by Qumulo, on HPE Apollo 4200 servers, and in AWS. For development and testing purposes, Qumulo offers a free OVA package so that you can run a fully functional cluster on VMware Workstation or Fusion. You can also run a standalone instance of QF2, with 5TB of storage, on AWS for free. You only pay for the AWS infrastructure.
Because QF2 is fully manageable via an API, it's fully extensible and it can be integrated into any operational environment.
The QF2 architecture
The
following figure illustrates the main components of QF2:
Figure 1: QF2 and its main components/functions
QF2 comes with many enterprise features such as continuous replication,
snapshots, real-time quotas and analytics, and is complemented by cloud-based
monitoring and superior customer support.
Within the
Qumulo Core software, the underlying Scalable Block Storage (SBS) is basically
a massively scalable distributed database, specialized for file-based data. It
is where data is stored and data protection occurs.
QF2 runs in user space
The QF2 core
OS is built on Ubuntu. Qumulo developers can leverage all the capabilities of the
Linux ecosystem. The QF2 file system processes run in Linux user space rather
than in kernel space, which has a number of advantages [1]:
• QF2 has its own implementations of protocols such as SMB, NFS and LDAP, which
are independent of the underlying OS. For example, NFS runs as a service with
its own notions of users/groups. This makes QF2 more portable.
• Kernel
mode is primarily for device drivers that work with specific hardware. By
operating in user space, QF2 reinforces its hardware independence. It can run
in a wide variety of configurations and environments.
• Running
in user space means that Qumulo can develop and deliver features at a much
faster pace.
• Running
in user space improves QF2 reliability. As an independent user-space process,
QF2 is isolated from other system components that could introduce memory
corruption, and the QF2 development processes can make use of advanced memory
verification tools that allow memory-related coding errors to be detected prior
to software release. By using a dual partition strategy for software upgrades,
Qumulo can automatically update both the operating system and Qumulo Core
software for fast and reliable upgrades. You can easily restart QF2 without
having to reboot the OS, node or cluster.
Interactive API wizard
QF2 is
programmatically extensible. It has a complete API, which can be extended and
integrated into any datacenter environment.
If you like,
you can use the API as the primary interface for all your management and
operation tasks. However, for convenience, there is also a web UI and a CLI
available. Both the UI and CLI use the same API that anyone can use to interact
with QF2. The API and Python bindings are documented and available on GitHub. The same is true for the command line wrapper,
qq.
One thing
that Qumulo did very well is something I'd call its API wizard. On the web UI, under
API & Tools, you can select any
operational task and the system comes back with the related API call and
response. Figure 2 shows an example for a user login. You select the action for
Session Management (here POST Login)
and then hit Try it! Figure 3 shows
the output and the JSON code.
Figure 2: Web UI and Interactive API Screen:
Login method selected
Figure 3: Web UI and Interactive API Screen
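As a rough illustration of what the login call in the API wizard corresponds to, the following Python sketch builds the HTTP request without sending it. The endpoint path /v1/session/login, port 8000, the JSON field names and the host name are assumptions made for illustration; the API & Tools page of your own cluster shows the exact call and response.

```python
import json
import urllib.request

# Assumed values for illustration only; verify them on the API & Tools page.
LOGIN_PATH = "/v1/session/login"


def build_login_request(host, username, password, port=8000):
    """Build (but do not send) a POST request for a session login."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}:{port}{LOGIN_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "qf2.example.com" is a placeholder host name.
    req = build_login_request("qf2.example.com", "admin", "secret")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

The request is only constructed here, so the sketch runs without a cluster. For day-to-day scripting, the Python bindings and the qq wrapper mentioned above are probably the more convenient route.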
Simplicity
The web UI
is very intuitive and all the typical tasks can be done from there. I'd say
that if an admin has any storage experience, operating QF2 is a no-brainer. Play
with the Web UI for 10 minutes, and you'll be fine. For scripting, you might
consider using the command line wrapper qq locally on the cluster or from your
laptop. Also, updating the software and adding or removing nodes from a cluster
are very simple and non-disruptive processes.
Hybrid SSD/HDD architecture for hot/cold tiering and all-flash
With Qumulo,
SSDs are not only used to cache reads. Instead, every write is written to the SSDs
and destaged at a later time to the HDDs according to policies. Thus, not only are
initial writes much faster, but also the destaging IOs to HDDs will be faster
because they'll be consolidated and sequential. And all that happens without
any proprietary system components like NVRAM cards.
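Why destaging helps can be shown with a toy model. The sketch below buffers random block writes (standing in for the SSD tier) and, on destage, sorts them and merges adjacent block addresses into sequential runs for the HDDs. The class and method names are hypothetical and the policy is deliberately simplistic; it is not Qumulo's actual algorithm.

```python
class WriteBuffer:
    """Toy SSD write buffer that destages to HDD in sequential runs."""

    def __init__(self):
        self.pending = {}  # block address -> latest data written

    def write(self, block, data):
        # A later write to the same block replaces the earlier one in the buffer.
        self.pending[block] = data

    def destage(self):
        """Return a list of (start_block, [data, ...]) sequential runs."""
        runs = []
        for block in sorted(self.pending):
            # Extend the current run if this block directly follows it.
            if runs and block == runs[-1][0] + len(runs[-1][1]):
                runs[-1][1].append(self.pending[block])
            else:
                runs.append((block, [self.pending[block]]))
        self.pending.clear()
        return runs


if __name__ == "__main__":
    buf = WriteBuffer()
    for block, data in [(7, "g"), (3, "c"), (5, "e"), (4, "d"), (3, "C")]:
        buf.write(block, data)
    print(buf.destage())  # [(3, ['C', 'd', 'e']), (7, ['g'])]
```

In this example, five random writes turn into two sequential HDD writes, and the overwritten block 3 reaches the disk only once.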
For the majority of workloads, these hybrid configurations work extremely well
with a very favorable cost profile. For workloads with extreme IO profiles,
all-flash systems (Qumulo P-series) are available as well.
Higher efficiency
While legacy systems do erasure coding (EC) on a file-by-file basis or data
protection with RAID, QF2 does erasure coding on a block-by-block basis (see
the SBS layer shown in Figure 1). This approach is much faster and more
efficient. It is faster because it is independent of the file count and more
efficient because it is independent of file sizes. Protecting small and large
files requires the same protection overhead, whereas legacy systems switch to
mirroring to protect smaller files.
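A quick calculation shows why the protection granularity matters. The numbers below are hypothetical and not taken from any specific product: a 4+2 erasure code (4 data blocks plus 2 parity blocks per stripe), compared with a file-based scheme that falls back to 3-way mirroring for files too small to fill a stripe.

```python
def ec_raw_bytes(logical_bytes, data_blocks=4, parity_blocks=2):
    """Raw capacity consumed when data is erasure coded as data+parity."""
    return logical_bytes * (data_blocks + parity_blocks) / data_blocks


def mirrored_raw_bytes(logical_bytes, copies=3):
    """Raw capacity consumed when data is protected by mirroring."""
    return logical_bytes * copies


if __name__ == "__main__":
    small_files = 1_000_000 * 4096  # one million 4 KiB files
    print("block-based EC:", ec_raw_bytes(small_files) / small_files)        # 1.5
    print("mirrored files:", mirrored_raw_bytes(small_files) / small_files)  # 3.0
```

Under these assumptions, the same set of small files consumes twice as much raw capacity when mirrored as when it is erasure coded at the block level.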
The block-based approach also makes the distributed rebuild process much
faster. Much better MTTDL (mean time to data loss) values can be achieved at a
similar protection level than with a file-by-file approach (at least when there
are a lot of small files). The block-based approach also increases the capacity
efficiency of the file system. While legacy systems recommend using only 80% of
the capacity, QF2 can use 100% of the available space. QF2’s block-based
protection requires no user-provisioned capacity for reprotection other than a
small amount of space for sparing, which is excluded from the reported
user-provisioned capacity.
Real-time analytics and quotas
One of the
smartest things in QF2 is its real-time analytics capabilities. Metadata, such
as bytes used and file counts, is aggregated when files and directories are
created or modified, which means the information is available for timely
processing without expensive file system tree walks.
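The idea of maintaining aggregates at write time can be sketched in a few lines. In this simplified model, which is an illustration and not Qumulo's implementation, each directory keeps running totals for its whole subtree, and every file creation propagates its size up to the root. All names are hypothetical.

```python
class Directory:
    """Directory node that keeps aggregate totals for its whole subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.total_bytes = 0
        self.total_files = 0

    def mkdir(self, name):
        return Directory(name, parent=self)

    def add_file(self, size):
        # Propagate the delta to this directory and all of its ancestors.
        node = self
        while node is not None:
            node.total_bytes += size
            node.total_files += 1
            node = node.parent


if __name__ == "__main__":
    root = Directory("/")
    projects = root.mkdir("projects")
    scratch = root.mkdir("scratch")
    projects.add_file(1000)
    projects.add_file(500)
    scratch.add_file(200)
    print(root.total_bytes, root.total_files)          # 1700 3
    print(projects.total_bytes, projects.total_files)  # 1500 2
```

Asking how much data sits under a directory is then a single lookup rather than a tree walk. The price is a small amount of extra work on every write, proportional to the directory depth in this naive version.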
The web UI includes a large number of real-time dashboards and graphs such as
IO hotspots and throughput hotspots, and all the data can be retrieved via the
well-documented API if you would also like to process it with other tools, such
as Splunk.
Figure 4: Example of Qumulo's dashboards:
capacity trends
Continuous delivery of new features and non-disruptive upgrades
Traditional
storage system vendors deliver about one or two major upgrades per year, which
often require service interruption and precise planning. Qumulo is different.
Due to their Agile development methods, a new version is delivered every few
weeks. Testing is an integral part of their development methodology and the
upgrade process is simple to run.
Superior support
I heard from
several customers that they specifically like the way Qumulo provides support
for them when they need it most. Instead of getting connected to a first-line
support team who ask you for your serial number, site ID or other things and
then work through a number of static processes or scripts, you get connected
directly with your dedicated Customer Success Manager (CSM) who has years of
actual storage management experience. These folks will help you directly rather
than passing your case from one queue to another and around the world. You can
even get help through your personal Slack channel or the Qumulo Care Support
Portal. Many customer comments on Gartner's Peer Insights site attest to the
exceptional support Qumulo offers.
Very simple subscription-based pricing
Qumulo offers a very simple subscription-based SaaS
licensing. It's a straightforward and transparent €/TB model that also includes
future roadmap features. Also, the subscriptions are transferable to future
platforms.
References and further reading
[1] Qumulo File Fabric Technical Overview, White Paper:
https://qumulo.com/documents/20/WP-Q152-QF2-Technical-Overview.pdf
[2] Manage Your Data, Not Your Storage - Surviving Today’s Machine Data Onslaught, Webinar:
https://qumulo.com/resources/library/manage-your-data-not-your-storage-surviving-todays-machine-data-onslaught/
[3] Enterprise Storage Playbook for Managing Billions of Files & Petabytes of Data, Webinar, 59 Min:
https://qumulo.com/resources/library/enterprise-storage-playbook-managing-billions-files-petabytes-data/