The Network Data Management Protocol (NDMP) was invented decades ago by NetApp. It addressed the fact that NAS appliances typically don't allow third-party applications, such as backup agents, to run on their operating system. The idea of NDMP was therefore to provide a general interface that backup applications can use to back up data on NAS appliances. Many backup software vendors have implemented it, so it could be seen as an industry standard.
However, it never really became a standard that works across different software and/or storage vendors: every vendor has a slightly different implementation. Many things have changed since then, and there are better options today. In addition, the idea of separating the control and data paths to offload the data traffic to the SAN is obsolete. While this was an advantage a decade ago, companies are moving away from SAN because of complexity and cost. 10, 40 and even 100 Gigabit Ethernet have become the standard in datacenters, so running a separate network technology is no longer effective.
I'd summarize the issues related to NDMP as follows:
●NDMP is not storage agnostic. In general, you cannot back up data and restore it to an array from another vendor, or sometimes even to another OS version.
●NDMP requires admin privileges. That is no problem for backups of large systems, but it is not nice for restores, especially if a user wants to restore a single file or just a set of files.
●Many backup software solutions do not index the files of the NDMP sets. Maybe you can store a Table of Contents (TOC) with the backup, but if you want to restore a single file you have to load the TOC into a temporary table to work with it. This can be very time consuming.
●NDMP doesn't really support an incremental forever strategy. That means you have to do a full backup periodically, which is a no-go with large filesystems at petabyte scale that contain billions of files.
●NDMP has been developed with tape media in mind, so many small(er) files are collected into a large tar file that can then be written to tape. This is not a good idea for today's backup targets like disk or object storage.
QF2 is a modern scale-out NAS solution (and one of the few that Gartner put into their Leaders quadrant for scale-out file and object solutions in 2018). It is built for billions of files, and backing up data in the modern era requires a much more efficient approach.
QF2's Solution for Backing Up Millions of Files
Any backup solution can use its native backup mechanisms to back up data on an NFS export or SMB share. By doing so, POSIX permissions or SMB ACLs are preserved, and any backup media that is supported by the backup software can be used.
If you have millions of files, you may want to back up a number of shares in parallel rather than mounting only the single root share.
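As a rough illustration of that idea (not tied to any particular backup product), the sketch below fans out one backup job per share; the share paths and the backup command are hypothetical placeholders for whatever your backup software's CLI expects.

```python
# Minimal sketch: back up several shares in parallel instead of walking a
# single root export. Share paths are examples, and the backup command is
# a hypothetical placeholder -- substitute your backup software's CLI.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SHARES = ["/mnt/projects", "/mnt/home", "/mnt/scratch"]   # example mount points

def run_backup(share):
    # Placeholder command; replace with the real backup client invocation.
    return subprocess.run(["echo", "backing up", share], check=True)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run_backup, SHARES))
```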
While the first backup will require a full tree walk and may take quite some time to complete, Qumulo provides an elegant way to avoid tree walks and long backup times for the subsequent incremental backups.
The Snapshot Difference API
QF2 provides an API that can create a list of all files that have changed between two snapshots. For example, if the business wants to run a backup every day, a daily snapshot should be created by Qumulo. This is fully automated and can be scheduled through the GUI, CLI or API. Then, before you start the incremental backup job, you call the Snapshot Differences API to pull a list of files that have changed. This list is in human- and machine-readable JSON format. The following picture illustrates that.
In this example, a new directory /mchmiel/new_dir has been created, as well as the file new_file within that directory. We also see that the parent directory /mchmiel has changed because its access time changed.
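To give an idea of what calling the Snapshot Differences API might look like, here is a minimal Python sketch using the requests library. The endpoint paths, field names and pagination handling are assumptions based on Qumulo's published REST API documentation and may differ on your QF2 version; the cluster address, credentials and snapshot IDs are placeholders.

```python
# Minimal sketch: pull the list of files changed between two snapshots via
# the Qumulo REST API, using the generic "requests" library. Endpoint paths,
# JSON field names and pagination handling are assumptions and may differ
# between QF2 versions; cluster address and credentials are placeholders.
import requests

CLUSTER = "https://qumulo.example.com:8000"   # placeholder cluster address
USER, PASSWORD = "backup_svc", "secret"       # placeholder service account

def get_token():
    # Log in and obtain a bearer token (assumed /v1/session/login endpoint).
    r = requests.post(f"{CLUSTER}/v1/session/login",
                      json={"username": USER, "password": PASSWORD},
                      verify=False)  # self-signed certificate assumed
    r.raise_for_status()
    return r.json()["bearer_token"]

def snapshot_diff(older_id, newer_id, token):
    # Assumed snapshot-diff endpoint; returns JSON entries describing the
    # paths created, modified or deleted between the two snapshots.
    headers = {"Authorization": f"Bearer {token}"}
    url = f"{CLUSTER}/v2/snapshots/{newer_id}/changes-since/{older_id}"
    entries = []
    while url:
        r = requests.get(url, headers=headers, verify=False)
        r.raise_for_status()
        data = r.json()
        entries.extend(data.get("entries", []))
        # Follow pagination if the response contains a "next" link (assumption).
        next_path = data.get("paging", {}).get("next")
        url = CLUSTER + next_path if next_path else None
    return entries

if __name__ == "__main__":
    token = get_token()
    for entry in snapshot_diff(100, 101, token):
        print(entry)
```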
This file can easily be converted into any other format, like a flat text or CSV file. The converted file would then be used by the backup application to process only the files within that list. Tree walks are completely avoided and future incremental backups will be very fast.
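A conversion along these lines would do (a hedged sketch; the JSON field and operation names are assumptions about the diff output):

```python
# Minimal sketch: turn the Snapshot Differences JSON output into a flat,
# one-path-per-line text file that a backup application can consume.
# The field names ("entries", "path", "op") are assumptions about the
# diff output; adapt them to the JSON your QF2 version actually returns.
import json

with open("snapshot_diff.json") as f:          # the saved API response
    entries = json.load(f).get("entries", [])

with open("changed_files.txt", "w") as out:
    for entry in entries:
        # Skip deletions -- there is nothing to back up for those paths
        # (the exact operation naming is an assumption as well).
        if "delete" not in entry.get("op", "").lower():
            out.write(entry["path"] + "\n")
```

The resulting changed_files.txt can then be handed to the backup tool as its input list.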
For example, with Veritas NetBackup, the relevant CLI parameter for passing the list of files as the backup input is -listfile:
Other vendors, like Atempo, have already integrated the use of Qumulo's API into their solutions, so that the intermediate step of creating the file can be omitted.
Advantages of the Proposed Method
This method has several advantages:
It works with all major backup solutions
It uses the native formats of the backup solution.
Any media that is supported by the backup solution can be used, such as Disk, Object Storage, Tape, VTL
Restore is very granular
Restore is storage agnostic
It is very fast
Vendors have already started to integrate the Qumulo API in their backup applications. With that, the intermediate step to create the file list can be omitted.
Atempo FastScan
As mentioned above, Atempo already leverages the Qumulo API for its FastScan technology. Atempo's FastScan feature allows it to rapidly collect the list of new, changed and deleted files on QF2 (by leveraging the Qumulo API) in order to initiate data movement early on.
From a high level, Atempo’s FastScan technology for QF2 does the following:
Trigger and manage snapshots used to capture coherent file lists at a given point in time.
Retrieve the list of new, changed and deleted files since the last snapshot.
Perform the backup or archive to any media that is supported by Atempo (disk, tape, object) without tree walks on incremental backups.
By using QF2's API, Atempo implements a modern incremental forever strategy that performs backups in a very fast manner. In addition, users can restore files in a very granular manner to any filesystem without requiring administrative privileges.
Splunk is a market-leading platform for machine data. It allows you to gather all kinds of log and machine-generated data in a scalable manner and to index, analyze and visualize large data sets. It provides historic and real-time data analytics and a large ecosystem around it, including machine learning libraries and many more tools.
Figure 1: Splunk harnesses machine data of any kind for indexing, searching, analysis etc.
Architecture
The main components of any Splunk implementation are Forwarders, Indexers and Search Heads. Forwarders are typically software agents that run on the devices to monitor and forward streams of logs to the indexers. Indexers are the heart of Splunk's architecture. This is where data is parsed and
About two
decades ago, a number of parallel and distributed file systems were developed. The
impetus was that, when data began growing exponentially, it became clear that
scale-out storage was the paradigm to follow for large data sets. Some examples
of good scale-out file systems are WAFL (not really scale-out), IBM Spectrum Scale (aka GPFS), Lustre, ZFS and OneFS. All these systems have something in common: they had their "first boot" sometime around the year 2000. They also all have their strengths and weaknesses. Some of these systems are not really scale-out; others are
difficult to install and operate; some require special hardware or don't
support common NAS protocols; they may have scalability limits, or lack speed
of innovation.
Just the
fact that these systems were designed 20 years ago is a problem. Many important
Internet technology trends such as DevOps, big data, converged infrastructure, containers,
IoT or virtual everything were invented much later than 2000, so these file
systems are now used in situations they were never designed to handle. It is
clearly time for a new approach to file storage.
Recently, I became aware of a modern file storage system: Qumulo File Fabric (QF2). Gartner named Qumulo the only new visionary vendor in the 2017 Magic Quadrant for distributed file systems and
object storage. QF2 was designed by several of the same engineers who built
Isilon roughly 15 years ago, and obviously their experiences led them to a very
modern and flexible solution.
This article highlights some of QF2's main features that I think are worth sharing here.
1) QF2 is hardware independent
Several
vendors say their product is independent of hardware-specific requirements.
They may have used the term "software defined." According to Wikipedia,
two qualities of a software-defined product are:
It operates independently of any hardware-specific dependencies and is programmatically extensible.
Advanced Driver Assistance Systems, or ADAS, is the fastest-growing segment in automotive electronics [1][2]. The purpose of Advanced Driver Assistance Systems is to automate and improve safe driving. We already use several ADAS features built into our cars, such as Adaptive Light Control, Adaptive Cruise Control, lane departure warnings, traffic sign recognition and many more. Almost all car manufacturers and all leading suppliers such as Bosch, Autoliv, Continental, Mobileye and many others are working on ADAS systems, and the final goal is to build a car that can drive completely autonomously, without any driver involvement. The Society of Automotive Engineers has defined six levels to describe the degree of automation [3].
Table 1: Six levels of automation in ADAS
The higher the desired automation level, the larger the validation effort that is required to develop these assistance systems. The majority of the ADAS features built into mass-production cars today are between Level 2 and Level 4. For these systems, millions of kilometers need to be captured and simulated before the final control units are production ready.
Sensors
The majority of the data volume today is produced by video sensors. However, there are many other sensors generating data:
• Radar
• Lidar
• GPS
• Ultrasonic
• Vehicle Data
Kafka is a very well-known streaming platform. It scales well because you can cluster the brokers, and it has intelligent but relatively thick clients. These intelligent clients make it good for server-to-server communication and keep the brokers quite lightweight. However, heavy clients are not well suited for IoT, where you have tiny devices with very little CPU and memory. For these types of environments, MQTT is an often-used lightweight protocol. However, MQTT is weak when it comes to scaling horizontally: you would need load balancers on both sides (publishers and subscribers) as well as HTTP, which is too heavy and not reliable because subscribers must always be on. In this video, Tim Kellog describes a method where MQTT environments have been made scalable with Kafka. It is an interesting approach that combines the strength of MQTT (lightweight on the client side) with that of Kafka (a very scalable streaming platform).
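A very rough sketch of the idea could look like the bridge below, using the paho-mqtt and kafka-python libraries: lightweight MQTT on the device side, Kafka for scalable fan-out on the backend. Broker addresses and topic names are placeholders, and this is not the exact implementation described in the video.

```python
# Minimal sketch of an MQTT-to-Kafka bridge: tiny devices publish over
# lightweight MQTT, and the bridge forwards every message into Kafka,
# where scaling, retention and fan-out to consumers happen. Broker
# addresses and topic names are placeholders.
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka.example.com:9092")

def on_message(client, userdata, msg):
    # Re-publish the MQTT payload to a Kafka topic derived from the MQTT topic.
    kafka_topic = msg.topic.replace("/", ".")
    producer.send(kafka_topic, msg.payload)

# Note: on paho-mqtt >= 2.0 you must pass a CallbackAPIVersion to Client().
client = mqtt.Client()
client.on_message = on_message
client.connect("mqtt.example.com", 1883)
client.subscribe("sensors/#")
client.loop_forever()
```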
During an average week, an ‘interaction worker[1]’ spends 19% of the time searching and gathering information[2]. Another source specifies that in 2013 content searches cost companies over $14,000 per worker and nearly 500 hours per worker[3]. Utilizing an efficient tool to assist in this process can have a considerable ROI.
On Dell EMC's Isilon Scale-Out NAS platforms, users generate petabytes of unstructured data and billions of files. Data is created by individual users, and machine-generated data is exploding due to the growing number of sensors, log files, security devices etc. To be able to mine and search within the growing data lakes (imagine, the average size of Isilon clusters is approaching 1PB!), Dell EMC is working on a search appliance that indexes data on OneFS in real time and allows users and admins to search metadata and content in a fraction of a second. Alongside the functionality to increase corporate efficiency through search, we are embarking on a journey to mine and analyze user-generated data and further leverage it to create additional business value.
In its first version, the planned features are:
Index files from multiple Isilon clusters
Search for files by name, location, size, owner, file type, and date.
Index files within containers such as zip and tar files.
Perform a targeted full content index (FCI) on search results to view a preview of the content and search for keywords and content inside.
Perform advanced search queries including symbols, wildcards, filters, and operators.
Preview and download content.
For example, administrators and end-users can execute the following use-cases on Isilon arrays:
As an End-user, find all my MS-Word files from last year, then index the full content of the files and show me all the files with 'project Andromeda' in them
As an End-user, show me a chart of how my files break down by size and/or last-accessed date
As an Admin, find all PDF files owned by corp/user1 that were modified in the first three months of this year, compact them, and export them to a specified location
As an Admin, find all MPG files that are over 1GB in the /ifs/recordings subtree
As an Admin, find all Word, Excel, and PowerPoint documents that have not been accessed in a year
To get an idea of the capabilities of the search appliance in its upcoming first release, watch the following video.
It's a true Scale-Out Solution
The product is a virtual appliance with wizards for configuration, and it relies on Elasticsearch indexing technology and the Lucene search engine; it has a 'Google-like' UI with visual filtering capabilities. The technology is scalable: search nodes can be added 'hot', it scales to billions of files and it provides responses in 1-2 seconds. Once the user filters appropriately, s/he can execute actions such as export and full-content indexing on the results.
Real time Indexing
While the initial index scan may take some time to complete, the solution will update the index in real time by plugging into Isilon's audit notifications and the CEE framework. The solution will index metadata such as filename, file type, path, size, dates (last modified, created, last accessed), owner/uid, group/gid and access mode. Optionally, it can index full content and application-specific metadata.
Fig 1: Components of the solution
It uses the OneFS API to perform certain actions such as deletes. The protocol auditing (create, delete, modify, ...) forwards notifications to a CEE server (typically running on a VM) so that index updates can be made in real time (watch the video to see it). A current limitation is that only file changes carried out via SMB and NFS are monitored and updated; changes made via FTP, the OneFS API (HTTP), HDFS, or on the local file system will not be reflected in the index without a re-scan at this point in time. User actions, such as downloading files that show up in a search result, are performed via an SMB share.
Searches
It is important to mention that searches are run against the index; regardless of the complexity of the query, the OneFS cluster will not be affected by the search. The UI is very simple to use and allows filtering; it shows detailed metadata of search matches, provides visualizations and allows user actions such as preview, download, export etc.
Fig 2: The search UI
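As a rough illustration of what the UI does under the hood, here is a hedged sketch of a metadata query against an Elasticsearch index, corresponding roughly to the "all PDF files owned by corp/user1 modified in the first three months of the year" use case above. The index name and field names are hypothetical placeholders; the real appliance exposes this functionality through its UI.

```python
# Hedged sketch: a metadata query against the appliance's Elasticsearch
# index, roughly "all PDF files owned by corp/user1 modified in the first
# three months of the year". The index name and field names (owner,
# extension, mtime, path) are hypothetical; end users would normally use
# the search UI instead of querying Elasticsearch directly.
import requests

ES = "http://search-appliance.example.com:9200"   # placeholder address

query = {
    "size": 100,
    "query": {
        "bool": {
            "filter": [
                {"term": {"owner": "corp/user1"}},
                {"term": {"extension": "pdf"}},
                {"range": {"mtime": {"gte": "2018-01-01", "lt": "2018-04-01"}}},
            ]
        }
    }
}

resp = requests.post(f"{ES}/isilon-files/_search", json=query)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("path"))
```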
Installation
The install is self-contained. The user does not need to ‘leave’ the UI at all during the whole process.
Interested in Beta testing?
If your customer is interested in participating in the Beta test, please register here. Be aware that we are interested in serious feedback and discussion with the users; the program is not intended as just a nice test-and-play experience.
Requirements for the Beta Test
The customer needs to provide the following to be able to run the Beta code:
VMWare ESX v 5.x or 6.x
Resources for the VM
32GB RAM
8 vCPUs
Can be reduced for smaller Isilon clusters
556GB disk space
Can be increased up to 2TB disk space
Can be thin provisioned
2TB is enough for 6+ billion files and folders
Isilon Cluster with OneFS 7.2 or higher
Chrome or Firefox web browser (IE will be supported for GA)
External Active Directory or LDAP server(s) (optional)
The Isilon Search virtual appliance has a built-in OpenLDAP server
Add additional external AD or LDAP servers to support specific users/groups for search or administration
OneFS must expose an SMB share on /ifs. The user specified when the Isilon Search is configured must have full access to this share. The share is used to download files and access them for full content indexing
Isilon Search will automatically:
Enable protocol auditing for all Access Zones (indexing per Access Zone is planned for a future release)
Point “Event Forwarding” to the CEE server on the Isilon Search virtual appliance
For the Beta, no existing CEE audit servers may be configured. This will not be a restriction for GA
Only one Isilon Search system can point to a single Isilon cluster
Event forwarding can only be set for one destination
How many objects are on your cluster?
To determine the total number of objects on the Isilon cluster, SSH into one of the nodes and run isi job start lincount. This will return a job number. Use isi job reports view <job number> to see the results once it completes. It may take a while to complete – typically about 30 minutes for 1 billion objects (as always, depending on utilization, node types etc.).
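If you prefer to script this, a small sketch along these lines can start the job and later fetch the report over SSH (using the paramiko library; the host name and credentials are placeholders, and parsing the command output is left out):

```python
# Minimal sketch: start the LinCount job on an Isilon node over SSH and
# print the report once you have the job number. Host and credentials are
# placeholders; output parsing is left to the reader.
import paramiko

def run(host, user, password, command):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, username=user, password=password)
    _, stdout, _ = ssh.exec_command(command)
    output = stdout.read().decode()
    ssh.close()
    return output

# Start the job; the output contains the job number.
print(run("isilon-node1", "root", "secret", "isi job start lincount"))

# Later, once the job has completed, view the report
# (replace 123 with the job number returned above).
print(run("isilon-node1", "root", "secret", "isi job reports view 123"))
```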
More to come
Join us for this journey of creating business-value from user-generated data. The next stations are support for additional Dell EMC platforms, and for more high-value use-cases.
References
[1] Defined by McKinsey as “high-skill knowledge workers, including managers and professionals” [2] McKinsey Global Institute (MGI) report, July 2012: “The social economy: Unlocking value and productivity through social technologies”.
[3] Source: https://aci.info/2013/11/06/the-cost-of-content-clutter-infographic/
For more than a decade, Isilon has been delivered as an appliance based on standard x64 servers with internal disks on which the scale-out filesystem resides. Besides the scale-out character of the system and its ease of use, I strongly believe that the appliance idea is one of the main reasons for the success of Isilon (and of other NAS appliances in the market). You get a system that is pre-installed, pre-tested, pre-optimized and easy to support. However, there are also use cases for which customers have asked for a software-only version of Isilon. And because it's based on FreeBSD and runs on standard x64 servers, it's just a matter of configuration, support and testing. The nodes in the Isilon appliance that EMC delivers are nothing other than x64 servers with different numbers of CPUs, memory, SSDs and HDDs, depending on the node type you choose. There is basically only one piece of hardware that has been specifically developed by the Isilon team: an NVRAM card that is used for the filesystem journal and caching. However, with OneFS 8.0, the code has been adapted so that SSDs can now be used for this purpose as well.
The Architecture
Considering the potential use cases, the decision was made to provide the first version of IsilonSD Edge (SD stands for Software Defined) in a virtualized environment, based on VMware ESXi 5.5 or 6. Since Isilon has a scale-out architecture and does erasure coding across nodes, a OneFS cluster must run on at least 3 ESXi servers to maintain the availability that Isilon users are used to. The OneFS cluster nodes run as virtual machines (VMs), and IsilonSD Edge supports only direct-attached disks and VMFS-5 datastores (data disks) that are created from the direct-attached disks. The maximum number of nodes in an IsilonSD Edge cluster is currently 6. The components that are included in the download package [4] are:
The IsilonSD Edge Management Server (runs in a VM)
IsilonSD Management Plug-in for VMWare vCenter
The OneFS virtual machine files
Figure 1: Architecture of IsilonSD Edge
Current Configuration Options
The software-only version of Isilon can currently be deployed with the following configurations:
Three to six nodes with one node per ESXi server
Data Disks:
Either 6 or 12 defined data disks
Minimum size of each data disk—64 GB
Maximum size of each data disk—2 TB
Minimum of 8 disks per node (6 data disks, 1 journal disk, and 1 boot disk)
Maximum of 14 disks per node (12 data disks, 1 SSD journal disk, and 1 boot disk)
Minimum cluster capacity—1152 GB (calculated as: minimum disk size * data disks per node * 3, the minimum number of nodes; see the short sketch after this list)
Maximum cluster capacity—Varies depending on your licenses and the resources available on your system.
Journal Disks
One SSD for the journal per node with at least 1 GB of free space
Boot Disk
One SSD or HDD for the OS per node with at least 20 GB of free space
Memory: minimum of 6 GB of free memory per node
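To make the minimum-capacity arithmetic from the list above explicit, here is a tiny sketch that just applies the quoted formula (minimum disk size times data disks per node times number of nodes):

```python
# Raw IsilonSD Edge cluster capacity, following the formula quoted above:
# minimum disk size * data disks per node * number of nodes.
def raw_capacity_gb(disk_size_gb, data_disks_per_node, nodes):
    return disk_size_gb * data_disks_per_node * nodes

# Minimum configuration: 64 GB disks, 6 data disks per node, 3 nodes.
print(raw_capacity_gb(64, 6, 3))    # 1152 GB, matching the minimum above
```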
Supported Servers
EMC IsilonSD Edge is supported on all VMware Virtual SAN compatible systems that meet the minimum deployment requirements. You can identify the compatible systems in the VMware Compatibility Guide. Please note: although we use the vSAN-compatible HCL to identify the supported systems, IsilonSD Edge itself does not support vSAN at this point in time. That might sound a bit strange, but think about how OneFS protects data: with erasure coding across nodes using native disks. Although it would most probably work, using vSAN would add a redundant level of data protection and would most probably be counterproductive for performance. A typical good system for an IsilonSD Edge deployment would be the Dell PowerEdge R630 server. If you spend a bit more on a Dell PowerEdge FX2, you get a 4-node IsilonSD cluster in a single 2U chassis.
A Free Version of IsilonSD Edge
There are two versions of IsilonSD Edge available: the regular paid version and a free community version. A lot of features are enabled in both versions; SyncIQ, SmartLock and CloudPools are only enabled in the paid version. You can start by installing the free version and acquire license keys later, which can then be entered via the UI. The following table lists the feature differences between the two versions.
| Feature | Function | Free license | Paid license |
| --- | --- | --- | --- |
| SmartPools | Groups nodes and files into pools | yes | yes |
| CloudPools | Transparent file archiving to the cloud | no | yes |
| NAS Protocols | NFS, SMB, HTTP, FTP, HDFS | yes | yes |
| Object Protocols | Swift | yes | yes |
| InsightIQ | Nice monitoring for performance, capacity and forecasting | yes | yes |
| SyncIQ | Policy-based and parallel synchronization between clusters | no | yes |
| SmartLock | WORM functionality for directories which require compliance protection | no | yes |
| SmartConnect Advanced | Load balancing: round robin or based on CPU utilization, connection count or throughput | yes | yes |
| SmartDedupe | Post-process deduplication | yes | yes |
| SmartQuota | Enforcement and monitoring of quotas | yes | yes |
| SnapshotIQ | Filesystem or directory based snapshots | yes | yes |
| NDMP Backup | IsilonSD Edge only supports 3-way backups (no 2-way due to lack of Fibre Channel connection) | yes | yes |
Table 1: Supported features for the free and paid version of IsilonSD Edge
Use Cases
The most obvious use cases for the software-only version are remote offices where no data center is available, or locations where everything runs in a virtual environment. Using SyncIQ, you can pull data that's stored in the remote location into a central datacenter, for backup purposes for example. Or you can even push content from a central location towards the edges (remote offices). You can even combine this with CloudPools [5], which enables you to keep the actively used content local, while files that have not been used for some time are transparently pushed out to the cloud. This can be very powerful because you get local performance with a small footprint, but logically your NAS could be huge! What people also like is the fact that the Isilon appliance that resides in a data center is managed in the same way as the virtual instances in remote locations (except for the additional management VM that's used for the deployment).
The OneFS code is the very same code in both the appliance and the SD version of Isilon. Therefore, IsilonSD Edge might be a good vehicle for functional testing. Be aware that performance tests depend very much on the underlying hardware, so they might not make sense if you want to know the performance characteristics of an appliance version.
Webcast and more Information
I am running a short 30-40 minute webcast explaining IsilonSD Edge. Title: Extend your Data Center to your Remote Office with Software Defined Storage. When: 14 September 2016 – 14:00 UK time / 15:00 Berlin time. Register here: this link also works if you want to see the recording.
Here are some more links with useful information. Especially [1] contains almost everything you need.
A good alternative: the Isilon x210 Fast Start Bundle
While I am writing this post, EMC has just announced an entry-level starter kit containing three or more X210 nodes for a very attractive price. If you want to start with a small Isilon cluster, this might be a fantastic option to consider as well. Like every appliance, it comes pre-installed, pre-tested and ready to go. The new fast start bundle contains:
Three to eight X210 12TB Nodes
Two 8-port Mellanox InfiniBand switches (for the internal network) – no need to manage them.
Enterprise Bundle:
SmartConnect
SnapshotIQ
SmartQuotas
Cables & Accessories
Optional Support
This bundle is aggressively priced and the promotion runs until December 31st, 2016. It can be acquired through an EMC partner. The maximum cluster size you can get is 8x 12TB = 96TB (raw). However, this limit is only a promotional limit, not a technical one, which means that you can extend your cluster with any other available node type, including additional X210 nodes. In that case, please note two things: 1) You need to purchase bigger InfiniBand switches if you want to build bigger clusters. 2) For the configuration options you choose beyond the special offer, your regular company discount applies.
Summary
IsilonSD Edge is the first software-only version of Isilon and is a good starting point for a virtualized environment (based on VMware ESXi - other hypervisors might be supported in the future). It's a good way to connect remote locations with your central data lake built on an EMC Isilon scale-out cluster. The functionality is equal to the appliance version of Isilon (the free version has some restrictions). My personal preferred alternative would be a small Isilon cluster based on the appliance with X210 nodes, but this attractive promotion only runs until the end of 2016.