Category: Data management
For many mainframers, writing to object storage from zSeries mainframes over TCP/IP is a new concept.
The ease of use and added value of this solution are clear, but another question remains: what should serve as the target repository? How do customers choose an object storage vendor, and decide among a private cloud, hybrid cloud, or public cloud? A target repository can be an on-premises object storage system, such as Hitachi HCP or Cohesity, or a public cloud, such as AWS, Azure, or GCP.
The best option for you depends on your individual needs. There are pros and cons in each case. In this post, we break down the factors you need to consider so you know how to choose a target cloud repository that will meet your needs.
- Network bandwidth and external connections
- Amount of data recalled, restored or read back from repository
- DR testing and recovery plans
- Corporate strategies, such as “MF Modernization” or “Cloud First”
- Cloud acceptance or preferred cloud vendor already defined
- Cyber resiliency requirements
- Floor or rack space availability
Network bandwidth and external connections
Consider the bandwidth of the OSA cards and the external bandwidth to the remote cloud, if cloud is an option. Is the external connection shared with other platforms? Is a cloud connection already established for the corporation?
For on-premises storage, network connectivity is still required, but it is an internal network with no external access.
Amount of data recalled, restored or read back from repository
There are added costs for reading data back from the public cloud, so an understanding of expected read throughput is important when comparing costs. If the read rate is high, then consider an on-premise solution.
DR testing and recovery plans
Cloud-based recovery allows recovery from anywhere. And public clouds can replicate data across multiple sites automatically. The disaster recovery or recovery site must have network connectivity to the cloud.
An on-premises solution requires a defined disaster recovery setup: a second copy of the object storage off-site, replicated from the primary site. Recovery at the DR site accesses this replicated object storage.
Corporate strategies, such as “Mainframe Modernization” or “Cloud First”
You should be able to quickly move mainframe data to cloud platforms by modernizing backup and archive functions. Cloud also offers policy-driven or automatic tiering of data to lower the cost of cold storage.
If there is no cloud initiative, an on-premises solution may be preferred. Many object storage providers offer options to push data from on-premises systems to the public cloud, so hot data can be kept close while cold data is placed in the cloud.
Cloud acceptance or preferred cloud vendor already defined
Many corporations already have a defined cloud strategy and a preferred cloud vendor, so you'll want a vendor-agnostic solution that can work with whichever provider is chosen.
Defining and maintaining the repository can then be delegated to the groups within the organization that are familiar with, and responsible for, the corporate cloud.
Cyber resiliency requirements
On-premises solutions can generate immutable snapshots to protect against cyber threats. An air-gapped solution can be architected to place copies of data in a separate environment that can be detached from networks.
Cloud options also include features such as versioning, multiple copies of data, and multi-factor authentication to protect data and allow recovery.
Floor or rack space availability
With an on-premises solution, floor space, rack space, power, and cooling are required. With a cloud solution, no on-premises hardware is required.
There is no clear-cut performance benefit for either solution. It depends on the hardware and network resources, the amount of data to be moved, and contention from other activity in the shop using the same resources.
Cloud customers with performance concerns may choose to establish a direct connection to cloud providers in local regions to prevent latency issues. These concerns are less relevant when a corporate cloud strategy is already in place.
Cloud storage is priced by repository size and type. There are many add-on costs for features and costs for reading back. There are mechanisms to reduce costs, such as tiering data. Understanding these costs upfront is important.
On-premises object storage requires at minimum two systems for redundancy, plus installation and ongoing maintenance.
If you are still using a legacy VTL/Tape solution, you could be enjoying better performance by sending backup and archive copies of mainframe data directly to cloud object storage.
The reason is that when you replace legacy technology with modern object storage, you can eliminate bottlenecks that throttle performance. In other words, you can build a connection between your mainframe and your backup/archive target that moves data faster. You can think of this as "ingestion throughput."
3 ways you can increase ingestion throughput for backup and archive copies of mainframe data
Here are the top three ways you can increase ingestion throughput:
#1: Write data in parallel, not serially
The legacy mainframe tapes used to make backup and archive copies required data to be written serially. This is because physical tape lived on reels, and you could only write to one place on the tape at a time. When VTL solutions virtualized tape, they carried over this sequential access limitation.
In contrast, object storage does not have this limitation and does not require data to be written serially. Instead, it is possible to use a new method to send multiple chunks of data simultaneously directly to object storage using TCP/IP.
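The idea of sending multiple chunks in parallel can be sketched in a few lines of Python. This is only an illustration: the `upload_part` function is a stand-in for an object-store multipart upload over TCP/IP, and the chunk size is toy-sized for readability.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes per chunk; real multipart uploads use multi-megabyte parts

# Stand-in for an object store that receives parts independently.
store = {}

def upload_part(args):
    offset, chunk = args
    store[offset] = chunk  # in practice: an HTTP PUT to an S3-compatible endpoint
    return offset

data = b"mainframe backup data stream"
chunks = [(i, data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]

# Send all parts simultaneously instead of serially, tape-style.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(upload_part, chunks))

# The object store reassembles the parts in order on "complete".
reassembled = b"".join(store[i] for i, _ in chunks)
assert reassembled == data
```

Because each part travels independently, total transfer time is bounded by the slowest part rather than the sum of all parts, which is the core of the throughput gain.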
#2: Use zIIP engines instead of mainframe MIPS
Legacy mainframe backup and archive solutions use MSUs, taking away from the processing available to other tasks on the mainframe. This in effect means that your mainframe backups are tying up valuable mainframe computing power, reducing the overall performance you can achieve across all the tasks you perform there.
You do not need to use MSUs to perform backup and archive tasks. Instead, you can use the mainframe zIIP engines—reducing the CPU overhead and freeing up MSUs to be used for other things.
#3: Compress data before sending it
Legacy mainframe backup and archive solutions do not support compressing data before sending it to Tape/VTL. This means that the amount of data that needs to be sent is much larger than it could be using modern compression techniques.
Instead, it is possible to compress your data before sending it to object storage. Not only do you benefit from smaller transfer sizes, but you also increase the effective capacity of the existing connection between the mainframe and the storage target. For example, compressing data at a 3:1 ratio effectively turns a 1Gb line into a 3Gb line, allowing you to send the same amount of data faster while still using your existing infrastructure.
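The effect is easy to demonstrate. The sketch below uses Python's standard `zlib` as a stand-in for whatever compression a given product employs, applied to a deliberately repetitive record-style payload of the kind backup streams often contain.

```python
import zlib

# A repetitive, record-oriented payload, typical of backup data.
payload = (b"CUSTOMER-RECORD-0001" + b" " * 60) * 5000

compressed = zlib.compress(payload, level=6)
ratio = len(payload) / len(compressed)

# Fewer bytes on the wire: a 3:1 ratio triples the effective line capacity.
print(f"{len(payload)} -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
assert len(compressed) < len(payload)
```

Real-world ratios depend entirely on the data; highly repetitive backup data compresses far better than already-compressed or encrypted content.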
Faster than VTL: Increase Mainframe Data Management Performance
Replacing your legacy VTL/tape solution with a modern solution that can compress and move data to cloud-based object storage can significantly decrease the amount of time it takes to back up and archive your mainframe data, without increasing resource consumption.
Writing in parallel, leveraging zIIP engines, and employing compression is a low-risk, high-reward approach that leverages well-known, well-understood, and well-proven technologies to address a chronic mainframe challenge. It can yield immediate, concrete benefits, reducing the time it takes to back up and archive your mainframe data and cutting costs while boosting capabilities.
Mainframe modernization is a broad topic and one that elicits symptoms of anxiety in many IT professionals. Whether the goals are relatively modest, like simply updating part of the technology stack or offloading a minor function to the cloud, or an ambitious goal like a change of platform with some or all functions heading to the cloud, surveys show it is a risky business…
For example, according to the 2020 Mainframe Modernization Business Barometer Report, published by OneAdvanced.com, a UK software company, some 74 percent of surveyed organizations have started a modernization program but failed to complete it. This is in accord with similar studies highlighting the risks associated with ambitious change programs.
Perhaps that’s why mainframe-to-cloud migration is viewed with such caution. And, indeed, there are at least five reasons to be wary, though in each case the right strategy can help.
Top 5 reasons why mainframe to cloud migration initiatives fail
A focus on lift and shift of business logic
Lift and shift is easier said than done when it comes to mainframe workloads. Mainframe organizations that have good documentation and models can get some clarity regarding business logic and the actual supporting compute infrastructure. In practice, however, such information is usually inadequate. Even when the documentation and models are top notch, they can miss crucial dependencies or unrecognized processes. As a consequence, efforts to recreate capabilities in the cloud can yield some very unpleasant surprises when the switch is flipped. That’s why many organizations take a phased and planful approach, testing the waters one function at a time and building confidence in the process and certainty in the result. Indeed, some argue that the lift and shift approach is actually obsolete. One of the enablers of the more gradual approach is the ability to get mainframe data to the cloud when needed. This is a requirement for any ultimate switchover, but if it can be made easy and routine, it also allows for parallel operations: cloud functions can be set up and tested with real data, at scale, to make sure nothing is left to chance and that a function equal to or better than on-premises has been achieved.
Ignoring the need for hybrid cloud infrastructure
Organizations can be forgiven for wanting to believe they can achieve a 100 percent cloud-based enterprise. Certainly, there are some valid examples of organizations that have managed this task. However, for a variety of good, practical reasons, analysts question whether completely eliminating on-premises computing is either achievable or wise. A “Smarter with Gartner” article, Top 10 Cloud Myths, noted “The cloud may not benefit all workloads equally. Don’t be afraid to propose non-cloud solutions when appropriate.” Sometimes there’s a resiliency argument in favor of retaining on-prem capabilities. Or, of course, there may be data residency or other requirements tilting the balance. The point is that mainframe cloud migration that isn’t conceived in hybrid terms is nothing less than a rash burning of one’s bridges. A hybrid future, particularly when enabled by smooth and reliable data movement from mainframe to cloud, can deliver the best of both worlds in terms of performance and cost-effective spending.
Addressing technology infrastructure without accounting for a holistic MDM strategy
Defined by IBM as “a comprehensive process to drive better business insights by providing a single, trusted, 360-degree view into customer and product data across the enterprise,” master data management (MDM) is an important perspective to consider in any migration plan. After taking initial steps to move data or functions to the cloud, it quickly becomes apparent that having a comprehensive grasp of data, no matter where it is located, is vital. Indeed, a recent TDWI webinar dealt with exactly this topic, suggesting that multi-domain MDM can help “deliver information-rich, digitally transformed applications and cloud-based services.” So, without adaptable, cloud-savvy MDM, migrations can run into problems.
Assuming tape is the only way to back up mainframe data
Migration efforts that neglect to account for the mountains of data in legacy tape and VTL storage can be blindsided by how time consuming and difficult it can be to extract that data from the mainframe environment. This can throw a migration project off schedule or lead to business problems if backup patterns are interrupted or key data suddenly becomes less accessible. However, new technology makes extraction and movement much more feasible and the benefits of cloud data storage over tape in terms of automation, access, and simplicity are impressive.
Overlooking the value of historical data accumulated over decades
A cloud migration is, naturally, a very future-focused activity in which old infrastructure and old modes of working are put aside. In the process, organizations are sometimes tempted to leave some of their data archives out of the picture, either by shredding tapes no longer retained under a regulatory mandate or by simply warehousing them. This is particularly true for older and generally less accessible elements. But for enterprises fighting to secure their future in a highly competitive world, gems of knowledge are waiting regarding every aspect of the business – from the performance and function of business units, the shop floor and workforce demographics to insights into market sectors and even consumer behavior. With cloud storage options, there are better fates for old data than gathering dust or a date with the shredder. Smart organizations recognize this fact and make a data migration strategy the foundation of their infrastructure modernization efforts. The data hiding in the mainframe world is truly an untapped resource that can now be exploited by cloud-based services.
Failure is not an option
Reviewing these five potential paths to failure in mainframe-cloud migration should not be misconstrued as an argument against cloud. Rather, it is intended to show the pitfalls to avoid. When considered carefully and planfully – and approached with the right tools and the right expectations – most organizations can find an appropriate path to the cloud.
Blame the genius who gave us the term “cloud” as shorthand for distributed computing. Clouds, in many languages and cultures, are equated with ephemeral things, dreamy states of mind, and vague thoughts.
Well, cloud computing is none of those “cloud things.” It is the place where huge capital investments, the best thinking about reliability, and the leading developments in technology have come together to create a value proposition that is hard to ignore.
When it comes to reliability, as a distributed system – really a system of distributed systems – cloud accepts the inevitability of failure in individual system elements and in recompense, incorporates very high levels of resilience across the whole architecture.
For those counting nines (those reliability figures quoted as 99.xxx) there can be enormous comfort in the figures quoted by cloud providers. Those digging deeper may find the picture to be less perfect in ways that make the trusty mainframe seem pretty wonderful. But the vast failover capabilities built into clouds, especially those operated by the so-called hyperscalers, are so immense as to be essentially unmatchable, especially when other factors are considered.
The relevance of this for mainframe operators is not about “pro or con.” Although some enterprises have taken the “all cloud” path, in general, few are suggesting the complete replacement of mainframe by cloud.
What is true instead is that the cloud’s immense reliability, its ability to offer nearly turnkey capabilities in analytics and many other areas, and its essentially unlimited scalability mean it is the only really meaningful way to supplement mainframe core capabilities, and in 2021 its growth is unabated.
Whether it is providing the ultimate RAID-like storage reliability across widely distributed physical systems to protect and preserve vital data or spinning up compute power to ponder big business (or tech) questions, cloud is simply unbeatable.
So, for mainframe operations, it is futile to try to “beat” cloud but highly fruitful to join – the mainframe + cloud combination is a winner.
Indeed, Gartner analyst Jeff Vogel, in a September 2020 report, “Cloud Storage Management Is Transforming Mainframe Data,” predicts that one-third of mainframe data (typically backup and archive) will reside in the cloud by 2025 — most likely a public cloud — compared to less than 5% at present – a stunning shift.
This change is coming. And it is great news for mainframe operators because it adds new capabilities and powers to what’s already amazing about mainframe. And it opens the doors to new options that have immense potential benefits for enterprises ready to take advantage of them.
Change is good – a familiar mantra, but one not always easy to practice. When it comes to moving toward a new way of handling data, mainframe organizations, which have earned their keep by delivering the IT equivalent of corporate-wide insurance policies (rugged, reliable, and risk-averse), naturally look with caution on new concepts like ELT — extract, load and transform.
Positioned as a lighter and faster alternative to more traditional data handling procedures such as ETL, (extract, transform and load), ELT definitely invites scrutiny. And that scrutiny can be worthwhile.
Definitions provided by SearchDataManagement.com say that ELT is “a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.” In contrast, another source defines ETL as “three database functions that are combined into one tool to pull data out of one database and place it into another database.”
The crucial functional difference in those definitions is the exclusive focus on database-to-database transfer with ETL, while ELT is open-ended and flexible. To be sure, there are variations in ETL and ELT that might not fit those definitions but the point is that in the mainframe world ETL is a tool with a more limited focus, while ELT is focused on jump-starting the future.
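The difference in ordering can be sketched with stand-in functions. This is purely illustrative: `extract`, `transform`, and `load` here are toy placeholders for real pipeline tooling, and the point is only where the transform step runs.

```python
# Stand-in steps; real pipelines would invoke actual extract/transform/load tools.
def extract():
    return ["raw-record-1", "raw-record-2"]

def transform(records):
    return [r.upper() for r in records]  # cleansing/reshaping work

def load(records, target):
    target.extend(records)

# ETL: transform on the source side (mainframe MSUs) before loading.
warehouse = []
load(transform(extract()), warehouse)

# ELT: load raw data first, then transform with the target's (cloud) compute.
data_lake = []
load(extract(), data_lake)
data_lake = transform(data_lake)

assert warehouse == data_lake == ["RAW-RECORD-1", "RAW-RECORD-2"]
```

The end state is the same; what differs is which platform pays for the transformation work and when it happens.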
While each approach has its advantages and disadvantages, let’s take a look as to why we think ETL is all wrong for mainframe data migration.
ETL is Too Complex
ETL was not originally designed to handle all the tasks it is now being asked to do. In the early days it was often applied to pull data from one relational structure and get it to fit in a different relational structure. This often included cleansing the data, too. For example, a traditional RDBMS can get befuddled by numeric data where it is expecting alpha data or by the presence of obsolete address abbreviations. So, ETL is optimized for that kind of painstaking, field-by-field data checking, “cleaning,” and data movement, not so much for feeding a hungry Hadoop database or modern data lake. In short, ETL wasn’t invented to take advantage of all the ways data originates and all the ways it can be used in the 21st century.
ETL is Labor Intensive
All that RDBMS-to-RDBMS movement takes supervision and even scripting. Skilled DBAs are in demand and may not stay at your organization, so keeping the human part of the equation going can be tricky. In many cases, someone will have to come along and recreate the hand-coding or replace it whenever something new is needed.
ETL is a Bottleneck
Because the ETL process is built around transformation, everything is dependent on the timely completion of that transformation. However, with larger amounts of data in play (think, Big Data), this can make the needed transformation times inconvenient or impractical, turning ETL into a potential functional and computational bottleneck.
ETL Demands Structure
ETL is not really designed for unstructured data and can add complexity rather than value when asked to deal with such data. It is best for traditional databases but does not help much with the huge waves of unstructured data that companies need to process today.
ETL Has High Processing Costs
ETL can be especially challenging for mainframes because it generally incurs MSU processing charges and can burden systems when they need to be handling real-time workloads. This stands in contrast to ELT, which can be accomplished using mostly the capabilities of built-in zIIP engines, which cuts MSU costs, with additional processing conducted in a chosen cloud destination. In response to those high costs, some customers have moved the transformation stage into the cloud to handle all kinds of data transformations, integrations, and preparations to support analytics and the creation of data lakes.
It would obviously be wrong to oversimplify a decision regarding the implementation of ETL or ELT; there are too many moving parts and too many decision points to weigh. However, what is crucial is understanding that rather than being focused on legacy practices and limitations, ELT speaks to most of the evolving IT paradigms. ELT is ideal for moving massive amounts of data. Typically the desired destination is the cloud, often a data lake built to ingest just about any and all available data so that modern analytics can get to work. That is why ELT today is growing and why it is making inroads specifically in the mainframe environment. In particular, it represents perhaps the best way to accelerate the movement of data to the cloud and to do so at scale. That’s why ELT is emerging as a key tool for IT organizations aiming at modernization and at maximizing the value of their existing investments.
One of the great revelations for those considering new or expanded cloud adoption is the cost factor – especially with regard to storage. The received wisdom has long been that nothing beats the low cost of tape for long-term and mass storage.
In fact, though tape is still cheap, cloud options such as Amazon S3 Glacier Deep Archive are getting very close, and they offer tremendous advantages that tape can’t match. A case in point is Amazon S3 Intelligent-Tiering.
Tiering (also called hierarchical storage management or HSM) is not new. It’s been part of the mainframe world for a long time, but with limits imposed by the nature of the storage devices involved and the software. According to Amazon, Intelligent Tiering helps to reduce storage costs by up to 95 percent and now supports automatic data archiving. It’s a great way to modernize your mainframe environment by simply moving data to the cloud, even if you are not planning to migrate your mainframe to AWS entirely.
How does Intelligent-Tiering work? The idea is pretty simple. When objects are found to have been rarely accessed over long periods of time, they are automatically targeted for movement to less expensive storage tiers.
Migrate Mainframe to AWS
In the past (both on mainframes and in the cloud) you had to define a specific policy stating what needed to be moved to which tier and when, for example after 30 or 60 days. The point of the new AWS tiering is that it automatically identifies what needs to be moved and when, and then moves it at the proper time. Migrating mainframe data to Amazon S3 is no problem because modern data movement technology now allows you to move both historical and active data directly from tape or virtual tape to Amazon S3. Once there, auto-tiering can transparently move cold and long-term data to less expensive tiers.
This saves the trouble of needing to specifically define the rules. By abstracting the cost issue, AWS simplifies tiering and optimizes the cost without impacting the applications that read and write the data. Those applications can continue to operate under their usual protocols while AWS takes care of selecting the optimal storage tier. According to AWS, this is the first and, at the moment, the only cloud storage that delivers this capability automatically.
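With AWS's boto3 SDK, placing an object into the Intelligent-Tiering class is a single parameter on the write. The sketch below only builds the request arguments (the bucket name and key are hypothetical, and no call is made, since that requires credentials and a live account).

```python
# Request arguments for boto3's s3.put_object(); bucket and key are hypothetical.
put_args = {
    "Bucket": "mainframe-archive",           # hypothetical bucket name
    "Key": "backups/PROD.BACKUP.D2021/obj",  # hypothetical object key
    "Body": b"...backup payload...",
    "StorageClass": "INTELLIGENT_TIERING",   # AWS then auto-selects access tiers
}

# With credentials configured, the actual call would be:
# import boto3
# boto3.client("s3").put_object(**put_args)
assert put_args["StorageClass"] == "INTELLIGENT_TIERING"
```

From then on, tier selection happens on the AWS side with no change to how the object is read or written.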
When reading from tape, the traditional lower tier for mainframe environments, recall times are the concern as the system has to deal with tape mount and search protocols. In contrast, Amazon S3 Intelligent-Tiering can provide a low millisecond latency as well as high throughput whether you are calling for data in the Frequent or Infrequent access tiers. In fact, Intelligent-Tiering can also automatically migrate the most infrequently used data to Glacier, the durable and extremely low-cost S3 storage class for data archiving and long-term backup. And with new technology allowing efficient and secure data movement over TCP/IP, getting mainframe data to S3 is even easier.
The potential impact on mainframe data practices
For mainframe-based organizations this high-fidelity tiering option could be an appealing choice compared with tape from both a cost and benefits perspective. However, the tape comparison is rarely that simple. For example, depending on the amount of data involved and the specific backup and/or archiving practices, any given petabyte of data needing to be protected may have to be copied and retained two or more times, which immediately makes tape seem a bit less competitive. Add overhead costs, personnel, etc., and the “traditional” economics may begin to seem even less appealing.
Tiering, in a mainframe context, is often as much about speed of access as anything else. So, in the tape world, big decisions have to be made constantly about what can be relegated to lower tiers and whether the often much-longer access times will become a problem after that decision has been made. But getting mainframe data to S3, where such concerns are no longer an issue, is now easy. Modern data movement technology means you can move your mainframe data in mainframe format directly to object storage in the cloud so it is available for restore directly from AWS.
Many mainframe organizations have years, even decades, of data on tape. Knowledge of this tape data is retained only in the tape management system, or perhaps it was just copied forward from a prior tape system upgrade. How much of this data is really needed? Is it even usable anymore? Migrating this older data to AWS allows it to be managed in a modern way and can reduce the amount of tape data kept on-premises.
And what about those tapes that today are shipped off-site for storage and recovery purposes? Why not put that data on cloud storage for recovery anywhere?
For mainframe organizations interested in removing on-premises tape technology, reducing tape storage sizes, or creating remote backup copies, cloud options like Amazon S3 Intelligent-Tiering can offer cost optimization that is better “tuned” to an organization’s real needs than anything devised manually or implemented on-premises. Furthermore, with this cloud-based approach, there is no longer any need to know your data patterns or think about tiering; it just gets done.
Best of all, you can now perform a stand-alone restore directly from cloud. This is especially valuable with ransomware attacks on the rise because there is no dependency on a potentially compromised system.
You can even take advantage of AWS immutable copies and versioning capabilities to further protect your mainframe data.
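Versioning, for instance, is a one-time bucket setting. As a hedged sketch (hypothetical bucket name, request dict only, no live call), the boto3 configuration looks like this:

```python
# Configuration for boto3's s3.put_bucket_versioning(); bucket name is hypothetical.
versioning_request = {
    "Bucket": "mainframe-archive",
    "VersioningConfiguration": {"Status": "Enabled"},
}

# With credentials configured, the actual call would be:
# import boto3
# boto3.client("s3").put_bucket_versioning(**versioning_request)
assert versioning_request["VersioningConfiguration"]["Status"] == "Enabled"
```

Once enabled, overwrites and deletes create new versions rather than destroying the prior copy, which is what blunts a ransomware-style overwrite of backup data.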
Of course, in order to take advantage of cloud storage like Amazon S3 Intelligent Tiering, you need to find a way to get your mainframe data out of its on-premises environment. Traditionally, that has presented a big challenge. But, as with multiplying storage options, the choices in data movement technology are also improving. For a review of new movement options, take a look at a discussion of techniques and technologies for Mainframe to Cloud Migration.
We recently looked at the topic of “Why Mainframe Data Management is Crucial for BI and Analytics” in an Analytics Insight article written by our CEO, Gil Peleg. Our conclusions, in brief, are that enterprises are missing opportunities when they allow mainframe data to stay siloed. And, while that might have been acceptable in the past, today data and analytics are critical to achieving business advantage.
How did we get here? Mainframes are the rock on which many businesses built their IT infrastructure. However, while the rest of IT has galloped toward shared industry standards and even open architectures, mainframe has stood aloof and unmoved. It operates largely within a framework of proprietary hardware and software that does not readily share data. But with the revolutionary pace of change, especially in the cloud, old notions of scale and cost have been cast aside. As big and as powerful as mainframe systems are, there are things the cloud can now do better, and analytics is one of those things.
In the cloud no problem is too big. Effectively unlimited scale is available if needed and a whole host of analytic tools like Kibana, Splunk and Snowflake, have emerged to better examine not only structured data but also unstructured data, which abounds in mainframes.
Cloud tools have proven their worth on “new” data, yielding extremely important insights. But those insights could be enhanced, often dramatically, if mainframe data, historic and current, were made available in the same way or, better yet combined – for instance in modern cloud-based data lakes.
It turns out that most organizations have had a good excuse for not liberating their data: It has been a difficult and expensive task. For example, mainframe data movement, typically described as “extract, transform, and load” (ETL), requires intensive use of mainframe computing power. This can interfere with other mission-critical activities such as transaction processing, backup, and other regularly scheduled batch jobs. Moreover, mainframe software vendors typically charge in “MSUs” which roughly correlate with CPU processing loads.
This is not a matter of “pie in the sky” thinking. Technology is available now to address and reform this process. Now, mainframe data can be exported, loaded, and transformed to any standard format in a cloud target. There, it can be analyzed using any of a number of tools. And this can be done as often as needed. What is different about this ELT process is the fact that it is no longer so dependent on the mainframe. It sharply reduces MSU charges by accomplishing most of the work on built-in zIIP engines, which are a key mainframe component and have considerable processing power.
What does all this mean? It means data silos can be largely a thing of the past. It means an organization can finally get at all its data and can monetize that data. It means opening the door to new business insights, new business ideas, and new business applications.
An incidental impact is that there can be big cost savings in keeping data in the cloud in storage resources that are inherently flexible (data can move from deep archive to highly accessible quickly) rather than on-premises. And, of course, no capital costs – all operational expenses. Above all, though, this provides freedom. No more long contracts, mandatory upgrades, services, staff, etc. In short, it’s a much more modern way of looking at mainframe storage.
Introducing object storage terminology and concepts – and how to leverage cost-effective cloud data management for mainframe
Object storage is coming to the mainframe. It’s the optimal platform for demanding backup, archive, DR, and big-data analytics operations, allowing mainframe data centers to leverage scalable, cost-effective cloud infrastructures.
For mainframe personnel, object storage is a new language to speak. It’s not complex, just a few new buzzwords to learn. This paper was written to introduce you to object storage, and to assist in learning the relevant terminology. Each term is compared to familiar mainframe concepts. Let’s go!
What is Object Storage?
Object storage is a computer data architecture in which data is stored in object form – as compared to DASD, file/NAS storage and block storage. Object storage is a cost-effective technology that makes data easily accessible for large-scale operations, such as backup, archive, DR, and big-data analytics and BI applications.
IT departments with mainframes can use object storage to modernize their mainframe ecosystems and reduce dependence on expensive, proprietary hardware, such as tape systems and VTLs.
Let’s take a look at some basic object storage terminology (and compare it to mainframe lingo):
- Objects. Object storage contains objects, which are also known as blobs. These are analogous to mainframe data sets.
- Buckets. A bucket is a container that hosts zero or more objects. In the mainframe realm, data sets are hosted on a volume – such as a tape or DASD device.
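To make the analogy concrete, here is a minimal sketch of how a z/OS data set name might map to an object key inside a bucket. The bucket/prefix naming convention here is purely illustrative, not a standard:

```python
# Sketch: mapping a z/OS data set name to an S3-style object key.
# The "backups" prefix is a hypothetical convention, not a requirement.
def dataset_to_object_key(dataset_name: str, prefix: str = "backups") -> str:
    """Turn a dotted data set name (e.g. PROD.PAYROLL.G0001V00)
    into a slash-delimited object key under a prefix."""
    return f"{prefix}/{dataset_name.replace('.', '/')}"

key = dataset_to_object_key("PROD.PAYROLL.G0001V00")
# key == "backups/PROD/PAYROLL/G0001V00"
# The object then lives in a bucket much as the data set lived on a volume.
```

The qualifier structure of the data set name maps naturally onto key prefixes, which object stores use for listing and access control.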
Data Sets vs. Objects – a Closer Look
As with data sets, objects contain both data and some basic metadata describing the object’s properties, such as creation date and object size. Here is a table with a detailed comparison between data set and object attributes:
The object attributes described below are presented as defined in AWS S3 storage systems.
Volumes vs. Buckets – a Closer Look
Buckets, which are analogous to mainframe volumes, are unlimited in size. Separate buckets are often deployed for security reasons, and not because of performance limitations. A bucket can be assigned a life cycle policy that includes automatic tiering, data protection, replication, and automatic at-rest encryption.
The bucket attributes described below are presented as defined in AWS S3 storage systems.
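As an example of a bucket life cycle policy, here is an illustrative configuration in the shape the AWS S3 API accepts. The rule ID, prefix, and retention periods are examples only, not recommendations:

```python
import json

# Illustrative S3 lifecycle configuration: transition objects under a
# prefix to Glacier after 90 days, then expire them after ~7 years.
# All names and periods here are examples, not recommendations.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "backups/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 2555},
        }
    ]
}
print(json.dumps(lifecycle, indent=2))
```

Once such a policy is attached to a bucket, tiering and expiration happen automatically – there is no mainframe-side HSM-style migration job to schedule.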
In the z/OS domain, a SAF user and password are required, as well as the necessary authorization level for the volume and data set. For example, users with ALTER access to a data set can perform any action – read/write/create/delete.
In object storage, users are defined in the storage system. Each user is granted access to specific buckets, prefixes, objects, and separate permissions are defined for each action, for example:
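For instance, an illustrative AWS-style policy might grant a user read and list access to one bucket and prefix while withholding delete rights. The bucket name and prefix below are hypothetical:

```python
import json

# Illustrative IAM-style policy (AWS S3 syntax): this user may read and
# list objects under one prefix but is granted no write or delete actions.
# The bucket name "mainframe-backups" and prefix "prod/" are hypothetical.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::mainframe-backups",
                "arn:aws:s3:::mainframe-backups/prod/*",
            ],
        }
    ],
}
print(json.dumps(policy, indent=2))
```

This plays the role that ALTER/READ access levels under SAF play on z/OS, but with a separate grant per action.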
In addition, each user can be associated with a programmatic API key and API secret in order to access the bucket and the object storage via a TCP/IP-based API. When accessing data in the cloud, HTTPS is used to encrypt the in-transit stream. When accessing data on-premises, HTTP can be used to avoid encryption overhead. If required, the object storage platform can be configured to perform data-at-rest encryption.
Disaster Recovery Considerations
While traditional mainframe storage platforms such as tape and DASD rely on full storage replication, object storage supports both replication and erasure coding. Erasure coding provides significant savings in storage space compared to full replication, while still allowing the data to be spread over multiple locations. For example, AWS S3 automatically spreads data across a minimum of three Availability Zones within a region, providing multi-site redundancy and enabling recovery from anywhere with network connectivity. Erasure-coded buckets can also be fully replicated to another region, as is practiced with traditional storage. Most object storage platforms support both synchronous and asynchronous replication.
Model9 – Connecting Object Storage to the Mainframe
Model9’s Cloud Data Manager for Mainframe is a software-only platform that leverages powerful, scalable cloud-based object storage capabilities for data centers that operate mainframes.
The platform runs on the mainframe’s zIIP processors, providing cost-efficient storage, backup, archive, and recovery functionalities with an easy-to-use interface that requires no object-storage knowledge or skills.
For mainframe shops that need to move data on or off the mainframe, whether to the cloud or to an alternative on-premises destination, FICON, the IBM mainstay for decades, is generally seen as the standard, and with good reason. When it was first introduced in 1998 it was a big step up from its predecessor ESCON that had been around since the early 1990s. Comparing the two was like comparing a firehose to a kitchen faucet.
FICON is fast, in part, because it runs over Fibre Channel in an IBM proprietary form defined by the ANSI FC-SB-3 (Single-Byte Command Code Sets-3) mapping protocol for Fibre Channel. In that schema, FICON is an FC layer 4 protocol. As a mainframe protocol it is used on IBM Z systems to handle both DASD and tape I/O. It is also supported by other vendors of disk and tape storage and switches designed for the IBM environment.
Over time, IBM has increased speeds and added features such as High Performance FICON, without significantly enhancing the disk and tape protocols that traverse it, meaning the limitations those protocols place on data movement remain. For this reason, the popularity and long history of FICON do not make it the answer for every data movement challenge.
Stuck in the Past
One challenge, of particular concern today, is that mainframe secondary storage is still being written to tape via tape protocols, whether real physical tape or virtual tape emulating the real thing. With tape as a central technology comes dealing with tape mount protocols and tape management software to track where datasets reside on those miles of Mylar. The serial nature of tape and the limitations of the original hardware often required large datasets to span multiple tape images.
Though virtual tapes written to DASD improved the speed of writes and recalls, the underlying protocol is still constrained by tape’s serialized nature. This means waiting for tape mounts and waiting for I/O cycles to complete before the next data can be written. When reading back, the system must traverse the tape image to find the specific dataset requested. In short, while traditional tape may have its virtues, speed – the 21st century speed of modern storage – is not among them. Even though tape and virtual tape are attached via FICON, the process of writing and recalling data relies on the underlying tape protocol, making FICON-attached tape less than ideal for many modern use cases.
Faster and Better
But there is an alternative that doesn’t rely on tape or emulate tape because it does not have to.
Instead, software generates multiple streams of data from a source and pushes data over IBM Open Systems Adapter (OSA) cards using TCP/IP in an efficient and secure manner to an object storage device, either on premises or in the cloud. The Open Systems Adapter functions as a network controller that supports many networking transport protocols, making it a powerful enabler of this efficient approach to data movement. Importantly, because it builds on open Ethernet standards, OSA bandwidth is advancing faster than FICON. For example, with the IBM z15 there is already a 25GbE OSA-Express7S card, while FICON is still at 16Gb with the FICON Express16 card.
While there is a belief common among many mainframe professionals that OSA cards are “not as good as FICON,” that is simply not true when the necessary steps are taken to optimize OSA throughput.
To achieve better overall performance, the data is captured well before tape handling, thus avoiding the overhead of tape management, tape mounts, etc. Rather than relying on serialized data movement, this approach breaks apart large datasets and sends them across the wire in simultaneous chunks, while also pushing multiple datasets at a time. Data can be compressed prior to leaving the mainframe and beginning its journey, reducing the amount of data that would otherwise be written. Dataset recalls and restores are also compressed and use multiple streams to ensure quick recovery of data from the cloud.
Having the ability to write multiple streams further increases throughput and reduces latency issues. In addition, compression on the mainframe side dramatically reduces the amount of data sent over the wire. If software is also designed to run on zIIP engines within the mainframe, data discovery and movement as well as backup and recovery workloads will consume fewer billable MIPS, and TCP/IP cycles benefit as well.
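The chunk-and-stream idea described above can be sketched in a few lines. This is a simplified illustration, not Model9’s implementation: an in-memory dictionary stands in for the object storage API, and the chunk size and worker count are arbitrary:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Sketch: split a large data set into chunks, compress each before it
# leaves the "mainframe" side, and push the chunks concurrently rather
# than serially. A dict stands in for the object storage target.
CHUNK_SIZE = 1024 * 1024  # 1 MiB per part (illustrative)
store = {}

def push_chunk(key, part_no, chunk):
    # Compress, then write one part; parts are independent, so they
    # can travel over the wire in parallel streams.
    store[f"{key}/part-{part_no:05d}"] = gzip.compress(chunk)

def upload(key, data):
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for n, chunk in enumerate(chunks):
            pool.submit(push_chunk, key, n, chunk)
    return len(chunks)

parts = upload("PROD/PAYROLL", b"\x00" * (3 * CHUNK_SIZE + 100))
# 4 compressed parts are written concurrently rather than one after another.
```

Reassembly on read simply fetches the parts (again in parallel), decompresses each, and concatenates them in key order.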
This approach delivers mainframe data to cloud storage, including all dataset types and historical data, in a quick and efficient manner. It can also transform mainframe data into standard open formats that can be ingested by BI and analytics tools off the mainframe itself, with a key difference: when data transformation occurs on the cloud side, no mainframe MIPS are used to transform the data. This allows for the quick and easy movement of complete datasets, tables, image copies, etc. to the cloud, then makes all data available to open applications by transforming the data on the object store.
A modern, software-based approach to data movement means there is no longer a need to go to your mainframe team to update the complex ETL process on the mainframe side.
To address the problem of hard-to-move mainframe data, this software-based approach provides the ability to readily move mainframe data and, if desired, readily transform it to common open formats. This data transformation is accomplished on the cloud side, after data movement is complete, which means no MF resources are required to transform the data.
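A minimal sketch of such a cloud-side transformation step: decoding fixed-length EBCDIC records (code page 037) that have already landed in object storage into open row data, with no mainframe cycles involved. The field names and widths below are hypothetical; real layouts come from copybooks:

```python
# Illustrative cloud-side ELT step: decode fixed-length EBCDIC records
# (code page 037) into plain rows for BI tools. Field names and widths
# are made-up examples; real record layouts come from COBOL copybooks.
FIELDS = [("cust_id", 6), ("name", 10), ("balance", 8)]

def record_to_row(raw: bytes):
    text, row, pos = raw.decode("cp037"), [], 0
    for _name, width in FIELDS:
        row.append(text[pos:pos + width].strip())
        pos += width
    return row

raw = ("000123" + "JONES".ljust(10) + "1050.75".rjust(8)).encode("cp037")
row = record_to_row(raw)
# row == ["000123", "JONES", "1050.75"]
```

Because the decode runs against objects in the cloud, it can be rerun as often as needed and scaled out with cloud compute, without touching mainframe MIPS.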
- Dedicated software quickly discovers (or rediscovers) all data on the mainframe. Even with no prior documentation or insights, Model9 can rapidly assemble and map the data to be moved, expediting both modernization planning and data movement.
- Policies are defined to move either selected data sets or all data sets automatically, reducing oversight and management requirements dramatically as compared to other data movement methods.
- For the sake of simplicity, a software approach can be designed to invoke actions via a RESTful API or a management UI, as well as from the mainframe side via a traditional batch job or command line.
- A software approach can also work with targets either on premises or in the cloud.
In summary, a wide range of useful features can make data movement with a software-based approach intuitive and easy. By avoiding older FICON and tape protocols, a software-based approach can push mainframe data over TCP/IP to object storage in a secure and efficient manner, making it the answer to modern mainframe data movement challenges!
At Model9, our position in the industry gives us a unique vantage point on what is happening across both the mainframe and cloud world. Clearly, while everyone expects the movement to the cloud to continue, with Gartner even going so far as to suggest that around one-third of mainframe storage will be in the cloud by 2025, there are some glitches and trends that we find interesting.
The unifying theme is the need to do more with data and do it at the lowest cost possible. Cloud costs are often extremely low in comparison with traditional on-premises costs, while also offering innumerable advantages in terms of data mobility and cost-effective analytics. The big challenge is data movement “up” from on-premises repositories to those cloud opportunities.
Rediscovering Mainframe Value
Looking ahead, it seems that mainframe organizations are finally realizing that successful digital transformation requires adopting modern technology solutions, taking some risks, and not just relying on professional services firms to force-feed the process. That means these same organizations will be looking to optimize the value of their mainframe investments by integrating mainframe and cloud, rather than attempting the risky path of migrating completely away from the platform. Moving data and enhancing analytics is a great place to start. The goal will be first, to leverage cloud analytics and BI tools for their MF data and, second, to leverage cloud technologies to simplify daily processes and reduce IT costs.
Speeding Past ETL
Last year’s BMC survey discussed the continued support for mainframes even as cloud storage continues to grow. We have heard tales of woe from individuals at some well-known independent companies that were built around the expectation of a substantial and rapid mainframe-to-cloud transition. The problem they face is that traditional data movement (extract, transform, load – ETL) processes are slow and expensive compared with newer extract, load, transform (ELT) processes, contributing to slower-than-expected movement to the cloud, and perhaps even driving some fear, uncertainty, and doubt among mainframe operators about the path ahead. With Gartner pointing to the compelling case for cloud storage, we expect more mainframe shops to look beyond the “same old” movement strategies in the year ahead to try something new. Certainly, there is no longer any question about the capabilities of these options – and again, Gartner has pointed this out in their report.
A Superior Data Lake
Another thing we definitely see from customers is a move to create and/or grow bigger and better data lakes. Data lakes are almost always synonymous with cloud storage. The economics are compelling and the analytic options, equally appealing.
Analysts are predicting as much as a 29 percent compound annual growth rate (CAGR) for data lake implementation over the next five years. We also see that organizations want all their data in the data lake, so they can run comprehensive analytics, with AI, ML and every other tool of the trade. That means, if at all possible they want to include mainframe data, which is often critical for understanding customers and the business as a whole. And that means wrestling with the challenges of data movement in a more effective way than in the past. It is almost like getting organizations to make the leap from “sneaker net” (the old days when companies transferred big files by moving floppy disks or portable drives) to actual high-bandwidth networks. The leap in data movement and transformation today is equally dramatic.
The Competitive Edge Provided by Data
As more and more companies do succeed in sharing their mainframe data within a data lake, the superiority of outcomes in terms of analytic insight is likely to create a real competitive advantage that will force other companies to take the same path. It’s a sort of corollary to the data gravity theory. In this case, when others move their data, you may be forced by competitive pressure to do the same thing.
Without that critical spectrum of mainframe information, a data lake can easily become what some are terming a Data Swamp — something that consumes resources but provides little real value. In the year ahead, this view of data gravity should resonate more with decision-makers and will start to inform their strategic decisions.
Multicloud Becomes Mainstream
In the early days of cloud, as competition began to ramp up and more and more companies migrated work to the cloud, the use of multicloud became controversial. Pundits worried on the one hand about vendor lock-in and on the other about complexity. It turns out, customers tell us, they often need multicloud. Just as on-premises operations often depend on multiple hardware and software vendors, different clouds have different strengths, and customers seem to be simply picking the best of breed for a specific purpose. Fortunately, data movement between mainframe and cloud is getting easier!
That leads us to think multicloud will no longer be treated as exceptional or controversial and instead, organizations will focus on making the most of it.
No matter how the Covid crisis and the global economy evolve over the next year, the cloud-mainframe relationships will be dynamic – and interesting to watch. The “new oil” of data will continue to influence us all and will put a premium on getting storage out of static storage and into circulation where it can be monetized. We wish our customers the best as they navigate these waters!