Mainframe modernization is a top priority for forward-thinking I&O leaders who don’t want to be left behind by rapid technological change. But moving applications to the cloud has proven to be difficult and risky, which can lead to analysis-paralysis or initiatives that run over budget and fail. This is slowing the pace of adaptation, starving cloud functions of access to mainframe data, and often inhibiting any positive action at all.
So, at the moment, most mainframe data is effectively siloed, cut off from BI or AI cloud applications and data lakes in the cloud and simply locked in a “keep the lights on” mentality that is dangerous if continued too long.
Part of the problem is that mainframe organizations have focused on an application-first approach to cloud engagement, which is usually the wrong choice because cost and complexity get in the way. Leaders should instead take a data-first approach that allows them to begin modernization by liberating mainframe data and moving storage, and even their backup and recovery functions, to the cloud. This has the benefit of making mainframe data immediately accessible in the cloud without requiring any mainframe application changes.
Why Is Mainframe Modernization So Hard?
The mainframe environment has been aptly described as a walled garden. Everything inside that garden works well, but the wall makes this Eden increasingly irrelevant to the rest of the world. And the isolation keeps the garden from reaching its full potential.
The walled garden is a result of the inherent nature of mainframe technology, which has evolved apart from the cloud and other on-prem environments. This means the architecture is fundamentally different, making a so-called lift-and-shift move to the cloud very difficult. Applications built for mainframe must stay on the mainframe and adapting them to other environments is often prohibitive. At an even more fundamental level, mainframe data is stored in forms that are incomprehensible to other environments.
How does Model9 Cloud Data Manager for Mainframe Work?
While mainframe lift-and-shift strategies may be very challenging, the movement of data to the cloud has suddenly gotten much easier thanks to Model9 Cloud Data Manager for Mainframe, which represents a fresh technology direction.
Our patented and proven technology takes mainframe data and moves it quickly and easily to the cloud and can then transform it in the cloud to almost any industry-standard form.
With Model9, mainframe data is first moved to an object storage target in a public or private cloud. The process is vendor agnostic and eliminates most of the traditional costs associated with mainframe ETL because it leverages zIIP engines to handle movement (over TCP/IP) and accomplishes the “transform” step in the cloud, without incurring MSU costs.
This can work with any mainframe data but is especially helpful for historic data and any data resident on tape or virtual tape, normally hard to access even for the mainframe itself.
The result is backup, archiving, and recovery options in the cloud that are cost-effective, faster, and easier to access than in traditional on-prem systems. Model9 also has almost no impact on existing mainframe applications and operations. It is a data-first approach that allows you to transition mainframe data into the cloud with a software-only solution.
The Benefits Of Model9’s Data-first Approach
By focusing on the simpler task of moving mainframe data first, organizations gain multiple advantages including:
- Cost reduction from reducing or eliminating the tape or VTL hardware footprint and associated mainframe software (DFSMShsm etc.), as well as from lower MSU charges.
- A full data protection solution in the cloud that can provide security and “recover anywhere” capability.
- Cloud-based transformation that immediately unlocks mainframe data for use in cloud applications.
- Performance improvements such as reduced backup windows, reduced peak processing demand, and reduced system overhead.
Data First Works
Data-first mainframe modernization empowers leaders to broaden their cloud adoption strategy and secure more immediate benefits. It can accelerate cloud migration projects by leveraging non-mainframe skills and delivering simplified data movement. Organizations can readily maintain and access mainframe data after migrating to the cloud to meet corporate and regulatory requirements without the traditional high costs.
In addition, a data-first approach reduces the burden on your mainframe by offloading data to external platforms while empowering analytics and new applications in the cloud for some of your most valuable data.
According to Gartner, with new approaches to data and cloud, ‘Through 2024, 40% of enterprises will have reduced their storage and data protection costs by over 70% by implementing new storage technologies and deployment methods.’
Best of all, a data-first approach allows organizations to combine the power of the mainframe with the scalability of the cloud and modernize on their own terms.
For decades, mainframe data tasks have regularly included ETL – extract, transform and load – as a key step on the road to insights. Indeed, ETL has been the standard process for copying data from any given source into a destination application or system. ETL got a lot of visibility with the rise in data warehouse operations but was often a bottleneck in those same data warehouse projects.
Today, ETL is still the default choice for data movement, especially in the mainframe. But there is a legitimate alternative – ELT – extract, load, and transform.
As the reshuffling of terms implies, ELT takes a much different approach, first extracting data from wherever it currently resides and then loading it, generally to a target outside of the mainframe. It is there, wherever that “there” is, that the hard work of transform happens, typically as a prelude to the application of analytics.
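The change in ordering can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Model9's implementation: the function names, the in-memory `object_store` dictionary standing in for cloud object storage, the sample records, and the choice of the `cp037` EBCDIC code page are all hypothetical.

```python
# Toy illustration of ELT ordering: extract, load raw bytes, transform later.
# All names are hypothetical; `object_store` stands in for cloud object storage.

object_store = {}


def extract(source_records):
    """Read records from the source exactly as they are (no conversion)."""
    return list(source_records)


def load(key, raw_records):
    """Store the raw, untransformed records at the target."""
    object_store[key] = raw_records


def transform(key, encoding="cp037"):
    """Decode the stored raw records at the target, after loading.

    cp037 is one common EBCDIC code page; real data may use another.
    """
    return [record.decode(encoding).strip() for record in object_store[key]]


# Source data: two fixed-width EBCDIC records (built here by encoding text).
source = ["ACCT0001  ".encode("cp037"), "ACCT0002  ".encode("cp037")]

load("archive/accounts", extract(source))  # E, then L: raw bytes land unchanged
rows = transform("archive/accounts")       # T happens later, at the target
```

Because the raw bytes are stored untouched, the same loaded data could later be transformed again in a different way without going back to the source.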
So, ELT is an acronym, but one that’s pretty revolutionary.
Why? By reframing the idea of ETL with the technologies of today, the entire process has the potential to be faster, easier, and less expensive because it can use the most appropriate and cost-effective resources. Not just the mainframe CPU.
ELT tends to require less maintenance than ETL, which typically calls for frequent manual, ad hoc intervention and management. In contrast, ELT is based on automated, cloud-based processing. ELT also loads more quickly, because transformation is deferred until the data is in the cloud, close to the analysis work it serves; the mainframe side of ELT is concerned primarily with getting data to the cloud. As a result, the process is usually faster overall. And, because it depends primarily on pay-as-you-go cloud resources rather than on the billing structure of the mainframe, it is generally less expensive.
ELT empowers the routine and regular movement of mainframe operational and archived data from expensive and slow tape and VTL to storage environments that are both fast and highly cost-effective, such as AWS S3 Tiered Storage. ELT can also deliver data directly for transformation to standard formats in the cloud, and then make that data available to data lakes and other modern BI and analytics tools. Because ELT loads the data in its original format and structure, the options for how the data can be used (transformed) in the cloud are practically unlimited.
The key to ELT on the mainframe is, of course, zIIP engines, the helpful processing capability provided by IBM for handling exactly this kind of ‘non-critical’ activity. It’s just that no one tried before.
With zIIP help and TCP/IP to assist in movement, buried data sets can be liberated from mainframe data silos and deliver real monetary value. What’s more, companies that have tried ELT have discovered how easy it is to move mainframe data. They can more easily take advantage of cloud storage economics, potentially eliminating bulky and expensive tape and VTL assets. For these many good reasons, ELT is ‘NJAA,’ not just another acronym. It’s an acronym worth getting to know.
Vendors are scrambling to deliver modern analytics to act on streams of real-time mainframe data. There are good reasons for attempting this activity, but they may actually be missing the point or at least missing a more tantalizing opportunity.
Real-time data in mainframes comes mostly from transaction processing. No doubt, spotting a sudden national spike in cash withdrawals from a bank’s ATM systems or an uptick in toilet paper sales in the retail world may have significance beyond the immediate “signal” to reinforce cash availability and reorder from a paper goods supplier. These are the kinds of things real-time apostles rave about when they tout the potential for running mainframe data streams through Hadoop engines and similar big data systems.
What’s missed, however, is the fact that mainframe systems have been quietly accumulating data points just like this for decades. And where mainframe data can be most valuable is in supporting analytics across the time axis. Looking at when similar demand spikes have happened over time and their duration and repetition offers the potential to predict them in the future and can hint at the optimal ways to respond and their broader meaning.
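As a rough sketch of what analysis along the time axis can look like, the Python below flags unusually high values in a historical series and measures the gaps between them. The weekly totals are made-up numbers, and the function names and the simple "mean plus 1.5 standard deviations" rule are assumptions chosen for illustration; real analytics would likely use more robust methods.

```python
from statistics import mean, pstdev


def find_spikes(series, threshold=1.5):
    """Return indices whose value exceeds mean + threshold * standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []
    return [i for i, value in enumerate(series) if value > mu + threshold * sigma]


def spike_intervals(indices):
    """Gaps between consecutive spikes; a steady gap hints at a repeating pattern."""
    return [later - earlier for earlier, later in zip(indices, indices[1:])]


# Hypothetical weekly withdrawal totals (made-up numbers, for illustration only).
weekly_totals = [100, 102, 98, 101, 300, 99, 100, 103, 97, 310, 100, 101, 99, 102, 305]

spikes = find_spikes(weekly_totals)  # positions of the unusually high weeks
gaps = spike_intervals(spikes)       # weeks between one spike and the next
```

In this made-up series the spikes recur at a regular interval, which is the kind of repetition that only becomes visible when years of archived data are available to the analysis.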
Furthermore, for most enterprises, a vast amount of real-time data exists outside the direct purview of mainframe: think about the oceans of IoT information coming from machinery and equipment, real-time sensor data in retail, and consumer data floating around in the device universe. Little of this usually gets to the mainframe. But it is this data, combined with mainframe data that is not real-time (but sometimes near-real-time), that may have the greatest potential as a font of analytic insight, according to a recent report.
To give mainframes the power to participate in this analytics bonanza requires some of the same nostrums being promoted by the “real-time” enthusiasts but above all requires greatly improving access to older mainframe data, typically resident on tape or VTL.
The optimal pattern here should be rescuing archival and non-real-time operational data from mainframe storage and sharing it with on-prem or cloud-based big data analytics in a data lake. This allows the mainframe to continue doing what it does best while providing a tabula rasa for analyzing the widest range and largest volume of data.
Technology today can leverage the too-often unused power of zIIP engines to facilitate data movement inside the mainframe and help it get to new platforms for analytics (ensuring necessary transformation to standard formats along the way).
It’s a way to make the best use of data and the best use of mainframe in its traditional role while ensuring the very best in state-of-the-art analytics. This is a far more profound opportunity than simply dipping into the flow of real-time data in the mainframe. It is based on a fuller appreciation of what data matters and how data can be used. And it is the path that mainframe modernizers will ultimately choose to follow.
Change is good – a familiar mantra, but one not always easy to practice. When it comes to moving toward a new way of handling data, mainframe organizations, which have earned their keep by delivering the IT equivalent of corporate-wide insurance policies (rugged, reliable, and risk-averse), naturally look with caution on new concepts like ELT — extract, load and transform.
Positioned as a lighter and faster alternative to more traditional data handling procedures such as ETL (extract, transform and load), ELT definitely invites scrutiny. And that scrutiny can be worthwhile.
Definitions provided by SearchDataManagement.com say that ELT is “a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.” In contrast, another source defines ETL as “three database functions that are combined into one tool to pull data out of one database and place it into another database.”
The crucial functional difference in those definitions is the exclusive focus on database-to-database transfer with ETL, while ELT is open-ended and flexible. To be sure, there are variations in ETL and ELT that might not fit those definitions but the point is that in the mainframe world ETL is a tool with a more limited focus, while ELT is focused on jump-starting the future.
While each approach has its advantages and disadvantages, let’s take a look at why we think ETL is all wrong for mainframe data migration.
ETL is Too Complex
ETL was not originally designed to handle all the tasks it is now being asked to do. In the early days it was often applied to pull data from one relational structure and get it to fit in a different relational structure. This often included cleansing the data, too. For example, a traditional RDBMS can get befuddled by numeric data where it is expecting alpha data or by the presence of obsolete address abbreviations. So, ETL is optimized for that kind of painstaking, field-by-field data checking, ‘cleaning,’ and data movement, not so much for feeding a hungry Hadoop database or modern data lake. In short, ETL wasn’t invented to take advantage of all the ways data originates and all the ways it can be used in the 21st century.
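To make the field-by-field work concrete, here is a minimal Python sketch of that kind of cleansing step. The record layout, the `ABBREVIATIONS` mapping, and the `clean_record` function are hypothetical and exist only to illustrate the two problems mentioned above: numeric data in a field expected to be alphabetic, and obsolete address abbreviations.

```python
# Hypothetical mapping of obsolete state abbreviations to current ones.
ABBREVIATIONS = {"Calif.": "CA", "Penna.": "PA"}


def clean_record(record):
    """Check and clean one record field by field; return (cleaned, errors)."""
    cleaned = dict(record)
    errors = []

    # Replace obsolete abbreviations with the form the target expects.
    cleaned["state"] = ABBREVIATIONS.get(record["state"], record["state"])

    # The target column expects alphabetic data, so reject anything else.
    if not cleaned["state"].isalpha():
        errors.append("state is not alphabetic: %r" % record["state"])

    return cleaned, errors


good, good_errors = clean_record({"name": "A. Smith", "state": "Calif."})
bad, bad_errors = clean_record({"name": "B. Jones", "state": "94"})
```

Every additional field needs its own rule of this kind, which is part of why the approach becomes labor intensive as sources and targets multiply.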
ETL is Labor Intensive
All that RDBMS-to-RDBMS movement takes supervision and even scripting. Skilled DBAs are in demand and may not last at your organization. So, keeping the human part of the equation going can be tricky. In many cases, someone will have to come along and recreate their hand-coding or replace it whenever something new is needed.
ETL is a Bottleneck
Because the ETL process is built around transformation, everything is dependent on the timely completion of that transformation. However, with larger amounts of data in play (think, Big Data), this can make the needed transformation times inconvenient or impractical, turning ETL into a potential functional and computational bottleneck.
ETL Demands Structure
ETL is not really designed for unstructured data and can add complexity rather than value when asked to deal with such data. It is best for traditional databases but does not help much with the huge waves of unstructured data that companies need to process today.
ETL Has High Processing Costs
ETL can be especially challenging on mainframes because ETL jobs generally incur MSU processing charges and can burden systems at times when they need to be handling real-time workloads. This stands in contrast to ELT, which can be accomplished mostly with built-in zIIP engines, cutting MSU costs, with additional processing conducted in a chosen cloud destination. In response to those high costs, some customers have moved the transformation stage into the cloud to handle all kinds of data transformations, integrations, and preparations to support analytics and the creation of data lakes.
It would obviously be wrong to oversimplify a decision regarding the implementation of ETL or ELT; there are too many moving parts and too many decision points to weigh. However, what is crucial is understanding that rather than being focused on legacy practices and limitations, ELT speaks to most of the evolving IT paradigms. ELT is ideal for moving massive amounts of data. Typically the desired destination is the cloud and often a data lake, built to ingest just about any and all available data so that modern analytics can get to work. That is why ELT today is growing and why it is making inroads specifically in the mainframe environment. In particular, it represents perhaps the best way to accelerate the movement of data to the cloud and to do so at scale. That’s why ELT is emerging as a key tool for IT organizations aiming at modernization and at maximizing the value of their existing investments.
We recently looked at the topic of “Why Mainframe Data Management is Crucial for BI and Analytics” in an Analytics Insight article written by our CEO, Gil Peleg. Our conclusions, in brief, are that enterprises are missing opportunities when they allow mainframe data to stay siloed. And, while that might have been acceptable in the past, today data and analytics are very critical to achieving business advantage.
How did we get here? Mainframes are the rock on which many businesses built their IT infrastructure. However, while the rest of IT has galloped toward shared industry standards and even open architectures, mainframe has stood aloof and unmoved. It operates largely within a framework of proprietary hardware and software that does not readily share data. But with the revolutionary pace of change, especially in the cloud, old notions of scale and cost have been cast aside. As big and as powerful as mainframe systems are, there are things the cloud can now do better, and analytics is one of those things.
In the cloud, no problem is too big. Effectively unlimited scale is available if needed, and a whole host of analytic tools like Kibana, Splunk and Snowflake have emerged to better examine not only structured data but also unstructured data, which abounds in mainframes.
Cloud tools have proven their worth on “new” data, yielding extremely important insights. But those insights could be enhanced, often dramatically, if mainframe data, historic and current, were made available in the same way or, better yet, combined with that newer data, for instance in modern cloud-based data lakes.
It turns out that most organizations have had a good excuse for not liberating their data: It has been a difficult and expensive task. For example, mainframe data movement, typically described as “extract, transform, and load” (ETL), requires intensive use of mainframe computing power. This can interfere with other mission-critical activities such as transaction processing, backup, and other regularly scheduled batch jobs. Moreover, mainframe software vendors typically charge in “MSUs” which roughly correlate with CPU processing loads.
This is not a matter of “pie in the sky” thinking. Technology is available now to address and reform this process. Now, mainframe data can be exported, loaded, and transformed to any standard format in a cloud target. There, it can be analyzed using any of a number of tools. And this can be done as often as needed. What is different about this ELT process is the fact that it is no longer so dependent on the mainframe. It sharply reduces MSU charges by accomplishing most of the work on built-in zIIP engines, which are a key mainframe component and have considerable processing power.
What does all this mean? It means data silos can be largely a thing of the past. It means an organization can finally get at all its data and can monetize that data. It means opening the door to new business insights, new business ideas, and new business applications.
An incidental impact is that there can be big cost savings in keeping data in the cloud in storage resources that are inherently flexible (data can move from deep archive to highly accessible quickly) rather than on-premises. And, of course, no capital costs – all operational expenses. Above all, though, this provides freedom. No more long contracts, mandatory upgrades, services, staff, etc. In short, it’s a much more modern way of looking at mainframe storage.
With a global pandemic-induced downturn disrupting economies and whole industries, it has rarely been more important to get “bang for your buck.” Making the most of mainframe data is an excellent example of doing just that. By adopting modern data movement tools, cutting-edge analytics, and low capex cloud resources, organizations can do much more with less – quickly gaining vital insights that can help protect or grow business and/or potentially shaving mainframe costs through reduced MSUs and reduced storage hardware.
Data warehouses were a big step forward when they began to be more widely adopted some 20-30 years ago. But they were expensive and resource-intensive, particularly the extract-transform-load (ETL) process by which disparate and sometimes poorly maintained data was pumped into them.
By contrast, in the same period, data analytics have been undergoing revolution on top of revolution outside of the mainframe world. That’s been particularly so in the cloud where scalability, when needed, is ideal for accommodating periodic or occasional analytic exercises, without incurring heavy capital or operational costs. It is also where some of the most useful analytics tools are at home.
Hadoop, the big data star of recent years, is famous for finding value in even very unstructured data and has helped change the analytic paradigm, which is now rich with AI and machine-learning options for assessing data. Hadoop and other contemporary analytic tools can also digest the kind of structured data that exists in most mainframe applications. So, it would be ideal if one could simply take all that critical mainframe data and let tools like Hadoop look for valuable nuggets hidden within.
Although it is technically possible to run Hadoop on the mainframe, most organizations choose to run Hadoop off the mainframe because of challenges, particularly in the areas of data governance, data ingestion and cost.
In fact, getting mainframe data into Hadoop in a form that can be processed has been very challenging – and expensive. For example, mainframe data could be in EBCDIC form, possibly compressed, rather than the more widely used ASCII. COBOL Copybooks have their own peculiarities as do DB2 and IMS databases and VSAM files.
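The encoding problem can be illustrated with a short Python sketch that decodes one fixed-width record containing an EBCDIC text field and a COMP-3 (packed decimal) field. The two-field record layout, the field names, and the use of the `cp037` code page are assumptions made for this example; a real copybook and code page would dictate the actual offsets and decoding.

```python
from decimal import Decimal


def unpack_comp3(data, scale=0):
    """Decode a COBOL COMP-3 (packed decimal) field into a Decimal.

    Each byte holds two 4-bit digits; the final nibble is the sign
    (0xD is treated as negative here, anything else as positive).
    """
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    if sign == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


# Hypothetical layout, as a copybook might describe it:
#   CUST-NAME     PIC X(10)            -> bytes 0-9, EBCDIC text
#   CUST-BALANCE  PIC S9(5)V99 COMP-3  -> bytes 10-13, packed decimal
record = "JOHN DOE  ".encode("cp037") + bytes.fromhex("0012345C")

name = record[:10].decode("cp037").rstrip()     # EBCDIC bytes to text
balance = unpack_comp3(record[10:14], scale=2)  # two implied decimal places
```

Read as ASCII, the same bytes would be unintelligible, which is why a transformation step that understands the record layout is needed before cloud tools can use the data.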
Fortunately, Model9 has been finding ways to unlock and relocate this badly needed data. Using an extract-load-transform process that is much faster and easier than ETL (as it doesn’t require mainframe CPU cycles), Model9’s patented technology connects the mainframe directly over TCP/IP to cloud storage chosen by the customer. It then translates all that mainframe data into standard forms widely used in the cloud. From there, the analytical choices are numerous.
Best of all, because you can move data back to the mainframe as needed just as easily, Model9 can even eliminate the need for virtual tape libraries and physical tapes.
But the reward that comes from liberating data is probably even more crucial – especially as companies around the globe struggle to make sense of the rapidly changing business conditions and emerging opportunities of 2020 and beyond.
In modern analytics, significant value can be gained from insights that are based on multiple data sources. That’s the power of the data lake concept. But for most larger organizations, unaware that there are easy data movement options, data lakes still exist far from the organization’s most important and often largest data collection—the data in mainframe storage.
Whether this data is already at home in a mainframe data warehouse or scattered in multiple databases and information stores, its absence from the data lakes is a tremendous problem.
In fact, in an Information Age article, Data Storage & Data Lakes, editor Nick Ismail noted, “If an organization excludes difficult to access legacy data stores, such as mainframes, as they build the data lake, there is a big missed opportunity.”
Recognizing this growing business challenge, Model9, a company founded by mainframe experts and cloud evangelists, created unique, patented technology that can move mainframe data and transform it to and from standard industry formats and between the cloud and mainframe. Specifically, Model9 eliminates the traditional ETL process, which is expensive in terms of time, money and CPU cycles, and delivers richer outcomes with fewer resources.
In other words, Model9 helps get mainframe data into the game.
Unlike traditional brute force methods of moving data to other platforms, requiring heavy use of mainframe processing power, Model9 does most of the work outside of the mainframe, a process of extract, load, and transform (ELT) rather than extract, transform, and load (ETL). It is fast and economical.
Is it really that easy? Yes. The Model9 architecture includes a zIIP-eligible agent running on z/OS and a management server running in a Docker container on Linux, z/Linux, or zCX. The agent does the job of reading and writing mainframe data from DASD or tape directly to cloud storage over TCP/IP using DFDSS as the underlying data mover. Other standard z/OS data management services are also used by the agent, such as system catalog integration, SMS policy compliance, and RACF authorization controls. Compression and encryption are performed either using zEDC and CryptoExpress cards if available, or using zIIP engines.
Although the world of DB2 tools on mainframe has made a lot of progress in integration with other SQL databases by using CDC technology, this remains an expensive approach and one that does not scale optimally. In contrast, the Model9 Image Copy transformation offering is an industry-first lift & shift solution for DB2 on mainframes.
Additionally, Model9 offers migration capabilities for unstructured mainframe data types such as VSAM, PS, and PO as well as support for COBOL copybooks, delivering end-to-end process automation.
Presto, organizations can now easily share mainframe data with analytic tools and incorporate it into data lakes, potentially yielding powerful insights. Model9 offers data lakes a chance to reach their fullest potential and makes mainframe pros into business partners!