Category: Data analytics
For decades, mainframe data tasks have regularly included ETL – extract, transform, and load – as a key step on the road to insights. Indeed, ETL has been the standard process for copying data from any given source into a destination application or system. ETL gained visibility with the rise of data warehousing but often proved a bottleneck in those same projects.
Today, ETL is still the default choice for data movement, especially on the mainframe. But there is a legitimate alternative – ELT – extract, load, and transform.
As the reshuffling of terms implies, ELT takes a markedly different approach, first extracting data from wherever it currently resides and then loading it, generally to a target outside the mainframe. It is there, wherever that “there” is, that the hard work of transformation happens, typically as a prelude to the application of analytics.
So, ELT is just a reshuffled acronym – but a pretty revolutionary one.
Why? By reframing the idea of ETL with the technologies of today, the entire process has the potential to be faster, easier, and less expensive because it can use the most appropriate and cost-effective resources. Not just the mainframe CPU.
ELT tends to require less maintenance than ETL, which typically demands manual, ad hoc intervention and management; ELT, in contrast, is based on automated, cloud-based processing. Similarly, ELT loads more quickly, since transformation is closely linked to the ultimate cloud-based analysis work. ELT, then, is primarily concerned with getting data from the mainframe to the cloud. Finally, of course, it is usually faster overall. And, because it depends primarily on pay-as-you-go cloud resources rather than on the billing structure of the mainframe, it is generally less expensive.
ELT empowers the routine and regular movement of mainframe operational and archived data from expensive and slow tape and VTL to storage environments that are both fast and highly cost-effective, such as tiered Amazon S3 storage classes. ELT can also deliver data directly for transformation to standard formats in the cloud – and then make that data available to data lakes and other modern BI and analytics tools. Because ELT retains the data’s original format and structure, the options for how the data can be used (transformed) in the cloud are practically unlimited.
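The extract-load-transform ordering described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s actual pipeline: all names are invented, and a Python list stands in for cloud object storage.

```python
# Minimal sketch of the ELT ordering: extract raw records, load them to
# the target untouched, and transform only at the destination.

def extract(records):
    """Extract: read raw records exactly as they sit on the mainframe."""
    return list(records)

def load(target, records):
    """Load: land the unmodified bytes in the 'cloud' target."""
    target.extend(records)

def transform(records):
    """Transform: convert to a standard text format only after loading,
    so the work runs on cloud compute rather than mainframe CPU."""
    return [r.decode("ascii").strip() for r in records]

raw = [b"ACCT001  ", b"ACCT002  "]   # fixed-width records, as extracted
cloud_bucket = []
load(cloud_bucket, extract(raw))     # E then L: data keeps its original form
print(transform(cloud_bucket))       # T happens last -> ['ACCT001', 'ACCT002']
```

Because the load step leaves the bytes untouched, the same landed data can later be transformed in different ways for different consumers – the point made above about practically unlimited options.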
The key to ELT on the mainframe is, of course, zIIP engines – the helpful processing capability IBM provides for handling exactly this kind of “non-critical” activity. It’s just that no one had tried using them this way before.
With zIIP help and TCP/IP to assist in movement, buried data sets can be liberated from mainframe data silos to deliver real monetary value. What’s more, companies that have tried ELT have discovered how easy it is to move mainframe data. They can more easily take advantage of cloud storage economics – potentially eliminating bulky and expensive tape and VTL assets. For these many good reasons, ELT is “NJAA” – not just another acronym – it’s an acronym worth getting to know.
Vendors are scrambling to deliver modern analytics to act on streams of real-time mainframe data. There are good reasons for attempting this activity, but they may actually be missing the point or at least missing a more tantalizing opportunity.
Real-time data in mainframes comes mostly from transaction processing. No doubt, spotting a sudden national spike in cash withdrawals from a bank’s ATM systems or an uptick in toilet paper sales in the retail world may have significance beyond the immediate “signal” to reinforce cash availability and reorder from a paper goods supplier. These are the kinds of things real-time apostles rave about when they tout the potential for running mainframe data streams through Hadoop engines and similar big data systems.
What’s missed, however, is the fact that mainframe systems have been quietly accumulating data points just like this for decades. And where mainframe data can be most valuable is in supporting analytics across the time axis. Looking at when similar demand spikes have happened over time and their duration and repetition offers the potential to predict them in the future and can hint at the optimal ways to respond and their broader meaning.
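Analytics across the time axis, as described above, can be as simple as comparing each day’s activity against its own recent history. The sketch below is purely illustrative – the data, window, and threshold are made up – but it shows the kind of spike detection decades of archived mainframe counts make possible.

```python
# Illustrative sketch: flag demand spikes in archived daily counts by
# comparing each day to a trailing average. Data and threshold are invented.
from statistics import mean

def find_spikes(daily_counts, window=7, factor=2.0):
    """Return indices where a day's count exceeds `factor` times the
    trailing `window`-day average -- a crude historical spike detector."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = mean(daily_counts[i - window:i])
        if daily_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Synthetic history: steady demand with two sudden surges
history = [100] * 30
history[10] = 350   # e.g., a sudden run on a bank's ATMs
history[25] = 320
print(find_spikes(history))   # -> [10, 25]
```

Run over years rather than days, the same idea reveals whether such spikes recur, how long they last, and therefore how to respond when the next one begins.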
Furthermore, for most enterprises, a vast amount of real-time data exists outside the direct purview of the mainframe: think about the oceans of IoT information coming from machinery and equipment, real-time sensor data in retail, and consumer data floating around in the device universe. Little of this usually gets to the mainframe. But it is this data, combined with mainframe data that is not real-time (but sometimes near-real-time), that may have the greatest potential as a font of analytic insight, according to a recent report.
To give mainframes the power to participate in this analytics bonanza requires some of the same nostrums being promoted by the “real-time” enthusiasts but above all requires greatly improving access to older mainframe data, typically resident on tape or VTL.
The optimal pattern here should be rescuing archival and non-real-time operational data from mainframe storage and sharing it with on-prem or cloud-based big data analytics in a data lake. This allows the mainframe to continue doing what it does best while providing a tabula rasa for analyzing the widest range and largest volume of data.
Technology today can leverage the too-often unused power of zIIP engines to facilitate data movement inside the mainframe and help it get to new platforms for analytics (ensuring necessary transformation to standard formats along the way).
It’s a way to make the best use of data and the best use of the mainframe in its traditional role while ensuring the very best in state-of-the-art analytics. This is a far more profound opportunity than simply dipping into the flow of real-time data in the mainframe. It is based on a fuller appreciation of what data matters and how data can be used. And it is the path that mainframe modernizers will ultimately choose to follow.
We recently looked at the topic of “Why Mainframe Data Management is Crucial for BI and Analytics” in an Analytics Insight article written by our CEO, Gil Peleg. Our conclusions, in brief, are that enterprises are missing opportunities when they allow mainframe data to stay siloed. And, while that might have been acceptable in the past, today data and analytics are critical to achieving business advantage.
How did we get here? Mainframes are the rock on which many businesses built their IT infrastructure. However, while the rest of IT has galloped toward shared industry standards and even open architectures, mainframe has stood aloof and unmoved. It operates largely within a framework of proprietary hardware and software that does not readily share data. But with the revolutionary pace of change, especially in the cloud, old notions of scale and cost have been cast aside. As big and as powerful as mainframe systems are, there are things the cloud can now do better, and analytics is one of those things.
In the cloud, no problem is too big. Effectively unlimited scale is available when needed, and a whole host of analytic tools – Kibana, Splunk, and Snowflake among them – have emerged to better examine not only structured data but also unstructured data, which abounds in mainframes.
Cloud tools have proven their worth on “new” data, yielding extremely important insights. But those insights could be enhanced, often dramatically, if mainframe data, historic and current, were made available in the same way or, better yet, combined – for instance, in modern cloud-based data lakes.
It turns out that most organizations have had a good excuse for not liberating their data: It has been a difficult and expensive task. For example, mainframe data movement, typically described as “extract, transform, and load” (ETL), requires intensive use of mainframe computing power. This can interfere with other mission-critical activities such as transaction processing, backup, and other regularly scheduled batch jobs. Moreover, mainframe software vendors typically charge in “MSUs” which roughly correlate with CPU processing loads.
This is not a matter of “pie in the sky” thinking. Technology is available now to address and reform this process. Mainframe data can be extracted, loaded, and transformed to any standard format in a cloud target. There, it can be analyzed using any of a number of tools. And this can be done as often as needed. What is different about this ELT process is that it is no longer so dependent on the mainframe. It sharply reduces MSU charges by accomplishing most of the work on built-in zIIP engines, which are a key mainframe component and have considerable processing power.
What does all this mean? It means data silos can be largely a thing of the past. It means an organization can finally get at all its data and can monetize that data. It means opening the door to new business insights, new business ideas, and new business applications.
An incidental impact is that there can be big cost savings in keeping data in the cloud in storage resources that are inherently flexible (data can move from deep archive to highly accessible quickly) rather than on-premises. And, of course, no capital costs – all operational expenses. Above all, though, this provides freedom. No more long contracts, mandatory upgrades, services, staff, etc. In short, it’s a much more modern way of looking at mainframe storage.
With a global pandemic-induced downturn disrupting economies and whole industries, it has rarely been more important to get “bang for your buck.” Making the most of mainframe data is an excellent example of doing just that. By adopting modern data movement tools, cutting-edge analytics, and low capex cloud resources, organizations can do much more with less – quickly gaining vital insights that can help protect or grow business and/or potentially shaving mainframe costs through reduced MSUs and reduced storage hardware.
Data warehouses were a big step forward when they began to be more widely adopted some 20-30 years ago. But they were expensive and resource-intensive, particularly the extract-transform-load (ETL) process by which disparate and sometimes poorly maintained data was pumped into them.
By contrast, in the same period, data analytics have been undergoing revolution on top of revolution outside of the mainframe world. That’s been particularly so in the cloud where scalability, when needed, is ideal for accommodating periodic or occasional analytic exercises, without incurring heavy capital or operational costs. It is also where some of the most useful analytics tools are at home.
Hadoop, the big data star of recent years, is famous for finding value in even very unstructured data and has helped change the analytic paradigm, which is now rich with AI and machine-learning options for assessing data. Hadoop and other contemporary analytic tools can also digest the kind of structured data that exists in most mainframe applications. So, it would be ideal if one could simply take all that critical mainframe data and let tools like Hadoop look for valuable nuggets hidden within.
Although it is technically possible to run Hadoop on the mainframe, most organizations choose to run it off the mainframe because of challenges, particularly in the areas of data governance, data ingestion, and cost.
In fact, getting mainframe data into Hadoop in a form that can be processed has been very challenging – and expensive. For example, mainframe data could be in EBCDIC form, possibly compressed, rather than the more widely used ASCII. COBOL Copybooks have their own peculiarities as do DB2 and IMS databases and VSAM files.
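To make the format challenge concrete, the sketch below decodes an EBCDIC record and slices fixed-width fields of the sort a COBOL copybook would describe. The record layout is invented for illustration; cp037 is one common EBCDIC code page, and real systems vary.

```python
# Sketch of the kind of transformation mainframe data needs on its way to
# Hadoop or other cloud tools: EBCDIC decoding plus fixed-width field
# slicing. Layout and field names are hypothetical.

EBCDIC = "cp037"  # one common EBCDIC code page

def decode_record(raw: bytes) -> dict:
    """Decode an EBCDIC record laid out as a 10-byte name field
    followed by a 5-byte numeric amount field."""
    text = raw.decode(EBCDIC)
    return {"name": text[:10].rstrip(), "amount": int(text[10:15])}

# Build a sample record as it might arrive from the mainframe
record = ("SMITH     " + "00042").encode(EBCDIC)
print(decode_record(record))   # -> {'name': 'SMITH', 'amount': 42}
```

Real mainframe records add further wrinkles the sketch omits – packed-decimal (COMP-3) fields, REDEFINES clauses, variable-length segments – which is why this step has historically been so expensive.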
Fortunately, Model9 has been finding ways to unlock and relocate this badly needed data, using an extract-load-transform process that is much faster and easier than ETL (as it doesn’t require mainframe CPU cycles). Model9’s patented technology connects the mainframe directly over TCP/IP to cloud storage chosen by the customer. It translates all that mainframe data into standard formats widely used in the cloud – and from there, the analytical choices are numerous.
Best of all, because you can move data back to the mainframe as needed just as easily, Model9 can even eliminate the need for virtual tape libraries and physical tapes.
But the reward that comes from liberating data is probably even more crucial – especially as companies around the globe struggle to make sense of the rapidly changing business conditions and emerging opportunities of 2020 and beyond.
In modern analytics, significant value can be gained from insights that are based on multiple data sources. That’s the power of the data lake concept. But in most larger organizations – often unaware that easy data movement options exist – data lakes still sit far from the organization’s most important and often largest data collection: the data in mainframe storage.
Whether this data is already at home in a mainframe data warehouse or scattered in multiple databases and information stores, its absence from the data lakes is a tremendous problem.
In fact, in an Information Age article, Data Storage & Data Lakes, editor Nick Ismail noted, “If an organization excludes difficult to access legacy data stores, such as mainframes, as they build the data lake, there is a big missed opportunity.”
Recognizing this growing business challenge, Model9, a company founded by mainframe experts and cloud evangelists, created unique, patented technology that can move mainframe data and transform it to and from standard industry formats and between the cloud and mainframe. Specifically, Model9 eliminates the traditional ETL process, which is expensive in terms of time, money and CPU cycles, and delivers richer outcomes with fewer resources.
In other words, Model9 helps get mainframe data into the game.
Unlike traditional brute force methods of moving data to other platforms, requiring heavy use of mainframe processing power, Model9 does most of the work outside of the mainframe, a process of extract, load, and transform (ELT) rather than extract, transform, and load (ETL). It is fast and economical.
Is it really that easy? Yes. The Model9 architecture includes a zIIP-eligible agent running on z/OS and a management server running in a Docker container on Linux, z/Linux, or zCX. The agent does the job of reading and writing mainframe data from DASD or tape directly to cloud storage over TCP/IP using DFDSS as the underlying data mover. Other standard z/OS data management services are also used by the agent, such as system catalog integration, SMS policy compliance, and RACF authorization controls. Compression and encryption are performed either using zEDC and CryptoExpress cards if available, or using zIIP engines.
Although the world of DB2 tools on mainframe has made a lot of progress in integration with other SQL databases by using CDC technology, this remains an expensive approach and one that does not scale optimally. In contrast, the Model9 Image Copy transformation offering is an industry-first lift & shift solution for DB2 on mainframes.
Additionally, Model9 offers migration capabilities for unstructured mainframe data types such as VSAM, PS, and PO as well as support for COBOL copy books, delivering end-to-end process automation.
Presto, organizations can now easily share mainframe data with analytic tools and incorporate it into data lakes, potentially yielding powerful insights. Model9 offers data lakes a chance to reach their fullest potential and makes mainframe pros into business partners!