Four Steps to Performant Analytics Using Data on the Mainframe
Mainframes have long established themselves as the workhorses of transaction processing in enterprises around the globe. Far from their mischaracterization as legacy systems, modern mainframes are full-fledged participants in modern hybrid IT architectures.
Extracting the full value of corporate data in cloud analytics applications continues to be a challenge when 80% of that data, including much of the most critical information an enterprise collects, remains on the mainframe. The reason is simple: BI tools only leverage data that resides in the cloud in open formats.
Fortunately, there are modern approaches for delivering business value from BI and analytics in the cloud that can quickly and cost-effectively analyze the sum total of data collected in both mainframe and open formats.
Running a mainframe analytics program doesn’t solve the problem
Enterprises have been extracting value from data on mainframes for decades, of course. However, the programs that perform such operations are quite long in the tooth and lack many modern features that today’s executives expect.
These mainframe programs also run in batch mode, delivering reports only when runs are complete (remember the green and white striped paper?). As a result, the information they deliver is always out of date.
The biggest drawback to mainframe analytics programs is perhaps the most obvious: they only deal with data on the mainframe, leaving business users scratching their heads about how mainframe and non-mainframe data relate to each other.
Some improvement: ETL on the mainframe, loading the result into a data warehouse
Extract, transform, and load (ETL) has long been the standard approach for populating data into a data warehouse. Data warehouses expect input data to have a particular structure and format, and the ETL step prepares the data accordingly.
The ETL approach has numerous drawbacks when extracting data from mainframes. It consumes expensive mainframe processing and, like mainframe-based analytics programs, is batch-oriented. As a result, data in the warehouse is never even close to current.
In fact, the costs can be so prohibitive that the ETL process can only cost-effectively transfer a subset of available mainframe data. The result: frequent, often complex requests for the mainframe to produce new data for ETL – taking even more time and running up the bill.
Migrating all relevant data off the mainframe still doesn’t provide sufficient value
If the transform step in ETL is too slow and expensive, why not simply skip it and run a bulk data migration from the mainframe to a database or data warehouse somewhere else?
Such an approach might work for cold data – that is, historical data on tape or virtual tape. In such cases, the use case may be to modernize the backup and restore capabilities for such data. Without the hotter transactional data, however, such extraction is likely too incomplete to support many analytics use cases.
In any case, mainframe data typically reside in proprietary, often archaic formats. It is thus impossible for modern analytics applications to process mainframe data unless the organization has transformed them into standard formats first.
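To make the format problem concrete, here is a minimal Python sketch of what "transforming into standard formats" can involve. The article does not name specific formats, so EBCDIC text (code page 037) and packed-decimal (COMP-3) numbers are assumed here as common examples, and the record layout is hypothetical.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)          # last byte holds one digit...
    sign_nibble = raw[-1] & 0x0F         # ...and the sign (0xD means negative)
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)

# Hypothetical 13-byte record: 10-byte EBCDIC name + 3-byte packed amount.
record = "ACME CORP ".encode("cp037") + bytes([0x12, 0x34, 0x5C])

name = record[:10].decode("cp037").rstrip()   # EBCDIC bytes -> Unicode text
amount = unpack_comp3(record[10:], scale=2)   # packed digits -> decimal number

print(name, amount)
```

A tool that reads the same bytes as ASCII text and plain integers would see garbage, which is why a conversion step like this has to happen somewhere before analytics can begin.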
Migrating transactional data is also processor-intensive, hence slow and expensive, and the quantity of data you’re transferring is likely to bog down the network.
In practice, bulk data migration off the mainframe is a task organizations only tackle once, in situations where they’re retiring the mainframe. Such migration thus doesn’t adequately support analytics requirements, especially for organizations that continue to value the platform.
Maximizing the value of mainframe data: ELT using Model9
The primary difference between data warehouses and data lakes is when the data transformation step takes place.
Data warehouses expect the traditional ETL approach, where the operator transforms the data according to the needs of the warehouse before loading it. Because the transformation is fixed before loading, it cannot adapt to the shifting needs of the business.
Data lakes, in contrast, rely upon an extract-load-transform (ELT) process. The operator loads relevant data into the lake in an untransformed state. The transformation step then takes place on the fly, when a business user or application requests information from the data lake.
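The transform-on-request idea can be illustrated with a short Python sketch. It is not how any particular data lake product works. The sample records, field names, and the total_by helper are all hypothetical, chosen only to show that the raw data stays untouched while each query applies its own transformation.

```python
import json

# The "lake": raw records stored exactly as they arrived (hypothetical sample data).
raw_lake = [
    '{"acct": "001", "amt_cents": "12345", "ts": "2021-03-01"}',
    '{"acct": "002", "amt_cents": "500", "ts": "2021-03-02"}',
    '{"acct": "001", "amt_cents": "655", "ts": "2021-03-02"}',
]

def total_by(lake, key):
    """Transform on read: parse and convert types only when a query runs."""
    totals = {}
    for line in lake:
        rec = json.loads(line)
        totals[rec[key]] = totals.get(rec[key], 0) + int(rec["amt_cents"])
    return totals

# Two different business questions, answered from the same untransformed data.
print(total_by(raw_lake, "acct"))
print(total_by(raw_lake, "ts"))
```

Because nothing was reshaped at load time, a new question such as totals per day needs no reload from the source system, only a different transformation at query time.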
The resulting higher-performance, near-real-time analytics maximize the value the organization can extract from mainframe data – especially when combined with data from different locations with different levels of structure.
Applying an ELT approach to mainframe data is not without its challenges, however. They are the same as those of migrating data off the mainframe wholesale: such migration is processor-intensive and slow.
Model9 has addressed these challenges. With Model9, the mainframe’s System z Integrated Information Processor (zIIP) engine extracts both hot data (transactional data in data sets and databases) and cold data (tape and virtual tape-based data) faster and at far lower cost than traditional mainframe data migration.
Model9 then compresses and encrypts all the mainframe data in its original format for transport (again using zIIP engines), lightening the load on the network and improving the performance of the transfer.
Model9 transfers this compressed mainframe data to cloud-based object storage and then transforms it into open formats in the cloud for use in cloud applications. The transformation happens in the cloud rather than on the mainframe, so the mainframe is not involved in it at all.
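The general shape of this flow (compress in the original format, land in object storage, transform afterwards) can be sketched in Python. This is an illustration of the pattern only, not Model9's implementation or API. The in-memory dictionary standing in for object storage, the function names, and the fixed-length EBCDIC record layout are all assumptions, and the encryption step is omitted because Python's standard library does not provide it.

```python
import json
import zlib

object_store = {}  # hypothetical stand-in for cloud object storage

def ship_from_mainframe(key: str, raw: bytes) -> int:
    """Compress raw bytes in their original format and store them; return bytes sent."""
    payload = zlib.compress(raw)
    object_store[key] = payload
    return len(payload)

def transform_in_cloud(key: str, record_len: int) -> str:
    """Decompress and convert fixed-length EBCDIC records to JSON, off the mainframe."""
    raw = zlib.decompress(object_store[key])
    rows = [raw[i:i + record_len].decode("cp037").rstrip()
            for i in range(0, len(raw), record_len)]
    return json.dumps(rows)

# 1,000 hypothetical 16-byte records, encoded as EBCDIC (code page 037).
raw = "".join(f"CUSTOMER{i:04d}    " for i in range(1000)).encode("cp037")

sent = ship_from_mainframe("customers.raw", raw)
rows = json.loads(transform_in_cloud("customers.raw", record_len=16))

print(f"original: {len(raw)} bytes, transferred: {sent} bytes, rows: {len(rows)}")
```

The point of the ordering is that the only work done on the source side is extraction and compression. The format conversion, which is the costly part of traditional ETL, runs entirely against the copy in object storage.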
For analytics use cases, Model9 loads the data into the data lake. Because Model9 transfers all relevant data, the data lake need not return to the mainframe to fetch additional historical data in response to new queries.
The end result: optimized analytics performance across mainframe and non-mainframe data with low mainframe CPU costs, thus creating a truly hybrid data source that optimizes the business value of cloud analytics applications.
The Intellyx Take
From the business user’s perspective, it doesn’t matter where the enterprise stores or processes its data. The data have business value, and the point of analytics is to extract that value.
Supporting this business-centric view of corporate data has always presented challenges to the IT organization, given the diversity of platforms, database engines, data structures, and data formats that such companies have to deal with on a daily basis.
When the mainframe is in the mix, these challenges can adversely impact an organization’s ability to conduct business. As a result, it’s essential for the data team to take a proactive approach to delivering high-performance analytics capabilities that leverage mainframe data, using tools like Model9.