Don’t Lose Your Digital Transformation Energy to Data Gravity
An Intellyx BrainBlog by Jason English, for Model9
The limited resources of our modern world have placed a premium on energy, especially for powering our homes and buildings. Government and business leaders talk constantly about ‘transforming the power grid’ and finding more sustainable, reliable, and cost-efficient sources of electricity.
The initiatives always sound like great ideas, and technologies for harnessing alternative sources like wind and solar have indeed come a long way. So why do so many such projects still encounter limitations?
Our challenges go beyond the methods we use to generate electricity at its source. We overlook the difficulty and cost of transforming, moving and holding that potential energy close to its point of use, in batteries or elsewhere: the cost of transmission and storage.
From a technology perspective, data is the resource that fuels a modern enterprise – and parallels between the difficulties of moving energy and moving enterprise data are becoming all too clear.
The inertia of data gravity prevents us from exploiting the true value of all of our enterprise data for transformative change, just when we need it most.
When the lift-and-shift of ETL isn’t strong enough
ETL stands for Extract, Transform and Load. For as long as we have been exporting data from one core enterprise system and importing it into another for business intelligence, transactional and compliance reasons, ETL has been the expected norm for making that happen.
In a world where batch reconciliation processes and all-night backups off the mainframe were the norm, the ETL approach served enterprises well for decades – until it didn’t.
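Concretely, the classic pattern looks something like the minimal sketch below, written in plain Python with a flat-file “export” and an in-memory SQLite target standing in for the real systems; every name and format in it is an illustrative assumption, not a depiction of any particular mainframe tool. The point to notice is that the transform step runs before the load, on the same side as the source.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file export standing in for a mainframe extract.
RAW_EXPORT = "1001|ACME CORP|2500.00\n1002|GLOBEX|1375.50\n"

def extract(raw: str) -> list:
    """Extract: pull delimited records off the source system export."""
    return list(csv.reader(io.StringIO(raw), delimiter="|"))

def transform(rows: list) -> list:
    """Transform: cast types and normalize values before loading -- the step
    that ties up capacity near the source in a traditional ETL job."""
    return [(int(acct), name.title(), float(balance)) for acct, name, balance in rows]

def load(records: list) -> None:
    """Load: insert the already-transformed records into the target store
    (an in-memory SQLite database standing in for a BI warehouse)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (id INTEGER, name TEXT, balance REAL)")
    con.executemany("INSERT INTO accounts VALUES (?, ?, ?)", records)
    print(con.execute("SELECT * FROM accounts").fetchall())

load(transform(extract(RAW_EXPORT)))
```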
The rise of service-based architectures and on-demand cloud infrastructure raised expectations for faster delivery of new application features that can rapidly scale and respond in an instant to meet business demands.
A new space of cloud-based data lakes and warehouses started filling up and providing constant firehoses of data for powering advanced applications like AI-based fraud detection, inference-driven customer recommendations and analytics.
There are hundreds of ETL tools that can be used to lift-and-shift chunks of legacy data from one system to another – but in this real-time application world, the ETL approach is quickly becoming a constraint on progress.
Until now, many have overlooked a critical flaw in modernization plans: the time and cost of moving so much data from mainframes, silos and other sources to where it can be productively exploited.
Reframing data gravity on the mainframe
Data has gravity. Therefore, we want to keep it as close as possible to the application functionality it serves.
Fortunately, mainframe system vendors like IBM are already thinking about this, introducing zIIP capabilities on the mainframe and new zSeries systems that allow distributed workloads to run on excess mainframe capacity, as well as bursting workloads to cloud infrastructure.
For instance, a bank could put its AI-based fraud detection inference workloads right next to the mainframe’s transaction processing engine, so it is ready to act as a watchdog and block any settlement that violates policy, or even seems suspicious, in an instant.
There’s still a problem though – how do you train that AI engine with enough data to detect fraud in the first place? Financial crooks and cybercriminals are constantly thinking up new schemes for gaming financial systems in search of a payout, and the sheer volume of transactions that need to be checked is increasing rapidly.
In AI/ML scenarios, there is no way to know in advance exactly which outcomes the historical training data will produce, so you may need to transport petabytes of data to cover all eventualities. Reserving the capacity for so much ML work in conventional databases and servers, or on the mainframe, could be prohibitive in both cost and performance.
The transfer time for sequentially queued blocks of data over variable connection speeds to different regions in AWS, Azure, GCS, IBM Cloud and other public and private cloud storage infrastructures represents yet another hurdle to productivity.
Rather than reaching back to remote data sources, machine learning (ML) processes should naturally be conducted in the cloud, where compute capacity is elastic and where a constantly updated set of ML data is stored in adjacent S3 buckets, data lakes and cloud data warehouses like Snowflake and Redshift.
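As a rough illustration of that adjacency, a cloud-hosted training job can read its features straight out of the data lake instead of pulling them across the wire from the source system at training time. The bucket path, feature columns and model choice below are assumptions made purely for the sake of a runnable sketch.

```python
# Rough sketch: a cloud-side training job reads directly from an adjacent
# S3 data lake -- no pull from the mainframe at training time.
# The path, columns and model are illustrative assumptions.
import pandas as pd                                  # reading s3:// paths requires s3fs
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("s3://example-data-lake/transactions/2022-08.csv")  # hypothetical lake object
X, y = df[["amount", "merchant_risk", "velocity"]], df["is_fraud"]   # hypothetical feature columns

model = LogisticRegression(max_iter=1000).fit(X, y)  # elastic cloud compute does the heavy lifting
print("training accuracy:", model.score(X, y))
```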
Parallel preparation through ELT
If ETL isn’t good enough, how can we make the heavy lift of high-gravity backup and mainframe data viable for analytics and ML in the cloud – efficiently, at low cost, and in bulk?
One approach, taken by Model9 in its Gravity product, is to flip the script on ETL and make it ELT (extract-load-transform), using on-premises agents running on the mainframe’s zIIP engines to push data – without any transformation, but in massively parallel chunks – to the cloud. Model9 then spins up abundant and cheaper cloud resources to transform proprietary data formats into open formats like JSON or CSV for seamless ingestion into a data lake.
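To make the ordering concrete, here is a conceptual sketch of an ELT flow in Python: raw chunks are pushed to cloud object storage in parallel first, and only transformed once they land, on cloud-side compute. The bucket name, chunking scheme, EBCDIC code page and CSV transform are all illustrative assumptions, not a description of how the Model9 Gravity agents are actually implemented.

```python
# Conceptual ELT sketch: load raw chunks into cloud object storage in parallel,
# then transform them later on cloud compute. All names here are assumptions.
import csv
import io
from concurrent.futures import ThreadPoolExecutor

import boto3  # assumes AWS credentials and an existing bucket

s3 = boto3.client("s3")
BUCKET = "example-landing-zone"  # hypothetical landing bucket

def push_raw_chunk(chunk_id: int, raw_bytes: bytes) -> str:
    """Extract + Load: ship the chunk as-is, with no on-premises transformation."""
    key = f"raw/db2-export/chunk-{chunk_id:05d}.bin"
    s3.put_object(Bucket=BUCKET, Key=key, Body=raw_bytes)
    return key

def transform_in_cloud(key: str) -> None:
    """Transform: runs later on elastic cloud compute, converting the
    proprietary record layout into an open format (CSV here)."""
    raw = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read().decode("cp037")  # EBCDIC code page assumed
    out = io.StringIO()
    csv.writer(out).writerows(line.split("|") for line in raw.splitlines())
    s3.put_object(Bucket=BUCKET,
                  Key=key.replace("raw/", "open/").replace(".bin", ".csv"),
                  Body=out.getvalue().encode("utf-8"))

# Massively parallel raw load; a separate cloud-side job would later call
# transform_in_cloud() on each landed object before data-lake ingestion.
chunks = {i: "1001|ACME|2500.00\n".encode("cp037") for i in range(8)}  # toy data
with ThreadPoolExecutor(max_workers=8) as pool:
    keys = list(pool.map(lambda item: push_raw_chunk(*item), chunks.items()))
```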
Once the analysis or AI workload in the cloud produces a valuable result, that nugget of modeling insight or ML training output can be pushed back to the mainframe, for example as an ONNX model, where it can contribute to the intelligence of a real-time estimation or inferencing engine.
Respect the power of data gravity on both ends
There are already several great examples of companies that have leveraged the ELT paradigm to lessen the burden of data gravity at both ends of the mainframe-to-cloud equation for business value.
One large US transportation company had exactly such a logjam, which limited it to scheduling ETL imports of only 20 DB2 tables a day from its mainframe directly into Snowflake. Using Model9 Gravity, it was able to parallelize this transport and move 2,000 DB2 tables into AWS during off-hours every night for cloud analytics work in Snowflake, meaning it could plan against a complete model of the business every day.
Another financial services company was spending more than four hours a night manually managing the consolidation of transaction data in order to move it over to AWS for training its fraud detection models. That left too little breathing room for updating and testing fraud profiling AIs between operating hours.
With ELT, it was able to cut the process of moving data alongside its Amazon SageMaker environment to under 30 minutes, winning back several productive hours each night for returning updated fraud protection ML routines to the mainframe.
The Intellyx Take
Some see the transport of data over the Internet as a utility, like energy services, when in reality the end-to-end transfer of data from core systems to the cloud for valuable use is governed by more than network bandwidth and the speed of light.
Companies that fail to respect data gravity may experience transport delays and conflicts, and shy away from truly exploiting all of the rich historical and current data available on the mainframe to power the business-transforming analytics and machine intelligence capabilities we can leverage in the cloud.
©2022 Intellyx LLC. Intellyx is solely responsible for the content of this document. At the time of writing, Model9 is an Intellyx client. Photo credit: Client licensed from iStock.