For decades, mainframe data tasks have regularly included ETL – extract, transform, and load – as a key step on the road to insights. Indeed, ETL has been the standard process for copying data from any given source into a destination application or system. ETL gained visibility with the rise of data warehousing but often proved a bottleneck in those very projects.
Today, ETL is still the default choice for data movement, especially on the mainframe. But there is a legitimate alternative: ELT – extract, load, and transform.
As the reshuffling of terms implies, ELT takes a markedly different approach, first extracting data from wherever it currently resides and then loading it, generally to a target outside the mainframe. It is there, wherever that “there” is, that the hard work of transformation happens, typically as a prelude to the application of analytics.
So, ELT is an acronym, but one that’s pretty revolutionary.
Why? By reframing the idea of ETL with the technologies of today, the entire process has the potential to be faster, easier, and less expensive because it can use the most appropriate and cost-effective resources. Not just the mainframe CPU.
ELT tends to require less maintenance than ETL, which typically demands manual, ad hoc intervention and management; ELT, in contrast, relies on automated, cloud-based processing. ELT also loads more quickly, because loading is decoupled from transformation, which happens later, close to the ultimate cloud-based analysis work. ELT, in other words, is primarily concerned with getting data from the mainframe to the cloud. Finally, it is usually faster overall, and because it depends primarily on pay-as-you-go cloud resources rather than on the billing structure of the mainframe, it is generally less expensive.
ELT enables the routine, regular movement of mainframe operational and archived data from expensive, slow tape and VTL to storage environments that are both fast and highly cost-effective, such as Amazon S3 tiered storage. ELT can also deliver data directly for transformation to standard formats in the cloud – and then make that data available to data lakes and other modern BI and analytics tools. Because ELT retains the data’s original format and structure, the options for how the data can be used (transformed) in the cloud are practically unlimited.
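To make the pattern concrete, here is a minimal sketch in Python – with a hypothetical record layout and an in-memory stand-in for the cloud landing zone – of the ELT principle that raw mainframe bytes are landed untouched, and only decoded and parsed later, at the transform stage:

```python
# Minimal ELT sketch. The record layout, field positions, and landing zone
# are illustrative assumptions, not a real mainframe interface.

# --- Extract: a fixed-width EBCDIC record, as it might leave the mainframe.
raw_record = "ACME CORP 0001250".encode("cp037")  # EBCDIC code page 037

# --- Load: land the bytes as-is. Here a list stands in for the landing
# zone; in practice this would be an object store such as Amazon S3.
landing_zone = [raw_record]

# --- Transform: only now decode EBCDIC and split the fixed-width fields.
def transform(record: bytes) -> dict:
    text = record.decode("cp037")
    return {
        "customer": text[:10].rstrip(),   # cols 1-10: customer name
        "balance_cents": int(text[10:]),  # cols 11-17: amount (simplified)
    }

rows = [transform(r) for r in landing_zone]
print(rows)  # [{'customer': 'ACME CORP', 'balance_cents': 1250}]
```

Because the landed bytes are never altered, the same raw object can be re-transformed later for an entirely different downstream use – which is the point of deferring the T.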
The key to ELT on the mainframe is, of course, the zIIP engine – the specialty processor IBM provides for handling exactly this kind of “non-critical” work. It’s just that no one had tried using it this way before.
With zIIP help and TCP/IP to handle the movement, buried data sets can be liberated from mainframe data silos and deliver real monetary value. What’s more, companies that have tried ELT have discovered how easy it is to move mainframe data. They can more easily take advantage of cloud storage economics – potentially eliminating bulky and expensive tape and VTL assets. For these many good reasons, ELT is “NJAA,” not just another acronym – it’s an acronym worth getting to know.
Change is good – a familiar mantra, but one not always easy to practice. When it comes to moving toward a new way of handling data, mainframe organizations, which have earned their keep by delivering the IT equivalent of corporate-wide insurance policies (rugged, reliable, and risk-averse), naturally look with caution on new concepts like ELT — extract, load and transform.
Positioned as a lighter and faster alternative to more traditional data-handling procedures such as ETL (extract, transform, and load), ELT definitely invites scrutiny. And that scrutiny can be worthwhile.
SearchDataManagement.com defines ELT as “a data integration process for transferring raw data from a source server to a data system (such as a data warehouse or data lake) on a target server and then preparing the information for downstream uses.” In contrast, another source defines ETL as “three database functions that are combined into one tool to pull data out of one database and place it into another database.”
The crucial functional difference in those definitions is ETL’s exclusive focus on database-to-database transfer, while ELT is open-ended and flexible. To be sure, there are variations of ETL and ELT that might not fit those definitions, but the point is that in the mainframe world ETL is a tool with a more limited focus, while ELT is focused on jump-starting the future.
While each approach has its advantages and disadvantages, let’s look at why we think ETL is all wrong for mainframe data migration.
ETL is Too Complex
ETL was not originally designed to handle all the tasks it is now being asked to do. In the early days it was most often applied to pull data from one relational structure and fit it into a different relational structure, which frequently included cleansing the data along the way. For example, a traditional RDBMS can get befuddled by numeric data where it is expecting alpha data, or by the presence of obsolete address abbreviations. ETL is optimized for that kind of painstaking, field-by-field data checking, “cleaning,” and movement – not so much for feeding a hungry Hadoop cluster or modern data lake. In short, ETL wasn’t invented to take advantage of all the ways data originates and all the ways it can be used in the 21st century.
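To illustrate what that field-by-field work looks like, here is a minimal sketch – with hypothetical field names and cleansing rules – of the kind of in-flight validation and normalization ETL performs before a record ever reaches the target:

```python
# Hypothetical ETL-style cleansing: each field is validated and normalized
# in flight, before the record is loaded into the target schema.

# Illustrative map of obsolete street abbreviations to their modern forms.
ADDRESS_FIXES = {"AVE.": "AVE", "RD.": "RD", "STR": "ST"}

def cleanse(record: dict) -> dict:
    cleaned = dict(record)
    # Reject alpha characters where a numeric field is expected.
    if not str(cleaned["zip"]).isdigit():
        raise ValueError(f"non-numeric ZIP: {cleaned['zip']!r}")
    # Normalize obsolete street abbreviations, token by token.
    parts = cleaned["street"].upper().split()
    cleaned["street"] = " ".join(ADDRESS_FIXES.get(p, p) for p in parts)
    return cleaned

print(cleanse({"street": "12 Main Str", "zip": "60601"}))
# {'street': '12 MAIN ST', 'zip': '60601'}
```

Multiply this by hundreds of fields and dozens of sources, and the maintenance burden of keeping such rules current becomes clear.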
ETL is Labor Intensive
All that RDBMS-to-RDBMS movement takes supervision and, often, scripting. Skilled DBAs are in high demand and may not stay at your organization, so keeping the human part of the equation going can be tricky. In many cases, someone else will have to recreate or replace that hand-coding whenever something new is needed.
ETL is a Bottleneck
Because the ETL process is built around transformation, everything depends on the timely completion of that transformation. With larger amounts of data in play (think Big Data), the required transformation windows can become inconvenient or impractical, turning ETL into a functional and computational bottleneck.
ETL Demands Structure
ETL is not really designed for unstructured data and can add complexity rather than value when asked to deal with such data. It is best for traditional databases but does not help much with the huge waves of unstructured data that companies need to process today.
ETL Has High Processing Costs
ETL can be especially challenging on mainframes because it generally incurs MSU processing charges and can burden systems precisely when they need to be handling real-time work. ELT, by contrast, can be accomplished using mostly the capabilities of built-in zIIP engines – which cuts MSU costs – with the remaining processing conducted in the chosen cloud destination. In response to those high costs, some customers have moved the transformation stage into the cloud to handle all kinds of data transformations, integrations, and preparations in support of analytics and the creation of data lakes.
It would obviously be wrong to oversimplify a decision regarding the implementation of ETL or ELT; there are too many moving parts and too many decision points to weigh. What is crucial, however, is understanding that rather than being tied to legacy practices and limitations, ELT aligns with most of today’s evolving IT paradigms. ELT is ideal for moving massive amounts of data. Typically the desired destination is the cloud, and often a data lake, built to ingest just about any and all available data so that modern analytics can get to work. That is why ELT is growing today and why it is making inroads specifically in the mainframe environment. In particular, it represents perhaps the best way to accelerate the movement of data to the cloud, and to do so at scale. That is why ELT is emerging as a key tool for IT organizations aiming at modernization and at maximizing the value of their existing investments.