Introduction
Being able to efficiently wrangle raw big data into hidden insights and trends, through the power of business intelligence, is an essential skill for any data-driven organization. You may have heard of ETL for data integration—but have you heard of its cousin ELT (extract, load, transform)?
What is ELT exactly, and how does ELT work? What are the benefits of ELT, and what about the issue of ETL vs. ELT? Below, we answer all these questions and more.
What is ELT (Extract, Load, Transform)?
The traditional ETL process consists of three steps: extract, transform, and load. This has been the dominant paradigm for data integration since ETL was born in the 1970s.
However, recent technological advances in storage space and compute power have enabled an alternate ELT approach. As the name suggests, ELT (extract, load, transform) reverses the order of the load and transformation stages. In this section, we'll break down the three stages of the ELT process In more detail.
Extract
The data extraction phase of ELT is essentially the same as in ETL: raw data is first extracted from one or more source systems. The data sources in ELT may include:
- SQL relational databases and NoSQL (non-relational) databases
- Flat files (e.g. CSV, XML, JSON, Excel spreadsheets, etc.)
- SaaS platforms such as CRM (customer relationship management) and ERP (enterprise resource planning) software
- APIs (application programming interfaces)
- System logs and metadata
Whether they're on-premises, in the cloud, or from a third party, the data sets and source databases you use during the ELT extraction phase hold valuable information—which is why your choice of ELT solution must be able to efficiently pull information from many source locations.
Load
The second phase is where ETL and ELT start to diverge. While ETL doesn't load data until the final stage of the process, loading data is the second stage of ELT, immediately after data extraction.
ETL and ELT differ not only in the order of data loading, but also in the target system:
- With ETL, the destination of data loading is typically a cloud data warehouse such as Azure SQL Data Warehouse, Google BigQuery, Snowflake, or Amazon Redshift.
- With ELT, the target data can also be stored in a data lake (although some organizations use ELT for the staging tables in their data warehousing solution). A data lake is a solution for long-term data storage that is capable of holding large amounts of unrefined, unstructured data.
Transform
Data transformation is the last stage of ELT before the data is taken up by business intelligence and analytics workloads. With ELT, data is transformed in-place on an as-needed basis, rather than as a prerequisite step before data loading.
During the transformation process, unstructured data (with a variety of data types and data formats) is transformed into structured data. In addition, data that is already structured is also converted to make it easier for BI and analytics tools to work with. Just a few types of data transformations are:
- Data cleansing: Improving data quality by removing inaccurate, incomplete, or irrelevant data.
- Data aggregation: Gathering data in aggregate and creating new data or presenting it in a new format. For example, you can calculate your total annual sales revenue by adding up the revenue from each deal closed by each of your sales representatives.
- Data filtering: Selecting only certain fields, rows, or columns from a dataset that are relevant for future data analysis.
- Data joining: Combining information from multiple data sources.
- Data validation: Ensuring that your source data does not have empty, null, or corrupted elements.
ETL vs ELT: What Are the Differences Between ETL and ELT?
Thus far, we've discussed only the mechanical differences between ETL and ELT: simply put, these two data integration processes flip the order of the loading and transformation stages. In this section, we'll dive a little deeper into the differences between ELT and ETL, and how this order reversal has wide-ranging implications for the question of ETL vs. ELT.
1. Availability
ETL is a much more mature, widely used data integration process than ELT. Whereas ETL has been used for decades, ELT has only been possible relatively recently, thanks to technological advances.
This difference between ETL and ELT means that it's easier to find ETL tools and ETL experts than it is to do the same for ELT. However, this is gradually changing, as many ETL tools have added ELT functionality as well.
2. Target System
ETL is most often used with OLAP (online analytical processing) database and data warehouse destinations, because the data has already been transformed to fit the target schema. However, ETL is also compatible with data lakes, which can hold either unstructured or structured data.
With ELT, it's much more likely that your target system will be a data lake, built on a distributed file system such as Hadoop. ELT is compatible with any data format: structured, semi-structured, unstructured, and raw data. The data in your data lake is transformed into structured data on an ad hoc basis whenever it is required for your BI and analytics workloads. However, some organizations use ELT to push structured data into a data warehouse, and then perform transformations in-place.
3. Complexity
Data transformation is the most technically complex stage of ETL and ELT, but the way in which these transformations are carried out differs between the two methods. Instead of having an intermediate staging area for transformed data before loading, as occurs with ETL, ELT performs transformations directly on the data inside the target data lake.
ELT also tends to be less technically complex than ETL on the whole. This is because transformation is the last stage in the data pipeline, so ELT can be less selective about the data it extracts and loads into the target location. On the other hand, ETL needs to know in advance which data sources it will pull from and which data transformations will be performed, since the data discarded during ETL will not be available in the target system.
4. Regulatory Compliance
Using ELT can present issues for compliance with data privacy and data security regulations such as GDPR, CCPA, and HIPAA. These laws and regulations often require you to mask, redact, or encrypt sensitive and confidential information before sending it to a data store in the cloud.
However, by default, ELT performs data transformations only after loading. To comply with these regulations, you may need to perform some preliminary data processing to protect personally identifiable information (PII) before loading it into a target system. This ELT variation is sometimes called "ETLT" (extract, transform, load, transform).
What Are the Benefits of ELT?
The differences between ELT and ETL mean that ELT has its own advantages and use cases, which are often distinct from ETL.
ETL and ELT share the benefits of better data management and smarter business decision-making. Regardless of when you perform data transformations, making your enterprise data available for BI and analytics helps organizations achieve better and more accurate decision-making, predicting, and forecasting. In addition, formally defining your data integration processes through ETL or ELT is a crucial part of good data management and data governance, helping to improve data quality, predictability, and consistency.
However, there are some benefits of ELT that are distinct from the benefits of ETL. The advantages of ELT include:
- Flexibility: Unlike ETL, which must work with structured data, ELT is capable of working with any kind of data format: structured, semi-structured, and unstructured. Data transformations are more dynamic, applied ad hoc for BI and analytics purposes only on the data you need. This makes ELT a good choice if you don't know in advance which data you'll be using, or what kind of analytical queries to perform. Meanwhile, ETL processes are much more rigid and less flexible if you want to make changes to your data sources, transformations, or queries.
- Speed: At large scales, the first two stages of ELT (extraction and loading) are much faster than data transformation. Data in ETL processes needs to go through the bottleneck transformation stage before being loaded into the target data warehouse. ELT is a good choice for very large quantities of data that are arriving too quickly to consistently perform data transformations before loading.
Given these advantages of flexibility and speed, ELT may be the right choice in the following situations:
- Organizations that handle diverse types of data, including structured, semi-structured, and unstructured data.
- Organizations that prioritize speed, especially for real-time streaming use cases.
- Organizations that have enough storage space and compute power to handle ELT workloads. ELT typically requires more storage than ETL because it does not reduce the size of the data before loading.
Of course, whether you're working with ETL or ELT, the benefits of an automated data integration process can't be overstated. By automatically integrating your data into a centralized location, ELT and ETL improve scalability and eliminate the need for human workers to perform time- and labor-intensive data integration manually. Automated ETL and ELT solutions are much more cost-effective, letting your employees focus on higher-level revenue-generating activities.