Introduction
You’re almost certainly familiar with the ETL (extract, transform, load) process for data integration: data is first extracted from one or more sources, transformed to improve data quality and fit the target schema, and finally loaded into the destination (usually a data warehouse or data lake).
ETL has been around for decades in some form or fashion, along with its cousin ELT, which switches the order by performing in-place transformations on the loaded data. Recently, however, we’ve seen a new concept emerge: reverse ETL.
Given the rigidity of the ETL process, the idea of “reverse ETL” might not make much sense: how can you load data before you’ve even extracted it in the first place? In fact, the term “reverse ETL” is a bit of a misnomer—so what is reverse ETL exactly, and how does reverse ETL work? We’ll answer these questions and more in this all-in-one reverse ETL guide.
What is Reverse ETL? How Does Reverse ETL Work?
“Reverse ETL” is a newly coined term for a common problem: trying to get data out of your centralized data warehouse and into third-party systems.
With reverse ETL, what’s being reversed is not the order of operations itself, but rather the traditional ETL sources and targets. In standard ETL, you pull from sources such as databases and SaaS applications, and load this data into a unified repository. In reverse ETL, however, this repository acts as the source for the ETL process, and your third-party systems are the target.
Perhaps one of the first occurrences of the phrase “reverse ETL” is in the article “Reverse ETL — A Primer” by Astasia Myers, an investor at Redpoint Ventures. She defines reverse ETL as “the process of moving data from a data warehouse into third-party systems to make data operational.”
The third-party systems that are the target of reverse ETL are typically SaaS applications such as CRM (customer relationship management) software or sales and marketing platforms. Robust, sophisticated tools such as Salesforce, Marketo, HubSpot, and Zendesk come with their own data analytics capabilities—but without access to the full suite of your enterprise data, any insights they can glean will necessarily be limited in scope.
More and more, companies are finding that their centralized data warehouse, which they imagined would improve access to information, is actually having the opposite effect. This is because the ETL process is one-way: data flows from the SaaS sources to the target data warehouse, but does not flow in the opposite direction.
The goal of reverse ETL is to close this gap and facilitate access to data and insights for everyone. In reverse ETL, the data warehouse serves as the “single source of truth” for your organization, propagating information to other locations as need be to support data analytics and queries.
Beyond switching the data sources and targets, the structure of reverse ETL looks very much like its standard counterpart. Information is first extracted from the data source (here a centralized data warehouse or data lake), transformed to fit the target schema, and loaded into a third-party system or SaaS application.
Why Use Reverse ETL?
Organizations have long been struggling to get information out of their data warehouses before the idea of reverse ETL came on the scene. The main problem is this: while traditional ETL tools may have many connectors and integrations to transfer data into the data warehouse, the same is not true for transferring data out.
Unfortunately, there was little in the way of pre-built solutions prior to the rise of “reverse ETL” as its own concept. Instead, organizations usually had to build their own manual API connectors for each destination, which had several drawbacks. First of all, even with the right technical skills in-house, manually constructing an API connector can take multiple days or even weeks, taking up valuable time and effort. API endpoints also need to be maintained over time due to their brittle and inflexible nature, which can change without prior notice.
Reverse ETL casts aside this time-consuming and technically challenging paradigm in favor of a different approach: creating automated, out-of-the-box connectors and integrations between the source data repository and the target third-party systems. In so doing, you can automatically pull and validate information from your centralized data warehouse, and send it exactly when and where it needs to go.
Reverse ETL: Benefits and Use Cases
With Reverse ETL, organizations can patch a long-standing hole in their enterprise data management, allowing for the free flow of information away from your data warehouse. As you can imagine, there is a wide range of potential applications for reverse ETL. Most commonly, reverse ETL is used to enable nimbler operational analytics, helping to improve the efficiency of your day-to-day operations.
Just a few of benefits and use cases of reverse ETL solutions include:
- Enhanced customer relationship management: CRM software helps sales, marketing, and support teams do their jobs better by recording all of your interactions with customers and potential customers. While the data in your CRM system can be tremendously useful for plumbing insights and powering data-driven decision-making, it can be even more helpful when enhanced with other information from your data warehouse. Reverse ETL can transfer this data to CRM systems such as Salesforce, enabling smarter queries and analyses about which customers and segments you should focus on.
- Stronger business intelligence(BI) and analytics : Although many data warehouses come with their own built-in analytical capabilities, you might prefer to use a third-party BI solution such as Microsoft Power BI, Qlik, or Tableau. With reverse ETL, you can easily move the information you need to the BI and analytics platforms of your choice.
- Better data governance: As mentioned above, prior to full-fledged reverse ETL, organizations were often forced to use a piecemeal approach, manually building connectors for each third-party target system. Not only is this unnecessarily convoluted, it’s also a nightmare for data governance, since each connector has to be created and maintained separately. Using a reverse ETL solution, you can unite all of your third-party connections under a single roof.
Perhaps the singular benefit of reverse ETL is that it helps adapt the ETL process to your organization, rather than the other way around. Instead of requiring everyone to run analytical queries within the data warehouse itself, the data can flow to the third-party tool of your choice, whether that’s Salesforce, Power BI, or something else entirely.
As the concept of reverse ETL comes into vogue, more and more data integration providers are offering solutions that incorporate this novel approach. Looking to bring reverse ETL into your own organization? Make sure to choose a mature data integration platform that offers plenty of third-party connectors to external databases and SaaS applications, so that you have the tools at your disposal to move your data wherever it needs to go.