ETL data pipelines, which extract, transform and load data into a warehouse, were in many ways designed to protect the data warehouse. Minimizing the amount of data loaded helped conserve expensive on-premises computation and storage.
However, processing power and storage in today’s cloud-based warehouses cost far less than they did on-premises. With that constraint largely gone, the main reason to transform data before loading it no longer applies, and the ETL process looks increasingly archaic. ELT is rapidly becoming the new standard.
Reversing the sequence of transformation and loading means that data pipelines can be designed to be completely agnostic to the anticipated analyses. Because data is loaded before it is transformed, analysts don’t have to determine beforehand exactly what insights they want to generate. This modularizes the data pipeline and removes the brittleness inherent in a system that must be redesigned whenever business needs change.
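To make the reversed sequence concrete, here is a minimal sketch of load-then-transform, using Python's built-in sqlite3 module as a stand-in for a cloud warehouse. The table name, view name and sample orders are hypothetical and exist only for illustration; a real warehouse and loader would look different.

```python
import sqlite3

# Hypothetical rows, standing in for records extracted from a source system.
RAW_ORDERS = [
    {"id": 1, "region": "EU", "amount": 120.0, "status": "paid"},
    {"id": 2, "region": "EU", "amount": 80.0, "status": "refunded"},
    {"id": 3, "region": "US", "amount": 200.0, "status": "paid"},
]


def load_raw(conn, records):
    """Load step: copy every record into the warehouse without filtering."""
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, region TEXT, amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:id, :region, :amount, :status)", records
    )


def define_transformations(conn):
    """Transform step: a view derived from the raw table, which stays intact."""
    conn.execute(
        """
        CREATE VIEW paid_revenue_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY region
        """
    )


def run_elt():
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse
    load_raw(conn, RAW_ORDERS)
    define_transformations(conn)
    revenue = dict(
        conn.execute("SELECT region, revenue FROM paid_revenue_by_region").fetchall()
    )
    # A question nobody planned for needs only a new query, not a new pipeline.
    refunds = conn.execute(
        "SELECT COUNT(*) FROM raw_orders WHERE status = 'refunded'"
    ).fetchone()[0]
    raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
    return revenue, refunds, raw_count
```

Because the refunded order was loaded rather than filtered out upstream, the later question about refunds can be answered from the same raw table.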
With ETL, analysts have to decide ahead of time which questions to ask of the data. Significant engineering resources are consumed to create a bespoke pipeline. Adding sources requires additional engineering work both to build and to maintain data connectors, and pipelines can break or lose data as upstream data sources change.
A Single Source of Truth
With the underlying source data funnelled directly into a data warehouse, analysts have a single source of truth. They can perform transformations on the data as needed without compromising the integrity of the warehoused data.
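With ETL, every transformation performed on the data obscures some of the underlying information. Binning discards detail within each bin, and Simpson’s Paradox shows that a trend present in every subgroup can reverse once the subgroups are combined, so it’s dangerous to draw conclusions from data that can no longer be sliced at its original granularity.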
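A small worked example shows why pre-aggregated data can mislead. The figures below are made up for illustration (two hypothetical campaigns, A and B, measured across two segments) and are not drawn from any real dataset.

```python
# Illustrative, invented numbers: (campaign, segment) -> (conversions, visitors)
RAW = {
    ("A", "mobile"): (81, 87),
    ("A", "desktop"): (192, 263),
    ("B", "mobile"): (234, 270),
    ("B", "desktop"): (55, 80),
}


def rate_by_segment(raw):
    """Conversion rate for each (campaign, segment) pair."""
    return {key: conv / visitors for key, (conv, visitors) in raw.items()}


def rate_overall(raw):
    """Conversion rate per campaign after the segments are summed together."""
    totals = {}
    for (campaign, _segment), (conv, visitors) in raw.items():
        c, v = totals.get(campaign, (0, 0))
        totals[campaign] = (c + conv, v + visitors)
    return {campaign: c / v for campaign, (c, v) in totals.items()}


by_segment = rate_by_segment(RAW)
overall = rate_overall(RAW)
```

In these numbers, campaign A converts better than B within both mobile and desktop, yet B converts better once the segments are summed. A pipeline that stored only the per-campaign totals would support the opposite conclusion from the one the segmented data supports, and there would be no way to check.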
ETL has a high technical barrier to entry. It typically requires the close involvement of IT and engineering talent to design and build the necessary infrastructure, along with extensive configuration and bespoke code to extract and transform data from each source.
Modularizing the data pipeline enables the construction of standardized connectors with normalized schemas and, with them, the automation and outsourcing of data pipelines to outside vendors. This makes ELT accessible to analysts and relatively non-technical users in an organization. Combining ELT with a cloud-based BI tool broadens access to analytics across the organization, allowing anyone to answer ad hoc questions.
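One way to picture a standardized connector is as a small, shared interface that a single generic loader can consume. The sketch below is an assumption made for illustration: the Connector protocol, the StaticConnector class and the load function are hypothetical names, not any vendor's actual API, and sqlite3 again stands in for a warehouse.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


class Connector(Protocol):
    """Hypothetical contract: every source exposes a table name, columns and rows."""

    table: str
    columns: Tuple[str, ...]

    def extract(self) -> Iterable[tuple]: ...


@dataclass
class StaticConnector:
    """Toy connector backed by in-memory rows; real ones would call a source API."""

    table: str
    columns: Tuple[str, ...]
    rows: List[tuple]

    def extract(self) -> Iterable[tuple]:
        return iter(self.rows)


def load(conn: sqlite3.Connection, connector: Connector) -> int:
    """Generic loader: the same code handles any source that meets the contract."""
    # Table and column names come from trusted connector definitions in this
    # sketch; they are interpolated because SQL placeholders cannot bind names.
    cols = ", ".join(connector.columns)
    placeholders = ", ".join("?" for _ in connector.columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {connector.table} ({cols})")
    conn.executemany(
        f"INSERT INTO {connector.table} VALUES ({placeholders})",
        connector.extract(),
    )
    return conn.execute(f"SELECT COUNT(*) FROM {connector.table}").fetchone()[0]
```

Adding a source then means writing one more connector definition; the loader and the downstream transformations do not change.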
About Fivetran: Shaped by the real-world needs of data analysts, Fivetran technology is the smartest, fastest way to replicate your applications, databases, events and files into a high-performance cloud warehouse. Fivetran connectors deploy in minutes, require zero maintenance, and automatically adjust to source changes — so your data team can stop worrying about engineering and focus on driving insights.