Most of your internal data is trapped in your SaaS tools, making it impossible to query or combine.
You need a dedicated "data extractor" tool that can pull your data out of your favorite SaaS apps and put it in one place (generally a database). Those extractors are called "ETL" (Extract-Transform-Load) or "ELT" (Extract-Load-Transform).
Note: The difference between ETL and ELT is explained at length on the internet and is irrelevant to this article, so we'll call all data extractor platforms ETL.
When building Whaly 🐳, we tested numerous ETL tools for our customers and formed our opinions on what an ETL should look like. Here is a review of the criteria to consider when choosing your ETL, based on our own experience.
To trust your data analysis and dashboards, you need fresh data at your disposal. That means an ETL that runs often and fast: data should arrive in seconds or minutes, not hours ⏳
Top players:
Second-rate players:
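To make "often and fast" concrete, here is a minimal sketch of the incremental, cursor-based extraction pattern that keeps each sync small enough to run every minute. The source rows and field names are invented for illustration; a real connector would query a SaaS API with an `updated_at > cursor` filter.

```python
# Toy source rows: stand-ins for a SaaS API queried incrementally.
SOURCE = [
    {"id": 1, "updated_at": "2023-01-01T10:00:00Z", "mrr": 99},
    {"id": 2, "updated_at": "2023-01-01T10:05:00Z", "mrr": 49},
]

def incremental_sync(cursor: str) -> tuple[list[dict], str]:
    """Pull only the rows changed since the last run, so each sync stays
    small enough to be scheduled every minute instead of every night."""
    fresh = [row for row in SOURCE if row["updated_at"] > cursor]
    new_cursor = max((row["updated_at"] for row in fresh), default=cursor)
    return fresh, new_cursor

rows, cursor = incremental_sync("1970-01-01T00:00:00Z")
print(len(rows), "rows loaded, next cursor:", cursor)  # 2 rows loaded
rows, cursor = incremental_sync(cursor)
print(len(rows), "rows loaded")                        # 0 rows: nothing changed
```

Because each run only moves the delta since the last cursor, syncing every minute costs barely more than syncing nightly.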
If moving data is expensive, you will probably stop doing it, or be very selective about what you extract, blinding yourself in the process to all the insights you could have gained.
So, your ETL solution should be in the $10-100 per month per data source range, not more.
Top players:
Second-rate players:
Data that you move always has relations. A campaign is linked to a creative and a spend table, a contact is linked to a company, etc.
The information about how your extracted tables relate to each other lives in the data source, so it should be extracted along with the data.
Top players:
Second-rate players:
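As an illustration, here is a toy sketch using sqlite3 as a stand-in warehouse (the tables and columns are hypothetical) of what extracting the relations alongside the rows buys you downstream:

```python
import sqlite3

# Toy warehouse: the point is that the extractor lands the relationship
# (contact.company_id -> company.id) alongside the rows, not just two
# disconnected tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        email TEXT,
        company_id INTEGER REFERENCES company(id)  -- relation kept from the source
    )
""")
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO contact VALUES (10, 'jane@acme.com', 1)")

# Because the relation was extracted too, joins are trivial downstream:
for row in conn.execute("""
    SELECT contact.email, company.name
    FROM contact JOIN company ON contact.company_id = company.id
"""):
    print(row)  # ('jane@acme.com', 'Acme')
```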
Some API endpoints are a mess, and some normalization needs to happen, because nobody wants to start analyzing data straight from the raw "API result".
A first normalization step is needed to convert timestamps, split data into separate tables where it makes sense, and remove invalid records, so that data analysts can keep their sanity and stay productive.
Top players:
Second-rate players:
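Here is a minimal sketch of that first normalization pass; the raw payload shape (epoch-millisecond timestamps, a nested company object, an unusable row) is invented for illustration:

```python
from datetime import datetime, timezone

# Raw "API result": epoch-millisecond timestamps, a nested object, and
# one record no analyst could ever use.
raw = [
    {"id": 1, "created": 1700000000000, "company": {"id": 7, "name": "Acme"}},
    {"id": None, "created": "not-a-timestamp", "company": None},
]

contacts, companies = [], []
for record in raw:
    if record["id"] is None:  # drop invalid rows at the door
        continue
    contacts.append({
        "id": record["id"],
        # epoch milliseconds -> proper UTC timestamp
        "created_at": datetime.fromtimestamp(record["created"] / 1000, tz=timezone.utc),
        "company_id": record["company"]["id"],
    })
    # the nested object becomes a row in its own table
    companies.append(record["company"])

print(contacts)
print(companies)
```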
When something is deleted in a source system, it should be deleted or flagged in the data warehouse.
Oddly, most connectors out there don't go the extra mile needed to make this a reality.
Top players:
Second-rate players:
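That extra mile can be as simple as a soft-delete flag. A minimal sketch, assuming the connector can list the ids currently visible in the source (the toy in-memory warehouse stands in for a real one):

```python
# Ids currently visible in the source vs. rows already in the warehouse.
source_ids = {1, 2}  # campaign 3 was deleted in the SaaS tool
warehouse = {
    1: {"name": "Campaign A", "_deleted": False},
    2: {"name": "Campaign B", "_deleted": False},
    3: {"name": "Campaign C", "_deleted": False},
}

# The "extra mile": flag rows that disappeared from the source instead of
# silently leaving stale data behind.
for row_id, row in warehouse.items():
    if row_id not in source_ids:
        row["_deleted"] = True

print(warehouse[3])  # {'name': 'Campaign C', '_deleted': True}
```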
Data is the lifeblood of business. When the business changes, the data changes accordingly.
This means new attributes are created, updated, or deleted every day or week. The same goes for objects.
A good ETL tool makes it a breeze to manage those changes and react to them properly.
Top players:
Second-rate players:
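One common way to absorb this kind of schema drift is to add warehouse columns on the fly when the source grows new attributes. A minimal sketch with sqlite3; the `upsert_with_schema_drift` helper is ours, not any particular vendor's API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, email TEXT)")

def upsert_with_schema_drift(table: str, record: dict) -> None:
    """Add warehouse columns on the fly when the source grows new attributes."""
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for new_column in record.keys() - existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {new_column} TEXT")
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT OR REPLACE INTO {table} ({columns}) VALUES ({placeholders})",
        list(record.values()),
    )

# The source just grew a `phone` attribute; the sync keeps working.
upsert_with_schema_drift(
    "contact", {"id": 1, "email": "jane@acme.com", "phone": "+33 6 00 00 00 00"}
)
print(conn.execute("SELECT * FROM contact").fetchall())
```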
Sometimes, you will need to tweak the code of your ETL to add a specific property or improve a specific connector in a way that suits only your needs. Having access to the source code and being able to edit it can therefore come in very handy.
Sadly, not all ETLs share their connectors' source code or let end users contribute or run their own modified version of a connector.
Top players:
Second-rate players:
Getting a good level of monitoring over what was synced, what failed, why, and how it failed is paramount for any data pipeline such as an ETL.
So a proper ETL platform should be tightly integrated with monitoring tools to help Ops manage and react to any weird behavior.
Top players:
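At a minimum, that means every sync run emits structured events and failures get escalated somewhere Ops will see them. A sketch, where `notify_ops` is a stand-in for whatever Slack/PagerDuty-style integration you actually use:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def notify_ops(message: str) -> None:
    # Stand-in for a real alerting integration (Slack webhook, PagerDuty, ...).
    print(f"ALERT -> ops channel: {message}")

def run_sync(connector: str, extract) -> None:
    """Wrap every sync so Ops can see what ran, what failed, and why."""
    log.info("sync started: %s", connector)
    try:
        row_count = extract()
        log.info("sync succeeded: %s (%d rows)", connector, row_count)
    except Exception as exc:
        log.error("sync failed: %s (%s)", connector, exc)
        notify_ops(f"{connector} sync failed: {exc}")

run_sync("salesforce", lambda: 1250)  # healthy run
run_sync("hubspot", lambda: 1 / 0)    # failing run: logged and escalated
```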
As you can see, no ETL platform checks every box; each has limitations in one domain or another.
Fivetran is more costly but saves you time on the data/engineering side, because your team will have less preparation and transformation work to do.
Stitch and the other players are cheaper. However, your data/engineering team will spend a lot of time transforming the data once it is loaded.
At Whaly 🐳, as these are the characteristics on which we can't compromise (because our users won't), we had to build our own connectors that check all of the points above 💪