Things to think about when working with Azure Data Factory

Achraf Chennan
Jan 4, 2021

Azure Data Factory brings ETL to a whole new level. With its responsive GUI, you can build ETL pipelines that are easy to test and debug. While Azure Data Factory is a great service, it has some downsides to keep in mind.

The lookup activity with a limit of 5000 records

If you’re using Azure Data Factory (ADF), chances are high you will need the Lookup activity. In my current project, we use it to retrieve the names of the tables. The Lookup activity is easy to use, and its results can be fed into a ForEach loop. The only problem is that it returns at most 5000 rows. Our project has 5500 tables, which is 500 more than a single lookup can handle.

Luckily, for every problem there is a solution. We added a second Lookup activity and used a LIKE condition in the WHERE clause, so that each lookup returns a different subset of the table names and each subset stays below the limit.
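As a rough sketch of that split, the Python below builds two complementary queries and mimics the split locally to check that both halves stay under the limit. The pattern `[a-m]%`, the use of `INFORMATION_SCHEMA.TABLES`, and the function names are assumptions for illustration, not the exact queries from the project. It also assumes a SQL Server source with a case-insensitive collation.

```python
import re

LOOKUP_ROW_LIMIT = 5000  # row cap of a single Lookup activity


def build_lookup_queries(first_range="a-m"):
    """Return two T-SQL queries whose results together cover all tables.

    The first query matches names starting with a letter in first_range,
    the second matches everything else (hypothetical split, adjust as needed).
    """
    base = (
        "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_TYPE = 'BASE TABLE'"
    )
    pattern = f"[{first_range}]%"
    return (
        f"{base} AND TABLE_NAME LIKE '{pattern}'",
        f"{base} AND TABLE_NAME NOT LIKE '{pattern}'",
    )


def split_locally(table_names, first_range="a-m"):
    """Mimic the LIKE split in Python to verify both halves fit the limit."""
    matcher = re.compile(f"^[{first_range}]", re.IGNORECASE)
    first = [name for name in table_names if matcher.match(name)]
    second = [name for name in table_names if not matcher.match(name)]
    return first, second


if __name__ == "__main__":
    for query in build_lookup_queries():
        print(query)
```

Each query would go into its own Lookup activity. If one half still exceeds 5000 rows, the range can be narrowed or a third lookup added.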

Convert SQL data to CSV or Parquet

Another great feature ADF offers is data conversion. You can convert your data to SQL, CSV, Parquet, and many other formats. The only downside is that conversion takes a lot of time and compute resources. In my current project, we convert on-premises SQL data to Parquet. The self-hosted integration runtime is installed on an on-premises host. That host has a lot of memory and a powerful processor, but even so, converting the data takes a long time.

We found a workaround: we first copy the data to a SQL database inside Azure, then send it to a data lake, and only in the last step convert it to Parquet. This sped up the process by almost 40%.
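The ordering of those three steps can be sketched as a chain of pipeline activities, where each step starts only after the previous one succeeds. The Python below generates just the skeleton (name, type, `dependsOn`) of such a chain. The activity names are hypothetical, and the source and sink settings are deliberately left out because they depend on the datasets in your own factory.

```python
import json


def chain_activities(names):
    """Build Copy activity skeletons where each runs after the previous succeeds."""
    activities = []
    previous = None
    for name in names:
        # typeProperties (source and sink settings) are intentionally omitted here
        activity = {"name": name, "type": "Copy", "dependsOn": []}
        if previous is not None:
            activity["dependsOn"].append(
                {"activity": previous, "dependencyConditions": ["Succeeded"]}
            )
        activities.append(activity)
        previous = name
    return activities


# Hypothetical names for the three steps of the workaround
STEPS = ["CopyOnPremToAzureSql", "CopyAzureSqlToDataLake", "ConvertToParquet"]
pipeline_activities = chain_activities(STEPS)

if __name__ == "__main__":
    print(json.dumps(pipeline_activities, indent=2))
```

Only the first step has to go through the self-hosted integration runtime. The later steps run between Azure services.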

Sending an email from an ADF pipeline

Currently, there is no activity for sending an email from within a pipeline. Luckily, there is an alternative. ADF has a Web activity that sends a request to a web endpoint. You can create a Logic App in Azure that accepts some parameters and uses them to send an email with the Outlook action.

I implemented it this way as well. You can find the link to the tutorial at the end of this article.
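To get a feel for what the Web activity sends, here is a minimal Python sketch that builds the same kind of POST request you could use to test the Logic App by hand. The URL is a placeholder, and the parameter names `to`, `subject`, and `body` are assumptions. They have to match whatever schema you define in the Logic App's HTTP trigger.

```python
import json
import urllib.request

# Placeholder: use the HTTP POST URL that the Logic App trigger generates
LOGIC_APP_URL = "https://example.invalid/logic-app-trigger"


def build_email_request(url, to, subject, body):
    """Build (but do not send) a JSON POST request for the Logic App."""
    payload = {"to": to, "subject": subject, "body": body}  # hypothetical schema
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_email_request(
    LOGIC_APP_URL,
    to="team@example.com",
    subject="Pipeline finished",
    body="The nightly load completed.",
)
# To actually send it: urllib.request.urlopen(request)
```

In the pipeline itself, the Web activity plays the role of this request: it posts the JSON body to the Logic App URL, and the Logic App sends the email.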

Links
