Inspired by an earlier blog post where we looked at 'How Interchangeable Delta Tables Are Between Databricks and Synapse', I decided to do a similar exercise, but this time with the integration pipeline components taking centre stage.

As I said in my previous blog post, the question in the heading of this blog should be incredibly pertinent to all solution/technical leads delivering an Azure-based data platform solution. So, to answer it directly:

Question: How Interchangeable Are Integration Pipelines Between Azure Data Factory and Azure Synapse Analytics?

Answer: Very interchangeable! 

Or, to ask the question another way:

Question: Can we use the same integration components in Azure Data Factory and Azure Synapse Analytics at the same time?

Answer: Yes!

The only caveat to both of these questions is that, in the source control configuration for each resource, you must set the 'root folder' to the same location. In my case this was just the root of the repository itself, because I created the test case from scratch. Link below if you want to view the contents.

https://github.com/mrpaulandrew/AzureIntegrationPipelines
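
To make that caveat concrete, here is a minimal Python sketch of the Git settings both resources would need to share. The field names are illustrative, based on the 'repoConfiguration' / 'workspaceRepositoryConfiguration' shapes each resource stores, and the collaboration branch shown is an assumption; verify against your own exported templates rather than treating this as a definitive ARM reference.

```python
# A minimal sketch of the shared Git configuration (field names are
# illustrative; the collaboration branch is assumed for this example).
shared_git_settings = {
    "accountName": "mrpaulandrew",
    "repositoryName": "AzureIntegrationPipelines",
    "collaborationBranch": "main",  # assumed branch name
    "rootFolder": "/",  # the critical setting: identical for both resources
}

# Data Factory side: stored as 'repoConfiguration' on the factory resource.
adf_repo_configuration = {
    "type": "FactoryGitHubConfiguration",
    **shared_git_settings,
}

# Synapse side: stored as 'workspaceRepositoryConfiguration' on the workspace.
synapse_repo_configuration = {
    "type": "WorkspaceGitHubConfiguration",
    **shared_git_settings,
}

# The whole point of the caveat: both resources read and write the same
# pipeline artifacts because they point at the same root folder.
assert (adf_repo_configuration["rootFolder"]
        == synapse_repo_configuration["rootFolder"])
```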


Not convinced? Watch this…


With the above in mind, for me, as an architect, things now get very interesting when designing a data platform solution.

  • Delta tables are interchangeable as an open-source standard when working with Apache Spark as the compute.
  • Data Lake storage is interchangeable and accessible by lots of different resources, by the very nature of the underlying distributed file system.
  • Orchestration components are interchangeable between integration resources when accessing the same Git repository and using the same pipeline artifacts (see the sketch after this list).
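
On that last point, the 'same pipeline artifacts' claim is easy to see on disk. Below is a short Python snippet, assuming a local clone of the repo above, that walks the 'pipeline' folder (the folder name both Data Factory and Synapse use when persisting pipeline definitions to Git) and prints each pipeline's activities. The exact file layout is an assumption about your clone, not a requirement of either service.

```python
import json
from pathlib import Path

# Hypothetical local clone of the shared repository; both the Data Factory
# and the Synapse workspace point their Git integration at this same root.
repo_root = Path("AzureIntegrationPipelines")

# Both resources persist pipelines as plain JSON files under 'pipeline',
# using the same artifact schema, so one loop covers either service.
for artifact in sorted((repo_root / "pipeline").glob("*.json")):
    definition = json.loads(artifact.read_text())
    activities = definition.get("properties", {}).get("activities", [])
    print(f"{definition.get('name', artifact.stem)}: "
          f"{[activity.get('type') for activity in activities]}")
```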

Therefore, for a given data platform architecture (designed before Synapse arrived) that used the common set of core resources listed below, there now isn't any reason why (in most cases) we can't switch things over to Azure Synapse Analytics, if we wanted to.

Pre-Synapse Core Resources

  1. Data Lake
  2. Databricks
  3. Data Factory

Post-Synapse Core Resources

  1. Data Lake
  2. Synapse – Spark Pools
  3. Synapse – Integration Pipelines

The other great thing is that, as data engineers, we wouldn't need to do much work for these resources in our solution to become almost plug and play. We could even run solutions in parallel with some creative code branching!

Now, trolls, I fully appreciate that my initial test in the video was very, very simple, mainly due to a lack of time. So, I will continue this work and test all the integration components, including debugging in both resources at the same time, to see if we uncover any side effects of this repo sharing. Stay tuned.

For now, I wanted to plant the seed of architecture interchangeability so you can consider trying the same yourself and maybe unlock Synapse in a future data platform solution. It's fairly easy to do, I think you'll agree 🙂


Many thanks for reading.