[New post] Using Mermaid to Create a ProcFwk Pipeline Lineage Diagram
Paul Andrew posted: "The other day Kamil (@NowinskiK) messaged me on Teams asking if I liked Mermaid's 🙂 I prepared myself for a conversation about which was better; the fish half being on top vs on the bottom. Ha! Of course not!! Kamil was actually referring to the"
I confess until this point, I hadn't encountered the tool.
The conversation that followed, with Kamil, was about creating a pipeline lineage diagram from implementations of the Azure pipeline orchestration framework (AKA: procfwk, AKA: my baby 🙂 ).
Given the bespoke natural of the framework, a lot of traditional data cataloging tools can't simply scrap lineage metadata from Data Factory or Synapse Integration Pipelines directly as the framework means worker pipeline triggering is abstracted way via the control pipelines. They would need to traverse the procfwk metadata model.
To cut a long story short, this traversal is what has now been done. Below you'll find a T-SQL script for the procfwk database that outputs a Mermaid 'flowchart' diagram showing the execution stages for a given batch, with pipelines per stage and across stage pipeline dependencies connected via a dotted line.
To render the graphic we used a Wiki page within Azure DevOps.
I'll apply a few more tweaks to this and then include it in the next release of the procfwk via it's own repository, probably wrapped up as a Stored Procedure and better documented.
T-SQL
SET NOCOUNT ON; --local variables DECLARE @BatchName VARCHAR(255) = 'Daily'; --set as required DECLARE @PageContent NVARCHAR(MAX) = ''; DECLARE @BaseData TABLE ( [OrchestratorId] INT NOT NULL, [OrchestratorName] NVARCHAR(200) NOT NULL, [StageId] INT NOT NULL, [StageName] VARCHAR(225) NOT NULL, [PipelineId] INT NOT NULL, [PipelineName] NVARCHAR(200) NOT NULL ) --get reusable metadata INSERT INTO @BaseData SELECT o.[OrchestratorId], o.[OrchestratorName], s.[StageId], s.[StageName], p.[PipelineId], p.[PipelineName] FROM [procfwk].[Pipelines] p INNER JOIN [procfwk].[Orchestrators] o ON p.[OrchestratorId] = o.[OrchestratorId] INNER JOIN [procfwk].[Stages] s ON p.[StageId] = s.[StageId] INNER JOIN [procfwk].[BatchStageLink] bs ON s.[StageId] = bs.[StageId] INNER JOIN [procfwk].[Batches] b ON bs.[BatchId] = b.[BatchId] WHERE p.[Enabled] = 1 AND b.[BatchName] = @BatchName; --add orchestrator(s) sub graphs ;WITH orchestrators AS ( SELECT DISTINCT [OrchestratorId], [OrchestratorName], 'subgraph ' + [OrchestratorName] + CHAR(13) + 'style ' + [OrchestratorName] + ' fill:#E2F0D9,stroke:#E2F0D9' + CHAR(13) + '##o' + CAST([OrchestratorId] * 10000 AS VARCHAR) + '##' + CHAR(13) + 'end' + CHAR(13) AS OrchestratorSubGraphs FROM @BaseData ) SELECT @PageContent += OrchestratorSubGraphs FROM orchestrators; --add stage sub graphs ;WITH stages AS ( SELECT DISTINCT [OrchestratorId], [StageName], [StageId] FROM @BaseData ), stageSubs AS ( SELECT [OrchestratorId], STRING_AGG('subgraph ' + [StageName] + CHAR(13) + 'style ' + [StageName] + ' fill:#FFF2CC,stroke:#FFF2CC' + CHAR(13) + '##s' + CAST([StageId] AS VARCHAR) + '##' + CHAR(13) + 'end', CHAR(13) ) AS 'StageSubGraphs' FROM stages GROUP BY [OrchestratorId] ) SELECT @PageContent = REPLACE(@PageContent,'##o' + CAST([OrchestratorId] * 10000 AS VARCHAR) + '##',[StageSubGraphs]) FROM stageSubs; --add pipelines within stage ;WITH pipelines AS ( SELECT [StageId], STRING_AGG( CONCAT('p',CAST([PipelineId] * 10 AS VARCHAR),'(',[PipelineName],')',CHAR(13), 'style ','p',CAST([PipelineId] * 10 AS VARCHAR),' fill:#F2F2F2,stroke:#F2F2F2'),CHAR(13) ) AS 'PipelinesInStage' FROM @BaseData GROUP BY [StageId] ) SELECT @PageContent = REPLACE(@PageContent,'##s' + CAST([StageId] AS VARCHAR) + '##',[PipelinesInStage]) FROM pipelines --add stage nodes ;WITH stageNodes AS ( SELECT DISTINCT [StageId], 's' + CAST([StageId] * 100 AS VARCHAR) + '[' + [StageName] + ']' + CHAR(13) + 'style s' + CAST([StageId] * 100 AS VARCHAR) + ' fill:#FFF2CC,stroke:#FFF2CC' + CHAR(13) AS StageNode FROM @BaseData ) SELECT @PageContent = @PageContent + [StageNode] FROM stageNodes ORDER BY [StageId]; --add stage to pipeline relationships SELECT @PageContent = @PageContent + 's' + CAST([StageId] * 100 AS VARCHAR) + ' --> ' + 'p' + CAST([PipelineId] * 10 AS VARCHAR) + CHAR(13) FROM @BaseData; --add stage to stage relationships ;WITH maxStage AS ( SELECT MAX([StageId]) -1 AS maxStageId FROM @BaseData ), stageToStage AS ( SELECT DISTINCT 's' + CAST(b.[StageId] * 100 AS VARCHAR) + ' ==> ' + 's' + CAST((b.[StageId] + 1) * 100 AS VARCHAR) + CHAR(13) AS Content FROM @BaseData b CROSS JOIN maxStage WHERE b.[StageId] <= maxStage.[maxStageId] ) SELECT @PageContent = @PageContent + [Content] FROM stageToStage; --add pipeline to pipeline relationships SELECT @PageContent = @PageContent + 'p' + CAST(pd.[PipelineId] * 10 AS VARCHAR) + ' -.- ' + 'p' + CAST(pd.[DependantPipelineId] * 10 AS VARCHAR) + CHAR(13) FROM [procfwk].[PipelineDependencies] pd INNER JOIN @BaseData b1 ON pd.[PipelineId] = b1.[PipelineId] INNER JOIN @BaseData b2 ON pd.[DependantPipelineId] = b2.[PipelineId]; --add batch subgraph SELECT @PageContent = 'subgraph ' + [BatchName] + CHAR(13) + 'style ' + @BatchName + ' fill:#DEEBF7,stroke:#DEEBF7' + CHAR(13) + @PageContent FROM [procfwk].[Batches] WHERE [BatchName] = @BatchName; SET @PageContent = @PageContent + 'end'; --add mermaid header SELECT @PageContent = '::: mermaid' + CHAR(13) + 'graph LR' + CHAR(13) + @PageContent; --return output PRINT @PageContent
Mermaid (Markdown)
This is an example of the output from the procfwk default metadata.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.