How to send a failure email to a third party in Synapse when the data does not satisfy the rules in our code?

Question:

I am looking for a solution for the following: we apply some transformations in a Synapse notebook, and if the rules are not satisfied we need to send a failure email to a third party.

The scenario: I am writing PySpark code in a Synapse notebook to transform source files that we pick up from ADLS.
At the start, I take the count of some of the FLAG columns in the source file. These counts exclude null values.

Once the transformations are done, we need to check whether the counts of these columns have decreased or increased.

CASE 1: If the FLAG column counts after the transformations are lower than the counts taken from the source file at the start, we need to send an email saying that the count has reduced; the process should fail and a new file should be taken.

For example: source file FLAG count = 3456
After transformations = 3000

CASE 2: If the FLAG column counts after the transformations are equal to or greater than the FLAG counts of the source file, the file should be processed successfully.

I need to do this comparison at the end of the code, after all my transformation queries. How can we implement this in Synapse?

Thanks in advance for your response.

Asked By: BigData Lover

||

Answers:

In this repro, I used a data flow activity to check the count of non-null values in each column before and after the transformation. I then added an If Condition activity to check whether the count after transformation is less than the source count; based on the If activity's result, the mail is triggered. The steps are below.

  • In the data flow activity, add two sources: one for the dataset before transformation and the other for the dataset after transformation.

  • Add an aggregate transformation after each source. In the aggregate settings, leave group by empty and, under aggregates, create a new column for each source column that holds its non-null count, computed with the count() function.

  • Add a derived column transformation after aggregate transformation 1 and create a dummy column called src with the value 'src'. Similarly, add a derived column transformation after aggregate transformation 2 and create a dummy column called tgt with the value 'tgt'.

  • Unpivot the data from derived column 1 with these settings:
    1. Ungroup by: src
    2. Unpivot key: unpivot column name: column
    3. Unpivot column type: string
    4. Option: Pick column names as values.
    5. Unpivoted columns: column arrangement: Normal
    6. Column name: src_value; type: long
    Similarly, unpivot the data from derived column 2:
    1. Ungroup by: tgt
    2. Unpivot key: unpivot column name: column
    3. Unpivot column type: string
    4. Option: Pick column names as values.
    5. Unpivoted columns: column arrangement: Normal
    6. Column name: tgt_value; type: long

  • Inner join the outputs of unpivot1 and unpivot2 with a join transformation on the unpivot key field named column (it holds the count column names) of both outputs.

  • Add a derived column called flag to find whether the count after transformation is less than the source count.
    The expression for the flag column is iif(src_value > tgt_value,1,0).

  • Then add an aggregate transformation that sums the flag column into a column named final_flag.

  • Add a sink transformation and choose Cache as the sink type.

  • After this, add the data flow activity to a pipeline.

  • Then add an If Condition activity. In its expression, enter:
    @greater(activity('Data flow1').output.runStatus.output.sink1.value[0].final_flag,0)

  • If this expression evaluates to true, add activities to the True case of the If activity to trigger the mail. Refer to the Microsoft document "How to send email – Azure Data Factory & Azure Synapse | Microsoft Learn".
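
If you would rather keep the check inside the notebook, the same comparison can be sketched in PySpark. This is only a rough sketch, not part of the repro above: the helper names (non_null_counts, reduced_columns, check_flag_counts) and the DataFrame and column names in the usage comment are hypothetical. It assumes that an unhandled exception fails the notebook, so the pipeline can trigger the mail from the notebook activity's failure path.

```python
def non_null_counts(df, columns):
    """Return {column: non-null count} for a PySpark DataFrame."""
    # Imported lazily so the pure-Python helpers below also work without Spark.
    from pyspark.sql import functions as F

    # F.count(column) skips nulls, like count() in the data flow aggregate step.
    row = df.agg(*[F.count(F.col(c)).alias(c) for c in columns]).first()
    return row.asDict()


def reduced_columns(src_counts, tgt_counts):
    """Return {column: (src, tgt)} for every column whose count dropped."""
    # Same rule as iif(src_value > tgt_value,1,0) in the data flow.
    return {
        col: (src, tgt_counts.get(col, 0))
        for col, src in src_counts.items()
        if src > tgt_counts.get(col, 0)
    }


def check_flag_counts(src_counts, tgt_counts):
    """Raise ValueError if any count dropped (the final_flag > 0 case)."""
    dropped = reduced_columns(src_counts, tgt_counts)
    if dropped:
        details = ", ".join(
            f"{col}: {src} -> {tgt}" for col, (src, tgt) in sorted(dropped.items())
        )
        raise ValueError(f"Flag counts reduced after transformation: {details}")


# Hypothetical usage in the notebook (source_df, final_df, flag_columns are assumed names):
# src_counts = non_null_counts(source_df, flag_columns)   # before the transformations
# ... transformations ...
# check_flag_counts(src_counts, non_null_counts(final_df, flag_columns))
```

With the example from the question (3456 in the source, 3000 after transformation), check_flag_counts raises and the run fails; equal or higher counts pass silently.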

Answered By: Aswin