
Case Study: Data Pipeline Orchestration ETL

6/2/2023


 
In this blog, we walk through a complete end-to-end data pipeline orchestration scenario on HCL Workload Automation: data arrives from multiple source systems, the source files are detected and processed on different systems, the processed data is transformed on a data staging/transformation platform, and the transformed data is ingested into a data warehouse.
Once the data is in the warehouse, we run analytics on it through a popular analytics platform, query the status of the analytics step to get its response, and then run reports from a reporting tool.
 
Customer Requirement:
A customer needs to orchestrate their data pipeline end to end and has multiple interface files arriving from different source systems.

The arriving interface files must be detected and processed in real time on different systems.

The customer uses SAP as their main ERP and has an SAP R/3 system that processes an incoming interface file by running an ABAP program with a variant and a step user.

An SAP process chain also runs as part of the flow to process an incoming file.

There is also a file transfer step that fetches one of the source files and imports it into another target.

Once the data processing is done, the generated output needs to be transformed; the customer uses Informatica as the data transformation tool.

The transformed data then needs to be ingested into a data warehouse; the customer uses SAP Data Warehouse on a SAP HANA database for this purpose.

Analytics must run on top of the ingested data; the customer uses Power BI as the main analytics product, and a Power BI data source refresh is part of the flow.

The status of the refreshed data source needs to be fetched to ascertain that the refresh step finished successfully.

After the status check, reports are run in the Workday application and mailed out to users.

A failure on any step must send an alert to MS Teams, and an alert is also needed in MS Teams if the file transfer step takes longer than 3 minutes.
  
Solution:
HCL Workload Automation (HWA) acts as the main orchestrator here, orchestrating this data pipeline end to end across multiple applications. It detects source files from different source systems in real time, runs the individual steps through the various plugins available within the product that readily connect to each application, and tracks every step in real time, alerting on abends/failures in MS Teams and flagging long runners in MS Teams as they happen. If needed, recovery can also be automated through conditional branching, recovery jobs, or event rules (automated recovery is out of scope for this blog).
Detecting Source Files:
Source files can be detected through event rules using the "File Created" event, with an action that submits the job stream in real time, or via file dependencies that watch for files in real time.
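To make this concrete, here is a minimal sketch of what such a "File Created" event rule can look like, loaded through the composer command line. The workstation, file path, and job stream names are hypothetical, and the XML follows the product's event-rule format; treat it as an illustration rather than a ready-made definition.

```python
# Illustrative sketch: a "File Created" event rule that submits a job stream
# when a source file lands. Workstation, path, and job stream names are
# hypothetical; the XML follows the HWA event-rule schema.
import subprocess
import tempfile

RULE_XML = """<?xml version="1.0"?>
<eventRuleSet xmlns="http://www.ibm.com/xmlns/prod/tws/1.0/event-management/rules">
  <eventRule name="DETECT_SOURCE_FILE" ruleType="filter" isDraft="no">
    <eventCondition eventProvider="FileMonitor" eventType="FileCreated">
      <filteringPredicate>
        <attributeFilter name="FileName" operator="eq">
          <value>/data/inbound/source_file.csv</value>
        </attributeFilter>
        <attributeFilter name="Workstation" operator="eq">
          <value>AGENT_01</value>
        </attributeFilter>
      </filteringPredicate>
    </eventCondition>
    <action actionProvider="TWSAction" actionType="sbs" responseType="onDetection">
      <parameter name="JobStreamName"><value>DATA_PIPELINE</value></parameter>
      <parameter name="JobStreamWorkstationName"><value>MASTER</value></parameter>
    </action>
  </eventRule>
</eventRuleSet>
"""

# Write the rule to a temp file and load it into the plan database via composer.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(RULE_XML)
subprocess.run(["composer", "add", f.name], check=True)
```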
SAP R/3 Batch Job:
The SAP R/3 batch job runs an ABAP program with a variant and a step user. You can also pass printer parameters such as new spool request, spool recipients, and printer formats, along with all the typical archiving parameters you would find in SM36. Multi-step SAP jobs are supported as well.
HWA fetches the SAP job log from within SAP in real time into its local job log.
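For intuition on what happens behind this job type: external schedulers drive SAP batch jobs through SAP's XBP interface. A rough pyrfc sketch of launching an ABAP program with a variant and a step user follows; all connection details and names are hypothetical, and the BAPI parameter names should be verified against your own system.

```python
# Rough sketch of launching an ABAP program with a variant and step user via
# SAP's XBP interface (the mechanism external schedulers use). Connection
# details, program, variant, and user names are all hypothetical.
from pyrfc import Connection

conn = Connection(ashost="sap-host", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")

# Register as an external management tool, then build and start the job.
conn.call("BAPI_XMI_LOGON", EXTCOMPANY="HWA", EXTPRODUCT="DEMO",
          INTERFACE="XBP", VERSION="2.0")
job = conn.call("BAPI_XBP_JOB_OPEN", JOBNAME="Z_INTERFACE_LOAD",
                EXTERNAL_USER_NAME="HWA")
conn.call("BAPI_XBP_JOB_ADD_ABAP_STEP",
          JOBNAME="Z_INTERFACE_LOAD", JOBCOUNT=job["JOBCOUNT"],
          EXTERNAL_USER_NAME="HWA",
          ABAP_PROGRAM_NAME="ZPROCESS_FILE",   # the ABAP program
          ABAP_VARIANT_NAME="DAILY_VARIANT",   # the variant
          SAP_USER_NAME="STEP_USER")           # the step user
conn.call("BAPI_XBP_JOB_CLOSE", JOBNAME="Z_INTERFACE_LOAD",
          JOBCOUNT=job["JOBCOUNT"], EXTERNAL_USER_NAME="HWA")
conn.call("BAPI_XBP_JOB_START_ASAP", JOBNAME="Z_INTERFACE_LOAD",
          JOBCOUNT=job["JOBCOUNT"], EXTERNAL_USER_NAME="HWA")
```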
SAP Process Chain Job:
An SAP process chain job runs SAP BW process chains, with the option of also passing the execution user that triggers the process chain.
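Behind this job type, starting a BW process chain comes down to SAP's RSPC API. A minimal pyrfc sketch, with hypothetical connection details and chain name:

```python
# Minimal sketch: trigger a BW process chain via the RSPC API function module.
# Connection details and the chain name are hypothetical.
from pyrfc import Connection

conn = Connection(ashost="bw-host", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")
result = conn.call("RSPC_API_CHAIN_START", I_CHAIN="ZPC_FILE_LOAD")
print("Process chain log id:", result.get("E_LOGID"))
```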
File Transfer Job:
A file transfer job can be used to trigger file transfers to and from external third parties, or internally between two different servers. You also have the option of leveraging the A2A (agent-to-agent) protocol to do HTTPS transfers from source to destination between agents, along with options for regular-expression mappings and archival at the source or destination with date and time stamps.
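The HWA plugin handles all of this declaratively; purely for comparison, the basic transfer step is equivalent to an SFTP copy, sketched here in Python with paramiko (host, credentials, and paths are hypothetical):

```python
# Equivalent of the basic transfer step, sketched outside HWA as an SFTP copy
# with paramiko. Host name, credentials, and paths are hypothetical.
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("target-server.example.com", username="xfer", password="secret")

sftp = ssh.open_sftp()
sftp.put("/data/inbound/source_file.csv",     # the fetched source file
         "/import/target/source_file.csv")    # import location on the target
sftp.close()
ssh.close()
```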
In this case, the file transfer step runs long and exceeds the maximum duration of 3 minutes set for this particular job.
This results in a notification/alert sent to MS Teams via an event rule.
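The Teams side of such an alert is typically an incoming webhook that accepts a JSON payload. A minimal sketch, assuming a webhook has already been created for the target channel (the URL is a placeholder):

```python
# Minimal sketch: post a long-runner alert to an MS Teams incoming webhook.
# The webhook URL is a placeholder for one created on the target channel.
import requests

WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/..."
payload = {
    "text": "HWA alert: FILE_TRANSFER job exceeded its 3-minute maximum duration."
}
response = requests.post(WEBHOOK_URL, json=payload, timeout=30)
response.raise_for_status()
```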
Informatica Job:
The data transformation is done via an Informatica step that runs an Informatica PowerCenter workflow from an Informatica folder, specifying the service domain, repository domain, repository name, and so on.
The HWA job log depicts the transformation running within Informatica.
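For comparison, the same PowerCenter workflow can be started from Informatica's pmcmd command line; the domain, service, folder, and workflow names below are hypothetical.

```python
# For comparison: start the same PowerCenter workflow with Informatica's
# pmcmd CLI. Domain, service, folder, and workflow names are hypothetical.
import subprocess

subprocess.run([
    "pmcmd", "startworkflow",
    "-sv", "IntegrationService",   # Informatica integration service
    "-d", "Domain_Infa",           # service domain
    "-u", "infa_user", "-p", "secret",
    "-f", "ETL_FOLDER",            # Informatica folder
    "-wait",                       # block until the workflow completes
    "wf_transform_interface",      # the PowerCenter workflow
], check=True)
```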
SAP HANA Job:
The SAP HANA DB job is used to query the database or run a stored procedure against a SAP HANA DB, with the SAP HANA ngdbc driver specified in the job definition. Here, the job performs the DB import into the SAP HANA database.
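The HWA job itself talks to HANA through the ngdbc JDBC driver; as a stand-in, the same kind of import can be sketched with SAP's Python client, hdbcli (connection details, table, and data are hypothetical):

```python
# Stand-in sketch for the DB import step using SAP's hdbcli Python client
# (the HWA job uses the ngdbc JDBC driver instead). Connection details,
# table, and row data are hypothetical.
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=30015,
                     user="LOADER", password="secret")
cursor = conn.cursor()
cursor.execute(
    "INSERT INTO STAGING.TRANSFORMED_DATA (ID, PAYLOAD) VALUES (?, ?)",
    (1, "example-row"),
)
conn.commit()
cursor.close()
conn.close()
```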
After the import into the database, we can run analytics on top of the ingested data.
 
PowerBI Refresh DataSource:
The next step after the import into the HANA DB is to refresh a Power BI data source, which can be accomplished through a RESTful job in HWA. A RESTful job calls a service URL to perform an HTTPS request (in this case a POST), passing a JSON body either via a file or directly in the Body field of the job.
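The call behind this RESTful job is the Power BI REST API's dataset-refresh endpoint. A minimal sketch, assuming an Azure AD bearer token and the dataset ID are already available:

```python
# Minimal sketch of the Power BI dataset-refresh call behind the RESTful job.
# The bearer token and dataset ID are placeholders.
import requests

TOKEN = "<azure-ad-access-token>"      # placeholder
DATASET_ID = "<dataset-id>"            # placeholder

response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"notifyOption": "NoNotification"},
    timeout=30,
)
response.raise_for_status()            # the API answers 202 Accepted
```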
Get Status of PowerBI Refresh:
The status of the previous step is pulled through another RESTful job, which performs an HTTPS GET on the refresh status, passing the DATASETID and REFRESHID as variables from the previous step.
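A sketch of the equivalent status check against the Power BI refresh-history endpoint; the token and dataset ID are placeholders:

```python
# Sketch of the status check: query the dataset's refresh history and
# confirm the latest refresh completed. Token and dataset ID are placeholders.
import requests

TOKEN = "<azure-ad-access-token>"
DATASET_ID = "<dataset-id>"

response = requests.get(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/refreshes?$top=1",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
latest = response.json()["value"][0]
# status is "Unknown" while the refresh runs, then "Completed" or "Failed".
if latest["status"] != "Completed":
    raise RuntimeError(f"Refresh {latest['requestId']} status: {latest['status']}")
```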
Workday Reports:
Reports are run in the Workday application, again leveraging a RESTful POST job that passes the JSON body directly in the Body field of the job.
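Workday's exact endpoint and payload depend on the tenant and the report definition, so the sketch below only shows the shape of the RESTful POST the job issues; every URL, credential, and field in it is hypothetical.

```python
# Shape of the RESTful POST the job issues to run a Workday report. The URL,
# credentials, and payload fields are all hypothetical; the real endpoint and
# body depend on the Workday tenant and the report definition.
import requests

response = requests.post(
    "https://<tenant>.workday.com/ccx/service/<tenant>/<report>",
    auth=("integration_user", "secret"),                     # hypothetical
    json={"format": "json", "mailTo": "users@example.com"},  # hypothetical body
    timeout=60,
)
response.raise_for_status()
```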
Overall Status of the Data Pipeline:
Using the product's graphical views, it is easy to track this entire process in real time and get alerted on any failures or long runners directly over MS Teams, a ticketing tool, email, or SMS.
  
Advantages of running Data Pipeline Orchestration over HWA:
  1. Any step in the data ingestion, data processing, DB import, analytics, or reporting stages can be realized through a variety of plugins; automate any step of the overall process leveraging the 99+ plugins shipped with the product.
  2. Initiation of the job stream flow, realization of dependencies, and detection of source files work out of the box.
  3. The flow can be tracked via the chatbot CLARA or via the graphical views in the Dynamic Workload Console.
  4. Alerts can go to MS Teams (or any messaging tool of your choice), a ticketing tool, email, or SMS.
  5. The entire orchestration is achieved end to end through HWA.

Author's Bio
Sriram V, Senior Technical Lead

I have been working with Workload Automation for the last 14 years and have had the chance to work with 100+ customers from various industries as an implementer, SME, and India SME at IBM, before working directly on the product support side as a DevOps Engineer on IBM's SaaS offering of the product. I also had a stint with the Lab Services team developing conversion utilities to migrate from competing products, and I have lately been working as a Technical Advisor on the product on the tech sales side.
View my profile on LinkedIn

  • What's new