
Seamless Workflow Orchestration - Integrating Apache Airflow with Workload Automation for Enhanced Scheduling and Automation

1/10/2025

Let us begin by understanding what Apache Airflow is all about before moving on to our Apache Airflow plug-in and how it benefits our Workload Automation users. To empower your Workload Automation environment, download the Apache Airflow plug-in, available on Automation Hub.
While Apache Airflow excels in flexibility and scalability for data engineering and ETL tasks, batch workload schedulers offer powerful, enterprise-grade capabilities for managing large, complex job streams across multiple platforms. For users seeking a low-code solution with advanced scheduling, dependency management, and monitoring features, Workload Automation provides a comprehensive suite of tools that simplifies the orchestration of mission-critical workflows.
The integration of Apache Airflow brings significant benefits and opens up new possibilities for enhanced workflow orchestration and automation.
You can bring your existing Airflow DAGs that take advantage of specialized operators, while removing their scheduler calendar and file dependencies. Workload Automation can handle everything from basic to advanced calendar rules, detailed file monitoring, managed file transfer, and webhook triggers to release your DAGs. On top of that, you benefit from the full-fledged automation that only an enterprise scheduler can bring.
Here’s an overview of the advantages of this integration:
Key Features:
●      Workload Automation offers a graphical, drag-and-drop interface that allows users to build and manage workflows visually. This eliminates the need to write code to define dependencies or schedules, as is necessary in Apache Airflow, where DAGs are created using Python.
●      The visual interface allows users to view job streams and dependencies in an intuitive, graphical format, making it easier to understand the flow of tasks and manage complex workflows without coding.
●      Workload Automation supports advanced scheduling features, including custom calendars, holiday rules, and exceptions, which make it easy to manage recurring tasks and adjust for variations in schedule needs.
●      Unlike Airflow, which requires dependencies to be coded within DAGs, batch workload schedulers offer a graphical way to create and manage task dependencies. This can include time-based dependencies, file-based dependencies (where jobs wait for specific files to arrive or change), or job-based dependencies.
●      Workflow Integration: Seamlessly integrate Workload Automation with Google Cloud Workflows to orchestrate complex processes and leverage plugins like Apache Spark, Hadoop, Snowflake, Tableau, Salesforce.
●      Visual Workflow Designer: Utilize the intuitive visual interface to design and manage your workflows and to visualize your individual tasks and DAGs.
●      Event-Driven Automation: Trigger workflows based on events, such as API calls or messages on message queues like Kafka, AWS SQS, and more.
Use Cases:
●      Data Processing Pipelines: Trigger an Airflow DAG to run an ETL pipeline with Apache Spark and Hadoop Distributed File System. Workload Automation could initiate the pipeline, starting with data ingestion, and then Airflow can orchestrate the processing of that data through multiple stages in a Spark cluster, finally storing it in Amazon S3 or Google Cloud Storage.
●      Complex Application Orchestration Across Multiple Platforms: Schedule and orchestrate the deployment of microservices on Kubernetes, while also managing batch jobs on Azure Batch or AWS Batch through Airflow DAGs.
●      Automated File Transfers and Data Integration: Workload Automation to trigger Airflow DAGs for file transfer tasks with Axway SecureTransport or Sterling Connect, then continue with data integration jobs using Azure Data Factory or Talend Integration Cloud.
●      BI and Analytics Workflow Orchestration: Run ETL jobs using Oracle Data Integrator or SAP Data Services, and then trigger Airflow DAGs to push the cleaned data to Tableau or Google BigQuery for visualization and analytics.
●      Cloud Resource Management and Cost Optimization: Leverage AWS Lambda, AWS CloudFormation, and Azure Resource Manager to dynamically create and destroy cloud resources as needed for jobs. Workload Automation can initiate these workflows based on scheduled or event-driven triggers, while Airflow DAGs handle the actual resource management tasks.
●      Integration with Enterprise Applications and ITSM: Trigger Airflow DAGs to automate tasks in enterprise systems like SAP HANA, ServiceNow, or Salesforce. For example, Workload Automation can initiate a workflow to extract incident data from ServiceNow, process it in Airflow, and load it into an analytics tool for further analysis.
●      Event-Driven Automation: Trigger workflows based on events, such as changes in Cloud Storage or messages from Cloud Pub/Sub.
●      API Integration: Integrate with external APIs and services to automate tasks.
Example
A bank is striving to enhance customer engagement by generating actionable insights from customer transaction data. The bank has a data pipeline that involves collecting data from multiple sources, transforming it, and loading it into a data warehouse for reporting and analytics. To streamline the data pipeline and ensure reliable orchestration, the bank decides to use Apache Airflow for detailed task orchestration and Workload Automation to manage the overarching workflow.
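To make the scenario concrete, here is a minimal sketch of the kind of DAG the bank could hand over to Workload Automation to trigger. It is illustrative only: the DAG ID bank_transactions_etl and the task callables are hypothetical names, and the DAG deliberately has no schedule of its own because Workload Automation releases it.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transactions(**context):
    # Collect transaction data from the source systems; any Configuration JSON
    # passed by Workload Automation is available as context["dag_run"].conf.
    ...

def transform_transactions(**context):
    # Cleanse and enrich the extracted data.
    ...

def load_warehouse(**context):
    # Load the transformed data into the data warehouse.
    ...

with DAG(
    dag_id="bank_transactions_etl",   # hypothetical name
    start_date=datetime(2025, 1, 1),
    schedule=None,                    # no Airflow schedule: Workload Automation triggers the run
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_transactions)
    transform = PythonOperator(task_id="transform", python_callable=transform_transactions)
    load = PythonOperator(task_id="load", python_callable=load_warehouse)

    extract >> transform >> load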
Let's start with the job definition parameters section of our plugin.

Connect to Apache Airflow Application with Workload Automation:
Log in to the Dynamic Workload Console and open the Workload Designer. Choose to create a new job definition and select the “Apache Airflow” job type in the Other section.

General Tab:
Name: Provide any name for the job in the name field.
Workstation: Choose the workstation on which the job runs.

Connection:

Apache Airflow API: The URL must have the following format: http://airflow-ip:PORT, where airflow-ip is the reachable address of the Airflow application.

Port: The REST API port of the Airflow setup. For example, the complete connection URL would appear as follows: http://10.134.240.85:8080

Choose the Authentication type.

Basic authentication
Use this option if your Apache Airflow instance is configured to use basic authentication.
●      Username:
The user name associated with your Apache Airflow application. This is required when basic authentication is enabled.
●      Password:
The password associated with the specified username. Ensure this is kept secure and is mandatory for basic authentication.
Bearer Token (OAuth2)
Select this option if your Apache Airflow instance uses OAuth2 authentication for secure access.

Bearer Token (password)
The OAuth2 Bearer Token associated with your Apache Airflow application. This token is used for authorization and should be obtained via your identity provider or API client. Ensure it is valid and has the required permissions to execute Airflow DAGs.

Note: The Bearer Token must be refreshed periodically depending on your OAuth2 provider’s configuration. Ensure you have a process to obtain and update the token as needed.
Note: Before executing a job, enter the required fields: Username and Password for Basic Authentication, or the Bearer Token for OAuth2.

Test Connection
Click to verify that the connection to the Apache Airflow setup works correctly.
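For reference, the connection test corresponds conceptually to calling Airflow's stable REST API with the values entered in the Connection section. The sketch below only illustrates that equivalent call (the URL, user name, password, and token are placeholders), not the plugin's internal implementation.

import requests

base_url = "http://10.134.240.85:8080"  # Apache Airflow API + Port from the Connection section

# Basic authentication: user name and password defined in Airflow
response = requests.get(f"{base_url}/api/v1/dags", auth=("airflow_user", "airflow_password"))
print(response.status_code)  # 200 indicates the URL and credentials are valid

# Bearer Token (OAuth2): token obtained from your identity provider
headers = {"Authorization": "Bearer <your-oauth2-token>"}
response = requests.get(f"{base_url}/api/v1/dags", headers=headers)
print(response.status_code)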
Action:

Execute DAG: Use this section to specify the parameters required to execute an Airflow DAG.
DAG ID
The identifier for the DAG you wish to execute. This field is required.

Search Button: Enter the complete DAG name or at least three characters to search for matching DAGs. The search uses a "contains" filter, meaning it returns any DAG names that contain the characters entered. This helps you quickly locate and select the desired DAG.

Configuration JSON: Optional configuration data in JSON format. This allows you to pass parameters to the DAG at run-time.

Fail on Paused DAG: Check this option to immediately fail the execution of DAGs that are in a paused state.

Logging Level: Specify the logging level for the DAG execution. Options are: Never or Child tasks.

Extra Parameters: This section describes additional parameters used in DAG execution to provide context and control over job-specific configurations (a sketch of the equivalent REST call follows this list):
●      Dag_run_id : Unique identifier for the DAG run, in the format manual__YYYY-MM-DDThh:mm:ss.ssssss+00:00. This parameter tracks the execution instance.
●      Conf: JSON object that holds custom configuration parameters to pass to the DAG run. Use this to provide dynamic inputs.
●      Job_Number : Identifier similar to the dag_run_id, indicating the job's specific run instance.
●      Dag_name : Name of the DAG being executed, helpful for tracking and logging purposes.
●      Status: Current status of the DAG run, represented by an integer. It allows for monitoring and handling DAG execution states programmatically.
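For reference, the Execute DAG action maps onto Airflow's REST endpoint for creating DAG runs. The sketch below shows that equivalent call with placeholder values (URL, credentials, DAG ID, and Configuration JSON); the plugin performs it for you and surfaces the parameters above in the job log.

import requests

base_url = "http://10.134.240.85:8080"
dag_id = "bank_transactions_etl"  # the DAG ID selected in the Action section

payload = {
    # Optional Configuration JSON passed to the DAG at run time
    "conf": {"source": "core_banking", "load_date": "2025-01-10"},
}

response = requests.post(
    f"{base_url}/api/v1/dags/{dag_id}/dagRuns",
    json=payload,
    auth=("airflow_user", "airflow_password"),
)
run = response.json()
# Airflow assigns a run identifier such as manual__2025-01-10T10:15:30.123456+00:00,
# which corresponds to the dag_run_id extra parameter described above.
print(run.get("dag_run_id"), run.get("state"))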
Saving and submitting your job:
Submit your job into the current plan, or add it to the job stream that automates your business process flow. Select the action menu in the top-left corner of the job definition panel and click Submit Job into Current Plan. A confirmation message is displayed, and you can switch to the monitoring view to see what is going on.
Monitor Page:
Users can track the jobs on the Monitor page.
If the job completes successfully in the backend, the status changes to Successful.
Job Log Details:
Workflow Details Page:
In Conclusion:
The integration between Workload Automation and Airflow enables you to design flexible, multistage workflows that take advantage of Workload Automation’s advanced scheduling and orchestration capabilities along with Airflow's rich plug-in ecosystem. This setup allows for streamlined, automated processes across diverse applications and platforms, improving both efficiency and visibility for complex workflows.

Ernesto Carrabba, Product Manager, HCL Clara, HCL HERO and HCL Workload Automation 
 
Ernesto Carrabba is the Product Manager for HCL Clara, HCL HERO, and HCL Workload Automation. Ernesto is a dynamic product manager with experience in building and launching IoT products, combined with a master's degree in mechanical engineering and research on Augmented and Virtual Reality.

Juscelino Candido De Lima Junior
Juscelino has over 15 years in the IT industry. At IBM, he started as an IT Specialist in Workload Automation and has spent the last five years working as an infrastructure and application IT architect. His areas of expertise include multi-cloud architecture, containers, microservices, observability, virtualization, networks, distributed systems, systems administration, production control, and enterprise job scheduling. He is an IBM Master Inventor with more than 20 filed patents.

