If you want to avoid a potential business disruption in your Workload Automation environment, you should leverage the Master Domain Manager configuration. But what happens if the RDBMS connected to Workload Automation crashes? In this article, we describe how to manage the Master Domain Manager and DB2 HADR to allow business continuity during a disaster event.

Scenario

To avoid possible disasters in a Master Domain Manager production environment, you must configure your environment for high availability. In the above figure we can see the following components on three different platforms:
How to set up DB2 HADR for Workload Automation

This configuration is composed of two nodes (NODE 1 and NODE 2) on which the Master Domain Manager (MDM) is installed.
The DB2 HADR is composed of two nodes: a primary node (NODE 1) that is active, and a secondary node (NODE 2) that is in standby mode and synchronizes data with the primary node.

DB2 HADR configuration

To configure the Workload Automation database for HADR, we have to set up DB2 as described below on both nodes. In the following commands, "SIM" is the database name.

Setup NODE 1 - NODE 2 database properties:

This parameter specifies the hostname of the local database:
db2 update db cfg for SIM using HADR_LOCAL_HOST <NODE1|NODE2>
NODE 1: db2 update db cfg for SIM using HADR_LOCAL_HOST 10.10.23.01
NODE 2: db2 update db cfg for SIM using HADR_LOCAL_HOST 10.10.23.02

This parameter specifies the hostname of the remote database:
db2 update db cfg for SIM using HADR_REMOTE_HOST <NODE1|NODE2>
NODE 1: db2 update db cfg for SIM using HADR_REMOTE_HOST 10.10.23.02
NODE 2: db2 update db cfg for SIM using HADR_REMOTE_HOST 10.10.23.01

This parameter specifies the local DB2 service name:
db2 update db cfg for SIM using HADR_LOCAL_SVC <local service name>
NODE 1: db2 update db cfg for SIM using HADR_LOCAL_SVC 60070
NODE 2: db2 update db cfg for SIM using HADR_LOCAL_SVC 60070

This parameter specifies the remote DB2 service name:
db2 update db cfg for SIM using HADR_REMOTE_SVC <remote service name>
NODE 1: db2 update db cfg for SIM using HADR_REMOTE_SVC 60070
NODE 2: db2 update db cfg for SIM using HADR_REMOTE_SVC 60070

This parameter specifies the remote instance name:
db2 update db cfg for SIM using HADR_REMOTE_INST <remote instance name>
NODE 1: db2 update db cfg for SIM using HADR_REMOTE_INST db2inst1
NODE 2: db2 update db cfg for SIM using HADR_REMOTE_INST db2inst1

Set the logindexbuild database configuration parameter to ON to ensure that complete information is logged for index creation, re-creation, and reorganization:
db2 update db cfg for SIM using LOGINDEXBUILD ON
NODE 1: db2 update db cfg for SIM using LOGINDEXBUILD ON
NODE 2: db2 update db cfg for SIM using LOGINDEXBUILD ON

This parameter specifies the synchronization mode for the transaction logs. Choose the value based on factors such as the network speed between the nodes:
db2 update db cfg for SIM using HADR_SYNCMODE <sync mode>
NODE 1: db2 update db cfg for SIM using HADR_SYNCMODE NEARSYNC
NODE 2: db2 update db cfg for SIM using HADR_SYNCMODE NEARSYNC

This parameter specifies the media type of the primary destination for logs that are archived from the current log path:
db2 UPDATE DB CFG for SIM USING logarchmeth1 LOGRETAIN
NODE 1: db2 UPDATE DB CFG for SIM USING logarchmeth1 LOGRETAIN
NODE 2: db2 UPDATE DB CFG for SIM USING logarchmeth1 LOGRETAIN

This command specifies the alternate server name and port for the database:
db2 update alternate server for database SIM using hostname <other machine> port <db_port>
NODE 1: db2 update alternate server for database SIM using hostname AS-AHC-LNX002 port 25010
NODE 2: db2 update alternate server for database SIM using hostname AS-AHC-LNX001 port 50001

Backup of NODE 1 and restore to NODE 2

Make a backup of NODE 1 and restore it to NODE 2 to import the MDM definitions and tables.

On NODE 1 issue the following command:
db2 BACKUP DATABASE SIM ON ALL DBPARTITIONNUMS TO <folder>
This command creates a file like:
SIM.0.db2inst1.DBPART000.20240417052131.001
Copy this file to a folder on NODE 2.

On NODE 2 issue the following command:
db2 RESTORE DATABASE SIM FROM <folder>

Start HADR on both nodes

Now that HADR is configured, we have to start it in a fixed order: first the standby node and then the primary one.
On NODE 2 issue the following command:
db2 start hadr on db SIM as standby
On NODE 1 issue the following command:
db2 start hadr on db SIM as primary

How to configure WebSphere Liberty to manage DB2 HADR

After configuring DB2 for HADR, we have to configure the MDM datasource in Liberty to point to the HADR pair instead of a single DB node. This way, Liberty can reach the MDM database even though it does not know on which node the database is currently active.

To configure the MDM datasource properties, edit the file <TWA_HOME>/<DATADIR>/usr/servers/engineServer/configDropins/overrides/datasource.xml and add the client reroute parameters (the last four attributes below) to the properties section:

<properties.db2.jcc
    serverName="NODE1"
    portNumber="50001"
    databaseName="SIM"
    user="db2inst1"
    password="{xor}xxxxxxxxxxxxxxxxxx"
    clientRerouteAlternateServerName="NODE2"
    clientRerouteAlternatePortNumber="25010"
    retryIntervalForClientReroute="2"
    maxRetriesForClientReroute="3"/>

Now let's try the steps:
NODE 1:
db2 update db cfg for SIM using HADR_LOCAL_SVC 60070
db2 update db cfg for SIM using HADR_REMOTE_HOST 10.10.23.02
db2 update db cfg for SIM using HADR_REMOTE_SVC 60070
db2 update db cfg for SIM using HADR_REMOTE_INST db2inst1
db2 update db cfg for SIM using LOGINDEXBUILD ON
db2 UPDATE DB CFG FOR SIM USING HADR_SYNCMODE NEARSYNC
db2 UPDATE DB CFG for SIM USING logarchmeth1 LOGRETAIN
db2 update alternate server for database SIM using hostname AS-AHC-LNX002 port 25010
NODE 2:
db2 update db cfg for SIM using HADR_LOCAL_SVC 60070
db2 update db cfg for SIM using HADR_REMOTE_HOST 10.10.23.01
db2 update db cfg for SIM using HADR_REMOTE_SVC 60070
db2 update db cfg for SIM using HADR_REMOTE_INST db2inst1
db2 update db cfg for SIM using LOGINDEXBUILD ON
db2 UPDATE DB CFG FOR SIM USING HADR_SYNCMODE NEARSYNC
db2 UPDATE DB CFG for SIM USING logarchmeth1 LOGRETAIN
db2 update alternate server for database SIM using hostname AS-AHC-LNX001 port 50001
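Because the two nodes differ only in a handful of values, the per-node configuration commands can be generated by a small helper. What follows is a minimal sketch and not part of DB2 or Workload Automation: the function name hadr_config_commands and its argument order are assumptions made for illustration, and the sample values are the ones used in this article. The helper only prints the commands, so they can be reviewed before being run as the instance owner.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HADR configuration commands for one node.
# Arguments: database, local host, remote host, HADR service port,
#            remote instance, alternate server hostname, alternate server port.
hadr_config_commands() {
  local db="$1" local_host="$2" remote_host="$3" svc="$4"
  local remote_inst="$5" alt_host="$6" alt_port="$7"
  cat <<EOF
db2 update db cfg for ${db} using HADR_LOCAL_HOST ${local_host}
db2 update db cfg for ${db} using HADR_REMOTE_HOST ${remote_host}
db2 update db cfg for ${db} using HADR_LOCAL_SVC ${svc}
db2 update db cfg for ${db} using HADR_REMOTE_SVC ${svc}
db2 update db cfg for ${db} using HADR_REMOTE_INST ${remote_inst}
db2 update db cfg for ${db} using LOGINDEXBUILD ON
db2 update db cfg for ${db} using HADR_SYNCMODE NEARSYNC
db2 update db cfg for ${db} using logarchmeth1 LOGRETAIN
db2 update alternate server for database ${db} using hostname ${alt_host} port ${alt_port}
EOF
}

# Print the commands for NODE 1 with the values used in this article.
hadr_config_commands SIM 10.10.23.01 10.10.23.02 60070 db2inst1 AS-AHC-LNX002 25010
```

For NODE 2, swap the two IP addresses and pass AS-AHC-LNX001 and 50001 as the alternate server. Printing instead of executing keeps the sketch safe to try on a machine without DB2.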
Troubleshooting: How to check HADR health

To check the HADR status, issue the following command on both nodes:
db2pd -hadr -db SIM
where SIM is the database name.

[Figure: example output of the command on NODE 1, showing the status of HADR on the primary node]

The highlighted parameters are the ones that describe the health of HADR:
HADR_ROLE: must be PRIMARY on the primary node (NODE 1) and STANDBY on the secondary node (NODE 2)
HADR_STATE: must be PEER
HADR_CONNECT_STATUS: must be CONNECTED
LOG_TIME parameters: describe the latest transaction log on all nodes; the date and time must be synchronized and up to date

[Figure: example output of the command on NODE 2]
<variable name="db.clientRerouteAlternateServerName" value="AS-AHC-LNX002"/>
<variable name="db.clientRerouteAlternatePortNumber" value="25010"/>
retryIntervalForClientReroute="2"
maxRetriesForClientReroute="3"
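The HADR health criteria described in the troubleshooting section (role, state, and connection status) can also be checked from a script. The sketch below is only an illustration: the helper names hadr_field and hadr_is_healthy are hypothetical, and it assumes that db2pd prints one "NAME = VALUE" pair per line, which may vary between DB2 versions. It does not compare the LOG_TIME values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one field from db2pd -hadr output
# read on stdin. Assumes one "NAME = VALUE" pair per line.
hadr_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # drop leading indentation
      n = index(line, " = ")
      if (!found && n > 0 && substr(line, 1, n - 1) == key) {
        val = substr(line, n + 3)
        sub(/[ \t\r]+$/, "", val)       # drop trailing whitespace
        print val
        found = 1
      }
    }'
}

# Hypothetical helper: succeed only if the output on stdin shows the expected
# role, HADR_STATE = PEER and HADR_CONNECT_STATUS = CONNECTED.
hadr_is_healthy() {
  local expected_role="$1" output
  output="$(cat)"
  [ "$(printf '%s\n' "$output" | hadr_field HADR_ROLE)" = "$expected_role" ] &&
  [ "$(printf '%s\n' "$output" | hadr_field HADR_STATE)" = "PEER" ] &&
  [ "$(printf '%s\n' "$output" | hadr_field HADR_CONNECT_STATUS)" = "CONNECTED" ]
}

# Example usage on NODE 1 (requires DB2, so it is left as a comment):
#   db2pd -hadr -db SIM | hadr_is_healthy PRIMARY && echo "HADR is healthy"
```

On NODE 2 the same check would be called with STANDBY as the expected role.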
How to recover from disaster

To recover from a disaster scenario, for example if the primary node (NODE 1) crashes, we can leverage the multi-node environment to allow business continuity.

Takeover database on standby node

We have to "take over" the database on the secondary node.
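As a rough sketch of the takeover step, DB2 provides the TAKEOVER HADR command, which is issued on the standby node. The plain form switches roles while the primary is still reachable; the BY FORCE form is meant for the case where the primary is down. The helper name takeover_command below is hypothetical and only prints the command, and which form applies depends on the state of your primary node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the takeover command to run on the standby node.
# Arguments: database name, mode ("graceful" by default, or "force").
takeover_command() {
  local db="$1" mode="${2:-graceful}"
  case "$mode" in
    graceful) echo "db2 takeover hadr on db ${db}" ;;
    force)    echo "db2 takeover hadr on db ${db} by force" ;;
    *)        echo "unknown mode: ${mode}" >&2; return 1 ;;
  esac
}

# Print both variants for the SIM database used in this article.
takeover_command SIM
takeover_command SIM force
```

After a takeover, running db2pd -hadr -db SIM on both nodes shows whether the roles have changed.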
Now NODE 2 is the "PRIMARY" node and NODE 1 is the "STANDBY" node.
Now the MDM is using NODE 2. To verify this, I run "optman ls" and submit a simple job.
Simone Grammatico
Simone is an IT Specialist and QA Tester at HCL Software with 20 years of experience as a consultant to public administration clients and later, since 2012, as a QA business software tester. Today, he is in charge of customer support (L3), FixPacks release, and Security Test of Workload Automation.