Monday, January 26, 2015

"Start Of Day" in Tivoli Workload Scheduler

While writing the article about Scheduling FINAL, I realized the need for a separate article on what "Start Of Day" means and how it works.

Most customers still run TWS with Start Of Day set to 0600 (6 in the morning), or schedule FINAL 1 minute before the Start Of Day. This has not been required since TWS 8.3, but changing the setting in an existing production environment is not easy, and only since 8.6 has the default value for a fresh installation been changed to 0000.


Monday, January 19, 2015

Java DNS caching

I recently spent several hours investigating an issue caused by the default Java behavior when resolving hostnames, and I think sharing this configuration detail can help other people.

In my case I was seeing failures because some of our application servers were unable to connect to another server. At first the problem appeared random: some servers were working and others were not, even though they had the same configuration.
The failing servers received a "connection refused" error while connecting to the backend server, but contacting the same URL from the command line on the same machine worked.
Restarting the application server fixed the issue for that machine, but gave no clue about what had caused it.

Using the tcpdump command I was able to trace the actual IP address used for the connection attempts, confirming that the application server was contacting an IP address different from the current one (the one returned by the nslookup command). The remote server team confirmed that the other IP address belonged to a backup system where the service was down at that moment.
My failing server was contacting an old server that was currently inactive.



As we found out, the HA (High Availability) architecture for the remote server is based on DNS resolution, with the hostname resolved to the IP address of the currently active server.
The default behavior of Java is to cache DNS resolutions forever, with the result that our servers kept using the IP address cached inside Java even after the active server had changed and the DNS had been updated.

This technote documents how to tune the JVM and change this behavior.
In our case we changed the java.security file, setting networkaddress.cache.ttl=30, so that a successful lookup is cached for 30 seconds only.
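
For cases where editing java.security is not practical, the same security property can also be set from code. The following is only a minimal sketch, not what we did in our environment: the class name DnsCacheTtl and the helper applyTtl are hypothetical, and it assumes the property is set before the JVM performs its first hostname lookup, because the cache policy is read only once.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical helper class, for illustration only.
public class DnsCacheTtl {

    // Same property name used in the java.security file.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    // Sets the TTL (in seconds) for successful lookups and returns the
    // value read back from the security properties.
    // Assumption: this is called before any hostname has been resolved.
    static String applyTtl(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(TTL_PROPERTY + " = " + applyTtl(30));

        // Print the addresses the JVM resolves, to compare with nslookup.
        String host = args.length > 0 ? args[0] : "localhost";
        for (InetAddress address : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + address.getHostAddress());
        }
    }
}
```

Running it with the hostname of the remote server as argument prints the addresses seen by the JVM, which can then be compared with the output of nslookup.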

If the HA strategy requires updating the DNS, this Java behavior can affect several scenarios where a TWS server or TWS agent has to contact a remote server that uses this strategy, e.g. a remote LDAP server or an application scheduled via plugin.

If you like this article and you find it useful, please share it on social media so other people may take advantage of it.

Monday, January 12, 2015

Using HTTP Server - Part 1: Introduction

During the development of our SaaS infrastructure, we found it very useful to put IBM HTTP Server in front of our WAS servers, not only for load balancing on the TDWC cluster, but also for security, performance, and to modify some behaviors.

Setting up IBM HTTP Server is pretty simple and includes the following phases.
  • Define architecture and SSL certificates
  • Configure TDWC in cluster
  • Install both IBM HTTP Server and Web Server Plugin
  • Configure HTTP server
  • Configure web server plugin
I'll dedicate a specific article to each of the above phases.

On our SaaS, in addition to TDWC access, we also use the HTTP server for connections from dynamic agents and to handle a few redirects:
- to display a disclaimer at the beginning of each session
- to replace the logout page with a custom one.

HTTP server can also be used to set browser caching, reducing network traffic and TDWC server load.
The presence of HTTP server also improves TDWC scalability because it reduces the impact of network latency on the server. In this configuration TDWC can return the result to the HTTP server very quickly, and the HTTP server keeps a thread active to return the data to the browser. This reduces the number of active threads in the TDWC server.


Monday, January 5, 2015

Recover FINAL on Tivoli Workload Scheduler

As said in the Scheduling FINAL post, and as any TWS administrator knows, the extension of the plan is one of the most important processes to monitor in the product. If it fails, the plan is not extended and the new job stream instances are not available to run.
For this reason it's important that any TWS administrator is able to recover FINAL quickly, possibly without the need to open a PMR and wait for L2 or L3 support to be online to help with the recovery, at least in the most common situations.

Of course, if you are using IBM Workload Automation SaaS you don't have to worry about this: IBM manages and monitors the environment and is ready to recover it in case of failure.

In this post I'll explain the role of each job in the FINAL and FINALPOSTREPORTS job streams and how each of them can be recovered in case of failure.