Hi, I'm using pool.activeNodes() in my TrafficScript rule for handling maintenance pages. Here's my code:
sub handleMaintenancePage( $pool, $page ) {
   if( pool.activeNodes( $pool ) < 1 ) {
      log.warn( "No active nodes for pool ". $pool );
      $contents = resource.get( $page );
      http.sendResponse( "200 OK", "text/html", $contents, "Pragma: no-cache\r\nCache-Control:no-cache\r\nExpires:0" );
   }
}
When I brought a node down, pool.activeNodes() behaved correctly and decremented the active node count for the pool. But when the same node was brought back up, pool.activeNodes() did not increment its count. I have also set up HTTP monitors, which work correctly and identify that the node above is active again. I'd appreciate your help in understanding the behaviour of pool.activeNodes().
Thanks.
Fahim,
I did a simple test with STM 9.6r1 as follows:
$pool_activenodes = pool.activeNodes("pool_web");
log.info("There are: " . $pool_activenodes . " available to service this request");
I then tested, failed a node, retested, returned the node to service, and tested again.
As you can see from the log output, pool.activeNodes() is working as it should:
INFO | 19/Aug/2014:16:32:06 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request | stm001 |
INFO | 19/Aug/2014:16:32:06 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request | stm001 |
INFO | 19/Aug/2014:16:32:01 +1000 | INFO | Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 is working again | stm001 |
INFO | 19/Aug/2014:16:32:00 +1000 | INFO | Monitor Simple HTTP: Monitor is working for node '1.2.3.104:80'. | stm001 |
SERIOUS | 19/Aug/2014:16:31:47 +1000 | SERIOUS | Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 has failed - A monitor has detected a failure | stm001 |
WARN | 19/Aug/2014:16:31:46 +1000 | WARN | Monitor Simple HTTP: Monitor has detected a failure in node '1.2.3.104:80': Timeout while waiting for valid server response | stm001 |
INFO | 19/Aug/2014:16:31:45 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:45 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:45 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:45 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:39 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:39 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
INFO | 19/Aug/2014:16:31:39 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request | stm001 |
SERIOUS | 19/Aug/2014:16:31:39 +1000 | SERIOUS | Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 has failed - Timeout while establishing connection (the machine may be down, or the network congested; increasing the 'max_connect_time' on the pool's 'Connection Management' page may help) | stm001 |
INFO | 19/Aug/2014:16:31:35 +1000 | INFO | Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request | stm001 |
Can I get you to modify your TS as follows (I added line #2 below) and see what the output is when you test:
sub handleMaintenancePage( $pool, $page ) {
   log.info( "There are: ". pool.activeNodes( $pool ) ." nodes available in the ". $pool ." pool." ); # <-- Add this line here.
   if( pool.activeNodes( $pool ) < 1 ) {
      log.warn( "No active nodes for pool ". $pool );
      $contents = resource.get( $page );
      http.sendResponse( "200 OK", "text/html", $contents, "Pragma: no-cache\r\nCache-Control:no-cache\r\nExpires:0" );
   }
}
Thanks Aidan.
I'm using ZXTM v9.4, and I have a Full HTTP monitor and passive monitoring turned on. I did some testing and found that when a node is marked as 'failed' by passive monitoring, it cannot be marked as active by an active monitor (i.e. the Full HTTP monitor) once the same node comes back up.
Here's my test results.
(Pool has no back-end nodes responding)
(There are: 0 nodes available in the pool)
There are: 0 nodes available in the pool
No active nodes for pool
Pool now has working nodes
There are: 0 nodes available in the pool
No active nodes for pool
Node XYZ: Node XYZ is working again
So, in summary, when a node is marked as 'failed' by passive monitoring and then marked as 'active' by an active monitor, pool.activeNodes() still considers it failed. I'm not sure whether this is an issue with this particular version of ZXTM. Aidan, could you please help here?
Thanks.
Hi Fahim,
Are you sure SteelApp shows the node has recovered? I suspect that it is the passive monitor which marked the node as down, and so it is the passive monitor which must recover it. The problem is that all traffic is being responded to by the TrafficScript rule and unless a request makes it back to the server it will never recover.
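One quick way to confirm this is to temporarily log the requested path whenever the rule serves the maintenance page. A rough sketch of your sub along these lines (untested, adjust to match your rule) would show that every request, including anything the passive monitor would need to see the node answer, is being handled by TrafficScript and never reaches the pool:
sub handleMaintenancePage( $pool, $page ) {
   if( pool.activeNodes( $pool ) < 1 ) {
      # Everything logged here is answered directly by this rule,
      # so the request never reaches a back-end node.
      log.info( "Serving maintenance page for ". http.getPath() .
                "; pool ". $pool ." reports ". pool.activeNodes( $pool ) ." active nodes" );
      $contents = resource.get( $page );
      http.sendResponse( "200 OK", "text/html", $contents, "Pragma: no-cache\r\nCache-Control:no-cache\r\nExpires:0" );
   }
}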
I would recommend trying one of the following approaches (in no particular order):
If we went for the last option of using a blank.gif to bypass the rule, it might look like this:
sub handleMaintenancePage( $pool, $page ) {
   if( pool.activeNodes( $pool ) < 1 ) {
      log.warn( "No active nodes for pool ". $pool );

      # Allow requests for /blank.gif to bypass the rule and reach the
      # pool, with a short reply timeout, so the passive monitor can
      # see a node respond and mark it as working again.
      if( http.getPath() == "/blank.gif" ) {
         request.setMaxReplyTime( 1 );
         break;
      }

      $contents = resource.get( $page );
      http.sendResponse( "200 OK", "text/html", $contents, "Pragma: no-cache\r\nCache-Control:no-cache\r\nExpires:0" );
   }
}
If you are always delivering the same maintenance page for this Virtual Server, then you could set the resource as an error page under "Connection Management" for the Virtual Server, and it will be used automatically. No TrafficScript required.
Cheers,
Mark
Thanks Mark. Unfortunately I didn't find
I suspect that it is the passive monitor which marked the node as down, and so it is the passive monitor which must recover it.
in the manuals or the TrafficScript guide. Is that true with the latest versions of the Traffic Manager, or is it only for 9.4?
Hi Fahim,
This has always been the case for monitors in SteelApp up to and including the current release (9.7).
With SteelApp you can add multiple monitors to a pool, and for that reason it must be the same monitor that fails and recovers a node. For example, you wouldn't want a Ping monitor recovering a web server after it had been failed by a Full HTTP monitor. We do cover this in the SteelApp training course material, but unfortunately (it seems) not in the user manual.
I have raised a documentation bug to have the monitoring section expanded to include information about the node recovery process. I can only apologise for the inconvenience this has caused you.
Cheers,
Mark