pool.activenodes() is reliable?

SOLVED
fahimbfarook
Occasional Contributor

pool.activenodes() is reliable?

Hi, I'm using pool.activeNodes() in my TrafficScript rule for handling maintenance pages. Here's my code:


sub handleMaintenancePage( $pool, $page ) {

   if( pool.activeNodes( $pool ) < 1 ) {

      log.warn( "No active nodes for pool " . $pool );

      $contents = resource.get( $page );

      http.sendResponse( "200 OK", "text/html", $contents,
                         "Pragma: no-cache\r\nCache-Control: no-cache\r\nExpires: 0" );
   }
}


When I brought down a node, pool.activeNodes() behaved correctly and decremented the node count for the pool. But when the same node was brought back up, pool.activeNodes() did not increment its count. I have also set up HTTP monitors, which work correctly and identify that the node is active again. I'd appreciate your help in understanding pool.activeNodes() behaviour.

Thanks.


5 REPLIES
aclarke
Frequent Contributor

Re: pool.activenodes() is reliable?

Fahim,

I did a simple test with STM 9.6r1 as follows:


$pool_activenodes = pool.activeNodes( "pool_web" );
log.info( "There are: " . $pool_activenodes . " available to service this request" );


I then tested, failed a node, retested, returned the node to service, and tested again.

As you can see from the log entries below (newest first), pool.activeNodes() is working as it should:


INFO     19/Aug/2014:16:32:06 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request  [stm001]
INFO     19/Aug/2014:16:32:06 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request  [stm001]
INFO     19/Aug/2014:16:32:01 +1000  Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 is working again  [stm001]
INFO     19/Aug/2014:16:32:00 +1000  Monitor Simple HTTP: Monitor is working for node '1.2.3.104:80'.  [stm001]
SERIOUS  19/Aug/2014:16:31:47 +1000  Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 has failed - A monitor has detected a failure  [stm001]
WARN     19/Aug/2014:16:31:46 +1000  Monitor Simple HTTP: Monitor has detected a failure in node '1.2.3.104:80': Timeout while waiting for valid server response  [stm001]
INFO     19/Aug/2014:16:31:45 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:45 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:45 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:45 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:39 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:39 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
INFO     19/Aug/2014:16:31:39 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 3 available to service this request  [stm001]
SERIOUS  19/Aug/2014:16:31:39 +1000  Pool pool_web, Node 1.2.3.104:80: Node 1.2.3.104 has failed - Timeout while establishing connection (the machine may be down, or the network congested; increasing the 'max_connect_time' on the pool's 'Connection Management' page may help)  [stm001]
INFO     19/Aug/2014:16:31:35 +1000  Rule pool_activenodes, Virtual Server vs_web: There are: 4 available to service this request  [stm001]

Can I get you to modify your TrafficScript rule as follows (I added the log.info line below) and see what the output is when you test:


sub handleMaintenancePage( $pool, $page ) {

   log.info( "There are: " . pool.activeNodes( $pool ) . " nodes available in the " . $pool . " pool." ); # <-- Add this line here.

   if( pool.activeNodes( $pool ) < 1 ) {

      log.warn( "No active nodes for pool " . $pool );

      $contents = resource.get( $page );

      http.sendResponse( "200 OK", "text/html", $contents,
                         "Pragma: no-cache\r\nCache-Control: no-cache\r\nExpires: 0" );
   }
}
--
Aidan Clarke
Pulse Secure vADC Product Manager
fahimbfarook
Occasional Contributor

Re: pool.activenodes() is reliable?

Thanks Aidan.

I'm using ZXTM v9.4, with a Full HTTP monitor and passive monitoring turned on. In my testing I found that when a node is marked as 'failed' by PASSIVE monitoring, it is not marked as active by an active monitor (i.e. the Full HTTP monitor) once the same node is up again.

Here are my test results.

  1. Removed the Full HTTP (AM) monitor from the pool; passive monitoring is ON.
  2. Stopped the web server (the only node in the pool).
  3. Sent a request to my app.
  4. Passive monitor marked the node as failed:

               (Pool has no back-end nodes responding)
               (There are: 0 nodes available in the pool)

  5. Maintenance page shown.
  6. Started the web server node.
  7. Sent a request to my app.
  8. Maintenance page shown - i.e. because I have: if( pool.activeNodes( $pool ) < 1 ) { show maintenance page }

               There are: 0 nodes available in the pool
               No active nodes for pool

  9. Added the Full HTTP (AM) monitor to the pool.
  10. Full HTTP monitor detected that the node is working:

               Pool now has working nodes

  11. Sent a request to my app.
  12. Maintenance page still shown:

               There are: 0 nodes available in the pool
               No active nodes for pool

  13. Disabled passive monitoring for the pool.
  14. Sent a request to my app - and the home page is shown:

               Node XYZ: Node XYZ is working again

So in summary, when a node is marked as 'failed' by PM and then marked as 'active' by AM, pool.activeNodes() still considers it failed. I'm not sure whether it's an issue with this particular version of ZXTM. Aidan, could you please help here?

Thanks.

markbod
Contributor

Re: pool.activenodes() is reliable?

Hi Fahim,


Are you sure SteelApp shows the node has recovered? I suspect that it is the passive monitor which marked the node as down, and so it is the passive monitor which must recover it. The problem is that all traffic is being answered by the TrafficScript rule, so unless a request makes it back to the server the node will never recover.

I would recommend trying one of the following approaches (in no particular order):

  • Allow requests from the local network to bypass the rule, then use event.emit() to launch an asynchronous script which makes a request to the index page of the site.
  • Instead of responding in the request rule use "request.setMaxReplyTime(1)" to set a 1 second reply time on the pool. Then allow the request to the backend. The short reply time will timeout the node quickly and you can then either use a response rule to deliver your maintenance page or set an error file on the virtual server.
  • Modify your rule to allow a percentage of the requests through to the back end.
  • Use a small 1x1 blank gif on the maintenance page and exclude that from the maintenance page rule. Optionally set a short timeout on this request.
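As a rough illustration of the third option (letting a small percentage of requests through so a monitor can observe the back end again), here is a minimal sketch in Python rather than TrafficScript. The names `PASSTHROUGH_RATIO`, `should_probe_backend` and `handle_request` are all hypothetical, not SteelApp APIs:

```python
import random

# Hypothetical sketch: even while the maintenance page is being served,
# a small sample of requests is forwarded to the pool so that a passive
# monitor gets a chance to see the node working again.
PASSTHROUGH_RATIO = 0.05  # roughly 5% of requests probe the back end

def should_probe_backend() -> bool:
    """Return True for the small sample of requests allowed through."""
    return random.random() < PASSTHROUGH_RATIO

def handle_request(active_nodes: int) -> str:
    """Decide whether to serve the maintenance page or forward to the pool."""
    if active_nodes < 1 and not should_probe_backend():
        return "maintenance_page"
    return "forward_to_pool"
```

The sampling rate is a trade-off: too low and recovery is slow, too high and many users see errors instead of the maintenance page while the pool really is down.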

If we went for the last option of using a blank.gif to bypass the rule, it might look like this:


sub handleMaintenancePage( $pool, $page ) {

   if( pool.activeNodes( $pool ) < 1 ) {

      log.warn( "No active nodes for pool " . $pool );

      if( http.getPath() == "/blank.gif" ) {

         request.setMaxReplyTime( 1 );
         break;
      }

      $contents = resource.get( $page );

      http.sendResponse( "200 OK", "text/html", $contents,
                         "Pragma: no-cache\r\nCache-Control: no-cache\r\nExpires: 0" );
   }
}


If you are always delivering the same maintenance page for this VirtualServer, then you could set the resource as an error page under "Connection Management" for the Virtual Server, and then it will be used automatically. No TrafficScript required.

Cheers,

Mark

fahimbfarook
Occasional Contributor

Re: pool.activenodes() is reliable?

Thanks Mark. Unfortunately I didn't find

"I suspect that it is the passive monitor which marked the node as down, and so it is the passive monitor which must recover it."

in the manuals or the TrafficScript guide. Is that true of the latest versions of TM, or is it only the case for 9.4?

markbod
Contributor

Re: pool.activenodes() is reliable? (Accepted Solution)

Hi Fahim,

This has always been the case for monitors in SteelApp, up to and including the current release (9.7).

With SteelApp you can add multiple monitors to a pool, and for that reason it must be the same monitor which fails/recovers a node. For example, you wouldn't want a ping monitor recovering a web server after it had been failed by a Full HTTP monitor. We do cover this in the SteelApp training course material, but unfortunately (it seems) not in the user manual.
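As a minimal sketch of that rule (in Python, purely illustrative: `NodeHealth` and its methods are hypothetical names, not SteelApp internals), a node tracks which monitors have failed it, and only becomes active again once every failing monitor has reported success:

```python
class NodeHealth:
    """Illustrative model: a node failed by one monitor can only be
    recovered by that SAME monitor."""

    def __init__(self):
        self.failed_by = set()  # monitors currently reporting failure

    def report_failure(self, monitor: str):
        self.failed_by.add(monitor)

    def report_success(self, monitor: str):
        # A success from a *different* monitor does not clear another
        # monitor's failure. This is why a Full HTTP monitor cannot
        # recover a node that was failed by passive monitoring.
        self.failed_by.discard(monitor)

    @property
    def active(self) -> bool:
        return not self.failed_by

node = NodeHealth()
node.report_failure("passive")
node.report_success("full_http")  # different monitor: node stays failed
print(node.active)                # False
node.report_success("passive")    # same monitor recovers it
print(node.active)                # True
```

In Fahim's test the "passive" success never arrives, because the maintenance-page rule answers every request before it reaches the back end.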

I have raised a documentation bug to have the monitoring section expanded to include information about the node recovery process. I can only apologise for the inconvenience this has caused you.

Cheers,

Mark