How to customize load balancing gracefully

SOLVED
sameh
Contributor

Hello,

I want to use STM to load balance a pool of caching servers.

Because they are HTTP caching servers and I want to get the most bang for my buck, I need LARD (locality-aware request distribution) to be used heavily when balancing requests.

But since the caching application also relies on the locality of other assets, LARD cannot be effective if it hashes the full URI (there is a signature in the middle of the URI).
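To illustrate the problem with hashing the full URI, here is a minimal Python sketch (not TrafficScript) that derives a locality key by dropping the signature segment before hashing. The URI layout `/<dir>/sig-<hex>/<file>` and the name `locality_key` are assumptions made for this example only; the thread does not say what the real signature looks like.

```python
import re

# ASSUMPTION: the signature is a path segment of the form "sig-<hex>".
# The real format is not given in the thread; adjust the pattern to match it.
SIGNATURE_SEGMENT = re.compile(r"/sig-[0-9a-f]+(?=/)")


def locality_key(uri: str) -> str:
    """Return the URI with the (hypothetical) signature segment removed.

    Two requests for the same asset with different signatures then map to
    the same key, so they can be hashed to the same cache server.
    """
    return SIGNATURE_SEGMENT.sub("", uri, count=1)


# Both requests refer to the same asset and yield the same key.
key_a = locality_key("/img/sig-ab12cd/cat.jpg")
key_b = locality_key("/img/sig-ffee00/cat.jpg")
```

The resulting key, rather than the raw URI, is what would be fed to the hash function.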

So, I need to implement the balancing by myself.

I don't want to use Session Persistence because I am wary of session state being shared across the STM cluster. (Should I? Maybe it is acceptable if I use connection.setPersistenceKey() with my own hash function?)

So I wrote a simple consistent hashing implementation in TrafficScript, and I use it to balance requests according to their computed hash.

This way I get the LARD effect, and redistribution only affects the keys previously hosted by a failed cache server.
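For readers unfamiliar with the technique, here is a minimal consistent hashing sketch in Python (not the poster's TrafficScript, which was not shared). The node names, the MD5-based hash and the 100 virtual points per node are all assumptions chosen for illustration. It shows the property described above: when a node is removed, only keys that lived on that node move.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, replicas=100):
    """Return a sorted list of (position, node) points, `replicas` per node."""
    return sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas))


def lookup(ring, key):
    """Return the node owning `key`: the first point clockwise from its hash."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_right(positions, _hash(key)) % len(ring)
    return ring[idx][1]


nodes = ["cache1:80", "cache2:80", "cache3:80"]  # hypothetical node names
keys = [f"/assets/{n}.jpg" for n in range(1000)]

full_ring = build_ring(nodes)
reduced_ring = build_ring(nodes[:2])  # cache3 has been removed

before = {k: lookup(full_ring, k) for k in keys}
after = {k: lookup(reduced_ring, k) for k in keys}

# Keys whose owner changed; each of them was previously on the removed node.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The virtual points spread each node around the ring so that the keys of a removed node are shared among the survivors rather than all landing on one neighbour.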

This works well until a cache server fails: I use pool.use()/pool.select() to send the request to a specific server. If that server is down, STM replies with an "HTTP/1.1 500 Internal Server Error/Service Unavailable" message.

This is fixed only when the CHash ring is recomputed without the failed server.

Is there a better way to handle this scenario? Basically I would like STM to either use the failure pool of the specified pool, or just fall back to the active nodes in the pool.

Even better, if there is a way to do a similar setup without the hassle of doing my own load balancing...

1 ACCEPTED SOLUTION
aclarke
Frequent Contributor

Re: How to customize load balancing gracefully

Sameh Ghane, are you able to post the TS you are using? It is always good to share this kind of thing so that others can borrow, learn and contribute to the community!

So pool.use() and pool.select() do not make use of the failure pool setting in the pool's configuration. If you are health checking your proxy servers, you can check for active healthy nodes before you select the pool to use.

Are they actually failing health checks when they fail? If they are being health checked properly, you should make sure there is at least one node available, using pool.activeNodes() against the pool, before you call pool.use() or pool.select(). (Remember that pool.use() stops processing immediately and the TS exits; if you need more TS to follow on, use pool.select().)


# Healthy Pool is "test_pool_103"
# Unhealthy Pool is "test_pool_106"

# This will result in an STM error being sent as "test_pool_106" is down:
pool.select("test_pool_106");

# So long as you are health checking your pool members,
# we can test to ensure the pool has available nodes before we use it:
if( pool.activeNodes( "test_pool_106" ) < 1 ) {
  log.warn( "Pool 106 is down, using the failure pool" );
  pool.select("test_failpool");
}

Additionally, with Stingray Traffic Manager 9.6, we introduced functions that let you list the nodes that are available, failed, draining or disabled:

  • pool.listallnodes()
  • pool.listdisablednodes()
  • pool.listdrainingnodes()
  • pool.listfailednodes()

With these functions, you can check if the particular node you want is up first, before you make a decision to use it with pool.use() or pool.select().

Does this help?

A.

--
Aidan Clarke
Pulse Secure vADC Product Manager

sameh
Contributor

Re: How to customize load balancing gracefully

Hello Aidan,

Thank you for your insights!

If there is no nicer way, I will rely on checking the status of a node. Ideally I wanted something safer, since this means requests to a failed node will keep failing until the node is marked as failed.


When I build the consistent hash ring, I make sure to only include the nodes returned by pool.listactivenodes(). I can either refresh the ring more often or only pick nodes from the ring that are currently considered active.
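The second option (keep the ring, but skip nodes that are not currently active) can be sketched as follows. This is a Python illustration under assumptions, not the poster's TrafficScript: the node names and hash function are made up, and the `active` set stands in for whatever pool.listactivenodes() reports at request time.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(nodes, replicas=100):
    """Return a sorted list of (position, node) points, `replicas` per node."""
    return sorted((_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas))


def lookup_active(ring, key, active):
    """Walk clockwise from the key's position and return the first active node.

    Returns None when no node on the ring is active, so the caller can
    fall back to a failure pool instead of selecting a dead node.
    """
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, _hash(key))
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node in active:
            return node
    return None


# Hypothetical nodes; the ring is built once and not rebuilt on failure.
ring = build_ring(["cache1:80", "cache2:80", "cache3:80"])
all_up = {"cache1:80", "cache2:80", "cache3:80"}
one_down = {"cache1:80", "cache2:80"}  # cache3 currently reported as failed

node = lookup_active(ring, "/assets/42.jpg", one_down)
```

Because the walk continues from the same ring position, keys owned by healthy nodes stay where they are, and the keys of the failed node return to it once it is reported active again. Requests can still fail in the window before the health monitor marks the node as failed.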

My code is shameful and I'm not even sure it works as it should, but I might share it if it does well.

Cheers