What is Load Balancing?
Load Balancing is one of the many capabilities of Traffic Manager:
Load Balancing distributes network traffic across a ‘pool’ of servers (‘nodes’), selecting the most appropriate server for each individual request based on the current load balancing policy, session persistence considerations, node priorities and cache optimization hints. Under certain circumstances, if a request fails to elicit a response from a server, the request may be tried against multiple nodes until a successful response is received.
Load Balancing meets several primary goals:
Scalability: The capability to transparently increase or decrease the capacity of a service (by adding or removing nodes) without changing the public access point (IP address, domain name or URL) for the service;
Availability: The capability to route traffic to working nodes and avoid nodes that have failed or are under-performing, so that a site remains available and accessible even during the failure of one or more systems.
Manageability: By abstracting the server infrastructure from the end user, load balancing makes it easy to remove nodes for maintenance (software or hardware upgrades or scheduled reboots) without interrupting the user experience.
Load Balancing also addresses performance optimization, supporting Traffic Manager's ability to deliver the best possible service level from your server infrastructure.
How does load balancing work?
For each request, Traffic Manager will select a ‘pool’ of servers to handle that request. A pool represents a collection of servers (‘nodes’) that each performs the same function (such as hosting a web application). The pool specifies a load-balancing algorithm that determines which of the nodes in that pool should be selected to service that request.
In some cases, Traffic Manager will then make a new connection to the selected node and forward the request across that connection. In the case of HTTP, Traffic Manager maintains a collection of idle ‘keepalive’ connections to the nodes, and will use one of these established connections in preference to creating a new connection. This reduces latency and the connection-handling overhead on each server node.
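The keepalive behavior described above can be sketched as a per-node pool of idle connections. This is a minimal illustration, not Traffic Manager internals; the class and method names are assumptions for the example.

```python
from collections import defaultdict, deque

class KeepaliveStore:
    """Hypothetical per-node store of idle 'keepalive' connections."""

    def __init__(self):
        self._idle = defaultdict(deque)  # node -> queue of idle connections

    def acquire(self, node, connect):
        """Reuse an idle connection to `node` if one exists; otherwise
        open a new one via `connect`. Reuse avoids TCP setup latency."""
        idle = self._idle[node]
        if idle:
            return idle.popleft()
        return connect(node)

    def release(self, node, conn):
        """Return a still-healthy connection to the idle pool."""
        self._idle[node].append(conn)

store = KeepaliveStore()
conn = store.acquire("10.0.0.1:80", connect=lambda n: f"conn-to-{n}")
store.release("10.0.0.1:80", conn)
reused = store.acquire("10.0.0.1:80", connect=lambda n: f"new-{n}")
# `reused` is the connection released above, not a new one
```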
What Load Balancing methods are available?
Traffic Manager offers multiple different load balancing algorithms:
Round Robin and Weighted Round Robin: With these simple algorithms, the traffic manager cycles through the list of server nodes, picking the next one in turn for each request that it load-balances. If nodes are assigned specific weights, then they are selected more or less frequently in proportion to their weights.
Random: The traffic manager selects a node from the pool at random each time it performs a load-balancing decision.
Least Connections and Weighted Least Connections: The traffic manager maintains a count of the number of ongoing transactions against each node. On each load balancing decision, the traffic manager selects the node with the fewest ongoing connections. Weights may be applied to each node to indicate that the node is capable of handling more or fewer concurrent transactions than its peers.
Fastest Response Time: The traffic manager maintains a rolling average of the response time of each node. When it makes a load-balancing decision, it selects the node with the lowest average response time.
Perceptive: The Perceptive method addresses undesired behaviors of the Least Connections and Fastest Response Time algorithms, blending their information to predict the optimal node based on past performance and current load.
A TrafficScript rule can override the load balancing decision, using either the ‘named node’ session persistence method to specify which node in the pool should be used, or by using the ‘forward proxy’ capability to completely ignore the list of nodes in the pool and explicitly specify the target node (IP address and port) for the request.
What factors influence the load balancing decision?
Other than the method chosen and the weights, a number of other factors influence the load-balancing decision:
Health Monitoring: Traffic Manager monitors the health and correct operation of each node, using both synthetic transactions (built-in and user-defined) and passive monitoring of real transactions. If a node consistently fails to meet the health and operation parameters, Traffic Manager will temporarily remove it from future load-balancing decisions until health checks indicate that it is operating correctly again.
Session Persistence: Session Persistence policies override the load balancing decision and may be used easily to pin transactions within the same session to the same server node. This behavior is mandatory for stateful HTTP applications, and is useful for HTTP applications that share state but gain performance improvements if local state caches are used effectively.
Locality-aware Request Distribution (LARD): LARD is automatically used to influence the least-connections, fastest-response-time and perceptive load balancing decisions for HTTP traffic. If the metrics used for load-balancing decisions are finely-balanced (for example, several nodes have very similar response times or current connection counts), then Traffic Manager will also consider the specific URL being requested and will favor nodes that have served that URL recently. These nodes are more likely to have the requested content in memory or in cache, and are likely to be able to respond more quickly than nodes that have not serviced that request recently.
Past History: The perceptive algorithm builds a past history of node performance and uses this in its load balancing decision. If a new node is introduced into the cluster, or a failed node recovers, no history exists for that node. The Perceptive Algorithm performs a ‘gradual start’ of that node, slowly ramping up the amount of traffic to that node until its performance stabilizes. The ‘gradual restart’ avoids the problem that a node with unknown performance is immediately overloaded with more traffic than it can cope with, and the duration of the ramp up of the traffic adapts to how quickly and reliably the node responds.
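The ‘gradual start’ idea can be illustrated with a simple time-based ramp. This sketch is an assumption for illustration: the real algorithm adapts the ramp to the node's measured responsiveness, whereas here the effective weight simply scales linearly up to the configured weight over a fixed window.

```python
import time

class GradualStart:
    """Illustrative slow-start: scale a new or recovered node's
    effective weight from a trickle up to its full configured
    weight over `ramp_seconds`."""

    def __init__(self, full_weight, ramp_seconds=60.0):
        self.full_weight = full_weight
        self.ramp_seconds = ramp_seconds
        self.started = time.monotonic()

    def effective_weight(self, now=None):
        now = time.monotonic() if now is None else now
        fraction = min(1.0, (now - self.started) / self.ramp_seconds)
        # Always send at least a trickle so performance history accrues.
        return max(0.05, fraction) * self.full_weight

g = GradualStart(full_weight=10, ramp_seconds=60)
half = g.effective_weight(now=g.started + 30)   # half-way through the ramp
full = g.effective_weight(now=g.started + 120)  # fully ramped
```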
What is connection draining?
To assist administrators who need to take a node out of service, Traffic Manager provides a ‘connection draining’ capability. If a node is marked as ‘draining’, Traffic Manager will not consider it during the load balancing decision and no new connections will be made to that node. Existing connections can run to completion, and established, idle HTTP connections will be shut down.
However, session persistence classes override load balancing decisions. If any sessions have been established to the draining node, then requests in those sessions will continue to use that node. There is no automatic way to determine when a client session has completed, but Traffic Manager provides a ‘most recently used’ report that indicates when a node was last used. For example, if you are prepared to time sessions out after 20 minutes, then you can safely remove the node from the pool once the ‘most recently used’ measure exceeds 20 minutes.
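The 20-minute rule of thumb above reduces to a simple check against the node's ‘most recently used’ timestamp. A minimal sketch, assuming timestamps in seconds and the 20-minute session timeout from the example:

```python
import time

def safe_to_remove(last_used_ts, session_timeout=20 * 60, now=None):
    """A draining node can be safely removed once the time since it
    last served a request exceeds the session timeout."""
    now = time.time() if now is None else now
    return (now - last_used_ts) > session_timeout

# Node last served a request 25 minutes ago -> safe to remove
safe_to_remove(last_used_ts=0, now=25 * 60)   # True
# Node served a request 10 minutes ago -> sessions may still be live
safe_to_remove(last_used_ts=0, now=10 * 60)   # False
```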
Administrators may also mark nodes as ‘disabled’. This has the same effect as ‘draining’, except that existing sessions are not honored and health-monitors are not invoked against ‘disabled’ nodes. Once a node is ‘disabled’, it can be safely shut down and reintroduced later.
What Load Balancing method is best?
Least Connections is generally the best load-balancing algorithm for homogeneous traffic, where every request puts the same load on the back-end server and where every back-end server offers the same performance. The majority of HTTP services fall into this situation. Even if some requests generate more load than others (for example, a database lookup compared to an image retrieval), the ‘least connections’ method will evenly distribute requests across the machines, and if there are sufficient requests of each type, the load will be very effectively shared. However, Least Connections is not appropriate when infrequent high-load requests cause significant slowdowns.
The Fastest Response Time algorithm will send requests to the server that is performing best (responding most quickly), but it is a reactive algorithm (it only notices slowdowns after the event) so it can often overload a fast server and create a choppy performance profile.
Perceptive is designed to take the best features of both ‘Least Connections’ and ‘Fastest Response’. It adapts according to the nature of the traffic and the performance of the servers; it will lean towards 'least connections' when traffic is homogeneous, and 'fastest response time' when the loads are very variable. It uses a combination of the number of current connections and recent response times to trend and predict the performance of each server. Under this algorithm, traffic is introduced to a new server (or a server that has returned from a failed state) gently, and is progressively ramped up to full operability. When a new server is added to a pool, the algorithm tries it with a single request, and if it receives a reply, it gradually increases the number of requests it sends the new server until it is receiving the same proportion of the load as other equivalent nodes in the pool. This ramping is done in an adaptive way, dependent on the responsiveness of the server. So, for example, a new web server serving a small quantity of static content will very quickly be ramped up to full speed, whereas a Java application server that compiles JSPs the first time they are used (and so is slow to respond to begin with) will be ramped up more slowly.
Least Connections is simpler and more deterministic than ‘Perceptive’, so should be used in preference when possible.
When are requests retried?
Traffic Manager monitors the response from each node when it forwards a request to it. Timeouts quickly detect failures of various types, and simple checks on the response body detect server failures.
Under certain, controlled circumstances, Traffic Manager will retry the request against another node in the pool. Traffic Manager will only retry requests that are judged to be ‘idempotent’ (based on guidelines in the HTTP specification – this includes requests that use GET and HEAD methods), or requests that failed completely against the server (no request data was written before the failure was detected). This goes a long way to avoiding undesired side effects, such as processing a financial transaction twice.
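The retry decision described above can be sketched as a small predicate. Note that the text names GET and HEAD; the HTTP specification also classifies PUT, DELETE, OPTIONS and TRACE as idempotent, and those are included here for completeness.

```python
# Idempotent methods per the HTTP specification (RFC 9110 §9.2.2)
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def may_retry(method, request_bytes_written):
    """A failed request may be retried against another node if it is
    idempotent, or if it failed before any request data was written
    (so the server cannot have acted on it)."""
    return method.upper() in IDEMPOTENT or request_bytes_written == 0

may_retry("GET", 512)    # True: idempotent, safe to replay
may_retry("POST", 0)     # True: nothing was sent to the server
may_retry("POST", 512)   # False: could process a transaction twice
```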
In rare cases, the guidelines may not apply. An administrator can easily indicate that all requests processed by a virtual server are non-idempotent (so should never be retried), or can selectively specify the status of each request to override the default decision.
Detecting and retrying when an application generates an error
Traffic Manager rules can also force requests to be retried. For example, a response rule might inspect a response, judge that it is not appropriate, and then instruct the traffic manager to re-try the request against a different node (see the article ‘Hiding Application Errors’).
Rules can also transparently prompt a client device to retry a request with a different URL. For example, a rule could detect 404 Not Found errors and prompt the client to try requesting the parent URL, working up the URL hierarchy until the client receives a valid response or cannot proceed any further, i.e. past the root page at ‘/’ (see the article ‘No more 404 Not Found...?’).
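The parent-URL walk described above can be sketched in plain Python (the function name is illustrative; a real deployment would express this as a TrafficScript response rule):

```python
from posixpath import dirname

def fallback_urls(path):
    """Walk up the URL hierarchy from `path` toward the root,
    returning each parent URL to try after a 404, stopping at '/'."""
    urls = []
    while path not in ("", "/"):
        # Drop any trailing slash, then take the parent directory.
        path = dirname(path.rstrip("/")) or "/"
        urls.append(path)
    return urls

fallback_urls("/docs/guide/install/")
# -> ['/docs/guide', '/docs', '/']
```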
Global Load Balancing
Traffic Manager also provides a ‘Global Server Load Balancing’ capability that manages DNS lookups to load-balance users across multiple datacenters. This capability functions in a different fashion to the server load balancing described in this brief.
ADCs today provide much more granular control over all areas that affect application performance. The ability to deliver advanced layer 7 services and enhanced application performance with ADCs rests on the foundation of basic load balancing technology.
Traffic Manager (vTM) is a software and virtual ADC designed as a full-proxy, layer 7 load balancer. Traffic Manager's load balancing fabric enables applications to be delivered from any combination of physical, virtual or cloud-based datacenters.