Load Balancing is one of the many capabilities of Traffic Manager:
Load Balancing distributes network traffic across a ‘pool’ of servers (‘nodes’), selecting the most appropriate server for each individual request based on the current load balancing policy, session persistence considerations, node priorities and cache optimization hints. Under certain circumstances, if a request fails to elicit a response from a server, the request may be tried against multiple nodes until a successful response is received.
Load Balancing meets several primary goals:
Load Balancing also addresses performance optimization, supporting Traffic Manager's ability to deliver the best possible service level from your server infrastructure.
For each request, Traffic Manager will select a ‘pool’ of servers to handle that request. A pool represents a collection of servers (‘nodes’) that each perform the same function (such as hosting a web application). The pool specifies a load-balancing algorithm that determines which of the nodes in that pool should be selected to service that request.
In some cases, Traffic Manager will then make a new connection to the selected node and forward the request across that connection. In the case of HTTP, Traffic Manager maintains a collection of idle ‘keepalive’ connections to the nodes, and will use one of these established connections rather than creating a new connection. This reduces latency and the connection-handling overhead on each server node.
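The keepalive reuse described above can be illustrated with a minimal Python sketch. It is not Traffic Manager's implementation: the `Connection` and `KeepalivePool` classes and the node address are hypothetical names chosen for illustration, and no real network traffic is involved.

```python
from collections import defaultdict, deque


class Connection:
    """Stand-in for an established connection to a back-end node (hypothetical)."""

    def __init__(self, node):
        self.node = node


class KeepalivePool:
    """Holds idle connections per node so that they can be reused."""

    def __init__(self):
        self._idle = defaultdict(deque)
        self.created = 0  # number of new connections that had to be opened

    def acquire(self, node):
        # Prefer an established idle connection over opening a new one.
        if self._idle[node]:
            return self._idle[node].popleft()
        self.created += 1
        return Connection(node)

    def release(self, conn):
        # Once the response is complete, keep the connection for later reuse.
        self._idle[conn.node].append(conn)


pool = KeepalivePool()
first = pool.acquire("10.0.0.1:80")   # no idle connection yet, so one is created
pool.release(first)
second = pool.acquire("10.0.0.1:80")  # the idle connection is reused
```

In this sketch the second request pays no connection setup cost, which is the latency and overhead saving the paragraph above refers to.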
Traffic Manager offers multiple different load balancing algorithms:
A TrafficScript rule can override the load balancing decision, using either the ‘named node’ session persistence method to specify which node in the pool should be used, or by using the ‘forward proxy’ capability to completely ignore the list of nodes in the pool and explicitly specify the target node (IP address and port) for the request.
Other than the method chosen and the weights, a number of other factors influence the load-balancing decision:
To assist administrators who need to take a node out of service, Traffic Manager provides a ‘connection draining’ capability. If a node is marked as ‘draining’, Traffic Manager will not consider it during the load balancing decision and no new connections will be made to that node. Existing connections can run to completion, and established, idle HTTP connections will be shut down.
However, session persistence classes override load balancing decisions. If any sessions have been established to the draining node, then requests in those sessions will continue to use the node. There is no automatic way to determine when a client session has completed, but Traffic Manager provides a ‘most recently used’ report that indicates when a node was last used. For example, if you are prepared to time sessions out after 20 minutes, then you can safely remove the node from the pool once the ‘most recently used’ measure exceeds 20 minutes.
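The interaction between draining and session persistence, and the 20-minute rule of thumb, can be sketched in Python as follows. This is an illustration under stated assumptions, not Traffic Manager's internal logic: `choose_node`, `safe_to_remove`, the session table and the node names are all hypothetical, and the real load-balancing decision is replaced by a placeholder that picks the first candidate.

```python
from datetime import datetime, timedelta


def choose_node(nodes, draining, sessions, session_id):
    """Pick a node for a request, honoring persistence before draining."""
    # An established session stays on its node, even if that node is draining.
    if session_id in sessions:
        return sessions[session_id]
    # New sessions are never sent to a draining node.
    candidates = [n for n in nodes if n not in draining]
    if not candidates:
        return None
    node = candidates[0]  # placeholder for the real load-balancing decision
    sessions[session_id] = node
    return node


def safe_to_remove(last_used, session_timeout, now):
    """True once the node has been unused for longer than the session timeout."""
    return now - last_used > session_timeout


sessions = {"alice": "node-a"}
existing = choose_node(["node-a", "node-b"], {"node-a"}, sessions, "alice")
fresh = choose_node(["node-a", "node-b"], {"node-a"}, sessions, "bob")

now = datetime(2024, 1, 1, 12, 0)
timeout = timedelta(minutes=20)
used_10_min_ago = safe_to_remove(datetime(2024, 1, 1, 11, 50), timeout, now)
used_30_min_ago = safe_to_remove(datetime(2024, 1, 1, 11, 30), timeout, now)
```

Here the existing session stays on the draining node while the new session is placed elsewhere, and only the node that has been idle for 30 minutes is judged safe to remove.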
Administrators may also mark nodes as ‘disabled’. This has the same effect as ‘draining’, except that existing sessions are not honored and health-monitors are not invoked against ‘disabled’ nodes. Once a node is ‘disabled’, it can be safely shut down and reintroduced later.
Least Connections is generally the best load-balancing algorithm for homogeneous traffic, where every request puts the same load on the back-end server and where every back-end server has the same performance. The majority of HTTP services fall into this category. Even if some requests generate more load than others (for example, a database lookup compared to an image retrieval), the ‘least connections’ method will evenly distribute requests across the machines and, if there are sufficient requests of each type, the load will be very effectively shared. However, Least Connections is not appropriate when infrequent high-load requests cause significant slowdowns.
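As a rough illustration of the ‘least connections’ idea, the following Python sketch picks the node with the fewest active connections. It ignores weights, priorities and persistence, and the function and node names are hypothetical rather than anything taken from Traffic Manager.

```python
def least_connections(active):
    """Return the node with the fewest active connections.

    `active` maps node name to its current connection count. Ties are
    broken by dictionary order, which is an arbitrary choice for this sketch.
    """
    return min(active, key=active.get)


# Six requests arrive and none has completed yet.
active = {"web1": 0, "web2": 0, "web3": 0}
order = []
for _ in range(6):
    node = least_connections(active)
    active[node] += 1
    order.append(node)
```

With identical nodes and no completed requests, the six requests end up spread evenly, two per node.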
The Fastest Response Time algorithm will send requests to the server that is performing best (responding most quickly), but it is a reactive algorithm (it only notices slowdowns after the event) so it can often overload a fast server and create a choppy performance profile.
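The reactive behavior described above can be seen in a small Python sketch. It assumes a simple moving average of response times per node; the smoothing factor, function names and figures are illustrative assumptions, not the measurements Traffic Manager actually keeps.

```python
def fastest_response(avg_ms):
    """Return the node with the lowest recent average response time."""
    return min(avg_ms, key=avg_ms.get)


def record(avg_ms, node, sample_ms, alpha=0.5):
    """Fold a new response-time sample into the node's moving average."""
    avg_ms[node] = (1 - alpha) * avg_ms[node] + alpha * sample_ms


avg_ms = {"fast": 10.0, "slow": 30.0}

# Until a response comes back, every request is sent to the same node.
burst = [fastest_response(avg_ms) for _ in range(5)]

# The overloaded node finally answers slowly; only now does the choice change.
record(avg_ms, "fast", 90.0)
after = fastest_response(avg_ms)
```

All five requests in the burst go to the node that looked fastest, and the algorithm only switches away after the slow response has been observed, which is the source of the choppy profile.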
Perceptive is designed to take the best features of both ‘Least Connections’ and ‘Fastest Response’. It adapts according to the nature of the traffic and the performance of the servers; it will lean towards ‘least connections’ when traffic is homogeneous, and ‘fastest response time’ when the loads are very variable. It uses a combination of the number of current connections and recent response times to trend and predict the performance of each server.
Under this algorithm, traffic is introduced to a new server (or a server that has returned from a failed state) gently, and is progressively ramped up to full operability. When a new server is added to a pool, the algorithm tries it with a single request, and if it receives a reply, it gradually increases the number of requests it sends the new server until it is receiving the same proportion of the load as other equivalent nodes in the pool. This ramping is done in an adaptive way, dependent on the responsiveness of the server. So, for example, a new web server serving a small quantity of static content will very quickly be ramped up to full speed, whereas a Java application server that compiles JSPs the first time they are used (and so is slow to respond to begin with) will be ramped up more slowly.
Least Connections is simpler and more deterministic than ‘Perceptive’, so it should be used in preference when possible.
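The adaptive ramp-up can be pictured with the following Python sketch. The actual Perceptive formula is not given in this brief, so the rule below (grow the allowed request count by 4 after a fast reply and by 1 after a slow one) is purely a hypothetical stand-in, as are the 50 ms threshold and the target of 16 concurrent requests.

```python
def ramp_limit(current_limit, response_ms, target_limit, fast_ms=50.0):
    """Return the next request limit for a node that is being introduced.

    Hypothetical rule: grow quickly after a fast reply, slowly after a
    slow one, and never exceed the node's fair share (target_limit).
    """
    step = 4 if response_ms <= fast_ms else 1
    return min(target_limit, current_limit + step)


def steps_to_full(response_ms, target_limit=16):
    """Count how many replies are needed before the node carries a full share."""
    limit, steps = 1, 0  # a new node is first tried with a single request
    while limit < target_limit:
        limit = ramp_limit(limit, response_ms, target_limit)
        steps += 1
    return steps


static_server = steps_to_full(10.0)   # quick replies from static content
jsp_server = steps_to_full(500.0)     # slow replies while JSPs compile
```

Under these assumed numbers the static server reaches its full share after 4 replies and the slow application server after 15, mirroring the qualitative behavior described above.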
Traffic Manager monitors the response from each node when it forwards a request to it. Timeouts quickly detect failures of various types, and simple checks on the response body detect server failures.
Under certain, controlled circumstances, Traffic Manager will retry the request against another node in the pool. Traffic Manager will only retry requests that are judged to be ‘idempotent’ (based on guidelines in the HTTP specification; this includes requests that use the GET and HEAD methods), or requests that failed completely against the server (no request data was written before the failure was detected). This goes a long way towards avoiding undesired side effects, such as processing a financial transaction twice.
In rare cases, the guidelines may not apply. An administrator can easily indicate that all requests processed by a virtual server are non-idempotent (so should never be retried), or can selectively specify the status of each request to override the default decision.
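The retry decision described above can be summarized in a short Python sketch. It is an approximation for illustration only: the function name, the `override` parameter and the restriction of the idempotent set to GET and HEAD (the two methods named in the text) are assumptions, not Traffic Manager's actual rule set.

```python
# Only the methods named in the text; the HTTP specification lists others.
IDEMPOTENT_METHODS = {"GET", "HEAD"}


def may_retry(method, request_bytes_written, override=None):
    """Decide whether a failed request may be tried against another node.

    override: None to use the default decision, or True/False when an
    administrator or rule has explicitly marked the request.
    """
    if override is not None:
        return override
    # Nothing reached the server, so a retry cannot cause a duplicate action.
    if request_bytes_written == 0:
        return True
    return method.upper() in IDEMPOTENT_METHODS


get_after_send = may_retry("GET", 512)
post_after_send = may_retry("POST", 512)
post_never_sent = may_retry("POST", 0)
get_marked_unsafe = may_retry("GET", 512, override=False)
```

The explicit override is checked first so that a request marked as non-idempotent is never retried, whatever its method.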
Detecting and retrying when an application generates an error
Traffic Manager rules can also force requests to be retried. For example, a response rule might inspect a response, judge that it is not appropriate, and then instruct Traffic Manager to retry the request against a different node (see ‘Hiding Application Errors’).
Rules can also transparently prompt a client device to retry a request with a different URL. For example, a rule could detect 404 Not Found errors and prompt the client to request the parent URL, working up the URL hierarchy until the client receives a valid response or cannot proceed any further (i.e. past the root page at ‘/’); see ‘No more 404 Not Found...?’.
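The ‘work up the URL hierarchy’ idea can be sketched in Python as below. In practice this logic would live in a Traffic Manager rule; the sketch only shows how the sequence of parent URLs might be computed, and the function names and example path are hypothetical.

```python
def parent_path(path):
    """Return the parent of an absolute URL path, or None at the root."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute URL path")
    if path == "/":
        return None
    trimmed = path.rstrip("/")
    # Drop the last path segment and keep a trailing slash on the parent.
    return trimmed.rsplit("/", 1)[0] + "/"


def fallback_chain(path):
    """List the URLs a client would be sent to, in order, after repeated 404s."""
    chain = []
    current = parent_path(path)
    while current is not None:
        chain.append(current)
        current = parent_path(current)
    return chain


chain = fallback_chain("/products/widgets/blue.html")
```

For the example path the chain is `/products/widgets/`, then `/products/`, then `/`, after which there is nowhere further to go.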
Traffic Manager also provides a ‘Global Server Load Balancing’ capability that manages DNS lookups to load-balance users across multiple datacenters. This capability functions in a different fashion to the server load balancing described in this brief.
ADCs today provide much more granular control over all areas that affect application performance. The ability to deliver advanced layer 7 services and enhanced application performance with ADCs is based on the foundation of a basic load balancing technology.
Traffic Manager (vTM) is a full software and virtual ADC that has been designed as a full-proxy, layer 7 load balancer. Traffic Manager's load balancing fabric enables applications to be delivered from any combination of physical, virtual or cloud-based datacenters.