This is a complex topic: it touches on the many techniques Stingray uses to accelerate services, offload work so that they run more efficiently, and rate-limit excessive transactions to maintain acceptable levels of performance.
Yes. Stingray handles slow WAN-side connections very efficiently, and terminates them completely.
A separate TCP connection is established with the server, with TCP options chosen to optimize for the local link between Stingray and the server, which (generally) has minimal packet loss, latency and jitter.
This places the server in an optimal, benchmark-like environment.
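The idea of tuning the server-side connection for the local link can be sketched as follows. This is an illustrative sketch only (the specific options and values are assumptions, not what the traffic manager actually sets internally):

```python
import socket

def tuned_server_socket():
    """Create a server-side TCP socket with options tuned for a fast
    local link. Illustrative choices only; the real traffic manager
    selects its own options internally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle's algorithm: on a low-latency LAN, delaying small
    # writes to coalesce them costs latency and saves almost nothing.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Generous send buffer so the kernel can keep the fast local pipe full.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1 << 20)
    return sock
```

The point is that these options are chosen for the Stingray-to-server hop alone, independently of whatever the slow WAN-side client connection looks like.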
Yes. For HTTP, Stingray carefully manages a pool of connections to each node in a pool. When a request completes, provided the server does not close the connection, we keep the connection open in an idle state. For subsequent requests, Stingray prefers to reuse an idle connection rather than create a new one.
Stingray holds at most max_idle_connections_pernode connections to each node in the idle state (so that we don't tie up too many resources, such as threads or processes, on the server), up to a limit of max_idle_connections in total (so that we don't use too many resources on the traffic manager). Furthermore, we will only ever open max_connections_pernode connections to a node simultaneously (default: no limit), so that if the server has a concurrency limit (e.g. Apache's MaxClients in mpm_common) we won't overload it.
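The idle-reuse behaviour with the two idle limits can be sketched like this. The class and parameter names are hypothetical stand-ins for the max_idle_connections_pernode and max_idle_connections settings, not a real API:

```python
from collections import defaultdict, deque

class KeepalivePool:
    """Sketch of idle-connection reuse with a per-node idle cap and a
    global idle cap (cf. max_idle_connections_pernode and
    max_idle_connections). Hypothetical names, not the real internals."""

    def __init__(self, max_idle_per_node=8, max_idle_total=32):
        self.max_idle_per_node = max_idle_per_node
        self.max_idle_total = max_idle_total
        self.idle = defaultdict(deque)   # node -> idle connections
        self.idle_total = 0

    def acquire(self, node, connect):
        """Prefer an idle connection to `node`; open a new one otherwise."""
        if self.idle[node]:
            self.idle_total -= 1
            return self.idle[node].popleft()
        return connect(node)

    def release(self, node, conn, close):
        """Return a finished connection to the idle set, or close it if
        either idle limit would be exceeded."""
        if (len(self.idle[node]) < self.max_idle_per_node
                and self.idle_total < self.max_idle_total):
            self.idle[node].append(conn)
            self.idle_total += 1
        else:
            close(conn)
```

A connection released back to the pool is handed straight to the next request for the same node, so the server sees a steady stream of requests on a few warm connections rather than a new TCP handshake per request.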
If the incoming request rate cannot be serviced within the max_connections_pernode limit, requests are queued internally in the traffic manager and released when a concurrency slot becomes available.
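The queue-until-a-slot-frees behaviour is essentially a counting semaphore per node. A minimal sketch, assuming a hypothetical per-node gate (the real traffic manager is event-driven, not thread-per-request):

```python
import threading

class NodeConcurrencyGate:
    """Sketch of a per-node concurrency cap (cf. max_connections_pernode):
    callers beyond the limit block, i.e. queue, until a slot frees up."""

    def __init__(self, max_connections):
        self.slots = threading.BoundedSemaphore(max_connections)

    def run(self, request_fn):
        with self.slots:          # queues here when all slots are in use
            return request_fn()
```

However many requests arrive, the node never sees more than max_connections concurrent requests; the excess waits inside the traffic manager instead of overloading the server.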
Stingray performs full request and response buffering, up to the memory limits defined by max_client_buffer and max_server_buffer. These limits are overridden if you read a request or response in a TrafficScript rule.
The implication is that if the client is slow, then we:
The connection to the node only lasts for steps 5-9, i.e. it is very brief. This lets the nodes process connections as quickly as possible, offloading the slow TCP connection on the WAN side; this is one aspect of the acceleration we deliver (putting the node in an optimal environment so that you can get benchmark performance from it).
If we do not read the entire response, and it exceeds max_server_buffer, then we read as much as we can, write it out to the client, and refill the buffer as fast as possible.
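That drain-and-refill loop can be sketched as below. The two callables stand in for server-side and client-side socket I/O; the function name and parameter are illustrative, only max_server_buffer corresponds to a real setting:

```python
def relay_response(read_from_server, write_to_client,
                   max_server_buffer=64 * 1024):
    """Sketch of streaming a response larger than the server-side buffer:
    read up to the buffer limit, flush to the (possibly slow) client,
    then refill. `read_from_server(n)` returns up to n bytes, or b"" at
    end of response; `write_to_client` consumes a chunk."""
    while True:
        chunk = read_from_server(max_server_buffer)   # fill the buffer
        if not chunk:
            break                                     # response complete
        write_to_client(chunk)                        # drain to the client
```

The server's end of the transfer finishes as soon as its side of the buffer has been drained, even if the client takes much longer to consume the bytes.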
Finally, don't forget the potential to use caching on the Load Balancer / Traffic Manager to reduce the number of transactions the servers must handle.
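The benefit of caching at the traffic manager is that a cache hit never reaches the server at all. A minimal sketch, with a hypothetical TTL-based cache (the real cache is configured, not scripted like this):

```python
import time

def cached_fetch(cache, key, fetch, ttl=30.0):
    """Sketch of response caching on the traffic manager: serve a stored
    copy while it is fresh, so the origin server never sees the request.
    `cache` maps key -> (value, stored_at); `fetch` hits the server."""
    entry = cache.get(key)
    now = time.monotonic()
    if entry is not None and now - entry[1] < ttl:
        return entry[0]            # cache hit: no server transaction
    value = fetch()                # cache miss: one server transaction
    cache[key] = (value, now)
    return value
```

With even a short TTL, repeated requests for popular content collapse into a single back-end transaction per TTL window.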