This document describes performance-related tuning you may wish to apply to a production Stingray Traffic Manager software instance, virtual appliance, or cloud instance. For related documents (e.g. operating system tuning), start with the Tuning Pulse Virtual Traffic Manager article.
Traffic Manager will auto-size the majority of internal tables based on available memory, CPU cores and operating system configuration. The default behavior is appropriate for typical deployments and it is rarely necessary to tune it.
Several changes can be made to the default configuration to improve peak capacity if necessary. Collectively, they may give a 5-20% capacity increase, depending on the specific test.
Global settings are defined in the ‘System’ part of the configuration.
Most Virtual Server settings relating to performance tuning are to be found in the Connection Management section of the configuration.
General Global Settings:
Maximum File Descriptors (maxfds): File Descriptors are the basic operating system resource that Traffic Manager consumes. Typically, Traffic Manager will require two file descriptors per active connection (client and server side) and one file descriptor for each idle keepalive connection and for each client connection that is pending or completing.
Traffic Manager will attempt to bypass any soft per-process limits (e.g. those defined by ulimit) and gain the maximum number of file descriptors (per child process). Doing this has no performance impact and only a minimal memory impact. You can raise the operating system's maximum number of file descriptors using the fs.file-max kernel parameter.
The default value of 1048576 should be sufficient. Traffic Manager will warn if it is running out of file descriptors, and will proactively close idle keepalives and slow down the rate at which new connections are accepted.
Listen queue size (listen_queue_size): this should be left at the default system value, and tuned using the somaxconn kernel parameter (see the operating system tuning documentation).
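For example, you can inspect the current descriptor and listen-queue limits from a shell; the raised values shown in the comments are illustrative, and the appropriate settings depend on your expected connection counts:

```shell
# System-wide file descriptor limit
cat /proc/sys/fs/file-max

# Per-process soft limit for the current shell
ulimit -n

# Kernel's maximum listen backlog
cat /proc/sys/net/core/somaxconn

# As root, the limits can be raised (values are illustrative):
# sysctl -w fs.file-max=1048576
# sysctl -w net.core.somaxconn=1024
# To persist across reboots, add the equivalent lines to /etc/sysctl.conf
```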
Number of child processes (num_children): this is auto-sized to the number of cores in the host system. You can force the number of child processes to a particular number (for example, when running Traffic Manager on a shared server) using the tunable ‘num_children’ which should be added manually to the global.cfg configuration file.
The default accept behavior is tuned so that child processes greedily accept connections as quickly as possible. With very large numbers of child processes, if you see uneven CPU usage, you may need to tune the multiple_accept, max_accepting and accepting_delay values in the Global Settings to limit the rate at which child processes take work.
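As an illustration, forcing the child process count might look like the following global.cfg fragment; the key name follows the tunable described above, but the exact file syntax and a suitable value are assumptions to verify against your product version's documentation:

```
num_children  8
```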
The Global Settings values so_rbuff_size and so_wbuff_size are used to tune the size of the operating system (kernel-space) read and write buffers, as restricted by the operating system limits /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max.
These buffer sizes determine how much network data the kernel will buffer before refusing additional data (from the client in the case of the read buffer, and from the application in the case of the write buffer). If these values are increased, kernel memory usage per socket will increase.
In normal operation, Traffic Manager will move data from the kernel buffers to its user-space buffers sufficiently quickly that the kernel buffers do not fill up. You may want to increase these buffer sizes when running under high connection load on a fast network.
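You can check the operating system ceilings that constrain so_rbuff_size and so_wbuff_size from a shell; the raised values in the comments are illustrative, and kernel memory usage per socket should be budgeted accordingly:

```shell
# Kernel's maximum socket read and write buffer sizes (bytes)
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/wmem_max

# As root, raise the ceilings before increasing so_rbuff_size / so_wbuff_size:
# sysctl -w net.core.rmem_max=4194304
# sysctl -w net.core.wmem_max=4194304
```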
The Virtual Server settings max_client_buffer and max_server_buffer define the size of the Traffic Manager (user-space) read and write buffers, used when Traffic Manager is streaming data between the client and the server. The buffers are temporary stores for the data read from the network buffers. Larger values will increase memory usage per connection, to the benefit of more efficient flow control; this will improve performance for clients or servers accessing over high-latency networks.
The value chunk_size controls how much data Traffic Manager reads and writes from the network buffers when processing traffic, and internal application buffers are allocated in units of chunk_size. To limit fragmentation and assist scalability, the default value is quite low (4096 bytes); if you have plenty of free memory, consider setting it to 8192 or 16384. Doing so will increase Traffic Manager's memory footprint but may reduce the number of system calls, slightly reducing CPU usage (system time).
You may wish to tune the buffer size parameters if you are handling very large file transfers or video downloads over congested networks, and the chunk_size parameter if you have large amounts of free memory that is not reserved for caching and other purposes.
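The memory trade-off of a larger chunk_size can be estimated with simple arithmetic; the figure of four in-flight chunk-sized buffers per connection below is an illustrative assumption, not a product constant:

```shell
# Back-of-envelope: extra user-space buffer memory if chunk_size is raised
# from 4096 to 16384 bytes with 10000 concurrent connections, assuming
# (illustratively) about 4 chunk-sized buffers in use per connection.
old=4096; new=16384; conns=10000; chunks=4
echo $(( (new - old) * chunks * conns / 1048576 ))MB   # prints "468MB"
```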
Cipher choice affects SSL performance: modern AES-GCM cipher suites such as TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 are generally faster in Traffic Manager than older ciphers, particularly on hardware with AES instruction support.
SSL uses a private/public key pair during the initial client handshake. 1024-bit keys are approximately 5 times faster than 2048-bit keys (due to the computational complexity of the key operation), but 1024-bit RSA is no longer considered secure by current standards; use 2048-bit or larger keys unless a reduced level of protection is explicitly acceptable.
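You can check whether your local OpenSSL build supports the cipher suite mentioned above, and benchmark RSA key sizes, with the openssl command-line tool (algorithm availability depends on your OpenSSL version; the speed run is left commented out because it takes several seconds):

```shell
# List the ECDHE/AES-GCM cipher suite, if supported by the local build
openssl ciphers -v 'ECDHE-ECDSA-AES128-GCM-SHA256'

# Compare RSA sign/verify throughput across key sizes:
# openssl speed rsa1024 rsa2048
```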
SSL sessions are cached locally, and shared between all traffic manager child processes using a fixed-size (allocated at start-up) cache. On a busy site, you should check the size, age and miss-rate of the SSL Session ID cache (using the Activity monitor) and increase the size of the cache (ssl!cache!size) if there is a significant number of cache misses.
Timeouts are the key tool for controlling client-initiated connections to the traffic manager:
If you suspect that connections are dropped prematurely due to timeouts, you can temporarily enable the Virtual Server setting log!client_connection_failures to record the details of dropped client connections.
When processing HTTP traffic, Traffic Manager uses a pool of Keep-Alive connections to reuse TCP connections and reduce the rate at which TCP connections must be established and torn down. If you use a webserver with a fixed concurrency limit (for example, Apache with its MaxClients and ServerLimit settings), then you should tune the connection limits carefully to avoid overloading the webserver and creating TCP connections that it cannot service.
Pool: max_connections_pernode: This setting limits the total number of TCP connections that this pool will make to each node; keepalive connections are included in that count. Traffic Manager will queue excess requests and schedule them to the next available server. The current count of established connections to a node is shared by all Traffic Manager processes.
Pool: max_idle_connections_pernode: When an HTTP request to a node completes, Traffic Manager will generally hold the TCP connection open and reuse it for a subsequent HTTP request (as a KeepAlive connection), avoiding the overhead of tearing down and setting up new TCP connections. In general, you should set this to the same value as max_connections_pernode, ensuring that neither setting exceeds the concurrency limit of the webserver.
Global Setting: max_idle_connections: Use this setting to fine-tune the total number of keepalive connections Traffic Manager will maintain to each node. The idle_connection_timeout setting controls how quickly keepalive connections are closed. You should only consider limiting the two max_idle_connections settings if you have a very large number of webservers that can sustain very high degrees of concurrency, and you find that the traffic manager routinely maintains too many idle keepalive connections as a result of very uneven traffic.
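As a sketch, a pool fronting an Apache backend configured with MaxClients 250 might cap its connections as follows; the key names follow the settings described above, but the exact configuration-file syntax and the values are assumptions to check against your deployment:

```
max_connections_pernode       250
max_idle_connections_pernode  250
```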
When running with very slow servers, or when connections to servers have high latency or packet loss, it may be necessary to increase the Pool timeouts (for example, the connect and reply timeouts):
When streaming data between server and client, the general-purpose Virtual Server ‘timeout’ setting will apply. If the client connection times out or is closed for any other reason, the server connection is immediately discarded.
If you suspect that connections are dropped prematurely due to timeouts, you can enable the Virtual Server setting log!server_connection_failures to record the details of dropped server connections.
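For reference, the pool timeouts typically raised in this situation are the connect and reply timeouts; the key names below match Traffic Manager's pool settings as I understand them, but both the names and the values (in seconds) are illustrative assumptions to verify against your product version's documentation:

```
max_connect_time  10
max_reply_time    60
```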
You should disable “Nagle’s Algorithm” for traffic to the backend servers, unless you are operating in an environment where the servers have been explicitly configured not to use delayed acknowledgements. Set the node_so_nagle setting to ‘off’ in the Pool Connection Management configuration.
If you notice significant delays when communicating with the back-end servers, Nagle’s Algorithm is a likely candidate.
Ensure that you disable or de-configure any Traffic Manager features that you do not need to use, such as health monitors, session persistence, TrafficScript rules, logging and activity monitors. Disable debug logging in service protection classes, autoscaling settings, health monitors, actions (used by the eventing system) and GLB services.
For more information, start with the Tuning Pulse Virtual Traffic Manager article.