Tuning the Linux operating system for Stingray Traffic Manager

This document describes some operating system tunables you may wish to apply to a production Stingray Traffic Manager instance.  Note that the kernel tunables only apply to Stingray Traffic Manager software installed on a customer-provided Linux instance; they do not apply to the Stingray Traffic Manager Virtual Appliance or Cloud instances.

Consider the tuning techniques in this document when:

  • Running Stingray on a severely-constrained hardware platform, or where Stingray should not seek to use all available resources;
  • Running in a performance-critical environment;
  • The Stingray host appears to be overloaded (excessive CPU or memory usage);
  • Running with very specific traffic types, for example, large video downloads or heavy use of UDP;
  • You see unexpected errors in the Stingray event log or the operating system syslog that relate to resource starvation, dropped connections, or performance problems.

For more information on performance tuning, start with the Tuning Stingray Traffic Manager article.

Basic Kernel and Operating System tuning

Most modern Linux distributions have sufficiently large defaults and many tables are autosized and growable, so it is often not necessary to change these tunables.  The values below are recommended for typical deployments on a medium-to-large server (8 cores, 4 GB RAM).

Note: Tech tip: How to apply kernel tunings on Linux
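
For reference, the same changes can also be applied (and persisted across reboots) with sysctl.  The commands below are a minimal sketch using the fs.file-max value recommended in the next section; /etc/sysctl.conf is the conventional location, although some distributions prefer drop-in files under /etc/sysctl.d/:

# sysctl -w fs.file-max=2097152

# echo "fs.file-max = 2097152" >> /etc/sysctl.conf

# sysctl -p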

File descriptors

# echo 2097152 > /proc/sys/fs/file-max

Set a minimum of one million file descriptors unless resources are seriously constrained.  See also the Stingray setting maxfds below.
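
To see how close the system is to the current limit, check /proc/sys/fs/file-nr, which reports the number of allocated file handles, the number of allocated-but-unused handles, and the maximum:

# cat /proc/sys/fs/file-nr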

Ephemeral port range

# echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Each TCP and UDP connection from Stingray to a back-end server consumes an ephemeral port, and that port is retained for the ‘fin_timeout’ period once the connection is closed.  If back-end connections are frequently created and closed, it’s possible to exhaust the supply of ephemeral ports. Increase the port range to the maximum (as above) and reduce the fin_timeout to 30 seconds if necessary.
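
As a rough way to gauge how many ephemeral ports are tied up at any moment, you can count sockets in the TIME_WAIT state (the output of ss includes a header line, so subtract one from the total):

# ss -tn state time-wait | wc -l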

SYN Cookies

# echo 1 > /proc/sys/net/ipv4/tcp_syncookies

SYN cookies should be enabled on a production system.  The Linux kernel will process connections normally until the backlog grows, at which point it will use SYN cookies rather than storing local state.  SYN cookies are an effective protection against SYN floods, one of the most common DoS attacks against a server.

If you are seeking a stable test configuration as a basis for other tuning, you should disable SYN cookies. Increase the size of net/ipv4/tcp_max_syn_backlog if you encounter dropped connection attempts.
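
For example, the backlog could be raised as follows (the value of 4096 is only illustrative; choose a size appropriate to your connection rate):

# echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog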

Request backlog

# echo 1024 > /proc/sys/net/core/somaxconn

The request backlog contains TCP connections that are established (the 3-way handshake is complete) but have not been accepted by the listening socket (Stingray).  See also the Stingray tunable ‘listen_queue_size’.  Restart the Stingray software after changing this value.

If the listen queue fills up because Stingray does not accept connections quickly enough, the kernel will quietly ignore additional connection attempts.  Clients will then back off (they assume packet loss has occurred) before retrying the connection.
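
The kernel keeps counters for these silently dropped attempts, which makes it easy to confirm whether the listen queue is actually overflowing (the exact wording varies between kernel versions, but look for lines mentioning the listen queue or SYNs to LISTEN sockets dropped):

# netstat -s | grep -i listen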

Advanced kernel and operating system tuning

In general, it’s rarely necessary to further tune Linux kernel internals because the default values that are selected on a normal-to-high-memory system are sufficient for the vast majority of Stingray deployments, and most kernel tables will automatically resize if necessary.  Any problems will be reported in the kernel logs; dmesg is the quickest and most reliable way to check the logs on a live system.
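
For example, a quick scan of the kernel log for resource-related messages might look like the following; the search pattern is only a suggestion and the exact messages depend on the kernel version:

# dmesg | grep -Ei 'drop|overflow|out of memory'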

Packet queues

In 10 GbE environments, you should consider increasing the size of the input queue:

# echo 5000 > /proc/sys/net/core/netdev_max_backlog
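
If packets are being dropped because the input queue is too small, the drop counter in /proc/net/softnet_stat (the second column; one row per CPU, values in hexadecimal) will be increasing:

# cat /proc/net/softnet_stat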

TCP TIME_WAIT tuning

TCP connections are held in the TIME_WAIT state in the kernel for a period after they are closed.  TIME_WAIT allows the server to cleanly time out connections it has closed.

If you see the error “TCP: time wait bucket table overflow”, consider increasing the size of the table used to store TIME_WAIT connections:

# echo 7200000 > /proc/sys/net/ipv4/tcp_max_tw_buckets

TCP slow start and window sizes

In earlier Linux kernels (pre-2.6.39), the initial TCP window size was very small.  The impact of a small initial window size is that peers communicating over a high-latency network will take a long time (several seconds or more) to scale the window to utilize the full bandwidth available – often the connection will complete (albeit slowly) before an efficient window size has been negotiated.

The 2.6.39 kernel increases the default initial window size from 2 to 10.  If necessary, you can tune it manually:


# ip route change default via 192.168.1.1 dev eth0 proto static initcwnd 10
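
You can confirm that the change has taken effect by listing the routing table; the initcwnd value should appear at the end of the modified route entry:

# ip route show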

If a TCP connection stalls, even briefly, the kernel may reduce the TCP window size significantly in an attempt to respond to congestion.  Many commentators have suggested that this behavior is unnecessary, and that “slow start” after idle should be disabled:

# echo 0 > /proc/sys/net/ipv4/tcp_slow_start_after_idle

TCP options for Spirent load generators

If you are using older Spirent test kit, you may need to set the following tunables to work around optimizations in their TCP stack:

# echo 0 > /proc/sys/net/ipv4/tcp_timestamps

# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

Note: see the attachments for a shell script that applies the above changes.

irqbalance

Interrupts (IRQs) are wake-up calls to the CPU when new network traffic arrives. The CPU is interrupted and diverted to handle the new network data. Most NIC drivers will buffer interrupts and distribute them as efficiently as possible.  When running on a machine with multiple CPUs/cores, interrupts should be distributed across cores roughly evenly. Otherwise, a single CPU can become a bottleneck under high network traffic.

The general-purpose approach in Linux is to deploy irqbalance, which is a standard package on most major Linux distributions.  Under extremely high interrupt load, you may see one or more ksoftirqd processes exhibiting high CPU usage.  In this case, you should configure your network driver to use multiple interrupt queues (if supported) and then manually map those queues to one or more CPUs using SMP affinity.
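
To see how interrupts are currently distributed, inspect /proc/interrupts: each row is an interrupt source and each column is a CPU core, so a single heavily loaded column suggests that one core is servicing most of the network interrupts (the interface name eth0 below is just an example):

# grep eth0 /proc/interrupts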

Receive-Side Scaling (RSS)

Modern network cards can maintain multiple receive queues. Packets within a particular TCP connection can be pinned to a single receive queue, and each queue has its own interrupt.  You can map interrupts to CPU cores to control which core each packet is delivered to. Mapping interrupts in this way delivers better performance by distributing traffic evenly across cores and by improving connection locality (each TCP connection is processed on a single core).

For optimal performance, you should:

  • Allow the Stingray software to auto-size itself to run one process per CPU core (two when using hyperthreading), i.e. do not modify the num_children configurable. 
  • Configure the network driver to create as many queues as you have cores, and verify the IRQs that the driver will raise per queue by checking /proc/interrupts.
  • Map each queue interrupt to one core using /proc/irq/<irq-number>/smp_affinity (see the example after this list).
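
As an illustration only (the IRQ numbers 24 and 25 below are hypothetical; take the real values from /proc/interrupts on your own system), smp_affinity takes a hexadecimal CPU bitmask, so pinning one receive queue's interrupt to core 0 and another to core 1 might look like this:

# echo 1 > /proc/irq/24/smp_affinity

# echo 2 > /proc/irq/25/smp_affinity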

The precise steps are specific to the network card and drivers you have selected.

The scaling.txt document (Documentation/networking/scaling.txt) in the Linux Kernel Source Tree gives a good overview, and you should refer to the technical documentation provided by your network card vendor.

