Tuning the Linux operating system for Traffic Manager

This document describes some operating system tunables you may wish to apply to a production Traffic Manager instance.  Note that the kernel tunables only apply to Traffic Manager software installed on a customer-provided Linux instance; they do not apply to the Traffic Manager Virtual Appliance or Cloud instances. Consider the tuning techniques in this document when:

  • Running Traffic Manager on a severely-constrained hardware platform, or where Traffic Manager should not seek to use all available resources;
  • Running in a performance-critical environment;
  • The Traffic Manager host appears to be overloaded (excessive CPU or memory usage);
  • Running with very specific traffic types, for example, large video downloads or heavy use of UDP;
  • Any time you see unexpected errors in the Traffic Manager event log or the operating system syslog that relate to resource starvation, dropped connections or performance problems.

For more information on performance tuning, start with the Tuning Pulse Virtual Traffic Manager article.

Basic Kernel and Operating System tuning


Most modern Linux distributions have sufficiently large defaults, and many tables are autosized and growable, so it is often not necessary to change these tunings.  The values below are recommended for typical deployments on a medium-to-large server (8 cores, 4 GB RAM).

Note: see the tech tip "How to apply kernel tunings on Linux".
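To make tunings of this kind persist across reboots, the equivalent sysctl keys can also be placed in /etc/sysctl.conf (or a file under /etc/sysctl.d/) and loaded with sysctl -p. A minimal sketch, using the illustrative values from this article (the filename is an assumption):

```
# /etc/sysctl.d/99-traffic-manager.conf -- illustrative values from this article
fs.file-max = 2097152
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_syncookies = 1
net.core.somaxconn = 1024
```

Apply as root with: sysctl -p /etc/sysctl.d/99-traffic-manager.conf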

File descriptors

# echo 2097152 > /proc/sys/fs/file-max


Set a minimum of one million file descriptors unless resources are seriously constrained.  See also the setting maxfds below.
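To gauge how close the system is to the limit, you can compare the kernel's allocated-descriptor count against the maximum; a read-only sketch (fields in /proc/sys/fs/file-nr are allocated, unused, maximum):

```shell
#!/bin/sh
# Compare allocated file handles against the system-wide limit.
# /proc/sys/fs/file-nr reports: <allocated> <unused> <maximum>
read -r allocated unused maximum < /proc/sys/fs/file-nr
echo "file handles: $allocated allocated of $maximum"
# Per-process limit for the current shell:
echo "per-process limit: $(ulimit -n)"
```

If the allocated count approaches the maximum, raise file-max as shown above.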

Ephemeral port range

# echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

Each TCP and UDP connection from Traffic Manager to a back-end server consumes an ephemeral port, and that port is retained for the ‘fin_timeout’ period once the connection is closed.  If back-end connections are frequently created and closed, it’s possible to exhaust the supply of ephemeral ports. Increase the port range to the maximum (as above) and reduce the fin_timeout to 30 seconds if necessary.
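One way to gauge ephemeral-port pressure is to count sockets currently held in TIME_WAIT; a read-only sketch using /proc/net/tcp, where state code 06 is TIME_WAIT:

```shell
#!/bin/sh
# Count IPv4 TCP sockets in TIME_WAIT (hex state code 06, field 4 of /proc/net/tcp).
# A count approaching the size of the ephemeral port range suggests exhaustion.
tw=$(awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l)
echo "sockets in TIME_WAIT: $tw"
```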

SYN Cookies

# echo 1 > /proc/sys/net/ipv4/tcp_syncookies

SYN cookies should be enabled on a production system.  The Linux kernel will process connections normally until the backlog fills, at which point it will use SYN cookies rather than storing local state.  SYN cookies are an effective protection against SYN floods, one of the most common DoS attacks against a server.

If you are seeking a stable test configuration as a basis for other tuning, you should disable SYN cookies. Increase the size of /proc/sys/net/ipv4/tcp_max_syn_backlog if you encounter dropped connection attempts.
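For example (as root; the value is an illustration, not a measured recommendation):

```
# echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog
```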

Request backlog

# echo 1024 > /proc/sys/net/core/somaxconn

The request backlog contains TCP connections that are established (the 3-way handshake is complete) but have not been accepted by the listening socket (on Traffic Manager).  See also the tunable parameter ‘listen_queue_size’.  Restart the Traffic Manager software after changing this value.

If the listen queue fills up because the Traffic Manager does not accept connections sufficiently quickly, the kernel will quietly ignore additional connection attempts.  Clients will then back off (they assume packet loss has occurred) before retrying the connection.
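You can check whether the listen queue is overflowing by inspecting the kernel's cumulative TcpExt counters; a read-only sketch against /proc/net/netstat, which holds header/value line pairs per protocol group:

```shell
#!/bin/sh
# Print cumulative listen-queue overflow/drop counters since boot.
# /proc/net/netstat pairs a header line with a value line for each group.
awk '
  $1 == "TcpExt:" && !have_hdr { split($0, hdr); have_hdr = 1; next }
  $1 == "TcpExt:" {
    for (i = 2; i <= NF; i++)
      if (hdr[i] == "ListenOverflows" || hdr[i] == "ListenDrops")
        print hdr[i] ": " $i
  }' /proc/net/netstat
```

Steadily increasing ListenOverflows/ListenDrops counts indicate the backlog is too small or Traffic Manager is accepting too slowly.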

Advanced kernel and operating system tuning

In general, it’s rarely necessary to further tune Linux kernel internals because the default values that are selected on a normal-to-high-memory system are sufficient for the vast majority of deployments, and most kernel tables will automatically resize if necessary.  Any problems will be reported in the kernel logs; dmesg is the quickest and most reliable way to check the logs on a live system.

Packet queues

In 10 GbE environments, you should consider increasing the size of the input queue:

# echo 5000 > /proc/sys/net/core/netdev_max_backlog


TCP connections reside in the TIME_WAIT state in the kernel once they are closed.  TIME_WAIT allows the server to time-out connections it has closed in a clean fashion.

If you see the error “TCP: time wait bucket table overflow”, consider increasing the size of the table used to store TIME_WAIT connections:

# echo 7200000 > /proc/sys/net/ipv4/tcp_max_tw_buckets

TCP slow start and window sizes

In earlier Linux kernels (pre-2.6.39), the initial TCP window size was very small.  The impact of a small initial window size is that peers communicating over a high-latency network will take a long time (several seconds or more) to scale the window to utilize the full bandwidth available – often the connection will complete (albeit slowly) before an efficient window size has been negotiated.

The 2.6.39 kernel increases the default initial window size from 2 to 10.  If necessary, you can tune it manually:

# ip route change default via <gateway address> dev eth0 proto static initcwnd 10

If a TCP connection stalls, even briefly, the kernel may reduce the TCP window size significantly in an attempt to respond to congestion.  Many commentators have suggested that this behavior is not necessary, and this “slow start” behavior should be disabled:

# echo 0 > /proc/sys/net/ipv4/tcp_slow_start_after_idle

TCP options for Spirent load generators

If you are using older Spirent test kit, you may need to set the following tunables to work around optimizations in their TCP stack:

# echo 0 > /proc/sys/net/ipv4/tcp_timestamps

# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

[Note: see the attachments for the above changes in an easy-to-run shell script]


Interrupts

Interrupts (IRQs) are wake-up calls to the CPU when new network traffic arrives. The CPU is interrupted and diverted to handle the new network data. Most NIC drivers will buffer interrupts and distribute them as efficiently as possible.  When running on a machine with multiple CPUs/cores, interrupts should be distributed across cores roughly evenly. Otherwise, one CPU can become the bottleneck under high network traffic.

The general-purpose approach in Linux is to deploy irqbalance, which is a standard package on most major Linux distributions.  Under extremely high interrupt load, you may see one or more ksoftirqd processes exhibiting high CPU usage.  In this case, you should configure your network driver to use multiple interrupt queues (if supported) and then manually map those queues to one or more CPUs using SMP affinity.
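To see how a NIC's queue interrupts are currently distributed, you can list each of its IRQs together with the CPU affinity bitmask; a read-only sketch (the interface name eth0 is an assumption, substitute your own):

```shell
#!/bin/sh
# List IRQs registered for a given NIC and their CPU affinity bitmasks.
# Replace "eth0" with the interface name on your system.
IFACE=eth0
grep "$IFACE" /proc/interrupts | while IFS=: read -r irq rest; do
  irq=$(echo "$irq" | tr -d '[:space:]')
  mask=$(cat "/proc/irq/$irq/smp_affinity" 2>/dev/null)
  printf 'IRQ %s -> affinity mask %s\n' "$irq" "$mask"
done
```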

Receive-Side Scaling (RSS)

Modern network cards can maintain multiple receive queues. Packets within a particular TCP connection can be pinned to a single receive queue, and each queue has its own interrupt.  You can map interrupts to CPU cores to control which core each packet is delivered to. This affinity delivers better performance by distributing traffic evenly across cores and by improving connection locality (each TCP connection is processed by a single core, giving better cache affinity).

For optimal performance, you should:

  • Allow the Traffic Manager software to auto-size itself to run one process per CPU core (two when using hyperthreading), i.e. do not modify the num_children configurable. 
  • Configure the network driver to create as many queues as you have cores, and verify the IRQs that the driver will raise per queue by checking /proc/interrupts.
  • Map each queue interrupt to one core using /proc/irq/<irq-number>/smp_affinity.
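As a concrete sketch (run as root; the IRQ numbers 41–44 are hypothetical, check /proc/interrupts for the real ones): each smp_affinity value is a hexadecimal CPU bitmask, so on a four-core host a one-queue-per-core mapping looks like:

```
# echo 1 > /proc/irq/41/smp_affinity   # queue 0 -> CPU 0 (mask 0x1)
# echo 2 > /proc/irq/42/smp_affinity   # queue 1 -> CPU 1 (mask 0x2)
# echo 4 > /proc/irq/43/smp_affinity   # queue 2 -> CPU 2 (mask 0x4)
# echo 8 > /proc/irq/44/smp_affinity   # queue 3 -> CPU 3 (mask 0x8)
```

Note that irqbalance, if running, may rewrite these masks; pin IRQs it should leave alone via its configuration.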

You should also refer to the technical documentation provided by your network card vendor.

[Updates by Aidan Clarke and Rick Henderson]


Version history: revision 2 of 2, last updated 03-25-2020 05:18 PM.