Configure an 'error_file' for each virtual server | ✓ | |
Drain nodes before removing them from the configuration | ✓ | |
Configure Administration Server certificate | ✓ | |
Firewall off internal ports | ✓ | |
Use different user names for different people | ✓ | |
Integrate with your existing authentication systems | ✓ | |
Take regular backups | ✓ | |
Configure the Event Handling to send notifications of problems | ✓ | |
Ensure that you are ready to cope with failures and traffic bursts | ✓ | |
Ensure that your software is up-to-date | ✓ | |
When a request can't be served by a pool, the traffic manager can respond in several ways. First, it tries the failpool; failing that, it uses the error_file setting from the virtual server. If you haven't configured an error file, a default "Service Unavailable" message is sent to the client. While this works, it does little for the image of your site, so it is recommended that you configure an error_file.
The error_file setting is configured on the VS > Edit > Connection Management page; see also the article "Sending custom error pages".
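The fallback order described above can be sketched as a small decision function. This is only an illustration of the ordering (failpool, then error file, then the default message); the function name, its parameters and the returned labels are hypothetical and are not part of the traffic manager.

```python
from typing import Optional

# Default text sent when neither a failpool nor an error file is available.
DEFAULT_MESSAGE = "Service Unavailable"


def choose_response(failpool_available: bool, error_file: Optional[str]) -> str:
    """Return a label describing how a request is answered when its pool fails.

    Hypothetical helper: it models only the order of the fallbacks.
    """
    if failpool_available:
        return "failpool"
    if error_file:
        return f"error_file:{error_file}"
    return f"default:{DEFAULT_MESSAGE}"


if __name__ == "__main__":
    print(choose_response(False, "sorry.html"))
    print(choose_response(False, None))
```

With no failpool and no error file, the last branch is the one clients see, which is the case the recommendation above is meant to avoid.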
When you are performing infrastructure maintenance which requires you to remove nodes from a pool, you should drain the node before removing it. This allows existing connections to complete, and if you are using session persistence it allows existing sessions to complete.
If you don't have session persistence, you may only have to wait a minute or so for existing connections to complete; with session persistence turned on, you may have to wait an hour or so for clients to finish using their sessions. In both cases, the Activity > Draining Nodes page shows whether there are any existing connections and when the node was last used.
By default the administration server is configured with a self-signed SSL certificate. This is vulnerable to man-in-the-middle attacks by an attacker who can intercept and modify the network traffic between the administrator and the admin server. If you anticipate accessing the admin server over an insecure network, you should replace the self-signed certificate with one signed by a known Certificate Authority; this could be an external authority, or an internal corporate authority. Alternatively, you could configure your browser to trust the self-signed certificate, and beware of situations where you are unexpectedly asked to confirm that the certificate is valid.
Stingray uses several ports for administration, discovery and intra-cluster communication. Although all of the traffic is encrypted or signed, it is advisable to firewall these ports off.
By default, the administration server is also generally accessible from all IP addresses, but you can restrict the IP addresses that are allowed to access it. For example, you could limit access to your 10.100.0.0/16 corporate network, ensuring that users outside your network cannot access the administration server.
The administration server security settings can be changed from the System > Security page.
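To make the 10.100.0.0/16 example concrete, the sketch below shows which client addresses such a restriction would admit. It uses Python's standard ipaddress module purely to illustrate the CIDR match; the function and constant names are hypothetical, and the actual restriction is applied in the administration server's security settings, not by code like this.

```python
import ipaddress

# Example network taken from the text; adjust to your own corporate range.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.100.0.0/16")]


def is_admin_access_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("10.100.3.7", "10.101.0.1", "192.0.2.10"):
        print(ip, is_admin_access_allowed(ip))
```

A /16 mask fixes the first two octets, so every address from 10.100.0.0 to 10.100.255.255 matches and anything else is refused.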
HTTPS: 9090 and 9070 are used for administration traffic (web, SOAP, REST)
HTTPS: 9080 is used for internal communications
Multicast and UDP: 9090 is used for discovery and cluster health checks
Refer to the System > Security tab in the user interface, and the 'Security' chapter in the Stingray Product Documentation.
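One way to check that the firewall rules are in place is to probe the TCP ports listed above from a machine outside the trusted network. The following is a minimal sketch under stated assumptions: the host name is a placeholder, the helper names are hypothetical, and only TCP is tested, so the UDP and multicast use of port 9090 is not covered.

```python
import socket

# TCP ports named in the text above. UDP/multicast 9090 is NOT probed here.
INTERNAL_TCP_PORTS = (9070, 9080, 9090)


def is_tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, timed out, or the name did not resolve.
        return False


def exposed_ports(host, ports=INTERNAL_TCP_PORTS, probe=is_tcp_port_open):
    """Return the subset of ports that the probe reports as reachable."""
    return [port for port in ports if probe(host, port)]


if __name__ == "__main__":
    # "tm.example.com" is a placeholder; replace it with your traffic manager.
    print("Reachable internal ports:", exposed_ports("tm.example.com"))
```

Run from outside the firewall, an empty list suggests the ports are blocked; run from an administrator's workstation, you would expect the administration ports to appear.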
While it is convenient to have a shared "admin" username for administering the traffic manager, it is not good practice. If an administrator leaves, you may have to change the password, impacting everyone who shares the user login. It also means that the audit log does not track the activities of individual admin users.
It is recommended that different people have different usernames. Additional users can be created on the System > Users > Local Users page.
Even better than specifying different local usernames for different people is to integrate the administration server with your existing authentication infrastructure. This allows people to use the same password, and reduces the chance that a system is forgotten about when an employee leaves your company.
You can delegate authentication to RADIUS, LDAP and TACACS+ systems. The authenticators are configured from the System > Users > Authenticators pages.
Once you have integrated, it is possible to remove all local users, with the exception that at least one user must remain in the "admin" group (this need not be the user named "admin").
The traffic manager configuration is a vital component in maintaining the operation of your site. You should ensure that backups are created regularly. You can take a backup through the administration server, or automatically using the CLI or SOAP functions.
You should also export backups and store them on another machine in case of catastrophic hardware failure.
Stingray Traffic Manager includes a customizable alerting infrastructure. You can use it to notify your system administrators of the problems that are relevant to them.
It is recommended that at the very least the "Default Events" event type be used to send an email to your administrators. This event type contains all the events that are emitted when a critical failure occurs, and when things recover. If this isn't good enough, it is easy to copy the event type and customize it to just contain the relevant events for you.
Alerting is configured from the System > Alerting page.
While the traffic manager's performance scales well with the CPU used, care should be taken to ensure your setup can cope with failures and traffic bursts (such as the Slashdot effect; see "Detecting and Managing Abusive Referers").
In particular, it is not good practice to be running an active-active cluster with both machines running at close to 100% CPU usage. If one of the machines fails, the other machine wouldn't be able to take over all the remaining traffic, and you would end up with dropped connections and an overloaded infrastructure.
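The headroom argument above comes down to simple arithmetic. The sketch below assumes identical machines and that the load of a failed machine is spread evenly over the survivors; both are simplifying assumptions, and the function names are hypothetical.

```python
def load_after_one_failure(per_node_utilization: float, nodes: int) -> float:
    """CPU utilization of each surviving node after one node fails.

    Assumes identical nodes and an even redistribution of the lost load.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a failure")
    return per_node_utilization * nodes / (nodes - 1)


def max_safe_utilization(nodes: int, ceiling: float = 1.0) -> float:
    """Highest steady-state utilization that keeps survivors at or under ceiling."""
    if nodes < 2:
        raise ValueError("need at least two nodes to survive a failure")
    return ceiling * (nodes - 1) / nodes


if __name__ == "__main__":
    # Two machines at 95% each: the survivor would need 190% of its capacity.
    print(load_after_one_failure(0.95, 2))
    # For a two-node active-active cluster, keep each node at or below 50%.
    print(max_safe_utilization(2))
```

Under these assumptions a two-node active-active cluster should run each machine at no more than half its capacity, and choosing a ceiling below 1.0 leaves extra room for traffic bursts.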
Traffic bursts are harder to handle, but one option is to use selective short-term caching to ensure that a sudden burst doesn't overwhelm your web server layer. An example of this is described in the article "Cache your website - just for one second?".
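The idea behind short-term caching can be shown with a generic sketch. This is not the traffic manager's own caching feature and not the method from the referenced article; it is a plain Python illustration with hypothetical names, in which a response is reused for one second so that a burst of identical requests reaches the back end only once per second.

```python
import time
from typing import Callable, Dict, Tuple


class ShortTermCache:
    """Cache responses per URL for a very short time-to-live (TTL)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # function that asks the back end
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: the back end is not touched
        body = self._fetch(url)      # stale or missing: fetch and remember
        self._entries[url] = (now, body)
        return body


if __name__ == "__main__":
    backend_hits = []

    def slow_backend(url: str) -> str:
        backend_hits.append(url)
        return f"page for {url}"

    cache = ShortTermCache(slow_backend, ttl_seconds=1.0)
    for _ in range(1000):
        cache.get("/index.html")
    print("back-end requests:", len(backend_hits))
```

Even a one-second lifetime caps the back-end load for a popular page at roughly one request per second, while visitors see content that is at most a second old.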
Last but by no means least, it is important to ensure that your software is up to date. Newer versions include security fixes and fixes to existing functionality, and so we recommend you use the latest version.
Notifications of released versions are sent to all supported customers and shared on the blog feed for the Stingray section of this site.