Do you ever face any of these requirements?

“I want to best-effort guarantee certain levels of service for certain users.”
“I want to prioritize some transactions over others.”
“I want to restrict the activities of certain users.”

This article explains the questions you must consider in order to address these problems, and then describes some of the measures you can take to monitor performance more deeply and apply prioritization to nominated traffic.
Whether you are running an eCommerce web site, online corporate services or an internal intranet, there’s always the need to squeeze more performance from limited resources and to ensure that your most valuable users get the best possible levels of service from the services you are hosting.
Imagine that you are running a successful gaming service in a glamorous location. The usage of your service is growing daily, and many of your long-term users are becoming very valuable.
Unfortunately, much of your bandwidth and server hits are taken up by competitors’ robots that screen-scrape your betting statistics, and poorly-written bots that spam your gaming tables and occasionally place low-value bets. At certain times of the day, this activity is so great that it impacts the quality of the service you deliver, and your most valuable customers are affected.
Using Traffic Manager to measure, classify and prioritize traffic, you can construct a service policy that comes into effect when your web site begins to run slowly, enforcing different levels of service for different classes of user.
Whether you are operating a gaming service, a content portal, a B2B or B2C eCommerce site or an internal intranet, this kind of service policy can help ensure that key customers get the best possible service, minimize the churn of valuable users and prevent undesirable visitors from harming the service to the detriment of others.
To address these problems, you must consider the following questions:
When does the policy take effect?
How are users categorized?
How are they given different levels of service?

One or more TrafficScript rules can be used to apply the policy. They take advantage of the features described in the rest of this article: service level monitoring, application traffic inspection, request rate shaping and bandwidth management.
TrafficScript (see Feature Brief: TrafficScript) is the key to defining traffic management policies that implement these prioritization rules. It brings together the functionality needed to monitor and classify behavior, and to impose the appropriate prioritization rules.
For example, the following TrafficScript request rule inspects HTTP requests. If the request is for a .jsp page, the rule looks at the client’s ‘Priority’ cookie and routes the request to the ‘high-priority’ or ‘low-priority’ server pools as appropriate:
$url = http.getPath();

if( string.endsWith( $url, ".jsp" ) ) {
   $cookie = http.getCookie( "Priority" );
   if( $cookie == "high" ) {
      pool.use( "high-priority" );
   } else {
      pool.use( "low-priority" );
   }
}
Generally, if you can describe the traffic management logic that you require, it is possible to implement it using TrafficScript.
Using Service Level Monitoring (see Feature Brief: Service Level Monitoring), Traffic Manager can measure and react to changes in response times for your hosted services by comparing observed response times against a desired target.
You configure Service Level Monitoring by creating a Service Level Monitoring Class (SLM Class). The SLM Class is configured with the desired response time (for example, 100ms), and some thresholds that define actions to take. For example, if fewer than 80% of requests meet the desired response time, Traffic Manager can log a warning; if fewer than 50% meet the desired time, Traffic Manager can raise a system alert.
Suppose that we were concerned about the performance of our Java servlets. We can configure an SLM Class with the desired performance, and use it to monitor all requests for Java servlets:
$url = http.getPath();
if( string.startsWith( $url, "/servlet/" ) ) {
   connection.setServiceLevelClass( "Java servlets" );
}
You can then monitor the performance figures generated by the ‘Java servlets’ SLM class to discover the response times, and the proportion of requests that fall outside the desired response time.
Once requests are monitored by an SLM Class, you can discover the proportion of requests that meet (or fail to meet) the desired response time within a TrafficScript rule. This makes it possible to implement TrafficScript logic that is only called when services are underperforming.
Suppose we had a TrafficScript rule that tested to see if a request came from a ‘high value’ customer.
When our service is running slowly, high-value customers should be sent to one server pool (‘gold’) and other customers sent to a lower-performing server pool (‘bronze’). However, when the service is running at normal speed, we want to send all customers to all servers (the server pool named ‘all servers’).
The following TrafficScript rule describes how this logic can be implemented:
# Monitor all traffic with the 'response time' SLM class, which is
# configured with a desired response time of 200ms
connection.setServiceLevelClass( "response time" );

# Now, check the historical activity (last 10 seconds) to see if it’s
# been acceptable (more than 90% of requests served within 200ms)
if( slm.conforming( "response time" ) > 90 ) {
   # select the ‘all servers’ server pool and terminate the rule
   pool.use( "all servers" );
}

# If we get here, things are running slowly.
# Here, we decide a customer is ‘high value’ if they have a login cookie,
# so we penalize customers who are not logged in. You can put your own
# test here instead.
$logincookie = http.getCookie( "Login" );
if( $logincookie ) {
   pool.use( "gold" );
} else {
   pool.use( "bronze" );
}
For a more sophisticated example of this technique, check out the article Dynamic rate shaping slow applications.
There’s no limit to how you can inspect and evaluate your traffic. Traffic Manager lets you look at any aspect of a client’s request, so that you can then categorize them as you need. For example:
# What is the client asking for?
$url = http.getPath();
# ... and the QueryString
$qs = http.getQueryString();

# Where has the client come from?
$referrer = http.getHeader( "Referer" );
$country = geo.getCountryCode( request.getRemoteIP() );

# What sort of browser is the client using?
$ua = http.getHeader( "User-Agent" );

# Is the client trying to spend more than $49.99?
if( http.getPath() == "/checkout.cgi" &&
    http.getFormParam( "total" ) > 4999 ) ...

# What’s the value of the CustomerName field in the XML purchase order
# in the SOAP request?
$body = http.getBody();
$name = xml.xpath.matchNodeSet( $body, "", "//Info/CustomerName/text()" );

# Take the name, post it to a database server with a web interface and
# inspect the response. Does the response contain the value ‘Premium’?
$response = http.request.post( "http://my.database.server/query",
                               "name=" . string.htmlEncode( $name ) );
if( string.contains( $response, "Premium" ) ) { ... }
Often, it only takes one request to identify the status of a user, but you want to remember this decision for all subsequent requests. For example, if a user places an item in his shopping cart by accessing the URL ‘/cart.php’, then you want to remember this information for all of his subsequent requests.
Adding a response cookie is the way to do this. You can do this in either a Request or Response Rule with the ‘http.setResponseCookie()’ function:
if( http.getPath() == "/cart.php" ) {
   http.setResponseCookie( "GotItems", "Yes" );
}
This cookie will be sent by the client on every subsequent request, so to test if the user has placed items in his shopping cart, you just need to test for the presence of the ‘GotItems’ cookie in each request rule:
if( http.getCookie( "GotItems" ) ) { ... }
If necessary, you can encrypt and sign the cookie so that it cannot be spoofed or reused:
# Setting the cookie
# Create an encryption key using the client’s IP address and user agent.
# Encrypt the current time using the encryption key; it can only be
# decrypted using the same key.
$key = http.getHeader( "User-Agent" ) . ":" . request.getRemoteIP();
$encrypted = string.encrypt( sys.time(), $key );
$encoded = string.hexencode( $encrypted );
http.setResponseHeader( "Set-Cookie", "GotItems=" . $encoded );

# Validating the cookie
$isValid = 0;
if( $cookie = http.getCookie( "GotItems" ) ) {
   $encrypted = string.hexdecode( $cookie );
   $key = http.getHeader( "User-Agent" ) . ":" . request.getRemoteIP();
   $secret = string.decrypt( $encrypted, $key );
   # If the cookie has been tampered with, or the IP address or user
   # agent differ, string.decrypt will return an empty string.
   # If it worked and the data is less than 1 hour old, it’s valid:
   if( $secret && sys.time() - $secret < 3600 ) {
      $isValid = 1;
   }
}
Having decided when to apply your service policy (using Service Level Monitoring), and classified your users (using Application Traffic Inspection), you now need to decide how to prioritize valuable users and penalize undesirable ones.
Request rate shaping (see Feature Brief: Bandwidth and Rate Shaping in Traffic Manager) is used to apply maximum request rates.

You can construct a service policy that places limits on a wide range of events, with very fine-grained control over how events are identified, and you can impose per-second and per-minute rates on these events.

Request rate limits are imposed using the TrafficScript rate.use() function, and you can configure per-second and per-minute limits in the rate class. Both limits are applied (note that if the per-minute limit is more than 60 times the per-second limit, it has no effect).
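For instance, a minimal sketch, assuming a rate class named ‘web requests’ has been created and configured with a suitable per-second limit:

# Assumes a rate class named "web requests" exists, configured with
# (for example) a per-second limit. rate.use() holds the request in the
# class's queue until it is admitted.
rate.use( "web requests" );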
Rate classes function as queues. When the TrafficScript rate.use() function is called, the connection is suspended and added to the queue that the rate class manages. Connections are then released from the queue according to the per-second and per-minute limits.
There is no limit to the size of the backlog of queued connections. For example, if 1000 requests arrived in quick succession to a rate class that admitted 10 per second, 990 of them would be immediately queued. Each second, 10 more requests would be released from the front of the queue.
While they are queued, connections may time out or be closed by the remote client. If this happens, they are immediately discarded.
You can use the rate.getBacklog() function to discover how many requests are currently queued. If the backlog is too large, you may decide to return an error page to the user rather than risk their connection timing out. For example, to rate shape jsp requests, but defer requests when the backlog gets too large:
$url = http.getPath();
if( string.endsWith( $url, ".jsp" ) ) {
   if( rate.getBacklog( "shape requests" ) > 100 ) {
      http.redirect( "http://mysite/too_busy.html" );
   } else {
      rate.use( "shape requests" );
   }
}
In many circumstances, you may need to apply more fine-grained rate-shaping limits. For example, imagine a login page where we wish to limit each individual user to just 2 login attempts per minute.
The rate.use() function can take an optional ‘key’ which identifies a specific instance of the rate class. This key can be used to create multiple, independent rate classes that share the same limits, but enforce them independently for each individual key.
For example, the ‘login limit’ class is restricted to 2 requests per minute, to limit how often each user can attempt to log in:
$url = http.getPath();
if( string.endsWith( $url, "login.cgi" ) ) {
   $user = http.getFormParam( "username" );
   rate.use( "login limit", $user );
}
This rule can help to defeat dictionary attacks where attackers try to brute-force crack a user’s password. The rate shaping limits are applied independently to each different value of $user. As each new user accesses the system, they are limited to 2 requests per minute, independently of all other users who share the “login limit” rate shaping class.
For another example, check out The "Contact Us" attack against mail servers.
Of course, once you’ve classified your users, you can apply different rate settings to different categories of users:
# If they have an odd-looking user agent, or if there’s no host header,
# the client is probably a web spider. Limit it to 1 request per second.
$ua = http.getHeader( "User-Agent" );
if( ( ! string.startsWith( $ua, "Mozilla/" ) &&
      ! string.startsWith( $ua, "Opera/" ) ) ||
    ! http.getHeader( "Host" ) ) {
   rate.use( "spiders", request.getRemoteIP() );
}
If the service is running slowly, rate-shape users who have not placed items into their shopping cart with a global limit, and rate-shape other users to 8 requests per second each:
if( slm.conforming( "timer" ) < 80 ) {
   $cookie = http.getCookie( "Cart" );
   if( ! $cookie ) {
      rate.use( "casual users" );
   } else {
      # Get a unique id for the user
      $cookie = http.getCookie( "JSPSESSIONID" );
      rate.use( "8 per second", $cookie );
   }
}
Bandwidth shaping (see Feature Brief: Bandwidth and Rate Shaping in Traffic Manager) allows Traffic Manager to limit the number of bytes per second used by inbound or outbound traffic, either for an entire service or for a particular type of request.
Bandwidth limits are automatically shared and enforced across all the Traffic Managers in a cluster. Individual Traffic Managers take different proportions of the total limit, depending on the load on each, and unused bandwidth is equitably allocated across the cluster depending on the need of each machine.
Like request rate shaping, you can use bandwidth shaping to limit the activities of subsets of your users. For example, you may have a 1 Gbit/s network connection that is being over-utilized by a certain type of client, affecting the responsiveness of the service. You may therefore wish to limit the bandwidth available to those clients to 20 Mbit/s.
Like Request Rate Shaping, you configure a Bandwidth class with a maximum bandwidth limit. Connections are allocated to a class as follows:
response.setBandwidthClass( "class name" );
All of the connections allocated to the class share the same bandwidth limit.
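For the scenario above, a minimal sketch might assign the over-active clients to a ‘restricted clients’ bandwidth class (assumed to be configured with a 20 Mbit/s limit); the ‘BulkDownloader’ User-Agent test is purely hypothetical:

# Sketch only: "restricted clients" is an assumed bandwidth class with a
# 20 Mbit/s limit, and "BulkDownloader" is a hypothetical User-Agent value
$ua = http.getHeader( "User-Agent" );
if( string.contains( $ua, "BulkDownloader" ) ) {
   response.setBandwidthClass( "restricted clients" );
}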
The following example helps to mitigate the ‘Slashdot Effect’, a common example of a Flash Flood problem. In this situation, a web site is overwhelmed by traffic as a result of a high-profile link (for example, from the Slashdot news site), and the level of service that regular users experience suffers as a result.
The example looks at the ‘Referer’ header, which identifies where a user has come from to access a web site. If the user has come from ‘slashdot.org’, he is tagged with a cookie so that all of his subsequent requests can be identified, and he is allocated to a low-bandwidth class:
$referrer = http.getHeader( "Referer" );
if( string.contains( $referrer, "slashdot.org" ) ) {
   http.addResponseHeader( "Set-Cookie", "slashdot=1" );
   connection.setBandwidthClass( "slashdot" );
}

if( http.getCookie( "slashdot" ) ) {
   connection.setBandwidthClass( "slashdot" );
}
For a more in depth discussion, check out Detecting and Managing Abusive Referers.
Different levels of service can be provided by different traffic routing, or in extreme events, by dropping some requests.
For example, some large media sites provide different levels of content: high-bandwidth rich media versions of news stories are served during normal usage, and low-bandwidth versions are served when traffic levels are extremely high. Many websites provide flash-enabled and simple HTML versions of their home page and navigation.
This is also commonplace when presenting content to a range of browsing devices with different capabilities and bandwidth.
The switch between high and low bandwidth versions could take place as part of a service policy: as the service begins to under-perform, some (or all) users could be forced onto the low-bandwidth versions so that a better level of service is maintained.
# Forcibly change requests that begin /high/ to /low/
$url = http.getPath();
if( string.startsWith( $url, "/high" ) ) {
   $url = string.replace( $url, "/high", "/low" );
   http.setPath( $url );
}
Ticket Booking systems for major events often suffer enormous floods of demand when tickets become available.
You can use Traffic Manager's request rate shaping system to limit how many visitors are admitted to the service, and if the service becomes overwhelmed, you can send back a ‘please try again’ message rather than keeping the user ‘on hold’ in the queue indefinitely.
Suppose the ‘booking’ rate shaping class is configured to admit 10 users per second, and that users enter the booking process by accessing the URL /bookevent?eventID=<id>. This rule ensures that no user is queued for more than 30 seconds, by keeping the queue length to no more than 300 users (10 users/second * 30 seconds):
# limit how users can book events
$url = http.getPath();
if( $url == "/bookevent" ) {
   # How many users are already queued?
   if( rate.getBacklog( "booking" ) > 300 ) {
      http.redirect( "http://www.mysite.com/too_busy.html" );
   } else {
      rate.use( "booking" );
   }
}
In many cases, the resources are limited and when a site is overwhelmed, users’ requests still need to be served.
Consider the following scenario: the service runs on 4 web servers, and visitors who have placed items in their shopping cart (indicated by a ‘Cart’ cookie) are treated as customers; everyone else is a casual visitor.

Our goal is to give all users the best possible level of service, but if customers begin to receive a poor level of service, we want to prioritize them over casual visitors. Specifically, we want more than 80% of customers to get responses within 100ms.
This can be achieved by splitting the 4 servers into 2 pools: the ‘allservers’ pool contains servers 1 to 4, and the ‘someservers’ pool contains servers 1 and 2 only.
When the service is poor for the customers, we will restrict the casual visitors to just the ‘someservers’ pool. This effectively reserves the additional servers 3 and 4 for the customers’ exclusive use.
The following code uses the ‘response’ SLM class to measure the level of service that customers receive:
$customer = http.getCookie( "Cart" );
if( $customer ) {
   connection.setServiceLevelClass( "response" );
   pool.use( "allservers" );
} else {
   if( slm.conforming( "response" ) < 80 ) {
      pool.use( "someservers" );
   } else {
      pool.use( "allservers" );
   }
}
Some of Traffic Manager's features can be used to improve the end user’s experience, but they take up resources on the system:
All of these features can be enabled and disabled on a per-user basis, as part of a service policy.
Use the http.aptimizer.bypass() and http.aptimizer.use() TrafficScript functions to control whether Traffic Manager will employ the Aptimizer optimization module for web content.
Note that these functions only apply to optimization of the base HTML document (e.g. index.html, or other content of type text/html); all other resources are served as appropriate. For example, if a client receives an aptimized version of the base content and then requests the image sprites, Traffic Manager will always serve up the sprites.
# Optimize web content for clients based in Australia
$ip = request.getRemoteIP();
if( geo.getCountry( $ip ) == "Australia" ) {
   http.aptimizer.use( "All", "Remote Users" );
}
Use the http.compress.enable() and http.compress.disable() TrafficScript functions to control whether or not Traffic Manager will compress response content to the remote client.
Note that Traffic Manager will only compress content if the remote browser has indicated that it supports compression.
On a lightly loaded system, it’s appropriate to compress all response content whenever possible:
http.compress.enable();
On a system where the CPU usage is becoming too high, you can selectively compress content:
# Don’t compress by default
http.compress.disable();

if( $isvaluable ) {
   # do compress in this case
   http.compress.enable();
}
Traffic Manager can cache multiple different versions of an HTTP response. For example, if your home page is generated by an application that customizes it for each user, Traffic Manager can cache each version separately, and return the correct version from the cache for each user who accesses the page.
Traffic Manager's cache has a limited size so that it does not consume too much memory and cause performance to degrade. You may wish to prioritize which pages you put in the cache, using the http.cache.disable() and http.cache.enable() TrafficScript functions.
Note: you also need to enable Content Caching in your Virtual Server configuration; otherwise the TrafficScript cache control functions will have no effect.
# Get the user name
$user = http.getCookie( "UserName" );

# Don’t cache any pages by default:
http.cache.disable();

if( $isvaluable ) {
   # Do cache these pages for better performance.
   # Each user gets a different version of the page, so we need to cache
   # the page indexed by the user name.
   http.cache.setkey( $user );
   http.cache.enable();
}
A service policy can be complicated to construct and implement.
The TrafficScript functions log.info(), log.warn() and log.error() are used to write messages to the event log, and so are very useful debugging tools to assist in developing complex TrafficScript rules.
For example, the following code:
if( $isvaluable && slm.conforming( "timer" ) < 70 ) {
   log.info( "User " . $user . " needs priority" );
}
… will append the following message to your error log file:
$ tail $ZEUSHOME/zxtm/log/errors
[20/Jan/2013:10:24:46 +0000] INFO rulename rulelogmsginfo vsname User Jack needs priority
You can also inspect your error log file by viewing the ‘Event Log’ on the Admin Server.
When you are debugging a rule, you can use log.info() to print out progress messages as the rule executes. The log.info() function takes a string parameter; you can construct complex strings by appending variables and literals together using the ‘.’ operator:
$msg = "Received ".connection.getDataLen()." bytes."; log.info( $msg );
The functions log.warn() and log.error() are similar to log.info(). They prefix error log messages with a higher priority, either “WARN” or “ERROR”, and you can filter and act on these using the Event Handling system.
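For example, a sketch that raises a higher-priority message when an SLM class is underperforming (the threshold and message text are illustrative):

# Raise a WARN-level message that the Event Handling system can act on
if( slm.conforming( "response time" ) < 50 ) {
   log.warn( "SLM class 'response time' conformance has dropped below 50%" );
}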
You should be careful when printing out connection data verbatim, because the connection data may contain control characters or other non-printable characters. You can encode data using either ‘string.hexEncode()’ or ‘string.escape()’; you should use ‘string.hexEncode()’ if the data is binary, and ‘string.escape()’ if the data contains readable text with a small number of non-printable characters.
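For instance, a minimal sketch that logs request data safely by escaping it first:

# Escape the request body (which may contain non-printable characters)
# before writing it to the event log
$body = http.getBody();
log.info( "Request body: " . string.escape( $body ) );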
Traffic Manager is a powerful toolkit for network and application administrators. This article describes a number of techniques that use the tools in the kit to solve a range of traffic valuation and prioritization tasks.
For more examples of how Traffic Manager and TrafficScript can manipulate and prioritize traffic, check out the Top Examples of Traffic Manager in action on the Pulse Community.