We spend a great deal of time focusing on how to speed up customers' web services. We constantly research new techniques to load balance traffic, optimise network connections and improve the performance of overloaded application servers. The techniques and options available from us (and yes, from our competitors too!) may seem bewildering at times. So I would like to spend a short time singing the praises of one specific feature, which I can confidently say will improve your website's performance above all others - caching your website.
"But my website is uncacheable! It's full of dynamic, changing pages. Caching is useless to me!"
We'll answer that objection soon, but first, it is worth a quick explanation of the two main styles of caching:
Most people's experience of a web cache is on their web browser. Internet Explorer or Firefox will store copies of web pages on your hard drive, so if you visit a site again, it can load the content from disk instead of over the Internet.
There's another layer of caching going on though. Your ISP may also be doing some caching. The ISP wants to save money on their bandwidth, and so puts a big web cache in front of everyone's Internet access. The cache keeps copies of the most-visited web pages, storing bits of many different websites. A popular and widely used open-source web cache is Squid.
However, not all web pages are cacheable near the client. Websites have dynamic content, so for example any web page containing personalized or changing information will not be stored in your ISP's cache. Generally the cache will fill up with "static" content such as images, movies, etc. These get stored for hours or days. For your ISP, this is great, as these big files take up the most of their precious bandwidth.
For someone running their own website, the browser caching or ISP caching does not do much. They might save a little bandwidth from the ISP image caching if they have lots of visitors from the same ISP, but the bulk of the website, including most of the content generated by their application servers, will not be cached and their servers will still have lots of work to do.
Here, the main aim is not to save bandwidth, but to accelerate your website. The traffic manager sits in your datacenter (or your cloud provider), in front of your web and application servers. Access to your website is through the Traffic Manager software, so it sees both the requests and responses. Traffic Manager can then start to answer these requests itself, delivering cached responses. Your servers then have less work to do. Less work = faster responses = fewer servers needed = saves money!
"But I told you - my website isn't cacheable!"
There's a reason why your website is marked uncacheable. Remember the ISP caches...? They mustn't store your changing, constantly updating web pages. To enforce this, application servers send back instructions with every web page, the Cache-Control HTTP header, saying "Don't cache this". Traffic Manager obeys these cache instructions too, because it's well-behaved.
But, think - how often does your website really change? Take a very busy site, for example a popular news site. Its front page may be labelled as uncacheable so that vistors always see the latest news, since it changes as new stories are added. But new additions aren't happening every second of the day. What if the page was marked as cacheable - for just one second? Visitors would still see the most up-to-date news, but the load on the site servers would plummet. Even if the website had as few as ten views in a second, this simple change would reduce the load on the app servers ten-fold.
This isn't an isolated example - there are plenty of others: Think twitter searches, auction listings, "live" graphing, and so on. All such content can be cached briefly without any noticable change to the "liveness" of the site. Traffic Manager can deliver a cached version of your web page much faster than your application servers - not just because it is highly optimized, but because sending a cached copy of a page is so much less work than generating it from scratch.
So if this simple cache change is so great, why don't people use this technique more - surely app servers can mark their web pages as cacheable for one or two seconds without Traffic Manager's help, and those browser/ISP caches can then do the magic? Well, the browser caches aren't going to be any use - an individual isn't going to be viewing the same page on your website multiple times a second (and if they keep hitting the reload button, their page requests are not cacheable anyway). So how about those big ISP caches? Unfortunately, they aren't always clever enough either. Some see a web page marked as cacheable for a short time and will either:
Also, by leaving the caching to the client-side, the cache hit rate gets worse. A user in France isn't going to be able to make use of a cached copy of your site stored in a US ISP's cache, for instance.
If you use Traffic Manager to do the caching, these issues can be solved. First, the cache is held in one place - your datacenter, so it is available to all visitors. Second, Traffic Manager can tweak the cache instructions for the page, so it caches the page while forcing other people not to. Here is what's going on:
Result: a much faster web site, yet it still serves dynamic and constantly updating pages to viewers. Give it a try - you will be amazed at the performance improvements possible, even when caching for just a second. Remember, almost anything can be cached if you configure your servers correctly!
On the admin server, edit the virtual server that you want to cache, and click on the "Content Caching" link. Enable the cache. There are options here for the default cache time for pages. These can be changed as desired, but are primarily for the "ordinary" content that is cacheable normally, such as images, etc. The "webcache!control_out" setting allows you to change the Cache-Control header for your pages after they have been cached by the Traffic Manager software, so you can put "no-cache" here to stop others from caching your pages.
The "webcache!refresh_time" setting is a useful extra here. Set this to one second. This will smooth out the load on your app servers. When a cached page is about to expire (i.e. it's too old to stay cached) and a new request arrives, Traffic Manager will hand over a single request to your app servers, to see if there is a newer page available. Other requests continue to get served from the cache. This can prevent 'waves' of requests hitting your app servers when a page is about to expire from the cache.
Now, we need to make Traffic Manager cache the specific pages of your site that the app server claims are uncacheable. We do this using the RuleBuilder system for defining rules, so click on the "Catalogs" icon and then select the "Rules" tab. Now create a new RuleBuilder rule.
This rule needs to run for the specific web pages that you wish to make cacheable for short amounts of time. For an example, we'll make "/news" cacheable. Add a condition of "HTTP:URL Path" to match "/news", then add an action to set a HTTP response header. The rule should look like this:
Finally, add this rule as a response rule to your virtual server. That's it! Your site should now start to be cached. Just a final few words of caution: