On 27th February 2006, we took part in VMware's launch of their Virtual Appliance initiative. Riverbed Stingray (or 'Zeus Extensible Traffic Manager / ZXTM') was the first ADC product to be packaged as a virtual appliance. We were delighted to be a launch partner with VMware, and to gain certification in November 2006 when they opened up their third-party certification program.
We had to synchronize the release of our own community web content with VMware's website launch, which was scheduled for 9pm PST. That's 5am in our Cambridge UK dev labs!
With a simple bit of TrafficScript, we were able to test and review our new web content internally before the release, and make the new content live at 5am while everyone slept soundly in their beds.
The community website we operated was reasonably sophisticated. It was based on a blogging engine, and the configuration and content for the site were split between the filesystem and a local database. Content was served from the database via the website and an RSS feed. To add a new section with new content to the site, it was necessary to coordinate a number of changes to the filesystem and the database together.
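For illustration only, the kind of split we are describing might look something like the sketch below. The table names, paths and credentials are hypothetical, not the site's real configuration (only the database hostname also appears in the real code later on).

<?php
// Hypothetical sketch of a blogging engine that keeps templates and
// static files on the filesystem, but sections and articles in a database.
$DB_HOST = "livedb.internal.zeus.com";
$db = mysql_connect( $DB_HOST, "webuser", "secret" );
mysql_select_db( "community", $db );

// The list of sections lives in the database...
$sections = mysql_query( "SELECT name FROM sections ORDER BY position" );

// ...but each section also needs a matching template on disk, so adding
// a new section means changing the database and the filesystem together.
while( $row = mysql_fetch_assoc( $sections ) ) {
    include "/var/www/templates/" . $row['name'] . ".php";
}
?>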
We wanted to make the new content live for external visitors at 5am on Monday 27th, but we also wanted the new content to be visible internally and to selected partners before the release, so that we could test and review it.
The obvious solution of scripting the database and filesystem changes to take place at 5am was not satisfactory. It was hard to test on the live site, and it did not let us publish the new site internally beforehand.
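To make the contrast concrete, the scripted approach would have amounted to a one-shot cron job along the following lines; the paths, crontab entry and filenames are invented for illustration. Everything changes on the live servers in one irreversible step, with no way to rehearse it.

<?php
// Hypothetical cron-driven switchover (the approach we rejected).
// Example crontab entry:  0 5 27 2 * php /usr/local/bin/golive.php
// Paths and filenames are invented for illustration.

// 1. Copy the new templates and static files onto the live filesystem.
exec( "cp -r /var/www/staging/templates /var/www/live/" );
exec( "cp -r /var/www/staging/static /var/www/live/" );

// 2. Load the new sections and articles into the live database.
exec( "mysql -h livedb.internal.zeus.com community < /var/www/staging/new-content.sql" );
?>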
We had a couple of options.

We have a staging system that we use to develop new code and content before putting it on the live site. This system has its own database and filesystem, and when we publish a change, we copy the new settings to the live site manually. We could have elected to run the new site on the staging system, and use Stingray to direct traffic to the live or staging server as appropriate:

if( $usenew ) {
    pool.use( "Staging server" );
} else {
    pool.use( "DMZ Server" );
}
However, this option would have exposed our staging website (running on a developer's desktop behind the DMZ) to live traffic, and created a vulnerable single point of failure. Instead, we modified the current site so that it could select the database to use based on the presence of an HTTP header:
$usenew = 0;

# For requests from internal users, always use the new site
$remoteip = request.getremoteip();
if( string.ipmaskmatch( $remoteip, "10.0.0.0/8" ) ) {
    $usenew = 1;
}

# If it's after 5am on the 27th, always use the new site
# Fix this before the 1st of the following month!
if( sys.time.monthday() == 27 && sys.time.hour() >= 5 ) {
    $usenew = 1;
}
if( sys.time.monthday() > 27 ) {
    $usenew = 1;
}

http.removeHeader( "NEWSITE" );
if( $usenew ) {
    http.addHeader( "NEWSITE", "1" );
}
On the webserver itself, the site's PHP then checked for that header and switched the database host accordingly:

// The PHP code overrides the database host when the NEWSITE header is set
$DB_HOST = "livedb.internal.zeus.com";

if( isset( $_ENV['HTTP_NEWSITE'] ) &&
    ( $_ENV['HTTP_HOST'] == 'knowledgehub.zeus.com' ) ) {
    $DB_HOST = "stagedb.internal.zeus.com";
}
This way, only the secured DMZ webserver processed external traffic, but it would use the internal staging database for the new content. And because the TrafficScript rule strips any NEWSITE header supplied by the client before deciding whether to add its own, external visitors could not force the new content to appear early.
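One small refinement that can help with this kind of review is to make the selection visible in the response. A hypothetical one-line addition to the site's PHP (the header name is ours, not something the original site sent) would let an internal reviewer confirm which database built the page:

<?php
// Hypothetical debugging aid, not part of the original site: report
// which database host the page was built from, so internal reviewers
// can confirm they are seeing the staged content. Assumes $DB_HOST has
// already been set by the configuration code shown above.
header( "X-Content-Source: " . $DB_HOST );
?>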
Did it work? Of course it did! Because we used Stingray to categorize the traffic, we could safely test the new content, confident that the switchover would be seamless. No one was awake at 5am when the site went live, but traffic to the site jumped after the launch.