
Hex-encoded URLs, TrafficScript, and All That

This article examines how legitimate URLs are built, how malicious URLs can be made harder to detect by hexadecimal encoding, and, of course, how Stingray and TrafficScript can be used to filter and reject them.

Introduction

RFC 3986 describes what is and what is not a legal URL. To summarize, a URL consists of several components separated by "reserved" characters (e.g. '/', ':', and '@'); the components themselves may contain only unreserved characters (letters, digits, and some punctuation like '_'). If a reserved character is to appear as data inside a component, it has to be "escaped". Escaping in this context means taking the numerical value of the character's ASCII representation, converting that value into a two-digit hexadecimal number, and prefixing the result with a percent sign ('%').

Most of us have come across a '%20' in a URL, which is an escaped space character. Another common example is the escaped percent sign (%25): a literal percent sign must always be escaped, because otherwise it would indicate the beginning of an escape sequence.
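The escaping rule can be tried out in a few lines. The sketch below is written in Python rather than TrafficScript, purely for illustration; `escape_char` is a hypothetical helper, while `quote` and `unquote` come from Python's standard library.

```python
from urllib.parse import quote, unquote

def escape_char(ch: str) -> str:
    """Escape one ASCII character as '%' followed by two hex digits."""
    return "%{:02X}".format(ord(ch))

print(escape_char(" "))          # %20
print(escape_char("%"))          # %25

# The library routines apply the same rule to whole strings.
print(quote("a b%c", safe=""))   # a%20b%25c
print(unquote("a%20b%25c"))      # a b%c
```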

Escaping can also be used to internationalize URLs: the recommended way of doing that is to encode the character as UTF-8 and then escape each resulting byte with %hh. For more information about this method, see Guidelines for new URL Schemes, RFC 2718.
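As a rough illustration of the UTF-8 approach, again in Python rather than TrafficScript: the hypothetical helper `escape_utf8` below escapes every byte of the UTF-8 encoding, including bytes that would not strictly need it.

```python
from urllib.parse import quote, unquote

def escape_utf8(text: str) -> str:
    """Encode text as UTF-8, then write every byte as %hh."""
    return "".join("%{:02X}".format(b) for b in text.encode("utf-8"))

print(escape_utf8("é"))     # %C3%A9  (two bytes in UTF-8)
print(quote("é"))           # %C3%A9  (the library uses UTF-8 by default)
print(unquote("%C3%A9"))    # é
```

A single non-ASCII character can therefore expand to several %hh sequences, one per UTF-8 byte.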

The dark side

Of course, nobody would care to go through the pain of encoding characters that are perfectly legal in a URL, would they? Well, probably not unless they've got something to hide.

Consider the following situation: A bored high-school student, Harry Acker, knows that the web admin of his school is running the school's website on MS IIS, which is known to be vulnerable to a number of exploits. Harry has just received a worse grade than he had expected and is thinking of revenge. He reads about an easy way to bring IIS down, the infamous CodeRed exploit (see this security notification), which involves sending a request for a file with the extension .ida. He follows the instructions, but gets back an angry web page telling him to stop attacking the server.

What has happened? The request was intercepted by an intrusion detection system (IDS) which looks for well-known signatures of exploits. Now Harry has an idea: if he encodes the request, it could pass the IDS, get through to the web server and bring it down. Passing the URL '/nuke.id%61', he finally has his questionable revenge (this particular URL is used here as an example only and is actually not dangerous as shown).
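Why the encoded request slips through can be seen in a small Python sketch (not TrafficScript). The two functions are hypothetical stand-ins for an IDS that compares the raw URL against a signature and one that un-escapes the URL first; a real IDS is of course more elaborate.

```python
from urllib.parse import unquote

SIGNATURE = "nuke.ida"  # example signature only, as in the story

def naive_ids_blocks(raw_url: str) -> bool:
    """Match the signature against the URL exactly as it was sent."""
    return SIGNATURE in raw_url

def decoding_ids_blocks(raw_url: str) -> bool:
    """Un-escape the URL before matching the signature."""
    return SIGNATURE in unquote(raw_url)

print(naive_ids_blocks("/nuke.ida"))       # True
print(naive_ids_blocks("/nuke.id%61"))     # False: the escape hides the 'a'
print(decoding_ids_blocks("/nuke.id%61"))  # True
```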

Using TrafficScript to identify malicious requests

The story of Harry A. clearly shows that the URL has to be un-escaped before being analyzed. The TrafficScript function that achieves this is called, not surprisingly, string.unescape( escaped_string ). To protect against this type of scheme, the following request rule could be set up:


$exploit = "nuke.ida"; # just an example

$rawurl = http.getRawURL();             # the URL exactly as the client sent it
$cookedurl = string.unescape($rawurl);  # the same URL with escapes decoded

$nuke = 0;

if( string.contains($rawurl, $exploit) ) {
  # the signature appears in plain sight
  $nuke = 1;
  $code = "403 You belong in jail";
  $reply = "Nice try...";
} else if ( string.contains($cookedurl, $exploit) ) {
  # the signature only appears after un-escaping
  $nuke = 1;
  $code = "403 You really belong in jail";
  $reply = "Very nice try...";
}

if ( $nuke ) {
  http.sendResponse(
    $code, "text/plain",
    string.append( $reply, "\nYou requested ", $rawurl, " (", $cookedurl, ")."),
    "Server: ZXTM/4.2"
  );
}



The http.getRawURL() function was used here for the purpose of illustration. TrafficScript also provides the convenient functions http.getPath() and http.getQueryString(), which un-escape the respective string before returning it. In real-world applications, these are usually the functions of choice for analyses of the kind sketched here.
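The idea of inspecting an already un-escaped path and query string can be sketched in Python as follows. `decoded_parts` is a hypothetical helper built on the standard library; it only illustrates the concept and makes no claim about how the TrafficScript functions are implemented.

```python
from urllib.parse import urlsplit, unquote

def decoded_parts(raw_url: str) -> tuple:
    """Split a raw request URL into path and query, un-escaping each."""
    parts = urlsplit(raw_url)
    return unquote(parts.path), unquote(parts.query)

path, query = decoded_parts("/nuke.id%61?file=a%20b")
print(path)   # /nuke.ida
print(query)  # file=a b
```

Checking the decoded path and the decoded query string separately also keeps a signature meant for the path from matching by accident inside a parameter value.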

More complex encoding

The woes of IIS users unfortunately do not end here. Apart from the standard %hh encoding described above, IIS supports a non-standard encoding called %u-encoding (see http://www.ciac.org/ciac/bulletins/l-139.shtml). This encoding allows for 16-bit values in the URL by using four hex digits after '%u'. In this encoding scheme, for example, %u0061 corresponds to 'a'. When un-escaping such values, the result has to be expressed in UTF-8 because it might not be representable in ASCII. Stingray also understands this extended hex-encoding mechanism and decodes it both in the string.unescape() function and in http.getPath().
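A decoder that understands both forms might look like the Python sketch below. `unescape_with_u` is a hypothetical function, not Stingray's implementation; it makes a single pass over the input so that the output of one escape is never decoded a second time.

```python
import re

# Either %uXXXX (four hex digits) or the standard %hh (two hex digits).
_TOKEN = re.compile(r"%[uU]([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")

def unescape_with_u(raw: str) -> str:
    """Decode %hh and %uXXXX escapes in one pass, returning text."""
    out = bytearray()
    pos = 0
    for m in _TOKEN.finditer(raw):
        out += raw[pos:m.start()].encode("utf-8")
        if m.group(1) is not None:
            # %uXXXX: treat the 16-bit value as a code point, store as UTF-8
            out += chr(int(m.group(1), 16)).encode("utf-8", errors="replace")
        else:
            # %hh: a single raw byte
            out.append(int(m.group(2), 16))
        pos = m.end()
    out += raw[pos:].encode("utf-8")
    return out.decode("utf-8", errors="replace")

print(unescape_with_u("/nuke.id%u0061"))  # /nuke.ida
print(unescape_with_u("/nuke.id%61"))     # /nuke.ida
```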

But which encoding scheme is to be assumed for those %u values? Stingray assumes UCS-2 (a 16-bit Unicode encoding) and converts the result into UTF-8. IIS might, however, deviate from that mapping if its so-called code pages are used. With code pages, a given 16-bit number can be mapped to an arbitrary character. Stingray does not support the use of code pages.
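The UCS-2 to UTF-8 step amounts to reading the four hex digits as a code point and re-encoding it, as in this Python sketch (`ucs2_to_utf8` is a hypothetical helper). The last two lines use ordinary 8-bit code pages only as an analogy for how the same number can stand for different characters depending on the mapping in force.

```python
def ucs2_to_utf8(hex4: str) -> bytes:
    """Interpret four hex digits as a UCS-2 value and return UTF-8 bytes."""
    return chr(int(hex4, 16)).encode("utf-8")

print(ucs2_to_utf8("0061"))  # b'a'
print(ucs2_to_utf8("00E9"))  # b'\xc3\xa9'
print(ucs2_to_utf8("20AC"))  # b'\xe2\x82\xac'

# Analogy: one byte value, two code pages, two different characters.
print(bytes([0xE9]).decode("cp1252"))  # é
print(bytes([0xE9]).decode("cp437"))   # Θ
```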

Another common attempt at smuggling dangerous URLs past the security check is to follow a '%' with characters that do not form a valid hexadecimal number, for example 'Q' or 'G'. If Stingray's unescape() function encounters such a string, it does not convert the illegal 'hex' value, but replaces the '%' with a '_' and sets the variable $1 to zero (0). So by adding the following code, this type of attack can be caught.


$cookedurl = string.unescape($rawurl);

if( $1 == 0 ) {
  $nuke = 1;
  $code = "403 Grow up";
  $reply = "Lame try...";
} else if (...) # as above
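Outside TrafficScript, the same kind of malformed escape can be spotted with a regular expression. The Python sketch below is only an illustration; `has_invalid_escape` is a hypothetical helper that accepts the %hh and %u forms discussed above and flags any other use of '%'.

```python
import re

# A '%' that is followed neither by two hex digits nor by 'u' plus four.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2}|[uU][0-9A-Fa-f]{4})")

def has_invalid_escape(raw_url: str) -> bool:
    """Return True if the URL contains a malformed escape sequence."""
    return _BAD_ESCAPE.search(raw_url) is not None

print(has_invalid_escape("/nuke.id%61"))     # False
print(has_invalid_escape("/nuke.id%u0061"))  # False
print(has_invalid_escape("/nuke.id%Q1"))     # True
print(has_invalid_escape("/nuke%"))          # True
```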



Last update: 03-14-2013 05:13 AM