TrafficScript regex to match the root domain name.

SOLVED
cignul9
Occasional Contributor

I want to create a rule that gets the hostHeader and strips off all but the root domain (the last two levels).  For example I want to take www.example.com and grab example.com for my $domain variable.  I created a regular expression in a proof-of-concept script:

$host = http.getHostHeader();

$domain = string.regexSub($host, '([^.]+.[^.]+)$', "$1");

log.info("Root domain: " . $domain);

An example domain of www.example.com matches www.example.com.  I don't see how but the first bit [^.]+. seems to match www.example, even though my understanding of regex means that I want to exclude any string of characters having a dot in it.  I've also tried "([^\.]+\.[^\.]$)" which is a more kosher version of regex where I'm escaping the dot.  In either case the result is the same.  Isn't there a way to negate a character in TrafficScript's version of regex ([^<somechar>])?  Or is there a way to toggle greedy matching?
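One likely explanation, sketched here in Python rather than TrafficScript: a regex substitution replaces only the part of the string that matched and leaves everything else in place. The pattern matches example.com at the end of the host and replaces it with itself, so the leading www. survives untouched. This assumes string.regexSub shares the replace-only-the-match behaviour of Python's re.sub; the helper names are hypothetical.

```python
import re

def sub_unanchored(host: str) -> str:
    # Same idea as the pattern in the post, with the dot escaped.
    # Only the matched tail is replaced (by itself), so the prefix stays.
    return re.sub(r'([^.]+\.[^.]+)$', r'\1', host, count=1)

def sub_anchored(host: str) -> str:
    # Consuming the leading labels with ^.*? makes the match span the
    # whole string, so the replacement discards them.
    return re.sub(r'^.*?([^.]+\.[^.]+)$', r'\1', host, count=1)

print(sub_unanchored("www.example.com"))  # www.example.com
print(sub_anchored("www.example.com"))    # example.com
```

If that assumption holds, neither greedy matching nor the negated character class is at fault; the match simply has to cover the part of the host that should be dropped.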

I have developed a workaround that splits the domain on the dots and simply fetches and re-concatenates the last two strings with another dot.  Even though it works I feel all yucky inside.  Besides I want to understand how TrafficScript implements regex better and this opportunity has so far taught me very little.
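For comparison, the split-and-rejoin workaround described above can be sketched as follows. This is a Python illustration, not the original TrafficScript; root_domain is a hypothetical name, and the sketch assumes the root domain is always the last two labels, which does not hold for suffixes such as co.uk.

```python
def root_domain(host: str) -> str:
    # Split on dots and keep the last two labels.
    # Hosts with fewer than two labels are returned unchanged.
    labels = host.split(".")
    return ".".join(labels[-2:])

print(root_domain("www.example.com"))  # example.com
```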

Any help is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
aclarke
Frequent Contributor

Re: Re: TrafficScript regex to match the root domain name.

Shawn, I think the problem is with your regex group matching.

Try this - it uses regex capture groups to make the first element $1, and everything else $2:


$host = http.getHostHeader();

$domain = string.regexSub($host, '([0-9a-z-]{2,})\.(.*)', "$2");

log.info("Request was for: " . $host . " Root domain is: " . $domain);

Log output looks like it works to me:

Virtual Server vs_web: Request was for: www.app1.subapp2.company.com Root domain is: app1.subapp2.company.com

Virtual Server vs_web: Request was for: www.app1.company.com Root domain is: app1.company.com

Virtual Server vs_web: Request was for: www.company.com Root domain is: company.com
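To check what this pattern does outside the traffic manager, here is a small Python sketch of the same substitution. It assumes Python's re.sub treats the pattern the same way string.regexSub does; strip_first_label is a hypothetical name. Group 1 captures the first label and group 2 captures everything after the first dot, so the result is the host minus its first label rather than strictly the last two levels.

```python
import re

def strip_first_label(host: str) -> str:
    # Group 1: first label. Group 2: everything after the first dot.
    # The match spans the whole host, so only group 2 remains.
    return re.sub(r'([0-9a-z-]{2,})\.(.*)', r'\2', host, count=1)

print(strip_first_label("www.app1.subapp2.company.com"))  # app1.subapp2.company.com
print(strip_first_label("www.company.com"))               # company.com
print(strip_first_label("company.com"))                   # com
```

The last call shows a caveat: a host that is already a bare domain loses its first label too, so a rule built on this pattern may want to check the number of labels first.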

--
Aidan Clarke
Pulse Secure vADC Product Manager


6 REPLIES
aclarke
Frequent Contributor

Re: TrafficScript regex to match the root domain name.

A quick Google search turned up a Stack Exchange answer that points to the following regex:

/([0-9a-z-]{2,}\.[0-9a-z-]{2,3}\.[0-9a-z-]{2,3}|[0-9a-z-]{2,}\.[0-9a-z-]{2,3})$/i

Using regexr, I tested it and it seems to work:

[Screenshot: regexr test of the pattern above]
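The same check can be reproduced without a web tester. The Python sketch below runs the pattern as a search, which is how testers such as regexr apply it; match_root is a hypothetical helper, and the hostnames are illustrative only.

```python
import re

# Pattern copied from the post; the /i flag becomes re.IGNORECASE.
ROOT_RE = re.compile(
    r'([0-9a-z-]{2,}\.[0-9a-z-]{2,3}\.[0-9a-z-]{2,3}'
    r'|[0-9a-z-]{2,}\.[0-9a-z-]{2,3})$',
    re.IGNORECASE,
)

def match_root(host: str):
    # Return the matched tail of the host, or None if nothing matches.
    m = ROOT_RE.search(host)
    return m.group(1) if m else None

print(match_root("www.example.com"))    # example.com
print(match_root("www.example.co.uk"))  # example.co.uk
print(match_root("www.app.com"))        # www.app.com
```

The third call shows a limitation of the length-based heuristic: when the middle label is only two or three characters long, the three-label alternative matches and the whole host is returned.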

--
Aidan Clarke
Pulse Secure vADC Product Manager
cignul9
Occasional Contributor

Re: TrafficScript regex to match the root domain name.

Those lengthy regex tests work -- and maybe even better than mine for the set of all possible domain names -- if you use the regex tester you link.  I was using a different but similar website to test my regex.  The real problem is that TrafficScript matches all of the hostname using the very same regex.  That's really the point of my post.  Is there a TrafficScript-specific regex tester like the general ones we all know and love?  I'm trying to find out how TrafficScript's implementation is different from everyone else's.  My working theory is that it matches greedily, or that it doesn't know what the negate character is.  Or something.

aclarke
Frequent Contributor

Re: TrafficScript regex to match the root domain name.

Shawn Magill, have you had a chance to test the code I posted above? I am keen to see if it has resolved your issue.

A.

--
Aidan Clarke
Pulse Secure vADC Product Manager
cignul9
Occasional Contributor

Re: TrafficScript regex to match the root domain name.

Yes, sorry for the late reply.  Looks like it works!

aclarke
Frequent Contributor

Re: TrafficScript regex to match the root domain name.

Glad to hear!

--
Aidan Clarke
Pulse Secure vADC Product Manager