Hi all
(first up: please forgive me for not using proper Juniper terminology in all the right places. I've been in networking for almost 15yrs, but the IVE is not my primary field of work)
Problem: We've been observing a strange issue that nagged some users for half a day, then went away by itself and did not return; we're looking for an explanation.
One plausible explanation I could think of was this: the IVE has a built-in session/connection rate limit that prevents the unit from being overwhelmed by a flood of connection attempts, a bit like the SYN flood protection we knew from our ScreenOS devices.
QUESTIONS:
Is there such a thing as an anti-DoS feature on an IVE?
If yes - what numbers and information does it base its decision on? (Src IP, username, combination...)
If yes - can its parameters be looked at or even tuned?
And last: what (else?) could cause an IVE to throw a user out after <5 seconds, even if authentication, authorization, role assignment, IP assignment, failover to ESP etc. all work perfectly well?
Thanks for sharing your thoughts, ideas and knowledge.
Please find the details below.
Marc
Disclaimer: The setup as a whole (client PC, SCCM-distributed software package, HostChecker policy) does have a few wrinkles that need ironing out. We're aware and we'll take care of it.
ENVIRONMENT:
- IVE Virtual Machine running 7.4R9.3 (build 30667), licensed for 200 Users
- Users have an SCCM deployed software package containing Pulse 5.<something> and HostChecker. You'll note the absence of the Juniper Networks Setup Service (wrinkles, you remember?)
- Users' client PCs are well maintained and patched, and quite literally a slim fresh install only a few days old. Users do *not* have administrative privileges on their computers. (The users' computers are actually meant to be a form of beefy thin client.)
- HostChecker Policy mandates FW and AV, and that AV Update be no more than 5 Updates behind.
- Checkbox on IVE to download new lists of update versions is not checked (another big wrinkle, probably).
- Users authenticate via LDAP/AD, with their AD credentials (nota: not via a form of OTP)
- Users may activate checkbox to "save settings" in their Pulse client (in extenso: cache the password) (did someone say "wrinkle"?)
THE STORY:
- In the early hours of Friday (around 01:30), the IVE starts refusing client connections. The reason given is that the HostChecker refused the connection because the virus definitions on the client were too old.
- only a few users are affected
- On Friday morning around 08:00, some 15-20 users start their day with a connection attempt. All of them (except the 3rd party Mac users, who don't have an AV policy) are refused.
- a storm of logon attempts is being logged: it seems that the pulse clients are frantically trying to connect, only to be instantly refused.
- ca 1.5hrs into this storm, we get alerted and get a grasp of what is happening and why.
- we reconfigure the HC policy to be "no older than 30 days"
- most of the users have given up and/or closed their notebooks
- only ~5 users remain.
- Those 5 users are constantly trying to reconnect, and their logon attempts look perfectly OK in the log (see attachment): primary auth is successful, the client gets assigned an IP, the tunnel starts on SSL, then switches over to ESP, and then "Closed connection" after 3-5 seconds. Some sessions last 0 seconds, some are seen to last 30 seconds. (cf. Log Excerpt)
- Even a reboot of the PC doesn't cure the issue. As soon as the user restarts the Pulse client and attempts the first login, the connection is closed and the pulse client keeps trying to reconnect.
- We get the users to connect via the IVE's web portal and launch the Pulse Client from there - this brings the last two users online, the other three had given up in the meantime.
- later in the afternoon, we completely fail at reproducing the issue with any form of client. Every connection attempt is instantly successful and stable.
- Later on Friday and throughout the weekend, all connection attempts are successful and lead to a longish and stable session, from the same class of notebooks that had been fighting for connectivity throughout Friday morning.
- Client side event logs show entries that say that the connection was terminated by the server side.
(edit) Windows Event Viewer on client shows Event ID 308, as per https://www.juniper.net/techpubs/en_US/junos-pulse4.0/topics/concepts/a-c-c-event-log-overview.html (/edit)
HYPOTHESIS
We assume that there is some form of Anti-DoS or connection setup/time limit on the IVE.
The argumentation follows this line...
1. HC AV policy starts to fail and refuses logins. This is undesired, but explicable (and can be fixed)
2. The Pulse clients with cached passwords constantly keep trying to reconnect.
3. The usernames, src-IPs or both of the constantly reconnecting Pulse clients are getting (temporarily) blacklisted on the IVE
4. After we change the HC policy, some users are no longer trying. Their entries age out of the blacklist after a certain time.
5. The pulse clients that keep trying to reconnect into the hours after the change to HC policy have no chance to be removed from the blacklist, and keep getting refused.
6. a few hours later, the blacklist has cooled down and everyone can connect.
The one uncertain item in this list is: Wouldn't proper blacklisting prevent the setup of a session from the start? Accept the username, check username/srcIP against the blacklist, then return some form of HTTP 500 status while still in SSL/HTTPS mode? According to the log, something different is happening.
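To make the hypothesis concrete, here is a toy model of the assumed mechanism in Python. It is only a sketch of the reasoning in steps 3 to 6, not the IVE's actual lockout logic: the class name `LockoutTable`, the threshold of 3 attempts and the 60-second window are all made-up assumptions. The key assumption is that refused attempts still count towards the limit, which is what would keep a constantly retrying client locked out.

```python
from dataclasses import dataclass, field


@dataclass
class LockoutTable:
    """Hypothetical rate limit: a key (username, src IP, ...) is refused
    while it has more than max_attempts attempts inside the window."""
    max_attempts: int = 3    # assumed threshold, not an IVE default
    window: float = 60.0     # assumed counting window in seconds
    attempts: dict = field(default_factory=dict)

    def allow(self, key, now):
        # Keep only the attempts that are still inside the window.
        recent = [t for t in self.attempts.get(key, []) if now - t < self.window]
        # Assumption: every attempt counts, including refused ones.
        recent.append(now)
        self.attempts[key] = recent
        return len(recent) <= self.max_attempts


def simulate(times, key="client"):
    """Replay connection attempts at the given times (seconds) for one key."""
    table = LockoutTable()
    return [table.allow(key, t) for t in times]


# A Pulse client with a cached password retrying every 10 s for 10 minutes.
persistent = simulate(range(0, 601, 10))
# A user who gives up after 100 s and tries once more at 400 s.
gave_up = simulate(list(range(0, 101, 10)) + [400])

print("persistent client, last attempt allowed:", persistent[-1])
print("user who paused, last attempt allowed:", gave_up[-1])
```

In this model the client that never stops retrying is refused indefinitely, while the user who pauses gets back in. It does not explain the open point above, namely that the sessions in the log are fully set up before being closed.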
LOG EXCERPT:
(see attachment)
(Update)
After a normal weekend and a flawless Tuesday, the issue has returned out of nowhere this Wednesday morning.
We're opening a TAC case anyway.
(/Update)
Oh well, yes, the Anti-DoS feature does indeed exist; the JTAC engineer working on Case 2014-0611-0138 pointed us towards it:
http://kb.pulsesecure.net/InfoCenter/index?page=content&id=KB20843
I'll now go ahead and figure out if it actually was this feature keeping the users locked out.
cheers
Marc
Hi all
It turns out it wasn't the Anti-DoS/Lockout feature that hurt us, as there were no log messages as described in http://kb.pulsesecure.net/InfoCenter/index?page=content&id=KB16569.
The users were being constantly logged out, and sessions got terminated after a few seconds, because of the non-unique GUIDs of the Junos Pulse clients (which were disk-cloned instead of properly installed).
The topic is described in
http://kb.pulsesecure.net/InfoCenter/index?page=content&id=KB16569
http://kb.pulsesecure.net/InfoCenter/index?page=content&id=KB25581
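For anyone wanting to check a fleet of cloned machines for the same problem, a minimal Python sketch follows. It assumes you have already collected each client's Pulse GUID into a hostname-to-GUID mapping by some means of your own; how and where the GUID is read on the client is not covered here (see the KB articles above). The hostnames, GUID values and the function name `find_duplicate_guids` are purely illustrative.

```python
from collections import defaultdict


def find_duplicate_guids(inventory):
    """Return {guid: [hosts, ...]} for every GUID reported by more than one host.

    inventory maps hostname -> GUID string, collected however you like.
    """
    by_guid = defaultdict(list)
    for host, guid in inventory.items():
        # Normalise so that case or stray whitespace does not hide a duplicate.
        by_guid[guid.strip().lower()].append(host)
    return {guid: sorted(hosts) for guid, hosts in by_guid.items() if len(hosts) > 1}


# Illustrative data only: two cloned notebooks sharing one GUID.
inventory = {
    "nb-001": "AAAA1111",
    "nb-002": "aaaa1111",
    "nb-003": "BBBB2222",
}

for guid, hosts in find_duplicate_guids(inventory).items():
    print(guid, "is shared by", ", ".join(hosts))
```

Any GUID that shows up for more than one host points at machines that were imaged with Pulse already installed.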
cheers and thx to the JTAC engineer helping us out.
Marc