We have 2 x Juniper 4500 FIPS SSL VPN appliances in Active mode, running version 6.5R2 (build 14951). We use Network Connect with no split tunnelling.
We have a simple setup: once connected, we go to a URL for Citrix and browse applications from there. We also have the Outlook fat client active. Users are complaining that they get disconnected at random times; for some the client still says connected, while others go into a disconnected state. We have found we can reproduce the problem if we open lots of applications in Citrix, try to stream the BBC in Citrix, or try to pull files down from Citrix to the local laptop. It appears that the more bandwidth we use, the more likely Network Connect is to freeze and sometimes reconnect.
We have a call open with JTAC and this was their response: 'SSL is not as reliable as oNCP ESP4500 UDP.' But we cannot use that protocol on a FIPS device.
Our external web link is 20meg and we are only using 10meg.
Has anyone else had this problem?
We have a somewhat similar problem. 2 x SA6000 FIPS. For us it is the WSAM session that appears to freeze or lock up. The interesting thing is that it only happens at one office. One would think that this indicates a network problem at that office, and yet all of their other applications are fine at the time of the freeze. Also, if we take one of these users and have them bypass the Juniper to access the application, they run into no problems whatsoever. If we direct one of the users to our lab 2 x SA2000 cluster (in order to cut down on the amount of trace data), running the same level of code and definitions, they cannot reproduce the problem there either. I have been poking around the TCP dump taken when one of these events happened, and on a number of occasions I see the SSL-VPN setting the TCP window size to zero towards the backend application. However, I have no method of mapping that backend (internal interface) connection to the user on the external side reporting the problem, so it may be a red herring. As in your case, the problem seems to occur when users are busy doing lots of transactions, but the bandwidth utilization is still very low. I also have a case open but it appears to be languishing. Unfortunately I have no answers and can only commiserate with you at this time.
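For anyone who wants to pull the zero-window segments out of a capture without eyeballing it, here is a rough sketch of how that could be done. It is only an illustration: it assumes a classic (non-pcapng) capture file with Ethernet framing and IPv4, ignores VLAN tags and fragments, and the function name find_zero_windows is my own. It skips RST segments, which often carry a window of zero for unrelated reasons.

```python
import socket
import struct

def find_zero_windows(pcap_bytes):
    """Return (timestamp, src_ip, src_port, dst_ip, dst_port) for every
    non-RST TCP segment that advertises a receive window of 0."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("this sketch only handles Ethernet captures")

    hits = []
    offset = 24               # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 14 + 20 + 20:
            continue          # too short for Ethernet + IPv4 + TCP
        if frame[12:14] != b"\x08\x00":
            continue          # EtherType is not IPv4
        if frame[14] >> 4 != 4 or frame[23] != 6:
            continue          # not IP version 4, or protocol is not TCP
        ihl = (frame[14] & 0x0F) * 4
        tcp = frame[14 + ihl:]
        if len(tcp) < 20:
            continue
        sport, dport = struct.unpack("!HH", tcp[:4])
        flags = tcp[13]
        window = struct.unpack("!H", tcp[14:16])[0]
        if window == 0 and not flags & 0x04:   # 0x04 is the RST bit
            hits.append((ts_sec + ts_usec / 1e6,
                         socket.inet_ntoa(frame[26:30]), sport,
                         socket.inet_ntoa(frame[30:34]), dport))
    return hits
```

Feeding it the raw bytes of a capture file (read in binary mode) gives a list of timestamps and endpoints, which at least narrows down which backend connections were being throttled and when.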
If the SA is sending packets to reduce the TCP window size to zero, is it possible that one of the receive buffers on the SA is getting full? There was a recent bug, now fixed, that would cause such symptoms: specific connections between the SA and the WSAM client would freeze, which ultimately results in full receive buffers on the SA side (as backend traffic is not being sent to the client), and the full receive buffers trigger the zero-window packets to the backend. The fix is being targeted for 6.5R6.
Below are some of the conditions under which one may experience the issue of the SA sending TCP zero-window packets to the backend:
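To illustrate the general mechanism (this is a toy model of TCP flow control, not how the SA is actually implemented), a receiver advertises its free buffer space as the window. If the client side stops draining the buffer while the backend keeps sending, the advertised window drops to zero. The function name and the numbers below are made up for illustration.

```python
def simulate_receive_buffer(capacity, backend_rate, client_rates):
    """Toy model: return the window advertised to the backend after each tick.

    capacity      -- receive buffer size in bytes
    backend_rate  -- bytes the backend tries to send per tick
    client_rates  -- bytes forwarded to the client in each tick
    """
    buffered = 0
    windows = []
    for drained in client_rates:
        buffered -= min(buffered, drained)     # forward data to the client
        free = capacity - buffered             # space the backend may fill
        buffered += min(backend_rate, free)    # backend sends what fits
        windows.append(capacity - buffered)    # window advertised afterwards
    return windows

# Client keeps up for two ticks, then the client-side connection freezes.
print(simulate_receive_buffer(4000, 1000, [1000, 1000, 0, 0, 0, 0]))
# prints [3000, 3000, 2000, 1000, 0, 0]
```

Once the window reaches zero the backend has to stop sending, so from the backend's point of view the session looks hung even though the network itself is fine.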
1. End users accessing backend applications via WSAM
2. SA running 6.5R1 or higher
And I should mention that the issue was very sporadic, so not all users may experience it.
Note: I'm not sure if this is exactly what your end users are experiencing, as you have not been able to tie the TCP zero-window packets to that user. However, if the time of occurrence and the backend servers that were being accessed match the TCP zero-window packets, then maybe this is the culprit in your case.
6.5R6 is scheduled for August; however, please work with JTAC to get the dates and release notification.
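One way to do that matching, sketched very roughly: take the times and backend servers from the user reports, take the zero-window packets from the trace, and see which ones fall close together. Everything here is hypothetical (the function name, the 30-second tolerance, the sample addresses), and it assumes the clocks on the capture and on the user reports are reasonably in sync.

```python
from datetime import datetime, timedelta

def match_freezes(freezes, zero_windows, tolerance=timedelta(seconds=30)):
    """Pair each reported freeze with nearby zero-window packets.

    freezes      -- list of (user, backend_ip, reported_time)
    zero_windows -- list of (backend_ip, seen_time) taken from the trace
    Returns a list of (user, backend_ip, [matching seen_times]).
    """
    matches = []
    for user, backend, reported in freezes:
        nearby = [seen for ip, seen in zero_windows
                  if ip == backend and abs(seen - reported) <= tolerance]
        matches.append((user, backend, sorted(nearby)))
    return matches

# Hypothetical sample data, only to show the shape of the inputs.
t0 = datetime(2010, 7, 1, 10, 0, 0)
reports = [("user1", "10.1.1.5", t0)]
trace = [("10.1.1.5", t0 + timedelta(seconds=10)),
         ("10.1.1.5", t0 + timedelta(minutes=5))]
for user, backend, times in match_freezes(reports, trace):
    print(user, backend, len(times), "zero-window packet(s) near the report")
```

A report with no nearby zero-window packets suggests that user's freeze has a different cause, which is worth knowing before chasing this particular bug.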
Hope it helps!
Thank you for this information, ruc.
Your description of the problem seems to be a potential match. Is there some kind of reference number or case number I can refer JTAC to which would help speed things along?
Now having said the above there are two very odd things about what we are seeing.
1. It is only happening for users in one office location.
The networking group has practically replaced every network device there but that has not changed
anything. It was running clean prior to the change and it is running clean after. If the user is set up
so they can bypass the SSL-VPN then they have no trouble. All other applications work fine.
2. If we direct some of the affected users to our lab SSL-VPN (2 x SA2000, active/passive -
matching our 2 x SA6000 FIPS active/passive) then they work just fine. Now given that the
problem is sporadic this may not indicate much but even today before the user went over
to the lab SSL-VPN they were getting hung >5 times - after moving over to the lab - zero hangs.
Thank you again for this information. I really hope it is related.
As the issue is triggered by the timing of certain events (the rate at which the SA receives data from the backend, the rate at which the SA sends data to the client, etc.), I'm not surprised by this erratic behaviour, with the issue occurring only on the production device and not on the lab device. BTW, the internal ref # is 521275, and you should be able to spot this in the release notes for 6.5R6 as well (when released).
Can you please send me your open JTAC case # via a private message?
Thanks for your comments Colin, Ruchit.
Whilst our problem is with NetConnect, I have tested IE via Citrix -> WSAM and found that the window froze and Citrix went into a 'Reconnecting' state. At the same time, Outlook connectivity was absolutely fine.
Based on these results, is it possible that the TCP window size problem could affect Network Connect too? Could it be a case of too much bandwidth, even though we have very few users connecting via this method (it is verging on unusable for some people)? Or is this limited to WSAM only?
It's hard to say for sure, but based on the description it seems like the issue Colin is running into and the one you are describing are different. The issue that Colin ran into has only been reported by WSAM users.
If you are able to replicate the issue then JTAC should be able to provide you detailed steps of what tcp traces and debug logs you can collect on the SA to get to the bottom of your issue.
We have had a case open for about a month, probably longer in fact, and all traces etc. have been provided on a number of occasions.
Just thinking outside the box and wondered if others may have faced the same problem.
Hi Matt, Ruc
We only see it happening to WSAM users as we do not run Network Connect. Currently the only people using the SSL-VPN are accessing a backend web app via WSAM. (WSAM because of rewriting issues)
Alas it looks like the window closing PR may not be applicable in our case. I have half a dozen other traces now and the IVE closing the window shows up in only one of them.
We are planning on failing over to the currently passive member of the cluster on Wed evening to see if that helps. (That is, is the problem with the hardware somewhere (NIC?), or is it because of some rot in memory?)
JTAC has finally responded to my pings so maybe we will see some traction now.