This is just a guess, but I suspect that you're resolving to different addresses part way through the process: you send the initial packets to box-A and then start sending the remaining packets needed to complete the exchange to box-B. Since box-B didn't see the first part of the exchange, it will probably drop the packets as part of an unrecognised session, and box-A will sit there waiting patiently for packets that never arrive. When it works, I suspect that you've tried to create the connection to box-B, eventually given up, re-resolved and gone back to box-A, which says "where have you been?" and finally completes the login.
To be honest, I really don't think that DNS round robin is a viable approach in this situation. You need something like Global Server Load Balancing that has sticky state for each session. This will guarantee that sessions are spread across the boxes but each session sticks to just one box.
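To make the guess above concrete, here is a toy simulation, not anything taken from the IVE itself. The names Box and run_login are made up for illustration, and it assumes each box keeps its own session table with no sharing between them. Without stickiness, every step after the first goes to the next box in the rotation; with stickiness, every step stays on the first box.

```python
from itertools import cycle


class Box:
    """Hypothetical stateful device that only knows sessions it started."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, session_id, step):
        # Step 0 opens the session; later steps need existing state.
        if step == 0:
            self.sessions.add(session_id)
            return "started"
        return "ok" if session_id in self.sessions else "dropped"


def run_login(boxes, steps, sticky):
    """Send `steps` requests for one session and record who answered."""
    rotation = cycle(boxes)
    first = next(rotation)
    results = []
    for step in range(steps):
        # Sticky: always the first box. Otherwise: re-resolve on every
        # later step and land on whichever box comes next in rotation.
        box = first if (step == 0 or sticky) else next(rotation)
        results.append((box.name, box.handle("sess-1", step)))
    return results


print(run_login([Box("box-A"), Box("box-B")], steps=3, sticky=False))
print(run_login([Box("box-A"), Box("box-B")], steps=3, sticky=True))
```

In the non-sticky run the second step lands on box-B, which has never seen the session, and that is the kind of stall I suspect you're hitting.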
I've used DNS Round Robin before for various things. It is the poor man's load balancer and it usually works well. For example, we use it with MX records for SMTP. When someone tries to resolve our mail record, they get 1 of 2 possible IP addresses. There is no 10 minute delay.
What should happen is this: the client queries DNS, the name resolves to 1 of the 2 IP addresses, and the client then starts a conversation with that IP.
I'd like to see a trace of the conversation. I'd make sure the DNS Cache is cleared first. "ipconfig /flushdns".
I've never heard of anyone setting up an IVE with DNS Round Robin. I wonder if the client gets confused. Are they both using the same certificate?
DNS round robin does work well, until the devices you're round-robining through maintain state. If they do, you're going to have issues because you may well end up being pushed to different boxes, particularly if the device you are conversing with is a web server returning HTML containing numerous links to other *hostnames*. If you get sent to one of those hostnames, you may well try to resolve it and end up somewhere else. :-(
With mail, you resolve the mailserver's name once and just set up a session to the IP address. That session doesn't return any information that requires another lookup, so you generally stick to the one address for the duration of the session.
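A rough sketch of that "resolve once, then stick to the IP" pattern, in Python. PinnedSession is a made-up name, and the resolver argument is only there so the lookup can be swapped out for testing. The real lookup is the standard socket.getaddrinfo call.

```python
import socket


def resolve_once(hostname, port=443, resolver=socket.getaddrinfo):
    """Do a single lookup and return the first address it yields."""
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return infos[0][4][0]


class PinnedSession:
    """Hypothetical client session that never re-resolves its hostname."""

    def __init__(self, hostname, port=443, resolver=socket.getaddrinfo):
        self.hostname = hostname
        self.port = port
        # The only DNS lookup for the lifetime of the session.
        self.address = resolve_once(hostname, port, resolver)

    def target(self):
        # Every request in the session goes to the same (ip, port).
        return (self.address, self.port)
```

A browser following links to other hostnames doesn't behave like this, which is why the web case goes wrong where the mail case doesn't.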
The best approach for this with any web based device is a load balancer that understands stickiness.
Are you using "clustered" SAs, or two "stand-alone" devices?
I'm not really sure how much session information is passed in the cluster, but I expect it would make a difference...
DNS Round Robin:
We have 2 SA4000 IVEs in different geographical locations. Each physical IVE has different external IPs. Our external DNS host entry (host.domain.com) has both devices' IPs associated with it. DNS by nature should alternate between the addresses, so every other request should be directed to a different IVE. This has worked well for us. We see a 55% to 45% load balance between the IVEs. We use IE to initiate/authenticate sessions. If an IVE is unavailable and it is the IVE that DNS directs the request to, IE will attempt the request again. The next time, the request should be directed to the other address. The sign-on page may take a few seconds longer to load because of the time it takes IE to make the second request, but we've never had anyone complain since connectivity is always available.
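For anyone wanting to check that failover behaviour from a script, something like the following would do. It is only a sketch and not a description of how IE does it internally. The function names are made up, and the addresses in the usage comment are placeholders from the documentation range.

```python
import socket


def tcp_probe(address, port=443, timeout=3.0):
    """Open and immediately close a TCP connection; raises OSError on failure."""
    with socket.create_connection((address, port), timeout=timeout):
        pass


def first_reachable(addresses, try_connect=tcp_probe):
    """Try each address in order and return the first one that answers."""
    for address in addresses:
        try:
            try_connect(address)
            return address
        except OSError:
            # Unreachable or timed out: fall through to the next address.
            continue
    return None


# Example (placeholder addresses, not our real IVE IPs):
# first_reachable(["192.0.2.10", "192.0.2.20"])
```

The extra few seconds on the sign-on page correspond to the timeout spent on the first address before the second one is tried.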
We have a true IVE Active/Active cluster. They are not stand-alone, so session information spans both IVEs.