I was wondering if someone could point me at any documentation about the error log format for the Stingray Traffic Manager.
We have turned on error logging for some of our services. We now get quite a lot of output in /usr/local/zeus/log/errors.
I would like to be able to interpret this information and see if there are any system tunables that could be changed to reduce the error rates.
Specifically we are seeing large numbers of lines like below (removed IPs and service name):
[04/Jan/2012:02:38:46 +0000] INFO ServiceName connerror "Timing out new connection" REMOTEIP:25365 LOCALIP:80 "-" "-" - "-" R 0 0 0 0 16 17 0 0 - - -
Thanks for any info.
The format of the log messages is as follows:
The leading letter is interpreted as:
'a' - 'Closed'
'f' - 'Complete'
'R' - 'Client Read'
'W' - 'Client Write'
'X' - 'Request Rules'
'q' - 'Queued'
'c' - 'Node Connect'
'w' - 'Node Write'
'r' - 'Node Reading'
'C' - 'Client Close'
'K' - 'Keep-alive'
'I' - 'Idle'
The numbers each relate to the following, in this order:
Bytes from client
Bytes to server
Bytes from server
Bytes to client
Client Idle Time
Server Idle Time
So in your case, you have a client read which sent no data in any direction because it timed out.
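For anyone who wants to decode these lines in bulk, here is a rough Python sketch based only on the lists above. The function and field names (`parse_tail`, `bytes_from_client`, etc.) are my own invention, not anything shipped with the product. The sample line carries eight numbers while only six are named above, so the sketch keeps the remaining ones under `unnamed` rather than guessing what they mean.

```python
import re

# State letters exactly as listed in the reply above.
STATES = {
    "a": "Closed", "f": "Complete", "R": "Client Read", "W": "Client Write",
    "X": "Request Rules", "q": "Queued", "c": "Node Connect", "w": "Node Write",
    "r": "Node Reading", "C": "Client Close", "K": "Keep-alive", "I": "Idle",
}

# Hypothetical field names for the six counters named in the thread, in order.
COUNTERS = [
    "bytes_from_client", "bytes_to_server", "bytes_from_server",
    "bytes_to_client", "client_idle_time", "server_idle_time",
]

# A single state letter followed by a run of integers.
TAIL = re.compile(r"\s([A-Za-z])((?:\s\d+)+)")


def parse_tail(line):
    """Decode the state letter and counters at the end of a connerror line."""
    # Only look at the text after the last quoted field.
    tail = line.rsplit('"', 1)[-1]
    match = TAIL.search(tail)
    if match is None:
        return None
    numbers = [int(n) for n in match.group(2).split()]
    return {
        "state": STATES.get(match.group(1), "unknown"),
        "counters": dict(zip(COUNTERS, numbers)),
        # Numbers beyond the six documented above are kept as-is.
        "unnamed": numbers[len(COUNTERS):],
    }
```

Feeding it the line from the original question gives state 'Client Read' with all four byte counters at zero, which matches the explanation above.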
I have the same problem on my site (i.e. lots and lots of "Timing out new connection" errors); actually, I am not sure whether it is a real problem or some logging artifact that can be ignored.
This btw is a good question for a community site, so why don't we compare our numbers?
In my case the ratio is between 1:30 and 1:50,
i.e. for every 30-50 requests on a particular vhost on my traffic manager, I get 1 "timing out" error.
What are your numbers? Everybody else reading this - can you compare and tell us your findings?
Enable logging temporarily in
Virtual Servers > SERVERNAME > Connection Management > Connection Error Settings > log!client_connection_failures
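If you want to compare numbers, a small Python sketch like the following could count the "Timing out new connection" lines per service once the logging is on. It assumes the line layout shown in the first post; the function names are made up for illustration, and the total request count has to come from somewhere else (e.g. your request logs), since the error log does not contain it.

```python
import re
from collections import Counter

# Timestamp in brackets, level, service name, then connerror "message".
# The leading "[" is optional in case it was lost when copying the line.
LINE = re.compile(r'^\[?[^\]]*\]\s+\S+\s+(\S+)\s+connerror\s+"([^"]*)"')


def count_connerrors(lines, message="Timing out new connection"):
    """Count connerror lines with the given message, keyed by service name."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(2) == message:
            counts[match.group(1)] += 1
    return counts


def requests_per_error(total_requests, errors):
    """Return how many requests were served per logged error."""
    return total_requests / errors if errors else float("inf")
```

Usage would be something like `count_connerrors(open("/usr/local/zeus/log/errors"))`, and then `requests_per_error(total, counts["ServiceName"])` with a request total taken over the same time window.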
My sites are primarily B2C sites (news, entertainment), so I have lots of residential users with dialup/DSL;
further research has shown that most of the logged IPs had hostnames like dsl-something, cable-something etc., which suggests they were used for residential access.
Over the years, I have learned to safely ignore all those zillions of "connection reset by peer" errors found in the log of every busy webserver; that's something that just happens, since you can't expect each and every connection to be handled correctly. "Timing out new connection", however, is obviously a Zeus-specific message, so I lack that experience with it.
I have opened a support case for this issue, and so far they have recommended checking the listen queue size and other
TCP parameters as described in http://blogs.riverbed.com/stingray/2005/09/tuning-zeus-traffic-manager-for-maximum-performance.html. That, however, has had no effect on the number of error messages.
>>So in your case, you have a client read which sent no data in any direction because it timed out.
Sounds like Chris is assuming a generic network related communication problem. If I knew that other sites have
the same percentage of log entries, I would think that they probably can be ignored.
Hi Michael, Ulrich,
These errors are related to client behaviour and in most cases, can be safely ignored.
The 'Timing out new connection' error is raised if the traffic manager accepts a new connection, but the client does not send any data within the virtual server 'connect_timeout' period (default 10 seconds). The traffic manager will close these connections to free up resources (generally a file descriptor and a few KB of memory).
You can demonstrate this by opening a telnet session to the traffic manager and not sending anything; after 10 seconds, the traffic manager will close the session and write the log entry (if you have the log!client_connection_failures setting enabled).
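If you'd rather script the telnet test, a minimal Python sketch could look like this. It simply connects, sends nothing, and measures how long it takes until the other side closes the connection. The function name and the `max_wait` parameter are my own; host and port are whatever your virtual server listens on.

```python
import socket
import time


def time_until_close(host, port, max_wait=30.0):
    """Connect, send nothing, and return the seconds until the peer closes.

    Returns None if the connection is still open after max_wait seconds.
    """
    start = time.monotonic()
    deadline = start + max_wait
    with socket.create_connection((host, port), timeout=max_wait) as sock:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                # An empty read means the peer closed the connection.
                if not sock.recv(4096):
                    return time.monotonic() - start
            except socket.timeout:
                return None
            except ConnectionResetError:
                # A reset also counts as the peer dropping us.
                return time.monotonic() - start
```

With the default connect_timeout described above, I would expect a result of roughly 10 seconds against an HTTP virtual server, but treat that as something to verify rather than a guarantee.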
If you are concerned about the number of errors, you can increase connect_timeout, but this will probably not make any difference; any client that fails to write any data within 10 seconds is unlikely to do so in 30 or 60 seconds. These problems probably arise because the client's network is extremely congested or unreliable (it is interesting to note that many of the affected clients appear to come from residential DSL or cable connections), or they may be due to network probes or port scans.
One thing to highlight - some protocols such as FTP begin with a server hello statement. The client will wait until it hears the server hello before writing any data. If you are managing such a protocol, and you misconfigure the traffic manager to use a client-first protocol (http://community.riverbed.com/t5/Answers/Server-first-and-Client-first-Protocols/td-p/16478) then the connection will stall and you'll get this error every time.
Just a short follow-up for those who may find this thread later by looking for the "timing out new connection" message:
Like I said I had opened a support case to find out what was going on. Eventually, I received the
following explanation which seems logical to me:
Quoted from Support message -
The error messages you are seeing normally don't indicate any problem; I will explain why.
Take this one for example:
[30/Mar/2012:15:37:06 +0200] INFO vservers/minisites connerror "Timing out new connection" 126.96.36.199:49219 188.8.131.52:80 "-" "-" - "-" R 0 0 0 0 11 12 0 0 - - -
This shows that the client establishes a TCP connection (after the 3-way handshake) and then
does not send any request on that established connection; the connection is then closed after the
timeout is hit. The 3 consecutive "-" normally show the data sent from/to the client and back-end,
but here they are empty. One reason for such connections is the way Google Chrome works: it
establishes two connections every time a user requests a URL. This is a browser
feature to speed up the process, i.e. if the first TCP connection is lost, the second is
used to send the request to the server, and if one of the connections succeeds, the other is
ignored and hence times out on the server side. You can search for this on the
internet; for one example, see
http://stackoverflow.com/questions/4460661/what-to-do-with-chrome-sending-extra-requests
This is just one example that I'm aware of that causes these errors; it could also be some
other client using the same technique, or a buggy client.
end of quote -
This would explain the constant ratio between hits and error messages, and it also clarifies what the log message means. My personal conclusion is to ignore these messages in the future, like we all ignore "connection reset by peer" and others. And BTW, the support guys also recommended enabling 'log!client_connection_failures' only temporarily, for debugging.