I started getting numerous errors yesterday on my IVE. Nothing had changed recently, I confirmed it wasn't due to heavy traffic, and the firewall is fine.
Minor NET24467 2008-11-18 16:54:17 - ive - [127.0.0.1] System() - internal gateway 'x.x.x.x' up.
Minor NET24467 2008-11-18 14:45:13 - ive - [127.0.0.1] System() - internal gateway 'x.x.x.x' up.
Minor NET24467 2008-11-18 09:26:18 - ive - [127.0.0.1] System() - internal gateway 'x.x.x.x' up.
Minor NET24467 2008-11-18 09:07:51 - ive - [127.0.0.1] System() - internal gateway 'x.x.x.x' up.
Minor NET24467 2008-11-18 07:42:03 - ive - [127.0.0.1] System() - internal gateway 'x.x.x.x' up.
Every time this happens, all traffic slows to a crawl for about 5-10 minutes, and the IVE doesn't log user/usage stats for that period.
I did notice the User Access log was full, but I emptied it and it still happened once last night.
I also noticed this event from yesterday:
Critical 2008/11/18 09:19:01 - [127.0.0.1] - System() - Trace Info : * assertion in assert.cc:281, void DSLogSignalHandler(int), SIGSEGV, 13 frames [0xffffe420] /home/bin/web [0x807384e] [0xffffe420] /home/bin/web [0x8068e5f] /home/bin/web [0x8069228] /home/bin/web [0x8069439] /home/builds/bld13073/install/lib/libdsplibs.so [0xb77e69d6] /home/builds/bld13073/install/lib/libdsplibs.so(_ZN9DSEvntFds13runDispatcherEv+0x65) [0xb77e6c8d] /home/bin/web [0x8073df9] /lib/libc.so.6(__libc_start_main+0xa1) [0xb7c452d1] /home/bin/web [0x8060885] 0012a000-0013f000 r-xp 00000000 07:08 112742 /lib/ld-linux.so.2 0013f000-00140000 r--p 00015000 07:08 112742 /lib/ld-linux.so.2 00140000-00141000 rw-p 00016000 07:08 112742 /lib/ld-linux.so.2 0026f000-00271000 r-xp 00000000 07:08 112736 /lib/libuuid.so.1 00271000-00272000 rw-p 00001000 07:08 112736 /lib/libuuid.so.1 00294000-00296000 r-xp 00000000 07:08 ...... (it's very long, I won't post it all)
Critical 2008/11/18 09:19:01 - [127.0.0.1] - System() - Caught signal 11 (SIGSEGV)
Critical 2008/11/18 09:18:37 - System() - Watchdog restarting services (ws).
What does that mean?
Seems like a memory dump of some sort. I assume you don't have another 4000 kicking around that you could swap in? I would guess some kind of hardware problem, but that's only a guess.
Normally you should also see a system snapshot that was automatically generated under Maintenance -> Troubleshooting -> System Snapshot.
I'd download that, save the logs, and open a JTAC case. I'd save all the logs: in my experience JTAC always wants all of them, even though from my point of view only the event log, for example, was needed.
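If you want a quick overview of the saved log before sending it off, something like the following might help. It is only a rough sketch: it assumes the exported log is plain text with one event per line in the two formats pasted above (dates written either as 2008-11-18 or 2008/11/18), and the function names are made up for illustration, not anything the IVE provides.

```python
import re
from datetime import datetime

# Matches both timestamp styles seen in the pasted log lines:
# "2008-11-18 16:54:17" and "2008/11/18 09:19:01".
TS = re.compile(r"(\d{4})[-/](\d{2})[-/](\d{2}) (\d{2}):(\d{2}):(\d{2})")


def parse_events(text):
    """Return (timestamp, severity, message) tuples, oldest first.

    Assumption: one event per line, severity is the first word,
    and lines without a timestamp are ignored.
    """
    events = []
    for line in text.splitlines():
        m = TS.search(line)
        if not m:
            continue
        when = datetime(*map(int, m.groups()))
        severity = line.split(None, 1)[0]
        message = line[m.end():].lstrip(" -")
        events.append((when, severity, message))
    return sorted(events)


def gateway_up_times(events):
    """Timestamps of the 'internal gateway ... up' events."""
    return [when for when, _, msg in events
            if "internal gateway" in msg and msg.rstrip().endswith("up.")]


def critical_events(events):
    """Only the events logged with severity 'Critical'."""
    return [e for e in events if e[1] == "Critical"]
```

Reading the saved file into a string and passing it to parse_events gives a time-ordered list, so it is easy to see whether each gateway "up" message follows a watchdog restart or a SIGSEGV by a few minutes.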
We have already seen similar messages in the Event log of our active/passive SA4000 cluster.
It is even worse: the passive node does not take over VPN traffic unless we manually kick the primary node off the network.
After opening a ticket with Juniper and doing some analysis of our own, we discovered that in most cases the issue was caused by one of the following:
- the debug log was active (or)
- a policy trace was still running
This could also be caused by other troubleshooting processes. Our SA4000 system has roughly 700 concurrent users at peak hours.
I can confirm what you said about the policy trace still running. I've just seen exactly the same symptoms and error, and there was a policy trace that had probably been left running for weeks!
I have the same problem.
I use version 6.1r1-1.
I contacted JTAC about this.
I looked through the older release notes. There are more problems with the watchdog listed, and there were fixes in newer releases.