
Cluster-Problems - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'

rdit_
Regular Contributor

Hi,

We have some sporadic problems with our cluster. Without any changes having been made to the SAs, each node loses its connection to the other cluster node for about 2 minutes, roughly twice a day. After that, everything works well again. The log shows the following messages. It reads from the bottom up, meaning the most recent entry is at the top:

687949 SA-4000-10 local0 2009-04-28 15:58:37 Juniper Juniper: 2009-04-28 15:58:37 - Junp_Backup - [127.0.0.1] System()[] - Completed syncing state (33554436)
687950 SA-4000-10 local0 2009-04-28 15:58:37 Juniper Juniper: 2009-04-28 15:58:37 - Junp_Backup - [127.0.0.1] System()[] - Activated in cluster: 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'
687921 SA-4000-10 local0 2009-04-28 15:58:32 Juniper Juniper: 2009-04-28 15:58:32 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687922 SA-4000-10 local0 2009-04-28 15:58:32 Juniper Juniper: 2009-04-28 15:58:32 - Junp_Backup - [127.0.0.1] System()[] - Started syncing state
687904 SA-4000-11 local0 2009-04-28 15:58:30 Juniper Juniper: 2009-04-28 15:58:30 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687905 SA-4000-10 local0 2009-04-28 15:58:30 Juniper Juniper: 2009-04-28 15:58:30 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687906 SA-4000-10 local0 2009-04-28 15:58:30 Juniper Juniper: 2009-04-28 15:58:30 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687907 SA-4000-11 local0 2009-04-28 15:58:30 Juniper Juniper: 2009-04-28 15:58:30 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687908 SA-4000-10 local0 2009-04-28 15:58:30 Juniper Juniper: 2009-04-28 15:58:30 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
687853 SA-4000-10 local0 2009-04-28 15:58:23 Juniper Juniper: 2009-04-28 15:58:23 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now reachable from node 'Junp_Master'
686728 SA-4000-10 local0 2009-04-28 15:56:05 Juniper Juniper: 2009-04-28 15:56:02 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'
686709 SA-4000-10 local0 2009-04-28 15:56:02 Juniper Juniper: 2009-04-28 15:56:00 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
686702 SA-4000-11 local0 2009-04-28 15:56:01 Juniper Juniper: 2009-04-28 15:56:01 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' is now reachable from node 'Junp_Backup'
686625 SA-4000-10 local0 2009-04-28 15:55:49 Juniper Juniper: 2009-04-28 15:53:37 - Junp_Backup - [127.0.0.1] System()[] - Activated in cluster: 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'
686313 SA-4000-10 local0 2009-04-28 15:54:57 Juniper Juniper: 2009-04-28 15:53:37 - Junp_Backup - [127.0.0.1] System()[] - Completed syncing state (328673)
685832 SA-4000-11 local0 2009-04-28 15:53:47 Juniper Juniper: 2009-04-28 15:53:47 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' is now unreachable from node 'Junp_Backup'
685817 SA-4000-11 local0 2009-04-28 15:53:44 Juniper Juniper: 2009-04-28 15:53:44 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685617 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:11 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685618 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:13 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685619 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685620 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685621 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:13 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685622 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685626 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685627 SA-4000-10 local0 2009-04-28 15:53:14 Juniper Juniper: 2009-04-28 15:53:14 - Junp_Backup - [127.0.0.1] System()[] - Started syncing state
685597 SA-4000-11 local0 2009-04-28 15:53:11 Juniper Juniper: 2009-04-28 15:53:11 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' activated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685598 SA-4000-11 local0 2009-04-28 15:53:11 Juniper Juniper: 2009-04-28 15:53:11 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
685545 SA-4000-10 local0 2009-04-28 15:53:03 Juniper Juniper: 2009-04-28 15:53:03 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now reachable from node 'Junp_Master'
684738 SA-4000-10 local0 2009-04-28 15:51:12 Juniper Juniper: 2009-04-28 15:51:12 - Junp_Master - [127.0.0.1] System()[] - Endpoint Assurance: [Connection ID: 3] NotifyConnectionChange returned error.
684631 SA-4000-10 local0 2009-04-28 15:50:55 Juniper Juniper: 2009-04-28 15:50:55 - Junp_Master - [127.0.0.1] System()[] - Endpoint Assurance: [Connection ID: 7] NotifyConnectionChange returned error.
684608 SA-4000-10 local0 2009-04-28 15:50:52 Juniper Juniper: 2009-04-28 15:50:51 - Junp_Master - [127.0.0.1] System()[] - ReceiveMessage: Invalid Connection ID.

684493 SA-4000-10 local0 2009-04-28 15:50:37 Juniper Juniper: 2009-04-28 15:49:04 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.
684494 SA-4000-10 local0 2009-04-28 15:50:37 Juniper Juniper: 2009-04-28 15:49:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'
684486 SA-4000-11 local0 2009-04-28 15:50:36 Juniper Juniper: 2009-04-28 15:50:36 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' is now reachable from node 'Junp_Backup'
683180 SA-4000-11 local0 2009-04-28 15:46:55 Juniper Juniper: 2009-04-28 15:46:55 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' is now unreachable from node 'Junp_Backup'
683135 SA-4000-11 local0 2009-04-28 15:46:47 Juniper Juniper: 2009-04-28 15:46:47 - Junp_Backup - [127.0.0.1] System()[] - Node 'Junp_Master' deactivated in cluster 'Junp_Cluster [Junp_Backup, 192.168.1.2] [Junp_Master, 192.168.1.1]'.

The nodes themselves are reachable the whole time. It is only within the cluster that they do not see each other, for some reason.
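
To see how long each outage really lasts, the "is now unreachable" and "is now reachable" messages in the log above can be paired per node. This is only a minimal sketch in Python: the field layout is inferred from the lines posted here, the function name `outage_windows` is made up for illustration, and it uses the device's own event timestamp (the one after "Juniper:") rather than the syslog receive time.

```python
import re
from datetime import datetime

# Device event timestamp (after "Juniper: ") plus the reachability message.
# Layout is assumed from the log lines posted in this thread.
EVENT = re.compile(
    r"Juniper: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - \S+ - .*"
    r"Node '([^']+)' is now (unreachable|reachable) from node '([^']+)'"
)


def outage_windows(lines):
    """Return (observer, peer, start, end, seconds) for each outage."""
    events = []
    for line in lines:
        m = EVENT.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # (time, observing node, peer node, new state)
            events.append((ts, m.group(4), m.group(2), m.group(3)))

    # The log is posted newest-first, so sort into chronological order.
    events.sort(key=lambda e: e[0])

    open_since = {}
    windows = []
    for ts, observer, peer, state in events:
        key = (observer, peer)
        if state == "unreachable":
            # Keep the first "unreachable" if it is reported more than once.
            open_since.setdefault(key, ts)
        elif key in open_since:
            start = open_since.pop(key)
            seconds = int((ts - start).total_seconds())
            windows.append((observer, peer, start, ts, seconds))
    return windows


# Two lines copied from the log above.
sample = [
    "685545 SA-4000-10 local0 2009-04-28 15:53:03 Juniper Juniper: 2009-04-28 15:53:03 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now reachable from node 'Junp_Master'",
    "684494 SA-4000-10 local0 2009-04-28 15:50:37 Juniper Juniper: 2009-04-28 15:49:14 - Junp_Master - [127.0.0.1] System()[] - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'",
]

for observer, peer, start, end, seconds in outage_windows(sample):
    print(f"{observer} lost {peer} for {seconds}s ({start} to {end})")
```

For the two sample lines this reports that Junp_Master lost Junp_Backup for 229 seconds. Running it over a full day of log lines would show whether both nodes report the outage at the same time or one after the other.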

I really have no idea what to do here. Any help?

Version 6.0R3

Message Edited by rdit on 04-28-2009 11:13 PM
Felix-BDG_
Occasional Contributor

Re: Cluster-Problems - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'

Dear rdit,

I am not an expert, but I had this problem at a customer site a few days ago.

Within a cluster I rebooted SA-1 while SA-2 was master. When SA-1 came back up, it could not rejoin the cluster. The same entries appeared in the log for more than 30 minutes.

This caused continuous automatic failover between the nodes. To solve this I shut down SA-1; SA-2 remained master and everything was fine.

The customer checked their switch and router environment and made some changes. After that, SA-1 was started again. Now everything works fine.

Maybe you can check the switch for fast port detection or spanning tree settings.

Regards,

Felix

rdit_
Regular Contributor

Re: Cluster-Problems - Node 'Junp_Backup' is now unreachable from node 'Junp_Master'

Hey, thanks for your reply!

I checked the switch and I can see 150 giants on the switch port, but that's it.

Today I saw that behaviour again for about 6-7 minutes. Both nodes are pingable, and the cluster IP as well, but they no longer serve any users: you cannot connect to the IVEs, neither as admin nor as user. Even the console is really slow.

After a while it works again for some reason, but I'm pretty worried about that, because that's not how a cluster should work. If one node fails and the other takes over, OK, but if both nodes are available and the cluster does not serve any users and nothing is working, then the cluster is kind of useless.

Any ideas on that?