Unable to reconnect to ZooKeeper service, session expired #7564

Open
QiulanHuang opened this issue May 3, 2024 · 5 comments

Comments

@QiulanHuang

Dear all,

Recently, we noticed that some pools fail to reconnect to the ZooKeeper service, reporting that the session has expired. The pool has to be restarted to recover. The error log is listed below.

03 May 2024 10:11:46 (System) [] Session 0x10041c40b758d24 for server dczoo01.usatlas.bnl.gov/10.42.34.241:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionExpiredException: Unable to reconnect to ZooKeeper service, session 0x10041c40b758d24 has expired
        at org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1432)
        at org.apache.zookeeper.ClientCnxnSocket.readConnectResult(ClientCnxnSocket.java:154)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:86)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1283)
03 May 2024 10:11:46 (System) [] Invalid config event received: {server.2=dczoo03.usatlas.bnl.gov:2888:3888:participant, server.1=dczoo02.usatlas.bnl.gov:2888:3888:participant, server.0=dczoo01.usatlas.bnl.gov:2888:3888:participant, version=0}

We are using dCache 9.2.17. The issue also occurred with the older version 9.2.6.
The root cause is not clear. I didn't notice any network problems between the pools and the ZooKeeper servers.

Regards,
Qiulan

@lemora
Member

lemora commented May 7, 2024

Hi Qiulan.

Which ZooKeeper version are you using? You have a cluster of 3 standalone ones, right?
The line Invalid config event received indicates that your ZooKeeper configuration is wrong; please check it.
Since 6.2, dCache requires the associated ZooKeeper servers to be version 3.5 or above, but the configuration format changed with ZooKeeper 3.5.0. Previously, the zoo.conf files could contain a separate line stating the client port, clientPort=2181; the port now needs to go at the end of each server line, after a semicolon: server.1=os-zk-node01:2888:3888;2181. As far as I know, newer versions were backward compatible and this was just a warning, but maybe this has changed.

As for the SessionExpiredException (Unable to reconnect to ZooKeeper service): is your ZooKeeper cluster stable, and is anything logged on the ZK side?

Lea
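The format check Lea describes can be sketched as a small script. This is a minimal sketch, not part of dCache or ZooKeeper: the function name servers_missing_client_port and the regular expression are hypothetical, and it only tests whether a server.N entry ends in a ";clientPort" suffix. The sample entries are copied from the Invalid config event line in the original report; whether the missing suffix is what triggers that message is not confirmed in this thread.

```python
import re

# Hypothetical helper for illustration only.
# Splits "server.N=<quorum address>[;<client address>]" into its parts.
SERVER_LINE = re.compile(r"^server\.(\d+)=([^;]+)(?:;(.+))?$")


def servers_missing_client_port(lines):
    """Return the ids of server.N entries that have no ';clientPort' suffix."""
    missing = []
    for line in lines:
        match = SERVER_LINE.match(line.strip())
        # Lines that are not server.N entries (comments, clientPort=...) are skipped.
        if match and match.group(3) is None:
            missing.append(int(match.group(1)))
    return missing


# Entries as printed in the "Invalid config event received" log line.
logged_entries = [
    "server.0=dczoo01.usatlas.bnl.gov:2888:3888:participant",
    "server.1=dczoo02.usatlas.bnl.gov:2888:3888:participant",
    "server.2=dczoo03.usatlas.bnl.gov:2888:3888:participant",
]
print(servers_missing_client_port(logged_entries))

# The 3.5+ style entry quoted above carries the client port after the semicolon.
print(servers_missing_client_port(["server.1=os-zk-node01:2888:3888;2181"]))
```

With the entries from the log, all three server ids are reported as lacking the suffix, while the example line in the 3.5+ format is not.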

@QiulanHuang
Author

Hello Lea,

Thank you for your reply.

We are using ZooKeeper 3.6.3. The logs on the ZK side show that the session expired too.

2024-05-03 10:11:46,254 [myid:0] - INFO  [QuorumPeer[myid=0](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1059] - Invalid session 0x10041c40b758d24 for client /10.42.38.120:43812, probably expired

Thanks,
Qiulan
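The pool-side and server-side messages in this thread refer to the same session id. When there are many pools and longer logs, that matching can be scripted. The following is a minimal sketch under the assumption that session ids always appear as 0x-prefixed hexadecimal strings, as they do in the lines quoted here; the helper names session_ids and shared_sessions are hypothetical.

```python
import re

# Assumption: session ids are logged as 0x-prefixed hexadecimal strings.
SESSION_ID = re.compile(r"\b0x[0-9a-fA-F]+\b")


def session_ids(log_text):
    """Return the set of session ids mentioned in a log excerpt."""
    return set(SESSION_ID.findall(log_text))


def shared_sessions(client_log, server_log):
    """Return the session ids that appear in both excerpts, sorted."""
    return sorted(session_ids(client_log) & session_ids(server_log))


# One line from the pool log and one from the ZooKeeper log, as quoted in this thread.
client_log = (
    "org.apache.zookeeper.ClientCnxn$SessionExpiredException: Unable to reconnect "
    "to ZooKeeper service, session 0x10041c40b758d24 has expired"
)
server_log = (
    "Invalid session 0x10041c40b758d24 for client /10.42.38.120:43812, probably expired"
)
print(shared_sessions(client_log, server_log))
```

Searching the server log for the ids found this way may show when the server last heard from the client before the session was declared expired.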

@cfgamboa

cfgamboa commented May 8, 2024

Yes, we have a cluster of three nodes, and it is stable.

@kofemann
Member

@QiulanHuang Do all dCache components run 9.2.17?

@QiulanHuang
Author

Hello @kofemann, we are running in mixed mode now. Most of the nodes are running 9.2.17, and around 10 newly added pool nodes are running 9.2.20.

Thanks,
Qiulan
