Issue with Call-Home #578
Comments
Slightly improved workaround, with 3 retries and disabled logging...
PS: I see in #561 that a similar error was reported, so I adopted their idea of disabling error logging during the call-home attempt.
@sjd-xlnx, sorry for the long delay. Would you like to propose a PR? I'd also comment that a 500ms timeout (which you cite the netopeer agent as using) for a call-home attempt is quite aggressive and makes a number of assumptions about network topology and the prevailing RTT.
Hello,
I have an issue with the Call-Home procedure. I am using the netopeer2-server NETCONF server and the ncclient NETCONF client. About 25%-50% of the time, when listening for call-home, I get an error with the client:
When performing call-home, netopeer2-server cycles through its known clients with a timeout of 500ms per attempt and 3 attempts per client. (If there's only one client, it keeps retrying the same one over and over with a 500ms gap between attempts.)
From the captured pcaps, it looks like ncclient accepts the socket connection right at the edge of the server's timeout, and the server then sends a RST, which causes ncclient to throw the above error.
If I start the ncclient "call-home" listen first and then start the server "call-home", it works perfectly. It also seems to work fine on Linux. The error only occurs on Windows, and only when we catch the end of the server's timeout window.
Furthermore, when it does fail, the socket remains open in the listening state, which means further attempts fail with the following error:
OSError(10048, 'Only one usage of each socket address (protocol/network address/port) is normally permitted', None, 10048, None)
until the socket is closed. I looked at the call_home() and connect() code in manager.py, and ended up writing my own version to help with debugging. Here's what I ended up with in my code:
I added
srv_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
and the call to manager.connect() is now in a try/except block that catches any error and closes the socket. This has greatly improved the situation. I guess I could go further and retry the whole process in the event of a failure.
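For illustration, here is a minimal sketch of that workaround extended with the retry idea. It uses only the standard library. The function name `listen_for_call_home`, its parameters, and the `handler` callable (standing in for the `manager.connect()` call on the accepted socket) are assumptions made for this example, not ncclient API.

```python
import socket
import time


def listen_for_call_home(host, port, handler, retries=3,
                         accept_timeout=10.0, retry_delay=0.5):
    """Listen for one inbound call-home connection, retrying on failure.

    `handler` is a hypothetical callable that takes the accepted socket and
    returns a session object (e.g. a wrapper around manager.connect()).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        srv_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        conn = None
        try:
            # Allow the address to be rebound right after a failed attempt.
            srv_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv_socket.bind((host, port))
            srv_socket.settimeout(accept_timeout)
            srv_socket.listen(1)
            conn, _peer = srv_socket.accept()
            # On success the handler owns `conn`; we only close the listener.
            return handler(conn)
        except Exception as exc:
            # Covers accept timeouts, resets, and errors raised by the handler.
            last_error = exc
            if conn is not None:
                conn.close()
        finally:
            # Always release the listening socket so the next attempt
            # (or a later call) does not hit "address already in use".
            srv_socket.close()
        if attempt < retries:
            time.sleep(retry_delay)
    raise last_error
```

The listening socket is closed in the `finally` clause before the retry delay, so a late connection from the server cannot land on a listener that is about to be torn down.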
Are there any other suggestions for fixing this problem?