When OpenObserve runs as a syslog server, the TCP connection is closed repeatedly, almost every 60s #4067

Closed
kmephistoh opened this issue Jul 26, 2024 · 17 comments · Fixed by #4335
Labels
☢️ Bug Something isn't working testing In Testing

Comments

@kmephistoh

Which OpenObserve functionalities are the source of the bug?

ingestion

Is this a regression?

Yes

Description

When OpenObserve runs as a syslog server, the TCP connection is closed repeatedly, almost every 60s, and the client prints the annoying logs below. Even so, logs are still collected.

Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:34:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:34:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:35:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:35:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:35:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:36:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:36:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:36:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:37:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
Jul 26 15:37:16 DX-C9 rsyslogd: omfwd: remote server at 10.11.10.24:5514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2406.0 try https://www.rsyslog.com/e/2027 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2406.0 try https://www.rsyslog.com/e/2007 ]
Jul 26 15:37:16 DX-C9 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2406.0 try https://www.rsyslog.com/e/2359 ]
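
One way to check the "almost every 60s" pattern in a log like the one above is to pull out the timestamps of the "closed connection" lines and look at the gaps between them. The following is only a rough sketch: the function names and the assumed year are illustrative (classic syslog timestamps carry no year), and it relies on English month abbreviations.

```python
import re
from datetime import datetime

CLOSE_MARKER = "seems to have closed connection"
# Classic syslog timestamps ("Jul 26 15:34:16") have no year, so one is assumed.
ASSUMED_YEAR = 2024


def close_times(lines, year=ASSUMED_YEAR):
    """Return the sorted, de-duplicated timestamps of 'closed connection' lines."""
    stamps = set()
    for line in lines:
        if CLOSE_MARKER not in line:
            continue
        match = re.match(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ", line)
        if match:
            stamps.add(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return sorted(stamps)


def gaps_seconds(times):
    """Return the gaps, in whole seconds, between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

Feeding the rsyslog lines into `gaps_seconds(close_times(lines))` gives the interval between disconnects, which makes it easier to tell a fixed server-side timeout from random network drops.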

When I use rsyslog as the syslog server on the same host (10.11.10.24:514), everything is OK (no annoying logs).

image

tcpdump shows that the connection was reset (RST), so I think the cause may be on the server side. There is no firewall or load balancer in between.
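
To see the server-side close without rsyslog in the way, a small probe can open a TCP connection, send one message, then stay idle and time how long it takes for the peer to close or reset the connection. This is a diagnostic sketch only: `watch_idle_connection` and its default message are made up for illustration, and it assumes the server accepts newline-terminated syslog lines over TCP.

```python
import socket
import time


def watch_idle_connection(host, port, max_wait=120.0,
                          message=b"<14>probe: idle-connection test\n"):
    """Connect, send one message, then idle until the peer closes.

    Returns (outcome, seconds_elapsed). The outcome is "closed" for an
    orderly shutdown (FIN), "reset" for an RST, "data" if the peer sent
    something back, or "still-open" if nothing happened within max_wait.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(message)
        sock.settimeout(max_wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "still-open", time.monotonic() - start
        except ConnectionResetError:
            return "reset", time.monotonic() - start
        outcome = "closed" if data == b"" else "data"
        return outcome, time.monotonic() - start
```

For the setup described here the call would be something like `watch_idle_connection("10.11.10.24", 5514)`. An elapsed time close to 60 seconds would point to an idle or read timeout on the server rather than a network problem.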

Please provide a link to a minimal reproduction of the bug

No response

Please provide the exception or error you saw

No response

Please provide the version you discovered this bug in (check about page for version information)

openobserve v0.10.8-rc5 
os: Ubuntu 22.04.4 LTS

Anything else?

No response

@kmephistoh kmephistoh added the ☢️ Bug Something isn't working label Jul 26, 2024
@gaby
Contributor

gaby commented Jul 26, 2024

I'm pretty sure this has to do with TCP keepalive. Are you sending syslog messages during those 60 seconds?
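
For readers unfamiliar with TCP keepalive, this is roughly what turning it on looks like for a client socket. It is a sketch under assumptions: the helper name and the timing values are illustrative, the `TCP_KEEP*` options are platform-specific (hence the guards), and nothing in this thread confirms that keepalive was the actual cause.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Enable TCP keepalive probes on a socket and return it.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are not defined on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Keepalive probes carry no application data, so they only keep the connection alive at the TCP level. If a server closes connections after a period without receiving any messages, probes alone would not prevent that.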

@kmephistoh
Author

I'm pretty sure this has to do with TCP keepalive. Are you sending syslog messages during those 60 seconds?

Sorry for the late reply. Sometimes there are logs during those 60 seconds, but the result is still the same.

image

As can be seen from the above test, the same log is printed each time.

@doychi

doychi commented Aug 21, 2024

I am getting similar issues when logging to OpenObserve as a syslog server running under Unraid 6.12.11, except that I'm getting more frequent errors:

Aug 22 07:52:47 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Aug 22 07:52:47 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Aug 22 07:52:47 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]
Aug 22 07:52:49 unraid01 kernel: eth0: renamed from veth42b5b63
Aug 22 07:52:49 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Aug 22 07:52:49 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Aug 22 07:52:49 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]
Aug 22 07:52:49 unraid01 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethb277e8f: link becomes ready
Aug 22 07:52:49 unraid01 kernel: docker0: port 2(vethb277e8f) entered blocking state
Aug 22 07:52:49 unraid01 kernel: docker0: port 2(vethb277e8f) entered forwarding state
Aug 22 07:53:12 unraid01 kernel: veth42b5b63: renamed from eth0
Aug 22 07:53:12 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]
Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state
Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state
Aug 22 07:53:12 unraid01 kernel: device vethb277e8f left promiscuous mode
Aug 22 07:53:12 unraid01 kernel: docker0: port 2(vethb277e8f) entered disabled state
Aug 22 07:53:12 unraid01 rsyslogd: omfwd: remote server at 192.168.3.172:1514 seems to have closed connection. This often happens when the remote peer (or an interim system like a load balancer or firewall) shuts down or aborts a connection. Rsyslog will re-open the connection if configured to do so (we saw a generic IO Error, which usually goes along with that behaviour). [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' suspended (module 'builtin:omfwd'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2102.0 try https://www.rsyslog.com/e/2007 ]
Aug 22 07:53:12 unraid01 rsyslogd: action 'action-2-builtin:omfwd' resumed (module 'builtin:omfwd') [v8.2102.0 try https://www.rsyslog.com/e/2359 ]

@hengfeiyang
Contributor

@kmephistoh @doychi would you mind if we gave you a dev version to help test the fix?

@doychi

doychi commented Aug 27, 2024 via email

@hengfeiyang
Contributor

@doychi we only have a Docker image for the dev version. Let's wait a few days: we will release a binary version, and then I will get back to you.

@hengfeiyang hengfeiyang reopened this Aug 27, 2024
@kmephistoh
Author

@kmephistoh @doychi would you mind if we gave you a dev version to help test the fix?

OK, I'll wait here.

@hengfeiyang
Contributor

@kmephistoh can you try this version:

public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-e7e01c5

@kmephistoh
Author

public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-e7e01c5
image
Can't download

@hengfeiyang
Contributor

@kmephistoh this is a Docker image, so you can't access it in a browser. You can pull it with:

docker pull public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

or directly run it:

docker run -tdi --name o2-test -e ZO_ROOT_USER_EMAIL="[email protected]" -e ZO_ROOT_USER_PASSWORD="Complexpass#123" -e ZO_DATA_DIR="/data" -p 5080:5080 -p 5514:5514 public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

Please try this version, we added some debug logs.
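
To exercise the container started above, a test message can be sent to the mapped port 5514 over TCP. The sketch below is an illustration, not part of the project: `format_syslog` and `send_syslog` are hypothetical helpers, the hostname and tag are placeholders, and it assumes that TCP syslog ingestion is enabled on that port and that newline-terminated BSD-style (RFC 3164) lines are accepted.

```python
import socket
from datetime import datetime


def format_syslog(message, facility=1, severity=6,
                  hostname="test-host", tag="o2-test", now=None):
    """Build one BSD-style (RFC 3164) syslog line ending in a newline."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # PRI value as defined by the syslog RFCs
    now = now or datetime.now()
    # RFC 3164 pads single-digit days with a space, e.g. "Sep  2".
    timestamp = f"{now.strftime('%b')} {now.day:2d} {now.strftime('%H:%M:%S')}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {message}\n".encode("utf-8")


def send_syslog(payload, host="127.0.0.1", port=5514):
    """Send one pre-formatted syslog line over a short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Calling `send_syslog(format_syslog("hello from the test client"))` against the test container, and then checking whether the message shows up, separates "ingestion works" from "the connection stays open".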

@doychi

doychi commented Aug 30, 2024

I have installed the image and will keep an eye on it today and let you know how it goes.

@doychi

doychi commented Sep 1, 2024

The new image seems to have resolved the issue for me.

@doychi

doychi commented Sep 1, 2024

Can you let me know when this is likely to make it into the latest stable release, so I can switch back to the stable line?

@hengfeiyang
Contributor

Sure, we will make a release next week.

@kmephistoh
Author

kmephistoh commented Sep 2, 2024

@kmephistoh this is a Docker image, so you can't access it in a browser. You can pull it with:

docker pull public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

or directly run it:

docker run -tdi --name o2-test -e ZO_ROOT_USER_EMAIL="[email protected]" -e ZO_ROOT_USER_PASSWORD="Complexpass#123" -e ZO_DATA_DIR="/data" -p 5080:5080 -p 5514:5514 public.ecr.aws/zinclabs/openobserve-dev:v0.11.0-rc3-684dd39

Please try this version, we added some debug logs.

I tried this version, and the problem from this issue is resolved. However, other normal logs are sometimes lost.
Thank you for your help

@hengfeiyang
Contributor

We released v0.11.0, both of you can test it now.

@kmephistoh
Author

kmephistoh commented Sep 4, 2024

We released v0.11.0, both of you can test it now.
fc2df0a3-003d-4a92-b86c-81b4436cb7ff

The problem was resolved, thanks for your help.

6 participants