Background periodic publisher doesn't recover from a network exception #8

Open
coderfromhere opened this issue Apr 9, 2021 · 2 comments

coderfromhere commented Apr 9, 2021

If a trace collector is temporarily down, the background thread that tries to reach it is expected to survive flushSpans throwing a ConnectionFailure, but currently it does not:

HttpExceptionRequest Request {
  host                 = "localhost"
  port                 = 9411
  secure               = False
  requestHeaders       = [("content-type","application/json")]
  path                 = "/api/v2/spans"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.connect: <socket: 54>: does not exist (Connection refused))

mtth (Owner) commented Apr 11, 2021

Agreed - the current behavior is not great. The background thread should either fail the whole process on error (relevant read) or continue publishing.

In the meantime, here are a couple of suggestions to work around this:

  1. Call publish manually with adequate error handling (see the sketch after this list).
  2. (Untested) Specify a custom request manager which retries on a subset of exceptions.
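For the first suggestion, here is a minimal sketch of calling a publish action manually with error handling. It only assumes a publish-like `IO ()` action can be passed in; the wrapper name `publishForever`, the period argument, and the logging are illustrative, not part of the library:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (try)
import Control.Monad (forever, void)
import Network.HTTP.Client (HttpException)

-- Periodically run a publish action, swallowing HTTP exceptions so that a
-- temporarily unreachable collector does not kill the thread.
publishForever :: Int   -- delay between publishes, in microseconds
               -> IO () -- the publish action (e.g. the library's publish)
               -> IO ()
publishForever periodMicros publishAction = void . forkIO . forever $ do
  threadDelay periodMicros
  result <- try publishAction :: IO (Either HttpException ())
  case result of
    Left e  -> putStrLn ("publish failed, will retry next period: " ++ show e)
    Right _ -> pure ()
```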

avanov commented Apr 12, 2021

> The background thread should either fail the whole process on error (relevant read) or continue publishing.

Right, for backend daemons the only viable option is to carry on, with or without a delayed retry of the same payload, since failing the entire process is hardly desirable. How about performing another forkIO with a retry-only closure upon receiving a network exception? The number of retries could then be configured similarly to settingsPublishPeriod.
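A rough sketch of that idea, under the assumption that the publisher can hand over some `flush :: IO ()` action when it catches a network exception (`flush`, the retry count, and the delay are placeholder names, not existing settings; they would be configured alongside settingsPublishPeriod):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (try)
import Control.Monad (void)
import Network.HTTP.Client (HttpException)

-- When the periodic publisher hits a network exception, hand the failed
-- flush to a bounded retry loop on a fresh thread instead of dying.
retryInBackground :: Int   -- maximum number of retries
                  -> Int   -- delay between retries, in microseconds
                  -> IO () -- the flush action to retry (placeholder)
                  -> IO ()
retryInBackground maxRetries delayMicros flush = void . forkIO $ go maxRetries
  where
    go :: Int -> IO ()
    go 0 = putStrLn "giving up on this payload after exhausting retries"
    go n = do
      threadDelay delayMicros
      result <- try flush :: IO (Either HttpException ())
      case result of
        Left _  -> go (n - 1)
        Right _ -> pure ()
```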
