Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up timeouts in the conn and add :max_concurrent_requests_per_connection #358

Merged
merged 7 commits into from
Mar 6, 2024

Conversation

whatyouhide
Copy link
Owner

@@ -392,6 392,20 @@ defmodule Xandra do
*Available since v0.18.0*.
"""
],
max_concurrent_requests_per_connection: [
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is here so that we can test it.

# Reset in-flight requests and timed out stream IDs. We just disconnected, so all the
# in-flight requests (including the timed-out ones) are now invalid and the server should
# have killed them anyway.
data = %__MODULE__{data | in_flight_requests: %{}, timed_out_ids: %{}}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, we don't have to clean things up when disconnected.

lib/xandra/connection.ex Show resolved Hide resolved
Comment on lines 764 to 769
{nil, data} when is_map_key(timed_out_ids, stream_id) ->
:telemetry.execute(
[:xandra, :debug, :received_timed_out_response],
%{},
telemetry_meta(data, %{stream_id: stream_id})
)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to flatten the nesting.

@@ -265,8 263,7 @@ defmodule Xandra.Connection do
{:error, {:connection_crashed, reason}}
after
timeout ->
:telemetry.execute([:xandra, :client_timeout], %{}, telemetry_metadata)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this, clients can measure this themselves for now if they want. I want to first run this code in your codebase and see how things go before making this a public Telemetry event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can test this. And yeah, it is measurable from the client ofc. but I thought that it might be an interesting metric.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harunzengin it probably will be in the future, but I want to avoid committing to an API for the telemetry event until we're sure.

# for it that timed out on the caller's side. Let's just emit a
{nil, data} when is_map_key(timed_out_ids, stream_id) ->
:telemetry.execute(
[:xandra, :debug, :received_timed_out_response],
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, let's keep this a "debug" event so that you folks can measure this but we don't have to make it part of the public API.

Copy link
Contributor

@harunzengin harunzengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on my side, I'll give this a try on our staging system.

@@ -265,8 263,7 @@ defmodule Xandra.Connection do
{:error, {:connection_crashed, reason}}
after
timeout ->
:telemetry.execute([:xandra, :client_timeout], %{}, telemetry_metadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can test this. And yeah, it is measurable from the client ofc. but I thought that it might be an interesting metric.

Copy link
Contributor

@harunzengin harunzengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and there are some issues.

end
end)
)
new_timed_out_ids =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 4e2ac5e, I get the following stacktrace, just trying to execute a query:

** (exit) exited in: :gen_statem.call(#PID<0.933.0>, {:checkout_state_for_next_request, #Reference<0.0.121219.3000101197.1395982337.31962>}, :infinity)
    ** (EXIT) an exception was raised:
        ** (BadMapError) expected a map, got: []
            (erts 14.2.1) :erlang.is_map_key(21374, [])
            (xandra 0.18.1) lib/xandra/connection.ex:900: Xandra.Connection.random_free_stream_id/2
            (xandra 0.18.1) lib/xandra/connection.ex:600: Xandra.Connection.connected/3
            (stdlib 5.2) gen_statem.erl:1395: :gen_statem.loop_state_callback/11
            (stdlib 5.2) proc_lib.erl:241: :proc_lib.init_p_do_apply/3
    (stdlib 5.2) gen.erl:246: :gen.do_call/4
    (stdlib 5.2) gen_statem.erl:923: :gen_statem.call/3
    (xandra 0.18.1) lib/xandra/connection.ex:158: Xandra.Connection.execute/4
    (xandra 0.18.1) lib/xandra/retry_strategy.ex:309: Xandra.RetryStrategy.run_on_cluster/5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it seems that preparing a query doesn't work and fails with :timeout

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harunzengin I fixed the bug and added a regression test for that, it wasn't the nicest test to add but I’m happy we have a test for this now 🙃

As for the prepare, I’m not sure what you mean. I have a test in there that exercises Xandra.prepare/3 and it's passing. Do you have an example of when preparing a query fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this out, the timeouts in prepared were probably related to the first case, it is resolved now.

Copy link
Contributor

@harunzengin harunzengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All green 👍

@whatyouhide
Copy link
Owner Author

@harunzengin and no memory leaks or anything like that, correct?

@harunzengin
Copy link
Contributor

harunzengin commented Mar 6, 2024

@whatyouhide No, it runs since yesterday and no memory leaks. I guess we're ready for a patch release?

@whatyouhide whatyouhide merged commit bee3984 into main Mar 6, 2024
5 checks passed
@whatyouhide whatyouhide deleted the al/connection-improvements branch March 6, 2024 12:25
@whatyouhide
Copy link
Owner Author

Yep, right after I fix #359.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants