-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor performance in kvstore
benchmark on Linux
#162
Comments
Measuring performance with
These tests were run on a machine with other stuff running also, so we need to take these results with a grain of salt, but the conclusion seems safe that the problem exists for all recent versions of ghc (the result should be thousands of RPCs/sec, not tens). |
Running in profiling mode
we see
i.e., we're barely doing anything of interest; the time profile shows that the time that we do spend is spent in reasonable places
but again, this is out of a total time budget that amounts of basically nothing at all. |
When we analyse this with
the typical pattern we see is
where it takes ~40ms for the message to make it from the client to the handler, then the handler spending no time at all to process it, again taking ~40ms for the message to make it back from the handler to the client, and then the client spending no time at all dealing with the response. So something is going wrong with sending messages (which also explains the
where for whatever reason we don't get that 40ms delay. (Again, none of these delays are happening on OSX.) We get that same kind of "usually a ~40ms delay, but not quite always" on hte handler side also
versus
|
Digging a little deeper, enabling the cabal run ghc-events-util -- show-delta grapesy-kvstore.eventlog \
--show-from 3193512347 --show-until 3234416211 If we take a look at one of these slow runs, we unfortunately get no useful information; the only event that takes any time is the STM retry:
This means that if we are in fact blocked on some kind of foreign call, we cannot see it. Unfortunately, this may very well be because we currently cannot use the |
Running with
then, perhaps as expected, we basically see a whole lot of nothing in
This is not unexpected at this point of course because we already know that we spending most of our time waiting, and |
With
gives stuff like
(note that If we drop that
the four longer-lasting events here together take 40.26ms (with 38.56 just for the first four). |
Using a branch of
unfortunately does not give us much more useful information
|
We tried with different I/O managers on Linux (@FinleyMcIlwaine can perhaps document the details), but that did not make a difference either. (I thought I remembered that |
Even more worrying, we have noticed that a small |
This is due to not using |
The plan is to make |
Strangely, @FinleyMcIlwaine is observing much better numbers on Linux than I am (with |
The poor performance was due to |
This is a complicated problem; it happens only on Linux (not on OSX), and the usual (time) profiling don't reveal any obvious culprits. On this ticket we will collect some of the measurements, but the basic demonstration of the problem is that
reports 11.483 RPCs/sec, which should instead be as close to 25 RPCs/sec as possible (25 is the theoretical max, Java does 24.2), and
reports a measly 13.9 RPCs/sec, versus 4700 RPCs/sec for the Java benchmark (so like 2.5 orders of magnitude more). Again, on OSX this does not happen (
grapesy
is in fact roughly 30% faster than the Java benchmark there); we are getting unexpected delays on Linux somewhere.The text was updated successfully, but these errors were encountered: