
Netty server backend for http4s. #1831

Closed · wants to merge 25 commits

Conversation

@jmcardon (Contributor) commented May 8, 2018

Netty support for http4s

Much of the inspiration, and a chunk of the Netty helper code, comes from the
Play Framework's Netty server backend.
Credit to them for a lot of the glue work for Netty.

Progress

  • Server builder
  • SSL
  • MountService
    - [ ] (Maybe) Server header: not going to be supported. Can be added with middleware if needed
  • Builder tests
  • Tests similar to http1 server stage
  • Websocket support
    - [ ] Websocket tests? Difficult to test at the current moment
  • Netty Channel options (Only AUTO_READ must be set as false)

@jmcardon (Contributor, Author) commented May 8, 2018

Note on default values:

miku is being added for the sake of leveraging native transports... because speed n' stuff. Thus, I set the default transport option to Native, falling back to NIO in case it's not available.
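For reference, the Native-default-with-NIO-fallback behavior described above can be sketched like this. This is an illustrative sketch, not the PR's actual code: the names are hypothetical, and in real code the availability check would come from Netty's `io.netty.channel.epoll.Epoll.isAvailable` rather than a boolean parameter.

```scala
sealed trait NettyTransport
case object Nio    extends NettyTransport
case object Native extends NettyTransport

// Hypothetical resolver: prefer the native (epoll) transport, but fall
// back to NIO when it is not available on the current platform.
def resolveTransport(requested: NettyTransport, nativeAvailable: Boolean): NettyTransport =
  requested match {
    case Native if !nativeAvailable => Nio // graceful fallback
    case other                      => other
  }
```

With this shape, `resolveTransport(Native, nativeAvailable = false)` quietly degrades to the JDK NIO transport instead of failing at startup.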

@@ -32,7 +32,7 @@ lazy val core = libraryProject("core")
scalaReflect(scalaOrganization.value, scalaVersion.value) % "provided",
scodecBits,
scalaCompiler(scalaOrganization.value, scalaVersion.value) % "provided"
),
)
Member:

MY BEAUTIFUL TRAILING COMMAS!!!

(I like these, because you can add to the list without bringing into the diff things that don't change. But I'm not going to die on this hill.)

Member:

I support re-adding these commas! :)

build.sbt Outdated
@@ -112,6 +112,20 @@ lazy val blazeClient = libraryProject("blaze-client")
)
.dependsOn(blazeCore % "compile;test->test", client % "compile;test->test")

lazy val mikuServer = libraryProject("miku-server")
Member:

Alright, I'll be the grump here. Discoverability is important. Can a user intuit what exists from naming conventions, scanning the directory structure, etc.?

  • Does http4s support circe? http4s-circe, org.http4s.circe
  • Does it support servlets? http4s-servlet, org.http4s.servlet
  • Does it support twirl? http4s-twirl, org.http4s.twirl
  • Does it support netty servers? http4s-miku, org.http4s.miku 😕

Now, it would be tragic if your ASCII shrine to miku were lost. Could we leave it as a whimsical banner in the server package so that the glory of Miku could be shared across all backends? Use it in a tut for how to configure a server? I'm all for having a laugh, but let's make sure it's not detracting from a high quality and oft-requested module.

🎶 I don't think Hank done it this way 🎶

Contributor (Author):

Fair. I think I can rename to http4s-netty and leave the glory of Miku in my PR

import scala.collection.JavaConverters._
import scala.collection.immutable
import scala.concurrent.ExecutionContext
import scala.concurrent.duration.Duration
Member:

We've been sorting our imports lexicographically.

import scala.concurrent.duration.Duration

sealed trait NettyTransport
case object Jdk extends NettyTransport
Member:

Doesn't Netty call this Nio?

For namespacing, maybe put these in a companion object of NettyTransport, since the names don't mean much on their own?
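The namespacing suggestion would look roughly like this (a sketch of the idea, not the PR's final code):

```scala
// Nesting the cases in a companion object gives them discoverable,
// meaningful names at the call site: NettyTransport.Nio, NettyTransport.Native.
sealed trait NettyTransport

object NettyTransport {
  case object Nio    extends NettyTransport // Netty's name for the JDK transport
  case object Native extends NettyTransport // epoll/kqueue native transports
}
```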

Contributor (Author):

Will do

import scala.concurrent.ExecutionContext
import scala.concurrent.duration.Duration

sealed trait NettyTransport
Member:

I like with Product with Serializable, if only because it gets people into the habit where this inference really tends to matter.
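The inference habit being referred to: without the extra parents, the least upper bound of two case objects picks up `Product with Serializable` and leaks into inferred signatures. A minimal illustration with hypothetical names:

```scala
sealed trait Transport extends Product with Serializable
case object Jdk    extends Transport
case object Native extends Transport

// Because Transport already extends Product with Serializable, the least
// upper bound of the two cases is just Transport, so this unannotated list
// infers as the clean List[Transport] rather than the noisier
// List[Transport with Product with Serializable].
val transports = List(Jdk, Native)
```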

protected def getBootstrap(
serverChannelEventLoop: EventLoopGroup,
channelPublisher: HandlerPublisher[Channel],
address: InetSocketAddress): Bootstrap = transport match {
Member:

This logic looks tangled with the transport logic in start. Is there a good way to do this transport sniffing once? Maybe it doesn't really matter.

Contributor (Author):

I can look to reorder the code for it, sure. I wrote that at 4am

* FIFO asynchronous queue.
*
*/
sealed abstract class MikuHandler[F[_]](
Member:

If you make this package private, it's easier to change it in a binary compatible fashion when the bugs start rolling in.

Contributor (Author):

That one I meant to do. I forgot. will change

.withZone(ZoneId.of("GMT"))

// Compute the formatted date string only once per second, and cache the result.
// This should help microscopically under load.
Member:

Is this a technique that's applicable in blaze, too?

Contributor (Author):

I was actually thinking about whether I could port this performance hack over to blaze as well. Most likely yes, if the threading model is anything like Netty's. It isn't even wasted work usually, considering that I think 99% of the time people won't manually add a Date header

Member:

I think the only time I've ever manually constructed a date header was in a test, trying to fool an expiring token system that I didn't control.
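The once-per-second caching technique discussed above can be sketched with plain java.time (a standalone sketch, not the PR's exact code; names are illustrative):

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.Locale

// RFC 7231 date format, e.g. "Tue, 08 May 2018 12:00:00 GMT"
val dateFormat =
  DateTimeFormatter
    .ofPattern("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US)
    .withZone(ZoneId.of("GMT"))

// Cache the formatted string keyed by the epoch second: under load many
// requests land in the same second, so formatting runs at most once per second.
@volatile var cached: (Long, String) = (-1L, "")

def cachedDate(nowMillis: Long = System.currentTimeMillis()): String = {
  val second = nowMillis / 1000
  val (cachedSecond, cachedValue) = cached
  if (second == cachedSecond) cachedValue
  else {
    val formatted = dateFormat.format(Instant.ofEpochSecond(second))
    cached = (second, formatted) // benign race: worst case we format twice
    formatted
  }
}
```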

if (request.protocolVersion() == HttpVersion.HTTP_1_1)
HV.`HTTP/1.1`
else
HV.`HTTP/1.0`
Member:

You've covered 99.99999999% of people not causing trouble, but, is this right?

Contributor (Author):

This codec can't even respond to HTTP/2.0 messages if you wanted it to. It doesn't handle them properly. I can add a test though

Member:

To @rossabaker's point, there is nothing restricting the Netty HTTP version to 1.0 and 1.1:

scala> io.netty.handler.codec.http.HttpVersion.valueOf("HTTP/1.2")
res1: io.netty.handler.codec.http.HttpVersion = HTTP/1.2

I can't remember where, but I do recall seeing a HTTP/2.0 floating around at $work.

Member:

Yes, that's what I was thinking. I think you'd respond correctly, for free. It's just about honesty in what we pass to the server. (This is a really, really stupid use case. It just caught my eye.)

More realistic, though slightly less in 2018: if the service generates an HTTP/1.0 response, I bet this server is supporting some invalid things. Blaze is pretty careful about the distinction.

Contributor (Author):

As a counter chess move, I will point out that http2 has its own module.

I don't think the ResponseEncoder will do valid things for the Http2 spec.

lazy val nettyHandler = "io.netty" % "netty-handler" % "4.1.24.Final"
lazy val nettyHttpCodec = "io.netty" % "netty-codec-http" % nettyHandler.revision
lazy val nettyNativeTransport = "io.netty" % "netty-transport-native-epoll" % nettyHandler.revision classifier "linux-x86_64"
lazy val nettyUnixCommon = "io.netty" % "netty-transport-native-unix-common" % nettyHandler.revision
Member:

This is why I hate vertical alignment.

If we're blowing out the diff for netty-transport-native-epoll, might as well go all the way for this one, no?

@jmcardon jmcardon changed the title WIP Miku: Netty server backend for http4s. Netty server backend for http4s. May 14, 2018
@@ -67,9 +72,13 @@ sealed abstract class MikuHandler[F[_]](

/**
* Handle the given request.
* Note: Handle implementations fork into user ExecutionContext
* Note: Handle implementations fork into user ExecutionContext?
Member:

Who is the question to?

Contributor (Author):

Probably added "?" by mistake.

//Benching this would be interesting
ec.execute(new Runnable {
def run(): Unit =
F.runAsync(cleanup(future.channel()))(_ => IO.unit).unsafeRunSync()
Member:

cats-effect has Logger.reportFailure for this. We should add that to our own internals. This _ => IO.unit swallows errors.
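The concern here is the shape of the completion callback: `_ => IO.unit` maps both the error and success cases to nothing, so a failed cleanup vanishes. A sketch of the suggested shape, with a plain `Either` callback standing in for the cats-effect `runAsync` callback and a list buffer standing in for the logger (all names hypothetical):

```scala
import scala.collection.mutable.ListBuffer

// Stand-in for a logger's error channel.
val reported = ListBuffer.empty[Throwable]
def reportFailure(t: Throwable): Unit = reported += t

// Instead of `_ => IO.unit`, route the Left to the reporter so a
// failed cleanup is at least visible in the logs.
def onComplete(result: Either[Throwable, Unit]): Unit =
  result match {
    case Left(t)  => reportFailure(t)
    case Right(_) => () // nothing to do on success
  }
```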

Contributor (Author):

Will add.

Contributor (Author):

Hmm actually, reconsidering here:

The cleanup action is either just unit or drainBody, neither of which will throw an exception.

Save a VM fatal error, drainBody doesn't fail/throw. I'd like to reconsider this one.

Member:

I think you're right, but I also think I've wasted a lot of my life thinking I was right and finding the murder weapon was hidden in an empty exception handler. I guess I reflexively do this, if only to make it look even more jarring when people don't.

I won't die on this hill.

@jmcardon (Contributor, Author) May 16, 2018

If it's any consolation whatsoever:

This part I spent a lot of time on. In particular, because that cleanup action drains the buffers out of the reactive stream publisher if it wasn't used. This would be a huge memleak if it turned out to be an issue.

I spent hours debugging this and ensuring resources were released properly. I silently discard errors here because, after a lot of moving code around, spamming requests, and observing the output, I came to the following conclusions:

  • If even one element of the stream was consumed this no-ops, since when you exit the stream scope, no matter the error, it flushes out any buffered elements.
  • This is primarily concerned with resource release and it only fires after the channel write is complete. If it ever errors out (extremely low chance), it still means the stream was consumed somewhere else and thus released properly. It can only error if you consumed the stream and closed it and then tried to consume it again, in which case the upstream publisher is already done releasing resources.

I really hope the confidence that I gained by spending a long time on this doesn't come back to bite me, but I'm fairly confident this is right.

}(trampoline)
.recover[Unit] {
case error: Exception =>
logger.error(error)("Exception caught in channelRead future")
case _: Exception =>
Member:

NonFatal?

Contributor (Author):

Fair. Will do.
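For reference, the suggestion is to match with `scala.util.control.NonFatal`, which matches the exceptions worth logging while deliberately letting fatal errors (OutOfMemoryError and friends) propagate. A minimal sketch of the classification, with a hypothetical helper in place of the PR's recover block:

```scala
import scala.util.control.NonFatal

// Classify a Throwable the way the recover block should: handle non-fatal
// errors, rethrow fatal ones so VM-level failures stay visible.
def describe(t: Throwable): String =
  t match {
    case NonFatal(e) => s"logged: ${e.getMessage}"
    case fatal       => throw fatal
  }
```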

@@ -160,6 +182,9 @@ sealed abstract class MikuHandler[F[_]](
// https://github.com/netty/netty/blob/netty-3.9.3.Final/src/main/java/org/jboss/netty/handler/codec/http/HttpHeaders.java#L1075-L1080
logger.debug(e)("Handling Header value error")
sendSimpleErrorResponse(ctx, HttpResponseStatus.BAD_REQUEST); ()

case InvalidMessageException =>
sendSimpleErrorResponse(ctx, HttpResponseStatus.UNPROCESSABLE_ENTITY); ()
Member:

Isn't this an internal server error? If the netty codec sent us a type we can't handle? This is blaming the client.

Contributor (Author):

Fair, this is an internal server error.

@@ -1,4 +1,5 @@
package org.http4s.miku
package org.http4s
Member:

Our other backends are all in org.http4s.server or org.http4s.client. This one ought to be, too.

Contributor (Author):

will move

@@ -254,64 +272,73 @@ class MikuBuilder[F[_]](
/** A stream transformation that registers our channels and
Member:

It's not really a stream transformation anymore. It's just an effectful function.

Contributor (Author):

Fair. I'll make that distinction

val pipeline = connChannel.pipeline()
getContext() match {
case Some((ctx, auth)) =>
val engine = ctx.createSSLEngine()
Member:

I still think passing around an F[SSLEngine] might be the right API for our backends.

Contributor (Author):

Possibly. I've been considering this too, and I want to expose that on master, but for 0.18.x maybe not.

Member:

Yeah, this is a bit more consistent with 0.18. Let's leave that exploration for master.
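The `F[SSLEngine]` idea can be sketched without cats-effect by using a thunk in place of F (purely illustrative; the real API would presumably return `F[SSLEngine]` via `Sync[F].delay`, and the names here are hypothetical):

```scala
import javax.net.ssl.{SSLContext, SSLEngine}

// A deferred engine factory: each invocation creates a fresh engine,
// which is the guarantee an F[SSLEngine] built with F.delay would give.
def engineFactory(ctx: SSLContext): () => SSLEngine =
  () => {
    val engine = ctx.createSSLEngine()
    engine.setUseClientMode(false) // server-side handshake
    engine
  }
```

The point of deferring creation is that `SSLEngine` is stateful per connection, so the builder should hand out a recipe, not a single shared instance.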

}

"Return the proper content length for a simple response" in {
val length = "waylonnn jenningssss".length
Member:

Are you sure Hank spelt it this way?

@jmcardon (Contributor, Author) May 16, 2018

LOL. This is the spelling after a few drinks.

lazy val logbackClassic = "ch.qos.logback" % "logback-classic" % "1.2.3"
lazy val metricsCore = "io.dropwizard.metrics" % "metrics-core" % "4.0.2"
lazy val metricsJson = "io.dropwizard.metrics" % "metrics-json" % metricsCore.revision
lazy val nettyHandler = "io.netty" % "netty-handler" % "4.0.56.Final"
Member:

Align them all or don't touch them all.

@rossabaker (Member)

Appveyor is unhealthy. I think this build is legitimate.

@jmcardon (Contributor, Author) commented May 18, 2018

Alright, so I have some performance tests here, for just a ping route, using native transports.

Blaze:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                    6000000 (OK=5954785 KO=45215 )
> min response time                                      0 (OK=0      KO=0     )
> max response time                                  11040 (OK=11040  KO=0     )
> mean response time                                   507 (OK=511    KO=0     )
> std deviation                                        654 (OK=655    KO=0     )
> response time 50th percentile                        344 (OK=346    KO=0     )
> response time 75th percentile                        614 (OK=617    KO=0     )
> response time 95th percentile                       1580 (OK=1586   KO=0     )
> response time 99th percentile                       3091 (OK=3102   KO=0     )
> mean requests/sec                                37267.081 (OK=36986.242 KO=280.839)
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                       4978308 ( 83%)
> 800 ms < t < 1200 ms                              502200 (  8%)
> t > 1200 ms                                       474277 (  8%)
> failed                                             45215 (  1%)
---- Errors --------------------------------------------------------------------
> j.n.ConnectException: Failed to open a socket.                  41348 (91.45%)
> j.n.ConnectException: Cannot assign requested address: /52.43.   3765 ( 8.33%)
212.119:8080
> j.n.ConnectException: connection timed out: /52.43.212.119:808    102 ( 0.23%)
0
================================================================================

Netty (almost exactly as it is in this PR):

================================================================================
---- Global Information --------------------------------------------------------
> request count                                    6000000 (OK=5234048 KO=765952)
> min response time                                      0 (OK=0      KO=0     )
> max response time                                  62780 (OK=62770  KO=62780 )
> mean response time                                  3968 (OK=4548   KO=1     )
> std deviation                                       3021 (OK=2795   KO=256   )
> response time 50th percentile                       4301 (OK=4779   KO=0     )
> response time 75th percentile                       6079 (OK=6293   KO=0     )
> response time 95th percentile                       7868 (OK=8002   KO=0     )
> response time 99th percentile                      10270 (OK=10769  KO=0     )
> mean requests/sec                                12121.212 (OK=10573.834 KO=1547.378)
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                        533366 (  9%)
> 800 ms < t < 1200 ms                              218675 (  4%)
> t > 1200 ms                                      4482007 ( 75%)
> failed                                            765952 ( 13%)
---- Errors --------------------------------------------------------------------
> j.n.ConnectException: Failed to open a socket.                 548263 (71.58%)
> j.n.ConnectException: connection timed out: /52.43.212.119:808  91783 (11.98%)
0
> j.n.ConnectException: Cannot assign requested address: /52.43.  87734 (11.45%)
212.119:8080
> j.u.c.TimeoutException: Request timeout to not-connected after  29515 ( 3.85%)
 60000 ms
> j.u.c.TimeoutException: Request timeout to /52.43.212.119:8080   8657 ( 1.13%)
 after 60000 ms
================================================================================

So... Performance is crap. Blaze is positively obliterating netty in my load tests and it's an absurd difference. This is obviously not the fault of netty, so my code is crap.

I'd like to put this PR in the freezer as such while I perform heavy optimization on the handlers and overhaul it in general (maybe reopen it). In its current state it's not worth it to introduce into the repository.

@rossabaker (Member)

When 9% of the requests are failing on the one that's "winning," I don't really even care about the performance numbers. Why are they both so unstable in this test? Can you get a comparison at a load that is manageable?

@aeons (Member) commented May 18, 2018

It’s just 1% failing though, right?

And since most of the errors are ‘failed to open a socket’ it seems like a benchmark problem.

@rossabaker (Member)

Excuse me. I totally misread that report.

I still am wary of benchmarks that return errors. When one is clean and another is failing, that can say something. When they're both buckling, I begin to wonder whether the client is healthy enough to be producing useful metrics.

@jmcardon (Contributor, Author)

@rossabaker The bench failing to open a socket is simply because it attempts to maintain a particular requests-per-second rate, and that fails since it's trying for 50k while particular requests haven't returned. It reuses connections though. The failure rate has more to do with the benchmark, because I haven't entirely figured out Gatling and it's weird

I'm trying to perform a ping benchmark to buckle the server on purpose. On an M5.2xlarge (8 vCPU) blaze doesn't even sweat at 50k rps.

Under the same conditions, my code is being destroyed and that's purely my fault because I'm relying on reactive-streams too much as a crutch, when I should be writing custom fs2 codecs.

In general though, this was a benchmark that was made on purpose to make it buckle and netty is buckling way harder than I expected, so I'd like to change my code and improve it to see what I can do about it.

Alternatively, we can merge, but we wouldn't be fully taking advantage of what netty can do until I clean up what is there.

@yanns (Contributor) commented Nov 12, 2018

What is the status of this PR?

I'm willing to help here, maybe updating that for http4s 0.20?

@jmcardon (Contributor, Author)

Status is: the combined overhead of reactive-streams and fs2 was so much that I decided to stop working on it until I had the time to write a custom thing for it.

I haven't had the time budget for open source at all, so I left it open for future "inspiration", but despite being functional I'm not a fan of the performance of this.

@yanns (Contributor) commented Nov 12, 2018

Thx for the update.
Personal opinion: as long as it stays a PR, it's quite difficult for others to contribute to it.
Maybe this could be merged and marked as "experimental with bad performances"?

@jmcardon (Contributor, Author) commented Nov 12, 2018

Personally, I have a few opinions on this:

  1. I don't want to merge anything I've written that I consider subpar.
  2. Arguably, the part that needs to change is the meat of this PR. Anything merged will eventually be locked into binary compat and this makes fixing issues far more annoying.
  3. I think this should be used for inspiration, but not necessarily forked directly. Also arguably: to fix the current issues I found within this PR, you will most likely need a fresh new take on it (my personal opinion, from tweaking this extensively).

@yanns (Contributor) commented Nov 12, 2018

OK I understand. Thx for the update.

@hamnis (Contributor) commented May 13, 2020

I have a version of this in a private repo on my github. Would it be OK if I publish this under my own maven groupId? @jmcardon @rossabaker ? I have dropped TLS and websockets for now, as a way of getting started. If not, I can start a new branch or move to an own repo under this org. As an experiment I think it can be valuable if not included in the core.

@rossabaker (Member)

I'd be happy to host it here in the http4s org. (Similar to what we recently did with finagle, and have been doing with jdk-http-client). It's more likely to be found and attract other contributors when it's next to the related projects.

@hamnis (Contributor) commented May 13, 2020

@rossabaker alright. How do we proceed from here? We can do this on Gitter if that is preferable

@hamnis hamnis closed this Sep 21, 2020