Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove file check to support proxied SSL connection #731

Open
HoneyryderChuck opened this issue Mar 20, 2024 · 6 comments
Open

remove file check to support proxied SSL connection #731

HoneyryderChuck opened this issue Mar 20, 2024 · 6 comments

Comments

@HoneyryderChuck
Copy link
Contributor

HoneyryderChuck commented Mar 20, 2024

Hi, I'm implementing HTTPS connect proxy support in httpx, which is supposed to connect via SSL to the proxy before sending the CONNECT request and renegotiate the SSL connection.

This seems to be impossible to do however, given that SSLSocket initialization checks whether IO is a file, rather than something that quacks as a file.

FWIW I'm trying to accomplish the same feature as this curl example here:

> curl -x https://proxyuser:password@proxyhost:9080 https://example.com/get --proxy-cacert /proxy/ca.crt
@rhenium
Copy link
Member

rhenium commented Mar 21, 2024

Currently, SSLSocket passes the file descriptor to OpenSSL and doesn't use Ruby's IO interface.

openssl/ext/openssl/ossl_ssl.c

Lines 1688 to 1689 in 45c731c

if (!SSL_set_fd(ssl, TO_SOCKET(rb_io_descriptor(io))))
ossl_raise(eSSLError, "SSL_set_fd");

OpenSSL uses an abstraction layer called BIO to interact with the underlying socket. Here is the default implementation (a "BIO method" - SSL_set_fd() will set up the BIO_s_socket()): https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c

It's more work than just removing a check, but it should be possible to define a custom BIO method that wraps an IO-ish object (as long as it supports non-blocking IO methods).

@HoneyryderChuck
Copy link
Contributor Author

HoneyryderChuck commented Mar 21, 2024

I was anticipating it to be much harder than just removing the check, no worries.

To better illustrate what I'm looking for, here's a practical example using docker: I've just pushed the "infra setup" changes to this MR branch. Once you download the repo and switch to the "https-proxy" branch:

  • run docker-compose -f docker-compose -f docker-compose-ruby-{MAJOR}-{MINOR}.yml run --entrypoint bash httpx to open a shell
  • run these and these commands

And you should be in a position to replicate the issue. You can run:

> curl -x https://proxyuser:password@httpsproxy:9080 \
          --proxy-cacert test/support/ci/certs/ca.crt \
          https://nghttp2/get -v

to verify what/how curl does it. To replicate what at least I'd expect to work in ruby, you can run this script:

require "socket"
require "openssl"
require "base64"


AUTH = Base64.strict_encode64("proxyuser:password")

CONNECT_REQ = <<-CONNECT.gsub("\n", "\r\n")
CONNECT nghttp2:443 HTTP/1.1
Host: nghttp2
Proxy-Authorization: Basic #{AUTH}

CONNECT

REQ = <<-REQ.gsub("\n", "\r\n")
GET /get HTTP/1.1
User-Agent: smth
Accept: */*
Connection: close
Host: nghttp2

REQ

tcp = TCPSocket.new("httpsproxy", 9080)
ctx = OpenSSL::SSL::SSLContext.new
ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
ssl.hostname = "httpsproxy"
ssl.sync_close = true
ssl.connect
ssl.post_connection_check("httpsproxy")

ssl.write(CONNECT_REQ)

response = ssl.readpartial(2048)

puts "response from proxy connect"
puts response.inspect

raise unless response.start_with?("HTTP/1.1 200")

ssl.write(REQ)

puts ssl.readpartial(2048)

which currently breaks on the last line with ...openssl/buffering.rb:157:in sysread': end of file reached (EOFError)`

@HoneyryderChuck
Copy link
Contributor Author

@rhenium just opened an MR to explore a possible approach to solving this. LMK what you think, and whether you think there are other alternatives worth pursuing.

@rhenium
Copy link
Member

rhenium commented Mar 27, 2024

I was exploring the approach to implement a BIO_METHOD that simply translates those Ruby IO methods. It's not reviewed carefully, but here is a WIP PR: #736

Thanks for the pointer to usage of the memory BIO in Python's ssl library. I didn't know about it, and that approach is interesting. While it is slightly inefficient due to one additional data copy, if I understand it right, it would let us interact with the IO-like object outside of SSL_{read,write,..}() and reduce the amount of rb_protect() state variables churn. It's quite a mess in my WIP PR.

@HoneyryderChuck
Copy link
Contributor Author

hey @rhenium , after a bit of back-and-forth with my approach, and hitting some limitations of my openssl C API expertise, and also researching other implementations, I'm leaning towards your approach of BIO_METHOD as being the most viable.

To begin with, it works! I've tested it with the script pasted above against a server behind-a-squid-behind-stunnel, and even if still a rough patch, it worked the first time. So that's something I haven't been able to produce.

I've also been looking at how curl works on top of openssl, and it seems that it also ships with its own BIO_METHOD. The main advantage seems to be that SSL_connect usage seems to "just work", whereas you'd have to "drop down" a layer of abstraction if maintaining bio-mems around (ssl connect was the hurdle I was never to pass from in my attempt). This sees to be how cpython works with openssl, where different APIs are used (SSL_write_ex instead of SSL_write, and SSL_do_handshake instead of SSL_connect, just to name a couple).

I also find that approach to be a better template to implement on top of openssl QUIC connections; not sure how much you've considered introducing them at this point though.

About rb_protect usage, I'm not sure if you really need it, considering you're using the "exception: false" variants of the non-blocking APIs? Moreover (and a bit tangential), having researched the cpython implementation (a bit superficially, I admit), I was wondering why can't the gem release the GVL more often, for example when calling the SSL_* family of APIs (python examples here and here), as I believe roughly the same constraints apply?

Thx for the help sofar, and LMK how I can help move this forward! 🙏

@Aubermean
Copy link

Nudging this, any ideas @rhenium ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants