add_soup(): Don't match Content-type with in
#374
Draft
Don't use Python's `in` operator to match Content-Types, since that is a simple substring match. It's obviously not correct, since a Content-Type string can be relatively complicated, like

`Content-Type: application/xhtml+xml; charset="utf-8"; boundary="This is not text/html"`

Although that's rather contrived, the prior test

`"text/html" in response.headers.get("Content-Type", "")`

would return True here, incorrectly.

Also, the existence of subtypes containing `+` means that using the prior test for `"application/xhtml"` would match the above example when it probably shouldn't.
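A minimal sketch of the two failure modes described above (the header value is the contrived example from this description, not one from real traffic):

```python
# Why a plain substring test on a Content-Type header is unreliable.
header = 'application/xhtml+xml; charset="utf-8"; boundary="This is not text/html"'

# False positive 1: "text/html" appears only inside the boundary parameter,
# yet the substring test still matches.
print("text/html" in header)

# False positive 2: "application/xhtml" is a prefix of the "+xml" subtype,
# so a substring test for it also matches, though the actual type differs.
print("application/xhtml" in header)

# An exact comparison against the parsed media type avoids both.
print(header.split(";")[0].strip())
```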
Instead, leverage `requests`'s code, which comes from the Python Standard Library's `cgi.py`.

Clarify that we don't implement MIME sniffing, nor `X-Content-Type-Options: nosniff`; instead we do our own thing.
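For reference, a stdlib-only equivalent of that parsing step (not the `requests` helper itself, which is private) can be built on `email.message.Message`; this matters because `cgi.parse_header()` was deprecated and removed in Python 3.13:

```python
from email.message import Message


def parse_content_type(value):
    """Split a Content-Type header value into (mime_type, params) using the
    stdlib email parser, as a stand-in for cgi.parse_header()."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() yields the media type first, then the parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


mime, params = parse_content_type(
    'application/xhtml+xml; charset="utf-8"; boundary="This is not text/html"'
)
# An exact comparison on `mime` no longer false-positives on "text/html"
# hiding in a parameter, nor on the bare "application/xhtml" prefix.
```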
I was looking at this code because of #373.
I've marked this as a draft, because I'm not quite sure this is the way to go, both because of the long discursive comment and the use of a `_` function from `requests` (versus `cgi.py`'s `parse_header()`). Also, I'm kind of perplexed what's going on here:
Like… why does the presence of `charset=utf-8` in the `Content-Type` header mean that we should trust `requests`'s `encoding` field? Oh, I see, it's because sometimes `requests` does some sniffing-ish stuff and sometimes it doesn't (in which case it parses the `Content-Type`), and we need to know which, and we're backing out a conclusion about its heuristics? Probably seems like maybe we should parse it ourselves if so. idk.

Maybe we should be doing more formal MIME sniffing. And maybe we should be honoring `X-Content-Type-Options: nosniff`. And… … …

I'm also not sure what kind of test coverage is really appropriate here, if anything additional. Seems like the answer shouldn't be "zero," so…
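As a starting point, coverage might look something like the following sketch; the helper name `content_type_is` is hypothetical, standing in for whatever exact comparison `add_soup()` ends up doing on the parsed media type:

```python
from email.message import Message


def content_type_is(header_value, expected):
    """Hypothetical helper: parse the header and compare the media type
    exactly, rather than doing a substring test."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type() == expected


def test_parameters_do_not_false_positive():
    # "text/html" hiding inside a parameter must not match.
    header = 'application/xhtml+xml; boundary="This is not text/html"'
    assert not content_type_is(header, "text/html")


def test_exact_match_ignores_parameters():
    # The same media type with parameters attached should still match.
    assert content_type_is("text/html; charset=utf-8", "text/html")


def test_plus_subtype_is_not_its_prefix():
    # "application/xhtml" is not the same type as "application/xhtml+xml".
    assert not content_type_is("application/xhtml+xml", "application/xhtml")
```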