urlsplit manufactures hostnames because it strips off tabs before validating them #122761

Open
gh-andre opened this issue Aug 6, 2024 · 0 comments
Labels
type-bug An unexpected behavior, bug, or error

gh-andre commented Aug 6, 2024

Bug report

Bug description:

import urllib.parse

# prints "abcxyz.test"
print(urllib.parse.urlsplit("http://abc\txyz.test/").netloc)

Current urlsplit is implemented according to this spec:

https://url.spec.whatwg.org/#concept-basic-url-parser

The spec does say in item 3 to strip tabs from the input, but I believe this is a bug in the specification (perhaps they meant only leading/trailing whitespace), because item 7 of host parsing says:

If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.

Tab is listed as a "forbidden domain code point". If tabs are stripped from the entire input before any other work is done, a later check for tabs in host names can never trigger, so it wouldn't make much sense.
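
For anyone who needs to reject such input in the meantime, here is a minimal sketch of a pre-check. The names `strict_urlsplit` and `STRIPPED_CHARS` are hypothetical and not part of `urllib.parse`. The sketch also assumes that tab, CR, and LF are the characters removed from the input, which holds for the tab case shown above but has not been verified for every CPython version.

```python
from urllib.parse import SplitResult, urlsplit

# Assumption: these are the characters urlsplit removes from anywhere in the
# input before parsing (observed for tab on the CPython version tested above).
STRIPPED_CHARS = "\t\r\n"


def strict_urlsplit(url: str) -> SplitResult:
    """Hypothetical wrapper: reject embedded tab/CR/LF instead of removing them."""
    if any(ch in url for ch in STRIPPED_CHARS):
        raise ValueError(f"URL contains tab, CR, or LF: {url!r}")
    return urlsplit(url)


# Stock behavior reported above: the tab vanishes and the two halves are joined.
print(urlsplit("http://abc\txyz.test/").netloc)

# The wrapper refuses the same input rather than producing a merged hostname.
try:
    strict_urlsplit("http://abc\txyz.test/")
except ValueError as exc:
    print("rejected:", exc)
```

This is stricter than the spec, since it also rejects leading and trailing tabs or newlines. Callers that want to tolerate those could strip the ends of the string before running the check.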

I filed an issue with the specification project, so maybe they will provide some guidance later on.

whatwg/url#829

CPython versions tested on:

3.10

Operating systems tested on:

Linux, Windows

@gh-andre gh-andre added the type-bug An unexpected behavior, bug, or error label Aug 6, 2024