Basic URL parse requires stripping tabs before host state is entered, allowing bad hosts #829

Open
gh-andre opened this issue Aug 2, 2024 · 0 comments

What is the issue with the URL Standard?

In this document:

https://url.spec.whatwg.org/#concept-basic-url-parser

Item 3 says:

Remove all ASCII tab or newline from input.
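As a minimal sketch (the helper name is hypothetical, not from the spec), step 3 deletes every U+0009 TAB, U+000A LF and U+000D CR anywhere in the input, including the part that will later become the host:

```python
# Hypothetical helper illustrating basic URL parser step 3:
# "Remove all ASCII tab or newline from input."
ASCII_TAB_OR_NEWLINE = {"\t", "\n", "\r"}  # U+0009, U+000A, U+000D


def remove_ascii_tab_or_newline(url_input: str) -> str:
    """Drop every ASCII tab or newline, wherever it appears in the input."""
    return "".join(c for c in url_input if c not in ASCII_TAB_OR_NEWLINE)


print(remove_ascii_tab_or_newline("https://abc\txyz.test/"))  # https://abcxyz.test/
```

The tab between "abc" and "xyz" is already gone at this point, long before any state looks at the host.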

After this, it describes how each parsing state is processed. In the host state (hostname state), points 3 and 4 say that a bad host ends parsing with failure:

Let host be the result of host parsing buffer with url is not special.

If host is failure, then return failure.

In host parsing, it says that a forbidden domain code point should terminate parsing:

If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.

Finally, the forbidden host code points (all of which are also forbidden domain code points) include U+0009 TAB, so a tab in a host should make URL parsing fail instead of producing a manufactured host name.
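For reference, the two code point sets involved can be sketched in Python as below. The constant and function names are hypothetical; the set contents follow the URL Standard's definitions of forbidden host code point and forbidden domain code point.

```python
# Sketch of the URL Standard's code point sets; the Python names are hypothetical.
# forbidden host code point: NULL, TAB, LF, CR, SPACE, # / : < > ? @ [ \ ] ^ |
FORBIDDEN_HOST_CODE_POINTS = frozenset(
    ["\x00", "\t", "\n", "\r", " ", "#", "/", ":", "<", ">", "?", "@", "[", "\\", "]", "^", "|"]
)

# forbidden domain code point: the above, plus C0 controls, "%" and U+007F DELETE.
FORBIDDEN_DOMAIN_CODE_POINTS = (
    FORBIDDEN_HOST_CODE_POINTS
    | frozenset(chr(cp) for cp in range(0x00, 0x20))
    | frozenset(["%", "\x7f"])
)


def contains_forbidden_domain_code_point(ascii_domain: str) -> bool:
    """Mirror of the host parser check that triggers domain-invalid-code-point."""
    return any(c in FORBIDDEN_DOMAIN_CODE_POINTS for c in ascii_domain)


print("\t" in FORBIDDEN_HOST_CODE_POINTS)                      # True
print(contains_forbidden_domain_code_point("abc\txyz.test"))  # True
print(contains_forbidden_domain_code_point("abcxyz.test"))    # False
```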

Stripping all tabs from the URL first and only later rejecting tabs in host names means host names cannot be validated properly: the invalid characters are removed before they can be evaluated.
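The ordering problem can be seen in a deliberately tiny model. Everything here is hypothetical: host_parse only performs the forbidden-domain-code-point check (the real host parser also percent-decodes, applies domain-to-ASCII and more), and None stands in for the spec's "failure".

```python
# Toy model (hypothetical names) contrasting the two orderings.
FAILURE = None  # stands in for the spec's "failure" result

# C0 controls (which include TAB, LF, CR) plus the remaining forbidden domain code points.
FORBIDDEN_DOMAIN_CODE_POINTS = frozenset(chr(c) for c in range(0x20)) | frozenset(
    " #%/:<>?@[\\]^|\x7f"
)


def host_parse(buffer: str):
    """Greatly simplified host parser: only the forbidden-code-point check."""
    if any(c in FORBIDDEN_DOMAIN_CODE_POINTS for c in buffer):
        return FAILURE
    return buffer


def strip_then_parse(raw_host: str):
    # Order used by the basic URL parser: step 3 strips, host state validates later.
    stripped = "".join(c for c in raw_host if c not in "\t\n\r")
    return host_parse(stripped)


def parse_without_stripping(raw_host: str):
    # Validating the raw host: the tab is still there, so the check can fire.
    return host_parse(raw_host)


print(strip_then_parse("abc\txyz.test"))         # abcxyz.test
print(parse_without_stripping("abc\txyz.test"))  # None
```

With the current ordering the tab check in host parsing is unreachable for input that came through the basic URL parser.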

This has an immediate effect on some current libraries. For example, Python's urlsplit takes abc<tab>xyz.test and manufactures the host name abcxyz.test, because it removes tabs from the URL before it has a chance to validate the host name.
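A quick way to observe this with the standard library, assuming a recent CPython release (the tab/CR/LF stripping in urllib.parse was added as a security fix around 2021; older releases keep the tab in netloc instead):

```python
from urllib.parse import urlsplit

# Recent CPython versions strip ASCII tab, CR and LF from the whole URL
# before splitting, so the tab inside the host silently disappears.
parts = urlsplit("http://abc\txyz.test/path")
print(parts.netloc)    # abcxyz.test
print(parts.hostname)  # abcxyz.test
```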
