Returns null for tld.getDomain('http://github.io') #117

Open
makecontact opened this issue Feb 28, 2018 · 5 comments
@makecontact

makecontact commented Feb 28, 2018

tld.getDomain always returns null for github.io and many other domains, including:

  • gitlab.io
  • ngrok.io
  • sandcats.io

etc...

var tld = require('tldjs');
console.log(tld.getDomain('http://github.io'));

Any takers?

I've tracked it down: this happens for any domain that is listed in tlds/rules.json.

@remusao
Collaborator

remusao commented Feb 28, 2018

Hi @makecontact

You are right to point out that any domain from tlds/rules.json (or directly from https://publicsuffix.org/list/effective_tld_names.dat) will have a null domain, and its public suffix will be the value found in the list (e.g. gitlab.io is a valid public suffix).

This is a long-standing, known issue which is not trivial to fix. It stems from the fact that the public suffix list was originally designed to determine under which domains sub-domains can be registered and cookies can be set. As a result, it can lead to surprising or unintuitive results such as the ones you encountered.

We've thought about this situation in the past, and I can see a few solutions. None of them is perfect, but one may be "good enough":

  1. One hacky fix you can use right now, without any update to tld.js, is to detect when domain is null and use the value of publicSuffix as the domain instead. This will work for a lot of domains (I will try to investigate how many domains would return the wrong result with this solution, but I expect not many).
  2. There are currently two parts in the public suffix list: ICANN and PRIVATE. I suspect that most of the surprising cases come from the PRIVATE part. We could add an option in tld.js to only take ICANN domains into account (this would fix the examples you found and many others).
  3. Combine both 1. and 2. (most of the counter-examples seem to be Japanese domains, but we need to investigate a bit more to see if there are some non-trivial cases).

None of these solutions is perfect, as there are known counter-examples. If this is an option for you, I would suggest you give 1. a try, and I will try to implement 2.
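Workaround 1 could look roughly like the sketch below. It is only an illustration, not part of tld.js: the helper name getDomainWithFallback is hypothetical, and the tld object is passed in as a parameter on the assumption that it exposes getDomain and getPublicSuffix, as tldjs does.

```javascript
// Minimal sketch of workaround 1 (hypothetical helper, not part of tld.js).
// `tld` is assumed to expose getDomain(url) and getPublicSuffix(url).
function getDomainWithFallback(tld, url) {
  var domain = tld.getDomain(url);
  if (domain) {
    return domain;
  }
  // getDomain returned null: the hostname may itself be a public suffix
  // (e.g. github.io), so fall back to the public suffix value.
  return tld.getPublicSuffix(url);
}

// Usage, assuming tldjs is installed:
// var tld = require('tldjs');
// console.log(getDomainWithFallback(tld, 'http://github.io'));
```

Keep in mind that this fallback shares the known counter-examples mentioned above, so the result is best treated as a guess whenever the fallback branch is taken.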

Also, as far as I know, this is unfortunately a limitation of every library that relies on the public suffix list.
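To illustrate the ICANN/PRIVATE distinction behind option 2, here is a rough sketch that splits the raw text of the public suffix list into its two sections using the `// ===BEGIN ... DOMAINS===` and `// ===END ... DOMAINS===` marker comments found in effective_tld_names.dat. The function name splitPublicSuffixList is hypothetical, and this is not how tld.js implements (or would implement) such an option; it only shows how the two parts can be told apart.

```javascript
// Sketch: separate ICANN rules from PRIVATE rules in the public suffix list.
// splitPublicSuffixList is a hypothetical helper, not a tld.js API.
function splitPublicSuffixList(text) {
  var sections = { ICANN: [], PRIVATE: [] };
  var current = null; // section currently being read, or null outside any section

  text.split('\n').forEach(function (rawLine) {
    var line = rawLine.trim();
    if (line.indexOf('// ===BEGIN ICANN DOMAINS===') === 0) { current = 'ICANN'; return; }
    if (line.indexOf('// ===BEGIN PRIVATE DOMAINS===') === 0) { current = 'PRIVATE'; return; }
    if (line.indexOf('// ===END') === 0) { current = null; return; }
    // Skip blank lines, comments, and anything outside a known section.
    if (line === '' || line.indexOf('//') === 0 || current === null) { return; }
    // A rule is only read up to the first whitespace.
    sections[current].push(line.split(/\s+/)[0]);
  });

  return sections;
}
```

With the rules separated this way, a lookup restricted to the ICANN list would treat entries such as github.io as ordinary domains under io rather than as public suffixes.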

@makecontact
Author

@remusao thank you for taking the time to write a good reply. I can appreciate that this is a problem that can't easily be solved, but I'm happy with the workarounds you've suggested.

@mrlubos

mrlubos commented Jul 16, 2018

+1, it also returns null for 1password.com and https://1password.com

@remusao
Collaborator

remusao commented Jul 16, 2018

@lmenus Thanks for coming up with another breaking case. As far as I can tell, this would be fixed by #128. Hopefully we can move this work forward soon!

@mrlubos

mrlubos commented Jul 17, 2018

@remusao Looks good, thank you for your work on this!
