x/net/publicsuffix: entire host interpreted as TLD #24672

Closed
dankinder opened this issue Apr 3, 2018 · 3 comments

Comments

@dankinder

(Sorry if this is not a legitimate issue, but it is odd behavior so seemed worth raising.)

The publicsuffix library currently matches the longest possible rule in the public suffix list. This seems to be according to spec but creates strange behavior in cases where the entire domain is valid (has a DNS entry) but is also a TLD.

The case at issue here is mg.gov.br. This is itself an entry in the full public suffix list, and gov.br is also an entry. Currently if you try to get the TLD+1 for mg.gov.br you get an error, because mg.gov.br is itself the suffix (see playground).

But in reality, for the domain mg.gov.br, the TLD is gov.br and the TLD+1 is mg.gov.br. At least, for our use, that is the output I would expect, and the docs don't say what happens when the entire domain is a suffix.

Which of these should be correct?

  1. publicsuffix.PublicSuffix("mg.gov.br") == "mg.gov.br"
  2. publicsuffix.PublicSuffix("mg.gov.br") == "gov.br"

Our use case is data partitioning by TLD+1: partitioning by TLD is far too coarse, and partitioning by full domain is too granular and works badly for sites with wildcard subdomains. I guess we could work around this by removing the leading domain part, checking whether the remainder is itself a valid TLD, and if so treating that as the real TLD, but we would like to avoid this if possible.

@gopherbot gopherbot added this to the Unreleased milestone Apr 3, 2018
@bradfitz
Contributor

bradfitz commented Apr 4, 2018

/cc @vdobler @nigeltao

@vdobler
Contributor

vdobler commented Apr 4, 2018

Which of these should be correct?

  1. publicsuffix.PublicSuffix("mg.gov.br") == "mg.gov.br"
  2. publicsuffix.PublicSuffix("mg.gov.br") == "gov.br"

This is the easy part: 1. is correct per the spec. Step 4 of the algorithm described at https://publicsuffix.org/list/ requires selecting the matching rule with the most labels.
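
To make the "most labels" rule concrete, here is a minimal sketch of longest-match selection. It does not use package publicsuffix: the `rules` map is a hypothetical five-entry excerpt of the list, `publicSuffix` is an illustrative helper, and wildcard and exception rules are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// rules is a tiny, hypothetical excerpt of the public suffix list.
// The real list is far larger and also contains wildcard and exception
// rules, which this sketch ignores.
var rules = map[string]bool{
	"br":        true,
	"gov.br":    true,
	"mg.gov.br": true,
	"uk":        true,
	"co.uk":     true,
}

// publicSuffix returns the matching rule with the most labels. If no rule
// matches, it falls back to the last label, mirroring the list's implicit
// default rule "*".
func publicSuffix(domain string) string {
	labels := strings.Split(domain, ".")
	// Candidates are tried from longest to shortest, so the first hit is
	// the rule with the most labels.
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if rules[candidate] {
			return candidate
		}
	}
	return labels[len(labels)-1]
}

func main() {
	for _, d := range []string{"mg.gov.br", "foobar.gov.br", "foobar.br", "example.co.uk"} {
		fmt.Printf("%-15s -> %s\n", d, publicSuffix(d))
	}
}
```

With these rules, mg.gov.br matches br, gov.br and mg.gov.br, and the three-label rule wins, which is answer 1.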

The question about the "TLD" is more complicated.

Just to make sure we use the same language:

  • The TLD of any of gov.br, mg.gov.br and foobar.br is "br".
  • Below an effective TLD (eTLD) you can register a domain. "co.uk" is a standard example.
  • The eTLD+1 of a domain is the eTLD plus one more label.

The current package publicsuffix does not provide a function to directly compute the eTLD of a domain.
It seems that the Brazilian registrar allows registering domains under gov.br (e.g. foobar.gov.br) as well as under mg.gov.br (e.g. quux.mg.gov.br), which makes both of them an eTLD.
This is the reason why

publicsuffix.EffectiveTLDPlusOne("mg.gov.br")

returns an error: mg.gov.br is an eTLD and there is no additional label left to return.
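
Why no label is left can be sketched as follows. This is an illustration under stated assumptions, not the package's source: `suffixes` is a hypothetical three-rule excerpt of the list, and `publicSuffix`, `effectiveTLDPlusOne` and `errIsSuffix` are stand-in names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// suffixes is a hypothetical excerpt of the public suffix list.
var suffixes = map[string]bool{"br": true, "gov.br": true, "mg.gov.br": true}

var errIsSuffix = errors.New("domain is itself a public suffix")

// publicSuffix returns the longest listed suffix of domain, or its last
// label when nothing is listed.
func publicSuffix(domain string) string {
	for rest := domain; ; {
		if suffixes[rest] {
			return rest
		}
		_, after, found := strings.Cut(rest, ".")
		if !found {
			return rest
		}
		rest = after
	}
}

// effectiveTLDPlusOne returns the public suffix plus one more label, or an
// error when the domain has no label beyond its suffix.
func effectiveTLDPlusOne(domain string) (string, error) {
	suffix := publicSuffix(domain)
	if len(domain) <= len(suffix) {
		return "", errIsSuffix
	}
	// i is the position of the dot separating the suffix from the rest.
	i := len(domain) - len(suffix) - 1
	// Keep only the single label immediately to the left of that dot.
	return domain[1+strings.LastIndex(domain[:i], "."):], nil
}

func main() {
	for _, d := range []string{"quux.mg.gov.br", "foobar.gov.br", "mg.gov.br", "gov.br"} {
		got, err := effectiveTLDPlusOne(d)
		fmt.Printf("%q -> %q, err=%v\n", d, got, err)
	}
}
```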

So I think package publicsuffix behaves in the only way possible in this case:

  • You may not set domain cookies for either gov.br or mg.gov.br, so both are correctly returned by PublicSuffix().
  • Both gov.br and mg.gov.br are effective TLDs (i.e. they allow registration of domains), so neither has an additional label (the +1) for EffectiveTLDPlusOne to return, and both yield an error.

Regarding your partitioning problem: I'm not sure I understand it correctly, but what prevents you from using EffectiveTLDPlusOne() or PublicSuffix()? If it can determine an eTLD+1, use that; if it returns an error, you know your domain is itself an eTLD and falls into its own class:

  • foo.bar.mg.gov.br is classified as "bar.mg.gov.br"
  • guux.waz.gov.br is classified as "waz.gov.br"
  • mg.gov.br is classified as "mg.gov.br"

Of course you have to implement your own logic here; I doubt package publicsuffix could help in a consistent way.
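
The classification above could be sketched like this. `classify` is a hypothetical helper that takes the lookup as a parameter, so real code could pass `publicsuffix.EffectiveTLDPlusOne`, which has the same `func(string) (string, error)` shape. `fakeETLDPlusOne` is only a stand-in with answers hard-coded from the examples above, so that the sketch runs without the x/net module.

```go
package main

import (
	"errors"
	"fmt"
)

// classify returns the eTLD+1 when the lookup finds one and otherwise
// treats the domain as a class of its own.
func classify(domain string, eTLDPlusOne func(string) (string, error)) string {
	if key, err := eTLDPlusOne(domain); err == nil {
		return key
	}
	// The lookup failed: per the reasoning above, the domain is itself an
	// eTLD, so it forms its own class. Real code may want to inspect the
	// error instead of treating every failure alike.
	return domain
}

// fakeETLDPlusOne is a stand-in (an assumption of this sketch) for
// publicsuffix.EffectiveTLDPlusOne with hard-coded answers.
func fakeETLDPlusOne(domain string) (string, error) {
	known := map[string]string{
		"foo.bar.mg.gov.br": "bar.mg.gov.br",
		"guux.waz.gov.br":   "waz.gov.br",
	}
	if key, ok := known[domain]; ok {
		return key, nil
	}
	return "", errors.New("no eTLD+1 for " + domain)
}

func main() {
	for _, d := range []string{"foo.bar.mg.gov.br", "guux.waz.gov.br", "mg.gov.br"} {
		fmt.Printf("%s is classified as %q\n", d, classify(d, fakeETLDPlusOne))
	}
}
```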

@dankinder
Author

Hi @vdobler thanks for the detailed explanation and fast response, that makes sense. And you're right, I was using TLD really to mean eTLD.

To clarify the issue, our code thus far has expected that all valid domains can have EffectiveTLDPlusOne called on them without erring (regardless of what we're using that eTLD+1 for). And it was surprising to find this strange case where that didn't hold, since something now can be both a valid eTLD and eTLD+1. And in our case, we want to treat mg.gov.br like it's an eTLD+1.

I think we'll just have to modify our code such that when EffectiveTLDPlusOne errs, we try stripping off one domain part and see if the remainder is a valid PublicSuffix, and if it is then we can treat the whole domain as an EffectiveTLDPlusOne.
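
Roughly, the fallback I have in mind looks like the sketch below. It runs against a hypothetical three-rule excerpt of the list rather than the real package; `suffixes`, `publicSuffix`, `effectiveTLDPlusOne` and `partitionKey` are all illustrative names, not part of publicsuffix.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// suffixes is a hypothetical excerpt of the public suffix list.
var suffixes = map[string]bool{"br": true, "gov.br": true, "mg.gov.br": true}

// publicSuffix returns the longest listed suffix of domain, or its last
// label when nothing is listed.
func publicSuffix(domain string) string {
	for rest := domain; ; {
		if suffixes[rest] {
			return rest
		}
		_, after, found := strings.Cut(rest, ".")
		if !found {
			return rest
		}
		rest = after
	}
}

// effectiveTLDPlusOne returns the suffix plus one label, or an error when
// the domain is itself a suffix.
func effectiveTLDPlusOne(domain string) (string, error) {
	suffix := publicSuffix(domain)
	if len(domain) <= len(suffix) {
		return "", errors.New("domain is itself a public suffix")
	}
	i := len(domain) - len(suffix) - 1
	return domain[1+strings.LastIndex(domain[:i], "."):], nil
}

// partitionKey uses the eTLD+1 when there is one. Otherwise it strips the
// leading label and, if the remainder is a listed suffix, treats the whole
// domain as the eTLD+1.
func partitionKey(domain string) (string, error) {
	if key, err := effectiveTLDPlusOne(domain); err == nil {
		return key, nil
	}
	_, rest, found := strings.Cut(domain, ".")
	if found && suffixes[rest] {
		return domain, nil
	}
	return "", errors.New("no partition key for " + domain)
}

func main() {
	for _, d := range []string{"foo.bar.mg.gov.br", "mg.gov.br", "gov.br", "br"} {
		key, err := partitionKey(d)
		fmt.Printf("%q -> %q, err=%v\n", d, key, err)
	}
}
```

Note that this also gives gov.br a key of its own, since its remainder br is a suffix too; only a bare single-label suffix such as br still errs.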

I guess I don't have a better way publicsuffix should behave, given your explanation, so I'll close this.

@golang golang locked and limited conversation to collaborators Apr 4, 2019