x/net/publicsuffix: entire host interpreted as TLD #24672

dankinder · 2018-04-03T23:34:03Z

(Sorry if this is not a legitimate issue, but it is odd behavior so seemed worth raising.)

The publicsuffix library currently matches the longest possible rule in the public suffix list. This seems to be according to spec but creates strange behavior in cases where the entire domain is valid (has a DNS entry) but is also a TLD.

The case at issue here is mg.gov.br. This is itself an entry in the full public suffix list, and gov.br is also an entry. Currently if you try to get the TLD+1 for mg.gov.br you get an error, because mg.gov.br is itself the suffix (see playground).

But in reality, for the domain mg.gov.br, the TLD is gov.br and the TLD+1 is mg.gov.br. At least, for our use, that is the output I would expect, and the docs don't say what happens when the entire domain is a suffix.

Which of these should be correct?

1. publicsuffix.PublicSuffix("mg.gov.br") == "mg.gov.br"
2. publicsuffix.PublicSuffix("mg.gov.br") == "gov.br"

The use case in our current system is data partitioning by TLD+1: partitioning by TLD is far too coarse, and partitioning by full domain is far too granular and handles sites with wildcard subdomains badly. I guess we could work around it by removing the leading domain label, checking whether the remainder is itself a valid TLD, and if so treating that as the real TLD, but we would like to avoid this if possible.


bradfitz · 2018-04-04T01:54:04Z

/cc @vdobler @nigeltao

vdobler · 2018-04-04T06:02:21Z

Which of these should be correct?

1. publicsuffix.PublicSuffix("mg.gov.br") == "mg.gov.br"
2. publicsuffix.PublicSuffix("mg.gov.br") == "gov.br"

This is the easy part: 1. is correct as per the spec. Step 4 in the algorithm described in https://publicsuffix.org/list/ requires selecting the rule with the most labels.
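A minimal sketch of the "most labels wins" selection, assuming a tiny hard-coded rule set: the `rules` map below is a hypothetical excerpt, not the real list, and wildcard ("*.") and exception ("!") rules are ignored. It is not how x/net/publicsuffix is implemented (the package uses a generated table), but it shows why mg.gov.br matches itself rather than gov.br.

```go
package main

import (
	"fmt"
	"strings"
)

// rules is a hypothetical excerpt of the Public Suffix List.
// Wildcard and exception rules are deliberately omitted.
var rules = map[string]bool{
	"br":        true,
	"gov.br":    true,
	"mg.gov.br": true,
	"uk":        true,
	"co.uk":     true,
}

// publicSuffix returns the matching rule with the most labels.
// If no rule matches, it falls back to the implicit "*" rule,
// i.e. the last label of the domain.
func publicSuffix(domain string) string {
	labels := strings.Split(domain, ".")
	// Try candidates from most labels to fewest, so the first
	// hit is the longest matching rule.
	for i := 0; i < len(labels); i++ {
		candidate := strings.Join(labels[i:], ".")
		if rules[candidate] {
			return candidate
		}
	}
	return labels[len(labels)-1]
}

func main() {
	for _, d := range []string{"mg.gov.br", "foobar.gov.br", "quux.mg.gov.br", "example.co.uk"} {
		fmt.Printf("%s -> %s\n", d, publicSuffix(d))
	}
}
```

Because both gov.br and mg.gov.br are rules, a domain equal to mg.gov.br matches the three-label rule first.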

The question about the "TLD" is more complicated.

Just to make sure we use the same language:

The TLD of any of gov.br, mg.gov.br and foobar.br is "br".
An effective TLD (eTLD) is a suffix below which you can register a domain. "co.uk" is a standard example.
The eTLD+1 of a domain is the eTLD plus one more label.

The current package publicsuffix does not provide a function to directly compute the eTLD of a domain.
It seems the Brazilian registrar allows registering domains under gov.br (e.g. foobar.gov.br) as well as under mg.gov.br (e.g. quux.mg.gov.br), which makes both of them eTLDs.
This is the reason why

publicsuffix.EffectiveTLDPlusOne("mg.gov.br")

returns an error: mg.gov.br is an eTLD and there is no additional label left to return.

So I think package publicsuffix behaves in the only way possible in this case:

You may not set domain cookies for either gov.br or mg.gov.br, so both are correctly returned by PublicSuffix().
Both gov.br and mg.gov.br are effective TLDs (i.e. they allow registration of domains), so neither of them has an additional label (the +1) for EffectiveTLDPlusOne to return, and both return an error.
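A minimal sketch of why the error is unavoidable: the eTLD+1 is the public suffix plus the one label to its left, so when a domain equals its own suffix there is no label left to add. The `demoSuffix` lookup is a hypothetical stand-in for a real suffix function, and `effectiveTLDPlusOne` here only approximates the package's behavior; it is not its source.

```go
package main

import (
	"fmt"
	"strings"
)

// demoSuffix is a hypothetical stand-in for a real public suffix lookup.
// It knows only two rules, checked longest first, and otherwise falls
// back to the last label (the implicit "*" rule).
func demoSuffix(domain string) string {
	for _, s := range []string{"mg.gov.br", "gov.br"} {
		if domain == s || strings.HasSuffix(domain, "."+s) {
			return s
		}
	}
	return domain[strings.LastIndex(domain, ".")+1:]
}

// effectiveTLDPlusOne returns the public suffix of domain plus the one
// label immediately to its left. If domain is itself a public suffix,
// there is no such label and an error is returned.
func effectiveTLDPlusOne(domain string, publicSuffix func(string) string) (string, error) {
	suffix := publicSuffix(domain)
	if len(domain) <= len(suffix) {
		return "", fmt.Errorf("cannot derive eTLD+1 for %q: it is itself a public suffix", domain)
	}
	// rest holds the labels left of the suffix, without the joining dot.
	rest := strings.TrimSuffix(domain, "."+suffix)
	if rest == domain {
		return "", fmt.Errorf("suffix %q does not end domain %q", suffix, domain)
	}
	// Keep only the rightmost label of rest: the "+1".
	return rest[strings.LastIndex(rest, ".")+1:] + "." + suffix, nil
}

func main() {
	for _, d := range []string{"foobar.gov.br", "quux.mg.gov.br", "gov.br", "mg.gov.br"} {
		key, err := effectiveTLDPlusOne(d, demoSuffix)
		fmt.Printf("%-15s suffix=%-10s eTLD+1=%q err=%v\n", d, demoSuffix(d), key, err)
	}
}
```

Both gov.br and mg.gov.br take the error path, matching the two points above.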

Regarding your partitioning problem: I'm not sure I understand it correctly, but what prevents you from using EffectiveTLDPlusOne() or PublicSuffix()? If it can determine an eTLD+1, use that; if it returns an error, you know your domain is itself an eTLD and falls into its own class:

foo.bar.mg.gov.br is classified as "bar.mg.gov.br"
guux.waz.gov.br is classified as "waz.gov.br"
mg.gov.br is classified as "mg.gov.br"
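A minimal sketch of that classification rule. The eTLD+1 lookup is passed in as a function so that real code could pass `publicsuffix.EffectiveTLDPlusOne` from golang.org/x/net/publicsuffix, whose signature is `func(domain string) (string, error)`; `fakeETLDPlusOne` is a hypothetical stub that only hard-codes the examples above.

```go
package main

import (
	"errors"
	"fmt"
)

// partitionKey returns the bucket for domain: its eTLD+1 when one
// exists, otherwise the domain itself (which is then an eTLD).
func partitionKey(domain string, etldPlusOne func(string) (string, error)) string {
	if key, err := etldPlusOne(domain); err == nil {
		return key
	}
	return domain
}

// fakeETLDPlusOne is a hypothetical stub, not a real public suffix
// lookup: it knows only the examples from the comment above and
// reports every other domain as a public suffix.
func fakeETLDPlusOne(domain string) (string, error) {
	known := map[string]string{
		"foo.bar.mg.gov.br": "bar.mg.gov.br",
		"guux.waz.gov.br":   "waz.gov.br",
	}
	if key, ok := known[domain]; ok {
		return key, nil
	}
	return "", errors.New("domain is itself a public suffix")
}

func main() {
	for _, d := range []string{"foo.bar.mg.gov.br", "guux.waz.gov.br", "mg.gov.br"} {
		fmt.Printf("%s is classified as %q\n", d, partitionKey(d, fakeETLDPlusOne))
	}
}
```

This treats every error the same way. The real EffectiveTLDPlusOne also errors on malformed input such as empty labels, so stricter code might validate domains before classifying them.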

Of course you have to implement your own logic here; I doubt package publicsuffix could help in a consistent way.

dankinder · 2018-04-04T17:35:05Z

Hi @vdobler thanks for the detailed explanation and fast response, that makes sense. And you're right, I was using TLD really to mean eTLD.

To clarify the issue: our code so far has assumed that EffectiveTLDPlusOne can be called on any valid domain without erroring (regardless of what we use that eTLD+1 for). So it was surprising to find a case where that doesn't hold, since a domain can apparently be both a valid eTLD and an eTLD+1. In our case, we want to treat mg.gov.br as an eTLD+1.

I think we'll just have to modify our code so that when EffectiveTLDPlusOne errors, we strip off the leftmost label and check whether the remainder is itself a public suffix; if it is, we can treat the whole domain as an eTLD+1.
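A minimal sketch of that workaround, with both lookups passed in as functions. With the real package, `etldPlusOne` could be `publicsuffix.EffectiveTLDPlusOne` and `publicSuffix` could wrap `publicsuffix.PublicSuffix`, which returns `(publicSuffix string, icann bool)`, discarding the second value. `demoPublicSuffix` and `demoETLDPlusOne` are hypothetical stand-ins with a three-rule list.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizedETLDPlusOne tries etldPlusOne first. If that fails, it strips
// the leftmost label; when the remainder is itself a public suffix, the
// whole domain is treated as an eTLD+1.
func normalizedETLDPlusOne(domain string,
	etldPlusOne func(string) (string, error),
	publicSuffix func(string) string) (string, error) {
	key, err := etldPlusOne(domain)
	if err == nil {
		return key, nil
	}
	i := strings.Index(domain, ".")
	if i < 0 {
		return "", err // single label: nothing to strip
	}
	if rest := domain[i+1:]; publicSuffix(rest) == rest {
		return domain, nil
	}
	return "", err
}

// demoSuffixes is a hypothetical rule excerpt, ordered longest first.
var demoSuffixes = []string{"mg.gov.br", "gov.br", "br"}

// demoPublicSuffix mimics a suffix lookup over demoSuffixes, falling
// back to the last label (the implicit "*" rule).
func demoPublicSuffix(d string) string {
	for _, s := range demoSuffixes {
		if d == s || strings.HasSuffix(d, "."+s) {
			return s
		}
	}
	return d[strings.LastIndex(d, ".")+1:]
}

// demoETLDPlusOne returns the suffix plus one label, or an error when
// the domain is itself a suffix.
func demoETLDPlusOne(d string) (string, error) {
	s := demoPublicSuffix(d)
	if d == s {
		return "", errors.New("domain is itself a public suffix")
	}
	rest := strings.TrimSuffix(d, "."+s)
	return rest[strings.LastIndex(rest, ".")+1:] + "." + s, nil
}

func main() {
	for _, d := range []string{"quux.mg.gov.br", "mg.gov.br", "gov.br", "br"} {
		key, err := normalizedETLDPlusOne(d, demoETLDPlusOne, demoPublicSuffix)
		fmt.Printf("%-15s -> %q err=%v\n", d, key, err)
	}
}
```

Under this rule gov.br is also normalized to itself, because its remainder br is a suffix. With the real package this happens for any two-label eTLD, since the default "*" rule makes every single-label remainder a public suffix; whether that is acceptable depends on the partitioning needs.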

I guess I don't have a better way publicsuffix should behave, given your explanation, so I'll close this.

gopherbot added this to the Unreleased milestone Apr 3, 2018

dankinder closed this as completed Apr 4, 2018

golang locked and limited conversation to collaborators Apr 4, 2019

gopherbot added the FrozenDueToAge label Apr 4, 2019
