x/net/publicsuffix: entire host interpreted as TLD #24672
This is the easy part: option 1 is correct as per the spec. Step 4 of the algorithm described at https://publicsuffix.org/list/ requires selecting the rule with the most labels. The question about the "TLD" is more complicated. Just to make sure we use the same language:
The current package publicsuffix does not provide a function to directly compute the eTLD of a domain.
Asking for the eTLD+1 of mg.gov.br returns an error: mg.gov.br is an eTLD and there is no additional label left to return. So I think package publicsuffix behaves in the only way possible in this case:
Regarding your partitioning problem: I'm not sure I understand it correctly, but what prevents you from using EffectiveTLDPlusOne() or PublicSuffix()? If EffectiveTLDPlusOne() can determine an eTLD+1, use that. If it returns an error, you know the domain itself is an eTLD and falls into its own class.
Of course you will have to implement your own logic here; I doubt package publicsuffix could help with this in a consistent way.
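To make the "most labels" rule concrete, here is a minimal sketch in Go. It is not the code of package publicsuffix: toyRules, toySuffix and toyETLDPlusOne are hypothetical names, the rule set is a three-entry stand-in for the real list, and wildcard rules, exception rules and the default "*" rule are all ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// toyRules is a tiny, hand-picked stand-in for the public suffix list.
// The real list is far larger and also has wildcard and exception rules.
var toyRules = map[string]bool{
	"br":        true,
	"gov.br":    true,
	"mg.gov.br": true,
}

// toySuffix returns the matching rule with the most labels,
// or "" if no rule in toyRules matches.
func toySuffix(domain string) string {
	labels := strings.Split(domain, ".")
	// Candidates run from the whole domain down to the last label,
	// so the first hit is the rule with the most labels.
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if toyRules[candidate] {
			return candidate
		}
	}
	return ""
}

// toyETLDPlusOne returns the suffix plus one more label, or an error
// when no label is left in front of the suffix.
func toyETLDPlusOne(domain string) (string, error) {
	suffix := toySuffix(domain)
	if suffix == "" {
		return "", fmt.Errorf("no rule matches %q", domain)
	}
	if suffix == domain {
		return "", fmt.Errorf("%q is itself a suffix", domain)
	}
	rest := strings.TrimSuffix(domain, "."+suffix)
	return rest[strings.LastIndex(rest, ".")+1:] + "." + suffix, nil
}

func main() {
	for _, d := range []string{"mg.gov.br", "www.mg.gov.br", "example.gov.br"} {
		plusOne, err := toyETLDPlusOne(d)
		fmt.Printf("%s: suffix=%s eTLD+1=%q err=%v\n", d, toySuffix(d), plusOne, err)
	}
}
```

With these toy rules, mg.gov.br matches three rules (br, gov.br, mg.gov.br), the longest one wins, and nothing remains to form an eTLD+1, which is why an error is the only consistent result.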
Hi @vdobler, thanks for the detailed explanation and fast response, that makes sense. And you're right, I was using TLD really to mean eTLD.

To clarify the issue: our code thus far has expected that EffectiveTLDPlusOne can be called on any valid domain without returning an error (regardless of what we're using that eTLD+1 for). It was surprising to find this strange case where that didn't hold, since something can now be both a valid eTLD and an eTLD+1. And in our case, we want to treat

I think we'll just have to modify our code so that when EffectiveTLDPlusOne returns an error, we try stripping off one domain part and see if the remainder is a valid PublicSuffix. If it is, we can treat the whole domain as an EffectiveTLDPlusOne. I guess I don't have a better suggestion for how publicsuffix should behave, given your explanation, so I'll close this.
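The fallback described above could be sketched roughly as follows. The name partitionKey is hypothetical, and the two lookups are passed in as function parameters so the example runs without external modules; the stub functions only know about gov.br and mg.gov.br and exist purely for the demonstration. The parameter types are meant to match the signatures of publicsuffix.EffectiveTLDPlusOne and publicsuffix.PublicSuffix in golang.org/x/net/publicsuffix, so those should be usable in place of the stubs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// partitionKey returns the eTLD+1 of domain when one exists. If the lookup
// fails, it strips the leading label and, when the remainder is itself a
// public suffix, treats the whole domain as its own eTLD+1.
func partitionKey(
	domain string,
	etldPlusOne func(string) (string, error),
	publicSuffix func(string) (string, bool),
) (string, error) {
	if key, err := etldPlusOne(domain); err == nil {
		return key, nil
	}
	_, parent, ok := strings.Cut(domain, ".")
	if !ok {
		return "", fmt.Errorf("%q has no parent domain", domain)
	}
	if suffix, _ := publicSuffix(parent); suffix == parent {
		return domain, nil
	}
	return "", fmt.Errorf("cannot derive a partition key for %q", domain)
}

// stubPublicSuffix is a demo-only stand-in that knows two multi-label
// suffixes and otherwise falls back to the last label.
func stubPublicSuffix(domain string) (string, bool) {
	switch {
	case domain == "mg.gov.br" || strings.HasSuffix(domain, ".mg.gov.br"):
		return "mg.gov.br", true
	case domain == "gov.br" || strings.HasSuffix(domain, ".gov.br"):
		return "gov.br", true
	}
	return domain[strings.LastIndex(domain, ".")+1:], true
}

// stubETLDPlusOne is a demo-only stand-in built on stubPublicSuffix.
func stubETLDPlusOne(domain string) (string, error) {
	suffix, _ := stubPublicSuffix(domain)
	if domain == suffix {
		return "", errors.New("domain is a public suffix")
	}
	rest := strings.TrimSuffix(domain, "."+suffix)
	return rest[strings.LastIndex(rest, ".")+1:] + "." + suffix, nil
}

func main() {
	for _, d := range []string{"mg.gov.br", "www.mg.gov.br", "example.gov.br", "br"} {
		key, err := partitionKey(d, stubETLDPlusOne, stubPublicSuffix)
		fmt.Printf("%-16s key=%q err=%v\n", d, key, err)
	}
}
```

One consequence of this approach is that a shorter suffix such as gov.br is also treated as its own partition, because its parent br is a public suffix too. Whether that is desirable depends on the partitioning scheme.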
(Sorry if this is not a legitimate issue, but it is odd behavior so seemed worth raising.)
The publicsuffix library currently matches the longest possible rule in the public suffix list. This seems to be according to spec but creates strange behavior in cases where the entire domain is valid (has a DNS entry) but is also a TLD.
The case at issue here is mg.gov.br. This is itself an entry in the full public suffix list, and gov.br is also an entry. Currently if you try to get the TLD+1 for mg.gov.br you get an error, because mg.gov.br is itself the suffix (see playground).

But in reality, for the domain mg.gov.br, the TLD is gov.br and the TLD+1 is mg.gov.br. At least, for our use, that is the output I would expect, and the docs don't say what happens when the entire domain is a suffix.

Which of these should be correct?
1. publicsuffix.PublicSuffix("mg.gov.br") == "mg.gov.br"
2. publicsuffix.PublicSuffix("mg.gov.br") == "gov.br"
The use case in our current system for this is to do some data partitioning by TLD+1, because doing it by TLD is way too coarse and simply by domain is way too granular and bad for sites with wildcard subdomains. I guess we could work around it by removing a leading domain part, checking if the remainder is itself a valid TLD, and if so treat that as the real TLD, but would like to avoid this if possible.