x/net/publicsuffix: Not possible to check for suffix existence on list #31027
Comments
Change https://golang.org/cl/169079 mentions this issue: |
@ynadji You write
Could you explain in more detail what you are trying to use the PSL for? In what sense does the PSL help determine whether a domain name "can exist on the public internet"? I am not sure the PSL can be used as an authoritative source for this kind of question (although it might be a good heuristic if the PSL is up to date). |
cc @nigeltao |
@vdobler I agree that it isn't authoritative, but it is a nice heuristic when syntax checking is not enough and querying a registrar or the DNS is too much. In my day-to-day work I handle a lot of domain data, and it is nice to be able to (1) quickly prune unlikely domains from the system and (2) promote domains that could be registered into a queue for more expensive operations like querying. The PSL captures this, assuming it is up to date as you mentioned.

On one side of the spectrum, most code validates domains syntactically. On the other side, we can check that the domain is owned by someone by asking a registrar, or resolve the domain to see if it's actually used. However, this may be too expensive, can cause machines to be blocked if done en masse, and answers a slightly different question: a registrar lookup tells you about ownership, resolution tells you about usage, and the PSL tells you about potential for ownership.

It's possible this is a bad fit due to the update cadence compared to, say, publicsuffix-go. That said, I would use this to identify whether a domain uses IANA public TLDs or not, and those change far less frequently. The private section doesn't really matter here. |
I think there actually is a way, based on the fact that only ICANN-managed domains can be single-label, and privately managed domains must have multiple labels. See the suggested SuffixAndDivision function at: https://go-review.googlesource.com/c/net/+/153737/4#message-d558a752d9467d474ac28d0fc6efa4ee687e49a7 That code review discussed adding a new "Division" type that represented Unknown / ICANN / Private. For you, IIUC, "presence on the list" is equivalent to (division != Unknown), so you could code that bool directly, without needing a Division type: "return icann || (strings.IndexByte(ps, '.') >= 0)". If that doesn't work for you, it's not that hard to parse the Public Suffix List for TLDs yourself, at whatever cadence you want: |
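Spelled out as a standalone helper, the one-line check quoted above might look like the sketch below. The function name `onList` is hypothetical. To keep the example free of external dependencies, the inputs in `main` are hand-written stand-ins for the `(publicSuffix, icann)` pair that `publicsuffix.PublicSuffix` returns, not output captured from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// onList (hypothetical name) reports whether a public suffix came from an
// explicit rule on the list rather than from the default "*" rule.
//
// ps and icann are meant to be the two values returned by
// publicsuffix.PublicSuffix(domain) from golang.org/x/net/publicsuffix:
//
//	ps, icann := publicsuffix.PublicSuffix(domain)
//	listed := onList(ps, icann)
//
// Reasoning from the thread: ICANN rules set icann to true, private rules
// always have more than one label, and the default rule yields a single
// label with icann set to false.
func onList(ps string, icann bool) bool {
	return icann || strings.IndexByte(ps, '.') >= 0
}

func main() {
	// Illustrative stand-ins for PublicSuffix results, written by hand.
	cases := []struct {
		ps    string
		icann bool
	}{
		{"co.uk", true},         // ICANN-managed, multi-label
		{"com", true},           // ICANN-managed, single-label
		{"blogspot.com", false}, // privately managed, multi-label
		{"cromulent", false},    // unmanaged: single label, not ICANN
	}
	for _, c := range cases {
		fmt.Printf("%-14s icann=%-5v onList=%v\n", c.ps, c.icann, onList(c.ps, c.icann))
	}
}
```

The check leans on the invariant stated in the comment, so it would stop being reliable if the list ever gained a single-label private rule.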
Ah thank you @nigeltao that snippet works swimmingly. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
Because PublicSuffix defaults to "*" as a rule if none match, there is no way to use the output of these functions to determine whether a domain's public suffix was present on the list (i.e., is a privately or publicly managed TLD) or is unmanaged. This is useful when domains that are syntactically valid but cannot exist on the public Internet at large must be filtered.
What did you expect to see?
A function to perform such a check.
What did you see instead?
No such function exists.