x/net/publicsuffix: Not possible to check for suffix existence on list #31027

Closed
ynadji opened this issue Mar 25, 2019 · 6 comments

@ynadji

ynadji commented Mar 25, 2019

What version of Go are you using (go version)?

% go version
go version go1.11 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

% go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/yacinnadji/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/yacinnadji/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v0/p64ld18s6v3d77t04_w0kykh0000gn/T/go-build513848839=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Because PublicSuffix falls back to the "*" rule when no rule matches, there is no way to use the output of PublicSuffix or EffectiveTLDPlusOne to determine whether a domain's public suffix is present on the list (i.e., is a publicly or privately managed suffix) or is unmanaged. Such a check is useful when domains that are syntactically valid but cannot exist on the public Internet at large must be filtered out.
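The effect of the "*" fallback can be illustrated with a toy model. This is a minimal sketch, not the x/net/publicsuffix implementation: the rule set is a tiny hypothetical one with no wildcard or exception rules, and the extra `listed` result is exactly the information this issue asks for. Looking only at the returned suffix string, "totallynotarealtld" is indistinguishable from a listed suffix such as "com".

```go
package main

import (
	"fmt"
	"strings"
)

// rules is a hypothetical, drastically reduced rule set standing in for the
// real Public Suffix List. It has no wildcard or exception rules.
var rules = map[string]bool{
	"com":   true,
	"uk":    true,
	"co.uk": true,
}

// publicSuffix returns the longest rule in rules that matches the end of
// domain. If no rule matches, it falls back to the implicit "*" rule and
// returns the domain's last label, exactly as if that label were listed.
// The second result reports whether a listed rule actually matched.
func publicSuffix(domain string) (suffix string, listed bool) {
	labels := strings.Split(domain, ".")
	// Trying labels[0:], labels[1:], ... finds the longest match first.
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if rules[candidate] {
			return candidate, true
		}
	}
	// Implicit "*" rule: the last label is treated as the public suffix.
	return labels[len(labels)-1], false
}

func main() {
	for _, d := range []string{"foo.co.uk", "foo.totallynotarealtld"} {
		s, listed := publicSuffix(d)
		fmt.Printf("%s -> suffix %q, listed=%v\n", d, s, listed)
	}
}
```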

What did you expect to see?

A function to perform such a check.

What did you see instead?

No such function exists.

@gopherbot

Change https://golang.org/cl/169079 mentions this issue: publicsuffix: add HasListedSuffix func

@vdobler
Contributor

vdobler commented Mar 26, 2019

@ynadji You write

This is useful when domains that are syntactically valid but cannot exist on the public Internet at large must be filtered.

Could you explain in more detail what you are trying to use the PSL for?

In what sense does the PSL help determine whether a domain name "can exist on the public Internet"? I am not sure the PSL can be used as an authoritative source for this kind of question (although it might be a good heuristic if the PSL is up to date).

@vdobler
Contributor

vdobler commented Mar 26, 2019

cc @nigeltao

@ynadji
Author

ynadji commented Mar 26, 2019

@vdobler I agree that it isn't authoritative, but it is a nice heuristic when a syntactic check is not enough and querying a registrar or the DNS is too expensive. In my day-to-day work I deal with a lot of domain data, and it is nice to be able (1) to quickly prune unlikely domains from the system and (2) to promote domains that could be registered into a queue for more expensive operations such as queries. The PSL captures this, assuming it is up to date, as you mentioned.

On one side of the spectrum, most code validates domains syntactically, using something like govalidator.IsDNSName, but this accepts domains with known "bad" TLDs, e.g., non-existent ones like foo.totallynotarealtld or invalid ones like bar.uk (before uk accepted children and everything was under co.uk). In a system that deals with a lot of domain data, this leaves a bit to be desired.

On the other side, we can check that the domain is owned by someone by asking a registrar, or resolve the domain to see whether it is actually used. However, this may be too expensive, can get machines blocked if done en masse, and answers slightly different questions: a registrar answers ownership, DNS resolution answers usage, and the PSL answers potential for ownership.
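The cheap-to-expensive ordering described here can be sketched as an offline triage step in Go. This is an illustrative sketch under stated assumptions: `looksLikeDNSName` is a hypothetical, rough stand-in (not govalidator.IsDNSName), `Verdict`, `Drop`, `Enqueue`, and `triage` are hypothetical names, and the PSL-based check is passed in as a function so any heuristic can be plugged in. Registrar and DNS lookups are left to whatever consumes the queue.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is the outcome of the cheap, offline triage stage.
type Verdict int

const (
	Drop    Verdict = iota // fails a cheap check: prune it
	Enqueue                // plausible: queue it for registrar or DNS lookups
)

// looksLikeDNSName is a deliberately rough, hypothetical syntax check:
// at most 253 bytes, dot-separated labels of 1 to 63 ASCII letters,
// digits, or hyphens, with no leading or trailing hyphen. It is not
// govalidator.IsDNSName.
func looksLikeDNSName(d string) bool {
	if len(d) == 0 || len(d) > 253 {
		return false
	}
	for _, label := range strings.Split(d, ".") {
		if len(label) == 0 || len(label) > 63 {
			return false
		}
		if label[0] == '-' || label[len(label)-1] == '-' {
			return false
		}
		for _, c := range label {
			ok := c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' ||
				c >= '0' && c <= '9' || c == '-'
			if !ok {
				return false
			}
		}
	}
	return true
}

// triage applies the syntax check first and the suffix check second, and
// never touches the network. hasListedSuffix is supplied by the caller,
// for example a PSL-based heuristic.
func triage(d string, hasListedSuffix func(string) bool) Verdict {
	if !looksLikeDNSName(d) || !hasListedSuffix(d) {
		return Drop
	}
	return Enqueue
}

func main() {
	// Hypothetical stand-in for a real PSL check: only names under "com" count.
	listed := func(d string) bool { return strings.HasSuffix(d, ".com") }
	for _, d := range []string{"example.com", "foo.totallynotarealtld", "bad_name.com"} {
		fmt.Println(d, triage(d, listed) == Enqueue)
	}
}
```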

It's possible this is a bad fit because of the update cadence compared to, say, publicsuffix-go. That said, I would use this to identify whether a domain uses IANA public TLDs, which change far less frequently. The private section doesn't really matter here.

@nigeltao
Contributor

there is no way to use the output of these functions to determine if a domain's public suffix was present on the list (i.e., is a private/public managed TLD) or is unmanaged

I think there actually is a way, based on the fact that only ICANN-managed domains can be single-label, while privately managed domains must have multiple labels. See the suggested SuffixAndDivision function at:

https://go-review.googlesource.com/c/net/+/153737/4#message-d558a752d9467d474ac28d0fc6efa4ee687e49a7

That code review discussed adding a new "Division" type that represented Unknown / ICANN / Private. For you, IIUC, "presence on the list" is equivalent to (division != Unknown), so you could code that bool directly, without needing a Division type: "return icann || (strings.IndexByte(ps, '.') >= 0)".
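The heuristic can be written out as a small Go sketch. In real code the `(ps, icann)` pair would come from `publicsuffix.PublicSuffix(domain)` in golang.org/x/net/publicsuffix, which returns `(publicSuffix string, icann bool)`; here the pairs are supplied directly so the example needs only the standard library. `Division`, `divisionOf`, and `hasListedSuffix` are hypothetical names modelled on the discussion, not the API from the code review, and the classification holds only as long as private-section rules really are always multi-label.

```go
package main

import (
	"fmt"
	"strings"
)

// Division says which part of the Public Suffix List, if any, a suffix
// came from. The type is hypothetical, modelled on the Unknown / ICANN /
// Private type discussed in the code review.
type Division int

const (
	Unknown Division = iota // no rule matched; the implicit "*" rule applied
	ICANN                   // matched a rule in the ICANN section
	Private                 // matched a rule in the private section
)

func (d Division) String() string {
	switch d {
	case ICANN:
		return "ICANN"
	case Private:
		return "Private"
	}
	return "Unknown"
}

// divisionOf classifies a (suffix, icann) pair as returned by
// publicsuffix.PublicSuffix. It relies on private-section rules always
// having more than one label, so a non-ICANN single-label suffix can only
// come from the "*" fallback.
func divisionOf(ps string, icann bool) Division {
	switch {
	case icann:
		return ICANN
	case strings.IndexByte(ps, '.') >= 0:
		return Private
	default:
		return Unknown
	}
}

// hasListedSuffix is the same test without a Division type.
func hasListedSuffix(ps string, icann bool) bool {
	return icann || strings.IndexByte(ps, '.') >= 0
}

func main() {
	// Illustrative (suffix, icann) pairs of the shape PublicSuffix returns.
	pairs := []struct {
		ps    string
		icann bool
	}{
		{"com", true},
		{"blogspot.com", false},
		{"totallynotarealtld", false},
	}
	for _, p := range pairs {
		fmt.Println(p.ps, divisionOf(p.ps, p.icann), hasListedSuffix(p.ps, p.icann))
	}
}
```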

If that doesn't work for you, it's not that hard to parse the Public Suffix List for TLDs yourself, at whatever cadence you want:

https://play.golang.org/p/ibJSJo0ZeaM

@ynadji
Author

ynadji commented Apr 1, 2019

Ah, thank you @nigeltao, that snippet works swimmingly.

@ynadji ynadji closed this as completed Apr 1, 2019
@golang golang locked and limited conversation to collaborators Mar 31, 2020