Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/publicsuffix: ICANN flag returned not aligned with public suffix location in the list #22985

Open
eminano opened this issue Dec 4, 2017 · 7 comments
Milestone

Comments

@eminano
Copy link

eminano commented Dec 4, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.9.2

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

goos: darwin
goarch: amd64

What did you do?

Call PublicSuffix with input domain "transurl.be"
https://play.golang.org/p/SLdVLEzpk4

What did you expect to see?

ICANN flag set to true

List contains "be" within the ICANN delimiters:
// be : https://en.wikipedia.org/wiki/.be
// Confirmed by registry tech@dns.be 2008-06-08
be
ac.be

and "*.transurl.be" outside.

// TransIP : htts://www.transip.nl
// Submitted by Rory Breuk rbreuk@transip.nl
*.transurl.be
*.transurl.eu
*.transurl.nl

What did you see instead?

ICANN flag set to false

I am not sure if this behaviour is expected. Should transurl.be match *.transurl.be, given it doesn't have a 3rd level domain?
Also, since the public suffix domain returned is "be" , I would expect for that rule to be the one being matched, and therefore having ICANN set to true.

There are more occurrences of this scenario, such as "0emm.com".

@gopherbot gopherbot added this to the Unreleased milestone Dec 4, 2017
@meirf
Copy link
Contributor

meirf commented May 7, 2018

I think this behavior is expected at least from the perspective of the official PSL repo.
I forked the repo and added test cases.

Build 7 which has change:

+// TLD wildcard with subdomain: https://github.com/golang/go/issues/22985
+transurl.be transurl.be

results in failure.

Build 8 which has change:

-transurl.be transurl.be
+transurl.be null

results in success.

Build 9 which has change:

+c.transurl.be null
+b.c.transurl.be b.c.transurl.be
+a.b.c.transurl.be b.c.transurl.be

results in success.

Note that there were no other changes between these builds. Compare the cases here: https://github.com/meirf/list/blob/master/tests/tests.txt#L40-L49, note the pattern of transurl.be swapped with mm.

This answer is purely based on the official repo behavior. Perhaps @vdobler, @weppos can give some additional insights/correct my results.

@weppos
Copy link

weppos commented May 8, 2018

Thanks for bringing it to the attention.

At first glance, I'd say the match should be against *.transurl.be as this is the longest matching suffix. In fact, for wildcard rules the match is against the token without the *.transurl.be.

However, it's interesting we don't have a proper test case.

I also checked it against 2 libraries I maintain. The Go lib matches against the be (it currently uses a very naive sequential scan approach), whereas the Ruby lib matches against the private TLD (it uses a more optimized Hash-based approach).

I'll bring this issue to the PSL list to make sure the interpretation is correct and make sure we address it.

@vdobler
Copy link
Contributor

vdobler commented May 8, 2018

Fro my reading "transurl.be" would be matched only by the rule "be" and not by "*.transurl.be" as the nothing in "transurl.be" matches the *.

This icann should be true. So it is a real bug.

@weppos
Copy link

weppos commented May 8, 2018

Fro my reading "transurl.be" would be matched only by the rule "be" and not by "*.transurl.be" as the nothing in "transurl.be" matches the *.

I believe this is the source of the confusion.

From the formal algorithm:

A domain is said to match a rule if and only if all of the following conditions are met:

  • When the domain and rule are split into corresponding labels, that the domain contains as many or more labels than the rule.
  • Beginning with the right-most labels of both the domain and the rule, and continuing for all labels in the rule, one finds that for every pair, either they are identical, or that the label from the rule is "*".

Emphasis is mine. It means, given two rules:

  • be
  • *.example.be

If you try to match example.be, it should match both as the second rule falls into the very last case:

or that the label from the rule is "*".**

In other words, in case the rule contains ! or *., the match is performed against the token without the char.

  • foo.com
  • !foo.com
  • *.foo.com

should all match foo.com.

I think this is one of those cases where it's worth to invest some effort on the PSL side to reach a consensus and standardize all the implementations. I am definitely bringing it up for discussion (I'll add a reference to the ticker and/or thread here).

@vdobler
Copy link
Contributor

vdobler commented May 8, 2018

@weppos I doubt that this reading is correct.

*.example.be has three labels. From right to left these are "be", "example" and "*".

example.be has two labels: "be" and "example"

So the domain example.be can never match the rule *.example.be because the domain has fewer labels than the rule. The * is a label; it is not "a label or no label".

@vdobler
Copy link
Contributor

vdobler commented May 14, 2018

While trying to fix this bug I found one more thing I think is underspecified:

What is the icann flag for "bd"? There is just one rule for bd:

*.bd

(the rule is in the ICANN section).

With the official PSL algorithm the domain "be" would not match the rule ".bd"
due to "too few labels". But then PublicSuffix("be") would be handled by the
implied default rule "
" and I doubt that this should be considered a ICANN managed
domain
But maybe this is intentional and bd itself is not ICANN managed while
all subdomains like foo.bd are?

(Note: The strange rule "*.bd" seems to be a "fix" for some test case break.
The rules switch between

*.bd

and

// bd : http://en.wikipedia.org/wiki/.bd
bd
ac.bd
com.bd
edu.bd
gov.bd
mil.bd
net.bd
org.bd

Where the later with the explicit rule for "bd" seems saner.
)

@weppos
Copy link

weppos commented May 21, 2018

It looks like this is an old issue. I've resurrected the discussion.
https://groups.google.com/d/msg/publicsuffix-discuss/rY6yRK802Kg/ripeB6goCAAJ

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants