Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/doc: link detection in ToHTML is misbehaving #22285

Closed
Bo0mer opened this issue Oct 16, 2017 · 5 comments
Closed

go/doc: link detection in ToHTML is misbehaving #22285

Bo0mer opened this issue Oct 16, 2017 · 5 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@Bo0mer
Copy link

Bo0mer commented Oct 16, 2017

Please answer these questions before submitting your issue. Thanks!

What did you do?

Run the following commands on my machine (running macOS):

$ godoc -http :6060

navigated to http://localhost:6060/pkg/golang.org/x/tools/present and did a search for renee.

What did you expect to see?

Valid link to https://www.instagram.com/reneefrench named Renée French.

What did you see instead?

Invalid link pointing to https://www.instagram.com/reneefrench/][Ren.

Does this issue reproduce with the latest release (go1.9.1)?

Yes, even on https://godoc.org/golang.org/x/tools/present.

System details

go version go1.9 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/ivan/workspace/Go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ps/1g_475591cv3rf9lkqnyp9qh0000gn/T/go-build145680146=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOROOT/bin/go version: go version go1.9 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.9
uname -v: Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64
ProductName:	Mac OS X
ProductVersion:	10.12.6
BuildVersion:	16G29
lldb --version: lldb-900.0.45
  Swift-4.0
@dmitshur
Copy link
Contributor

dmitshur commented Oct 16, 2017

What did you expect to see?

Valid link to https://www.instagram.com/reneefrench named Renée French.

I would not expect to see that. It's a <pre> block containing the following text:

...
.caption _Gopher_ by [[https://www.instagram.com/reneefrench/][Renée French]]
...

It's just text. It happens to be in present format. But godoc can't know that it should parse the square brackets and text inside as a link.

The best it could do is link all of https://www.instagram.com/reneefrench/][Renée instead of just https://www.instagram.com/reneefrench/][Ren, but whether that's better or not needs to be determined.

Also see https://golang.org/pkg/go/doc/#ToHTML:

A span of indented lines is converted into a <pre> block, with the common indent prefix removed.

URLs in the comment text are converted into links; if the URL also appears in the words map, the link is taken from the map (if the corresponding map value is the empty string, the URL is not converted into a link).

I think this is working as intended (except perhaps the URL matching stopping at the accented letter).

@dmitshur dmitshur changed the title godoc: Link rendering is misbehaving x/tools/cmd/godoc: Link rendering is misbehaving Oct 16, 2017
@gopherbot gopherbot added this to the Unreleased milestone Oct 16, 2017
@ianlancetaylor
Copy link
Contributor

CC @dsnet who has been looking at godoc links. But I tend to agree that this is working as intended.

@dsnet
Copy link
Member

dsnet commented Oct 16, 2017

The logic for URL detection is in go/doc.ToHTML. It uses a combination of regexps and special-case logic as a heuristic for what is a URL. In #5043, the logic was adjusted so that URLs can contain parenthesis. It was further adjusted in #18139 to include square brackets.

A reasonable adjustment to the URL heuristic is to forbid the URL from containing an closing parenthesis as the first parenthesis of that form.

Thus, http://example.com/some_resource[foo] is valid, but http://example.com/some_resource]blah is not valid.

@dsnet dsnet self-assigned this Oct 16, 2017
@dsnet dsnet changed the title x/tools/cmd/godoc: Link rendering is misbehaving go/doc: link detection in ToHTML is misbehaving Oct 16, 2017
@dsnet dsnet modified the milestones: Unreleased, Go1.10 Oct 17, 2017
@dsnet dsnet added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 8, 2017
@dsnet dsnet modified the milestones: Go1.10, Go1.11 Nov 13, 2017
@gopherbot
Copy link

Change https://golang.org/cl/94876 mentions this issue: go/doc: simplify and robustify link detection logic

@dmitshur
Copy link
Contributor

An update on what I said earlier. ] is not one of the safe characters that can be included in a URL without encoding (according to https://www.ietf.org/rfc/rfc1738.txt). RFC 1738 says it must to be encoded. So, it should be safe to stop detecting a URL at ] and not consider it to be a part of the URL.

Following the same logic, I'm not sure if http://example.com/some_resource[foo] should be detected with the [foo] suffix. Perhaps it'd be better to stop before the first [? If that was a correctly encoded URL, wouldn't it be written as http://example.com/some_resource%5Bfoo%5D?

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 29, 2018
@golang golang locked and limited conversation to collaborators Dec 14, 2019
@rsc rsc unassigned dsnet Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants