Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: builders and TryBots should check documentation for broken links #37047

Open
bcmills opened this issue Feb 5, 2020 · 8 comments
Open

all: builders and TryBots should check documentation for broken links #37047

bcmills opened this issue Feb 5, 2020 · 8 comments
Labels
Builders x/build issues (builders, bots, dashboards) Documentation help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Feb 5, 2020

We occasionally end up with broken or outdated links in documentation (see #46545, #37042 (comment), #27860, #21951, #19244).

Testing for broken links would not be entirely trivial, but nonetheless pretty straightforward to implement: we have an HTML parser in golang.org/x/net/html, and it is easy enough to issue a HEAD request for link targets to see if they resolve. (For links with anchors, we would probably want to also check the Content-Type header and then parse the actual linked HTML to ensure that the anchor exists.)

CC @golang/osp-team

@bcmills bcmills added Testing An issue that has been verified to require only test changes, not just a test failure. Documentation help wanted Builders x/build issues (builders, bots, dashboards) labels Feb 5, 2020
@bcmills bcmills added this to the Backlog milestone Feb 5, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Feb 5, 2020

Thanks for opening an issue about this. I agree it's not entirely trivial, but if implemented in a way that doesn't cause too much noise or false positives, I also agree that this would be helpful.

I believe @jayconrod wanted this too. I can't find an existing issue; it may have been a comment in a CL.

Edit: Found it, it was this comment in a CL.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 6, 2020

Thinking about this some more: we have control over golang.org, go.dev, and github.com/golang URLs, but not over other URLs. Probably the TryBot test should only fail if the broken link is on one of those sites, but at least the longtest builders should also check external links.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 6, 2020
@alain91
Copy link

alain91 commented Feb 7, 2020

Hello, I'm newbee in golang but want to dive in the core code.
I have experience in developping code but not in go.
I would have a look into this issue (as a use case) but my investigations could take time.
Is my contribution acceptable ?

@bcmills
Copy link
Contributor Author

bcmills commented Feb 7, 2020

@alain91, the tricky part of this issue will likely be getting the rendered documentation (in order to obtain the links it contains), and indexing that documentation (in order to be able to detect broken relative links). I'm not familiar with that part of the codebase myself, so I don't know how accessible it would be to a newcomer.

@alain91
Copy link

alain91 commented Feb 7, 2020

I'm only newcomer in go. I have experience with parser, html decoder...

@ShawnROGrady
Copy link

ShawnROGrady commented Mar 3, 2020

@bcmills I took a crack at implementing something like this. This just walks the filepath, generates documentation html, then calls HEAD on any tag values with an href key. Running against the src directory outputs:

$ godoc_link_validator -check '^golang\.org|go\.dev|github.com/golang$' ./src
2020/03/03 15:05:59 src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404

Without ignoring external links (just ignoring localhost and example URLs) I get:

$ godoc_link_validator -ignore '^localhost|example|code' ./src 
2020/03/03 15:08:21 src/cmd/go: error retrieving 'https://proxy.golang.org,direct': Head "https://proxy.golang.org,direct": dial tcp: lookup proxy.golang.org,direct: no such host
src/cmd/go/internal/lockedfile: http HEAD 'https://golang.org/issue/20803.' status code: 404
src/cmd/internal/obj/arm: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ihi0040b/IHI0040B_aadwarf.pdf' status code: 404
src/cmd/internal/obj/arm64: http HEAD 'http://infocenter.arm.com/help/topic/com.arm.doc.ecm0665627/abi_sve_aadwarf_100985_0000_00_en.pdf' status code: 404

That proxy link represents the default GOPROXY value (found in docs here), and I'm not sure if its better to just ignore it or to update the doc generator to not include the ",direct" portion in the clickable link.

That issue link (found in docs here) returns a 404 because of a trailing period (created issue #37640 to track this).

Both of the ARM links are actually broken (here and here in docs).

I am not sure if/how this should be included in the trybots since I am not familiar with how exactly they are configured.

@rsc
Copy link
Contributor

rsc commented Dec 8, 2023

I sent go.dev/cl/548059 to check all links inside the go.dev server.
(That is, not checking external links.)
That covers all the issue @bcmills linked in the original report except for checking #fragment.
I think external links are and should be beyond the scope of builders and TryBots.

I will leave this open to track potentially checking #fragment too.

@gopherbot
Copy link

Change https://go.dev/cl/548059 mentions this issue: cmd/golangorg: check for invalid or broken links in served HTML

gopherbot pushed a commit to golang/website that referenced this issue Dec 8, 2023
Test that links are to /foo not https://go.dev/foo
and also check that the links actually point at real pages.
Manually fix problems that the test found.

For golang/go#37047.

Change-Id: I825eec3c2cadb9d259caff51cd893f3023ab533a
Reviewed-on: https://go-review.googlesource.com/c/website/+/548059
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Builders x/build issues (builders, bots, dashboards) Documentation help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

7 participants