x/net/route: TestRouteMessage failures on freebsd-386 builders #35513

Closed
bcmills opened this issue Nov 11, 2019 · 14 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 OS-FreeBSD release-blocker
@bcmills
Contributor

bcmills commented Nov 11, 2019

--- FAIL: TestRouteMessage (0.00s)
    message_test.go:235: dst|gateway|ifp|ifa|brd (pmtu=0) (inet4 127.0.0.1) (link 2 <nil> <nil>) (link 2 lo0 <nil>) (inet4 127.0.0.1) (inet4 0.0.0.0)
    message_test.go:235: dst|gateway (pmtu=0) (inet4 127.0.0.1) (link 2 <nil> <nil>)
    message_test.go:223: {0 4 0 0 1160 3 <nil> [(inet6 0000:0000:0000:0000:0000:0000:0000:0001 0) <nil> <nil> <nil> (link 0 <nil> <nil>) (inet6 0000:0000:0000:0000:0000:0000:0000:0000 0) <nil> (inet6 0000:0000:0000:0000:0000:0000:0000:0000 0)] 0 []}: invalid address
FAIL
FAIL	golang.org/x/net/route	0.006s

2019-11-09T02:19:31-daa7c04/freebsd-386-11_2
2019-11-09T02:19:31-daa7c04/freebsd-386-12_0

CC @tklauser @mikioh @bradfitz

@bcmills bcmills added OS-FreeBSD NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 11, 2019
@bcmills bcmills added this to the Backlog milestone Nov 11, 2019
@bradfitz
Contributor

Only 386? Always or a flake?

@bcmills
Contributor Author

bcmills commented Nov 11, 2019

> Always or a flake?

Looks like a recurring flake, but getting a better answer to that question is part of my motivation for filing #35515.

@bcmills
Contributor Author

bcmills commented Oct 19, 2020

2020-10-16T16:51:38-7b1cca2/freebsd-386-11_2
2020-10-09T03:24:41-dbdefad/freebsd-386-11_2
2020-10-06T15:34:59-a7d1128/freebsd-386-12_0
2020-09-30T14:50:03-4acb6c0/freebsd-386-11_2
2020-09-27T03:25:02-5d4f700/freebsd-386-11_2
2020-09-27T03:25:02-5d4f700/freebsd-386-12_0
2020-09-23T18:22:12-328152d/freebsd-386-11_2
2020-09-23T18:22:12-328152d/freebsd-386-12_0
2020-09-04T19:48:48-62affa3/freebsd-386-11_2
2020-09-04T19:48:48-62affa3/freebsd-386-12_0
2020-08-13T13:45:08-3edf25e/freebsd-386-11_2
2020-07-07T03:43:11-ab34263/freebsd-386-11_2
2020-07-07T03:43:11-ab34263/freebsd-386-12_0
2020-06-25T00:16:55-4c52546/freebsd-386-11_2
2020-06-25T00:16:55-4c52546/freebsd-386-12_0
2020-06-02T11:40:24-627f964/freebsd-386-11_2
2020-05-05T04:18:28-1ed2336/freebsd-386-12_0
2020-05-01T05:30:45-e0ff5e5/freebsd-386-11_2
2020-05-01T05:30:45-e0ff5e5/freebsd-386-12_0
2020-04-25T23:01:54-ff2c4b7/freebsd-386-11_2
2020-04-21T23:12:49-e086a09/freebsd-386-11_2
2020-03-24T14:37:07-d3edc99/freebsd-386-11_2
2020-03-01T02:21:30-244492d/freebsd-386-12_0
2020-02-26T12:10:28-0de0cce/freebsd-386-11_2
2020-02-22T12:55:58-5a598a2/freebsd-386-11_2
2019-12-06T10:30:17-1ddd1de/freebsd-386-12_0
2019-12-04T02:50:24-5ee1b9f/freebsd-386-11_2
2019-11-26T23:54:20-ef20fe5/freebsd-386-11_2
2019-11-25T08:49:36-ffdde10/freebsd-386-12_0
2019-11-19T07:31:36-fc4aabc/freebsd-386-11_2
2019-11-19T07:31:36-fc4aabc/freebsd-386-12_0
2019-11-18T18:34:10-d06c31c/freebsd-386-11_2
2019-11-18T18:34:10-d06c31c/freebsd-386-12_0
2019-11-16T16:09:21-f9c8255/freebsd-386-12_0

@bcmills bcmills changed the title x/net/route: TestRouteMessage failures on freebsd builders x/net/route: TestRouteMessage failures on freebsd-386 builders Feb 2, 2021
@bcmills
Contributor Author

bcmills commented May 7, 2021

@golang/release: it seems like this test either needs an owner to fix it, or ought to be skipped on this builder. The freebsd-386-12_2 builder is currently providing little value for x/net because it's ~always red.

(Marking as release-blocker for Go 1.17 via #11811.)

@bcmills bcmills modified the milestones: Backlog, Go1.17 May 7, 2021
@heschi
Contributor

heschi commented May 13, 2021

Weekly check-in: this needs to be investigated before beta 1.

cc @neild

@neild
Contributor

neild commented May 14, 2021

I have not yet figured out what's going on here, but it's not 1.17 specific. The problem is easily reproducible with -count=1000 and 1.16.

@heschi
Contributor

heschi commented May 18, 2021

Removing release-blocker, then.

@bcmills
Contributor Author

bcmills commented May 19, 2021

@heschi, I believe the freebsd-386-12_2 builder is new as of the Go 1.17 cycle.

Unless I am mistaken, it follows from #11811 that setting up a new builder needs to include either fixing the tests that already fail on that builder or adding skips for them.

Since the builder is new for Go 1.17, I believe this issue should still be a 1.17 release-blocker.

@bcmills bcmills added Builders x/build issues (builders, bots, dashboards) okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker labels May 19, 2021
@bcmills
Contributor Author

bcmills commented May 19, 2021

Note that the failure rate on the freebsd-386-12_2 builder is qualitatively, if not quantitatively, different from the failure rate on the previous 11_2 and current 11_4 builders:

[image: build dashboard screenshot]

I cannot explain why the qualitative appearance of the dashboard does not seem to match the more even distribution of failures in the logs downloaded by fetchlogs. That discrepancy may be worth investigating further, but I don't plan to pursue it at this time.

2021-05-10T12:01:50-4163338/freebsd-386-11_4
2021-05-10T12:01:50-4163338/freebsd-386-12_2
2021-05-10T09:51:57-81045d8/freebsd-386-11_2
2021-05-10T09:51:57-81045d8/freebsd-386-12_2
2021-05-08T05:16:33-16afe75/freebsd-386-12_2
2021-05-05T21:49:59-0714010/freebsd-386-11_4
2021-05-05T21:49:59-0714010/freebsd-386-12_2
2021-05-05T02:47:14-0287a6f/freebsd-386-12_2
2021-05-04T13:21:25-bbd867f/freebsd-386-11_4
2021-05-04T13:21:25-bbd867f/freebsd-386-12_2
2021-05-03T06:03:51-7fd8e65/freebsd-386-11_2
2021-05-03T06:03:51-7fd8e65/freebsd-386-11_4
2021-05-03T06:03:51-7fd8e65/freebsd-386-12_2
2021-05-02T03:00:24-e590880/freebsd-386-12_2
2021-05-01T22:26:12-f8dd838/freebsd-386-11_2
2021-05-01T22:26:12-f8dd838/freebsd-386-12_2
2021-05-01T14:20:56-aec3718/freebsd-386-11_4
2021-05-01T14:20:56-aec3718/freebsd-386-12_2
2021-04-28T14:07:49-89ef3d9/freebsd-386-12_2
2021-04-27T23:12:57-85d9c07/freebsd-386-12_2
2021-04-23T18:45:38-5f58ad6/freebsd-386-11_4
2021-04-23T18:45:38-5f58ad6/freebsd-386-12_2
2021-04-23T17:40:36-e997de6/freebsd-386-12_2
2021-04-21T23:01:15-4e50805/freebsd-386-12_0
2021-04-21T23:01:15-4e50805/freebsd-386-12_2
2021-04-20T21:01:06-798c215/freebsd-386-11_2
2021-04-15T23:10:46-e915ea6/freebsd-386-11_2
2021-04-15T23:10:46-e915ea6/freebsd-386-12_0
2021-04-14T19:42:28-0645797/freebsd-386-11_2
2021-04-05T18:03:19-a5a99cb/freebsd-386-11_2
2021-03-31T21:22:08-0fccb6f/freebsd-386-11_2
2021-03-31T06:09:03-cb1fcc7/freebsd-386-12_0
2021-03-30T21:00:36-cd0ac97/freebsd-386-11_2
2021-03-30T21:00:36-cd0ac97/freebsd-386-12_0
2021-03-26T22:08:43-6ef6e9b/freebsd-386-11_2
2021-03-26T06:03:03-6b15177/freebsd-386-11_2
2021-03-24T20:56:30-d1beb07/freebsd-386-11_2
2021-03-24T20:56:30-d1beb07/freebsd-386-12_0
2021-03-16T09:26:52-d523dce/freebsd-386-11_2
2021-02-26T17:20:49-e18ecbb/freebsd-386-11_2
2021-02-26T17:20:49-e18ecbb/freebsd-386-12_0
2021-02-26T10:14:13-39120d0/freebsd-386-11_2
2021-02-24T08:20:22-3d97a24/freebsd-386-12_0
2021-02-22T17:17:44-9060382/freebsd-386-12_0
2021-02-20T03:31:24-5f55cee/freebsd-386-11_2

@neild
Contributor

neild commented May 19, 2021

The failing test reads routing information from an AF_ROUTE socket and parses it.

The information read from the socket does not match the expected format. The test passes most of the time, because most of the time the parser doesn't realize there's a problem.

You can easily reproduce this on the freebsd-386-11_2 builder with go test ./route -run=RouteMessage/2 -count=1000 -failfast.

Here are the bytes provided to ParseRIB for an example failure:

rt_msghdr
48010504 02000000 45003000 33000000 5f080000 03000000 00000000 00000000
00000000 00000000 00000000 00000000 00400000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 01000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000

sockaddr
                                                      1c1c0000 00000000
00000000 00000000 00000000 00000001 00000000

unknown data
                                             ffffffff 36120200 18000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 38120200 18030000 6c6f3000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000

sockaddr
                  1c1c0000 00000000 00000000 00000000 00000000 00000001
00000000

unknown data
         ffffffff

This data is expected to be an rt_msghdr struct, followed by an array of sockaddrs. The number of sockaddrs is specified in the header--2 in this case.

There are what appear to be two IPv6 sockaddrs in the data here, each consisting of 28 bytes starting with 1c1c.

There is also a chunk of data between the sockaddrs: everything from the ffffffff to the next 1c1c. We parse the first sockaddr, then attempt to parse the ffffffff as a sockaddr and fail. When the test passes, that chunk happens to contain different data that parses without error.

I've poked through the source code for FreeBSD utilities which parse this same data, and none of them seem to handle this case. For example, the code here looks pretty similar to that in x/net/route:
https://github.com/freebsd/freebsd-src/blob/main/usr.bin/netstat/route.c#L317

I'm very puzzled by what's going on here. This test is definitely pointing out a real problem. It is flaky only because the "successful" runs aren't detecting that they're parsing invalid data.

@ianlancetaylor
Contributor

I think you are misreading the data slightly. The number of addresses is not 2, but is determined by the number of bits set in the rtm_addrs field. In this case the value 0x33 indicates that there are four addresses. The first is an AF_INET6 address, the next two are AF_LINK addresses, and the last one is another AF_INET6 address. Viewed this way we can see that the 0xffffffff values appear after the AF_INET6 addresses, which strongly suggests that the address lengths are being rounded up to a multiple of 8 bytes, in this case rounding from 0x1c to 0x20. That makes me suspect that we are running a 386 program on an amd64 kernel. And that makes me suspect that https://golang.org/cl/139577 was mistaken, or perhaps that it simply doesn't apply to whatever system we are running. See what happens if you revert that CL.

@neild
Contributor

neild commented May 21, 2021

You're right, I was misreading it.

If I'm following the FreeBSD code correctly, after this change, sockaddrs in routing messages returned by the sysctl syscall with the SCTL_MASK32 flag set are 32-bit aligned. The test here is reading routing messages from an AF_ROUTE routing socket. Empirically, this returns 64-bit aligned sockaddrs.

We do also make a sysctl syscall elsewhere in this test, which is also returning 64-bit aligned sockaddrs.

I'm not clear on who sets the SCTL_MASK32 flag (userspace? kernel?), but so far as I can tell all the routing messages we're reading are 64-bit aligned.
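
The effect of the alignment choice can be sketched with a simplified reconstruction of the sockaddr area from the dump. Only sa_len, sa_family, and the padding bytes are reproduced; the address contents are zeroed. The helpers walk, sockaddr, and sample are hypothetical names for this illustration, and the validity check (accept only AF_LINK and AF_INET6) is a stand-in for whatever a real parser checks.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	afLink  = 0x12 // AF_LINK on FreeBSD
	afInet6 = 0x1c // AF_INET6 on FreeBSD
)

// roundup rounds n up to a multiple of align, which must be a power of two.
func roundup(n, align int) int {
	if n == 0 {
		return align
	}
	return (n + align - 1) &^ (align - 1)
}

// walk steps through n sockaddrs in b, advancing by each sa_len rounded up
// to align. It returns the families seen before any error.
func walk(b []byte, n, align int) ([]byte, error) {
	var fams []byte
	off := 0
	for i := 0; i < n; i++ {
		if off+2 > len(b) {
			return fams, errors.New("short buffer")
		}
		l, fam := int(b[off]), b[off+1]
		if fam != afLink && fam != afInet6 {
			return fams, fmt.Errorf("offset %d: invalid address (len=%#x family=%#x)", off, l, fam)
		}
		if off+l > len(b) {
			return fams, errors.New("short buffer")
		}
		fams = append(fams, fam)
		off += roundup(l, align)
	}
	return fams, nil
}

// sockaddr builds a zeroed sockaddr of length l and family fam, followed by
// pad bytes up to size.
func sockaddr(l, fam byte, size int, pad byte) []byte {
	b := make([]byte, size)
	b[0], b[1] = l, fam
	for i := int(l); i < size; i++ {
		b[i] = pad
	}
	return b
}

// sample mirrors the layout in the dump: inet6, link, link, inet6,
// each occupying a multiple of 8 bytes.
func sample() []byte {
	var b []byte
	b = append(b, sockaddr(0x1c, afInet6, 32, 0xff)...)
	b = append(b, sockaddr(0x36, afLink, 56, 0x00)...)
	b = append(b, sockaddr(0x38, afLink, 56, 0x00)...)
	b = append(b, sockaddr(0x1c, afInet6, 32, 0xff)...)
	return b
}

func main() {
	for _, align := range []int{8, 4} {
		fams, err := walk(sample(), 4, align)
		fmt.Printf("align=%d families=%v err=%v\n", align, fams, err)
	}
}
```

With 8-byte steps all four sockaddrs are found. With 4-byte steps the walk lands on the ffffffff padding after the first AF_INET6 address and reports an invalid address, which is the same shape as the failure in the test log.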

@gopherbot

Change https://golang.org/cl/321869 mentions this issue: route: revert routing message alignment for FreeBSD 386 emulation

@golang golang locked and limited conversation to collaborators May 21, 2022