Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: TestAnalyzeAnnotations failures on linux/riscv64 #46737

Closed
bcmills opened this issue Jun 14, 2021 · 17 comments
Closed

cmd/trace: TestAnalyzeAnnotations failures on linux/riscv64 #46737

bcmills opened this issue Jun 14, 2021 · 17 comments
Labels
arch-riscv Issues solely affecting the riscv64 architecture. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Jun 14, 2021

--- FAIL: TestAnalyzeAnnotations (0.01s)
    annotations_test.go:136: got task task 1:	task0
        	start: 496007 end: 768012 complete: true
        	2 goroutines
        	4 regions:
        		task0.region0(goid=12)
        		(goid=13)
        		task0.region2(goid=13)
        		task0.region1(goid=13)
        	1 children:
        		task1
        ; want {complete:true goroutines:2 regions:[task0.region0  task0.region1 task0.region2]}
FAIL
FAIL	cmd/trace	0.815s

2021-06-13T08:17:17-24cff0f/linux-riscv64-unleashed
2021-04-27T15:57:56-bc62887/linux-riscv64-jsing
2021-04-06T14:15:29-b345a30/linux-riscv64-jsing
2021-04-02T01:14:00-97b3ce4/linux-riscv64-jsing
2021-03-22T03:52:31-d8394bf/linux-riscv64-jsing
2020-05-05T05:13:26-9b18968/linux-riscv64-unleashed
2020-04-25T02:19:12-49f10f3/linux-riscv64-unleashed

CC @prattmic @mknyszek @4a6f656c @bradfitz

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. arch-riscv Issues solely affecting the riscv64 architecture. labels Jun 14, 2021
@bcmills bcmills added this to the Backlog milestone Jun 14, 2021
@bcmills bcmills changed the title cmd/trace: TestAnalyzeAnnotations failures on linux-riscv64 cmd/trace: TestAnalyzeAnnotations failures on linux/riscv64 Jun 14, 2021
@mengzhuo
Copy link
Contributor

mengzhuo commented Jun 21, 2021

This affected Hifive unmatched too.
I've tried to bisect unfortunately it is bad since go1.14(20a838a)

@bcmills
Copy link
Contributor Author

bcmills commented Jun 22, 2021

Would it make sense to skip this test on the riscv64 builders until it can be diagnosed? The failure rate is high enough that it could be masking other failures on these builders.

@gopherbot
Copy link

Change https://golang.org/cl/330849 mentions this issue: cmd/trace: ensure at least 1 tick between events

@gopherbot
Copy link

Change https://golang.org/cl/333449 mentions this issue: cmd/trace: ensure at least 1 tick between events

@bcmills
Copy link
Contributor Author

bcmills commented Jul 12, 2021

@hyangah, could you take a look at the fix for this? It's at +1, and would be nice to get in a fix for the underlying problem before the Go 1.17 release.

@bcmills bcmills modified the milestones: Backlog, Go1.17 Jul 12, 2021
@mknyszek mknyszek modified the milestones: Go1.17, Go1.18 Aug 18, 2021
@mengzhuo
Copy link
Contributor

mengzhuo commented Nov 1, 2021

Unfortunately It start failing again.

https://build.golang.org/log/d2f46348571205149b5bb4480bbdc4a202f1f73e

It seems differ with original error that goid is some how higher than previous failure logs.

I'll take a look into it.

@bcmills bcmills reopened this Nov 19, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Nov 19, 2021

The new failure mode has a region with a goid that lacks a corresponding region name:

--- FAIL: TestAnalyzeAnnotations (0.01s)
    annotations_test.go:135: got task task 1:	task0
        	start: 299974 end: 1269973 complete: true
        	2 goroutines
        	4 regions:
        		task0.region0(goid=45)
        		(goid=46)
        		task0.region2(goid=46)
        		task0.region1(goid=46)
        	1 children:
        		task1
        ; want {complete:true goroutines:2 regions:[task0.region0  task0.region1 task0.region2]}
FAIL
FAIL	cmd/trace	0.354s

greplogs --dashboard -md -l -e 'FAIL: TestAnalyzeAnnotations' --since=2021-10-20

2021-11-18T22:40:42-2375b6e/linux-riscv64-unmatched
2021-11-12T20:20:57-76fbd61/linux-riscv64-unmatched
2021-11-06T19:41:12-7ca772a/linux-riscv64-unmatched
2021-11-01T02:47:30-fde4cc2/linux-riscv64-unmatched
2021-10-23T20:23:35-8dbf3e9/linux-riscv64-unmatched

@bcmills bcmills added okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker labels Nov 19, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Nov 19, 2021

(This is a release-blocker for Go 1.18 via #11811. If it isn't feasible to fix for the 1.18 release, we should add a skip on linux/riscv64.)

@toothrot
Copy link
Contributor

toothrot commented Dec 8, 2021

If this has been bad since 1.14, we should likely add a skip. @4a6f656c

@cherrymui cherrymui removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 14, 2021
@cherrymui
Copy link
Member

This is still failing

2021-12-08T18:30:00-61011de/linux-riscv64-unmatched
2021-12-06T17:34:53-2cb9042/linux-riscv64-unmatched
2021-12-02T20:21:16-469f030/linux-riscv64-unmatched
2021-11-26T13:18:11-7703804/linux-riscv64-unmatched

Any update on this? Thanks. (checking in as this is a release blocker)

Yeah, if it fails since 1.14, it is probably not a release blocker.

@cherrymui
Copy link
Member

This is a release-blocker for Go 1.18 via #11811

This fails on the main repo. I'm not sure how this is related to #11811

@bcmills
Copy link
Contributor Author

bcmills commented Dec 15, 2021

#11811 is basically “tests should be passing”. I suppose that the main repo is technically not a subrepo, but we still ought to have the builders passing tests before we cut a release. 😅

(Is there a reason not to add a skip in this case?)

@cherrymui
Copy link
Member

cherrymui commented Dec 15, 2021

Thanks.

(Is there a reason not to add a skip in this case?)

No. Skipping is fine.

@gopherbot
Copy link

Change https://golang.org/cl/373034 mentions this issue: runtime: invalid negative frequency while tracing

@mengzhuo
Copy link
Contributor

@cherrymui @bcmills @bcmills I've run 10000 TestAnnotationAnalyze on CL 373034 without failure, please take a look, thanks.

gopherbot pushed a commit that referenced this issue Dec 29, 2021
The riscv64 Hifive Unmatched is the only platform that
failed on testcase TestAnalyzeAnnotations occasionally
after CL 332954 had merged. The failure happens when
ticks per second (freq) is over 1e12 which causing the timestamps
of two events are same.

There are 2 reasons causing big frequency:
1. RDCYCLE is HART based according to the riscv manual which makes
   negative ticks delta
2. negative float64 -> uint64 is undefined and "lucky" negative float
   is too big to handle for trace

For #46737

Change-Id: I1f3c1ac31aae249969000c719c32aaf5a66d29a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/373034
Trust: Zhuo Meng <mzh@golangcn.org>
Run-TryBot: Zhuo Meng <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@cherrymui
Copy link
Member

After the CL above is there anything else still needs to be done for this issue? Or we can close it? Thanks.

@cherrymui cherrymui reopened this Dec 29, 2021
@prattmic
Copy link
Member

prattmic commented Jan 7, 2022

There haven't been any more failures since https://golang.org/cl/373034, so I think this can be closed unless failures reoccur.

@prattmic prattmic closed this as completed Jan 7, 2022
@rsc rsc unassigned hyangah Jun 23, 2022
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Issues solely affecting the riscv64 architecture. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

8 participants