Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/types, x/tools/go/types, x/tools/analysis: encoding used by objectpath is inconsistent for use by compiler & tools/analysis #44195

Closed
amscanne opened this issue Feb 9, 2021 · 18 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@amscanne
Copy link
Contributor

amscanne commented Feb 9, 2021

This issue is derived from dominikh/go-tools#924.

What version of Go are you using (go version)?

1.16rc1

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux, ARM64.

What did you do?

When using analyzers implemented using the analysis package, I hit upon a case where facts being exported and imported were being matched up with the wrong methods. (This issue was a method-specific one, and the proposed fix is also method specific.)

For example, instrumenting the facts package in Encode and Decode for serialization and deserialization of facts:

2021/02/09 12:37:21 export: exported func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).StateFields() []string as DynamicBytesFileDescriptionImpl.M11

2021/02/09 12:18:54 import: imported DynamicBytesFileDescriptionImpl.M11 as func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).Read(ctx pkg/context/context.Context, dst pkg/usermem/usermem.IOSequence, opts google3/third_party/gvir/pkg/sentry/vfs/vfs.ReadOptions) (int64, error)

It turns out that the objectpath package is producing the name above, and this is what is used for keying the type in the exported fact data. During analysis, the type information for imported packages is sourced from the compiled artifact via gcexportdata, but the current package types are synthesized directly from the source files, and facts are derived from those types. However, this means that there is the possibility of facts being constructed using a different method ordering that what might appear in the compiler artifact, and therefore fact serialization may not key types correctly (and whoever is importing this fact is in for a bad time).

Since this logic is effectively built in to the compiler (which is often a binary package), this has the potential to cause issues for any analysis packages that may link against a different go/types package or perturb the ordering for NamedType.methods.

Possible Fix

While I realize that this API is not stable, the issue is effectively mitigated by making the Method ordering stable for NamedTypes by performing a sort on the first call to Method (and ensuring that no calls to AddMethod happen after that point). This should eliminate the sensitivity issue (but won't fix breakages in the case of genuine binary incompatibility, which is fine).

A draft of this fix is posted here: https://go-review.googlesource.com/c/go/+/290750

(But of course, I could be way off in my diagnose, or there could be a better solution.)

@prattmic
Copy link
Member

prattmic commented Feb 9, 2021

cc @griesemer @matloob

@prattmic prattmic added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Analysis Issues related to static analysis (vet, x/tools/go/analysis) labels Feb 9, 2021
@prattmic prattmic changed the title Encoding used by objectpath is inconsistent for use by compiler & tools/analysis go/types, x/tools/go/types, x/tools/analysis: encoding used by objectpath is inconsistent for use by compiler & tools/analysis Feb 9, 2021
@prattmic
Copy link
Member

@amscanne do the types constructed from gcexportdata always contain all of the methods that are present when analyzing from source (such as unexported methods)? If not, it seems that simply sorting the methods wouldn't be sufficient. Instead it seems like objectpath might need the actual method name rather than just an index?

@amscanne
Copy link
Contributor Author

Great question. I have no idea. I'll do a bit more digging. However, sorting does solve the issue in this case, so I suspect the answer might yes here (although that does not mean it is yes everywhere).

@prattmic
Copy link
Member

prattmic commented Mar 3, 2021

cc @timothy-king @guodongli-google as well, since y'all have been looking at analysis recently and may have thoughts here.

@amscanne
Copy link
Contributor Author

I think I have a better understanding of this issue.

The object file itself seems to contain rich type information (along with names). Based on my reading of the gcexportdata code, it seems like it is sorted in lexicographical order (presumably by the toolchain when emitting the object). Because of this, gcexportdata will always decode in the same lexicographical order. This is good, because it establishes a consistent ordering.

The problem is fundamentally with objectpath, which relies on this ordering. The objectpath package encodes methods in a way that relies on method ordering, as opposed to using names (which almost everything else does). Therefore, the objectpath encoding is susceptible to the method ordering on *types.Func objects. Note that *types.Interface objects are explicitly ordered during construction, so this indexing scheme works fine for interfaces -- just not methods.

The facts framework relies on objectpath to generate consistent keys for types. If two *types.Package objects are the same with the exception of method ordering on relevant *types.Func, then objectpath will generate two different paths for many objects (functions themselves, parameters, etc.).

For analysis, there are two different sources for the *types.Package: the current package comes from the source code (parsing and type checking), while the *types.Package for dependencies comes from gcexportdata. So the facts are saved using the source method ordering, while they are loaded using the gcexportdata method ordering (lexicographical).

It was suggested in [1] that type checking will sort methods, but I don't believe that is true. I think that ast parsing and type checking is sensitive to the ordering of files being parsed. This must be the case because the gcexportdata import does have a consistent lexicographical ordering imposed, as noted at the top.

As some evidence of this, my current workaround is to sort all input files [2] which seems to permute the types generated sufficiently to match in my case (though this is extremely fragile and works only for now). I've added sanity checking for the binary vs AST-derived types, and this files simply by removing the sort in this case.

So there are two reasonable solutions,

  1. Objectpath will always look at all methods and index into the sorted list.
  2. Type checking does sort these methods before completing a package.

Since (1) is unlikely to break anything (both uses of objectpath will get the exact same encoding, since gcexportdata will already be sorted and objectpath will index into that list), I think this makes more sense.

[1] https://go-review.googlesource.com/c/go/+/290750
[2] https://github.com/google/gvisor/blob/dc1b3884f30d96053dae550d3c40d035c8893d4b/tools/nogo/nogo.go#L465

@amscanne
Copy link
Contributor Author

I've sent a change to use a "canonical" ordering for objectpath:
https://go-review.googlesource.com/c/tools/+/331789

@amscanne
Copy link
Contributor Author

As expected, the workaround is too brittle to work correctly. While everything works in one configuration, another configuration (tsan/race) yields type conflicts. I think the proper objectpath fix is the way forward.

@mdempsky
Copy link
Member

Based on my reading of the gcexportdata code, it seems like it is sorted in lexicographical order (presumably by the toolchain when emitting the object).

Package-scope declarations are ordered lexicographically, but methods aren't declared at package scope. They're attached to their receiver type.

As expected, the workaround is too brittle to work correctly. While everything works in one configuration, another configuration (tsan/race) yields type conflicts.

How robust is objectpath expected to be about different build configurations? In general, changing build tags can arbitrarily change a type's definition, which I would think would necessarily invalidate objectpath strings.

@timothy-king
Copy link
Contributor

timothy-king commented Jul 13, 2021

Here is a description of the reproducers for this issue that I am aware of:

  1. A Go project is built using blaze (Google's internal version of bazel).
  2. The target contains a file foo/foo.go and a bazel rule to generate a file based on foo.go
  3. The generated file is appended to the GoFile list for the target, bazel-generated/foo/foo_generated.go.
  4. The target is a sent to a native blaze rule to compile it.
  5. GoFile list is sorted by that rule before sending it to the compiler. "bazel-generated/foo/foo_generated.go" < "foo/foo.go" so these now swap positions.
  6. The compiler writes out the gcexportdata in this order.
  7. A separate source analyzer based on (*types.Config).Check loads the files in the package using the original GoFile order.
  8. Both foo/foo.go and bazel-generated/foo/foo.go contain methods on an exported type T.
  9. The objectpath for the methods on T now disagree on what the i'th method of T is between gcexportdata and the types coming from (*types.Config).Check.

We can make method ids in objectpath agnostic to the GoFile order of these two paths by sorting them.

If "foo_generated.go" and "foo.go" contained init() functions, we would be changing the expected order these are executed in. This can potentially be resolved separately.

@findleyr
Copy link
Contributor

Hi, just catching up. As I commented on https://golang.org/cl/331789, I'm a little concerned about changing objectpath serialization. I think we should do it, but want to go over the compatibility implications.

Here's my analysis:

  • The objectpath package itself leaves it ambiguous whether its encoding can change.
  • go/analysis is explicit that its encoding is unspecified. That takes care of most uses.
  • pkgsite points to a few uses of objectpath outside of go/analysis. Obviously there could be more that are vendored or not open source. Notably, it looks like golangci-lint and honnef.co/go/tools use it, but both cases look OK (they're either operating on a transient cache, or are also relying on fact encoding).

Does anyone else have observations or concerns with respect to backwards compatibility of objectpath encoding? I have very little context on this package, but based on my analysis I think it's a gray area. Since this is fixing a bug, I think we should proceed. We should probably also make it explicit that objectpath encoding may change.

@mdempsky
Copy link
Member

The objectpath package itself leaves it ambiguous whether its encoding can change.

FWIW, the current encoding depends on implementation details of the export data format that I at least consider to be unstable. E.g., I've already landed CLs on the dev.typeparams branch that sort methods before exporting them, so that's going to change objectpath's method numbering anyway. (And the x/tools exporter already sorts them too, but using a different sort order.)

So to the extent that objectpath needs to be stable, it needs to be decoupled from the export data ordering of method's anyway.

If it's important to maintain historical sort order, it might be able to sort methods based on Pos. But I think sorting on Id is simpler / more robust.

@timothy-king
Copy link
Contributor

Does anyone else have observations or concerns with respect to backwards compatibility of objectpath encoding?

I had fairly similar concerns and went through roughly the same checklist. Most of my observations were left as comments on https://golang.org/cl/331789. I think we are okay.

The objectpath package itself leaves it ambiguous whether its encoding can change.

At the moment objectpath is determined by the order files are parsed. This means if two tools disagree about the file order the method ids are unstable anyways. We some evidence that this is happening. (See my previous comment for details.) My understanding of the objectpath documentation is that this was not the intention. So plausibly this is a bug in the implementation.

We should probably also make it explicit that objectpath encoding may change.

This would probably need to be similar to other interfaces like gcexportdata. This needs to be consistent while stored in the cache by the same tool while analyzing a different project.

Personally I think we may just want to stop using numbers for identifying the methods and switch to method names in the encoding. Quicker (asymptotically at least) writing and lookup times (after https://golang.org/cl/331789 goes in), removes the file ordering concerns and seems conceptually stable. It may also just be robust enough that clearly documenting the conditions that objectpath is stable w.r.t. is not worth it? The main cost that I can think of is additional memory/storage space.

it might be able to sort methods based on Pos.

I have not looked into the details of how token.FileSet are created, but Pos order might have the same set of problems: a different order of files passed to the tool creates a different order of Files, which creates different Pos orders. Positions might be doable but I am not sure if we can always rely on these.

@gopherbot
Copy link

Change https://golang.org/cl/331789 mentions this issue: go/types/objectpath: canonical order for methods

@gopherbot
Copy link

Change https://golang.org/cl/339689 mentions this issue: go/types/objectpath: use method names in objectpath.

copybara-service bot pushed a commit to google/gvisor that referenced this issue Aug 13, 2021
On Go tip (pre-1.18), http://golang.org/issue/44195 is making SA1019 mistake
uses of reflect.Value.Len for reflect.Value.InterfaceData, which is deprecated.
It is thus mistakenly raising deprecation errors on uses of reflect.Value.Len.

Suppress these errors by disabling SA1019 entirely. This is a bit overkill, but
it is unclear to me if we want hard errors on deprecation anyways. That can be
reevaluated when http://golang.org/issue/44195 is fixed.

Updates golang/go#44195

PiperOrigin-RevId: 390646221
copybara-service bot pushed a commit to google/gvisor that referenced this issue Aug 13, 2021
On Go tip (pre-1.18), http://golang.org/issue/44195 is making SA1019 mistake
uses of reflect.Value.Len for reflect.Value.InterfaceData, which is deprecated.
It is thus mistakenly raising deprecation errors on uses of reflect.Value.Len.

Suppress these errors by disabling SA1019 entirely. This is a bit overkill, but
it is unclear to me if we want hard errors on deprecation anyways. That can be
reevaluated when http://golang.org/issue/44195 is fixed.

The other staticcheck analyzers are moved to alphabetical order.

Updates golang/go#44195

PiperOrigin-RevId: 390646221
copybara-service bot pushed a commit to google/gvisor that referenced this issue Aug 13, 2021
On Go tip (pre-1.18), http://golang.org/issue/44195 is making SA1019 mistake
uses of reflect.Value.Len for reflect.Value.InterfaceData, which is deprecated.
It is thus mistakenly raising deprecation errors on uses of reflect.Value.Len.

Suppress these errors by disabling SA1019 entirely. This is a bit overkill, but
it is unclear to me if we want hard errors on deprecation anyways. That can be
reevaluated when http://golang.org/issue/44195 is fixed.

The other staticcheck analyzers are moved to alphabetical order.

Updates golang/go#44195

PiperOrigin-RevId: 390655918
@dominikh
Copy link
Member

What's the status of this? Have we decided on either of the two proposed fixes?

With Go tip it appears almost guaranteed to run into this bug, see the various issues referring to this issue.

@timothy-king
Copy link
Contributor

What's the status of this? Have we decided on either of the two proposed fixes?

The status is that we delayed https://go-review.googlesource.com/c/tools/+/331789/ until post release. Now that the tree is open for Go 1.18 we are no longer blocked. We now need to make a decision between this and the other option https://golang.org/cl/339689.

If folks have feedback on the candidate solutions, it would help to get this soon.

(And since I cannot leave well enough alone #47725 is related, but not necessary to make a decision on this.)

@gopherbot
Copy link

Change https://golang.org/cl/343390 mentions this issue: cmd/compile: only sort methods/interfaces during export for -d=unifiedquirks

gopherbot pushed a commit that referenced this issue Aug 18, 2021
…dquirks

These sorts are only important for 'toolstash -cmp' testing of unified
IR against -G=0 mode, but they were added before I added
-d=unifiedquirks to allow altering small "don't care" output details
like this.

This CL should help mitigate issues with #44195 until package
objectpath is updated and deployed.

Change-Id: Ia3dcf359481ff7abad5ddfca8e673fd2bb30ae01
Reviewed-on: https://go-review.googlesource.com/c/go/+/343390
Trust: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/354433 mentions this issue: go/types/objectpath: canonical order for methods

@golang golang locked and limited conversation to collaborators Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants