proposal: cmd/go: introduce a build configurations file #39005

Closed
dominikh opened this issue May 11, 2020 · 15 comments
Labels
FrozenDueToAge, Proposal, Tools
Milestone


@dominikh
Member

dominikh commented May 11, 2020

(this is a joint proposal by @dominikh and @mvdan)

Abstract

We describe a file format for specifying a list of build configurations, where build configurations are characterized by environment variables and command-line arguments for the build system.

Background

Go has the notion of build tags, which control the set of files that make up a package under a given configuration. Tags can be user-defined and specified with the -tags flag, or they may be defined by the build system itself, bound to parameters such as the operating system and CPU architecture, overridable with environment variables such as GOOS and GOARCH.

Due to these tags, a single import path may effectively refer to a set of packages, each package differentiated by the active tags. While referring to a single build configuration is straightforward (by specifying the correct tags and environment variables), it is much more difficult to explore all relevant build configurations.

Many tools, however, would like to know the list of relevant build configurations, either for correctness reasons (static analysis) or for UI reasons (IDEs, …). A CI pipeline should execute the tests of all relevant build configurations, not just one. Static analysis tools such as staticcheck should analyze all relevant build configurations to detect issues under all viable code paths. Detecting unused functions needs to observe function calls under all relevant build configurations, not just one. A language server such as gopls needs to be able to provide accurate code intelligence and offer the user a list of build configurations to choose from. The list goes on.

Naively iterating through all unique combinations of tags quickly leads to combinatorial explosion. Go supports a dozen operating systems on a dozen CPU architectures, can be used with and without cgo support, and makes use of tags such as netgo and timetzdata to affect how the standard library gets built. On top of this, users define their own tags, for example for debug-only code. This results in thousands of possible build configurations, most of them unique due to a transitive dependency on the runtime package.

In practice, however, only a small fraction of possible build configurations are actually relevant to the user. For example, a project may only be interested in actively supporting Linux and Windows on amd64, never use any of the standard library's tags, and only differentiate their build based on whether it is a debug build or not. This reduces thousands of build configurations down to four.

Since many tools would benefit from knowing the list of relevant build configurations, and because that list cannot be determined automatically, it is desirable to be able to explicitly list relevant build configurations in a format that can be shared between different tools.

Proposal

We propose a file format, as well as best practices for using files in this format.

File format

The format is line-based, with each non-empty line describing a build configuration. A build configuration consists of a name, a (possibly empty) set of environment variable assignments, followed by a (possibly empty) set of command-line arguments.

Names are separated from environment variables and command-line arguments by a colon followed by a space. Names can consist of Unicode letters, Unicode numbers, dashes (-) and underscores (_). Names must begin with a Unicode letter or Unicode number.

Quoted strings may be used for elements containing whitespace. The specific format for quoted strings will match that of GOFLAGS, which is currently TBD (see #26849). Names must not use quoted strings.

Syntactically valid examples include:

windows-release: GOOS=windows GOARCH=amd64 -tags=debug,feature1 -gcflags=-N
b1: GOOS=windows GOARCH=amd64
debug-feature: -tags=debug,feature1
debug: -tags=debug

A line is split into environment variables and command-line arguments at the first element that is not a valid environment variable assignment. Usually, this would be an element that begins with a dash, or one that does not contain an equal sign.

The process environment described in a build configuration is merged with the existing environment, with the existing environment taking precedence. Command-line arguments will be passed to the build system verbatim, but tools are free to add additional arguments, and it is not specified whether tools pass their own arguments before or after the arguments specified in a build configuration.

The format itself puts no restrictions on allowed environment variables or command-line arguments. However, it is strongly advised not to modify the workspace itself. That is, variables such as GOPATH or GO111MODULE should not be modified. It is assumed that build configurations are executed in the context of an already configured workspace. Furthermore, command-line arguments should only be used for passing flags and their values and not, for example, to specify additional packages.

Tools

Different tools have different requirements and may make use of files in this format in different ways, but they should keep the following points in mind.

Tools should allow specifying a file, but they may look for a default file name.

There are various reasons why a project may use more than one build configuration file. For example, it may want to build binary releases for only a small set of first class platforms, while still running static analysis for more platforms, to future-proof their code. Or, a parameter that meaningfully differentiates binary builds does not contribute anything to static analysis: compiling with and without -gcflags=-N will produce meaningfully different binaries, but statically analysing both versions would be a waste of time. Or it may have different lists of platforms to execute tests on for CI and local development. Many other reasons exist.

Therefore, tools should allow selection of the file to use.

It is, however, desirable to agree on a default file name to look for, so that every tool needn't be configured manually, especially for projects that can make do with a single file, and so that tools can use build configuration files by default. The default file is located at the top of the project, for example the top of a Go module. For build systems that do not have a notion of projects, such as Go in GOPATH mode, we don't define a default location at this moment.

Most tools should deduplicate build configurations to avoid unnecessary work

For most tools, it makes no sense to execute duplicate configurations. However, duplicate configurations may occur from concatenating files, or from on-the-fly generators that do not deduplicate configurations themselves. Therefore, tools should only execute unique configurations.

Tools should allow using the current build configuration

While tools may use existing build configuration files by default, they should also allow executing the active build configuration as specified by the user's current environment. In its simplest form, this means ignoring build configuration files and operating as tools did before implementing this proposal. It may also take the form of manually or automatically appending the current configuration to the list of configurations to execute. For example, when running staticcheck, the user would expect their active configuration to be used, regardless of other configurations that may be used as well.

Tools may allow using specific build configurations

Depending on the tool, it may be useful to allow selection of individual build configurations, for example by their name.

@dominikh
Member Author

dominikh commented May 11, 2020

Rationale

The line-based file format

The proposed format is the simplest imaginable format for describing a list of build configurations: it contains one line per configuration, with the configurations explicitly spelled out. Notably absent is any form of scripting, conditionals, or maths. For example, there is no automatic way of expressing the full cross product GOOS={windows, linux} × GOARCH={amd64} × -tags={debug,!debug}.

This simplicity provides several benefits:

  • The format is trivial to parse, requiring only word-splitting and awareness of quoting. Similarly, it is trivial to produce, by humans and machines alike.
  • Supporting the format does not require support for something like TOML or YAML.
  • By looking at a file, it is immediately obvious how many distinct build configurations exist: the number of non-empty lines. This is important to avoid accidental combinatorial explosion.

Most users will be content typing these files by hand. Projects with many dozens of similar build configurations, however, may opt to generate them instead, which is easy via go generate.

The line-based nature of the format makes it easy to manipulate with standard UNIX tools. Most notably, multiple files can simply be concatenated. They can also be sorted, grepped, and so on. This suggests the possibility of preprocessors. For example, a simple script could process a CI log file and filter a list of builds down to those that have failed.

Build names

We include mandatory build configuration names to aid the implementation of good UIs. An editor may display these names instead of the actual configuration, and command-line tools may support executing build configurations specified by name.

Alternatives for specifying names

We explored two other ways of specifying names:

# the-build-name
GOOS=windows -tags=...

and

GOOS=windows -tags=... # the-build-name

The first way loses the nice attribute that each build is described by a single line, and introduces ambiguities such as

# name1
# name2
GOOS=windows -tags=...

or the issue that a file may end with a name, which affects how the concatenation of two files is interpreted.

Even without these issues, users may confuse this syntax with general comments and attempt writing something like this:

# the windows builds
GOOS=windows -tags=tag1
GOOS=windows -tags=tag2

# the linux builds
...

The second way was rejected simply because names would no longer be aligned.

Preferring the user's environment

Given that the build configuration specifies environment variables, there are three ways in which they can be applied:

  • Discard the user’s environment entirely, only use the environment defined in the build configuration
  • Merge with the user’s environment, preferring values from the build configuration
  • Merge with the user’s environment, preferring values from the user’s environment

Option 1 is not viable. The user's environment contains many important variables, such as PATH, that cannot be discarded and will not be defined by the build configuration.

Options 2 and 3 differ only in which value has higher precedence: the one in the file, or the one in the user’s environment. We believe that option 3 is overall the better option. It matches the common understanding that the environment is the most specific to a single invocation of a program, more specific than a configuration file. It also allows users to use a build configuration but change details of it, such as using the

windows-debug: GOOS=windows GOARCH=amd64 -gcflags="-N -l" -tags=debug

build configuration, but changing GOOS to linux, without having to modify the file itself.

Both options 2 and 3 mean that the build configuration is not pure, since it is affected by the user’s environment. This is not a problem. This is already the case for all invocations of Go tools, and well-designed CI environments already account for important variables. Additionally, most environment variables that are worth setting in a build configuration are not normally defined in the user’s environment, unless the user explicitly wishes to override a default.

No restrictions on variables and command-line arguments

We do not restrict the format to only specifying environment variables and tags; instead, all command-line arguments are permitted. This makes the format useful for more tools than just static analysis. For example, a tool that builds binary distributions of projects might benefit from flags such as -gcflags or -ldflags. With this generalization, specifying tags just becomes another argument.

We do not attempt to implement a whitelist of environment variables, as different build systems use different environment variables. Even the environment variables that affect Go alone are so numerous that it would be easy to miss some of them, such as CC or AR.

We do not restrict command-line arguments to valid flags because, again, we do not know what the build system considers valid flags, nor what syntax it uses for passing flags. We count on users not to abuse this mechanism. For example, one concern is that someone might use the all argument, causing Go to process all packages, not just the ones specified on the command line. The solution to this is simple: don’t do it.

Open issues

The primary open issue is finding a name for the default build configuration file. Lacking a concrete suggestion at this point in time, we impose the following requirements for deciding a name:

  • The name should not begin with a dot
  • The file should be placed at the top of the project

Additional desirable attributes that have been asked for:

  • We may want the file name to begin with go, so that it sorts together with go.mod and go.sum
  • The file should likely use the .txt file extension. The format does not have enough syntax to warrant its own file extension. Not using any extension might complicate usability on operating systems that rely heavily on file extensions.

@dominikh dominikh changed the title from placeholder to proposal: introduce a build configurations file May 11, 2020
@gopherbot gopherbot added this to the Proposal milestone May 11, 2020
@dominikh dominikh added the Tools label May 11, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 11, 2020
@Merovius
Contributor

I don't really like the idea of having a file that causes tools to run with arbitrary environments and I don't really understand why it's needed here. ISTM that by putting flags into the build configuration, we already make that configuration specific to a certain build system - bazel or others don't really have a -gcflags flag, do they? And if the file was specific to the go tool, then ISTM a pretty easy thing to do for the environment whitelist would be to restrict it to the variables listed by go env. Though TBH, I don't really want even CC to be set from a local config. I could well imagine that it would be possible to have a repository contain a malicious config that sets CC or so to a binary shipped in the repo and have gopls execute this when I open a file in vim.

The other comment I have would be: Why not Make? I'm not a huge fan of needing Makefiles to build code, but ISTM that all this format needs to be a Makefile would be some linebreaks and tabs and a "go build" in the right spot. So, if we have this file anyway, why not converge on something existing?

@dominikh
Member Author

ISTM that by putting flags into the build configuration, we already make that configuration specific to a certain build system - bazel or others don't really have a -gcflags flag, do they?

A given file would be specific to a build system, yes. Projects don't usually use multiple build systems. But the file format would be generic enough that any project could use it, regardless of chosen build system.

Why not Make?

Because executing a program multiple times isn't always a viable solution. For example, staticcheck wants to know the different combinations of GOOS, GOARCH and tags, in a single invocation. And gopls ideally doesn't want to depend on make and a Makefile following a certain structure.

This proposal is much more about tools being able to discover build configurations than it is about spawning executables.

I could well imagine that it would be possible to have a repository contain a malicious config that sets CC or so to a binary shipped in the repo and have gopls execute this when I open a file in vim.

I'm ashamed to say that I did not consider the security implications of this. We have similar problems with changing GOPATH or GOBIN or GOPROXY/GOSUMDB and probably many others. I don't have a solution to this.

@Merovius
Contributor

A given file would be specific to a build system, yes. Projects don't usually use multiple build systems. But the file format would be generic enough that any project could use it, regardless of chosen build system.

I still don't understand this. You mention an example file:

windows-release: GOOS=windows GOARCH=amd64 -tags=debug,feature1 -gcflags=-N
b1: GOOS=windows GOARCH=amd64
debug-feature: -tags=debug,feature1
debug: -tags=debug

To clarify: Would that file look exactly the same if you'd use bazel, for instance? If so, how would I, as a project maintainer, use this to build my project?

Or would the file in this case look more like

windows-release: :main --platforms=windows
linux-release: :main --platforms=linux

and in that case, how would a tool that has never heard of bazel translate that into useful information?

@Merovius
Contributor

I realize that I come off as intensely negative here, so I want to clarify: I'd really like having a mechanism like this. As elegant as I find the tags- and GOARCH/GOOS approach to conditional compilation the go tool has taken, the exact issues you are mentioning and are trying to solve have always bugged me. I'm all for explicitly listing a set of valid build configurations. I just don't see how build configurations can be listed agnostic to the build system used, while staying even remotely declarative.

If, OTOH, we were to restrict this to the go tool itself, this would open up possibilities for a purely declarative format. If, say, you could only specify combinations of GOARCH/GOOS and the build tags to use, this would still fulfill at least 90% of the use-cases I could think of, while being absolutely declarative with no security issues I could think of. And you could then have a detailed look at other flags or environment variables to whitelist them and might even be able to come up with a way to make it forwards- and/or backwards-compatible with future additions to that whitelist.
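A rough sketch of the restricted, purely declarative variant suggested here: only GOOS, GOARCH and -tags are accepted, and anything else, such as a PATH override or -toolexec, is rejected before any command runs. The allowlist and the helper checkRestricted are hypothetical, and only the -tags=value spelling is recognized.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a trimmed-down parsed build configuration.
type Config struct {
	Name string
	Env  []string
	Args []string
}

// allowedEnv is a hypothetical allowlist: only the platform variables.
var allowedEnv = map[string]bool{"GOOS": true, "GOARCH": true}

// checkRestricted rejects a configuration that sets an environment
// variable outside allowedEnv or passes any argument other than -tags=.
func checkRestricted(cfg Config) error {
	for _, kv := range cfg.Env {
		key, _, _ := strings.Cut(kv, "=")
		if !allowedEnv[key] {
			return fmt.Errorf("%s: environment variable %s is not allowed", cfg.Name, key)
		}
	}
	for _, arg := range cfg.Args {
		if !strings.HasPrefix(arg, "-tags=") {
			return fmt.Errorf("%s: argument %s is not allowed", cfg.Name, arg)
		}
	}
	return nil
}

func main() {
	for _, cfg := range []Config{
		{Name: "linux-debug", Env: []string{"GOOS=linux", "GOARCH=amd64"}, Args: []string{"-tags=debug"}},
		{Name: "sneaky", Env: []string{"PATH=./evil:/usr/bin"}},
		{Name: "exec", Args: []string{"-toolexec=./evil"}},
	} {
		if err := checkRestricted(cfg); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("ok:", cfg.Name)
	}
}
```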

Of course, that would also mean that this file can't be used if your project can't be built with the go tool. Personally, I find that a small price to pay for far more well-defined semantics; at the end of the day, I don't really see it as the job of the go project to define its interactions with any and all third-party tools out there. And it would still be possible for projects like bazel or the like to provide a way to programmatically generate this new build format from a BUILD.bazel or vice versa (akin to what gazelle does) to take advantage of the benefits.

So, anyway, it's because of how much I'd like a solution to this problem that I'm trying to hammer the suggested solution into something I'd consider more workable :)

@mvdan
Member

mvdan commented May 13, 2020

Would that file look exactly the same if you'd use bazel

The file would look different depending on what build system your project uses. This is the same approach that https://pkg.go.dev/golang.org/x/tools/go/packages takes, if you look at its Env and BuildFlags inputs.

The main use case here is tooling, so it makes sense to me that the design should be compatible with go/packages, and with how it can support more build systems than just go list.

how would a tool that has never heard of bazel translate that into useful information?

If you use go/packages as a tool on a project, the right build system should be chosen automatically. Following the same rule, if the project has a default "build configurations file", it should contain environment variables and flags for that default build system.

This is all a bit theoretical at the moment, as go/packages only supports one build system as of today.

I'm trying to hammer the suggested solution into something I'd consider more workable

For what it's worth, we did consider a more constrained and less generic format at first, where we could interpret each "build configuration" statically. However, that didn't make sense to us because it would mean only supporting a single build system. It could make sense for cmd/go itself, but it would not make sense for go/packages or the tooling ecosystem in general.

@mvdan
Member

mvdan commented May 13, 2020

Also, I concur with the sentiment that this is a problem really worth solving, but it's also really hard to solve well. I also worry about potentially malicious or costly files being picked up automatically. Personally speaking, I don't care about build systems other than Go's own, but I don't think it's good long-term planning for tooling.

@jayconrod
Contributor

A few loosely related thoughts:

  • This reminds me of .bazelrc files. Essentially, each line in a .bazelrc file consists of a command (build, test, and so on), an optional configuration name, and a list of arguments. When running a command, Bazel acts as if those flags were passed on the command line. If the configuration has a name, you have to explicitly pass a flag like --config=remote to get the flags for that configuration (allowing you to easily switch configurations). bazelrc files can be project-specific (in the workspace root directory), user specific (in your home directory), or system-wide (in /etc).
  • Along that line of thought, it would be difficult for Bazel to support this directly. Automatically converting it to a .bazelrc file or something else might work.
  • CI systems would probably not be able to consume this directly either. You'd have to tell CI to run on Linux and Windows, and so on, and you'd have to tell it which configurations to use for each platform.
  • In general, I think this should be restricted to global configuration for a build. If the go command picks up configuration info from other modules, that would be very confusing.
  • I'm anxious that any new configuration file would be the target of a lot of scope creep. I could see people wanting to use this to configure other analysis and code generation tools. We have a lot of similar requests for go.mod features.

@Merovius
Contributor

Merovius commented May 13, 2020

@mvdan Ah, so the intended way for tools to consume this file is to basically pass it through as an input for go/packages. That makes total sense and it also makes sense to design the format to work well with the existing API. So my questions are mostly answered and my concerns are probably more relevant to the eventual integration of go/packages with other build systems :)

@networkimprov

networkimprov commented May 13, 2020

The background could maybe benefit from a summary of ways this problem is solved in other dev environments requiring a build step, and why each of those couldn't be made to work well with Go, and shouldn't be emulated in a new solution.

@rsc rsc moved this from Incoming to Active in Proposals (old) May 20, 2020
@rsc rsc changed the title from proposal: introduce a build configurations file to proposal: cmd/go: introduce a build configurations file May 20, 2020
@tv42

tv42 commented Jun 16, 2020

We do not attempt to implement a whitelist of environment variables

This scares me. Let's say go build wants to invoke gcc. If you clone my project, can I guess your home directory and assume the repo is in ~/go/src/example.com/myproject, add PATH=/home/jdoe/go/src/example.com/myproject/evil:/bin:/usr/bin and get you to run my attack shell script, named evil/gcc?

@Baldomo

Baldomo commented Jul 9, 2020

This is, to me, a very interesting topic, although I think a simple text file with env variables would not be enough. I really like Rust's build.rs and Zig's build.zig (another example here). I personally include a build.go file with the build constraint //+build build and I just run it as a standalone script (example in one of my projects). What do you think of this?

PS: I generally use my build.gos to also package binaries, assets and whatnot but I know the main concern now is the build step

@mvdan
Member

mvdan commented Jul 9, 2020

@Baldomo running arbitrary code as part of the build is definitely not something we want. The fact that the current design allows for that kind of thing to happen via flags like -toolexec is a bug, not a feature. In general, the build configuration should be about parameters to go build, not about running a script or doing any extra work one might want.

@Baldomo

Baldomo commented Jul 9, 2020

@mvdan I totally understand and I think at least having something like this declarative description file is the right way forward (also it being possibly optional is pretty cool). I hope someone comes up with a way of avoiding malicious programs being called, but it's not an easy problem to solve. I will be sticking to the build.go for now but I'll be keeping an eye on this proposal and I hope people will find it useful.

@dominikh
Member Author

I am retracting this proposal. It has an obvious flaw (arbitrary code execution) and hasn't gained any traction. The idea can be revisited in the future, in a less wrong way.


8 participants