cmd/go: add global ignore mechanism for Go tooling ecosystem #42965

Open
burdiyan opened this issue Dec 3, 2020 · 126 comments

@burdiyan

burdiyan commented Dec 3, 2020

UPDATE: The summary of the accepted proposal is at #42965 (comment).


Problem

For non-trivial (often multi-language) projects it's often desirable to make all the Go tools (including gopls) ignore certain directories.

Examples include the huge number of files within node_modules, or the bazel-* directories generated by Bazel. These cause many operations with ./... wildcards to take longer than desired. Also, gopls often eats up a lot of CPU in VS Code, depending on what you are doing.

Prior Art

This has been discussed in several issues before, but it seems people couldn't agree on a solution.

Some tools have started to ship their own solutions, which causes fragmentation and is cumbersome.

For example, goimports has its own machinery for this: a .goimportsignore file. But it doesn't work with Go modules.

Other tools have a hard-coded list of directories to ignore, like .git and so on.

It seems like having a global solution that all the Go ecosystem could understand would make sense to solve this kind of problem.

Recently a workaround for this was to place a dummy go.mod file in the directories you wanted to ignore. But this is not easily portable between users of the project, because often these directories can be re-created on the user's machine and aren't even checked-in. Asking people to sprinkle some go.mod files all around every time is cumbersome.

@robpike was against creating more dot files (#30058 (comment)).

Proposed Solution

Here are some of the ways this could be implemented.

  1. Use go.mod file for specifying directories to ignore. (Rejected because go.mod is not a catch-all config file like package.json in NodeJS).
  2. Use a separate .goignore file. (This would go against Rob's desire to avoid new dot files. Although it is in the spirit of other tools' files such as .dockerignore, .gitignore, and .bazelignore, it raises concerns, which are discussed in this thread.)
  3. Use the go.work file that's coming in the next Go 1.18 release.
  4. Have a separate go.ignore file that would specify directories to ignore.

/cc @tj @stamblerre

@gopherbot gopherbot added this to the Proposal milestone Dec 3, 2020
@mvdan
Member

mvdan commented Dec 3, 2020

But this is not easily portable between users of the project, because often these directories can be re-created on the user's machine and aren't even checked-in. Asking people to sprinkle some go.mod files all around every time is cumbersome.

I'm not sure that I understand this argument. Presumably, it's a program that creates and fills those directories, since they have to contain a significant amount of files for you to really want to ignore them in Go. If they were just a handful of files created manually by a human, it would be a negligible cost for Go to walk those and realise there are no Go packages there.

So, given that it is a program or script creating those large directories, why not add a touch ${dir}/go.mod at the end? That seems easy enough at a high level, at least.

I'm proposing to add this configuration into the existing go.mod file.

This is unlikely to happen, see #42343 (comment).

Another solution could be a global .goignore file. This would go against Rob's desire to avoid new dot files, but would be in the spirit of other tools that have files like .dockerignore, .gitignore, .bazelignore, etc.

I have to admit that I dislike this option. It's bad enough that all these other tools use separate ignore files.

@burdiyan
Author

burdiyan commented Dec 3, 2020

I'm not sure that I understand this argument. Presumably, it's a program that creates and fills those directories, since they have to contain a significant amount of files for you to really want to ignore them in Go. If they were just a handful of files created manually by a human, it would be a negligible cost for Go to walk those and realise there are no Go packages there.

So, given that it is a program or script creating those large directories, why not add a touch ${dir}/go.mod at the end? That seems easy enough at a high level, at least.

@mvdan It is indeed a program that creates these directories. But it's a program that you don't normally control. Wrapping well-known tools like npm install with your own script only to put an empty go.mod in there doesn't seem right.

On the other hand by placing arbitrary files in these directories you're invading the territory of other tools. What if that program checks the integrity of the directory and would break seeing a random unknown file? It's not the case with node_modules but breaking into structures created by other programs, only to work around your own problem doesn't seem right either.

I understand the objection about go.mod. I was not aware of @rsc's statement.

I have to admit that I dislike this option. It's bad enough that all these other tools use separate ignore files.

Could you elaborate on why you think it's bad? It may not be the most elegant solution, but it's common practice, well-understood and somewhat expected.

If we already have .goimportsignore, why not standardize it into something that can be handled and understood by the whole ecosystem of Go tools?

@bcmills
Contributor

bcmills commented Dec 3, 2020

It is indeed a program that creates these directories. But it's a program that you don't normally control. Wrapping well-known tools like npm install with your own script only to put an empty go.mod in there doesn't seem right.

Wouldn't you need to wrap the tool to add a .goignore file anyway? (Given that you need to inject a file, why does it matter whether it is named .goignore or go.mod?)

@burdiyan
Author

burdiyan commented Dec 3, 2020

@bcmills My proposal is to add a file in the root of the project, not in the directory being ignored. So it would be checked in. Like .gitignore in Git. Basically the idea is to list the paths to ignore in that file, and check it in.

@bcmills
Contributor

bcmills commented Dec 3, 2020

...ok? But why would you not also check in the injected go.mod files?

@mvdan
Member

mvdan commented Dec 3, 2020

Could you elaborate on why you think it's bad? It may not be the most elegant solution, but it's common practice, well-understood and somewhat expected.

The common practice is to litter repositories with dot files. That does not mean we should do the same, making the problem worse :) Go already has multiple mechanisms to ignore entire directories (. or _ prefixes, and dropping empty go.mod files), so there needs to be a really good reason to add another method.

@burdiyan
Author

burdiyan commented Dec 3, 2020

...ok? But why would you not also check in the injected go.mod files?

@bcmills because often directories to ignore aren't checked in.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Dec 3, 2020
@burdiyan
Author

burdiyan commented Dec 4, 2020

The common practice is to litter repositories with dot files. That does not mean we should do the same, making the problem worse :) Go already has multiple mechanisms to ignore entire directories (. or _ prefixes, and dropping empty go.mod files), so there needs to be a really good reason to add another method.

@mvdan IMHO, having a single dot file in one place, where it can be tracked, is a lesser evil than sprinkling empty go.mod files all over the place, ad hoc, and intruding on the conventions of other tools.

Thinking about the pros and cons of implementing such a feature, I'm struggling to see any cons (probably due to my ignorance), besides having to spend the time to implement it. I'd appreciate it if anyone could shed some light on this so I can understand the implications.

@psigen

psigen commented Mar 19, 2021

I have a use case where I have a multi-language repo where not all of the developers are touching the go components.

I don't think it is reasonable to ask my docker, bazel, and nodejs developers to all wrap their normal tooling in scripts that touch extra files in their build directories, nor ask them to try to rename their standard build directories to match existing go conventions, some of which conflict with the other tool conventions.

It seems like there should be a way to specify how to ignore certain files or directories that does not require modifying the content of those files or directories, because the ignored content is not being managed by go and may have its own conflicting conventions and lifecycle.

@rsc
Contributor

rsc commented Aug 18, 2021

@psigen Go wants a directory tree that belongs to it. In a multi-language repo, why not create a top-level go/ directory?

@psigen

psigen commented Aug 21, 2021

@rsc: because my projects are not organized that way. I have services like:

service1/
    backend/  # golang
    debug-cli/
    proto/
service2/
    backend/  # python
    debug-ui/
    proto/
webapp/
    frontend/ 

I know what you are asking, which is why not reorganize to:

proto/
    service1/
    service2/
golang/
    service1-svc/
    service1-cli/
python/
    service2/
nodejs/
    webapp/
    service1/debug-ui/

And the answer (besides "that's a lot of work right now") is that it is not how our ownership is structured.

It is not convenient to have duplication in the CODEOWNERS files, .gitignore patterns that look like **/service1/foo*, cross-directory-tree docs links, etc. all in service of golang. It makes PR reviews harder when related changes happen all over the directory tree. It forces docker build contexts to all need to be at the root of the entire source tree, and makes live-rebuilds in tools like Tilt and Skaffold much more difficult to author.

I could go on, but I'm really just reiterating the core premise of this proposal:

For non-trivial (often multi-language) projects it's often desirable to make all the Go tools (including gopls) ignore certain directories.

@4k1k0

4k1k0 commented Sep 3, 2021

I use the serverless framework to deploy lambdas on AWS. Some plugins that I have to use contain Go files. So when I run go mod download or go mod tidy, I end up adding dependencies to my go.mod file that are required only by the Go files inside the node_modules directory. It would be great to define a way to exclude directories from Go modules.

node_modules
  serverless
    lib
      plugins
        create
          templates
            aws-go
              main.go
cmd
  main.go
pkg
  dirA
  dirB
  dirC

@cespare
Contributor

cespare commented Sep 3, 2021

@rsc

@psigen Go wants a directory tree that belongs to it. In a multi-language repo, why not create a top-level go/ directory?

Reading this statement does not make me happy. It goes against the entire premise of the code organization at my company.

We use a monorepo with projects in several languages. Some Go programs live inside projects written in other languages. Sometimes we rewrite a project from one language to another. Some projects use a combination of Go and other languages (imagine a website written using both Go and JavaScript extensively). We already have an organizational hierarchy within the monorepo that is based around purpose and ownership, not language.

This all worked fine with $GOPATH: the repo was inside its own single $GOPATH segment and a top-level vendor directory contained a single version of all shared dependencies.

Moving to modules has raised some challenges, but mostly it has worked. The whole repo is one module so we use a fixed, shared set of dependencies. One issue we faced is that the go mod tidy and other commands printed out a bunch of irrelevant spam (#35941) -- we sent a fix for that. Other issues we see involve various kinds of slowness in gopls (#46438 describes one particular issue I have). But mostly it works fine, and ISTM that the remaining issues are surmountable if the folks working on the tools care about making them work well in the presence of mixed-language source trees (and until now it seemed to me that they mostly do!).

But when I read "Go wants a directory tree that belongs to it", it sounds like you don't think this use case matters as far as the standard Go tools are concerned. I don't know how we could possibly adapt our repo to a "Go code all belongs in its own tree" model. Probably we wouldn't -- I imagine that if push came to shove, we'd look into alternative build tools.

@dolmen
Contributor

dolmen commented Dec 16, 2021

In #50225 I'm raising concerns about the resources (network, disk space) wasted on every developer's machine because the module zips contain many irrelevant files.

Check this list of files that are in your Go modules cache:

find $(go env GOMODCACHE)/*.* -type f ! -name '*.go' ! -name 'go.mod' ! -name 'go.sum' ! -name 'list.lock' ! -name 'v*.mod' ! -name 'v*.info' ! -name 'v*.zip' ! -name 'v*.ziphash' ! -name 'v*.lock' ! -name 'LICENSE*' ! -name 'README*' -print

I have more than 200,000 useless files on my machine.

This also impacts CI builds (download time and space for new dependencies; it requires enabling strong module caching to reduce the problem).

@burdiyan
Author

While similar, I think this proposal is a bit different from yours, @dolmen, in the sense that here I mostly care about ignoring directories, not specific files, and definitely not for specific packages like x/mod/zip. Still, the same solution could solve both problems.

@burdiyan
Author

burdiyan commented Dec 17, 2021

BTW, go.work is coming in the next Go release. Maybe this feature could be implemented in there eventually? Or maybe a separate go.ignore file? Looks like a better approach than .goignore for sure!

I updated the initial comment.

@dolmen
Contributor

dolmen commented Dec 17, 2021

BTW, go.work is coming in the next Go release. Maybe this feature could be implemented in there eventually?

I consider go.work as a development tool for your local development environment. Which means it is a file I would not commit in the repo.

Instead, ignore patterns must be available for tools that download the code from a VCS (for publishing on a proxy, or for filling the module cache, see #50225), so the ignore patterns must be always available in the repository.

So go.work would not be a good place for ignore patterns.

@burdiyan
Author

@dolmen While I suspect that go.work is meant to be checked in (I'm not sure about it), I think you're right that the ignore stuff should probably be in a separate place, because not all projects would want to have go.work. Then maybe go.ignore is the remaining option that would make some people happy, and the rest (those who don't like the idea of dot files) at least not angry about it :)

@burdiyan burdiyan changed the title proposal: global ignore mechanism for Go tool ecosystem proposal: global ignore mechanism for Go tooling ecosystem Dec 18, 2021
@antichris

@burdiyan, go.work is indeed not meant to be checked in:

These go.work files should not be checked into the repositories so that they don't override the workspaces users explicitly define. Checking in go.work files could also lead to CI/CD systems not testing the actual set of version requirements on a module and that version requirements among the repository's modules are properly incremented to use changes in the modules. And of course, if a repository contains only a single module, or unrelated modules, there's not much utility to adding a go.work file because each user may have a different directory structure on their computer outside of that repository.
Proposal: Multi-Module Workspaces in cmd/go §Multiple modules in the same repository that depend on each other

@jaronsummers

It is kind of frustrating that the responses from Go contributors are uniformly "Everyone else on earth is wrong, they should change to accommodate our design choices."

No matter how inelegant another dotfile is, it solves the problem in a universal way that will work for all repository structures and build tools. None of the proposed alternatives even attempt to do the same.

I currently just don't run gopls and try to minimize how often I have to write Go, which is not a "solution" that is available to everyone.

@hyangah
Contributor

hyangah commented Feb 18, 2022

Some of us discussed the problem this proposal aims to address, i.e., allowing certain directories to be excluded when running go with patterns that include "...".

We agree this is a problem for some tools (e.g. gopls, and others that accept go's import path patterns). Many tools have developed their own ways to configure exclusion rules (e.g. gopls has directoryFilters), but this is still not sufficient if they depend on go invocations with a ... pattern underneath.

@bcmills had a great idea during the discussion: go already has the overlay mechanism (see the summary of the feature by @matloob and also the -overlay flag description in the go command help page). That can be used as the directory exclusion mechanism. For exclusion, place an empty value; for inclusion, set an identity mapping. gopls can implement this by applying the already existing directoryFilters, and I guess other tools can do the same. (x/tools/go/packages supports overlay.)

The overlay config isn't as flexible as the glob patterns many dotfiles accept, but I think it still provides a sufficient knob that tools can play with. What do you think?


#50225 (a mechanism to fine-tune the scope of a module) was mentioned during the discussion, but I don't think that is the goal of this proposal. For example, I think it's possible that one wants to speed up gopls by excluding a directory but still wants to keep it in the distributed module (directories containing asset files, etc.), or that the directory doesn't affect module distribution at all (ephemeral directories such as node_modules or bazel directories created during the build).

@jaronsummers I think the Go team is trying to understand the problem better, not dismiss or ignore problems users are facing in the real world.

@amery

amery commented May 17, 2023

Bit of a tangent, but: ignore .git would help a little bit (by at least excluding the git metadata); if there were some way to say "ignore directories containing a .git" that would fix this particular case. But I don't know how we'd be able to express that.

@cespare ignore .git/..?

@rsc
Contributor

rsc commented May 18, 2023

Based on the discussion above and talking to @hyangah and @findleyr, I suggest we do:

  • ignore x means ignore the file or directory tree x (and its subtree) anywhere it appears
  • ignore ./x means ignore the specific file or directory ./x relative to the module root
  • there is no 'unignore'
  • ignoring a file or directory means treating it the same way we treat directories whose name begins with _ or . today

The last part is different from what I had been arguing above, but correcting what I said earlier, it does handle node_modules because those don't get checked into the git repo. That's the general rule too: if you don't want something in the module, don't check it into the repo (or don't tag a commit with those files). Then we have just one kind of ignore.

One question is whether the ignores should accept wildcards like in path.Match, but I am inclined to say no. It gets too difficult to understand which wildcards are allowed and not. People always want ** for example, but path.Match doesn't implement it (turns out to mean different things to different people!).
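The matching rules suggested above can be sketched in a few lines of Go. This is only one reading of the proposal, not the go command's implementation: the names ignored and hasPrefixAt are made up for the example, paths are assumed to be slash-separated and relative to the module root, and a multi-element pattern such as foo/bar is assumed to match wherever that sequence of path elements appears.

```go
package main

import (
	"fmt"
	"strings"
)

// ignored (hypothetical helper) reports whether rel, a slash-separated path
// relative to the module root, is covered by any ignore pattern:
//   - "./x" matches only x at the module root, plus everything beneath it;
//   - "x" matches x at any depth, plus everything beneath it.
//
// Patterns are literal; no wildcards are interpreted and nothing un-ignores.
func ignored(rel string, patterns []string) bool {
	segs := strings.Split(rel, "/")
	for _, p := range patterns {
		if rooted, ok := strings.CutPrefix(p, "./"); ok {
			if hasPrefixAt(segs, strings.Split(rooted, "/"), 0) {
				return true
			}
			continue
		}
		want := strings.Split(p, "/")
		for i := range segs {
			if hasPrefixAt(segs, want, i) {
				return true
			}
		}
	}
	return false
}

// hasPrefixAt reports whether want appears in segs starting at index i.
func hasPrefixAt(segs, want []string, i int) bool {
	if i+len(want) > len(segs) {
		return false
	}
	for j, w := range want {
		if segs[i+j] != w {
			return false
		}
	}
	return true
}

func main() {
	patterns := []string{"node_modules", "./third_party/gen"}
	for _, p := range []string{
		"node_modules/pkg/main.go",
		"web/node_modules/x",
		"third_party/gen/a.go",
		"svc/third_party/gen/a.go",
		"cmd/main.go",
	} {
		fmt.Printf("%-28s ignored=%v\n", p, ignored(p, patterns))
	}
}
```

Comparing whole path elements, rather than string prefixes, keeps a pattern like node_modules from accidentally matching a sibling such as node_modules_extra.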

@gopherbot

Change https://go.dev/cl/497795 mentions this issue: cmd/go: refuse to download zip files for too-new modules

@rsc
Contributor

rsc commented May 24, 2023

Have all concerns about this proposal been addressed?

@silverwind

The last part is different from what I had been arguing above, but correcting what I said earlier, it does handle node_modules because those don't get checked into the git repo

Probably not an issue, but I do want to highlight that it's not completely unheard of to check node_modules into git. In rare cases they could be vendored like that.

gopherbot pushed a commit that referenced this issue May 25, 2023
In general an older version of Go does not know how to construct
a module written against a newer version of Go: the details may
change over time, such as for issues like #42965 (an ignore mechanism).

For #57001.

Change-Id: Id43fcfb71497375ad2eb5dfd292bad0adca0652e
Reviewed-on: https://go-review.googlesource.com/c/go/+/497795
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@hyangah
Contributor

hyangah commented May 31, 2023

@rsc @bcmills @matloob Is go.work also going to understand this global ignore mechanism?

@rsc
Contributor

rsc commented May 31, 2023

Sure, I think go.work could probably be included here. Let's hear from @bcmills and @matloob.

@bcmills
Contributor

bcmills commented May 31, 2023

  • ignoring a file or directory means treating it the same way we treat directories whose name begins with _ or . today
    … That's the general rule too: if you don't want something in the module, don't check it into the repo (or don't tag a commit with those files).

Today we don't filter directories beginning with _ or . out of module zip files. I guess that implies that we would have to load and interpret ignore directives for all dependencies when we're matching wildcards in dependencies. That would add some complexity but might be ok.

That said, I am concerned that that seems not to address the use-case of pruning non-Go files (such as checked-in source subtrees for other languages) out of the module zip archives. I worry that this only addresses something like half of the use-cases that have been brought up on this issue. 😅

@antichris

Today we don't filter directories beginning with _ or . out of module zip files.

Which is perfectly fine IMO.

I have seen go installable modules that keep non-go resources in underscore-prefixed directories. I also took the same approach in a server that, in addition to the API, hosts some static web resources from a _resources directory (as a no Go zone) which also is essential to be included while packaging for full functionality. In a project that I own it may be fine to un-ignore (un-underscore) the directory to ensure it is always packaged, as I can have a high degree of confidence that no unsanctioned .go files are lurking there for fun surprises. But not everyone else is in such a privileged position.

/My two coppers.

@rsc
Contributor

rsc commented Jun 20, 2023

If you don't want files in a module, one option is to use a tagging procedure that creates a temporary branch (or a detached head), commits the removal of the files there, and tags that commit. I don't think this will come up often, and we have a way to address it, so I think it's OK to have just one kind of ignore (ignore for Go files, like _ directories, but still include in the module).

@rsc
Contributor

rsc commented Jun 21, 2023

To summarize the discussion and try to move things along, there are three different ways files can be ignored: let's call them build-ignore (like _ files and directories today), mod-ignore (do not pack into the module), and git-ignore (the .gitignore file, do not commit into git). Note that a few combinations are suspicious:

  • git-ignore YES + mod-ignore NO is infeasible, since if the file is not in git it cannot be packed into the module by the git-based downloader
  • build-ignore NO + mod-ignore YES is questionable, since it means the local module is different code than what users download

With that note, here are the eight possible combinations:

  1. build-ignore NO, mod-ignore NO, git-ignore NO - default behavior
  2. build-ignore NO, mod-ignore NO, git-ignore YES - infeasible
  3. build-ignore NO, mod-ignore YES, git-ignore NO - questionable
  4. build-ignore NO, mod-ignore YES, git-ignore YES - use .gitignore; still questionable, maybe local debugging code
  5. build-ignore YES, mod-ignore NO, git-ignore NO - use go.mod 'ignore'
  6. build-ignore YES, mod-ignore NO, git-ignore YES - infeasible
  7. build-ignore YES, mod-ignore YES, git-ignore NO - not possible
  8. build-ignore YES, mod-ignore YES, git-ignore YES - use go.mod 'ignore' + .gitignore
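The list above can be reproduced mechanically from the two constraints stated before it. The sketch below is just a restatement of that table in Go; the function names classify and yn are invented for the example, and the label for combination 7 is copied from the list rather than derived from either constraint.

```go
package main

import "fmt"

// classify (hypothetical helper) labels one combination of the three ignore
// kinds, following the constraints and labels in the list above.
func classify(build, mod, git bool) string {
	switch {
	case git && !mod:
		return "infeasible" // not in git, so it cannot be packed into the module
	case !build && mod:
		return "questionable" // local module differs from what users download
	case build && mod && !git:
		return "not possible" // label taken from the list above
	case build && mod && git:
		return "go.mod ignore + .gitignore"
	case build:
		return "go.mod ignore"
	default:
		return "default behavior"
	}
}

// yn renders a boolean the way the list above does.
func yn(b bool) string {
	if b {
		return "YES"
	}
	return "NO"
}

func main() {
	// Enumerate in the same order as the list: build-ignore is the most
	// significant bit, git-ignore the least.
	for i := 0; i < 8; i++ {
		build, mod, git := i&4 != 0, i&2 != 0, i&1 != 0
		fmt.Printf("%d. build-ignore %-3s mod-ignore %-3s git-ignore %-3s -> %s\n",
			i+1, yn(build), yn(mod), yn(git), classify(build, mod, git))
	}
}
```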

It seems clear to me that "build-ignore", enabling (5), is an important use case, and calling it "ignore" matches the meaning of "IgnoredGoFiles" in go list output: the files exist but are ignored. It seems like we should move forward with that meaning in this proposal.

It's less clear to me whether "mod-ignore", enabling (3) or (7), is also an important use case. If so, we could potentially add a second kind of statement, perhaps 'omit', to control what is and is not included in the module form. Given that ignore can ignore the files, 'omit' ends up being purely a module size optimization. Are there use cases where this space savings would be important? Thanks.

@rsc
Contributor

rsc commented Jun 21, 2023

On the topic of having ignore in go.work, I spoke to @bcmills and @matloob and we decided it seemed odd to add it to go.work: whether files are ignored seems like it should be part of the definition of the module itself. We couldn't come up with a reason that a go.work would want to override parts of one of the modules it contains, and in general go.work is meant to be a "union of modules" not manipulation of the internals of any of the modules.

@dolmen
Contributor

dolmen commented Jun 23, 2023

I would be tempted to mod-ignore [._]* by default because 99.9% of those files are useless and their distribution beyond the Git repository to the Go proxy and to users' machines is probably unexpected by the module authors.

$ find $(go env GOMODCACHE) -name '[._]*' | sed 's!.*/!!' | sort -u

However, package embed allows embedding those files with the all: syntax. :(

@silverwind

silverwind commented Jun 23, 2023

However, package embed allows embedding those files with the all: syntax. :(

Rightfully so. I had the issue that go:embed would not include the _.<hash>.js file that vite outputs when it cannot derive a better filename. Dotfiles I can somewhat understand, so people don't accidentally expose .git, but ignoring files starting with _ by default is just obscure and a footgun.

@rsc
Contributor

rsc commented Jun 27, 2023

I would be tempted to mod-ignore [._]* by default because 99.9% of those files are useless and their distribution beyond the Git repository to the Go proxy and to users' machines is probably unexpected by the module authors.

Maybe but maybe not. It depends. And since we have included them to date, we can't really start eliding them now, or else checksums will no longer match. We could change based on Go version, but it doesn't seem worth the churn.

@rsc
Contributor

rsc commented Jun 28, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Jul 5, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: cmd/go: add global ignore mechanism for Go tooling ecosystem cmd/go: add global ignore mechanism for Go tooling ecosystem Jul 5, 2023
@rsc rsc modified the milestones: Proposal, Backlog Jul 5, 2023
@jschaf

jschaf commented Mar 1, 2024

Attempting to summarize the state of the world to figure out the state of this proposal. My understanding from #42965 (comment) is as follows (please chime in if incorrect):

  1. go.mod will support an ignore directive.
  2. Use ignore foo to ignore a directory at any depth in the module
  3. Use ignore ./foo/bar to ignore a specific directory relative to the go.mod directory.
  4. This proposal is accepted, but work has not yet begun.

So, if you want to ignore directories with goimports or similar tooling, continue using other workarounds for now.

It's probably worth updating the first comment since it's quite hard to figure out the state of this proposal.

@burdiyan
Author

burdiyan commented Mar 1, 2024

I updated the initial post to include the link to #42965 (comment).

Projects
Status: Accepted