Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: Glob should support ** for zero or more directories #11862

Open
ascarter opened this issue Jul 24, 2015 · 45 comments
Open

path/filepath: Glob should support ** for zero or more directories #11862

ascarter opened this issue Jul 24, 2015 · 45 comments
Labels
Documentation FeatureRequest NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@ascarter
Copy link

Go version 1.4.2
Mac OS X 10.10

Example:

package main

import "fmt"
import "path/filepath"
import "os"

func main() {
    files, err := filepath.Glob("/usr/local/go/src/**/*.go")
    if err != nil {
            fmt.Print(err)
            os.Exit(1)
    }
    fmt.Printf("files: %d\n", len(files))
    for _, f := range files {
        fmt.Println(f)
    }
}

Expected:

% ls /usr/local/go/src/**/*.go | wc -l
    1633

Actual:

files: 732

It seems that ** is equivalent to *. The extended ** pattern is common in shells and is supported in Rust and Java for example.

@ascarter
Copy link
Author

For reference, here is Rust's implementation. I think these rules would be a good description of how Go's filepath.Glob should work:

https://github.com/rust-lang/glob/blob/master/src/lib.rs#L369

@ianlancetaylor ianlancetaylor changed the title filepath.Glob should support ** for zero or more directories filepath: Glob should support ** for zero or more directories Jul 25, 2015
@ianlancetaylor
Copy link
Contributor

For reference, this is supported by bash when invoked as "bash -O globstar" or after running "shopt -s globstar".

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jul 25, 2015
@mikioh mikioh changed the title filepath: Glob should support ** for zero or more directories path/filepath: Glob should support ** for zero or more directories Jul 25, 2015
ascarter pushed a commit to ascarter/gotags that referenced this issue Aug 5, 2015
Added --exclude option that mimics ctag's exclude flag. The pattern is expected to be a shell glob pattern (which is what Go's filepath.Match method expects). It can be set multiple times to set several filters. Once the file list is built, the patterns will be applied to each file to filter out any names that match the exclude patterns before processing.

Note: Glob pattern '**' does not work recursively. See open issue golang/go#11862
@nodirt
Copy link
Contributor

nodirt commented Oct 22, 2015

OK to implement? @adg

@nodirt
Copy link
Contributor

nodirt commented Oct 22, 2015

@bradfitz for opinion

@ianlancetaylor
Copy link
Contributor

I think this is OK to implement.

@adg
Copy link
Contributor

adg commented Oct 23, 2015

Yep, I agree.

@gopherbot
Copy link

CL https://golang.org/cl/16280 mentions this issue.

@dotcomputercraft
Copy link

all, I wanted to check in to see if this issue (#11862) was fixed. Please let me know status of this ticket. Is there a work around to this problem? Talk to you soon. Thanks in advance.

@nodirt
Copy link
Contributor

nodirt commented Dec 13, 2015

Status: currently I'm not working on this. Feel free to unassign.

@rsc
Copy link
Contributor

rsc commented Jan 4, 2016

I certainly see the utility here but adding ** has various knock-on effects that may not be obvious. For example, * matches symlinks to other directories. ** cannot, at least not without extra work to avoid file system cycles. Also it's not obvious this should be considered a backwards compatible change, both for Glob and for Match. And you'd probably want to do path.Match as well. There's a lot to consider here, even though it looks trivial.

@slimsag
Copy link

slimsag commented Feb 8, 2016

@rsc closed the most recent CL for this with the comment:

I really don't think we should do any of this. This is much less understandable than the current code. If you need a very fast, very complex filepath.Glob, it is of course possible to write one outside the standard library.

Should this issue remain open?

@rsc
Copy link
Contributor

rsc commented Feb 9, 2016

This issue does remain open. What's confusing is that GitHub has linked just above your comment to issue Shopify/themekit#102, which is closed.

@slimsag
Copy link

slimsag commented Feb 9, 2016

@rsc that is because someone linked to it in a Markdown link / comment: Shopify/themekit#102 (comment)

What can be done here? A better implementation? It's not clear to me.

@extemporalgenome
Copy link
Contributor

I think it's useful, but I wouldn't consider it backwards compatible. ** already works and does something (which is the same something as a single star).

@ascarter
Copy link
Author

ascarter commented Feb 9, 2016

@extemporalgenome At best, I'd argue that ** is undefined and it just happens to do the same thing as *. I don't think this should represent an issue in regards to backwards compatibility.

@ascarter
Copy link
Author

ascarter commented Feb 9, 2016

@slimsag the shopify issue I think is different. They chose to walk the tree and that solved their case - I don't think that applies to the general problem. I think @rsc was saying that the solution here is non trivial (and it's not just filepath.Glob but also path.Match).

I'd like it to stay on the list. Coming from other languages, I think this is a pretty common feature for glob syntax.

Note that the downside of attempting to write your own implementation is that I think you would wind up re-implementing glob entirely. I'm trying to figure out how to solve this for http://github.com/jstemmer/gotags since it won't be solved in the standard library anytime soon.

@extemporalgenome
Copy link
Contributor

@ascarter * is very well defined as matching any sequence (including an empty sequence) of non-separator characters. Thus ** is equally well defined as matching zero or more non-separator characters, followed by a match of zero characters on the tail of the same filename.

The proposed special-casing of ** will certainly not match less than it does now, but may match more, and the question is whether existing programs could become broken through what constitutes a change in behavior.

@mattn
Copy link
Member

mattn commented Feb 9, 2016

I'm thinking this feature should be provided from third-party products not core.
for example, you can use https://github.com/mattn/go-zglob

sfllaw referenced this issue in golang/pkgsite Oct 28, 2021
Go's glob patterns do not treat "**" as meaning "all directories
below, recursively." Instead, "**" means the same as "*". So

  shared/**/*.tmpl

is legal but retrieves only files that are one level down
from "shared".

Currently this isn't a problem, but it would be if a subdirectory of
"shared" had its own subdirectory with *.tmpl files.

Remove the "**" to make this clear.

Change-Id: Ia57212eb55347705af49e2e42cbe9948e14370be
Reviewed-on: https://go-review.googlesource.com/c/pkgsite/+/359034
Trust: Jonathan Amsterdam <jba@google.com>
Run-TryBot: Jonathan Amsterdam <jba@google.com>
TryBot-Result: kokoro <noreply+kokoro@google.com>
Reviewed-by: Julie Qiu <julie@golang.org>
Reviewed-by: Jamal Carvalho <jamal@golang.org>
@jba
Copy link
Contributor

jba commented Oct 28, 2021

In general I agree that a function's documentation should say what it does and not what it doesn't do. But using ** is a pitfall, and we do document pitfalls, as in this lengthly caveat from fs.Sub:

Note that Sub(os.DirFS("/"), "prefix") is equivalent to os.DirFS("/prefix") and that neither of them guarantees to avoid operating system accesses outside "/prefix", because the implementation of os.DirFS does not check for symbolic links inside "/prefix" that point to other directories. That is, os.DirFS is not a general substitute for a chroot-style security mechanism, and Sub does not change that fact.

We could word the doc so that it describes a consequence of the function's behavior, instead of describing non-behavior. For example, "Note that multiple consecutive *s behave like a single *".

@soypat
Copy link

soypat commented Jul 17, 2022

Could this be needed to match only main.go, hey.go files in the following case?

- unwanted.go
- dir1
| - main.go
| - dir2
   | - hey.go
  • Glob("dir1/*.go") only matches main.go
  • Glob("dir1/*/*.go") only matches hey.go
  • Glob("*.go") matches them all, including the unwanted unwanted.go

@mih-kopylov
Copy link

It's a shame such an famous Glob feature not implemented in Go for 7 years.

@adg @rsc is there a chance to prioritize it please?

@adg
Copy link
Contributor

adg commented Aug 7, 2022

@mih-kopylov There are some concerns that Russ outlined in an earlier comment and also there are compatibility promises that we must heed. So if someone wants to address this issue they need should probably write it up as a proposal that addresses all these potential issues.

In the meantime, if there is a (seemingly, from my brief research) mature doublestar package that provides a glob that handles **. It doesn't have any dependencies and has extensive unit tests. Give it a shot.

devfbe added a commit to devfbe/gipgee that referenced this issue Aug 25, 2022
for the feature branch delta detection, we need a glob like
syntax in the config file which can be matched against the
current diff between a feature and the default branch in gitlab.

Because golangs internal file path match is broken
(golang/go#11862) is seems to be necessary
to use an external lib. go-zglob does this very well, so use it.
@guillermo
Copy link

I needed something more straightforward than the solution from @adg without adding an external lib. Maybe this is enough for most people:

https://github.com/guillermo/doubleglob/blob/main/double_glob.go

var replaces = regexp.MustCompile(`(\.)|(\*\*\/)|(\*)|([^\/\*]+)|(\/)`)

func toRegexp(pattern string) string {
	pat := replaces.ReplaceAllStringFunc(pattern, func(s string) string {
		switch s {
		case "/":
			return "\\/"
		case ".":
			return "\\."
		case "**/":
			return ".*"
		case "*":
			return "[^/]*"
		default:
			return s
		}
	})
	return "^" + pat + "$"
}

// Glob returns a list of files matching the pattern.
// The pattern can include **/ to match any number of directories.
func Glob(inputFS fs.FS, pattern string) ([]string, error) {
	files := []string{}

	regexpPat := regexp.MustCompile(toRegexp(pattern))

	err := fs.WalkDir(inputFS, ".", func(path string, d fs.DirEntry, err error) error {
		if d.IsDir() || err != nil {
			return nil
		}
		if regexpPat.MatchString(path) {
			files = append(files, path)
		}
		return nil
	})

	return files, err
}

@claytonchew
Copy link

Was wondering why ** doesn't work until I came across this issue.

To see that such famous ** glob pattern wasn't considered since 2015. I hope it gets prioritised.

The use case for this is plain obvious and should not introduce backward compatibility issue.

Instead of having to specify glob patterns like this to match all directories and subdirectories within a directory like this:

dir/*
dir/*/*
dir/*/*/*
dir/*/*/*/*

☝️ another limitation: only allows matches up to 4 nested levels

It could have done away with this instead:

dir/**/*

@BenStigsen
Copy link

This issue has been open since 2015, but it is not clear whether adding support for ** is planned or considered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation FeatureRequest NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests