Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: fix HasPrefix #18358

Open
dsnet opened this issue Dec 16, 2016 · 26 comments
Open

path/filepath: fix HasPrefix #18358

dsnet opened this issue Dec 16, 2016 · 26 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Dec 16, 2016

The filepath.HasPrefix function has been in a deprecated-like status for the past 4 years (since 2012). The reasons why it was deprecated is described in CL/5712045. The issue with respecting path boundaries is probably the most significant.

Given that it's had a note steering people away for 4 years, there should be nearly no users of the function. Inside Google, there are 3 uses of it and all of them would have wanted the correct implementation. We should actually fix this function and bring it out of deprecated status.

The challenge with properly implementing it is getting the case-sensitivity correct. This is a dependent on the underlying filesystem and not the GOOS.

@dsnet dsnet added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 16, 2016
@dsnet dsnet added this to the Go1.9Maybe milestone Dec 16, 2016
@bradfitz
Copy link
Contributor

I was going to complain that the path/filepath package is purely string processing, but I forgot about Glob, which does use the os package and read from the filesystem. So fixing this properly sounds doable.

@minux
Copy link
Member

minux commented Dec 17, 2016 via email

@cespare
Copy link
Contributor

cespare commented Dec 17, 2016

I agree with @minux, but for the record we did un-deprecate a function before when we fixed up debug.Stack for Go 1.6. See 2d697b2 and #12363.

@minux
Copy link
Member

minux commented Dec 17, 2016 via email

@alexbrainman
Copy link
Member

Whatever we decide to do here, I think filepath.HasPrefix is useful function. We can all complain that it is hard to implement it, but then people will write they own. We have written cmd/go.hasFilePathPrefix ourselves. Why didn't we use filepath.HasPrefix in cmd/go instead?

Alex

@RobbieMcKinstry
Copy link

Repost from a closed bug: (Sorry for the noise, I'm a newbie still!)

Just checking: this function is deprecated, and there is no alternative? Maybe a regex.Match? I assume that's why there's interest in fixing.

@zacps
Copy link

zacps commented May 3, 2017

It's more complicated than regex, which is why it hasn't been solved yet.
golang/dep has a solution to this here but it might not be an acceptable replacedment for the stdlib because it does involve a stat call.

@hirochachacha
Copy link
Contributor

I think checking GOOS is good enough.
It seems that the same decision was done for filepath.Rel.
Or should we deprecate filepath.Rel also?

@zacps
Copy link

zacps commented May 18, 2017

Looks like Rel isn't ideal either because it doesn't take into account case insensitive filesystems. It matters less in that case as the user can simply supply the same case in both arguments.

IMO Rel should be kept how it is and we should think about adopting the solution from golang/dep for HasPrefix.

@dsnet dsnet removed their assignment Jun 21, 2017
@dsnet
Copy link
Member Author

dsnet commented Jun 21, 2017

I've unassigned myself from this. Checking GOOS is not sufficient. Properly implementing filepath.HasPrefix implies having knowledge about the case sensitivity of the underlying filesystem, which is entirely independent of the GOOS. In fact there may be multiple filesystems, mounted at each different segment of the path.

Whatever solution is proposed must handle case sensitivity properly. The approach taken by dep is a hack and assumes that the file exists.

@dsnet dsnet modified the milestones: Go1.10, Go1.9Maybe Jun 21, 2017
@hirochachacha
Copy link
Contributor

hirochachacha commented Jun 21, 2017

Another approach is using statfs, but it has a portability issue.
Or we could polish dep's implementation instead.
Previously, I wrote the code after looking into dep code.

func HasPrefix(p, prefix string) (bool, error) {
	st1, err := os.Lstat(prefix)
	if err != nil {
		return false, err
	}
	if !st1.IsDir() {
		return false, errors.New("prefix isn't dir")
	}
	p = filepath.Clean(p)
	prefix = filepath.Clean(prefix)
	for len(p) > len(prefix) {
		p = filepath.Dir(p)
	}
	if len(p) < len(prefix) {
		return false, nil
	}
	st2, err := os.Lstat(p)
	if err != nil {
		return false, err
	}
	return os.SameFile(st1, st2), nil
}

I didn't test it well, but presumably it is more efficient than the original one.
In any case, it requires IO syscalls, I'm not sure a better way to handle IO errors.
Perhaps, fallback to GOOS approach?

EDIT: removed unused code.

@alexbrainman
Copy link
Member

I'm not sure a better way to handle IO errors.

Yes, handling errors will be a problem with your approach. Real filepath.HasPrefix returns bool. Do you return true or false, if some of your syscalls return ERROR_ACCESS_DENIED on windows?

Alex

@dsnet
Copy link
Member Author

dsnet commented Jun 23, 2017

The moment we have to make syscalls heavily implies that there is a possibility that this function can fail, which further implies that the current signature is not appropriate. We either create a new function that has an error or leave this for Go2.

@hirochachacha
Copy link
Contributor

I suppose creating a new function is fine, but I don't have strong opinions.

@hirochachacha
Copy link
Contributor

Oh, I think above program has a bug. Perhaps, https://play.golang.org/p/u2c-PTj3HX is correct one.

@alexbrainman
Copy link
Member

Perhaps, https://play.golang.org/p/u2c-PTj3HX is correct one.

Why your program's HasPrefix returns 2 parameters?
The filepath.HasPrefix only returns single bool.

Alex

@hirochachacha
Copy link
Contributor

hirochachacha commented Jun 23, 2017

@alexbrainman I'm not sure which option we want to choose
A. no signature change. discard errors.
B. introduce a new function with a new signature.
C. postpone this until Go2, then change the signature.
D. others.

@alexbrainman
Copy link
Member

A. no signature change. discard errors.

I do not see how it is possible to implement this option - see my ERROR_ACCESS_DENIED comment.

B. introduce a new function with a new signature.

Maybe this is acceptable.

C. postpone this until Go2, then change the signature.

Gophers that need function like that, cannot wait for Go2. They implement they own version of this function.

Alex

@hirochachacha
Copy link
Contributor

I do not see how it is possible to implement this option - see my ERROR_ACCESS_DENIED comment.

I agree.

Maybe this is acceptable.

I think so.

Gophers that need function like that, cannot wait for Go2. They implement they own version of this function.

fair.

So we both have the same opinion. :)

@alexbrainman
Copy link
Member

I think so.

What would new version of filepath.HasPrefix look like?
What are the downsides of having filepath.HasPrefix that calls syscalls?
We know about one - users have to deal with errors. Are there others?

Alex

@hirochachacha
Copy link
Contributor

hirochachacha commented Jun 24, 2017

What would new version of filepath.HasPrefix look like?

I suppose func HasPrefix(p, prefix string) (bool, error) is fine as the signature.
But I don't have any nice namings in my mind. I just wanted to share some ideas, so.

What are the downsides of having filepath.HasPrefix that calls syscalls?
We know about one - users have to deal with errors. Are there others?

As @dsnet pointed it out, these code assume existence of files.
Maybe there are some legal usage of non-existent filepaths. Maybe @dsnet knows.

@hirochachacha
Copy link
Contributor

Unrelated to previous comment, I think windows implementation should use private functions - toNorm, normBase.

func HasPrefix(p, prefix string) (bool, error) {
	p, err := toNorm(p, normBase)
	if err != nil {
		return false, err
	}
	prefix, err = toNorm(prefix, normBase)
	if err != nil {
		return false, err
	}
	return strings.HasPrefix(p, prefix), nil
}

So, we can also handle short filenames.
And we should also handle unicode normalization issue on other platforms, which maybe be a trivial change though. The following code fails on hfs+.

package main

import (
	"os"

	"golang.org/x/text/unicode/norm"
)

var a = norm.NFC.String("ぷ")
var b = norm.NFD.String("ぷ")

func main() {
	f, err := os.Create(a)
	if err != nil {
		panic(err)
	}
	f.Close()
	defer os.Remove(a)
	f, err = os.Open(b)
	if err == nil {
		panic("filesystem do the unicode normalization")
	}
}

@IllyaYalovyy
Copy link
Contributor

Is anyone working on this? It seems like a useful function. I would like to help. I think we don't have too many options. Here is the summary. In order to do it 100% correct we need to know case sensitivity of the mounted file system. As far as I know it can be done only by accessing a file/folder on that file system. (User can mount FS with different case sensitivity side by side, even Windows is going to support case sensitive FS). The existing signature doesn't really allow for that. My proposal is as follows:

  1. Update the existing function to address mentioned in the deprecation note concerns. Use case sensitivity default for the current OS (linux -case sensitive, Win/Mac - case insensitive, etc)
  2. Add a new function that does it the same way dep.HasFilepathPrefix().

Please let me know whether it makes sense.

@ianlancetaylor
Copy link
Contributor

I think the first step is to define exactly what the function should do.

@IllyaYalovyy
Copy link
Contributor

@ianlancetaylor
Thank you for the quick response. This is a good point. May be the best course of action is to just let it alone, and just add a new one. Which will use dep.HasFilepathPrefix() logic. What do you think?

Here are requirements I can see right now:

  1. It should take into account boundaries
  2. It should use actual case sensitivity (detected by accessing the parent folder)
  3. Compare based on proper Unicode case-folding

Does it sound useful at all?

@mathstuf
Copy link

we need to know case sensitivity of the mounted file system

It's even more complicated than that: directories can have a "is case sensitive" flag too on Windows. Linux's ext4 also has this functionality now too, so it's not just a Windows problem. https://lwn.net/Articles/784041/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests