Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path/filepath: Clean definition is not idempotent #22861

Open
larhun opened this issue Nov 23, 2017 · 8 comments
Open

path/filepath: Clean definition is not idempotent #22861

larhun opened this issue Nov 23, 2017 · 8 comments
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@larhun
Copy link

larhun commented Nov 23, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

version 1.9

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

windows/amd64

Issue

The algorithm defined for the Clean function is faulty and, in addition, is not faithfully implemented.

In particular, the definition is faulty because the idempotent condition

Clean(Clean(x)) == Clean(x)

could fail, if Separator is not '/', as on Windows systems.

In fact, by the algorithm documentation, on Windows systems '/' would not be processed as Separator, until the final translation to Separator. Thus, applying Clean once or twice (with '/' translated to Separator at the end of the first cleaning step) could return different results, contrary to the idempotent condition.

In addition, the algorithm implementation slightly deviates from the documentation. In fact, every character that satisfies

os.IsPathSeparator(c) == true

is processed as Separator, not only Separator, as stated in the algorithm definition. On Windows systems, this is true for both: '\\' and '/'.

Please note that, if os.IsPathSeparator('/') is true for all available implementations, '/' is always processed as Separator. Thus, idempotence can be magically preserved, despite the errors in the algorithm definition and implementation.

Proposal

I propose to clarify the algorithm documentation and adapt the implementation as follows.

Documentation

  1. change the first sentence to

Clean returns the shortest path name equivalent to path by purely lexical processing. First, it replaces any occurrence of slash by Separator. Then, it applies the following rules iteratively until no further processing can be done:

  1. replace slash with Separator in the sentence

The returned path ends in a slash ...

  1. remove the sentence

Finally, any occurrences of slash are replaced by Separator.

Implementation

  1. add the follwing statement at the first line

    path = FromSlash(path)

  2. change any expression like os.IsPathSeparator(x) to

    x == Separator

  3. change the last line to:

    return out.string()

@davecheney
Copy link
Contributor

davecheney commented Nov 24, 2017 via email

@larhun
Copy link
Author

larhun commented Nov 24, 2017

The issue can be demonstrated by a program only if os.IsPathSeparator('/') returns false, that never occurs on my Windows system.

Therefore, to prove that the algorithm definition is faulty, a modified path/filepath package is provided: github.com/larhun/filepath is a copy of the original, except that any occurrence of os.IsPathSeparator in the Clean function is replaced by IsPathSeparator, defined as

func IsPathSeparator(s byte) bool {
    return s == Separator
}

that is true only for Separator, conforming to the algorithm defined for the Clean function.

Then, the following program prints false.

package main

import(
    "fmt"
    "github.com/larhun/filepath"
)

func main(){
    p1:="/../../.."
    p2:=filepath.Clean(p1) // returns "\..\..\.."
    p3:=filepath.Clean(p2) // returns "\"
    fmt.Println(p2==p3)    // i.e. Clean(Clean(p1)) != Clean(p1)
}

For an idempotent function, the expected value is true.

@cznic
Copy link
Contributor

cznic commented Nov 24, 2017

Do I understand correctly that you're saying the issue occurs only when a condition is met - but that condition cannot happen?

@larhun
Copy link
Author

larhun commented Nov 24, 2017

No, I am not saying that. The issue is about documentation and implementation, that are faulty and not consistent.

The documentation states:

Clean returns the shortest path name equivalent to path by purely lexical processing. It applies the following rules iteratively until no further processing can be done:

1. Replace multiple Separator elements with a single one.
2. Eliminate each . path name element (the current directory).
3. Eliminate each inner .. path name element (the parent directory)
   along with the non-.. element that precedes it.
4. Eliminate .. elements that begin a rooted path:
   that is, replace "/.." by "/" at the beginning of a path,
   assuming Separator is '/'.

The returned path ends in a slash only if it represents a root directory, such as "/" on Unix or C:\ on Windows.

Finally, any occurrences of slash are replaced by Separator.

If the result of this process is an empty string, Clean returns the string ".".

I am saying that the previous algorithm is not right, because it states that only Separator is processed, before finally replacing any slash occurrence. If this was implemented precisely, then the Clean function could not be idempotent on Windows systems, as shown by the provided program.

However, the implementation deviates from the documentation and processes as Separator any character that is true for os.IsPathSeparator and, on Windows systems, that includes the slash. Then, magically, the Clean function results idempotent, despite the wrong algorithm and the wrong implementation.

The documentation should not be misleading. The right algorithm should start by first replacing the slash with Separator. The right implementation should perform as stated by the algorithm.

@cznic
Copy link
Contributor

cznic commented Nov 24, 2017

If the implementation is faulty as claimed, there must be a way how to demonstrate the issue with that faulty implementation. But earlier you wrote:

The issue can be demonstrated by a program only if os.IsPathSeparator('/') returns false, that never occurs on my Windows system.

Is there any other system where os.IsPathSeparator('/') returns false? Because if there isn't any such then somewhere we have contradicting pieces of information.

@larhun
Copy link
Author

larhun commented Nov 24, 2017

To clarify my previous notes, there are two errors that translate into a correct behavior.

  • The first mistake is the definition of the algorithm, which formally does not guarantee the expected behavior, as shown by the provided program.

  • The second is the implementation, which differs from the stated algorithm, but eventually achieves the expected result.

Therefore,

  • The definition should be changed for the clarity and quality of the algorithm documentation.

  • The implementation is not true, but it does work as long as os.IsPathSeparator does not have a new system implementation that returns false for the slash. It should be changed for the quality of the source code.

I propose to change both.

@as
Copy link
Contributor

as commented Nov 25, 2017

Any changes to this package will affect millions of users that import it. The correct behavior is the one that causes the least surprise and adheres to the compatibility guidelines.

The risk involved in changing the algorithm is that the new algorithm will be incorrect (and that this incorrectness will also be inconsistent). File path handling is subtle, and requires many special cases to get right. That is probably the reason the code looks the way it does now.

On a system where '/' isn't a separator, what hope is there for correctness anyway?

@ianlancetaylor
Copy link
Contributor

I don't see any reason to change the implementation.

If you think that we should change the documentation, by all means send a pull request. Thanks.

@ianlancetaylor ianlancetaylor added Documentation NeedsFix The path to resolution is known, but the work has not been done. labels Mar 29, 2018
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Mar 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants