Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: Splitenv #62038

Open
zyga opened this issue Aug 15, 2023 · 10 comments
Open

proposal: os: Splitenv #62038

zyga opened this issue Aug 15, 2023 · 10 comments
Labels
Milestone

Comments

@zyga
Copy link

zyga commented Aug 15, 2023

I'd like to propose adding:

// Splitenv splits a key=value entry, as found in the environment block.
//
// Splitenv understands Windows-specific environment entries that contain
// a leading '=' in the key name.
func Splitenv(s string) (key, value string, ok bool) {
   ...
}

This issue is a continuation of #61956 where, after some archeology, it was re-discovered that on Windows, environment variables have one of two forms. In addition to the well-known key=value, the form =key=value is allowed and indeed used by cmd.exe.

Problems related to =key=value entires have been previously encountered in #49886 and #52436.

I believe that this belongs in the standard library, so that more Go programs are portable across platforms, and so that people do not have to re-invent this particular elliptical wheel whenever they port their stack to Windows and notice something unusual.

In addition, although this may be a separate proposal due to the difficulty of detecting this reliably, go vet might detect some common, incorrect attempts to implement Splitenv based on strings.Cut or similar functions.

Lastly I would suggest that documentation of os.Environ mentions os.Splitenv, so that developers are pointed towards the right code and, perhaps, refrain from implementing this function by hand.

I'm happy to implement the function, tests or anything else required.

@zyga zyga added the Proposal label Aug 15, 2023
@gopherbot gopherbot added this to the Proposal milestone Aug 15, 2023
@seankhliao seankhliao changed the title proposal: affected/package: os proposal: os: Splitenv Aug 15, 2023
@seankhliao
Copy link
Member

cc @bcmills @golang/windows

@Jorropo
Copy link
Member

Jorropo commented Aug 15, 2023

Wouldn't ok bool benefit to be err error type ?
So then this function can return meaningful errors messages instead of false.

@bcmills
Copy link
Contributor

bcmills commented Aug 15, 2023

@Jorropo, there are no meaningful error messages to report. Either the entry contains a non-leading = character (and can be split), or it does not (and cannot be split).

Per #52436 (comment), on some platforms it may be valid for Environ to contain entries that are not of the usual key=value form.

@Jorropo
Copy link
Member

Jorropo commented Aug 15, 2023

Wouldn't it error if I pass it something like "Key" ?

@bcmills
Copy link
Contributor

bcmills commented Aug 15, 2023

One question, though. If s does not contain a non-leading = character, what should the key and value return-values contain?

An implementation by analogy with strings.Cut would place the entire string in the key variable, although to me it is somewhat more intuitive to treat the whole string as a value. 🤔

@bcmills
Copy link
Contributor

bcmills commented Aug 15, 2023

Wouldn't it error if I pass it something like "Key"?

No. It would return with ok == false. Please read the comment I linked earlier.

@Jorropo
Copy link
Member

Jorropo commented Aug 15, 2023

If it returns ok == false wouldn't it be more clear for users if it returned errors.New("env variable without = on windows") ?
I've red the comment but this was new information to me, I was thinking from a new user point of view, it's nice to see the error explain why it failed.
false is not very descriptive.

This was just a suggestion, if this is evident to anyone that needs to use this function the use bool.

@apparentlymart
Copy link

apparentlymart commented Aug 15, 2023

@Jorropo Both approaches have precedent in existing language and library features, and I agree that the tradeoff between them is subtle.

A result, error signature is typically used to represent something which has success and failure conditions, especially when there are multiple error conditions, and possibly unbounded error conditions for functions that call other functions dynamically. In languages that use tagged unions instead of multiple return values, this is analogous to a "result" type, like Result<T, E> in Rust.

A result, bool signature is typically used for a "maybe" situation, which we might understand as an encapsulation of "if X, return Y; otherwise return nothing". Go doesn't have a concept of dynamically returning nothing, so instead the second result being false represents that the condition didn't hold. In tagged-union-ish languages this is analogous to an "option" or "maybe" type, like Option<T> in Rust.

(Neither analogy is perfect because in principle Go functions can return both a result and an error, or return a result even when the boolean indicator is false. But I think the use-cases for each are similar to the corresponding tagged unions' use-cases, and I think Go has examples of both patterns for the same reason as Rust has both types, even though technically Result<T, ()> would be essentially equivalent to Option<T> if we disregard the semantics we apply to each of them as humans.)

With that said then, I understand this proposal as being for a function that represents "If the string follows the conventions of a key/value pair, return the separated key and value; otherwise <do something>" -- the "do something" part is a subject of active discussion above, but one example I see above is "otherwise return the whole string as the value and an empty key". In that case, the boolean result is not indicating success or failure but instead indicating whether the predicate held.

It also seems a valid interpretation to define this as "try to parse this as a key/value pair", and consider the inability to do so as an error. But I think the assumption here is that even if we do think of this as a failure rather than just one of two possible outcomes, there is only one possible error case and no evidence that there will be any new error cases in future, and so this function would presumably always return a single fixed error and callers would only ever compare it to nil to to produce a boolean outcome anyway, in which case returning the error is just additional friction without any additional information.

FWIW I don't have a strong opinion either way, but both of these designs feel functionally equivalent to me and so I'm therefore inclined to marginally prefer the simpler of the two, which is to return a boolean directly.

@ianlancetaylor
Copy link
Contributor

Most code can and should call os.Getenv. This only arises for code that calls os.Environ and walks through the result. Perhaps we can simply update the Environ docs to explicitly point out that on Windows some entries may have a leading =. To put it another way, since I think not much code uses os.Environ, even less code needs os.Splitenv.

@apparentlymart
Copy link

apparentlymart commented Aug 16, 2023

It is kinda annoying that the platform-agnostic abstraction provided by os.Environ is incomplete and thus the caller must use platform-specific code to properly handle the result, so I was initially favorable from the perspective of "completing the abstraction", but honestly I can't really argue with the idea that os.Environ itself is strange and rarely needed in practice.

Out of curiosity I searched the main codebase I maintain in my day job and found several uses of os.Environ, but less than 10.

Several of them are taking that result, appending new entries to it, and passing the result into the Env field of an os/exec.Command object, with the intention of passing on environment variables to a child process while adding a few new ones. These examples treat all of the os.Environ results as opaque strings and so would not use os.Splitenv.

Three of the uses seem potentially problematic:

  1. Searching for all environment variables whose names have a given prefix, with code similar to the following:

    const prefix = "EXAMPLE_"
    for _, raw := range os.Environ() {
        if !strings.HasPrefix(raw, prefix) {
            continue
        }
        raw = raw[len(prefix):] // trim the prefix
        key, value, ok = strings.Cut(raw, "=")
        # ...
    }

    Based on the explanation of what these equals-prefixed entries mean, this use seems correct even on Windows: there's no reason why one of the environment variables my program is interested in would ever have that leading equals.

  2. Turning the []string returned from os.Environ into a map. This one I'm just including verbatim since it's a separate utility function:

    func makeEnvMap(environ []string) map[string]string {
    	if len(environ) == 0 {
    		return nil
    	}
    
    	ret := make(map[string]string, len(environ))
    	for _, entry := range environ {
    		eq := strings.IndexByte(entry, '=')
    		if eq == -1 {
    			continue
    		}
    		ret[entry[:eq]] = entry[eq+1:]
    	}
    	return ret
    }    

    Taken in isolation, this utility function is incorrect: on Windows it could potentially end up with an extra entry in the map whose key is "" and whose value is the entry for whichever drive letter happened to appear last in the environment block.

    However, the only caller of this function then proceeds to just pluck out a few specific entries from the map using constant keys, so the extra "" elements are harmless. After a little archaeology I've concluded that it's only using this method rather than os.Getenv directly because there's a unit test that passes in various hand-written maps into the function to test how it treats different combinations of environment variables. It could potentially achieve that by taking a func (string) string instead, and then having the test pass in a closure that reads from a map and having the real code just pass os.Environ directly.

  3. The third instance is, for our purposes here, essentially equivalent to the first case but with some different details. Again, it's only interested in explicitly-set environment variables with a particular name prefix, and so it just ignores strings that start with =.

Based on this analysis, my codebase in particular could use os.Splitenv to make case 2 more correct on Windows, but that localized improvement would not actually make any difference to this function's one caller, so I'd sooner just remove that function entirely and adopt a different approach for testing.

I don't mean to imply that this wouldn't be useful to other programs, but I did think it was interesting that this program in particular is using os.Environ perhaps more than the average program does and yet still, despite not actually attempting to handle this Windows-specific exception, behaves correctly nonetheless.

For my purposes, documentation would be sufficient although I'd prefer that documentation to describe the specific situations where these odd environment entries are expected rather than just make a general allusion to "you might see this sort of thing", since it seems like it's not actually important to handle that exception in practice unless you are particularly interested in interacting with this implementation detail of cmd.exe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

7 participants