Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/env: populate a struct from environment variables #64891

Open
imjasonh opened this issue Dec 28, 2023 · 19 comments
Open

proposal: encoding/env: populate a struct from environment variables #64891

imjasonh opened this issue Dec 28, 2023 · 19 comments
Labels
Milestone

Comments

@imjasonh
Copy link

imjasonh commented Dec 28, 2023

Proposal Details

Today, programs can query environment variables using methods in os: os.Getenv, os.Environ, os.LookupEnv, etc.

foo := os.Getenv("FOO")
bar, found := os.LookupEnv("BAR")
all := os.Environ()

This works and is very simple, but can get more complicated if you need to further validate or especially convert the string values to other types.

e := os.Getenv("NUM_FOOS")
if e == "" {
    // NUM_FOOS must be set!
}
i, err := strconv.Atoi(e)
if err != nil {
    // NUM_FOOS must be parseable as an int!
}

If a program relies on many environment variables to configure its behavior, a common smells can creep in: authors call os.Getenv from deep within their code, which can make it hard to test (T.Setenv helps)

The alternative, slightly beter, is to populate a struct at the top of their program from env vars, and pass around this struct after validation/conversion is done.

https://github.com/kelseyhightower/envconfig is a very popular, very simple module to make this second approach simpler, by populating a struct from env vars, with type conversion and basic validation built in and configurable using struct tags.

Before:

e := os.Getenv("NUM_FOOS")
if e == "" {
    // NUM_FOOS must be set!
}
i, err := strconv.Atoi(e)
if err != nil {
    // NUM_FOOS must be parseable as an int!
}
... for each env var to process

After:

var env struct {
    Debug       bool
    Port        int    `required:"true"`
    User        string `default:"foobar"`
    Users       []string
    Rate        float32
    Timeout     time.Duration
    ColorCodes  map[string]int
}
if err := envconfig.Process("", &env); err != nil {
    // Something went wrong!
}

The second arg lets callers pass a prefix to env vars, so using envconfig.Process("MY", &env) would populat TimeoutfromMY_TIMEOUT`.

envconfig currently has 12k+ dependents on GitHub: https://github.com/kelseyhightower/envconfig/network/dependents

It has no dependencies outside of stdlib: https://github.com/kelseyhightower/envconfig/blob/10e87fe9eaec671f89425dc366f004a9336bcc8f/go.mod

I propose that some functionality like this be included in the stdlib directly.


There may be functionality included in envconfig that Go authors don't consider worth including in stdlib, or would implement or expose differently, and IMO that's totally fine.

I'm mainly opening this to get feedback and gauge interest in such a thing being included by default.

Personally I'd propose dropping the MY_ prefixing feature, and rename the package to os/env (or just env? Or include it as os.ProcessEnv?). If the required and default struct tags make it, they should be namespaced like env:"required" etc.

If multiple values failed validation, all the errors should be returned with errors.Join.

There's also support for custom decoders -- perhaps those should leverage TextUnmarshaler?

@gopherbot gopherbot added this to the Proposal milestone Dec 28, 2023
@kelseyhightower
Copy link
Contributor

Even the basic functionality of populating a struct from env vars would be a welcome addition to the standard library.

@imjasonh
Copy link
Author

Someone also pointed out https://github.com/sethvargo/go-envconfig which is very similar, and has 980 dependents according to GitHub: https://github.com/sethvargo/go-envconfig/network/dependents

It supports TextUnmarshaler and BinaryUnmarshaler, and nested fields with configurable prefixes.

I don't think all of these features are strictly necessary for a stdlib equivalent -- like @kelseyhightower I'd be perfectly happy with whatever basic featureset the Go team finds reasonable.

@seankhliao
Copy link
Member

seankhliao commented Dec 29, 2023

maybe it belongs in encoding/env along with all the other encoding/* packages that do struct reflection.

Which would point to an API like the following, allowing the actual environ source to be swapped out (I still think testing.T.Setenv encourages the wrong thing):

func Unmarshal(envs []string, dst any) error

func Marshal(dst any) ([]string, error)

usage would be like:

env.Unmarshal(os.Environ(), &mystruct)

@sethvargo
Copy link
Contributor

👋 Like Kelsey said, I've found "populate the value from the environment variable $FOO into this struct field" solves 80% of use cases. The other 20% are probably better left to an external library.

One of the things I'd strongly advocate for is having the ability to mock/stub out the environment by passing in a map or custom interface. Since the environment is inherently shared, it's not safe for concurrent use. Being able to pass in a map/interface has been extremely useful for envconfig.

@myaaaaaaaaa
Copy link

maybe it belongs in encoding/env along with all the other encoding/* packages that do struct reflection.

...

usage would be like:

env.Unmarshal(os.Environ(), &mystruct)

I would also advocate for having env.MustUnmarshal() alongside env.Unmarshal(), to make it easier to initialize the environment into a global variable:

var envStruct = env.MustUnmarshal[envStructType](os.Environ())

@imjasonh
Copy link
Author

var envStruct = env.MustUnmarshal[envStructType](os.Environ())

I think I'd prefer a generic Must #54297

@myaaaaaaaaa
Copy link

I think I'd prefer a generic Must #54297

That would work too, although in that case, env.Unmarshal would need to be tweaked to support global initialization:

package env
func Unmarshal[T any](envs []string) (T, error)

package main
var envStruct = must(env.Unmarshal[envStructType](os.Environ()))

@seankhliao
Copy link
Member

I don't think Marshal should take a type parameter and allocate for you, it's better to stay in line with current precedence, and also makes merging of config from various sources easier

@fzipp
Copy link
Contributor

fzipp commented Dec 31, 2023

I would also advocate for having env.MustUnmarshal() alongside env.Unmarshal()

Must* functions that panic are for cases where a careful programmer can provide the correct input at compile time (regular expressions, templates). That's not the case for environment variables. These functions are not a general error handling avoidance mechanism.

@tebeka
Copy link
Contributor

tebeka commented Dec 31, 2023

The serialization format must be well defined first.
Personally, I don't like the encoding for lists and maps in https://github.com/kelseyhightower/envconfig (doesn't support list values with , for example) - I prefer it'll be a known format such as JSON.

@seankhliao
Copy link
Member

we can follow go's convention for lists seen in GOFLAGS (#26849): space separated, allowing one level of quoting

we can probably go with no map support, or treat maps like flags in GOFLAGS

@fzipp
Copy link
Contributor

fzipp commented Dec 31, 2023

we can follow go's convention for lists seen in GOFLAGS (#26849): space separated, allowing one level of quoting

GOPROXY, GOMODCACHE, GONOPROXY, GONOSUMDB, GOPRIVATE, GOVCS are comma-separated, though (Ctrl-F "comma-separated" in https://go.dev/ref/mod)

@seankhliao
Copy link
Member

yes, but they don't need to support values which may include commas

@mvdan
Copy link
Member

mvdan commented Dec 31, 2023

My 2c: at least there should be general consensus on the scope and rough design of the proposal for it to be considered. Otherwise, the amount of upvotes simply signals "this would be nice to have in the standard library" without a clear idea of what it is we're even considering.

For example, it's entirely unclear to me whether nested structs would work, how lists or maps would work, what struct tag options would be supported, whether marshalling is also being proposed, or what would be the logic to match env var names to field names.

What seems to have worked well with recent new API proposals like log/slog or enhanced http routing was to implement the proposed design outside the standard library first. This forces the design to be clearly defined and documented, and also allows others to try it out before it's frozen. Presumably what is being proposed here is neither envconfig nor go-envconfig.

@chrispickard
Copy link

This proposal includes the “what” and the “how”, but doesn’t include the “why”. This functionality already exists in an external library that is stable and supported, so why include it in stdlib? As of right now I’m a -1 on this proposal

@imjasonh
Copy link
Author

imjasonh commented Dec 31, 2023

My 2c: at least there should be general consensus on the scope and rough design of the proposal for it to be considered. Otherwise, the amount of upvotes simply signals "this would be nice to have in the standard library" without a clear idea of what it is we're even considering.

That's completely fair. My bad.

As far as a concrete proposal, how's this:

  1. encoding/env.Unmarshal(env []string, s interface{}) error
  2. struct tags env:"required", env:"default:123", none others
  3. lists and maps are not supported (folks might propose this as a later addition?)
  4. nested structs are not supported
  5. configurable var prefixes (MY_) not supported
  6. marshaling not supported (folks might propose this later?)

I can take a crack at implementing this in a fork or new repo, to demonstrate the contours better.

This proposal includes the “what” and the “how”, but doesn’t include the “why”. This functionality already exists in an external library that is stable and supported, so why include it in stdlib? As of right now I’m a -1 on this proposal

That's also totally fair.

I think there's some value in having this be blessed as "the way" folks should do this. Having stable supported external packages is definitely nice, but there are at least two of them, and they have diverging semantics and featuresets. Having an even more stable one in stdlib (even with fewer features; especially with fewer features!) would help move folks toward this as the right way for 80% of cases.

The alternative ("do nothing") is that folks not wanting to take a dep on an external package would have to make do with os.Getenv and manual validation/conversion, which is error prone and tedious. Some folks might respond to that tedium by building yet another package to do this.

This borrows some of the motivation from the log/slog proposal:

Go has many popular structured logging packages, all good at what they do. We do not expect developers to rewrite their existing third-party structured logging code to use this new package. We expect existing logging packages to coexist with this one for the foreseeable future.

We have tried to provide an API that is pleasant enough that users will prefer it to existing packages in new code, if only to avoid a dependency. [...] We also expect newcomers to Go to encounter this package before learning third-party packages, so they will likely be most familiar with it.

As there, there are already env-to-struct packages in the wild, but my hope is that having a very solid one in stdlib would make relatively common environment processing tasks easier for everyone.

@seankhliao seankhliao changed the title proposal: os/env: populate a struct from environment variables proposal: encoding/env: populate a struct from environment variables Dec 31, 2023
@earthboundkid
Copy link
Contributor

Years ago, @peterbourgon made ff, which, among other things, parses env vars using the flag package. I made my own derivative of that called flagx.ParseEnv() which does the same thing more or less. I think reading env vars is basically the same operation as reading command line flags, except there are prefixes to drop and they need to be converted from snake case to kebob case. I'm not sure this makes sense as a new package and not just a new mode for the existing flag package. Or it can be a third party package, as noted above.

Among problems with just writing it to a struct: how do you deal with field types other than string? Either ignore them (less useful) or have a conversion functionality. But if you have a conversion functionality, you end up duplicating the flag package. So it should probably just be a new mode on the existing flag package.

@shaj13
Copy link

shaj13 commented Mar 28, 2024

I ended up forking the flag package (https://github.com/shaj13/env) to create a mechanism that enables users to parse and declare environment variables similarly.

As a developer, I sometimes forget the environment variable names and have to refer back to the code or documentation to find them.

It's more beneficial to have the env pkg structured similarly to the flagpkg. This allows users to parse environment variables into variables or get pointers to them, as well as to struct fields. This approach is also beneficial for CLI purposes, as it enables printing the default values and usage when using the -h flag. Many CLI tools today allow overriding flags using environment variables.

package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"time"

	"env"
)

var _ = species

// Example 1: A single string env called "species" with default value "gopher".
var species = env.String("species", "gopher", "the species we are studying")

// Example 2: A single string var env called "gopher_type".
// Must be set up with an init function.
var gopherType string

func init() {
	env.StringVar(&gopherType, "gopher_type", "pocket", "the variety of gopher")
}

// Example 3: A user-defined env type, a slice of durations.
type interval []time.Duration

// String is the method to format the env's value, part of the env.Value interface.
// The String method's output will be used in diagnostics.
func (i *interval) String() string {
	return fmt.Sprint(*i)
}

// Set is the method to set the env value, part of the env.Value interface.
// Set's argument is a string to be parsed to set the env.
// It's a comma-separated list, so we split it.
func (i *interval) Set(value string) error {
	if len(*i) > 0 {
		return errors.New("interval env already set")
	}
	for _, dt := range strings.Split(value, ",") {
		duration, err := time.ParseDuration(dt)
		if err != nil {
			return err
		}
		*i = append(*i, duration)
	}
	return nil
}

// Define a env to accumulate durations. Because it has a special type,
// we need to use the Var function and therefore create the env during
// init.

var intervalEnv interval

func init() {
	// Tie the environ to the intervalEnv variable and
	// set a usage message.
	env.Var(&intervalEnv, "delta_t", "comma-separated list of intervals to use between events")
}

type Config struct {
	Host string
	Port string
	// ....
}

func init() {
	cfg := new(Config)
	// Tie the environ to the struct fields and
	// set a usage messages.
	env.StringVar(&cfg.Host, "host", "localhost", "App host name")
	env.StringVar(&cfg.Port, "port", "443", "App port")
}

func main() {
	os.Setenv("DELTA_T", "1s,2m,3h")

	// All the interesting pieces are with the variables declared above, but
	// to enable the env package to see the env defined there, one must
	// execute, typically at the start of main (not init!):
	env.Parse()

	fmt.Println("Interval: ", intervalEnv)   // print user defined env value
	fmt.Println("Gopher Type: ", gopherType) // print default env value
	fmt.Println("Species: ", *species)       // print default env value
	
	env.Usage() // print the usage
}

@fxrlv
Copy link
Contributor

fxrlv commented Mar 28, 2024

In addition to @peterbourgon's ff which was mentioned here, there is https://github.com/gobwas/flagutil (inspired by ff).

The key idea is the same — let's treat flag.FlagSet as a container to define any variable (not only command-line flags).
Then we are able to populate flag.FlagSet any way we like (command-line, json, env, yaml).

This way variables are unaware of the method they have been set.

simplified example

type Config struct {
	Endpoint string
	Timeout  time.Duration
}

func Define(fset *flag.FlagSet) *Config {
	c := new(Config)

	fset.StringVar(&c.Endpoint, "endpoint", "localhost:80", "")
	fset.DurationVar(&c.Timeout, "timeout", 5*time.Second, "")

	return c
}

var (
	config = Define(flag.CommandLine)
)

func main() {
	flag.Parse()
	// flag.ParseEnv(os.LookupEnv)
	// flag.ParseJSON("config.json")
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests