Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: add func UserCacheDir() string #22536

Closed
bradfitz opened this issue Nov 1, 2017 · 18 comments
Closed

os: add func UserCacheDir() string #22536

bradfitz opened this issue Nov 1, 2017 · 18 comments

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Nov 1, 2017

I've seen a dozen or so implementations of looking up the user's cache directory.

The latest is from @rsc in:

https://go-review.googlesource.com/c/go/+/68116/3/src/cmd/go/internal/cache/default.go

Which says:

// TODO(rsc): This code belongs somewhere else,
// like maybe ioutil.CacheDir or os.CacheDir.

Some examples of other implementations:
https://godoc.org/go4.org/xdgdir
https://godoc.org/camlistore.org/pkg/osutil#CacheDir
https://github.com/golang/crypto/blob/bd6f299fb381e4c3393d1c4b1f0b94f5e77650c8/acme/autocert/listener.go#L142

Consolidate into std?

@gopherbot gopherbot added this to the Proposal milestone Nov 1, 2017
@rsc
Copy link
Contributor

rsc commented Nov 2, 2017

Yes, I think we should do this. The windows docs I found said to use %localappdata% for caches, which none of those do. And I know of other copies in the dashboard scraper that (incorrectly) use $HOME/.cache on macOS, and so on. This definitely seems worth putting into the standard library.

Reading the actual definitions of ioutil.TempDir and os.TempDir, this looks closer to os.TempDir (return $TMPDIR) than ioutil.TempDir (make and return a fresh dir under $TMPDIR), so it should probably be os.CacheDir.

Probably should leave for Go 1.11 at this point.

@mvdan
Copy link
Member

mvdan commented Nov 2, 2017

What about a config/data variant? It's similarly tricky to get right.

@bradfitz
Copy link
Contributor Author

bradfitz commented Nov 2, 2017

The os/user package might be a good place for this, since it os/user would likely be a dependency anyway, to find the current user and their $HOME.

@rsc
Copy link
Contributor

rsc commented Nov 3, 2017

If we put it in os/user then it's harder for cmd/go to use it (it cannot depend on cgo for bootstrap). I don't believe os/user is necessary for XDG anyway - the spec very clearly says $HOME not "the user's home directory".

@mvdan
Copy link
Member

mvdan commented Nov 3, 2017

Perhaps a silly question, but wouldn't this be a bit limited then? If a Go program is run outside a shell, $HOME would likely not be set.

@rsc
Copy link
Contributor

rsc commented Nov 3, 2017

@mvdan that's not very specific. What execution environment do you have in mind that does not set $HOME?

@mvdan
Copy link
Member

mvdan commented Nov 3, 2017

I was thinking anything that doesn't get run under a shell. I was hoping to demonstrate this with an empty docker container just running a Go program printing $HOME, but it seems like docker has a fallback of HOME=/ in that case.

Systemd also came to mind, but it seems to also take care of the variable on its own. And, on Linux, login programs apparently set it too, so opening a terminal without a shell still has $HOME present.

I thought that relying on this variable was more fragile than it actually is - my bad. Still curious if any reasonable execution environment wouldn't have it set.

@rsc
Copy link
Contributor

rsc commented Nov 6, 2017

It sounds like we should do this, as os.UserCacheDir. Per discussion with @bradfitz, let's run with $HOME (not using os/user) until we know a compelling reason to need the fallback.

@rsc rsc modified the milestones: Proposal, Go1.11 Nov 6, 2017
@rsc rsc changed the title proposal: io/ioutil: add functions to get user cache directory? os: add func UserCacheDir() string Nov 6, 2017
@andybons andybons self-assigned this Nov 17, 2017
@gopherbot
Copy link

Change https://golang.org/cl/78835 mentions this issue: os: add func UserCacheDir() string

@magical
Copy link
Contributor

magical commented Nov 20, 2017

It sounds like we should do this, as os.UserCacheDir. Per discussion with @bradfitz, let's run with $HOME (not using os/user) until we know a compelling reason to need the fallback.

If this goes in as os.UserCacheDir, then isn't it true that it can never fall back on os/user because os cannot depend on os/user?

@bradfitz
Copy link
Contributor Author

If this goes in as os.UserCacheDir, then isn't it true that it can never fall back on os/user because os cannot depend on os/user?

There are shenanigans we can do with internal packages to make it work later if we wanted.

@magical
Copy link
Contributor

magical commented Nov 21, 2017

There are shenanigans we can do with internal packages to make it work later if we wanted.

I assume you mean something like creating an internal/os/user package that both os and os/user depend on.

Suppose you did that. On unix, os/user can get the user data one of two ways

  1. Using a C standard library call. Russ objected to cgo earlier, so i assume this is out.
  2. Reading /etc/password. This requires opening a file, which requires... os. So os still can't depend on internal/os/user because it creates a dependency loop. Now you either need to pull os.File into an internal package, or rewrite os/user using raw syscalls, or something else.

You're also basically pulling most of the os/user functionality into os, and I assume that there was a good reason to keep them separate in the first place.

It's probably not impossible to work around all these issues, but it seems like an awful lot of effort for something very small.

I don't really care either way whether UserCacheDir falls back on os/user or not (and Russ makes a good point that the spec doesn't say it has to). I just don't think you can defer the decision if you put it in os.

@bradfitz
Copy link
Contributor Author

Well, cgo could be done if needed. And regarding 2), we could e.g. have the os package tell the internal package how to open files, passing an implementation of func (name string) (io.ReadCloser, error) in some init block. And then the os/user and/or internal package would use that to read /etc/password.

So while I'm not too worried about the trickery we could play to make it work later if we needed the current user, I'm also not worried where this function lives. I'll defer to Russ who seems content with package os.

@magical
Copy link
Contributor

magical commented Nov 21, 2017

As you will.

@nightlyone
Copy link
Contributor

Why is it implemented now without returning an error?

Chaining is impossible anyway, since the return value must be checked for the in-band error condition of empty string. So might as well return a real error value where error conditions for not being able to reduce a cache directory can be communicated, so the user/developer/admin can fix them.

With the currently implemented API the caller needs to re-implement the same functionality again with error handling when he receives an empty string to be able to tell the user how he can fix the situation.

@mvdan
Copy link
Member

mvdan commented Feb 13, 2018

@nightlyone what about non-empty dir path, but the actual dir does not exist? Would the func also return an error in that case?

@mvdan
Copy link
Member

mvdan commented Jul 19, 2018

@bradfitz @rsc did you see @nightlyone's comment above? If we want to change the signature of this function, we should do that before the final 1.11 release.

@mvdan
Copy link
Member

mvdan commented Jul 19, 2018

Scratch my comment above; I just realised the error return parameter was added later in 50bd1c4.

wking added a commit to wking/openshift-installer that referenced this issue Sep 19, 2018
Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
In-None-Match well, but only supports If-Modified-Since for exact
matches:

  $ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  $ curl -I "${URL}"
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Wed, 19 Sep 2018 04:32:19 GMT
  Content-Type: application/octet-stream
  Content-Length: 684934062
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
  Connection: keep-alive
  ETag: "5ba15a84-28d343ae"
  Accept-Ranges: bytes
  $ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 200 OK
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has [4]:

  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.
  ...
  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.

So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement.  The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable.  The RFC also seems to prefer clients use
If(-None)-Match [4,5].

I'm using gregjones/httpcache for the caching, since that
implemenation seems reasonably popular and the repo's been around for
a few years.  That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.

UserCacheDir requires Go 1.11 [7,8,9].

[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: https://github.com/gregjones/httpcache/blob/9cad4c3443a7200dd6400aef47183728de563a38/httpcache.go#L169-L181
[7]: golang/go#22536
[8]: golang/go@816154b
[9]: golang/go@50bd1c4
wking added a commit to wking/openshift-installer that referenced this issue Sep 19, 2018
Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
In-None-Match well, but only supports If-Modified-Since for exact
matches:

  $ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  $ curl -I "${URL}"
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Wed, 19 Sep 2018 04:32:19 GMT
  Content-Type: application/octet-stream
  Content-Length: 684934062
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
  Connection: keep-alive
  ETag: "5ba15a84-28d343ae"
  Accept-Ranges: bytes
  $ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 200 OK
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has [4]:

  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.
  ...
  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.

So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement.  The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable.  The RFC also seems to prefer clients use
If(-None)-Match [4,5].

I'm using gregjones/httpcache for the caching, since that
implemenation seems reasonably popular and the repo's been around for
a few years.  That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.

UserCacheDir requires Go 1.11 [7,8,9].

The BUILD.bazel changes were generated with:

  $ bazel run //:gazelle

using:

  $ bazel version
  Build label: 0.16.1- (@non-git)
  Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Mon Aug 13 16:42:29 2018 (1534178549)
  Build timestamp: 1534178549
  Build timestamp as int: 1534178549

[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: https://github.com/gregjones/httpcache/blob/9cad4c3443a7200dd6400aef47183728de563a38/httpcache.go#L169-L181
[7]: golang/go#22536
[8]: golang/go@816154b
[9]: golang/go@50bd1c4
wking added a commit to wking/openshift-installer that referenced this issue Sep 19, 2018
Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
In-None-Match well, but only supports If-Modified-Since for exact
matches:

  $ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  $ curl -I "${URL}"
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Wed, 19 Sep 2018 04:32:19 GMT
  Content-Type: application/octet-stream
  Content-Length: 684934062
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
  Connection: keep-alive
  ETag: "5ba15a84-28d343ae"
  Accept-Ranges: bytes
  $ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 200 OK
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has [4]:

  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.
  ...
  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.

So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement.  The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable.  The RFC also seems to prefer clients use
If(-None)-Match [4,5].

I'm using gregjones/httpcache for the caching, since that
implemenation seems reasonably popular and the repo's been around for
a few years.  That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.

UserCacheDir requires Go 1.11 [7,8,9].

The BUILD.bazel changes were generated with:

  $ bazel run //:gazelle

using:

  $ bazel version
  Build label: 0.16.1- (@non-git)
  Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Mon Aug 13 16:42:29 2018 (1534178549)
  Build timestamp: 1534178549
  Build timestamp as int: 1534178549

[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: https://github.com/gregjones/httpcache/blob/9cad4c3443a7200dd6400aef47183728de563a38/httpcache.go#L169-L181
[7]: golang/go#22536
[8]: golang/go@816154b
[9]: golang/go@50bd1c4
wking added a commit to wking/openshift-installer that referenced this issue Sep 22, 2018
Checking our RHCOS source against ETag [1] / If-None-Match [2] and
Last-Modified [3] / If-Modified-Since [4], it seems to support
In-None-Match well, but only supports If-Modified-Since for exact
matches:

  $ URL=http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
  $ curl -I "${URL}"
  HTTP/1.1 200 OK
  Server: nginx/1.8.0
  Date: Wed, 19 Sep 2018 04:32:19 GMT
  Content-Type: application/octet-stream
  Content-Length: 684934062
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT
  Connection: keep-alive
  ETag: "5ba15a84-28d343ae"
  Accept-Ranges: bytes
  $ curl -sIH 'If-None-Match: "5ba15a84-28d343ae"' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 304 Not Modified
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2015 20:05:24 GMT' "${URL}" | head -n1
  HTTP/1.1 200 OK
  $ curl -sIH 'If-Modified-Since: Tue, 18 Sep 2018 20:05:25 GMT' "${URL}" | grep 'HTTP\|Last-Modified'
  HTTP/1.1 200 OK
  Last-Modified: Tue, 18 Sep 2018 20:05:24 GMT

That last entry should have 304ed, although the spec has [4]:

  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.
  ...
  When used for cache updates, a cache will typically use the value of
  the cached message's Last-Modified field to generate the field value
  of If-Modified-Since.  This behavior is most interoperable for cases
  where clocks are poorly synchronized or when the server has chosen
  to only honor exact timestamp matches (due to a problem with
  Last-Modified dates that appear to go "back in time" when the origin
  server's clock is corrected or a representation is restored from an
  archived backup).  However, caches occasionally generate the field
  value based on other data, such as the Date header field of the
  cached message or the local clock time that the message was
  received, particularly when the cached message does not contain a
  Last-Modified field.

So the server is violating the SHOULD by not 304ing dates greater than
Last-Modified, but it's not violating a MUST-level requirement.  The
server requirements around If-None-Match are MUST-level [2], so using
it should be more portable.  The RFC also seems to prefer clients use
If(-None)-Match [4,5].

I'm using gregjones/httpcache for the caching, since that
implemenation seems reasonably popular and the repo's been around for
a few years.  That library uses the belt-and-suspenders approach of
setting both If-None-Match (to the cached ETag) and If-Modified-Since
(to the cached Last-Modified) [6], so we should be fine.

UserCacheDir requires Go 1.11 [7,8,9].

[1]: https://tools.ietf.org/html/rfc7232#section-2.3
[2]: https://tools.ietf.org/html/rfc7232#section-3.2
[3]: https://tools.ietf.org/html/rfc7232#section-2.2
[4]: https://tools.ietf.org/html/rfc7232#section-3.3
[5]: https://tools.ietf.org/html/rfc7232#section-2.4
[6]: https://github.com/gregjones/httpcache/blob/9cad4c3443a7200dd6400aef47183728de563a38/httpcache.go#L169-L181
[7]: golang/go#22536
[8]: golang/go@816154b
[9]: golang/go@50bd1c4
@golang golang locked and limited conversation to collaborators Jul 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants