Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: bytes: add bytes.Size units #19375

Closed
posener opened this issue Mar 3, 2017 · 13 comments
Closed

proposal: bytes: add bytes.Size units #19375

posener opened this issue Mar 3, 2017 · 13 comments

Comments

@posener
Copy link

posener commented Mar 3, 2017

I think that it would be nice to have unit that could be easily translated to bytes.

  1. I suggest to add a bytes.Size of type int64 similar to time.Duration type.
  2. Similarly to the time package, exposing time.Second for example, it would be nice to have bytes.Megabyte for example and other units conversion.
  3. Similarly to the time package, translate from string 10mb to the right amount of bytes.Size, such as 1m will be translated to the right amount of time.Duration.

Why?

  • Translation for config/ user interaction: I see in a lot of places, that people have configs such as CACHE_SIZE_MB, expecting an int. If the proposed units would have exists, this config could be CACHE_SIZE expecting a string, which will be converted easily to bytes.
  • A lot of places have those units configured for themselves, and have to have the conversion logic, it would be nice to have it inherent in the language. It could also reduce bugs in go code, resulting in errored conversion (dividing by 1000 instead of shifting by 10), and using inappropriate types (such as int instead of int64).
  • In future go1 incompatible versions, it would be nice to convert interfaces such as Writer to return a bytes.Size type and not an int, which will be easily converted and used.
  • It will make go programming more fun :-)

I know that it is not like the time package in the sense that you want to be able to change the basic units, but I think that, even though, it will be very convenient to have such units.

Updates

  • Updated proposed bytes.Size type according to @ianlancetaylor suggestion not to use uint because of it's behavior around the zero value.
@mvdan mvdan added the Proposal label Mar 3, 2017
@mvdan mvdan added this to the Proposal milestone Mar 3, 2017
@a8m
Copy link
Contributor

a8m commented Mar 3, 2017

Many of the projects I'm working on contain the same code as here. and a quick search on Github shows that it is a common case.

In general, I'm in favor of adding something like this as bytes.Size.
But it feels more natural if it will be used by other std packages. for example: os.FileInfo.Size, runtime.MemStats.Alloc. so I'm not sure how it fits with Go 1.x.

@ianlancetaylor
Copy link
Contributor

bytes.Size seems ambiguous to me. It sounds a lot like the C/C++ type size_t. But in Go, that type is spelled int. Go is written such that int can represent the size of all types used by the language. That is a good thing, as it's better to work with a signed type than an unsigned one (http://www.airs.com/blog/archives/327).

So when you suggest that the Write method of io.Writer should return an unsigned type, that is going against an explicit decision made by the language. That isn't going to happen. The maximum size of the []byte passed to Write can be represented with int, so it's fine that Write returns int.

Similarly, we're not going to change os.Fileinfo.Size to return an unsigned type. But that case does return int64 rather than int, so that we can represent a full range of file sizes on a 32-bit system.

I think that leaves this proposal with the idea of defining bytes.Megabyte and so forth. In Go those are normally done with shifts, as @a8m mentions. There is a simple, easy to remember, algorithm: shift by 10 to get kilobytes, shift by 20 to get megabtes, shift by 30 to get gigabytes.

So in the end I'm not sure this proposal carries its weight.

@posener
Copy link
Author

posener commented Mar 3, 2017

@a8m it is interesting to see in the Github search link that you gave how many cases the definition or conversion is incorrect, inaccurate, or inefficient: base 10 counting, division instead of shifting, floating point type, and so on.

@ianlancetaylor choosing int, uint or unit64 is truly an interesting discussion.
First, from your post, it is interesting to see that in the go library, two different different were chosen to represent the same 'thing', don't you think?

Second, from your reference, I don't understand the following sentence:

Second, Go intentionally discards half of memory, and takes the philosophy that if you want a container which can hold more values than fit in a signed int, you should write a special purpose large container.

Two questions regarding this argument:

  1. You are talking about a unit to represent memory, there are other cases that you would like to use bytes such as disk sizes, file sizes, network bandwidth and so on. Also, 'having a container' is only one usage, sometimes you might want to represent a size. Does the claim still relevant?
  2. What is the reason of 'intentionally discarding'?

My opinion in this matter, if you just need to choose a type for counting bytes, I don't see any concrete reason not to choose uint64.

@ianlancetaylor
Copy link
Contributor

@posener I don't agree that the Go library uses two different types to represent the same thing. The Go library consistently uses int to represent the size of an object in memory. It consistently uses the int64 to represent the size of a file in the file system. Those are two different things.

The POSIX C library makes the same distinction. The size of an object in memory is size_t, and the size of a file in the file system is off_t.

Second, Go intentionally discards half of memory, and takes the philosophy that if you want a container which can hold more values than fit in a signed int, you should write a special purpose large container.

Two questions regarding this argument:

You are talking about a unit to represent memory, there are other cases that you would like to use bytes such as disk sizes, file sizes, network bandwidth and so on. Also, 'having a container' is only one usage, sometimes you might want to represent a size. Does the claim still relevant?

As discussed above, memory sizes are not the same as disk sizes or file sizes, and there is no reason to use the same type for both.

What is the reason of 'intentionally discarding'?

I'm sorry, I'm not sure precisely what you are asking. Go is choosing to use a signed integer to represent the size of an object in memory, and therefore objects that use up more than half of available memory can not be represented. This choice is made because signed values are easier to work with than unsigned values, for the reason I mentioned in the blog post: they don't have odd behavior around 0.

@ianlancetaylor
Copy link
Contributor

In particular, with regard to the type used for values in memory and file sizes, I will note that these days it's easy to create files that are far too large to fit in memory, especially on 32-bit systems.

@griesemer
Copy link
Contributor

Independent of the merits of this proposal, putting such a type in package bytes seems like the wrong place. The main affinity of the proposal with that package is the name - less so actual functionality. It seems that if we had such a type, it would have to be "below" package bytes.

I do see the usefulness of such a type for flags and human-readable input (flags again) or output. Perhaps it would suffice to have strconv have functionality that understands sizes (convert from/to size units, with the result always being an int representing bytes). That same conversion code would be used by flags (which might have a new flag.Size function).

@posener
Copy link
Author

posener commented Mar 5, 2017

Thanks @ianlancetaylor , updated the proposal from unit64 to int64.
@griesemer , I think that the time.Duration is a beautiful part of the language. I think about the similarity between the two problems, and really wish that the language had the same API for byte sizes.
about the level of the code, wouldn't you define the level according to the code dependency. So it it fits with it dependencies to be in the bytes package, it is good enough?

@griesemer
Copy link
Contributor

@posener I agree about time.Time and time.Duration. I'm somewhat worried about giving a byte size, which is much more frequent than time.Time or time.Duration, and which has often a more fluid interpretation (as size, offset, alignment, delta, etc.), the same treatment. I'm afraid it will make a lot of "straight-forward" code rather clunky.

And even if we did, putting it into the bytes package would pull in that package when it's not needed at all.

@rsc
Copy link
Contributor

rsc commented Mar 6, 2017

time.Duration exists because there are APIs in package time that need to take a time.Duration.
I don't see any APIs in package bytes that would take a bytes.Size.

@rsc
Copy link
Contributor

rsc commented Mar 13, 2017

go doc bytes says "Package bytes implements functions for the manipulation of byte slices." So that's clearly not the right package.

time.Duration solved the problem that there were many reasonable base units for times: maybe float64 seconds, maybe nanoseconds, maybe microseconds, and so on. In contrast, there are not really many choices for byte counts: they're essentially always in the unit "single bytes".

We also can't retrofit this into existing APIs like io.Reader because that would break compatibility. But even if we could, it's not clear that it pulls its weight. io.Reader in particular returns int because you want to allow 'return len(p)' without a type conversion. If it returned io.ByteSize then you'd need a conversion and I guess to avoid being unit-dependent it would have to be something strange like (io.ByteSize is the type, io.Byte is a predefined constant).

return io.ByteSize(len(p)) * io.Byte

(to parallel time.Sleep(time.Duration(n)*time.Second)).

Or I guess you could assume that io.ByteSize's unit is bytes, like size_t in C, and drop the multiply. But then it's still extra conversions.

It seems like a fun API to design but not necessarily an important one to have.

-rsc for @golang/proposal-review

@posener
Copy link
Author

posener commented Mar 14, 2017

Thanks @rsc for the very descriptive decline! I appreciate that.
What about adding the strconv and flag functions proposed by @griesemer above?

I do see the usefulness of such a type for flags and human-readable input (flags again) or output. Perhaps it would suffice to have strconv have functionality that understands sizes (convert from/to size units, with the result always being an int representing bytes). That same conversion code would be used by flags (which might have a new flag.Size function).

@rsc
Copy link
Contributor

rsc commented Mar 14, 2017

Package flag already lets one define custom flag implementations by implementing flag.Value. We'd want to see that this is something lots of people need and are already doing before thinking about promoting a canonical implementation to the standard library.

@posener
Copy link
Author

posener commented Mar 14, 2017

As was referenced in this thread before (a message that was deleted), there is the go-humanize package. A gopher could use this one for now, but how would you know if that is "something lots of people need"? :-)
Thanks

@golang golang locked and limited conversation to collaborators May 19, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants