proposal: spec: change int to be arbitrary precision #19623
Comments
I'm a big fan of this, myself. It would elevate int […]. In many cases (loop iteration variables) the compiler may be able to prove that an int fits in a single machine word […].
Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...
A minor related point about this: the […]
Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't.
Actually, I think they're completely compatible. I recommend both!
Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though - might bloat the maps a bit. FWIW, the ML family of languages worked around that issue by making the native int types 31 or 63 bits wide.
@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.
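(A minimal sketch of what a two-bit-per-word map could mean for the collector, assuming a low-bit "boxed" convention; the names wordKind and shouldScan are illustrative only, not the runtime's actual data structures:)

```go
package gcmap

// wordKind is the per-word classification a two-bit stack/heap map could hold.
type wordKind byte

const (
	kindScalar    wordKind = iota // never a pointer
	kindPointer                   // always a pointer
	kindTaggedInt                 // a pointer only if its tag bit says so
)

// shouldScan reports whether the GC must treat the word in this slot as a pointer.
func shouldScan(kind wordKind, word uintptr) bool {
	switch kind {
	case kindPointer:
		return true
	case kindTaggedInt:
		// Assumed convention: low bit set means the int is boxed on the heap.
		return word&1 == 1
	default:
		return false
	}
}
```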
I love this proposal in the abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". […] Bounds-checking elimination has a similar problem: the Go compiler isn't very good at it even nowadays; it basically just handles obvious cases, and doesn't even have a proper VRP pass (the one proposed was abandoned because of compile-time concerns). Stuff like a simple multiplication would become a call into the runtime in the general case, and I would be surprised if the Go compiler could avoid that in most cases, excluding obvious ones like clearly bounded for loops.
@rasky Languages like Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers - they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags" - making an int […].

One way of using the tag bits is as follows: […]

Given this encoding, if we have two […]

It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.
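(To make the described fast path concrete, here is a minimal sketch in Go, simplified to a single tag bit rather than the two bits mentioned above; smallint, addSlow, and the tag convention are assumptions made for illustration, not part of any real runtime:)

```go
package taggedint

// smallint is a tagged machine word reinterpreted as signed for arithmetic.
// Assumed convention: low bit 0 = small integer stored shifted left by one;
// low bit 1 = pointer to heap-allocated big-integer storage.
type smallint int64

func isSmall(x smallint) bool { return x&1 == 0 }

// addSlow stands in for the runtime's slow path (unbox, big-integer add, re-box).
// It is deliberately a stub in this sketch.
func addSlow(a, b smallint) smallint {
	panic("slow path not implemented in this sketch")
}

// add is the fast path a compiler could emit for a + b on tagged ints:
// one tag test, one machine add, one overflow check, and a branch.
func add(a, b smallint) smallint {
	if isSmall(a) && isSmall(b) {
		sum := a + b // both tags are 0, so the result's tag is also 0
		// Signed-overflow check: if the operands agree in sign but the sum
		// does not, the payload overflowed and we must take the slow path.
		if (a >= 0) == (b >= 0) && (sum >= 0) != (a >= 0) {
			return addSlow(a, b)
		}
		return sum
	}
	return addSlow(a, b)
}
```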
Not a fan of this proposal - currently it's quite simple to reason about the resulting code and its performance characteristics when doing simple arithmetic. Also - even if losing two bits on a 64-bit platform is not important, on a 32-bit one it is. Maybe we could have arbitrary precision ints implemented as a new built-in type (like we do with complex currently)?
Can you discuss how such values would be handled by the encoding packages? […] Encoding to JSON should be easy and map really well. As far as I know, the JSON spec does not place restrictions on the size of numbers, so a really large value such as

{"number": 12312323123131231312312312312312321312313123123123123123123}

would map to an int without loss. What about something like […]?
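(As a data point on the JSON question: today's encoding/json already round-trips integers larger than 64 bits when paired with math/big, which is roughly what an arbitrary precision int would have to do under the hood. The struct and field names below are illustrative only:)

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

type payload struct {
	Number *big.Int `json:"number"`
}

func main() {
	raw := []byte(`{"number": 12312323123131231312312312312312321312313123123123123123123}`)

	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		panic(err)
	}
	fmt.Println(p.Number) // full value, no precision lost

	out, _ := json.Marshal(p)
	fmt.Println(string(out)) // re-encoded as a bare JSON number
}
```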
Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.
@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.
Go should have an arbitrary precision number type that is more convenient than math/big. That type should not attempt to masquerade as int/uint, as those aren't just used semantically as "number" but more so as "number compatible with C/foolang code that uses the natural local word size". The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and kludges. Solve that problem with golang 2.0, or instead we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language meant to replace C/C++/Java. I absolutely think we should have a great and convenient arbitrary precision number type, and I think the road to that is a golang where library-defined types are not second class to ad hoc, boundary-flouting extensions of the runtime- and compiler-provided types.
It's true that one of the roles of int is […]. Perhaps more importantly, C APIs don't actually use […]
It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arbitrary precision offloads many complexities onto anyone who wants to ship data to other systems.

What does a C-ABI-compatible library consumer of golang look like in a world where int/uint is arbitrary precision? Is it really better if the golang side is blind to any issue at the type level and the C side has no choice but to panic?

I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower being golang's default numeric type, I just don't want it to share a name with the 32-bit/64-bit machine/ISA-determined types.
Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.
The same thing it looks like today: using […]
@robpike […]
I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for C/C++/Java for systems programming.
I honestly thought it was 11 days later than it really is. I don't want to lose normal int […]. I have nothing against adding an arbitrary precision type […].
What about floats? JSON can have values like […]. Personally, I would keep […]
Please do not make int operations slower by default. I do not need something like […]
I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32/int64 where performance matters?
I am 100% for this proposal. Worrying about the performance impact of arbitrary precision ints is like worrying about the performance impact of array bounds checks, in my opinion. It's negligible if implemented correctly. I also like this because of how inconvenient the big package is and how I always fall back to Python when I'm dealing with big integers.
I think the complex - yet common - case is going to be the function that has 'int' parameters or an 'int' result and calls out to other functions. The chain of complexity in determining where a bounded int could be used instead of an unbounded one will cause any optimizations to be futile except in the most trivial cases (e.g. an 'int' used as a loop counter). This is not typical 'value range propagation', because by definition an 'int' will be unbounded - it has infinite range - so many of the techniques in this space cannot be applied; furthermore, VRP is designed mainly for static analysis, which does not help in the general case cited above. A decent summary: […]

This is not going to help in the common case cited above.
If that were actually what was happening, then most computations performed by apps in C/C++ would already need arbitrary precision integers. The fact/evidence is: most apps work fine with just 32-bit or 64-bit integers, and the 32/64-bit limits are rarely exceeded. The chain of complexity is much smaller than what is being suggested above; optimizations aren't futile and are applicable to complex cases.
They work fine because that is the only option - and the compiler knows that. So the worst case scenario is a default platform int size of 32/64 bits. With arbitrary sized ints there is no maximum - so the int receiver/return value must support an arbitrarily sized int for a generalized function. As Ian points out, you could compile two different functions - but this quickly breaks down when the function calls another: you would have two versions of every function in the chain AND need to ensure that at every step the bounds were not exceeded, which becomes extremely difficult if not impossible.

This can be "efficiently" implemented using a flag bit, but as I pointed out, you will need a mask and branch at a minimum (with unbounded, or unable-to-determine, values also requiring slow-path function calls). This will be the default except for the trivial locally bounded case (loop index). Consider an arbitrary function that multiplies two 'int's. It cannot possibly know that every call will always pass two ints whose product fits in an int64 (e.g. the input comes from a file). Even compiling two function versions won't work, because the calling code would need additional checks to determine that the called function might overflow and so call the other.
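(For concreteness, a rough sketch of the kind of guard every general-case multiplication would need before trusting the machine result; mulSlow is a hypothetical fallback into big-integer arithmetic, not an existing function:)

```go
package checked

import "math"

// mulSlow stands in for a call into big-integer arithmetic; stubbed out here.
func mulSlow(a, b int64) int64 {
	panic("slow path not implemented in this sketch")
}

// mul multiplies two words and falls back to the slow path on overflow.
func mul(a, b int64) int64 {
	prod := a * b
	// Overflow detection by inverse division, with the one wrap-around corner
	// case (a == -1, b == math.MinInt64) handled explicitly.
	if a != 0 && (prod/a != b || (a == -1 && b == math.MinInt64)) {
		return mulSlow(a, b)
	}
	return prod
}
```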
As another data point, you can look at BigInteger in Java. It uses a highly efficient storage (typically a native int), and even with Java's dynamic inlining and intrinsics it is still orders of magnitude slower than using a native int (the object overhead can be factored out by comparing int/Integer). You can compare ints/bigints in Go similarly, because most cases outside of loop indexes are going to degenerate and force the use of the bigint path. I think that is a bad tradeoff compared to having developers reach for a bigint when they actually need "big ints". Most likely what everyone will end up doing is using int64 for everything - so all you will have done is changed the names and added verbosity...
This has been mentioned. In particular, the thesis is that this is not "the most trivial case", but the only important one. Again, I would like to urge you to stop speculating about the performance of this until we have actual data to talk about. If the performance impact turns out to be prohibitive, we won't do it. That argument should thus be assumed to be under consideration.
Do you have any evidence to support that? Because I believe we have pretty strong evidence that it won't be the case, by analogy to slice bounds-checking.

Bounds-checks can be eliminated using various techniques. You can a) do things like _ = a[42] at the top of your function, to consolidate them into a single check, or b) use unsafe.Pointer and arithmetic to avoid them altogether, or c) use -B when compiling, to have the compiler not emit them. Yet, in actual practice, we don't commonly see any of these techniques applied. The first two are used occasionally, in performance-sensitive code, but even there, they tend to be carefully applied so as not to leak out of the API boundary of the respective package. You just don't see code which accepts a struct{ p unsafe.Pointer; n int } instead of a slice.

This analogy is apt, because bounds-check elimination and this proposal would use very similar optimization techniques and have very similar overhead in practice. It stands to reason that people will react similarly to their cost.

Again, for the actual performance impact, we'd have to look at an actual implementation first. But I believe the evidence is thin that Go programmers tend to work around language safeguards en masse to avoid their performance impact.
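(A tiny illustration of technique (a) from the comment above; whether the compiler actually drops the per-element checks depends on the compiler version, so treat this as a sketch of the idiom rather than a guarantee:)

```go
// sum16 sums the first 16 elements of a.
func sum16(a []int) int {
	_ = a[15] // hint: panic early if len(a) < 16, consolidating the bounds checks
	total := 0
	for i := 0; i < 16; i++ {
		total += a[i]
	}
	return total
}
```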
That could very well be true. My Go exposure is in the systems and financial spaces and I think (right or wrong) the tendency is to optimize early (prematurely?).
I agree that the problem is similar to bounds checks - but the compiler is already pretty good at eliminating those, because they are local. The complexity of this change is going to occur at the call site - and my hunch (just a hunch) is that this is going to prove very difficult to statically optimize, leading to functions that always need type checks and conversions.
What would the memory layout of […] be?

I also second @ianlancetaylor's comment that this proposal should go together with the idea that overflows result in panics: #19623 (comment)

Coming from a physics background, fixed-size ints, bitwise syntax, and endianness have been the three most ridiculous bits of nonsense in programming. Imagine how much more readable this line would be with this proposal: https://cs.opensource.google/go/go/+/refs/tags/go1.18.1:src/image/draw/draw.go;l=290

I also suggest that image processing would be a great place to stress-test the performance impact of this proposal.
What do you mean by "computers are slow only when there are loops"? CPUs in the near future will be able to process multiple iterations of a small loop in parallel in a single clock cycle, which is a natural consequence of the CPU being able to predict and execute multiple branch instructions per clock cycle and to execute multiple load-store instructions per cycle.
Arguably. I think there is a difference, in that a […]
I find bit-shift syntax significantly easier to understand, if that's what you want. That is, I find […]
@atomsymbol Without a loop, a computer will terminate fast. It takes a loop to slow it down. Asserting that a slow program must have a loop, as I have said many times, is not the same as saying a loop cannot be fast.
(1) Does an event-loop count? Each iteration of an event-loop might be computing something very different from the other iterations.

(2) @robpike The slowest computations on any machine are searches through vast spaces. Some purely recursive search algorithms do not contain any loops, not counting the CPU's own internal fetch-instruction/load-inputs/execute-instruction/save-results loop. Thus, the claim that "a slow program must [by necessity] have a loop" is false.
@atomsymbol ISTM that discussion is off-topic here. The statement "computers are only slow in loops" is certainly pithy, but I think we all understand what it means and the limits of its accuracy. It's not helpful to nitpick it here.
@Merovius I believe in the existence of what I read with my own eyes. I don't believe in the existence of a hypothetical object indirectly mentioned in a text. In other words, if you truly knew "what it means" and understood "the limits of its accuracy", then you would have posted a formal, math-like description of the meaning and of the limitations, or would have posted a link to such a description. Sorry for being so direct/harsh. I will try to avoid posting another message on the topic of the relationship between program slowness and loops, because this discussion is about arbitrary precision integers.
If this proposal is implemented, would it be done in a way that existing packages would retain the existing behavior? If yes, is it known what that approach would be? I assume that it wouldn't be possible to use the go.mod go directive approach (described in this talk by @rsc: https://youtu.be/v24wrd3RwGo), because int values have to be passable between packages? Or would this be a change that could be applied to all programs without a compatibility feature gating access to it, because for all programs the int and uint types already have a flexible bit size depending on the machine?
Given that […]
Re: […]
Is this proposal still being considered? How does it play with the decision that there will be no Go 2?
In order to be backwards compatible, it would require adding a new type ([…])
How does this statement impact this proposal?
I'm wondering the same in light of Go 1.21's new forward compatibility features. A few people have suggested a distinct type […].

For cases where arbitrary precision math is important, what @griesemer outlined with 2 reserved bits should be faster than the current math/big implementation […]. If that is possible to build as a library, we would be one step closer to making this proposal a reality. The further step of providing a built-in type in the language would make arbitrary precision math more ergonomic for complex calculations. As a thought experiment, let's imagine we do that.
Then, would it be possible, down the road, to modify the type inference to make […]? If that change was restricted to a module based on its go.mod go directive […]. If this is feasible, the transition could be done if/when we are satisfied with the performance of the […]. If it turned out to work really well, then perhaps we could even deprecate […]. I'm sure there are things I haven't thought of, so I look forward to your thoughts.
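(To illustrate the ergonomics gap this thought experiment is about, here is the same computation written with math/big today and with plain int syntax; "factorial" is just an illustrative example, and the int version silently overflows for n > 20, which is the behavior an arbitrary precision int would remove:)

```go
package main

import (
	"fmt"
	"math/big"
)

// factorialBig is what arbitrary precision arithmetic looks like today.
func factorialBig(n int64) *big.Int {
	result := big.NewInt(1)
	for i := int64(2); i <= n; i++ {
		result.Mul(result, big.NewInt(i))
	}
	return result
}

// factorialInt shows the syntax an arbitrary precision int (or a built-in
// bigint type) would allow; with today's int it silently overflows for n > 20.
func factorialInt(n int) int {
	result := 1
	for i := 2; i <= n; i++ {
		result *= i
	}
	return result
}

func main() {
	fmt.Println(factorialBig(30)) // 265252859812191058636308480000000
	fmt.Println(factorialInt(20)) // still fits in 64 bits
}
```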
It means it is extremely unlikely to happen. The value of the proposal only really becomes realized if we'd change how int itself behaves […].
I don't think it is practical to build that as a library. It requires interaction with the garbage collector, which has to know which memory contains pointers and which doesn't. So, at the very least, the compiler has to "magically" recognize when you use that type and insert instrumentation for the GC. Given that, you could then also do […]
I don't think so. For example:

```go
func main() {
	for v := (1<<63)-1; v > 0; v++ {
	}
}
```

Currently (when int is 64 bits) this loop terminates after a single iteration: v overflows and wraps to a negative value, so v > 0 fails. With an arbitrary precision int it would never terminate […].
I agree with @robpike that a lot (probably most) of the value comes from redefining what int means.
Thanks @Merovius. I think there may still be some value in arbitrary precision integers being first class as a separate type (e.g. […]). If this proposal isn't feasible, here's hoping for checked integers: #30613
An idea that has been kicking around for years, but never written down:
The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

- […] the int type […]
- int values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

- int (and uint, but I'll stop mentioning it now) become very powerful types
- len etc. can now capture any size without overflow
- […] int without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.