proposal: Immutable data #37303

Open
embeddedgo opened this issue Feb 19, 2020 · 34 comments
Labels
LanguageChange Proposal v2 A language change or incompatible library change

@embeddedgo

embeddedgo commented Feb 19, 2020

This issue describes a language feature proposal for immutable data.

There are more general proposals for Go 2 that postulate changes in the language type system:

Support read-only and practical immutable values in Go

Read-only and immutability

Immutable type qualifier

This proposal isn't as general as the ones mentioned above and focuses only on the data embedded in the code as in the example below (taken from the unicode package):

var _C = &RangeTable{
        R16: []Range16{
                {0x0000, 0x001f, 1},
                {0x007f, 0x009f, 1},
                {0x00ad, 0x0600, 1363},
                {0x0601, 0x0605, 1},
                {0x061c, 0x06dd, 193},
                {0x070f, 0x08e2, 467},
                {0x180e, 0x200b, 2045},
                {0x200c, 0x200f, 1},
                {0x202a, 0x202e, 1},
                {0x2060, 0x2064, 1},
                {0x2066, 0x206f, 1},
                {0xd800, 0xf8ff, 1},
                {0xfeff, 0xfff9, 250},
                {0xfffa, 0xfffb, 1},
        },
        R32: []Range32{
                {0x110bd, 0x110cd, 16},
                {0x1bca0, 0x1bca3, 1},
                {0x1d173, 0x1d17a, 1},
                {0xe0001, 0xe0020, 31},
                {0xe0021, 0xe007f, 1},
                {0xf0000, 0xffffd, 1},
                {0x100000, 0x10fffd, 1},
        },
        LatinOffset: 2,
}

The problems this proposal tries to solve

  1. If a package exports some data (explicitly or implicitly) that is intended to be immutable, the current language specification and implementation offer no way to enforce immutability or to detect faulty code that changes the exported data.

  2. On microcontroller-based embedded systems, mutable data is copied from Flash to RAM at system startup. Such systems have very little RAM because the immutable parts of the program (text and read-only data) are intended to be executed or read by the CPU directly from Flash. The current implementation offers no way to leave immutable data in Flash, so the available RAM is exhausted very quickly as you import more packages.
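Problem 1 can be illustrated in today's Go with a minimal sketch. It assumes the gc toolchain, which keeps the exported unicode tables in writable memory; the helper names isControl and widenFirstRange are made up for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// isControl reports whether r is in the exported table unicode.Cc.
func isControl(r rune) bool { return unicode.Is(unicode.Cc, r) }

// widenFirstRange overwrites the upper bound of the first 16-bit range of
// unicode.Cc and returns a function that restores the original value.
// Nothing in the language prevents this write to another package's table.
func widenFirstRange(hi uint16) (restore func()) {
	old := unicode.Cc.R16[0].Hi
	unicode.Cc.R16[0].Hi = hi
	return func() { unicode.Cc.R16[0].Hi = old }
}

func main() {
	fmt.Println("space is a control character:", isControl(' '))
	restore := widenFirstRange(0x20) // faulty code silently changes shared data
	fmt.Println("space is a control character:", isControl(' '))
	restore()
}
```

Every other user of unicode.Cc in the same process sees the modified table until it is restored.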

Language changes

This proposal doesn't require changes to the language specification. It can be implemented by adding a new compiler directive, as in the example below:

//go:immutable
var _C = &RangeTable1{
         R32: []Range32{
                {0x0000, 0x001f, 1},
        },
}

Edit: Another proposed syntax requires a change in the language specification:

const var _C = &RangeTable1{
         R32: []Range32{
                {0x0000, 0x001f, 1},
        },
}

Unlike const x = 2, const var y = 2 allows taking the address of y.

Implementation

The go:immutable directive should make the variable and any composite literals used to construct it immutable. The compiler should return an error if the data on the right hand side cannot be generated at the compile time. Immutable data should be placed in .rodata section.

The go:immutable directive can be documented as a hint directive that may or may not be implemented by the compiler, the hardware or the operating system.

An immutability violation is detected at runtime and causes the program to abort. Detection relies on the operating system, which usually maps read-only sections onto read-only pages. On embedded systems the violation can be detected by hardware and raise an exception.
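The same mechanism can already be observed for string literals, whose bytes the gc toolchain normally places in read-only memory. The sketch below is only an illustration under that assumption (plus Go 1.20 or later for unsafe.StringData and an OS that enforces read-only pages); tryWrite is a hypothetical helper, and writing through unsafe like this is never valid in real code.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"unsafe"
)

// tryWrite attempts to overwrite the first byte of s in place and reports
// whether the write faulted. s must be a non-empty string literal.
func tryWrite(s string) (faulted bool) {
	// Turn unexpected memory faults in this goroutine into recoverable
	// panics instead of a fatal crash; restore the old setting on return.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			faulted = true
		}
	}()
	p := unsafe.StringData(s) // pointer to the literal's bytes
	*p = 'X'                  // write into (presumably) read-only memory
	return false
}

func main() {
	fmt.Println("write faulted:", tryWrite("immutable"))
}
```

Without debug.SetPanicOnFault the runtime would treat the fault as fatal and abort the program with a stack trace, which is the behavior the proposal relies on.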

Design decision argumentation

Using a compiler directive instead of a new keyword or an existing keyword combination like const var has the advantage that it doesn't change the language specification. If a more general approach to immutability is developed, the directive can easily be removed from the compiler.

Tests

I've done some tests simulating the go:immutable directive at the linker level by adding the following code to the Link.dodata function:

for _, s := range ctxt.Syms.Allsym {
        if strings.HasPrefix(s.Name, "unicode..stmp_") {
                s.Type = sym.SRODATA
        }
}

It moves to the .rodata section all "static temp" symbols from the unicode package, which correspond mainly to the composite literals used to initialize global variables. The impact on the code generated for a simple Hello, World! program:

package main

import "fmt"

func main() {
	fmt.Println("Hello, World!")
}

is as follows:

before:

   text    data     bss     dec     hex filename
 883610   58172   11128  952910   e8a4e helloworld.elf

after:

   text    data     bss     dec     hex filename
 931847    9700   11128  952675   e8963 helloworld.elf

As you can see, about 48 KB has been moved from the data segment to the text segment, all of it from the unicode package alone. This isn't impressive from the point of view of an OS-capable system, but it's a game changer for microcontroller-based embedded systems, which rarely have more than 256 KB of RAM.
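The figure can be checked directly from the two size listings. A small sketch (the sizes type and delta function are illustrative names, not part of any tool):

```go
package main

import "fmt"

// sizes holds the three section sizes, in bytes, as reported by the size tool.
type sizes struct{ text, data, bss int }

// total is the "dec" column: the sum of all three sections.
func (s sizes) total() int { return s.text + s.data + s.bss }

// delta returns after minus before, field by field.
func delta(before, after sizes) sizes {
	return sizes{
		text: after.text - before.text,
		data: after.data - before.data,
		bss:  after.bss - before.bss,
	}
}

func main() {
	before := sizes{text: 883610, data: 58172, bss: 11128}
	after := sizes{text: 931847, data: 9700, bss: 11128}
	d := delta(before, after)
	fmt.Printf("text %+d, data %+d, bss %+d, total %+d bytes\n",
		d.text, d.data, d.bss, after.total()-before.total())
}
```

The data segment shrinks by 48472 bytes while text grows by 48237 bytes, so the image as a whole is 235 bytes smaller.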

Impact on the existing code

Introducing go:immutable directives for immutable data in the standard library and other packages shouldn't affect correct code in any way. Faulty code may stop working.

Additional explanation

See additional explanation below which is also an example of using const var instead of //go:immutable.

@gopherbot gopherbot added this to the Proposal milestone Feb 19, 2020
@ianlancetaylor
Contributor

I don't see the advantage of using a magic comment over using const as suggested in #6386.

@embeddedgo
Author

If you have:

const A = 3

const B int = 3

const var C int = 3

//go:immutable
var D int = 3

You can't take the address of A or B, but you can in the case of C or D.
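The part of this distinction that exists in today's Go can be sketched as follows. C and the directive on D are omitted because neither const var nor //go:immutable exists; bump is an illustrative helper.

```go
package main

import "fmt"

const A = 3     // untyped constant: has no storage location, not addressable
const B int = 3 // typed constant: still not addressable

var D int = 3 // variable: addressable, and therefore writable through a pointer

// bump increments the int that p points to.
func bump(p *int) { *p++ }

func main() {
	// bump(&A) and bump(&B) do not compile: the address of a constant
	// cannot be taken.
	bump(&D) // compiles and silently changes D
	fmt.Println(A, B, D)
}
```

Under the proposal, D would stay addressable but the write inside bump would fault at runtime.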

I agree that a new compiler directive is something magic but it does its job and has the advantage that it doesn't introduce changes in the language specification.

In my opinion, const and const var suggest that immutability is guaranteed by the language specification and ensured at compile time, which would be desirable but is not required by this proposal.

But of course the const var can also be considered because it can do its job and doesn't introduce any new keyword.

@davecheney
Contributor

davecheney commented Feb 20, 2020 via email

@embeddedgo
Author

This code will compile but the write access will be detected at runtime and the program will abort with stack trace (the OS will send SIGSEGV or SIGBUS).

@davecheney
Contributor

This code will compile but the write access will be detected at runtime and the program will abort with stack trace (the OS will send SIGSEGV or SIGBUS).

How will this happen, because the data is in a page marked read only?

What about this case

const var C = new(int)

func f(p *int) { *p++ }

is this permissible ?

f(C)

@embeddedgo
Author

The immutable data will be placed by the compiler in the .rodata section, which is part of the TEXT segment. Most (all?) current operating systems make the TEXT segment read-only by mapping it with read-only pages. This was described in the first post.

The example code you provided is a corner case of this proposal.

Let's show a more general example:

type S struct {i int}

const var (
        A = &S{}
        B = new(S)
)

The A = &S{} definitely should compile and both A and S{} should be placed in .rodata section.

But what about B = new(S)? Should the compiler treat new(S) as another form of &S{0}, or simply return an error because there is a function call on the right-hand side? I opt for the second choice.

@davecheney
Contributor

davecheney commented Feb 20, 2020 via email

@embeddedgo
Author

Yes it is. The write access will be detected at runtime and the program will abort.

@davecheney
Contributor

davecheney commented Feb 20, 2020 via email

@embeddedgo
Author

Why must it go on the heap? An immutable object can definitely be placed in the read-only data section. The f() will always return the same address.

@davecheney
Contributor

davecheney commented Feb 20, 2020 via email

@embeddedgo
Author

embeddedgo commented Feb 20, 2020

The const var N = &n will not compile because the compiler cannot generate the whole data on the right hand side at compile time.

But your examples show the weakness of the const var syntax. You treat it as something guaranteed by the language specification, but the original proposal was about a hint for the compiler to place data intended to be immutable in a read-only section. I mentioned this in the first post. This is why I prefer //go:immutable over const var.

@embeddedgo
Author

I was wrong in the first part of the previous answer. Your example code is valid. The n is a global variable and the relocation on the right hand side can be determined at compile time. So yes a and b are the same address. This proposal in the current form doesn't forbid it.

@davecheney
Contributor

If a and b have the same address, are their contents equal?

It sounds like you’re proposing C’s static storage class. I don’t think that is what most people think of when they think immutable nor something which makes a lot of sense in a multi threaded program.

@embeddedgo
Author

I don't understand your point in the case of the last example. The compiler can simply optimize out const var N = &n and directly return &n.

const var N = &n means that N is an immutable object, so the compiler can avoid dynamic allocation for it and can place it in read-only memory. It doesn't require n to be constant, but its address must be.

The proposal is about data embedded in the code that is intended to be immutable. Nothing more. I mentioned that I don't like the const var syntax because it can confuse people. It seems I was right.

@embeddedgo
Author

embeddedgo commented Feb 21, 2020

After some time with const var I've got used to it. If you read it as an addressable immutable object it shouldn't confuse anyone. I use the word object instead of variable because a variable means something mutable by definition.

The following code:

const var A = &S{1, &C{2}, &n}

is a condensed form of the following declaration:

const var (
    A = &s
    s = S{1, &c, &n}
    c = C{2}
)

It should be read as:

  1. A is an immutable object of type *S. The compiler must check that the whole data on the right hand side of its declaration can be determined at compile time. The const var is also a hint for the compiler to place A in the read-only section of memory. It's also information for the programmer that he/she shouldn't change the A.

  2. The compiler can also place S{1, &c, &n} and C{2} in read-only memory. The programmer shouldn't change their values.

  3. n must be a global variable or a local immutable object to allow the compiler to determine its address (relocation) at compile time.

  4. The immutability isn't guaranteed by the compiler. It may or may not be checked at runtime.

  5. The other code has no idea that A is immutable and points to immutable object. So its value or pointer to it can be used everywhere where *S or **S can be used.

You can use const var to declare immutable objects in the function body:

var n int

func f() *S {
    const var s = S{1, &n}
    return &s
}

which is equivalent to:

var n int

const var s = S{1, &n}

func f() *S {
    return &s
}

but in the first case s isn't directly accessible outside f body.
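For comparison, here is how the two placements behave in today's Go, where const var does not exist and plain var must be used. This is a sketch with a hypothetical two-field S; g is an illustrative counterpart to f.

```go
package main

import "fmt"

// S is a hypothetical struct with the two fields used in the example above.
type S struct {
	I int
	P *int
}

var n int

// s is an ordinary package-level variable; the proposal would declare it
// with const var instead.
var s = S{1, &n}

// f returns the address of the single package-level s.
func f() *S { return &s }

// g declares a fresh local variable on every call; because its address is
// returned, each call yields a pointer to a distinct variable.
func g() *S {
	s := S{1, &n}
	return &s
}

func main() {
	fmt.Println(f() == f(), g() == g()) // prints "true false"
}
```

A function-local const var, as proposed, would behave like f rather than g: one statically allocated object with a stable address.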

@rsc rsc added the v2 A language change or incompatible library change label Feb 26, 2020
@ianlancetaylor
Contributor

If we are going to permit taking the address of an immutable value, we must prohibit changing that value through to the pointer. Nothing else makes sense. We can't say it "may or may not be checked at runtime." If we can't detect the invalid change at compile time, we must panic at run time. And that means we need some way to implement that efficiently, which seems hard.

If we can't take the address of an immutable value, then this is essentially the same as #6386.

@embeddedgo
Author

embeddedgo commented Mar 12, 2020

If we are going to permit taking the address of an immutable value, we must prohibit changing that value through to the pointer. Nothing else makes sense. We can't say it "may or may not be checked at runtime."

The original proposal postulated using a compiler directive, so it made sense to say that the compiler may ignore it without affecting correct code. I agree that in the case of const var, which is a language change, the spec can be stricter.

If we can't detect the invalid change at compile time, we must panic at run time. And that means we need some way to implement that efficiently, which seems hard.

I think that without introducing something like const vars as function parameters and struct/map/slice elements, general detection at compile time is impossible. I don't know if I would like to see such a big change in the language.

My intention was to introduce a very lightweight change to the compiler behavior that leaves some data in the read-only section of memory. This allows using OS and hardware mechanisms to detect invalid changes efficiently, just as is done for nil pointer dereferences.
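The nil pointer analogy can be seen in a short sketch: an invalid access is not rejected at compile time but surfaces as a runtime panic, which by default aborts the program. The load helper is illustrative only.

```go
package main

import (
	"fmt"
	"runtime"
)

// load dereferences p and converts a runtime fault into an error instead of
// letting it abort the program.
func load(p *int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r) // not a runtime fault: re-raise
			}
			err = re
		}
	}()
	return *p, nil
}

func main() {
	x := 7
	fmt.Println(load(&x))
	fmt.Println(load(nil)) // the invalid access is detected only at runtime
}
```

The proposal would have a write to read-only data reported in the same way, with no extra checks on the valid path.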

@ianlancetaylor
Contributor

The proposed go:immutable directive can only work for package-scope variables. The only difference I see between this and permitting const for package-scope variables, as in #6386, is that the go:immutable directive would permit taking the address of an immutable variable, whereas we probably don't want to permit taking the address of a const variable. The go:immutable directive would then put the variable's value in read-only memory.

One concern here is that some variable initializers must be constructed at run time, such as code like

//go:immutable
var V = F()

How should that be implemented?

I understand the desire to have a simple mechanism to put values into read-only memory, but I think that to be useful it needs to be consistent and simple and completely reliable. I don't yet see how this proposal accomplishes that.

@embeddedgo
Author

One concern here is that some variable initializers must be constructed at run time, such as code like

//go:immutable
var V = F()

How should that be implemented?

This proposal disallows this: "The compiler should return an error if the data on the right hand side cannot be generated at the compile time."

I understand the desire to have a simple mechanism to put values into read-only memory, but I think that to be useful it needs to be consistent and simple and completely reliable. I don't yet see how this proposal accomplishes that.

Two facts:

  1. There are data intended to be immutable in the code that cannot be expressed using the current const construct.

  2. The addresses of (references to) these data are exported by packages (see the unicode package, for example). This allows accidental or intentional changes and looks like a shortcoming in the language specification.

Questions:

  1. Is there a need to do something about it? We have lived with it for a long time and nothing seems to happen.

  2. If there is, is checking at runtime enough for us, as it is in the case of nil pointers?

The solution for Go should be simple, elegant and cheap. That's what this discussion is for.

I agree the //go:immutable directive isn't very elegant. Hence the proposal for const var that can have slightly more general usage.

I have an interest in leaving immutable data in read-only memory because I am trying to program in Go for systems with very small RAM (< 512 KB). For now I'm dealing with such data in an ugly way at the linker level, and this is enough to keep things moving forward. I'm fine living with this for some time in the hope that we will come up with something sensible here.

@deanveloper

I don't like the idea of using //go:immutable. Comments should never change semantics, regardless of what they are.

There are some compiler directives in place, and some build flags, however they don't change the semantics of the language.

@bcmills
Contributor

bcmills commented Mar 27, 2020

@deanveloper, if you never actually try to write to the variable, then the semantics of an immutable variable are indistinguishable from a mutable one. (That is: if the program is correct, then the semantics don't change; the comment only changes the behavior of programs that are not correct to begin with.)

@deanveloper

I guess that my opinion is more just that a comment should not dictate whether a program is able to compile or not, whether it is "correct" or not.

ie:

//go:immutable
var x = 5

func main() {
    x = 10
}

This does not compile because of a comment in the code. If we want immutability in Go, then we should introduce it to the language itself, not only into the compiler. People shouldn't have to use a compiler directive just to achieve immutability.

At a high level, people should be able to remove all comments from a program and have it still function the same. At the low level, some compiler directives may be needed in order for a program to function properly. However, I would argue that //go:immutable is a pretty high-level change, and it would make more sense to be a language feature than a compiler directive.

@embeddedgo
Author

//go:immutable
var x = 5

func main() {
    x = 10
}

This does not compile because of a comment in the code.

It will be compiled.

It will fail at runtime if the hardware detects the write to read-only memory.

@embeddedgo
Author

Let's forget the syntax for now. There is alternate syntax proposed without a magic comment:

const var x = 5

The question is whether the current lack of the way to make data immutable bothers us at all?

@deanveloper

It will be compiled.

I was not aware of this, I had assumed that the compilation would fail. My mistake.

The question is whether the current lack of the way to make data immutable bothers us at all?

It does bother me personally. There have been several instances where I would have liked to define constant structs, slices, etc., which Go of course does not support. My personal solution was to try to introduce constant array/slice/map/struct types, including untyped constants for each "class" of types. Although that proved harder to implement than I originally thought it would once I started writing the proposal.

@ianlancetaylor
Contributor

The compiler should return an error if the data on the right hand side cannot be generated at the compile time.

This restriction makes this seem very similar to #6386. Perhaps we should close this one as a dup. The main difference seems to be that this clearly permits addressability, whereas #6386 is less clear on that.

The main goal of this issue seems to be moving initializers into read-only memory. That in itself does not require a language change. In principle, the toolchain could observe that nothing changes a value, and in that case move the value into read-only memory. This is harder for an exported variable, but not impossible.

@embeddedgo
Author

The main goal of this issue seems to be moving initializers into read-only memory. That in itself does not require a language change.

The goal is also to detect invalid changes to data intended to be read-only. Using read-only memory is a simple way to do that at runtime.

This proposal puts a clear demarcation between constants and read-only variables.

The constants remain unaffected. They are simple scalar values, with strings as an exception. You can't take the address of a constant.

The read-only variables have no restrictions on their internal structure, but there are restrictions on how they can be initialized. You can access them by reference, and this is a very important part of this proposal because passing by value can be impractical for large data.

@ianlancetaylor
Contributor

I'm not sure that every target that Go supports can reliably put things into read-only memory. Although it may be that the restriction to only constant initializers does permit that.

There is clearly an interest in adding some form of immutable data to Go, as there have been several issues opened about it. I'm not sure that restricting immutable data to constant initializers will be satisfactory to everyone who wants the feature.

@embeddedgo
Author

I'm not sure that every target that Go supports can reliably put things into read-only memory. Although it may be that the restriction to only constant initializers does permit that.

If we allow const var to be only a hint for the compiler, it ceases to be a problem. The benefit is simple semantics and a simple implementation.

I follow #6386 and I see you are considering stricter and more complicated solutions. But be open to a nonstandard approach. Isn't a hint for the compiler that is at the same time information for the programmer all we need?

The const before var saves you a few words in the documentation of the variable and gives most platforms a chance to detect incorrect use. That's enough for me. But I understand that any change to the Go language has to be very well thought out, and I am probably missing something important. Rest assured I don't want to bloat our simple language with too many ill-considered features.

There is clearly an interest in adding some form of immutable data to Go, as there have been several issues opened about it. I'm not sure that restricting immutable data to constant initializers will be satisfactory to everyone who wants the feature.

It would be very nice to initialize the variable at runtime and make it read-only at any time. But it smells to me like C++++. Does it fit our language?

@ianlancetaylor
Contributor

I don't think it's acceptable to have const var just be a hint. That will let people write programs that work on one platform and then fail in an inexplicable way when moved to a different platform. That is a bad user experience. Sometimes that kind of thing is unavoidable. But we should do our very best to avoid it whenever possible.

@embeddedgo
Author

OK. If you want to address this in a better way, I am waiting for it.

But the current state is:

  1. People can write wrong code that can work on any platform with no chance to detect this early.

  2. You need to document that some variables shouldn't be changed, and such information is missing even in the standard library (I'm not surprised, because continuously documenting such a simple thing can be tedious).

I think we've exhausted the topic. Maybe someone can add something. If not, this issue can be closed preferably with a link to it and small summary in the #6386 which seems to be the main thread on this topic.

@bcmills
Contributor

bcmills commented Apr 27, 2020

I don't think it's acceptable to have const var just be a hint. That will let people write programs that work on one platform and then fail in an inexplicable way when moved to a different platform.

One option might be to use memory protection on platforms that support it, and best-effort run-time checks on other platforms.

For example, “read-only” objects could be tagged with an additional header containing a checksum, which the collector could re-verify during every mark phase: instead of (just) scanning the object for pointers, the collector could also scan it for mutations.
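A user-level sketch of the checksum idea follows. It is not collector code: the guarded type and the protect and intact functions are hypothetical names, and CRC-32 is chosen only for brevity.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// guarded pairs a byte buffer that is meant to be read-only with the
// checksum recorded when it was registered.
type guarded struct {
	data []byte
	sum  uint32
}

// protect records the current checksum of data.
func protect(data []byte) *guarded {
	return &guarded{data: data, sum: crc32.ChecksumIEEE(data)}
}

// intact reports whether the buffer still matches the recorded checksum.
// A collector could run an equivalent check while it scans the object.
func (g *guarded) intact() bool {
	return crc32.ChecksumIEEE(g.data) == g.sum
}

func main() {
	table := []byte{0x00, 0x1f, 0x7f, 0x9f}
	g := protect(table)
	fmt.Println("intact:", g.intact())

	table[1] = 0x20 // erroneous write through an alias
	fmt.Println("intact:", g.intact())
}
```

A check like this only reveals that a mutation happened at some point before the verification; it does not identify the writer.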

The race detector could be similarly employed: a “read-only” object effectively has a “reader” at every possible point in time, which could be modeled as a reader in a background goroutine without any other happens-before relationship. Then, testing the package with the race detector enabled would clearly diagnose the source of the erroneous write.

@maodou1990

maodou1990 commented May 13, 2020

I think read-only data is necessary in some situations. In my project, read-only parameters have been changed unknowingly several times. Simplified code is shown below.

type Item struct {
    ID    int32
    Count int32
}

var rewardA *Item

func init() {
    rewardA = LoadFromExcel() // load rewardA from some configs; rewardA should not be changed
}

func GetRewardA() *Item {
    return rewardA // if random value
}

// file b
func GetReward() []*Item {
    items := make([]*Item, 0)
    // complex condition code
    items = append(items, GetRewardB())
    items = append(items, GetRewardA())
    items = append(items, GetRewardC())
    items = append(items, GetRewardD())
    return items
}

// file c
func GetMultipleRewards(mul int32) []*Item {
    items := b.GetReward()
    for _, item := range items {
        item.Count *= mul
    }
    return items
}

In this common situation, if we call the function "GetMultipleRewards", rewardA.Count will be changed unknowingly. It is easy for a new Go programmer to write this code, but it can cost several hours to find the bug.
We have found two methods to avoid the problem. The first is to use "[]Item" instead of "[]*Item". That means the code copies every item each time the slice is used, which is unacceptable. The other is privatization.
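A self-contained reduction of this bug, with the configuration loading replaced by a literal and the function names shortened (both are assumptions made so that the example runs on its own):

```go
package main

import "fmt"

type Item struct {
	ID    int32
	Count int32
}

// rewardA stands in for configuration loaded once at startup.
var rewardA = &Item{ID: 1, Count: 10}

// rewards returns pointers into the shared configuration.
func rewards() []*Item { return []*Item{rewardA} }

// multiplied is meant to return scaled rewards, but it scales the shared
// items in place because the slice holds pointers, not copies.
func multiplied(mul int32) []*Item {
	items := rewards()
	for _, item := range items {
		item.Count *= mul
	}
	return items
}

func main() {
	multiplied(2)
	multiplied(2)
	fmt.Println(rewardA.Count) // prints 40: the configuration value 10 is gone
}
```

Each call compounds the damage, so the symptom appears far from the line that caused it.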

type Item struct {
    id    int32
    count int32
}

func (i *Item) ID() int32 {
    return i.id
}

func (i *Item) Count() int32 {
    return i.count
}

But if Item has 100 properties, we have to write 100 getter functions. And this is only one struct; what happens if there are 100, 200, or more structs?
