
encoding/xml: Data race in xml/encoder due to getTypeInfo and shared slice #9796

Closed
ghost opened this issue Feb 6, 2015 · 13 comments

@ghost

ghost commented Feb 6, 2015

Analysis described in http://stackoverflow.com/a/28374852/723947. For convenience, here's a copy of the analysis:

A user described a data race with the following trace:

2015/02/06 13:53:43  Total GoRoutine Channels Created 2
2015/02/06 13:53:43 <family><famil_name></famil_name><members><person><name>ABCD</name><age>0</age></person><person><name>dummy</name><age>0</age></person></members></family>
==================
WARNING: DATA RACE
Write by goroutine 6:
  runtime.slicecopy()
      /usr/local/go/src/runtime/slice.go:94 +0x0
  encoding/xml.(*parentStack).push()
      /usr/local/go/src/encoding/xml/marshal.go:908 +0x2fb
  encoding/xml.(*printer).marshalStruct()
      /usr/local/go/src/encoding/xml/marshal.go:826 +0x628
  encoding/xml.(*printer).marshalValue()
      /usr/local/go/src/encoding/xml/marshal.go:531 +0x1499
  encoding/xml.(*Encoder).Encode()
      /usr/local/go/src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      /usr/local/go/src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:51 +0x227
  main.func·001()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:61 +0x74

Previous read by goroutine 5:
  encoding/xml.(*parentStack).trim()
      /usr/local/go/src/encoding/xml/marshal.go:893 +0x2ae
  encoding/xml.(*printer).marshalStruct()
      /usr/local/go/src/encoding/xml/marshal.go:836 +0x203
  encoding/xml.(*printer).marshalValue()
      /usr/local/go/src/encoding/xml/marshal.go:531 +0x1499
  encoding/xml.(*Encoder).Encode()
      /usr/local/go/src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      /usr/local/go/src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:51 +0x227
  main.func·001()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:61 +0x74

Goroutine 6 (running) created at:
  main.AsyncExecute()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:67 +0x15d
  main.main()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:80 +0x2bf

Goroutine 5 (finished) created at:
  main.AsyncExecute()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:67 +0x15d
  main.main()
      /Users/kadalamittai/selfie/go/src/github.com/ivam/goal/command/data_race.go:80 +0x2bf
==================

I analyzed this. There's an implicitly shared value due to the use of getTypeInfo(), which returns a cached type description of the struct. Other parts of the XML encoder assume they can freely read and write that type info, and that is unsafe, as the race detector is telling us.

The p.stack field being reported as the source of the data race originates from part of the shared typeInfo value: a slice of tinfo.parents is passed in on line 821.

if err := s.trim(finfo.parents); err != nil {

That's ultimately where the sharing happens, with the potential for concurrent reads and writes: appends happen on that slice later, and those can mutate the shared backing array.

What should probably happen instead is that finfo.parents should be sliced with its capacity restricted, so that any potential append won't write to the shared value.

That is, line 821 could probably be changed from:

if err := s.trim(finfo.parents); err != nil {

to:

parentLen := len(finfo.parents)
if err := s.trim(finfo.parents[:parentLen:parentLen]); err != nil {

to correct the issue.

@ghost
Author

ghost commented Feb 6, 2015

Actually, my "fix" still looks dangerous. We really should apply the capacity restriction at the initial assignment to s.stack on line 897.

Change:

 897        s.stack = parents[:split]

to:

 897        s.stack = parents[:split:split]

Apologies for not reading this as carefully as I should have.

@jeffallen
Contributor

I cannot repro this with Go built from the tip. What Go version are you and the OP using?

@ghost
Author

ghost commented Feb 6, 2015

I did the analysis on Go 1.4. Here are the points where the sharing happens:

  1. The call to getTypeInfo (https://golang.org/src/encoding/xml/marshal.go#L409), which returns global state in typeinfo.go (https://golang.org/src/encoding/xml/typeinfo.go#L44)
  2. The passing of finfo.parents (https://golang.org/src/encoding/xml/marshal.go#L821)
  3. The slice in parentStack.trim() (https://golang.org/src/encoding/xml/marshal.go#L897)

Ultimately, the shared value flows into parentStack.stack, and there are both reads and writes going on. The read from parentStack.trim:

s.stack = parents[:split]

and the write from parentStack.push

s.stack = append(s.stack, parents...)

Technically, I would guess the append is logically a no-op on the underlying array, since the parents that were split off are the same ones being appended back on, but that's something the data race detector can't possibly know.

@ghost
Author

ghost commented Feb 7, 2015

Patch submitted for review in https://go-review.googlesource.com/#/c/4152/

@mikioh mikioh changed the title Data race in xml/encoder due to getTypeInfo and shared slice encoding/xml: Data race in xml/encoder due to getTypeInfo and shared slice Feb 7, 2015
@ghost
Author

ghost commented Feb 7, 2015

@jeffallen Hmmm... something is strange here, and I'm confused; I need help tracing this before submitting the patch.

Let me show what I'm seeing. Here's a reduced program to demonstrate the issue:

package main

import "encoding/xml"

type Family struct {
    Members []Person `xml:"members>member"`
}

type Person struct{}

func ToXml() []byte {
    p := []Person{Person{}, Person{}}
    family := Family{Members: p}
    xml, _ := xml.Marshal(family)
    return xml
}

func AsyncExecute() <-chan []byte {
    ch := make(chan []byte)
    go func() {
        response := ToXml()
        ch <- response
        close(ch)
    }()
    return ch
}

func main() {
    const N = 2
    chlist := make([]<-chan []byte, N)
    for i := 0; i < N; i++ {
        chlist[i] = AsyncExecute()
    }
    for _, ch := range chlist {
        <-ch
    }
}

As of commit bcadab9, which is the last time that src/runtime/slice.go has been touched according to git history, I can still see the data race:

dannyyoo@melchior:~/work/go/src$ git branch
* (detached from bcadab9)
  data-race
  datarace
  master
dannyyoo@melchior:~/work/go/src$ ../bin/go run -race ~/tmp/xml-data-race.go
==================
WARNING: DATA RACE
Write by goroutine 6:
  runtime.slicecopy()
      /usr/local/google/home/dannyyoo/work/go/src/runtime/slice.go:98 +0x0
  encoding/xml.(*parentStack).push()
      /usr/local/google/home/dannyyoo/work/go/src/encoding/xml/marshal.go:908 +0x2fb
  encoding/xml.(*printer).marshalStruct()
      /usr/local/google/home/dannyyoo/work/go/src/encoding/xml/marshal.go:826 +0x5b2
  encoding/xml.(*printer).marshalValue()
      /usr/local/google/home/dannyyoo/work/go/src/encoding/xml/marshal.go:531 +0x1499
  encoding/xml.(*Encoder).Encode()
      /usr/local/google/home/dannyyoo/work/go/src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      /usr/local/google/home/dannyyoo/work/go/src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /usr/local/google/home/dannyyoo/tmp/xml-data-race.go:14 +0x108
[output omitted]

However, when I get to master (ac90d9a), I'm no longer seeing the data race reported, as you note! But the race is still logically there, right? Unless I'm misunderstanding something, did something break in the data race reporting?

Unfortunately, I ran out of time tonight to bisect exactly where things erroneously start to succeed. I'll try to check it out when I have time during the weekend.

@dvyukov dvyukov added this to the Go1.5 milestone Feb 7, 2015
@jeffallen
Copy link
Contributor

I think you are on to something here. Keep after it when you have a bit of time. At minimum, you'll make XML better. But possibly you have found an indication that tip is somewhat broken w.r.t. data race reporting. Sorry I don't have enough expertise to follow you down this rabbit hole...


@ghost
Author

ghost commented Feb 9, 2015

Ok, I ran a git bisect and traced the data-race detection regression down to ff97962, whose commit message says:

Author: Russ Cox <rsc@golang.org>
Date:   Mon Dec 29 16:26:25 2014 -0500

    cmd/gc: add write barrier for append(slice, slice...)

    Found with GODEBUG=wbshadow=2 mode.
    Eventually that will run automatically, but right now
    it still detects other missing write barriers.

    Change-Id: I5624b509a36650bce6834cf394b9da163abbf8c0
    Reviewed-on: https://go-review.googlesource.com/2310
    Reviewed-by: Rick Hudson <rlh@golang.org>

Ok, this sounds very related to the problem at hand. I don't quite understand the comment in the commit log though.

Let me try "GODEBUG=wbshadow=2" on ff97962 and see if we can detect the data race then...

dannyyoo@melchior:~/work/go/src$ GODEBUG=wbshadow=2 ../bin/go run -race ~/tmp/xml-data-race.go
fatal error: cannot map shadow heap

runtime stack:
runtime.throw(0x8cf370, 0x16)
    /usr/local/google/home/dannyyoo/work/go/src/runtime/panic.go:508 +0x98
runtime.wbshadowinit()
    /usr/local/google/home/dannyyoo/work/go/src/runtime/malloc1.go:240 +0x87
runtime.schedinit()
    /usr/local/google/home/dannyyoo/work/go/src/runtime/proc1.go:125 +0x6f
runtime.rt0_go(0x7fff6676a338, 0x4, 0x7fff6676a338, 0x0, 0x7f33c7edb30a, 0x4, 0x7fff6676c541, 0x7fff6676c54b, 0x7fff6676c54f, 0x7fff6676c555, ...)
    /usr/local/google/home/dannyyoo/work/go/src/runtime/asm_amd64.s:96 +0x116
dannyyoo@melchior:~/work/go/src$ ../bin/go run -race ~/tmp/xml-data-race.go
dannyyoo@melchior:~/work/go/src$

I don't understand what's going on. This seems very tied to the garbage collector.

Can we notify @rsc about this to ask what the right behavior is supposed to be here?

@davecheney
Contributor

@DannyYoo for completeness, please include a link to your xml-data-race.go repro.

@ghost
Author

ghost commented Feb 9, 2015

xml-data-race.go: http://play.golang.org/p/he6O0wVbj5

@davecheney
Contributor

Confirmed on go1.4.1, freebsd/amd64

[dfc@deadwood ~/go1.4]$ bin/go run -race ~/src/x.go
==================
WARNING: DATA RACE
Write by goroutine 6:
  runtime.slicecopy()
      /usr/home/dfc/go1.4/src/runtime/slice.go:94 +0x0
  encoding/xml.(*parentStack).push()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:908 +0x2fb
  encoding/xml.(*printer).marshalStruct()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:826 +0x628
  encoding/xml.(*printer).marshalValue()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:531 +0x1499
  encoding/xml.(*Encoder).Encode()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /home/dfc/src/x.go:14 +0x108
  main.func·001()
      /home/dfc/src/x.go:21 +0x35

Previous read by goroutine 5:
  encoding/xml.(*parentStack).trim()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:893 +0x2ae
  encoding/xml.(*printer).marshalStruct()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:836 +0x203
  encoding/xml.(*printer).marshalValue()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:531 +0x1499
  encoding/xml.(*Encoder).Encode()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      /usr/home/dfc/go1.4/src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /home/dfc/src/x.go:14 +0x108
  main.func·001()
      /home/dfc/src/x.go:21 +0x35

Goroutine 6 (running) created at:
  main.AsyncExecute()
      /home/dfc/src/x.go:24 +0xfc
  main.main()
      /home/dfc/src/x.go:32 +0x7d

Goroutine 5 (finished) created at:
  main.AsyncExecute()
      /home/dfc/src/x.go:24 +0xfc
  main.main()
      /home/dfc/src/x.go:32 +0x7d
==================
Found 1 data race(s)
exit status 66

@dvyukov
Member

dvyukov commented Feb 10, 2015

Thanks for investigating this!

I've mailed https://go-review.googlesource.com/4370 with a fix for the race detector.

With this change the race in your program is detected:

WARNING: DATA RACE
Write by goroutine 5:
  runtime.slicecopy()
      src/runtime/slice.go:98 +0x0
  encoding/xml.(*parentStack).push()
      src/encoding/xml/marshal.go:908 +0x2fe
  encoding/xml.(*printer).marshalStruct()
      src/encoding/xml/marshal.go:826 +0x62d
  encoding/xml.(*printer).marshalValue()
      src/encoding/xml/marshal.go:531 +0x150e
  encoding/xml.(*Encoder).Encode()
      src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /tmp/xml.go:14 +0x108
  main.func·001()
      /tmp/xml.go:21 +0x35

Previous read by goroutine 4:
  encoding/xml.(*parentStack).trim()
      src/encoding/xml/marshal.go:893 +0x2ae
  encoding/xml.(*printer).marshalStruct()
      src/encoding/xml/marshal.go:836 +0x22d
  encoding/xml.(*printer).marshalValue()
      src/encoding/xml/marshal.go:531 +0x150e
  encoding/xml.(*Encoder).Encode()
      src/encoding/xml/marshal.go:153 +0xb8
  encoding/xml.Marshal()
      src/encoding/xml/marshal.go:72 +0xfb
  main.ToXml()
      /tmp/xml.go:14 +0x108
  main.func·001()
      /tmp/xml.go:21 +0x35

dvyukov added a commit that referenced this issue Feb 12, 2015
typedslicecopy is another write barrier that is not
understood by racewalk. It seems quite complex to handle it
in the compiler, so instead just instrument it in runtime.

Update #9796

Change-Id: I0eb6abf3a2cd2491a338fab5f7da22f01bf7e89b
Reviewed-on: https://go-review.googlesource.com/4370
Reviewed-by: Russ Cox <rsc@golang.org>
@dvyukov
Member

dvyukov commented Feb 12, 2015

The fix is submitted, the race should be reproducible on tip again.

@gopherbot

CL https://golang.org/cl/12687 mentions this issue.

rsc added a commit that referenced this issue Jul 27, 2015
This race was identified in #9796, but a sequence of fixes
proposed in golang.org/cl/4152 were rolled into
golang.org/cl/5910 which both fixed the race and
modified the name space behavior.

We rolled back the name space changes and lost the race fix.

Fix the race separate from the name space changes,
following the suggestion made by Roger Peppe in
https://go-review.googlesource.com/#/c/4152/7/src/encoding/xml/marshal.go@897

Fixes #9796.
Fixes #11885.

Change-Id: Ib2b68982da83dee9e04db8b8465a8295259bba46
Reviewed-on: https://go-review.googlesource.com/12687
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
@golang golang locked and limited conversation to collaborators Aug 5, 2016