Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGBUS occurred during slice append #29208

Closed
templarzq opened this issue Dec 13, 2018 · 21 comments
Closed

runtime: SIGBUS occurred during slice append #29208

templarzq opened this issue Dec 13, 2018 · 21 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@templarzq
Copy link

go version 1.11.1

system: linux centos 7.2.1511

unexpected fault address 0xecc6a02000
148734 fatal error: fault
148735 [signal SIGBUS: bus error code=0x2 addr=0xecc6a02000 pc=0x458641]
148736
148737 goroutine 15317 [running]:
148738 runtime.throw(0x8794cf, 0x5)
148739 E:/software/golang/go/src/runtime/panic.go:608 +0x72 fp=0xc04af8cd50 sp=0xc04af8cd20 pc=0x42a762
148740 runtime.sigpanic()
148741 E:/software/golang/go/src/runtime/signal_unix.go:387 +0x2d7 fp=0xc04af8cda0 sp=0xc04af8cd50 pc=0x43f4e7
148742 runtime.memmove(0xecc6a02000, 0xecc683c7c0, 0x20)
148743 E:/software/golang/go/src/runtime/memmove_amd64.s:180 +0x151 fp=0xc04af8cda8 sp=0xc04af8cda0 pc=0x458641
148744 runtime.growslice(0x7ddba0, 0xecc683c7c0, 0x4, 0x4, 0x5, 0xecc683c7c0, 0x2, 0x4)
148745 E:/software/golang/go/src/runtime/slice.go:198 +0x240 fp=0xc04af8ce10 sp=0xc04af8cda8 pc=0x4407c0
148746 ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Compute(0xc001874c60, 0x893a60, 0xc001874c60)
148747 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:205 +0xc76 fp=0xc04af8cfb0 sp=0xc04af8ce10 pc=0x74ee06
148748 ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Receive.func1(0xc001874c60)
148749 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:93 +0x51 fp=0xc04af8cfd8 sp=0xc04af8cfb0 pc=0x7589f1
148750 runtime.goexit()
148751 E:/software/golang/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc04af8cfe0 sp=0xc04af8cfd8 pc=0x457411
148752 created by ElasticComputeEngine/ComputeTaskHandler/Operations.(*Op_DiscreteRatio).Receive
148753 F:/project/goproj/GOPATH/src/ElasticComputeEngine/ComputeTaskHandler/Operations/Op_DiscreteRatio.go:87 +0x51f
148754
148755 goroutine 1 [chan receive, 1501 minutes]:
148756 main.main()148757 F:/project/goproj/GOPATH/src/ElasticComputeEngine/main.go:39 +0x2c0
148758148759 goroutine 5 [chan receive, 1501 minutes]:
148760 github.com/AsynkronIT/protoactor-go/log.(*ioLogger).listenEvent(0xc0000b6150)
148761 F:/project/goproj/GOPATH/src/github.com/AsynkronIT/protoactor-go/log/string_encoder.go:31 +0x75
148762 created by github.com/AsynkronIT/protoactor-go/log.init.0
148763 F:/project/goproj/GOPATH/src/github.com/AsynkronIT/protoactor-go/log/string_encoder.go:26 +0x107
148764
148765 goroutine 6 [syscall, 1501 minutes]:
148766 os/signal.signal_recv(0x0)
148767 E:/software/golang/go/src/runtime/sigqueue.go:139 +0x9c
148768 os/signal.loop()
148769 E:/software/golang/go/src/os/signal/signal_unix.go:23 +0x22
148770 created by os/signal.init.0
148771 E:/software/golang/go/src/os/signal/signal_unix.go:29 +0x41
148772
148773 goroutine 10 [IO wait, 2 minutes]:
148774 internal/poll.runtime_pollWait(0x7fe805961f00, 0x72, 0x0)
148775 E:/software/golang/go/src/runtime/netpoll.go:173 +0x66
148776 internal/poll.(*pollDesc).wait(0xc00019e098, 0x72, 0xdfd08c1500, 0x0, 0x0)
148777 E:/software/golang/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
148778 internal/poll.(*pollDesc).waitRead(0xc00019e098, 0xffffffffffffff00, 0x0, 0x0)
148779 E:/software/golang/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
148780 internal/poll.(*FD).Accept(0xc00019e080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
148781 E:/software/golang/go/src/internal/poll/fd_unix.go:384 +0x1a0

@templarzq
Copy link
Author

1544673784

@templarzq
Copy link
Author

this sigbus occurred during 2000 concurrent compute executing

@randall77
Copy link
Contributor

Strange.
The code dies copying a small slice when growing it. The fault is on the write, so it's initializing the first 32 bytes of a presumably 64-byte allocation.
This is the first allocation on a page, so potentially there's an issue with how that page was mapped.
The access is aligned, so it isn't an alignment issue.
I don't see how Go could cause a SIGBUS in this situation. If the Go heap management was wrong, I would expect a SIGSEGV. Possibly a kernel bug, although I hate to blame the bogeyman like that.

Any chance this is reproducible? If so, could you try a different kernel?

@agnivade
Copy link
Contributor

Please kindly fill the complete issue template.

Specifically, it would help if you can tell us whether the issue occurs with 1.10 too. And also kindly mention the steps for us the reproduce the issue, preferably with a self-contained code sample.

And a code paste is always preferred than a screenshot. It is practically impossible to copy the code from the screenshot.

@agnivade agnivade changed the title SIGBUS occurred in go1.11.1 runtime: SIGBUS occurred during slice append Dec 13, 2018
@agnivade agnivade added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 13, 2018
@templarzq
Copy link
Author

templarzq commented Dec 13, 2018

compute.tar.gz
this is the executable zipped file . this bug can be reproduced randomly.
GOOS:linux, GOARCH:amd64

@templarzq
Copy link
Author

templarzq commented Dec 13, 2018

package main

import (
	"encoding/binary"
	"fmt"
	"io/ioutil"
	"math/rand"
	"net"
	"os"
	"strconv"
	"time"

	console "github.com/AsynkronIT/goconsole"
	"github.com/shiena/ansicolor"
	"github.com/sirupsen/logrus"
)

func checkError(err error) {
	if err != nil {
		fmt.Println("Error:", err.Error())
		os.Exit(1)
	}
}

func readDescFile(filePath string) string {
	descContent := ""
	fileBytes, err := ioutil.ReadFile(filePath)
	if err != nil {
		fmt.Println("Error:", err.Error())
		return ""
	} else {
		descContent = string(fileBytes)
	}
	return descContent
}

func main() {
	logger := logrus.New()
	logger.SetOutput(ansicolor.NewAnsiColorWriter(os.Stdout))
	logger.Formatter = &logrus.TextFormatter{
		ForceColors:            true,
		DisableLevelTruncation: true,
		FullTimestamp:          true,
		TimestampFormat:        "2006-01-02 15:04:05",
	}
	logger.SetLevel(logrus.DebugLevel)
	logger.Debug("start tcp client..")

	descContent := readDescFile("e:\\NBLabFlow_10724.txt")

	for i := 0; i < 2000; i++ {
		go func(i int) {
			//conn, err := net.Dial("tcp", "172.16.41.14:8089")
			conn, err := net.Dial("tcp", "127.0.0.1:8089")
			checkError(err)
			defer conn.Close()

			dpid1 := strconv.Itoa(41878)
			dpid2 := strconv.Itoa(41897)
			//register sensor data type
			datatype := make([]byte, 0)
			datatype = append(datatype, 0x89)
			values := fmt.Sprintf(dpid1 + ",device001_6001_0-1.0-71-202,1;" + dpid2 + ",device001_6001_0-1.0-71-221,1")
			length := len([]byte(values))
			lengthByte := make([]byte, 4)
			binary.BigEndian.PutUint32(lengthByte, uint32(length))
			datatype = append(datatype, lengthByte...)
			datatype = append(datatype, []byte(values)...)
			conn.Write(datatype)

			time.Sleep(2000 * time.Millisecond)
			//任务信息
			taskInfo := make([]byte, 0)
			taskInfo = append(taskInfo, 0x99)

			description := descContent

			length = len([]byte(description)) + 4
			lengthByte = make([]byte, 4)
			binary.BigEndian.PutUint32(lengthByte, uint32(length))
			flowID := 10800 + i*2
			IDByte := make([]byte, 4)
			binary.BigEndian.PutUint32(IDByte, uint32(flowID))

			taskInfo = append(taskInfo, lengthByte...)
			taskInfo = append(taskInfo, IDByte...)
			taskInfo = append(taskInfo, []byte(description)...)
			conn.Write(taskInfo)

			time.Sleep(2000 * time.Millisecond)

			data := make([]byte, 0)
			for k := 0; k < 200; k++ {
				data = data[:0]
				data = append(data, 0x88)
				valueStr := ""
				for j := 100; j < 105; j++ {
					valueStr += fmt.Sprintf("%d,%s,%f,;%d,%s,%f,;",
						12345*i+j, dpid1, float64(j), 12345*i+j, dpid2, float64(j+1))
				}
				length = len([]byte(valueStr))
				lengthByte = make([]byte, 4)
				binary.BigEndian.PutUint32(lengthByte, uint32(length))
				data = append(data, lengthByte...)
				data = append(data, []byte(valueStr)...)
				conn.Write(data)
				if k%1000 == 0 {
					logger.Info("send msg in routine ", i)
				}
				rand.Seed(time.Now().UnixNano())
				for n := 0; n < 12+rand.Intn(3); n += 1 {
					time.Sleep(1000 * time.Millisecond)
				}
			}

		}(i)
	}

	console.ReadLine()
}

@templarzq
Copy link
Author

these are client codes.

@templarzq
Copy link
Author

NBLabFlow_10724.txt

@templarzq
Copy link
Author

this is the file used in the client

@templarzq
Copy link
Author

you can run the server code ,listening port num:8089

@templarzq
Copy link
Author

run the client to send messages for several times. the bug will occur randomly.

@templarzq
Copy link
Author

templarzq commented Dec 13, 2018

func (pOpDiscreteRatio *Op_DiscreteRatio) Compute() error {
	for !pOpDiscreteRatio.m_exitSig {
		if pOpDiscreteRatio.M_state < 4 {
			continue
		}

		//获取每个输入的数据
		for {
			for i := 0; i < pOpDiscreteRatio.m_inputCount; i++ {
				pOpDiscreteRatio.m_signal[i].Release()
			}
			pOpDiscreteRatio.M_logger.Debug("all sem released.")
			inputData := make([]interface{}, 0)
			pOpDiscreteRatio.m_dataSig.Wait()
			for i := 0; i < pOpDiscreteRatio.m_inputCount; i++ {
				inputData = append(inputData, pOpDiscreteRatio.M_InputData[i][0])
				if len(pOpDiscreteRatio.M_InputData[i]) > 1 {
					pOpDiscreteRatio.M_InputData[i] = pOpDiscreteRatio.M_InputData[i][1:]
				} else {
					pOpDiscreteRatio.M_InputData[i] = pOpDiscreteRatio.M_InputData[i][:0]
				}
			}
			pOpDiscreteRatio.m_dataSig.Add(pOpDiscreteRatio.m_inputCount)
			pOpDiscreteRatio.M_logger.Debug("wg added:", pOpDiscreteRatio.m_inputCount)
			//开始计算
			switch colData := inputData[0].(type) {
			case *Messages.IntegerSegData:
				dataCnt := len(colData.Data)
				result := make([]float64, 0)
				sum := int64(0)
				avg := float64(0.00000000000001)
				stdv := float64(0)

				for i := 0; i < dataCnt; i++ {
					sum = 0
					stdv = 0
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						sum += inputData[j].(*Messages.IntegerSegData).Data[i]
					}
					avg = float64(sum) / float64(pOpDiscreteRatio.m_inputCount)

					//求标准差
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						stdv += math.Pow(float64(inputData[j].(*Messages.IntegerSegData).Data[i])-avg, 2.0)
					}
					stdv = math.Sqrt(stdv / float64(pOpDiscreteRatio.m_inputCount))

					//离散率=stdv/avg
					result = append(result, stdv/avg)
				}
				msg := Messages.NewFloatData()
				msg.LastNodeId = pOpDiscreteRatio.NodeID
				msg.TS = colData.TS
				msg.Data = result
				for _, outPin := range pOpDiscreteRatio.NextOP {
					for _, outNode := range outPin.DstOp {
						outNode.Tell(msg)
					}
				}
			case *Messages.FloatSegData:
				dataCnt := len(colData.Data)
				result := make([]float64, 0)
				sum := float64(0)
				avg := float64(0.00000000000001)
				stdv := float64(0)

				for i := 0; i < dataCnt; i++ {
					sum = 0
					stdv = 0
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						sum += inputData[j].(*Messages.FloatSegData).Data[i]
					}
					avg = sum / float64(pOpDiscreteRatio.m_inputCount)

					//求标准差
					for j := 0; j < pOpDiscreteRatio.m_inputCount; j++ {
						stdv += math.Pow((inputData[j].(*Messages.FloatSegData).Data[i])-avg, 2.0)
					}
					stdv = math.Sqrt(stdv / float64(pOpDiscreteRatio.m_inputCount))

					//离散率=stdv/avg
					result = append(result, stdv/avg)
				}
				msg := Messages.NewFloatData()
				msg.LastNodeId = pOpDiscreteRatio.NodeID
				msg.TS = colData.TS
				msg.Data = result
				for _, outPin := range pOpDiscreteRatio.NextOP {
					for _, outNode := range outPin.DstOp {
						outNode.Tell(msg)
					}
				}
			}
		}
	}
	return nil
}

@templarzq
Copy link
Author

here is the server bug code.

@randall77
Copy link
Contributor

Without the full code for the server so I can rebuild it, it's hard to debug.

@templarzq
Copy link
Author

type implementation struct {
sem chan struct{}
timeout time.Duration
}

func (s *implementation) Release() error {
select {
case s.sem <- struct{}{}:
return nil
//case <-time.After(s.timeout):
// return ErrNoTickets
}
}

func (s *implementation) Acquire() error {
select {
case _ = <-s.sem:
return nil
//case <-time.After(s.timeout):
// return ErrIllegalRelease
}

return nil

}

func NewSemaphore(tickets int, timeout time.Duration) ISemaphore {
return &implementation{
sem: make(chan struct{}, tickets),
timeout: timeout,
}
}

@templarzq
Copy link
Author

case <-time.After(s.timeout)

this line will cause mem leak........why??

@agnivade
Copy link
Contributor

Hi @templarzq - It would be great if you can provide us a fully self contained program to reproduce the crash on our side and debug this. What you are showing us is just a snippet of a larger codebase.

@agnivade agnivade added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Dec 23, 2018
@templarzq
Copy link
Author

sorry,this is a business project . u can reproduce that by using timeAfter and setting the timeout to INTMAX, then calling this function many times.

@templarzq
Copy link
Author

@agnivade

@agnivade
Copy link
Contributor

Thanks. Upto @randall77 and folks to take it forward.

@randall77
Copy link
Contributor

I'm going to close this as not actionable. Without source there's no way to debug on this end. Please reopen if you can provide source, or provide more information below.

  1. Which versions of Go does this fail on? 1.11.1 is reported - what about 1.10? tip?
  2. Does this fail on other OSes?
  3. Why is it generating a SIGBUS? Can you catch it in a debugger and check the page mappings?

@golang golang locked and limited conversation to collaborators Jan 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

4 participants