Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/blog: slices-intro: A possible "gotcha" with different result #43235

Closed
kwangh opened this issue Dec 17, 2020 · 8 comments
Closed

x/blog: slices-intro: A possible "gotcha" with different result #43235

kwangh opened this issue Dec 17, 2020 · 8 comments

Comments

@kwangh
Copy link

kwangh commented Dec 17, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/workspace/test/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build091743500=/tmp/go-build -gno-record-gcc-switches"

What did you do?

// main.go
package main

import (
"fmt"
"io/ioutil"
"regexp"
)

var digitRegexp = regexp.MustCompile("[0-9]+")

func CopyDigits(filename string) []byte {
b, _ := ioutil.ReadFile(filename)
return digitRegexp.Find(b)
}

func main() {
digits := CopyDigits("./digits")
fmt.Println(len(digits), cap(digits))
}

// digits
// random numbers with letters
12342!!!!J)))##$(*)&ASsadfsdfas132123adfsd

What did you expect to see?

42 554

What did you see instead?

5 5

https://blog.golang.org/slices-intro
As it says, "This code behaves as advertised, but the returned []byte points into an array containing the entire file." What I expected was "42 554", but "5 5".

@gopherbot gopherbot added this to the Unreleased milestone Dec 17, 2020
@ianlancetaylor
Copy link
Contributor

Why did you expect to see "42 554"?

What do you think should change in the blog post?

In general we don't use the issue trackers for questions about how Go works. Please see https://golang.org/wiki/Questions. Thanks.

I'm going to lose this because I see nothing wrong here. Please comment if you disagree.

@shmsr
Copy link
Contributor

shmsr commented Dec 17, 2020

@ianlancetaylor I think he's right. Previously I was also of the impression that it's wrong but I think @kwangh is correct.

Look at: #30169

So, with this CL, Find, FindAll, FindAllSubmatch have the capacity the same as their length.

But the blog reads:

This code behaves as advertised, but the returned []byte points into an array containing the entire file. 
Since the slice references the original array, as long as the slice is kept around the garbage collector 
can't release the array; the few useful bytes of the file keep the entire contents in memory.

So, with that CL, return []byte points into a subset of that array and not the entire file anymore and can't be resliced back to the original capacity (if needed) in this particular case.

So, if I understand this correctly, maybe the blog could be updated with a different example to show that "Gotcha" but not using the Find and friends from regexp. And it'd really nice if I could contribute if there's a need!

@davecheney
Copy link
Contributor

Would a simpler example be sub slicing the last 10 bytes off a multi megabyte string; that would remove the debate between the subslices Len and cap as neither can be used to reference the megabytes of prefix

@kwangh
Copy link
Author

kwangh commented Dec 23, 2020

@shmsr Thank you for the extra explanation.
I think the blog post need a new example.

@ianlancetaylor
Copy link
Contributor

@shmsr The blog post is still correct with the current garbage collector. The fact that the capacity is now restricted doesn't matter. The entire underlyng array is still pinned.

@kwangh
Copy link
Author

kwangh commented Dec 24, 2020

@ianlancetaylor It's a bit late but sorry for using issues. So in short, the underlying array is in heap place, and even after the CopyDigits function returns a result, the underlying array still stays in the heap space. The capacity does not matter. Thank you.

@shmsr
Copy link
Contributor

shmsr commented Dec 25, 2020

Yes, I get that. I didn't say that there's a problem with the underlying array getting GC'ed. I think I wasn't really clear before in explaining my thought. I just felt that the following line could be changed a little (or the example):

This code behaves as advertised, but the returned []byte points into an array containing the entire file

Because it points to an array which is not containing the entire file anymore (but a subset of that file); hence can't be resliced back even if the users want to. A slight change to the blog might make things very clear (maybe).

I hope I'm clear now. I explained the same thing before (#43235 (comment)) as well.

So, do you think that the following line needs some correction?

This code behaves as advertised, but the returned []byte points into an array containing the entire file

But that again is also not true every time. The blog post is totally fine is the file contains only numbers as regexp.Find will return the leftmost match and basically it'll point to an array containing the entire file. Now suppose the file contains:

123abc456

Then it's is a bit confusing if one looks at the implementation of regexp.Find and the blog. Maybe my interpretation of the blog is wrong; feel free to correct me.

@ianlancetaylor
Copy link
Contributor

I think the word "into" make the sentence correct.

Do you have a suggestion for how to make it more clear? Thanks.

@golang golang locked and limited conversation to collaborators Dec 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants