Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/gob: excessive memory consumption #10490

Closed
dvyukov opened this issue Apr 17, 2015 · 15 comments
Closed

encoding/gob: excessive memory consumption #10490

dvyukov opened this issue Apr 17, 2015 · 15 comments
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Apr 17, 2015

go version devel +8ea2438 Fri Apr 17 13:44:30 2015 +0300 linux/amd64
with https://go-review.googlesource.com/#/c/8942/ appled.

The following program consumes 2GB while the input is few bytes. I suspect it is possible to modify the input to force the program to allocate arbitrary amount of memory and run for arbitrary period of time.

Allocations happen while receiving a wire type (encoding/gob.(*Decoder).recvType).

package main

import (
    "bytes"
    "encoding/gob"
    "encoding/hex"
)

type X struct {
}

func main() {
    data, _ := hex.DecodeString("53ff8f03010106696e7075745401ff900001fc0701044d61786901040001044d" +
        "696e6901040001044d61787501060001044d61786601080001044d696e660108" +
        "0001044d617863010e0001044d696e63010e00000007ff9001fe0000")
    s := ""
    gob.NewDecoder(bytes.NewReader(data)).Decode(&s)
    var b []byte
    gob.NewDecoder(bytes.NewReader(data)).Decode(&b)
    var f float64
    gob.NewDecoder(bytes.NewReader(data)).Decode(&f)
    var x X
    gob.NewDecoder(bytes.NewReader(data)).Decode(&x)
}
@dvyukov dvyukov added this to the Go1.5 milestone Apr 17, 2015
@dvyukov
Copy link
Member Author

dvyukov commented Apr 17, 2015

This input is probably a dup, but please test separately.

    data, _ := hex.DecodeString("53ff8f03010106696e7075745401ff900001fc0701044d61786901040001044d" +
        "696e6901040001044d61787501060001044d61786601080001044d696e660108" +
        "0001044d617863010e0001044d696e63010e00000007ff9001fe010000")

@dvyukov
Copy link
Member Author

dvyukov commented May 8, 2015

@robpike, ping. This is the last one for gob.

@robpike
Copy link
Contributor

robpike commented May 8, 2015

No need to ping. The milestone is sufficient.

@dvyukov
Copy link
Member Author

dvyukov commented May 29, 2015

No need to ping. The milestone is sufficient.

The later this bug is fixed the later in the release cycle the rest of the bugs will be discovered.

@jeffallen
Copy link
Contributor

I investigated this a bit, but need to stop for now. What I found is that when I turn on encoding/gob/debug.go, and then call Debug(bytes.NewReader("...the hex...")) the variable numFields in debug.go gets set to a too big number, undoubtedly due to the corrupted gob input.

I have not yet worked backwards to find out if there's a way that gob can tell that the number of fields it is being asked to keep track of is too big.

@dvyukov
Copy link
Member Author

dvyukov commented May 29, 2015

We know input size. If claimed number of entries is N and minimum entry size is S, can't we check that N*S <= DATA_SIZE?

@robpike
Copy link
Contributor

robpike commented Jun 1, 2015

Working as intended, if somewhat unfortunate.

The slices falsely encoded in these data are just under the cutoff for "too big". If they were slightly larger, they would be rejected outright.

In general, we can't compare the data length against the buffer size because the stream may contain type definitions, which mean some of the data will be in a second, as yet unread, message. We could put in a really complicated test for those cases that are guaranteed to be in this message, but since the goal here is hardening, that's pointless: other data types can still trigger the big allocation

@robpike robpike closed this as completed Jun 1, 2015
@dvyukov dvyukov reopened this Jun 2, 2015
@dvyukov
Copy link
Member Author

dvyukov commented Jun 2, 2015

The data claims to contain a type definition with 117507149 fields, the type should be followed by 117507149 field definitions, each field definition takes non-zero number of bytes, a 92-byte data can't possibly contain that.

@robpike
Copy link
Contributor

robpike commented Jun 2, 2015

That's not true in general. A slice can be stored in many messages, depending on its contents. Please leave this closed.

@robpike robpike closed this as completed Jun 2, 2015
@dvyukov
Copy link
Member Author

dvyukov commented Jun 2, 2015

@robpike How is it possible that a slice is stored in several messages? The code seems to just read in all slice elements from the provided buffer.

I don't understand why it does not make sense to sanity check claimed data size against real data size. If we have N elements with minimal wire size S, input data must be at least N*S bytes. Please elaborate.

@robpike
Copy link
Contributor

robpike commented Jun 2, 2015

Because the slice may contain interface fields that require a type definition.

@dvyukov
Copy link
Member Author

dvyukov commented Jun 2, 2015

Is it answer to the first question (slice stored in several messages) or to the second (checking of data size)?
If slice contains interface fields that require a type definition, what is huge here (slice len, single object, type definition or something else)?
I am not following you.

@robpike
Copy link
Contributor

robpike commented Jun 2, 2015

A slice can in general require as many messages as there are elements. Therefore it is not easy to say, this message is too short for the slice data. As I said above, it's feasible to check for certain slice types but not all, so I don't believe it's worth doing at all since it can't be done properly.

@jeffallen
Copy link
Contributor

Rob, have you already commented somewhere on the question, "how can I
safely use package x when the input is untrusted and I want to parse an
input without allowing it to consume an arbitrarily giant chunk of
memory"? We ran across this same question w.r.t. decoding maliciously
giant images (issue #5050 IIRC).
On Jun 2, 2015 8:18 PM, "Rob Pike" notifications@github.com wrote:

A slice can in general require as many messages as there are elements.
Therefore it is not easy to say, this message is too short for the slice
data. As I said above, it's feasible to check for certain slice types but
not all, so I don't believe it's worth doing at all since it can't be done
properly.


Reply to this email directly or view it on GitHub
#10490 (comment).

@robpike
Copy link
Contributor

robpike commented Jun 2, 2015

I have not.

@golang golang locked and limited conversation to collaborators Jun 25, 2016
@rsc rsc unassigned robpike Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants