Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: encoding/pem: add DecodeStrict #34069

Closed
tux21b opened this issue Sep 4, 2019 · 10 comments
Closed

proposal: encoding/pem: add DecodeStrict #34069

tux21b opened this issue Sep 4, 2019 · 10 comments

Comments

@tux21b
Copy link
Contributor

tux21b commented Sep 4, 2019

The "encoding/pem".Decode function in the standard library was designed to find and decode PEM blocks within arbitrary text (like emails according to a comment by rsc). Therefore, it tries to find BEGIN blocks anywhere and if parsing fails, it automatically backtracks and tries to decode the next block. No errors are reported.

Nowadays, this function is mainly used to decode TLS certificates as also stated by the package comment. Within this context, it is really annoying to not see any errors at all. Lot's of server software is written in Go, and invalid characters within those files (including white spaces or a windows line-ending at a wrong location) usually causes the software to silently ignore the certificate.

I currently work at a consulting company and we regularly have reports from clients that can be traced back to silently ignored errors when decoding PEM files and until now, I didn't know about this rather unusually behavior in the standard library either.

Can we please add a "encoding/pem.DecodeStrict" function for the common case of decoding certificates? We should probably also add a warning to the Decode function as well, because I think it is often used wrongly.

What version of Go are you using (go version)?

$ go version
go version go1.13 linux/amd64
@ALTree ALTree changed the title encoding/pem: add DecodeStrict proposal: encoding/pem: add DecodeStrict Sep 4, 2019
@gopherbot gopherbot added this to the Proposal milestone Sep 4, 2019
@rsc
Copy link
Contributor

rsc commented Sep 12, 2019

Right now we have:

func Decode(data []byte) (p *Block, rest []byte)

There's no way to say "this is a malformed block". This was definitely a mistake in retrospect.

The proposal is to add

func DecodeBlock(data []byte) (*Block, error)

where data is exactly a PEM-encoded block, with no non-blank text before or after. This isn't as useful for certificate chains (sequences of PEM-encoded things).

This doesn't seem quite right either. Maybe a callback interface that is passed text chunks and decoded blocks and decode errors?

/cc @bradfitz @FiloSottile @agl;
thoughts about what the best new PEM decoding API would be?

@agl
Copy link
Contributor

agl commented Sep 12, 2019

PEM is intended to ignore everything that's not a PEM block. This is commonly used so that people can put comments around the, otherwise completely opaque, blocks and so that tools can extract them from text files etc. See https://pki.goog/roots.pem for an example of this. Adding a strict function would break this and misunderstands the point of PEM. Maybe PEM should have defined a clear comment syntax so that anything else could be rejected, but I'm afraid it didn't.

If there's single PEM block expected then not finding it is a clear indication of a problem. If there's an unknown number of PEM blocks expected, however, then the flexibility of PEM can be an issue. In that case, perhaps PEM isn't the right tool and you should just distribute binary DER files?

@rsc
Copy link
Contributor

rsc commented Sep 25, 2019

@agl, I agree with all that. But is it worth distinguishing "we found what looked like the beginning of a PEM block but not the end", or "the base64 data in the middle is not valid" or other kinds of problems? That is, should the API at least be able to distinguish "nothing is here" from "something starts here but is wrong"?

Maybe a Decode that returned all the blocks and an error would solve the error reporting problem and also relieve people of writing the loop?

@agl
Copy link
Contributor

agl commented Sep 26, 2019

That is, should the API at least be able to distinguish "nothing is here" from "something starts here but is wrong"?

That would be sub-setting the file format, but there might be utility in doing that. The cost is that reality is defined by the intersection of what's accepted by all the common tools: so if Go considers pattern x to be indicative of a problem, and something else considers pattern y to be so, then nobody can do either x nor y in practice. Also, since these behaviours will be obscure, parties might only find out that they're hitting these issues late, when things are expensive to fix etc.

But, if broken PEM files are causing issues, that's a cost too which has to be balanced against.

So I still believe that a “strict” interface would be a mistake because that maximally favours the latter interest, but the current code follows normal PEM rules, which maximally favour the former.

Off the cuff, I could see that finding the substring “--- BEGIN ” in the input, but not part of a recognised PEM block, would be the sort of pattern that might strike a balance. Also making things up on the spot, since this is only an issue when an unknown number of PEM blocks are expected in an input:

// DecodeSeveral parses zero or more PEM blocks from input and returns them.
// It also detects patterns that indicate that a PEM block might have been
// intended, but which weren't part of a valid PEM block, and returns a sorted
// slice of the line numbers on this these concerns were found. If this slice is
// non-empty, one may wish to indicate to a human that the PEM input may
// be ill-formed and direct their attention to the indicated lines.
func DecodeSeveral(input []byte) (blocks []Block, contraindicationLines []int)

@rsc
Copy link
Contributor

rsc commented Oct 2, 2019

Based on the conversation here it seems like there is not really a consensus that this is a problem that needs to be solved in the standard library, nor what the solution would be.

I would suggest to @tux21b to write and publish a helper package that does what you think would work best and see if others feel the same way and start using it. That would give us more signal for whether something in the core encoding/pem is worth adding.

For now, this seems like a likely decline.

Leaving open for a week for final comments.

@rsc
Copy link
Contributor

rsc commented Oct 9, 2019

No final comments. Declined

@rsc rsc closed this as completed Oct 9, 2019
@anitgandhi
Copy link
Contributor

anitgandhi commented May 19, 2020

I came across this as I was in need of the functionality outlined in this discussion.

Usecase

We provide a managed certificates product that supports bring-your-own-certificate, which is then attached to cloud resources, which may use OpenSSL under the hood, which has stricter PEM requirements.

(One can replication the OpenSSL behavior with openssl crl2pkcs7 -nocrl -out /dev/null -certfile chain.pem)

We saw a situation where a customer provided a certificate chain with multiple PEM blocks, one or more of which were invalid. Our existing Go based validation did not catch any errors, while OpenSSL down the line did. i.e. something like this:

-----BEGIN CERTIFICATE-----
<valid base64>
-----END CERTIFICATE-----[redacted@redacted redacted]# 
[redacted@redacted redacted]# 
[redacted@redacted redacted]# 
[redacted@redacted redacted]# 
[redacted@redacted redacted]# cat ca-bundle.pem
-----BEGIN CERTIFICATE-----
<valid base64>
-----END CERTIFICATE-----

-----BEGIN CERTIFICATE-----
<valid base64>
-----END CERTIFICATE-----

Note how the first certificate is otherwise valid, except the ending line, which makes it invalid; at which point, Decode just carries on - correctly skipping the terminal shell lines, and processes the last 2. As mentioned, OpenSSL is stricter, and errored out earlier.

Possible solution

I ended up creating a drop-in replacement that adds a new exported function DecodeStrict that returns an error on the first invalid PEM block found.

It moves all of the decode logic to an internal helper decodeWithErrorHandler otherwise unchanged. This helper takes in f which is a callback for custom error handling. Decode simply calls decodeWithErrorHandler(data, decodeError) , which maintains all existing behavior.

This wasn't exactly what was described above, nor is it necessarily something we want in the stdlib in its current form, but it was the smallest diff to get things working, and all existing tests pass as expected.

DecodeStrict as I implemented it is a bit of an all-or-nothing solution, so the API is still open to debate. But in the future if more granularity is needed, decodeWithErrorHandler could also be exported, and the callback could be extended in a way that allows for a more meaningful error to be bubbled up to the caller.

Proposal

Personally, I think this is worth solving in the stdlib rather than folks having to copy the implementation details of Decode and its helpers.

@ianlancetaylor
Copy link
Contributor

@anitgandhi This issue is closed, for the reasons given above. As @rsc suggests above, the path forward is to create a helper package and see if people start using it. Thanks.

@anitgandhi
Copy link
Contributor

Yeah I understand. I should have been clear that my intention was largely to provide a link and context to any future people that visit this issue; my bad on that.

Thanks!

@ianlancetaylor
Copy link
Contributor

@anitgandhi Ah, OK, thanks.

@golang golang locked and limited conversation to collaborators May 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants