Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/base64: document if (*Encoding).Decode arguments may overlap #56070

Open
Merovius opened this issue Oct 6, 2022 · 7 comments
Open
Labels
Documentation NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@Merovius
Copy link
Contributor

Merovius commented Oct 6, 2022

base64.Decoding.Decode accepts two byte-slices. As encoded base64 is never smaller than the decoded data, it would be possible to save allocations by giving the same argument for both - that is, to store the decoded data in the same slice as the original base64 data. If the encoding is invalid, this might of course result in garbage ending up in that buffer. But that is often fine.

The documentation doesn't explicitly state that this is allowed. Empirically, it seems to work, but I'd feel more comfortable doing it, if it was documented. Could we document that it is allowed and what happens if the user does it? If we want to preserve the flexibility of disallowing it, should we document that it is not allowed?

A similar question arises for

  • base64.(*Encoding).Encode, though in that case it seems less useful to do so, as the encoded data is likely larger than the input
  • encoding/{ascii85,base32.(*Encoding),hex}.{Decode,Encode}
@robpike
Copy link
Contributor

robpike commented Oct 6, 2022

I would rather document that they should not overlap, to give more freedom to the implementation.

@mvdan
Copy link
Member

mvdan commented Oct 6, 2022

@robpike what kind of freedom are you thinking about? If we're talking about room for future performance improvements, the existing code allowing the overlap allows saving an allocation, which is in the same camp. Perhaps you're thinking of something like vector instructions?

@Merovius
Copy link
Contributor Author

Merovius commented Oct 6, 2022

As a potential compromise, cipher.BlockMode.CryptBlocks documents:

Dst and src must overlap entirely or not at all.

This allows to efficiently check if the arguments are overlapping. An implementation could then fall back on a slower path or allocate a temporary buffer, as appropriate. i.e. if we disallow overlapping arguments, the user always has to allocate a temporary buffer, but with this restriction, they could let the implementation choose whether or not it needs that.

@cagedmantis cagedmantis changed the title encoding/base64: Document if (*Encoding).Decode arguments may overlap encoding/base64: document if (*Encoding).Decode arguments may overlap Oct 6, 2022
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 6, 2022
@cagedmantis cagedmantis added this to the Backlog milestone Oct 6, 2022
@cagedmantis
Copy link
Contributor

cc @rsc

@twmb
Copy link
Contributor

twmb commented Oct 10, 2022

For both encoding and decoding, I've relied on the current implementation many times in somewhat performance sensitive code paths. If this officially is documented as disallowed, can it come with a corresponding vet check to allow checking and fixing affected codebases?

@mvdan
Copy link
Member

mvdan commented Oct 11, 2022

@twmb that would likely not meet the "frequency" criteria of vet, unless we can show that many Go users are really relying on this undocumented behavior.

@twmb
Copy link
Contributor

twmb commented Oct 11, 2022

For context, I only started using the pattern after I saw it in other code. It's something I've found quite nifty and I've reused it since.

My worry is more that the documentation is changed but the implementation isn't. A few years go by and maybe only then, vectorized instructions are introduced. How much code will this accidentally break then? I doubt people are really re-reading the documentation of base64's Encode / Decode functions, and I doubt that all code out there tests that base64 encoding/decoding is working the way the user expects. Why test something that the standard library is already testing itself, and how often am I looking at code (as a maintainer) that I wrote 2+ years ago?

I am fairly certain that at least one area I used the current code in the past is tested because the result was used in another computation that I then unit tested. I'm not sure if every area I've used the current code is fully tested around base64 changing its behavior. I don't have access to all code bases anymore, and I doubt some areas of former codebases are being revisited because they were (at the time) tested well enough and full stable. I'm worried about upgrading Go silently changing behavior in unexpected areas that aren't tested and likely wont be noticed until it's too late.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants