Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: build: define standard way to recognize machine-generated files #13560

Closed
dmitshur opened this issue Dec 10, 2015 · 105 comments
Closed

Comments

@dmitshur
Copy link
Contributor

dmitshur commented Dec 10, 2015

Latest Status

Edit in 2018: This is a large issue with many comments (so many, that GitHub hides most of them by default). Here is the summary of the latest status.

This proposal has been accepted and implemented by Rob Pike. The final format can be seen here (it links to a comment in this thread by Rob Pike, with the final format that was chosen):

https://golang.org/s/generatedcode

By now, most of the generated Go code uses a comment that matches the format that's described there.

The original proposal is below.


Abstract

I propose Go creates a standardized format, which would enable code-generating tools to reliably communicate to humans and other machine tools that the output is in fact a generated file. Additionally, Go should add a recommended style for a simple code generated disclaimer (which satisfies the above criteria).

Proposed Definition

A file is considered to be "generated" if and only if the maintainer(s) of the project consider it a non-canonical source. In order to make long-term changes to such files, another source must be modified, and the file in question is then fully (re)generated by a reproducible machine tool.

A distinguishable property of generated files is that they can be deleted and re-generated with a zero diff.

Background

One of the strong values that Go brings are conventions and best practices that reduce bikeshedding, increase consistency and readability across diverse teams of Go programmers. Having a well defined convention, format, or standard for things that are unimportant to the key task, but need to have some value saves time.

During Gopherfest SV 2015, Rob Pike gave a talk Go Proverbs (video, bullet-point summary) that mentioned:

Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.

To expand on that, there are many examples for things that have have a recommend format/style in Go that let you simply reuse that and not force you (and other people) to invent your own style:

Description

There is one type of comment which is commonly used, but has no existing well-defined officially suggested style recommended by Go.

It is a comment that most tools that generate Go code tend to write somewhere at the top of the code.

There are currently many variations of such disclaimer headers in the wild, and they often vary insignificantly (in spacing, punctuation, etc.). New variations come to be when authors look at how other tools do this, see a large variance, end up picking their favorite and tweaking it.

Consider the following examples in the wild:

// generated by stringer -type Pill pill.go; DO NOT EDIT

// Code generated by "stringer -type Pill pill.go"; DO NOT EDIT

// Code generated by vfsgen; DO NOT EDIT

// Created by cgo -godefs - DO NOT EDIT

/* Created by cgo - DO NOT EDIT. */

// Generated by stringer -i a.out.go -o anames.go -p ppc64
// Do not edit.

// DO NOT EDIT
// generated by: x86map -fmt=decoder ../x86.csv

// DO NOT EDIT.
// Generate with: go run gen.go -full -output md5block.go

// generated by "go run gen.go". DO NOT EDIT.

// DO NOT EDIT. This file is generated by mksyntaxgo from the RE2 distribution.

// GENERATED BY make_perl_groups.pl; DO NOT EDIT.

// generated by mknacl.sh - do not edit

// DO NOT EDIT ** This file was generated with the bake tool ** DO NOT EDIT //

// Generated by running
//  maketables --tables=all --data=http://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt --casefolding=http://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt
// DO NOT EDIT

/*
* CODE GENERATED AUTOMATICALLY WITH github.com/ernesto-jimenez/gogen/unmarshalmap
* THIS FILE SHOULD NOT BE EDITED BY HAND
*/

This creates 2 problems.

  1. It's a problem for authors of code generator tools. Such authors need to spend time figuring out what format of a disclaimer header they want to use; there is no canonical standardized format.
  2. It's a problem for authors of tools that want to be able to use the information (in a helpful way) whether certain files are generated or not. There is no simple implementation that will catch all of the variations above, and there is no standardized machine-readable format that they can detect.

This leads to circular arguments and PRs/CLs. For example, see the discussion and the change itself at https://go-review.googlesource.com/#/c/15073/. It started from https://github.com/github/linguist/blob/473282d/lib/linguist/generated.rb#L241, which led to CL 15073. That lead to shurcooL/vfsgen@b2aab1c and shurcooL/go@43b2166. But the initial GitHub behavior came from protobuf disclaimers.

I've created the following func to try to answer the question if a file is generated. At this time, it uses heuristics and best-effort to tell if a file is generated. https://github.com/shurcooL/go/blob/master/analysis/generated_detection.go If there was a well defined requirement for tools to follow, this code can be made simpler and more reliable. Ideally, that helper should be moved into external library for people to reuse, and for generator tools that wish to be compliant to be able to use it for verification.

Goals

The goals of this proposal are twofold.

Primarily, to resolve the current impossibility of reliable communication between code generator tool output, and tools that try to determine if a file is code generated.

There should be a way for code generator tool authors to be able to express in their generated output that the file is generated, such that it's possible to reliably detect if a file is generated by other tools.

Secondarily, for code generator authors that simply don't care about what their disclaimer header looks like, provide a recommended style (that satisfies the first condition) template to use.

The implementation details should be defined in a design doc.

Non-goal

It is a non-goal to figure out how existing tools should choose to use or not use the fact whether a given file is generated.

There is some fear that if it's possible to determine if a file is generated reliably, then tools that display code differences will hide generated code differences. That is absolutely the choice of the tool, and in my opinion it should not enforce any behavior that users are unhappy with.

Having additional information (whether a file is generated or not) should enable tools to offer better user experiences - it should not cause tools offer worse experiences than currently.

This proposal focuses solely on enabling code generator tool authors that wish to use a standard disclaimer header to do without forcing them to invent their own format, and for tool authors that wish to make use of information whether a file is generated or not to be able to use that information as they wish. Details of how they do that is outside the scope.

Conclusion

By not standardizing a way for those two types of tools to communicate, it leads to ad-hoc solutions that are sub-optimal emerging, as can be seen above. Go has an opportunity to de-fragment this space and create a recommended standard format that will resolve the needs above, and allow people to migrate existing tools to use the specified format.

Once there's a standard, it's easy to begin updating existing tools towards it over time, and new generator/other tools can start relying on it.

Meta-disclaimer

I expect coming up with a recommended style may likely cause a lot of bikeshedding. However, I think it's a cost that's worth incurring, to go through this process once, so that we can avoid having to continuously suffer it while there's no standard at all. I personally don't care too much about what the actual format is (as much as I do about resolving the higher level problems described); I'm okay with whatever Go authors come up with. Any standard is better than no standard at all.

@dmitshur
Copy link
Contributor Author

This proposal comes from the result of discussion in CL 15073. /cc @bradfitz @dominikh @minux

@ianlancetaylor ianlancetaylor changed the title proposal: Standardize format for simple code generator disclaimer headers; enable a reliable machine-readable way to determine if a file is generated. proposal: standardize format for simple code generator disclaimer headers; enable a reliable machine-readable way to determine if a file is generated. Dec 10, 2015
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 10, 2015
@zellyn
Copy link

zellyn commented Dec 10, 2015

How about // DO NOT EDIT. Generated by $GENERATOR_NAME_OR_COMMANDLINE on the first line of the file?

@bradfitz
Copy link
Contributor

@zellyn, I think we need to agree that this is a good idea to standardize before we pick paint colors. :)

@zellyn
Copy link

zellyn commented Dec 10, 2015

@bradfitz Agreed. Just curious why the proposal didn't include a strawman proposal for format. :-)

@dmitshur
Copy link
Contributor Author

I think a good and simple way to approach it would be to add somewhere (Go style guide, or a more internal document) a section that says something like this:

If a code generator tool wishes to mark a file as "generated" in a way so it can be reliably understood by other tools to be generated, then it must satisfy these criteria:

  1. {{Simple rule to follow.}}
  2. {{Another simple rule that must be followed.}}

As an example, the following disclaimer will satisfy both the criteria above:

{{Example disclaimer template.}}

That way, people who don't want to be creative can just copy/paste that example disclaimer template and use it.

However, if someone wants to add more details or make their disclaimer different, they can still do that. They just need to follow rules 1 and 2 so that it can still be recognized by tools as a generated disclaimer header.

To detect if a given file is generated, you'd only need to check if those described rules are satisfied - no need for any other heuristics. (If there are some false negatives, then the generator can be updated to satisfy the rules, instead of muddying the detection algorithm.)

I think putting it that way makes this workable. I don't think it's possible to enforce a single header format that will satisfy everyone's needs/wants and have people be okay with it. But providing simple rules that the header will follow and giving an example would work splendidly.

@rakyll
Copy link
Contributor

rakyll commented Dec 11, 2015

/cc @hyangah for gobind generated files.

@adg
Copy link
Contributor

adg commented Aug 15, 2016

This proposal sounds like a good idea. We should use the same string that is recognized by GitHub.

@robpike
Copy link
Contributor

robpike commented Aug 15, 2016

Seems like a reasonable way to address the problem that IDEs that depend on tools are too fussy about generated code.

@robpike robpike self-assigned this Aug 15, 2016
@Kroc
Copy link

Kroc commented Sep 6, 2016

And what about people writing in languages that are not, er, English? :P
There could be syntax for "this is machine readable". Off the top of my head, a special form of comment; Rust does this for embedding tests within doc-blocks.

/// ... the triple slash
//! ... double-slash-bang

What actual machine-readable text should be used to denote a generated file is wider issue. This quickly explodes in scope and potential bloat in the future -- e.g. Rusts function decorations.

@dmitshur
Copy link
Contributor Author

dmitshur commented Sep 8, 2016

And what about people writing in languages that are not, er, English? :P

That's a valid point, and I think it should be considered.

We should use the same string that is recognized by GitHub.

The outcome may happen to be a string that is recognized by GitHub with their current code in linguist, but I respectfully disagree with this being a high priority criteria (if that's what was implied, I'm not sure).

As I mentioned in the proposal, what GitHub came to recognize as a Go generated file had a high degree of luck and variance in it (and circular arguments):

This leads to circular arguments and PRs/CLs. For example, see the discussion and the change itself at https://go-review.googlesource.com/#/c/15073/. It started from https://github.com/github/linguist/blob/473282d/lib/linguist/generated.rb#L241, which led to CL 15073. That lead to shurcooL/vfsgen@b2aab1c and shurcooL/go@43b2166. But the initial GitHub behavior came from protobuf disclaimers.

Specifically, see these two PRs that have shaped what GitHub recognizes today:

  1. https://github.com/github/linguist/pull/2152/files - Detect Go files generated by Protocol Buffers
  2. https://github.com/github/linguist/pull/2426/files - Detect Go files generated by go-bindata

Notice the motivation of those PRs (first one was to detect protobuf generated files, and the second was modeled after the first to detect go-bindata output). Also notice how easy it was to get them merged in.

Now, maybe we got lucky with the above sequence of events and what github recognizes today is a great format. But if not, I think it's a relatively low effort followup to submit a PR to GitHub's linguist – similar to those 2 PRs above – to make any necessary corrections after this proposal is resolved and Go has an officially recommended and recognized way to indicate that a file is generated.

Compare that with the alternative of Go using a potentially suboptimal mechanism for many, many years to come.

Seems like a reasonable way to address the problem that IDEs that depend on tools are too fussy about generated code.

robpike self-assigned this on Aug 15

I am very happy with that outcome and I trust whatever Rob Pike comes up with is going to be a great resolution for this issue.

@uluyol
Copy link
Contributor

uluyol commented Sep 8, 2016

And what about people writing in languages that are not, er, English? :P

I don't see this as a big issue. Autogenerated files are usually not meant for being read by people, and I don't know how someone would write Go code without being able to read English anyway.

@robpike
Copy link
Contributor

robpike commented Oct 16, 2016

Not happening in 1.8

@robpike robpike modified the milestones: Go1.9, Go1.8 Oct 16, 2016
@rakyll
Copy link
Contributor

rakyll commented Oct 16, 2016

Autogenerated files are usually not meant for being read by people

This is not always true, e.g. protoc output makes the majority of the generated code. Type definitions are very central and easily one the one of the mostly read parts of a code base.

@stanim
Copy link

stanim commented Oct 23, 2016

How about using the file extension ".gen.go" (or suffix _gen.go) for generated files, instead of standardising comments? This had two advantages:

  1. The format of the comment is irrelevant (and as such language independent)
  2. There is even no need to open the file to check if the file is generated, both by humans and by tools. This will also make tools faster (such as golint) which can skip the file immediately based on filename, instead of forcing them to open the file and inspect the source code for comments.

@dominikh
Copy link
Member

The benefit of standardizing on a comment is that fewer tools would need to be changed, as most tools already generate comments that can be matched by a common phrase. Virtually no tool, however, uses a gen suffix.

@robpike
Copy link
Contributor

robpike commented Oct 23, 2016

@stanim - That's a non-starter. Many generator tools, including multilingual ones, already make decisions about how to name their files. One I know personally is that the protocol buffer compiler uses .pb.go.

So a comment it will be.

@minux
Copy link
Member

minux commented Oct 25, 2016 via email

@Kroc
Copy link

Kroc commented Oct 25, 2016

Actually, I was just thinking today of suggesting that generated files could be zipped; i.e. ".gogz"; go doc could still read these, but they wouldn't be readable in a normal text-editor, a good way of hinting that you shouldn't be touching the file and that the contents are not important outside of the API (go doc, if used)

silbinarywolf added a commit to silbinarywolf/gml-go that referenced this issue Oct 21, 2019
* room: delete unused code and simplify

* alarm: move alarm code to its own package

* gmlgo: add SetSize to RoomInstanceIndex, add IsRunning to alarm package, update Camera to update after user code

* gmlgo: remove unused layer index code and add code to marshal binary data for Object and SpriteState

* gmlgo: update to use internal struct

* camera: add CameraSetUpdateFunction

* instance: start on InstanceRestore functionality (to bring old instances back)

* gmlgo: update serialization logic, change InstanceExists to not require the room to exist

* gmlgo: update camera

- update camera to render with the last created room if no instance following
- update camera code to not call follow logic if an update function is set
- fix state to ignore instances that were destroyed

* alarm: add gob serialization of alarm

* gmlgo: add "gmlgo build" command, remove unused "gmlgo fix" command code and start moving away from cobra package

* gmlgo: remove cobra and add simple tests

* gmlgo: change marshalling of structs to just use binary encoding (less payload size)

* gmlgo: add manual serialization to various object related structs

* gmlgo: more work on serialization

* gmlgo: start on generating serialization code

* gmlgo: more work on generating serialization code

* gmlgo: basic serialization code working

* gmlgo: add tests for build

* gmlgo: add test for serialization error case

* gmlgo: adjust code generation to only import certain packages if their needed

* gmlgo: fix generating serialization code for int type

* gmlgo: remove debug log functions

* gmlgo: fix DrawSpriteRotated in tests (headless) and fix tags being passed into "go build"

* gmlgo: update sprite updating to work like Game Maker Studio 2

* gmlgo: update asset system to be more flexible (allow nesting assets into sub-folders)

* gmlgo: fix config loading not working

* gmlgo: update code generation of asset IDs to generate into the "asset" folder

* gmlgo: fix code generation tests

* gmlgo: update assets to no longer auto-prefix, add error handling to stop duplicate asset names, add a test and rename assets in the example games

* gmlgo: update DrawText to take color parameter, fix backspace key and various other changes

* gmlgo: add support for "custom" asset folder

* gmlgo: fix example games

* gmlgo: add json tagging and add error-check to using UnsafeObjectTypeList

* gmlgo: minor fixes

* gmlgo: add MouseScreenPosition(), expose time functions and update text to draw relative to camera

* gmlgo: add basic support for custom assets in code generation

* go: add support for Go Modules and fix broken tests

* generate: fix go generate logic and paths provided to engine

* keyboard: fix backspace speed behaviour not resetting when backspace key is released

* modules: run "go mod tidy"

* draw: update draw text code so that smaller characters like "." and "_" are offset correctly and not drawn at the top

* travis: fix build by updating to latest go version (1.12.5)

* travis: remove sonarqube and remove manually downloading dependencies

* travis: add dependencies back

* generate: fix gmlgo_gen code to work use "packages" module for Go Modules support

* travis: attempt to fix build

* generate: update comments around generate

* file: change from deprecated gopherjs/gopherwasm/js package to syscall/js

* instance: fix deletion of entities to occur after GamePostUpdate and prevent crash if room was deleted before entity deletion is handled

* vet: resolve go vet warnings

* vet: resolve go vet warnings #2

* asset: add basic asset manager system

* asset: make asset loading use go routines / be async

this was done so that WebAssembly builds would download multiple assets at once, as loading is tied to the remote downloading of asset files

* asset: fix race condition that could occur if asset count was below 4 for sprites and don't initialize audio context if no audio is in the game

* time: rename timeprec to monotime

* debug: remove code to open now deprecated room editor

* asset: rename LoadAll to UnsafeLoadAll, to deter users from calling this in their own code

* window: add WindowSetSize, WindowSetScale

- remove WindowWidth, WindowHeight
- replace calls to WindowWidth with WindowSize().X
- replace calls to WindowHeight with WindowSize().Y

* room: improve error message

* draw: add DrawRectangleAlpha function

* gml: add GameEnd method

* instance: add InstancePauseAll method so that users can pause the game

* window: add WindowCursorVisible and WindowSetCursorVisible

* gmlgo: update gmlgo to use Short descriptions if you run "gmlgo" by itself

* gmlgo: start on publish command

* travis: fix xvfb

* travis: fix xenial

* travis: fix xvfb #2

* publish: more work on publish command

* publish: fix publishing relative directories

* travis: run gmlgo publish on examples/spaceship

* manifest: add generation of manifest.json file in debug mode

* manifest: update index.html to preload all asset files

* manifest: fix preload warnings in Chrome

* window: add IsBrowser will return true if the game is playing in a browser

* worm: fix sound not playing and add asset/manifest.json to .gitignore

* chore: update documentation, rename example/ to example/ and various other changes

* gmlgo: improve error messages with gmlgo publish

* gmlgo: improve error messages with gmlgo publish #2

* gmlgo: only publish for the current platform and web

* uid: add snowflake uid generation

* instance: Add InstanceFree function for freeing an instance without triggering its Delete method

* gmlgo: update "gmlgo generate" to follow standard generated code conventions

Standard generated code conventions are here
golang/go#13560 (comment)

* test: add code to ensure asset dir exists for tests

* test: add code so that tests can run headed

* test: update documentation around testing

* publish: improve error message

* camera: fix camera rendering jankily

* steam: add paniccatch to log out crashes to files (for debugging steam / production crashes)

* draw: add DrawSpriteCutMask method to allow for masking

* instance: simplify InstanceCreate code

* draw: remove DrawTextColor as it duplicated DrawText

* git-actions: add config

* mac: fix various bugs with Mac build

* git-actions: fix PATH to access /usr/local/go/bin

* git-actions: fix gl

* git-actions: fix xvfb

* git-actions: add libglfw3-dev

* git-actions: xvfb

* git-actions: fix actions

* git-actions: fix publish

* git-actions: try improve error messages

* git-actions: try improve error messages 2

* git-actions: disable publish as it fails

* git-actions: try setup windows

* git-actions: try fix windows

* git-actions: try fix windows

* git-actions: try fix windows 3

* git-actions: try fix windows 4

* git-actions: try fix windows 5

* git-actions: fix windows code coverage

* headless: fix headless

* git-actions: add macos

* git-actions: try fix macos

* git-actions: add gfx for macos

* git-actions: remvoe gfx for macos and add a note

* remove travis badge
thaJeztah added a commit to thaJeztah/docker that referenced this issue Nov 5, 2019
As described in https://golang.org/s/generatedcode, Go has
a formalized format that should be used to indicate that a
file is generated.

Matching that format helps linters to skip generated files;

From https://golang.org/s/generatedcode (golang/go#13560 (comment));

> Generated files are marked by a line of text that matches the regular expression, in Go syntax:
>
>     ^// Code generated .* DO NOT EDIT\.$
>
> The `.*` means the tool can put whatever folderol it wants in there, but the comment
> must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`,
> with a period.
>
> The text may appear anywhere in the file.

This patch updates the template used for our generated types
to match that format.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit to thaJeztah/docker that referenced this issue Nov 5, 2019
…rmat

As described in https://golang.org/s/generatedcode, Go has
a formalized format that should be used to indicate that a
file is generated.

Matching that format helps linters to skip generated files;

From https://golang.org/s/generatedcode (golang/go#13560 (comment));

> Generated files are marked by a line of text that matches the regular expression, in Go syntax:
>
> ^// Code generated .* DO NOT EDIT\.$
>
> The `.*` means the tool can put whatever folderol it wants in there, but the comment
> must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`,
> with a period.
>
> The text may appear anywhere in the file.

This patch updates the autogenerated code to match that format.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Nov 6, 2019
As described in https://golang.org/s/generatedcode, Go has
a formalized format that should be used to indicate that a
file is generated.

Matching that format helps linters to skip generated files;

From https://golang.org/s/generatedcode (golang/go#13560 (comment));

> Generated files are marked by a line of text that matches the regular expression, in Go syntax:
>
>     ^// Code generated .* DO NOT EDIT\.$
>
> The `.*` means the tool can put whatever folderol it wants in there, but the comment
> must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`,
> with a period.
>
> The text may appear anywhere in the file.

This patch updates the template used for our generated types
to match that format.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 3df4f86f21fbcae3535e2231828dce16a1940dbb
Component: engine
docker-jenkins pushed a commit to docker/docker-ce that referenced this issue Nov 6, 2019
…rmat

As described in https://golang.org/s/generatedcode, Go has
a formalized format that should be used to indicate that a
file is generated.

Matching that format helps linters to skip generated files;

From https://golang.org/s/generatedcode (golang/go#13560 (comment));

> Generated files are marked by a line of text that matches the regular expression, in Go syntax:
>
> ^// Code generated .* DO NOT EDIT\.$
>
> The `.*` means the tool can put whatever folderol it wants in there, but the comment
> must be a single line and must start with `Code generated` and end with `DO NOT EDIT.`,
> with a period.
>
> The text may appear anywhere in the file.

This patch updates the autogenerated code to match that format.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 6186e9fe8794660d669f19a2e2ba7127321b817f
Component: engine
tv42 added a commit to tv42/becky that referenced this issue Dec 30, 2019
efskap added a commit to efskap/statics that referenced this issue Mar 2, 2020
@golang golang locked and limited conversation to collaborators May 12, 2020
@rsc rsc unassigned robpike Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests