cmd/go: [modules + integration] provide foreign content extension points #31326

Open
nim-nim opened this issue Apr 7, 2019 · 18 comments
Labels
modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@nim-nim

nim-nim commented Apr 7, 2019

This report is part of a series, filed at the request of @mdempsky, focused on making Go modules integrator-friendly.

Please do not close or mark it as duplicate before making sure you’ve read and understood the general context. A lot of work went into identifying problem points precisely.

Needed feature

Go needs to provide system extension points, either in the go.mod descriptor file or in another module-specific metadata file.

Constraints

The metadata file should allow declaring the replacement of an in-module:

  • directory path
  • file path
  • variable

… by some system-specific path or value.

Motivation

Source code is not sufficient to build complex software. It needs other files (documentation, legal files, content files, protobuf files, other language source files…).

Because go get is the only distribution tool (or the only familiar distribution tool) at the disposal of many Go developers, they will add all this foreign content (#31319) to their Go modules. However, while better than nothing, go get is not really suited to distributing foreign content.

At the last stages of system integration, when the integrator has detailed knowledge of the target OS and the necessary tooling to coordinate Go modules with other content providers, it is possible to relocate some of this content to the correct places in the filesystem, or even replace it with better versions. However, this is difficult to do without module surgery if Go modules do not provide an official mechanism for this kind of extension/replacement.

Applications:

  • move module documentation files to /usr/share/doc where humans can read them without unzipping a module file
  • move module legal files to /usr/share/licenses where legal audit scripts can find them
  • tell Go code to use javascript files managed by npm or yarn, instead of an embedded copy in the Go module file
  • point golang/x/mobile to the full system copy of Noto (all x Gigs of it) instead of the limited subset it embeds
  • move golang/x/image Go fonts to /usr/share/fonts, so Go developers can write documents that use the Go fonts, without needing to extract them manually
  • make all Go modules use the same protobuf files, instead of private copies in various stages of obsolescence

Some of those elements are quite bulky, and a system replacement will usually happen at the same time as their removal from the zip payload file. Removal also makes problems easier to diagnose, since you don't have to wonder which copy is in use at any given time.

@beoran

beoran commented Apr 8, 2019

Sorry, but it's not very clear to me what exactly you are suggesting. I suppose some directives could be added to the go.mod file, but which ones would you need and how would they work?

@nim-nim
Author

nim-nim commented Apr 8, 2019

@beoran

Just a directive that says to the compiler “while compiling with this module, replace references to this internal module directory path, internal module file path, or variable with those values”. This applies to anything that is not a code import: it is about foreign content, not Go code.

This way Go projects can continue to ignore system-specific file locations while coding, and system integrators can redirect them to the correct place at system integration time.

@beoran

beoran commented Apr 8, 2019

Sorry, but I still fail to see how that would work, and how that is different from a replace directive. Are you talking about an application's own usage of paths? Maybe you could illustrate your idea with a short example?

@nim-nim
Author

nim-nim commented Apr 8, 2019

It's not a replace because replace deals with Go package names and foreign resources do not obey Go package name rules.

The system filename or upstream filename of a foreign resource will not necessarily match the one inside the Go module (some upstreams are notorious for changing their naming regularly).

Other foreign resources can be split in many more files than the layout embedded in the Go module (full Noto, for example, will use a lot more files than a cut-down module-embedded version).

Other foreign resources, like protobuf, use PATH hierarchies, not simple directories, so pointing a Go module to an external canonical protobuf hierarchy involves replacing a PATH variable, not individual directory names.

So basically, in some cases you need to replace variables, not paths, and when you do need to replace paths, the replacement may need to cover the whole filename, not just the directory name.

At a minimum, the metadata file should permit a module author to declare a list of variables that can be replaced at integration time with another value (typically, protobuf PATHs).

Replacing foreign resource directories and filenames in non-variable mode is only interesting if it works like Python monkey-patching (i.e. it does not require an explicit declaration by the module author).

@rsc
Contributor

rsc commented Apr 11, 2019

There is already an established way to change the compiler's behavior on a given source file, one that Linux distributions make heavy use of: edit the source file before invoking the compiler. Why is that solution not applicable here?

@nim-nim
Author

nim-nim commented Apr 12, 2019

@rsc You can indeed do everything with patches, but that’s not saying much.

It is laborious to maintain a series of patches at a high quality level over a long span of versions and time. Sometimes, changing a location or a path-like option requires patching lots of different code points, if upstream didn't envision this change beforehand. You end up maintaining not a series of patches, but a brittle custom patch-generating script.

That’s why Linux distributions have a strong preference for getting fixes merged upstream as soon as possible, and change as little as possible in the upstream dependency graph in the meantime.

However, many Go upstreams, while friendly and aware that go get is not the right tool to distribute and update non-Go code:

  • do not want to bother mastering system or other language package managers,
  • are very reluctant to carry system-specific changes,
  • do not want to get into the business of defining custom extension mechanisms

This issue is an attempt to define a simple standard extension/handover mechanism, which is convenient for both parties:

  • the upstream Go project can go on making good Go code
    • forgetting about foreign non-Go supply chain problems
    • embedding a (possibly stale) copy of the foreign non-Go bits it needs inside its Go module
    • declaring in the module metadata, the variables that allow substituting this embedded copy with something else
  • the system integrator can correct the handling of foreign non-Go parts, without invasive changes to the upstream module. Correcting can mean:
    • relocating foreign content from the zip payload to the correct system location, or
    • using a better, more complete or more up-to-date copy of this content (for example, a js bundle managed by npm or yarn).

@nim-nim
Author

nim-nim commented Apr 12, 2019

And to give honor where honor is due: it’s a massively simplified variant of the foreign depends suggested by @perillo on golang-dev

@beoran

beoran commented Apr 12, 2019 via email

@nim-nim
Author

nim-nim commented Apr 12, 2019

@beoran I may eventually get there. But we need to find a way to move our hundreds of Go system components to Go modules first. That's why this report is at the very end of this list.

And the feature would not be useful just to distribution maintainers. Whenever we encountered problems managing the non-Go parts of Go projects, the Go upstreams I interfaced with lamented that there was no built-in Go feature to hand over foreign material. They wanted to be relieved of the burden of worrying about how to manage non-Go material within Go tooling.

@perillo
Contributor

perillo commented Apr 12, 2019

[...]

Needed feature

Go needs to provide system extension points, either in the go.mod descriptor file or in another module-specific metadata file.
[...]

Motivation

Source code is not sufficient to build complex software. It needs other files (documentation, legal files, content files, protobuf files, other language source files…).

Applications:

  • move module documentation files to /usr/share/doc where humans can read them without unzipping a module file
  • move module legal files to /usr/share/licenses where legal audit scripts can find them
  • tell Go code to use javascript files managed by npm or yarn, instead of an embedded copy in the Go module file
  • point golang/x/mobile to the full system copy of Noto (all x Gigs of it) instead of the limited subset it embeds
  • move golang/x/image Go fonts to /usr/share/fonts, so Go developers can write documents that use the Go fonts, without needing to extract them manually
  • make all Go modules use the same protobuf files, instead of private copies in various stages of obsolescence

IMHO what go tools need is a standard directory where assets (fonts, CSS, JavaScript files) are stored, so that tools know where to access them. And Go packages need an API to access these assets, using custom policies (find them from the user directory, or from system directories), so that an integrator only needs to change the policy to use.

I would like to have a data directory, that works in a similar way to the testdata directory.

There also should be a standard doc directory, where standard documentation is stored.

Finally, the license file should always be defined in a known file.

Alternatively a Go module can have an associated manifest file to declare where these files are stored.

A possible tool to access this data is go-install, which would install files in the system (/usr/share/<xxx>), where xxx should be specified when calling go-install, unless the full package/module import path is used.

@nim-nim
Author

nim-nim commented Apr 12, 2019

IMHO what go tools need is a standard directory where assets (fonts, CSS, JavaScript files) are stored, so that tools know where to access them. And Go packages need an API to access these assets, using custom policies (find them from the user directory, or from system directories), so that an integrator only needs to change the policy to use.

Of course, we have this standard directory setup system-side (at least each system evolves one after a while), and it would be nice to have the counterpart within Go modules, to map one to the other easily. This request is only about the mapping part; I didn't want to get into the business of prescribing any particular Go module internal layout.

As described in one of the messages, some things like protobuf PATH structures do not lend themselves to direct one-to-one mapping, you need to map a whole PATH variable, not individual directory paths.

@mdempsky
Member

How is this handled by other languages? E.g., C++, Java, Python, JavaScript, Rust, Swift?

For things like documentation and licensing, those seem like conventions that transcend languages and that it's reinventing the wheel for each language to redefine conventions. For example, https://github.com/licensee/licensee is a programming language agnostic solution to identify licenses within a package.

For things like tweaking file paths or protobufs, those seem very package-specific. I'm having a hard time imagining a solution general and flexible enough to handle those use cases that's substantially different from just patching the source, like @rsc suggested earlier.

Maybe you can give examples of how those problems would be addressed if the packages were written in a programming language other than Go.

@nim-nim
Author

nim-nim commented Apr 13, 2019

@mdempsky

Most other languages use one or several make utilities that allow:

  • conditionalizing builds,
  • using (or not) the embedded copy of foreign parts,
  • and, if not using it, setting where the system copy is located.

Go is relatively unusual in removing the make layer by default. That is very nice from a simplification point of view, but that also means the conditionalizing function needs to be done by the Go tooling layer.

One way to read this proposal is just the migration of conditionalizing options inside Go module metadata.

Because setting a huge number of options all the time is not fun, Linux systems have consolidated on common filesystem standards (FHS, XDG…). A lot of things can be assumed to be in a standard place and do not need explicit location passing. The existence of this standard directory structure is one reason devs dropped proprietary Unixes like hot potatoes as soon as Linux x86 systems were powerful enough. Proprietary Unixes never achieved this level of standardization.

One thing we will need to define Fedora-side, BTW, is the default location of the system GOPROXY directory (#31304); upstream guidance is welcome.

That is, however, a drag when porting software to less normalized systems (e.g. Windows, though Microsoft has been steadily fleshing out its own default directory structure in past years).

To limit even further the number of build variables that need to be set manually, the C/C++ guys have defined the pkg-config system. It allows a component to drop metadata in a standard place, with the variables needed to build against it and their local values. A lot of the standard metadata fields are C/C++ oriented, but the format permits freeform variables, so that's not limiting (and, let’s be honest, C and Unix are deeply intertwined, so it is natural that standardization efforts come from the C/C++ side).
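To make the freeform variable part concrete: a .pc file mixes name=value variable definitions, which can reference each other as ${name}, with keyword lines such as Name: or Libs:, and the real tool exposes a variable through pkg-config --variable=NAME PACKAGE. Below is a rough sketch of reading those variables in Go. The parsePCVars helper, the example package, and its protodir variable are hypothetical, and the parser is deliberately simplified (for instance, os.Expand also accepts $name, which pkg-config itself does not):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parsePCVars reads the "name=value" variable definitions from the text of a
// pkg-config (.pc) file and expands ${name} references to variables defined
// earlier in the file. Keyword lines such as "Name: foo" are skipped.
func parsePCVars(text string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, value, ok := strings.Cut(line, "=")
		name = strings.TrimSpace(name)
		if !ok || strings.ContainsAny(name, ": \t") {
			continue // a keyword line, not a variable definition
		}
		vars[name] = os.Expand(strings.TrimSpace(value), func(k string) string {
			return vars[k]
		})
	}
	return vars
}

// examplePC is a made-up .pc file exporting a freeform "protodir" variable.
const examplePC = `prefix=/usr
datadir=${prefix}/share
protodir=${datadir}/example/proto

Name: example
Description: hypothetical package exporting a freeform variable
Version: 1.0
`

func main() {
	vars := parsePCVars(examplePC)
	fmt.Println(vars["protodir"]) // prints "/usr/share/example/proto"
}
```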

pkg-config is used directly by modern languages that wish to integrate cleanly with others (for example, Rust, Python). Of course the level of integration varies from language to language: Java devs, for example, could never wrap their heads around integrating properly with non-Java things, which is why Java has been a dismal failure except inside specific environments that isolate poor Java devs from the rest of the software world (Java application servers, Android). And even in protected Java closets like application servers, one of the most common failure points/questions is “how do I access fonts, my Java code needs to render text”.

So basically:

  • level 0 of integration is monkey-patching access to foreign files and directories behind the Go developers' back, without any effort on their part.
    • it's a lot of technical work and I’m not aware of a language doing it that way.
    • but it may be "the Go way". A lot of Go design decisions try to do things for developers transparently.
    • this monkey patching could cover a lot of foreign content cases, but is not sufficient by itself, because some things do not lend themselves to transparent substitution
  • level 1, that pretty much all languages get to do at one point, is to define a mechanism to define and set build variables.
    • that is most of this issue, with level 0 as enhancement
  • level 2 is the same as level 1, plus
    • awareness at the language/dev level of default system locations
    • i.e. if GOOS=foo, set all those variables to those values by default, do not ask the builder, do not ask the developer
    • of course any default can get an explicit build override
    • quite often the location of something implies the location of its configuration files, so you do not need to pass more variables, they can be read in those configuration files
  • level 3 is the same as level 2, plus allowing the developer to declare:
    • the pkg-config files he wants to be read by default,
    • the build variables he wants read inside those
    • the pkg-config file he wants generated
    • variables he wants to export in this file to others
    • exported variable value may depend on the variables read in other files
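Level 2 can be sketched in a few lines of Go. The directory table below is illustrative rather than authoritative, and EXAMPLE_FONT_DIR is a hypothetical override name, not an existing Go or system variable:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// defaultFontDir returns a conventional system font directory for goos, or
// "" when no default is known. The table is an illustration, not a complete
// or authoritative list.
func defaultFontDir(goos string) string {
	switch goos {
	case "linux":
		return "/usr/share/fonts"
	case "darwin":
		return "/Library/Fonts"
	case "windows":
		return `C:\Windows\Fonts`
	default:
		return ""
	}
}

// fontDir applies the level 2 rule: an explicit override always wins,
// otherwise the per-OS default is used without asking anyone.
func fontDir(override, goos string) string {
	if override != "" {
		return override
	}
	return defaultFontDir(goos)
}

func main() {
	// EXAMPLE_FONT_DIR is a hypothetical override supplied by the builder.
	fmt.Println(fontDir(os.Getenv("EXAMPLE_FONT_DIR"), runtime.GOOS))
}
```

An empty result signals that the platform has no known default, in which case the caller would fall back to the embedded copy or require an explicit value.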

One way to handle all this would be to make the "build option" layer a pure metadata override of the Go module content, with no change to its payload zip. That, however, keeps an unused embedded copy of the foreign content inside the zip payload, which may be confusing, and is definitely inefficient for very bulky foreign content.

Another option would be to have go mod pack (#31302) generate a system-specific cut down variant of the module. I think that's what most languages end up doing. The inefficiencies of having multiple copies of the same material all over the place are just too great, even when you're not targeting embedded deployments.

@perillo
Contributor

perillo commented Apr 13, 2019

@mdempsky

Most other languages use one or several make utilities that allow:

  • conditionalizing builds,
  • use (or not) the embedded copy of foreign parts,
  • and if not using it set where the system copy is located.

I believe that GNU autoconf is used for this, not make. And I'm glad that Go does not use autoconf. Optional parts should, ideally, be supported by plugins.

To limit even further the number of build variables that need to be set manually, the C/C++ guys have defined the pkg-config system.

Note that cgo does support pkg-config. The problem is that many Go projects don't use it because:

  1. It is not available on all platforms supported by Go.
  2. Not all C/C++ libraries in an OS distribution have a pkg-config configuration file. I don't know about Fedora, but as an example I found that Archlinux does not have a .pc file for libmagic. Moreover, I'm not even sure if the pkg-config module names are standard between different platforms supported by pkg-config.

[...]

@nim-nim
Author

nim-nim commented Apr 13, 2019

@perillo autoconf is just one of the many build systems used to conditionalize builds. A lot of them are more modern and convenient than autoconf. But discussing various build systems is academic in a Go context: Go chose not to rely on external build layers, so the build functions are handled by the go tools themselves.

pkg-config names are standard and owned by the upstream project. So they are the same for all systems. The only thing that changes from system to system is their default location (and there is an env variable to point to the correct location if the guessed one is wrong). The variability in name or availability only happens when a project does not take ownership of its pkg-config file, forcing distributors to step in and provide and name it themselves.

And lastly, pkg-config is neither C/C++ specific nor used only to pass C/C++ related build information. It's a generic, language-independent framework for communicating build variables between projects. C/C++ projects use it most heavily, true, but that's only because C/C++ projects tend to care about good integration more than others. Historically, all the integrated system utilities were written in C/C++, and the authors of those utilities educated other C/C++ devs.

In the absence of pkg-config integration, the fallback is to set build variables manually, which sucks from an effort point of view, but works everywhere.

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 28, 2019
@tv42

tv42 commented Oct 26, 2019

One of the persistent nightmares of the C world was how much OSes/distros messed with what goes where and what's done how. The less of that we carry over to Go, the better. I for one consider e.g. Debian's attempt to package Go libraries as "golang-xxx-dev" just miserable.

There's a lot of Go/Node/Rust/etc code being written these days that is typically not installed via the old school linux distro way of putting some files in /usr/share, etc. From my end user/power user/ex-DebianDeveloper perspective, that fresh start seems to have really made many things simpler and more comprehensible. (I'd personally much rather take a distro with pervasive containers for system services than yet another "we have a global package database and everything in /usr" design.)

It would really help sell this issue if you split out specific use cases into their own issues, for example:

  • You seem to be asking for a way to replace a Go variable with a file, somehow?
  • You seem to be ignoring that many Go applications bundle their assets into the binary on purpose, to not depend on these kinds of file distribution mechanisms and their complexity. See e.g. fonts. Are you asking that there's a special version of the font library for your distro that does not bundle the fonts, but reads them from files? What happens when an unsuspecting gopher builds software using such a library and tries to run it in a minimal container?
  • You seem to be asking for a way to make all software "just use the same protobuf files", while those are actually used as input for code generation and not as part of a regular build (and globally coordinating protobuf schemas is a fallacy anyway due to never being able to eliminate version skew in real systems) -- also, how is placing the protobuf into a package/module and importing that not the answer, without inventing new mechanisms?

Right now the whole thing just comes across as big ball of vague "you are not autoconf, make, or C!". To which my personal answer is: Good!

@nim-nim
Author

nim-nim commented Nov 1, 2019

One of the persistent nightmares of the C world was how much OSes/distros messed with what goes where and what's done how.

One of the persistent nightmares of deploying software in production is the way foo language devs feel entitled to copy non-foo language elements into their projects and leave those elements in a dismal state, because they are foo language devs and do not really understand non-foo things.

Anyone who has had to audit the usual pile of CVE-ridden, legally questionable stuff that project devs like to accumulate will react the same way as distributions do. I don't care if the dev finds it convenient to copy things and let them rot. I am the person deploying the software in production. I own the systems that will get holed via those security holes. I can get dragged before a judge if I deploy fonts or other things without clear licensing.

So by all means let devs cut and paste mountains of things they do not care to triage correctly into their project code. But let us replace those things with checked and trusted elements before going into production.

Devs don't want to deal with the complexity of checking things properly (because that's where the complexity is, not inside distro packaging tech that any IT student can master in a month). That's why distributors exist. The day devs do the correct thing by default there will be no market for distributions.

@tv42

tv42 commented Nov 1, 2019

@nim-nim I think you've finally reached something concrete and actionable. A standardized way of using npm javascript modules as assets. I personally have both projects that embed just a few static files (e.g. grabbed off the project release or a CDN), and I have projects that run npm from go generate (and then embed the resulting bundle.js into the Go executable). For the former, versions are in the file names but not in a very standardized way; for the latter, it's a genuine package.json with all of the javascript infrastructure baggage that comes with that. I do consider changing those versions of both a development time decision, needing test runs etc, so I'm still not clear how you see those "extension points" behaving beyond "make a commit that bumps package.json and all dependent files".

But I would absolutely like to have better, more uniform, community standards for the above. (Doesn't sound like a compiler change, but just conventions for source layout.)

If you were to split that concrete thing from this vague issue, I'd support the concrete thing. I don't see what the above has to do with extracting licenses.

@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022

9 participants