proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format. #35947

Closed
ohir opened this issue Dec 3, 2019 · 13 comments
Labels: FrozenDueToAge, Proposal, Tools

Comments

@ohir

ohir commented Dec 3, 2019

Proposal: GORDO enriched Go documentation format.

Author: Ohir Ripe [Wojciech S. Czarnecki]

Last updated: 2020/01/24

Discussion at https://golang.org/issue/35947

Related to: #7873, #16666, #35896, #18342, #25444 and other "rich format please" issues.

Abstract

GORDO (dʒɔrˈdo) stands for GO Rich DOcs

This proposal is an attempt to make the godoc ecosystem robust enough to be a single documentation method that can also serve end-user programs and production services.

Background

The current state of Go's source documentation processing is good enough for documenting single implemented things, i.e. functions, variables, constants. It falls short if one must convey a new idea, a non-obvious implementation of an algorithm, or even just describe a sequence of events (no lists, sadly).

The godoc heuristic does not allow overall (package) docs to be kept close to the source, as parts of docs from different files are merged in the lexical order of the source filenames. This makes it almost impossible to document a chunk of API in the very file that defines it. (This proposal tackles this with "refid" identifiers that can be put on documentation parts and then used to provide merging order and in-text references.)

Proposal

I propose lightweight annotations that let the author add styling and structure hints to plain text documentation. Gordo annotations use 11 non-ascii characters that can be entered as ascii digraphs led by a semicolon:

 ┌───────────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬───┐
  character:    ˘    ´    ¨    ˉ    °    «    »    þ    ¶    §    •  esc
    digraph:   ;b   ;/   ;'   ;-   ;.   ;[   ;]   ;t   ;p   ;s   ;l   ;;
 └───────────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴───┘

(Users accustomed to chords may configure translation via a GORDOIC environment variable. See previous revisions for an elaborate description of the available entry methods.)

Translation is done by gofmt; godoc then recognizes and interprets these 11 characters according to the specification laid out hereafter.
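
The translation step can be pictured with a minimal Go sketch. It is not part of gofmt; the function name `translateDigraphs` and the table variable are hypothetical, and the handling of the `;;` escape follows one reading of the "escapes" section (a doubled semicolon yields a dismiss character and the digraph right after it is copied untranslated).

```go
package main

import (
	"fmt"
	"strings"
)

// digraphs maps the second byte of an ascii digraph (after the leading
// semicolon) to the gordo character listed in the table above.
var digraphs = map[byte]string{
	'b': "˘", '/': "´", '\'': "¨", '-': "ˉ", '.': "°",
	'[': "«", ']': "»", 't': "þ", 'p': "¶", 's': "§", 'l': "•",
}

// translateDigraphs is a hypothetical helper, not an existing gofmt function.
// It rewrites ;x digraphs into gordo characters and leaves unknown ones alone.
func translateDigraphs(src string) string {
	var out strings.Builder
	for i := 0; i < len(src); i++ {
		if src[i] != ';' || i+1 >= len(src) {
			out.WriteByte(src[i])
			continue
		}
		next := src[i+1]
		if next == ';' {
			// Escape: emit a dismiss character and copy the digraph
			// that follows, if any, without translating it.
			out.WriteString("°")
			i++
			if i+2 < len(src) && src[i+1] == ';' {
				out.WriteString(src[i+1 : i+3])
				i += 2
			}
			continue
		}
		if r, ok := digraphs[next]; ok {
			out.WriteString(r)
			i++
			continue
		}
		out.WriteByte(';') // not a digraph, keep the semicolon
	}
	return out.String()
}

func main() {
	fmt.Println(translateDigraphs(";'bold;' and ;/italics;/"))
	fmt.Println(translateDigraphs("literal ;;;. digraph"))
}
```

Working on bytes is safe here because the semicolon never occurs inside a multi-byte UTF-8 sequence, so text that already contains gordo characters passes through unchanged.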

styling

 °  degree         °escape || back to normal   aka "dismiss" char
 ´  acute          ´italics´       ´italics°   𝑖𝑡𝑎𝑙𝑖𝑐𝑠
 ¨  diaeresis      ¨bold¨             ¨bold°   𝐛𝐨𝐥𝐝
 ˘  breve          ˘ibold˘    ˘bold+italics°   𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔
 ˉ  macron         ˉfixedˉ           ˉfixedˉ   fixed width span
 «» guillemets     «notable or related text»   p͜a͜y͜ ͜a͜t͜t͜e͜n͜t͜i͜o͜n͜ span

An emphasis (styled text) begins after either acute, diaeresis, or breve character - none followed by a degree - and ends at a breve, acute, or diaeresis of the other emphasis' start, or this emphasis stop. It ends also at a macron, at a left guillemet, or at a degree "dismiss" character. The 'fixed' and 'notable' spans begin and end only with their respective special characters so other three emphases can be used inside. An empty line ends all running emphases and spans.

Editing software may apply styles while keeping the syntax visible. In the final form a style is applied and syntax characters are hidden.

accessibility

For screen-reader users, the document author can make a style convey a semantic hint.
Aria labels are introduced in the form of a short list with items starting with bullet-style digraphs.

In this document styles mean:

   •´ cited from other text´
   •¨ endpoint name¨
   •˘ call parameter˘
   •ˉ codeˉ

Sighted users will see this rendered as a bulleted list with styled items; non-sighted users will hear either a label text or an audible hint when the reader enters a labelled region. Note that regions are marked in the source, hence accessibility tools will be more useful at the terminal, too.

the refid

A short string identifier that can be attached to a section, paragraph, or quotable span:

 §  section        quotable section head   §(refid)
 ¶  pilcrow        quotable paragraph lead ¶(refid)
 »  rguillemet             « quotable span »(refid)

Refid strings are used to identify parts of the main documentation that can then be referenced elsewhere. A refid-tagged part can then be quoted, linked to (in html output), and searched for by the go doc tool. Refids should not resemble godoc-searchable identifiers of the package's code, as the go doc tool should be able to display the part of documentation pointed to by a refid. Refids should be short but informative.

structure

 «' lguillemet     quote here a text span, heading or item:
                       «'refid'    'quote in apostrophes'
                       «"refid"    "quote in double quotes"
                       «(refid)     use no quote characters

The «"refid" quote, an internal link token, always outputs its target's text between the quotation marks seen after the «, or without any if the parenthesized «(refid) form was used. Console output always prints the refid in parentheses after the quotation; the html version outputs the quoted text as a link to the place of origin instead. E.g. the source of:

    Annolex Editor  §(Sect 2)
    ... Please read «"Sect 2" for the primer.

 should output on the console:

    Annolex Editor (Sect 2)
    ... Please read "Annolex Editor" (Sect 2) for the primer.

 but in html it is expected to output a link:

    ✻ Annolex Editor
    ... Please read "͟A͟n͟n͟o͟l͟e͟x͟ ͟E͟d͟i͟t͟o͟r" for the primer.
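
The console rendering of the example above can be sketched in Go as follows. This is only an illustration under stated assumptions: `renderConsole` and both regular expressions are hypothetical names, only section heads tagged with §(refid) are collected as quote targets, and unknown refids are left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headRe matches a line that ends in a §(refid) tag.
var headRe = regexp.MustCompile(`^(.*?)\s*§\(([^)]+)\)\s*$`)

// quoteRe matches the «"refid", «'refid' and «(refid) quote forms.
var quoteRe = regexp.MustCompile(`«(?:"([^"]+)"|'([^']+)'|\(([^)]+)\))`)

// renderConsole is a hypothetical console renderer for refid quotes.
func renderConsole(src string) string {
	lines := strings.Split(src, "\n")
	targets := map[string]string{}
	// First pass: remember tagged heads and print the refid in parentheses.
	for i, ln := range lines {
		if m := headRe.FindStringSubmatch(ln); m != nil {
			targets[m[2]] = strings.TrimSpace(m[1])
			lines[i] = m[1] + " (" + m[2] + ")"
		}
	}
	// Second pass: replace each quote token with the target's text.
	for i, ln := range lines {
		lines[i] = quoteRe.ReplaceAllStringFunc(ln, func(q string) string {
			m := quoteRe.FindStringSubmatch(q)
			refid, mark := m[3], "" // «(refid): no quote characters
			if m[1] != "" {
				refid, mark = m[1], `"`
			} else if m[2] != "" {
				refid, mark = m[2], "'"
			}
			text, ok := targets[refid]
			if !ok {
				return q // unknown refid, leave the token as written
			}
			return mark + text + mark + " (" + refid + ")"
		})
	}
	return strings.Join(lines, "\n")
}

func main() {
	src := "    Annolex Editor  §(Sect 2)\n" +
		"    ... Please read «\"Sect 2\" for the primer."
	fmt.Println(renderConsole(src))
}
```

An html renderer would differ only in the second pass, where the quoted text would be emitted as a link to the tagged place instead of being followed by the refid.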

lists

 •  bullet         •  bulleted list item
 •a                a) lettered list item
 •1                1. numbered list item
 þ  thorn        see: link/url list item
  • List items need to be given without blank lines in between.
  • List ends at an empty line as any other gordo introduced styling.
  • List items are recognized as such even if user-indented.
  • Console output imposes uniform indentation of lists.
  • Gofmt may impose uniform indentation of consecutive list items in the source.
    (Other gordo processors may allow for nesting though).
  • List item start (bullet or thorn) is recognized as such only if placed as the first printable in a line and followed by a space.

external links

 »þ        « link description »þ          // text description of
             þ somesite.tld/path/tolink   // an url listed below

External links are introduced via the « note ending in a »þ digraph. The url path — without protocol — must be given as an url list item (þ) in the last line of the paragraph. This line can be indented. Up to three »þ references can be present in a single paragraph, then all their respective url paths are given in separate lines below:

  in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.
     þ www.ieee-itss.org/oj-its
     þ www.ivt.ethz.ch

The final form of the output, including the hypertext protocol used, is defined by the gordo processor. This specification only mandates that the plain text renderer (if used at all) removes gordo special characters and any superfluous space left after this removal, including spaces following the « of a notable or link description span. Also, links rendered under the sentence should be given a numerical index and be prefixed with the protocol:

  in our IEEE-ITSS Open Journal¹ and also on our faculty² site.
     ¹ https://www.ieee-itss.org/oj-its
     ² https://www.ivt.ethz.ch
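
A plain text renderer for such a paragraph might look like the Go sketch below. The function name `renderPlainLinks` and the regular expressions are hypothetical, and the `https://` prefix is an assumption taken from the example above, since the proposal leaves the protocol to the gordo processor.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkRe matches a « link description »þ span.
var linkRe = regexp.MustCompile(`«\s*([^«»]*?)\s*»þ`)

// urlRe matches a (possibly indented) þ url list item.
var urlRe = regexp.MustCompile(`^(\s*)þ\s+(\S+)\s*$`)

// superiors are the indices for at most three links per paragraph.
var superiors = []string{"¹", "²", "³"}

// renderPlainLinks is a hypothetical plain text renderer for one paragraph.
func renderPlainLinks(par string) string {
	refs, urls := 0, 0
	lines := strings.Split(par, "\n")
	for i, ln := range lines {
		if m := urlRe.FindStringSubmatch(ln); m != nil {
			if urls < len(superiors) {
				// Keep the indentation, index the url, add the protocol.
				lines[i] = m[1] + superiors[urls] + " https://" + m[2]
				urls++
			}
			continue
		}
		lines[i] = linkRe.ReplaceAllStringFunc(ln, func(s string) string {
			if refs >= len(superiors) {
				return s // more than three references: left as written
			}
			m := linkRe.FindStringSubmatch(s)
			refs++
			return m[1] + superiors[refs-1]
		})
	}
	return strings.Join(lines, "\n")
}

func main() {
	par := strings.Join([]string{
		"  in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.",
		"     þ www.ieee-itss.org/oj-its",
		"     þ www.ivt.ethz.ch",
	}, "\n")
	fmt.Println(renderPlainLinks(par))
}
```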

Gordo processor can be configured on public www sites to render external links as indexed plain text urls to prevent link-spam.

table of contents, in order

A manual TOC is introduced either by a heading that starts with the "TOC" string, or by one that has the "toc" refid set:

TOC — Table of Contents
Sisällysluettelo §(toc)

Manual TOC entries, in the form of •§ or •¶ digraphs followed by a refid, are used to provide a display order. This allows documentation parts to be written close to the relevant code. Any section or paragraph not listed in a manual TOC is added at the end of the generated TOC under the "Misc" top level heading.

   •§ refid         // a section head,    at the main level
   •¶ refid         // a paragraph lead,  at a subsection level
   •¶ "with spaces" //   use quotes if refid contains space

The rest of the line after refid is reserved for documentation housekeeping.
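
How a processor could read toc lines and impose their order on dispersed documentation parts is sketched below in Go. All names (`Part`, `parseTOCLine`, `orderByTOC`) and the sample file names are hypothetical; the sketch ignores the housekeeping tail of a toc line and simply collects unlisted parts for the "Misc" heading.

```go
package main

import (
	"fmt"
	"strings"
)

// Part is a hypothetical documentation chunk tagged with a refid.
type Part struct {
	Refid string
	File  string
}

// parseTOCLine reads a "•§ refid" or "•¶ refid" entry. The refid may be
// double-quoted if it contains spaces; the rest of the line is ignored.
func parseTOCLine(line string) (kind rune, refid string, ok bool) {
	s := strings.TrimSpace(line)
	switch {
	case strings.HasPrefix(s, "•§ "):
		kind, s = '§', strings.TrimPrefix(s, "•§ ")
	case strings.HasPrefix(s, "•¶ "):
		kind, s = '¶', strings.TrimPrefix(s, "•¶ ")
	default:
		return 0, "", false
	}
	s = strings.TrimSpace(s)
	if strings.HasPrefix(s, `"`) {
		end := strings.Index(s[1:], `"`)
		if end < 0 {
			return 0, "", false
		}
		return kind, s[1 : 1+end], true
	}
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0, "", false
	}
	return kind, fields[0], true
}

// orderByTOC returns parts in TOC order, plus the unlisted parts that
// would be placed under the "Misc" heading.
func orderByTOC(tocRefids []string, parts []Part) (ordered, misc []Part) {
	byRefid := make(map[string]Part, len(parts))
	for _, p := range parts {
		byRefid[p.Refid] = p
	}
	listed := make(map[string]bool)
	for _, id := range tocRefids {
		if p, ok := byRefid[id]; ok && !listed[id] {
			ordered = append(ordered, p)
			listed[id] = true
		}
	}
	for _, p := range parts {
		if !listed[p.Refid] {
			misc = append(misc, p)
		}
	}
	return ordered, misc
}

func main() {
	parts := []Part{{"api", "api.go"}, {"extra", "util.go"}, {"intro", "doc.go"}}
	ordered, misc := orderByTOC([]string{"intro", "api"}, parts)
	fmt.Println(ordered, misc)
}
```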

The TOC list need not be consecutive. It is ok to have subheadings or even paragraphs of text between parts of the list. (E.g. to have the TOC divided by "experimental", "staged", "stable", and "deprecated" headings. Then the docs maintainer may simply move a toc line between sections to mark its current status.)

The TOC imposing order on dispersed chunks of documentation is the crux of this proposal.

With this implemented a documentation maintainer can be a separate role, and her edits go to the single file while many individual developers may write docs for their code only. Structure, distinguished spans and refids all are means for that ultimate goal. Styling is just a useful byproduct. One that completes the professional documentation process.

docs housekeeping

This should be the subject of a separate proposal but is provided here to explain the reserved space of the toc-line.

During gofmt processing of the file that contains the TOC, toc lines are amended with a relative path to the file where refid was declared, a hash of code, and hash of related doc-comment. These hashes and paths are then checked by the local godoc instance. If (computed now) hash of code does not match one in the toc, and (computed now) hash of the doc-comment still matches, it is a strong signal that documentation diverged from the code (code was edited but its documentation was not). Generated output may then inform reader that documentation is possibly outdated.
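
The divergence check described above reduces to comparing two pairs of hashes. The Go sketch below shows the idea only; the `TOCEntry` type, the function names, the sample path and the choice of a truncated SHA-256 are all assumptions, as the proposal does not name a hash function or a record layout.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// TOCEntry is a hypothetical record of what gofmt would append to a toc line.
type TOCEntry struct {
	Refid, Path, CodeHash, DocHash string
}

// shortHash returns a truncated SHA-256 digest (the hash choice is an assumption).
func shortHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:8])
}

// newEntry records the hashes at the time the TOC file is formatted.
func newEntry(refid, path, code, doc string) TOCEntry {
	return TOCEntry{refid, path, shortHash(code), shortHash(doc)}
}

// possiblyOutdated reports the "strong signal" case: the code hash no longer
// matches the recorded one while the doc-comment hash still does.
func possiblyOutdated(e TOCEntry, code, doc string) bool {
	return shortHash(code) != e.CodeHash && shortHash(doc) == e.DocHash
}

func main() {
	code := "func Open(name string) error { return nil }"
	doc := "// Open opens the named store."
	e := newEntry("open", "store/open.go", code, doc)
	fmt.Println(possiblyOutdated(e, code, doc))              // nothing changed
	fmt.Println(possiblyOutdated(e, code+" // edited", doc)) // code changed only
}
```

When both hashes differ, the sketch reports nothing, on the assumption that the documentation was edited together with the code.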

toc-bar

A lone section heading with a refid of "toc-bar C" will output the (html) TOC as a block separated by the character C. E.g. §(toc-bar ⬩) for this document would produce:

Abstract ⬩ Background ⬩ Proposal ⬩ Rationale ⬩ Compatibility ⬩ Implementation ⬩ Open issues ⬩ post scriptum

Order of the bar items is set by the §(toc) section.

console -toc

TOC and "toc-bar" sections are elided from the go doc -all tool output. The separate -toc flag lists all refids, and these refids can be used to select appropriate part of main documentation to show. Refids of places normally are printed in parentheses on the console, so user can follow them in the next invocation of go doc tool. Where output format allows for hypertext (linking), the manual TOC entries should be displayed though.

escapes

  • Doubled semicolon lead is always translated to a single dismiss that
    immediately disables translation of a next digraph:
    ;;;; => °;;, ;;;. => °;.
  • Any special character doubled is ordinary: As bolded ¨under 20°°C¨
  • One or more special characters following a dismiss character are ordinary:
    single macron: °ˉ, a digraph °»þ, or superiors °¹²³.
  • The "escape" function of dismiss character has higher priority than "end of style":
    ¨bolded °«¶ digraph¨
  • Degree character that has nothing to dismiss or escape is ordinary.
  • Degree character does not output if it has already been used to dismiss or escape.

Of all possible gordo "specials":

   °    ´    ¨    ˘    ˉ    «    »    •    þ    ¶    §   ´   ¨   ˘   ˉ   •   þ
  ;.   ;/   ;'   ;b   ;-   ;[   ;]   ;l   ;t   ;p   ;s   ¹   ²   ³   ¦   ¤   …
  •1   •a   •¶   •§   «'   «(   «"   «.   »þ   ¶(   §(  •´  •¨  •˘  •ˉ  »(

only guillemets and superior numbers must be escaped, and the degree if styled. Other escapes are unlikely to be needed except for gordo-related docs.

The items • ¤ … þ need an escape only if they come first in a line and are followed by a space. Section and paragraph signs outside of their digraphs are ordinary. The Icelandic þ may never come before a space, and the Old English script is not common in technical docs. Nor are gordo digraphs used in natural languages. None of the ascii digraphs is valid Go code, either. That leaves the styled degree, guillemets, and superior numbers ¹²³.

The «. digraph itself is an escape for a notable span that must start with one of "'(. Use two dots for a span that should begin with a dot: «.. dot-led notable span».

Rationale

Documentation that can be styled, even with only bold and italics, and that can be structured to fit the domain, may help package authors to be more precise and unambiguous, and help documentation consumers to avoid misunderstandings. Today, Go packages of even middling complexity often resort to external descriptions of their algorithms and api.

This is not because their authors love to use yet other doc tools and are eager to do the chores of keeping them synchronized. It is the lack of godoc capabilities that restricts godoc use to the standard libs, or at best to the general-purpose Go libraries consumed by other Go code. For lack of rudimentary emphases alone, godoc-compliant documentation sources cannot be used to create user-facing documentation if said user is not expected to be a Go programmer.

This needs to change, as Go now is used to build really huge systems. End-users — admins and api-consuming developers — need documentation that is easy to browse and reflects all changes made to the just staged product.

Gordo allows package level documentation to be kept close to the code it describes and gives the author more control over its shape and the placement of its parts. This should make it easier to maintain well-structured documentation that is placed in the most relevant file and updated as the related code changes.

Compatibility

Gordo uses no semantic constructs that can be mistaken for technical text written in any language, neither natural nor formal. Out of all gordo "specials" only a few seldom-used non-ascii characters (degree, guillemets and three superscript numbers) may need to be escaped.

Nonetheless, as this proposal extends the documentation source syntax and the methods of parsing it, there is a minuscule but non-zero possibility that the gordo translation step may alter the visible html output of some existing documentation.

Even if this happened, such a change would likely affect only font decoration or size and would not affect the meaning.

Implementation

Enabling gordo annotations would need support from both gofmt and godoc. While implementation of basic formatting could be trivial, the real power of the proposed format and methods lies in the ability to make documentation both easy to skim at the console and usable as an interactive manual in the browser. The latter also needs working internal links between "quotable" and "quote" places. Implementing this might need more resources, as might implementing the toc-based documentation checks. But this work may benefit the Go ecosystem as a whole and allow us to keep a single source of truth for both an external (e.g. grpc) api and the code implementing it.

post scriptum

Someone whom I respect confessed recently:

I remember thinking that changing fmt.printf to fmt.Printf in my code was ugly, or at least jarring: to me, fmt.Printf didn’t look like Go, at least not the Go I had been writing. [...] I got used to it, and now it is fmt.printf that doesn’t look like Go to me.

Gordo may look unusual at first sight but I hope its syntax will soon be regarded as comfortable. Unlike the styling syntax of markdown and other markups used only to generate html, gordo stylings are barely noticeable in the source, unless the reader is wilfully scanning for the formatting hints. Structure annotations are the converse: they are concise but stand out on the console.


Revisions

  • r2 [16 December 2019]
    • make ¹²³ as styling surrogates with default GORDOIC=us map enabling many
      national layouts' users to type gordo styling without learning new chords.
    • fix section/paragraph swap (US/EU differences kicked in)
    • explain that authors need almost no characters escaping
    • escape by prefix, so the parser need not look back
    • add unix xmodmap for us-ansi layout users
    • explain functionality of a toc section
    • degree is a dismiss by itself now
    • more elaborate Rationale
    • concise chords table
    • post scriptum added
  • r3 [23 January 2020]
  • r4 [24 January 2020]
    • Promote ascii digraphs to be a main entry method.
    • Remove most of the text related to entry methods and keyboard.
    • Add stress to the "ordering by toc" importance

@gopherbot added this to the Unreleased milestone Dec 3, 2019
@gopherbot added the Tools label Dec 3, 2019
@taruti
Contributor

taruti commented Dec 4, 2019

There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Also using accents in formatting does not make the documents very readable in my personal opinion.

@cagedmantis changed the title from "x/tools/cmd/godoc: GORDO enriched Go documentation format." to "proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format." Dec 4, 2019
@rsc
Contributor

rsc commented Dec 4, 2019

Go docs are meant to be unobtrusive plain text. Obscure Unicode markup does not count as plain text. When reading your example, I did notice the "gordo annotations", but I thought something was wrong with the browser's text rendering. That's not a good thing for documentation.

If we add any more support, it is most likely going to be using a very limited subset of Markdown, like maybe just adopting one bullet list syntax. Even that is still a ways down the priority list though.

@rsc added this to Incoming in Proposals (old) Dec 4, 2019
@ohir
Author

ohir commented Dec 5, 2019

@rsc

Go docs are meant to be unobtrusive plain text.

Gordo is meant to preserve Go docs to be unobtrusive plain text.

Obscure Unicode

All characters used in gordo came with the brand new DEC's VT100 terminal unit in the year 1983. Thirty six years ago. This set I used in the 1989' software and these characters were available on the dated daisy wheel printers my first client then had.

Obscure

Used daily with latin letters by a billion people or more.

Unicode markup does not count as plain text. When reading your example, I did notice the "gordo annotations", but I thought something was wrong with the browser's text rendering.

These will not render in the browser. These might be visible in the source and there they are the least obtrusive. Click through the raw button, please.

If we add any more support, it is most likely going to be using a very limited subset of Markdown,

Do **bold**, _italics_, **_bold-italics_** and lists introduced by significant whitespace really allow one to make better sense of the words than ¨´˘ with a space under?


@taruti

There are tons of readily available lightweight markup syntaxes (markdown, asciidoc, reStructuredText, Textile, ...). Why are you proposing yet another markup language?

Because other markups are obtrusive for anyone who reads them in the source.

markdown source:
this version uses the [**Atkin**](https://fylux.github.io/2017/03/16/Sieve-Of-Atkin/) sieve
instead of previously used [**Pritchard's wheel**](https://link.springer.com/article/10.1007/BF00264164) one.

gordo source:
this version uses the «¨Atkin¨»þ sieve instead of previously used «¨Pritchard's wheel¨»þ one.
    þ fylux.github.io/2017/03/16/Sieve-Of-Atkin/
    þ link.springer.com/article/10.1007/BF00264164

markdown renders:
this version uses the Atkin sieve
instead of previously used Pritchard's wheel one.

gordo renders:
this version uses the Atkin sieve instead of previously used Pritchard's wheel one.

This seems hard to type. And having to type different things on different operating systems that are translated to various symbols (with a per os GORDO environment variable) seems like a bad idea.

Please re-read. I, for my part, will try to edit this part so that it is not understood as exactly the opposite.

This seems hard to type.

It is a user's choice how to type gordo. The example provided in the proposal even shows how to type it using only ASCII characters, just like markdown.

that are translated to various symbols

No. The opposite!

Various characters of user's choice are translated to the fixed set of eleven "gordo" characters.

The author types whatever keystrokes she wants, whatever she finds convenient and available on her national keyboard layout, considering the IDE or editor she uses.
It is the target (canonical) set of 11 characters that does not change.
The GORDO table sets the input; the output is fixed and the same on all OSes and in all editors.

Also using accents in formatting does not make the documents very readable in my personal opinion.

It depends on what one wants to focus on. If it is the markup that a reader needs to analyse, then yes: single dots or rings at the top of the line need special attention.

Note though, that for all readers but author the less noticeable markup is, the better.

We (me at least) work with source documentation laid out with fixed-width fonts on screens of certain capacity. The html version is important before - lets us read faster and assess quality better. When I work with others' source, in my vim I have marked parts of the docs (source) four to six keystrokes away.

Gordo aims to be unobtrusive in the source, so that it is as readable on the terminal as on the web while keeping the web version searchable and interactive, in a way.

@rsc
Contributor

rsc commented Dec 5, 2019

I didn't say anything about **bold**, _italics_, **_bold-italics_**.
In general we don't want markup in doc comments.
I said we might recognize bullets.

@rsc
Contributor

rsc commented Jan 22, 2020

Based on the discussion above and the reactions to the original proposal, this seems like a likely decline.

@rsc moved this from Incoming to Likely Decline in Proposals (old) Jan 22, 2020
@ngrilly

ngrilly commented Jan 22, 2020

@rsc Bullet list would be useful but what is proposed here is way too much complex.

@ohir
Author

ohir commented Jan 23, 2020

@ngrilly Could you elaborate more on the "too much" complexity, please?

For the styling part I see simple substitutions. The most complex part would be to gather toc references and then produce output in order. But IMO it is the right price for keeping chunks of documentation right in the files they describe.

The gofmt "complexity"/price is confined to the simple substitutions as well — just to allow both US-English, and other languages users to use ascii digraphs instead of chords.

@rsc,
Note that now there is no other way to impose order but having a single giant doc.go. Lexical sorting of api is good for indexing libraries. Services' api more often than not needs to be described in order.

I sustain my original claim, that in its current state godoc — simple and useful for the general-purpose library code — is not enough for the vast area of today's Go usage.

Dismissing, without a real discussion, a proposed way to have documentation kept by the code, ordered, and readable both in the source and in the web browser stands, in my opinion, firmly against the advertised meritocracy of the proposal consideration process.

I consciously did not announce this proposal on the general list, in the hope of a discussion on the merits with the team here. I apparently was wrong in that hope.

@ianlancetaylor
Contributor

Note that now there is no other way to impose order but having a single giant doc.go. Lexical sorting of api is good for indexing libraries. Services' api more often than not needs to be described in order.

I sustain my original claim, that in its current state godoc — simple and useful for the general-purpose library code — is not enough for the vast area of today's Go usage.

That is a defensible position.

But substantial amounts of this proposal are about styling text. See the comments above. "Go docs are meant to be unobtrusive plain text. Obscure Unicode markup does not count as plain text." And "In general we don't want markup in doc comments. I said we might recognize bullets." There simply isn't any support for styling text in godoc comments. It's a solution for a problem that doesn't exist.

As far as imposing some order on godoc, see #18342 and #25444. We already have accepted ideas for improving the situation. Someone needs to complete the implementation and get it into the sources. Then let's see where we are.

@ohir
Author

ohir commented Jan 23, 2020

@ianlancetaylor: I am aware of the previous work, or rather attempts, in this area. Both old, both abandoned (if not silently refused). Mine is a holistic proposal, not a patch here and there. The amount of work needed to patch an urgent need (e.g. adding bullets) is not substantially less than for adding a complete feature, especially in the long run.

some order on godoc

We need no "some" order. We need an exact ordering that is also easy to maintain over long spans of time. This proposal's "ordering by toc" allows a docs maintainer to rearrange documentation without needing to touch code sources; writing and maintaining documentation for an api chunk there is a task for the developer who actually takes care of that code. No other proposal I saw allows for such a separation of concerns.

substantial amounts of this proposal are about styling text.

Excuse me: in this proposal the styling section counts three sentences (83 words) and 512 chars in a 4x6 table. (Substantial amounts of this proposal relate to keyboard usage, though. Mostly as my overreaction to the perceived, and voiced, concerns regarding whether non-ascii characters can somehow be entered and displayed at all by ascii keyboard users.)

I sustain — we need at least one form of emphasis in the text meant for the "web" users.
Be it bolds, be it italics — does not matter. We need this because good api often mandates using plain english words as an endpoint label, or as a field descriptor. While native English speaker is able to discern these off a sentence's parts with ease, people who learnt English during college years can be confused. Update: see also "accessibility" section added to the proposal.

unobtrusive plain text

This proposal is all about unobtrusive plain text that is readable in the source files.
I would like to stress again, that "plain" for 2/3 of world's population does not equal "american standard".
Note also, that most developed countries' governments impose that software they pay for comes with documentation in their country's language.


Invites: @dsnet, @jimmyfrasche, @griesemer

@ohir
Author

ohir commented Jan 23, 2020

Note that an accessibility section was added to the proposal.
Update (r4): all text relating to configuring keyboards has been removed. (@kortschak, @bradfitz)

Having emphasis added enables us to produce documentation accessible by blind persons not only in the browser but also in the terminal.

@ianlancetaylor
Contributor

I'm going to restate and emphasize "Obscure Unicode markup does not count as plain text."

Go already supports documentation in any language. That is not what this issue is about.

@taruti
Contributor

taruti commented Jan 27, 2020

I feel like there are at least three things here:

  1. Ordering of documentation, which could be nice to support in some way (see "proposal: x/tools/cmd/godoc: add support for sections" #18342 and "x/tools/cmd/godoc: add support for hotlinks" #25444)
  2. Whether godoc API documentation should have richer formatting (lists, emphasis etc)
  3. What should be used for that formatting.

Personally I think that if more formatting is added to godoc using a subset of widely available markup languages is the best way for this, e.g. a subset of Markdown. Many Go programmers are already familiar with Markdown as it is used in many places on the web.

However, the custom GORDO symbols + digraphs + escaping do not seem like a good solution for this from my perspective.

@rsc
Contributor

rsc commented Feb 5, 2020

There is no change in consensus here, only additional argument made in favor by the original reporter. Declined.

@rsc closed this as completed Feb 5, 2020
@rsc moved this from Likely Decline to Declined in Proposals (old) Feb 5, 2020
@golang locked and limited conversation to collaborators Feb 4, 2021
6 participants