x/tools/cmd/godoc: Proposal: let godoc recognize mT styled source #35896

Closed
ohir opened this issue Nov 28, 2019 · 6 comments
Labels
FrozenDueToAge Tools This label describes issues relating to any tools in the x/tools repository.

@ohir

ohir commented Nov 28, 2019

Proposal: let godoc recognize marktop styled source

Author: Ohir Ripe [Wojciech S. Czarnecki]

Last updated: 2019/11/28

Discussion at https://golang.org/issue/35896

Related to: #7873, #16666 and other "rich format please" issues.

Abstract

I propose using ´marktop´ annotations within the ¨Go¨ source documentation. Marktop enrichment is unobtrusive even when read in the ˉraw sourceˉ. Did you notice the marktop annotations? They would render as:

I propose using marktop annotations within the Go source documentation. Marktop enrichment is unobtrusive even when read in the raw source.

Background

The current state of ¨Go¨'s source documentation processing is good enough for documenting single ´implemented ˘things˘, i.e. functions, variables, and constants. It falls short when one must convey a new idea or a non-obvious implementation of an algorithm, or even just describe a sequence of events (sadly, there are no lists).

Proposal

I propose extending go doc processing with a marktop annotation parser that implements both console and HTML output for the format described below. (Note that the keyboard mapping example is irrelevant to this proposal.)

::: Marktop 2019 :::

 Styling:
 
 ˟ U+02DF cross      ˟dismiss / back to normal.  AltGr‿ /⃣  (map example)
 ´ U+00B4 acute      ´italics´       ´italics˟         ,⃣   𝑖𝑡𝑎𝑙𝑖𝑐𝑠           
 ¨ U+00A8 diaeresis  ¨bold¨             ¨bold˟         .⃣   𝐛𝐨𝐥𝐝             
 ˘ U+02D8 breve      ˘ibold˘    ˘bold+italics˟         m⃣   𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔       
 ˉ U+00AF macron     ˉfixedˉ     ˉfixed width˟         -⃣  fixed width      

An emphasis (styled text) begins after an acute, diaeresis, or breve character that is not immediately followed by a cross. It ends at a breve, acute, or diaeresis that starts another emphasis or stops this one, and also at a macron or a cross accent. As a fixed-font span begins and ends only with a macron, the other three emphases can be used within it. An empty line ends all running emphases.
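A minimal Go sketch of the emphasis rules above, applied to one paragraph (the text between two empty lines). The function StyleToHTML and the HTML tags chosen for each emphasis are hypothetical, not part of the proposal. The sketch follows the prose rule that a fixed-width span ends only at a macron, and it treats a styling rune followed by the cross as ordinary text, per the dismiss rule under Structure.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// Styling runes from the Marktop 2019 table.
const (
	acute     = '´' // italics
	diaeresis = '¨' // bold
	breve     = '˘' // bold+italics
	macron    = 'ˉ' // fixed width
	cross     = '˟' // dismiss / back to normal
)

// Hypothetical HTML chosen for each emphasis.
var (
	openTag  = map[rune]string{acute: "<i>", diaeresis: "<b>", breve: "<b><i>"}
	closeTag = map[rune]string{acute: "</i>", diaeresis: "</b>", breve: "</i></b>"}
)

func isStyling(r rune) bool {
	switch r {
	case acute, diaeresis, breve, macron, cross:
		return true
	}
	return false
}

// StyleToHTML renders the styling runes of one paragraph as HTML.
// Anything still open at the end is closed, since an empty line
// ends all running emphases.
func StyleToHTML(par string) string {
	var b strings.Builder
	rs := []rune(par)
	var emph rune // running emphasis rune, 0 if none
	fixed := false
	closeEmph := func() {
		if emph != 0 {
			b.WriteString(closeTag[emph])
			emph = 0
		}
	}
	for i := 0; i < len(rs); i++ {
		r := rs[i]
		if isStyling(r) && i+1 < len(rs) && rs[i+1] == cross {
			b.WriteString(html.EscapeString(string(r))) // dismissed: literal rune
			i++                                          // skip the cross
			continue
		}
		switch r {
		case acute, diaeresis, breve:
			same := emph == r
			closeEmph() // the same rune stops it, another one replaces it
			if !same {
				emph = r
				b.WriteString(openTag[r])
			}
		case macron:
			closeEmph() // a macron ends any running emphasis
			if fixed {
				b.WriteString("</code>")
			} else {
				b.WriteString("<code>")
			}
			fixed = !fixed
		case cross:
			closeEmph() // back to normal; fixed width ends only at a macron
		default:
			b.WriteString(html.EscapeString(string(r)))
		}
	}
	closeEmph()
	if fixed {
		b.WriteString("</code>")
	}
	return b.String()
}

func main() {
	fmt.Println(StyleToHTML("a ´new˟ idea, ¨bold¨ and ˉraw sourceˉ"))
}
```

Because a macron or a new emphasis always closes the running one first, the generated tags stay properly nested.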

 Structure:

 ¤ U+00A4 currency     ¤ list item             // bulleted list    0⃣ 
 ¹ U+00B9 supers.one   ¹ list item             // numbered list    1⃣
 ª U+00AA feminine.o.  ª list item             // lettered list    a⃣
 ¶ U+00B6 pilcrow      section head  ¶(refid)  // Section anchor   P⃣
 § U+00A7 paragraph    subsection    §(refid)  // SubSect anchor   p⃣
 « U+00AB lguillemet   « important »           // styled note      <⃣
 » U+00BB rguillemet                           // closes a note    >⃣

   Bullet surrogates in the source should be indented for readability.
   Any special character can be made ordinary by an immediate dismiss:
   §˟  «˟  ´˟  ¨˟  ˘˟  ˉ˟  »˟  ˟˟  ¤˟  ¹˟  ª˟  ʷ˟  ¶˟

 Hypertext:

 » U+00BB rguillemet   « to be cited »(refid)  // referable note   >⃣
 « U+00AB lguillemet   details there «(refid)  // quote a refid    <⃣
 ʷ»                    « this is on the webʷ»  // extern.link text w⃣
 ʷ U+02B7 modsmall.w     ʷ somesite.tld/path/  // url listed below w⃣

External links are introduced via the « note ending in the ʷ» digraph. The URL path, without the protocol, must be given on the following line, prefixed by an indented ʷ. If more than one ʷ» is present in a line, their respective URL paths are given on separate lines below, in the same order:

  in our «IEEE-ITSS Open Journalʷ» and also on «our facultyʷ» site.
     ʷ www.ieee-itss.org/oj-its
     ʷ www.ivt.ethz.ch

The «(refid) internal link token always outputs its target's text. On the console, that text is placed before the "(refid)", which stays; the HTML version turns the text into a hyperlink. For example, the source:

    Annolex Editor  ¶(Sect.2)
    ... Please read «(Sect.2) for the primer. 

 should output on the console:

    Annolex Editor (Sect.2)
    ... Please read "Annolex Editor" (Sect.2) for the primer.

 but in html it is expected to output a link:

    ✻ Annolex Editor
    ... Please read ͟͟A͟n͟n͟o͟l͟e͟x͟ ͟E͟d͟i͟t͟o͟r for the primer.

The final form of the output, including the hypertext protocol used, is defined by the marktop processor. This specification only mandates that the plain-text renderer, if used at all, remove marktop special characters, spaces following the «, and excess spaces preceding any of »§¶. Still, all ´refids´ in parentheses must be preserved for humans to see. Accordingly, the source author is expected to keep ´refids´ both short ¨and¨ meaningful.
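A minimal Go sketch of that plain-text requirement, assuming a line-by-line renderer; StripPlain is a hypothetical name. Shrinking excess spaces to exactly one and trimming trailing spaces are this sketch's own reading of "excessive spaces", and a special rune followed by the dismissing cross is kept as ordinary text.

```go
package main

import (
	"fmt"
	"strings"
)

// specials holds the marktop runes from the Styling, Structure and
// Hypertext tables, copied as they appear in the proposal text.
const specials = "˟´¨˘ˉ¤¹ª¶§«»ʷ"

// cross is the dismiss rune: a special rune right before it is ordinary text.
const cross = '˟'

// StripPlain renders one line of marktop as plain text.
func StripPlain(line string) string {
	rs := []rune(line)
	var b strings.Builder
	for i := 0; i < len(rs); i++ {
		r := rs[i]
		switch {
		case r == ' ':
			j := i
			for j < len(rs) && rs[j] == ' ' {
				j++
			}
			n := j - i
			if j < len(rs) && strings.ContainsRune("»§¶", rs[j]) {
				n = 1 // excess spaces before », § or ¶ shrink to one
			}
			b.WriteString(strings.Repeat(" ", n))
			i = j - 1
		case !strings.ContainsRune(specials, r):
			b.WriteRune(r) // ordinary text, including refids in parentheses
		case i+1 < len(rs) && rs[i+1] == cross:
			b.WriteRune(r) // dismissed: keep the rune itself
			i++            // and drop the cross
		case r == '«':
			for i+1 < len(rs) && rs[i+1] == ' ' {
				i++ // drop the spaces that follow «
			}
		}
		// Other marktop runes match no case and are not written.
	}
	// Removing a rune such as » at the end can leave a dangling space.
	return strings.TrimRight(b.String(), " ")
}

func main() {
	fmt.Println(StripPlain("Annolex Editor  ¶(Sect.2)"))
}
```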

Editing software may apply styles while keeping the syntax visible.


::: Godoc use :::

A § or ¶ not followed by a left parenthesis, and a ¤, ¹, or ª
used in the middle of a line, are ordinary characters in godoc's marktop.

 Lists:

 ¤ List items should be recognized as such even if user-indented.
   They cannot be code, as neither ¤ nor any other bullet character
   can possibly open a line of valid Go source.
 ⬩ Other bullet characters could be recognized as the first non-white rune.
 ¤ Terminal output should impose uniform indentation on lists,
   ¤ i.e. godoc lists do not nest, and gofmt might know this.

 ¤ List items must be given without blank lines in between to
   keep the numbering going. The ¤^ list continuation digraph was
   removed from the spec, leaving list numbering and continuation
   rules for the processor to define.


Linking, quoting and TOC:

 The paragraph text of a subsection may follow its heading immediately
 on the next line, as the §(refid) annotation signals the author's intent.
 Both ¶() and §() make TOC entries. The »(refid) referrals do not.

 Section head ¶(refid)  ←— TOC: "Section head", Section head styling
 Subsect head §(refid)  ←— TOC: "Subsect head" under "Section head"
   ¤ an item  §(refid)  ←— TOC: "an item", under "Section head"
    ¤ another »(refid)  ←— no toc, "another" highlights as an anchor
 note: « this »(refid)  ←— no toc, "this" highlights as an anchor

Rationale

Documentation that can be styled, even with only bold and italics, and structured to fit its domain may help package authors be more precise and unambiguous, and help documentation consumers avoid misunderstandings.

A marktop-enabled godoc may encourage well-structured documentation written into the program sources, even (or especially) for the most sophisticated ideas, solutions, and code. Today, even packages of moderate complexity often resort to external descriptions of their API.

Marktop parsing is fast, and it introduces no ambiguities.

Unlike Markdown, which makes raw annotated text almost unreadable, marktop annotations are barely noticeable unless the reader is deliberately scanning for formatting hints.

Compatibility

This proposal extends the documentation source syntax, and the methods used to parse it, in a way that cannot affect any program source but, in theory, might alter the visible HTML output of some existing documentation.

Even if this happened, such a change would most likely show up as a different font decoration or size, and would not affect the meaning.

Implementation

No ideas yet.

The syntax was already proven usable in the VT180 era (Z80 CPU, 64 kB RAM, CP/M OS).
The only changes are that the · ° … of then are ˟ ˘ ʷ now, and that ʷ now opens the web,
whereas … could then ask for a diskette change and open the next file to read
or edit.

Open issues

  1. As @wgrr noted in a comment on #35896,
    two of the proposed glyphs are not in the WGL4 set. Fortunately, WGL4 has two
    (unfortunately only two) runes that keep to the top and fit the purpose. They are
    not as good as the previous ˟ and ʷ, but still unobtrusive:
  ⁿ U+207F supers.n
  ˙ U+02D9 dotabove
@gopherbot gopherbot added this to the Unreleased milestone Nov 28, 2019
@gopherbot gopherbot added the Tools This label describes issues relating to any tools in the x/tools repository. label Nov 28, 2019
@MichaelTJones
Contributor

MichaelTJones commented Nov 28, 2019

Every step toward Literate Programming is worthwhile. It will be interesting to see how others react to the use of characters beyond 1970s ASCII as formatting requests. In a proposal of my own about using such a character in addition to ` in Go, the "those strange characters scare me" sentiment seemed pretty strong.

@wgrr
Contributor

wgrr commented Nov 29, 2019

fwiw: glyphs U+02DF and U+02B7 are not present in Go fonts

@ohir
Author

ohir commented Nov 29, 2019

> fwiw: glyphs U+02DF and U+02B7 are not present in Go fonts

Yes, WGL4 does not have them. This qualifies for "Open issues", thank you.

@vdobler
Contributor

vdobler commented Nov 29, 2019

I'm totally in favour of rendering numbered and bullet-point lists better
(read as ul and li in HTML), but Marktop is just dead ugly to read and
painful to write. It'll make my workday miserable when I have to write
it and when I have to read it.

This proposal aims at too much. I think the current godoc formatting lacks
two things: a) inline verbatim/fixed width/code styling and b) automatic
list detection.

Headings, paragraphs are solved and work (not perfectly but they work).

URLs can be detected reliably and I actually do like to see the URL of
a link before clicking on it, so I do not see the need for a link text that
differs from the URL. (This is different for very technical URLs with e.g.
lots of query parameters, but such URLs are uncommon in documentation).

For b) it is hard to find a heuristic which works well for lists existing
in the wild (I tried it on the corpus). This might be fixable, as most lists
which are not detected by a simple heuristic are rendered badly in the
current version anyway. This might be doable and would make complicated
documentation (e.g. package docs) better. Even the spec uses uls and lis.

Common wisdom for a) is to use backticks as in markdown. This is ugly,
simply because it is markdown, and input and output render different
characters. A less invasive option would be to use two spaces, which
make a word/identifier stand out in plain text:

Foo foos the  target  while aiming at bar.

It would be rendered as "Foo foos the target while aiming at bar.", with target set in fixed width.
I think something like this would help, as it makes writing documentation
easier: you have to reword less and rename arguments less to produce
understandable documentation.
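For comparison, the two-space convention could be sketched in Go with a single regular expression; TwoSpaceToCode and the <code> output are illustrative assumptions, not part of either proposal.

```go
package main

import (
	"fmt"
	"regexp"
)

// spanRe matches one word set off by exactly two spaces on each side, as in
// "foos the  target  while". RE2 has no lookaround, so the neighbouring
// non-space runes are captured and written back.
var spanRe = regexp.MustCompile(`(\S)  (\S+)  (\S)`)

// TwoSpaceToCode marks such a word as fixed-width HTML.
func TwoSpaceToCode(line string) string {
	return spanRe.ReplaceAllString(line, "$1 <code>$2</code> $3")
}

func main() {
	fmt.Println(TwoSpaceToCode("Foo foos the  target  while aiming at bar."))
}
```

A span that ends a line, as in "...you should use  while", is left unchanged once an editor has stripped its trailing spaces, which is the failure mode raised in the reply below.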

(Just my contribution to this bikeshedding.)

@ohir
Author

ohir commented Nov 29, 2019

@vdobler

> b) automatic list detection.

This would require the author to gain intimate knowledge of how the automaton in use "thinks". Acquiring this knowledge is painful and time-consuming. (This "heuristic fallacy" now affects millions of Markdown and YAML users.)

> For b) it is hard to find a heuristic which works well for lists existing
> in the wild (I tried it on the corpus). This might be fixable, as most lists
> which are not detected by a simple heuristic are rendered badly in the
> current version anyway.

Re-read mine's above ;)

There are NO heuristics allowed in the marktop processor. Period.
The author is in control of the output, not an AQI.

Every construct representable in marktop (the list item is one of them) is introduced by a single rune, and it usually ends where another of its kind begins. A hard rule in the spec guarantees that any change introduced by marktop ends at an empty line. No exceptions.

Therefore a list is rendered only where intended:

 ¹ List item, ends where another begins
 ¹ next item, begins where other ended

An empty line ended the list. The indentation above is not part of the syntax (hence the reference to gofmt in the Go-focused part of the spec).

> Headings, paragraphs are solved and work (not perfectly but they work).

Marktop does not intend to interfere here. It just adds an identifier to the already recognized section, and it allows any sentence to have a referral identifier added and then used.
Intra-document links, which once worked well for legalese, are really useful in technical texts too. (Have you never changed a section title and then hunted for references to the old one in the prose?)

> URLs can be detected reliably and I actually do like to see the URL of
> a link before clicking on it, so I do not see the need for a link text that
> differs from the URL. (This is different for very technical URLs with e.g.
> lots of query parameters, but such URLs are uncommon in documentation).

URLs are not to be detected with marktop. The paths are to be listed.
The intent is to have (web) referrals readable in the source.

> Common wisdom for a) is to use backticks as in markdown. This is ugly,
> simply because it is markdown, and input and output render different
> characters.

Backticks are part of the Go language specification, and they may carry that meaning even in prose, e.g. "you should use a raw string, i.e. one in `backticks`". None of the runes marktop proposes is part of any programming language's syntax (that I know of, at least). Marktop was designed with possible ambiguities in mind and aims not to introduce any.

> A less invasive option would be to use two spaces, which
> make a word/identifier stand out in plain text:

Mhm... then let your editor software strip a trailing one:

...you should use  while  loo…
                                OOOPS!  
...you should use  while
loops

P.S. If you do like using space for formatting (and you use a graphical Linux desktop on a PC), the short command

  xmodmap -e 'keycode 65 = space space nobreakspace nobreakspace macron diaeresis nobreakspace nobreakspace'

run on the console will put ˉcodeˉ (AltGr+Space) and ¨bold¨ (AltGr+Shift+Space) under your two or three fingers. With sticky keys turned on, it works even for mouthstick typing.

@ohir
Author

ohir commented Dec 1, 2019

Closing this issue for now, until I address the weaknesses pointed out to me (on IRC), mostly
by Windows users (I must admit that I did not know that Microsoft no longer supports creating custom keyboard mappings, something that had been available since XP SP2).

  1. The stock Windows US-International layout lacks breve and macron.
  2. The stock macOS US layout lacks superscript numbers.
  3. Various national layouts lack one or the other.
  4. Any IDE customization can only be considered "per language".

And 5: I desperately need a name without "mark" in it. Two out of five participants were not ashamed to acknowledge that they did not read this proposal past the title because they “abhor the very word markDOWN”.

I will, hopefully soon, open an improved version of this proposal.
Thank you to all who read this dense text past the title :)


5 participants