
proposal: Go 2: integrated compile time language #37292

Closed
jonperryman opened this issue Feb 19, 2020 · 14 comments

@jonperryman

Would you consider yourself a novice, intermediate, or experienced Go programmer?

Intermediate.

What other languages do you have experience with?

C, C++, Assembler, Python, SAS, JavaScript, PHP, HTML/CSS and more.

Would this change make Go easier or harder to learn, and why?

Advanced APIs and frameworks can be made trivial to use. For example, messages should be multilingual, but it's so difficult that even the Go compiler does not offer this capability. Look at the example below to see how this feature makes it trivial, so that all products could easily become multilingual.

Has this idea, or one like it, been proposed before?

Don't confuse this with C macros or code generators. Nor is this an alternative to functions. Instead, this provides control and flexibility at compile time that would normally require Go compiler source changes. As part of the compiler, it has some basic knowledge of the program (e.g. making reflection available at compile time).

Who does this proposal help, and why?

Everyone, from beginners to the compiler developers, will greatly benefit from this feature. Some of the benefits are:

  • Ability to develop elegant solutions to problems not easily solved with the current compiler.
  • Improve performance by reducing the impact of worst-case-scenario solutions.
  • Greatly simplify API usage and add validity checking for APIs.
  • Ability to write compile time algorithms instead of run time algorithms (e.g. reflection).
  • Greatly reduce repetitive code.
  • Avoid the need to create gofix code when APIs change.
  • Ability to include more resources in the source that are externalized now.
  • Go language developers can see the type of functionality developers really need.
  • Potential circumvention of compiler bugs or deficiencies.

What is the proposed change?

An integrated compile time language for Go that everyone can use. It will provide the ability to create compile time logic: access to reflection at compile time, insertion of Go code, and issuing compile time error messages. The compile time language itself needs to be determined. Go would be nice because the syntax would be consistent and it would run efficiently, but compile time programs would have to be compiled before compiling the programs that use them, along with being dynamically loaded and called. A language like Python might be more suitable because it is interpreted at execution time, but its drawbacks are another programming language to learn and high overhead during a Go compile.

Demonstrating the importance of this functionality through the UI.

I have real experience with the integrated compile time language that is built into the High Level Assembler. There are many real use cases, so I will only comment on a few. I'll start with messages.

An easy-to-maintain call for a multilingual message

Msg( msgId=cs0001, myVar1, myVar2)  // Display message cs0001 in the user's language

Elegant and efficient message definition in an English message module

Msg( msgId=cs0001, msgLevel=error, type=define,
    msg='Message %v not displayed because message module %v was not found',
    help={
        ::Explanation:
            Message "v1" was not displayed because the message module "v2" was not found.
        ::System action:
            The message is not displayed.
        ::Corrective action:
            Look at the help text for the original message to determine if you need to take
            an action for that message. To resolve this bug, search the problem database for the
            appropriate action to resolve the problem.
    }
)

Why are most messages in English instead of being multilingual? Why don't messages have help text? Where are the advanced message features?

The simple answer is that Go and C are not convenient, efficient, or easy for solving many problems. We must settle for mediocre solutions such as gettext(), which has high runtime overhead and is yet another tool to learn. Worse yet, you must understand the impact of changing a gettext message ID.

The example above was taken from my product written in High Level Assembler but changed to be compatible with Go syntax. This solution suited my needs but could easily be changed to be more flexible. The message definition is from one of the English-language message modules, so documentation writers can easily fix problems. The help text explains the message and guides users through problem resolution. The message ID is crucial to identify the message regardless of the user's language. Equally important, automation deals with the correct message: customers can easily identify automation changes when a message changes syntax. The error level ensures the return code is set correctly.
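For contrast, here is a minimal sketch of a gettext-style runtime lookup (the msgCatalog and Lookup names are hypothetical, and an import of fmt is assumed). Every call pays for a map lookup plus formatting at run time, and a bad message ID is only caught when the code runs:

// A runtime message catalog: nothing is checked until the program runs.
var msgCatalog = map[string]string{
	"cs0001": "Message %v not displayed because message module %v was not found",
}

// Lookup resolves a message ID in the catalog and formats its arguments,
// all at run time, on every call.
func Lookup(id string, args ...interface{}) string {
	return fmt.Sprintf(msgCatalog[id], args...)
}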

Excessive overhead in strings.IndexAny( )

func IndexAny(s, chars string) int {
	if chars == "" {
		// Avoid scanning all of s.
		return -1
	}
	if len(s) > 8 {
		if as, isASCII := makeASCIISet(chars); isASCII {
			for i := 0; i < len(s); i++ {
				if as.contains(s[i]) {
					return i
				}
			}
			return -1
		}
	}
	for i, c := range s {
		for _, m := range chars {
			if c == m {
				return i
			}
		}
	}
	return -1
}

This is the actual strings.IndexAny( ) code in Go. Whether you code IndexAny( ) in Go or C, this logic is the best you can do because there is no compile time flexibility built into the language. An integrated compile time language allows you to create a very robust solution. In this case, there are 3 compile time scenarios that must be considered to cover all situations.

Scenario 1: quoted search characters: strings.IndexAny( myString, "<>&;" )

The current implementation results in unnecessary runtime overhead that could be moved to compile time. The double for loop results in 4 times the execution time in this example. Using makeASCIISet( ) is a run time solution that occurs with every call to IndexAny. The correct solution would be to create the ASCII table at compile time (asciiTable := [256]byte{'<': 1, '>': 1, '&': 1, ';': 1}), but that's expecting too much from the developer.

With this new functionality, creating the asciiTable statement is a simple loop through the search characters. Non-ASCII Unicode characters can generate an additional search for those characters. Beginners don't notice any difference.
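For illustration, here is a hand-written sketch of what such a compile-time expansion of strings.IndexAny(myString, "<>&;") might generate (the asciiTable and indexAnyExpanded names are assumptions, not part of the proposal):

// The lookup table is built once, at the package level, instead of on
// every call via makeASCIISet.
var asciiTable = [256]byte{'<': 1, '>': 1, '&': 1, ';': 1}

func indexAnyExpanded(s string) int {
	for i := 0; i < len(s); i++ {
		if asciiTable[s[i]] != 0 {
			return i
		}
	}
	return -1
}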

Scenario 2: search characters in a variable: strings.IndexAny( myString, searchCharactersVariable )

The standard implementation is the current implementation; it does not do any compile time optimization. This is easy for beginners because they don't notice any difference.

Scenario 3: advanced search optimization features

For ease of use, beginners can ignore advanced features. In fact, these features are only needed in extreme cases (e.g. large amounts of data with fewer than 8 search characters). This could be something as simple as a pre-allocated ASCII buffer or search structure. The feature could be exposed through keyword parameters or different call types. Implementation is truly up to the developer of this functionality.
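One way to picture such a pre-allocated search structure, as a sketch in today's Go (all names here are hypothetical): build the table once, then reuse it across calls over large inputs:

type asciiSet [256]bool

// makeSet builds the search table once, e.g. at program startup.
func makeSet(chars string) *asciiSet {
	var t asciiSet
	for i := 0; i < len(chars); i++ {
		t[chars[i]] = true
	}
	return &t
}

// indexAnySet queries the prebuilt table; there is no per-call setup cost.
func indexAnySet(s string, t *asciiSet) int {
	for i := 0; i < len(s); i++ {
		if t[s[i]] {
			return i
		}
	}
	return -1
}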

Formatted printing could be improved without UI changes

fmt.Printf("x %T x %t x %4d x", a, b, c)

Why isn't the formatting string parsed at compile time? Why aren't the formatting field definitions determined at compile time to reduce runtime overhead? Why aren't the argument types validated against the formatting string at compile time? With this feature, we could easily solve these problems.
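As an illustration of the kind of mismatch such compile-time validation could catch (today, go vet performs a similar check, but as a separate tool rather than in the compiler):

package main

import "fmt"

func main() {
	x := "text"
	// %d does not match a string argument: today this prints
	// "value: %!d(string=text)" at run time; a compile-time check
	// could reject it at build time instead.
	fmt.Printf("value: %d\n", x)
}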

XML processing made easy and efficient

result, err := xml.Unmarshal(data,
    (   Person xml.RootElement {
            Name    string    `xml:"FullName"`
            Phone   string
            Email   []struct {
                Where string  `xml:"where,attr"`
                Addr  string
            }
            Groups  []string  `xml:"Group"`
            Address struct {
                City, State string
            }
        },
        Company xml.RootElement {
            Name    string    `xml:"FullName"`
            Phone   string
            Email   []struct {
                Where string  `xml:"where,attr"`
                Addr  string
            }
            Groups  []string  `xml:"Group"`
            Address struct {
                City, State string
            }
        }
    )
)
fmt.Println(result)

versus today's implementation

type Email struct {
    Where string `xml:"where,attr"`
    Addr  string
}
type Address struct {
    City, State string
}
type Result struct {
    XMLName xml.Name `xml:"Person"`
    Name    string   `xml:"FullName"`
    Phone   string
    Email   []Email
    Groups  []string `xml:"Group>Value"`
    Address
}
result := Result{Name: "none", Phone: "none"}
err := xml.Unmarshal(data, &result)
fmt.Println(result)

My XML parser written in High Level Assembler executes in less than half the time the Go version takes. The example above is in a Go-compatible syntax. Compared with the current syntax, it is much easier to comprehend for Go beginners who understand XML. Additionally, the current implementation only allows for one standard implementation; there are several XML standards and advanced features which would be better served by an integrated compile time language. The example above shows multiple root elements (Person and Company), which is not available in the current implementation.
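For reference, handling multiple root elements today requires a manual first pass with encoding/xml's Decoder (a sketch, assuming Person and Company struct types like the Result type above, plus imports of bytes, encoding/xml, and fmt):

// decodeRoot peeks at the root element's name, then unmarshals into the
// matching type.
func decodeRoot(data []byte) (interface{}, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			switch se.Name.Local {
			case "Person":
				var p Person
				return &p, dec.DecodeElement(&p, &se)
			case "Company":
				var c Company
				return &c, dec.DecodeElement(&c, &se)
			}
			return nil, fmt.Errorf("unexpected root element %q", se.Name.Local)
		}
	}
}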

Please describe as precisely as possible the change to the language.

The minimum changes to the compiler:

  1. Devise a mechanism to identify the functions to be called (e.g. a new definition statement, or maybe use of #).
  2. During the compile phase that sets data types, call the function that was identified.
  3. Choose a compile time language. Possibly Go, Python, Java, NodeJS or ???
  4. A runaway-loop timer for user code.
  5. APIs:
    • Delete ASTs
    • Add Go code snippets as ASTs
    • Reflection using the ASTs prior to the current position
    • Notes and errors
    • Global variables

Advanced features could be discussed later if this is accepted.

What would change in the language spec (https://golang.org/ref/spec)?

The only change is a new section to cover the integrated compile time language.

Please also describe the change informally, as in a class teaching Go.

To be written if this feature is to be implemented. Because of the impact, this feature will be large.

Is this change backward compatible?

Yes.

Breaking the Go 1 compatibility guarantee is a large cost and requires a large benefit.

Go 1 compatibility is not affected by this change.

Show example code before and after the change.

See above.

What is the cost of this proposal? (Every language change has a cost).

TBD.

How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?

Some generated code may not be available to the tools, which may cause false errors such as missing variables. Otherwise, keeping the UI syntax consistent with Go should allow most tools to function correctly.

What is the compile time cost?

Use of this feature will increase compile time, perhaps significantly, depending upon the implementation and the compile time programs that are written. This can only be determined during the design phase.

What is the run time cost?

Run time will never increase. In some cases, run time will decrease.

Can you describe a possible implementation?

Do you have a prototype? (This is not required.)

Some UI prototypes are above. The called function would depend upon the language chosen.

How would the language spec change?

An addition to the language spec.

Orthogonality: how does this change interact or overlap with existing features?

Not applicable.

Is the goal of this change a performance improvement?

No, but that is one of the side benefits.

If so, what quantifiable improvement should we expect?

How would we measure it?

Does this affect error handling?

No.

If so, how does this differ from previous error handling proposals?

Is this about generics?

While not specifically about generics, it solves the basic complaint (repetitive coding). You only need to look at messages to see how great opportunities are missed because of compiler limitations. When will the compiler have multilingual messages? When will the compiler have help text for messages?

If so, how does this differ from the current design draft and the previous generics proposals?

@gopherbot gopherbot added this to the Proposal milestone Feb 19, 2020
@smasher164 smasher164 added the v2 (A language change or incompatible library change) and LanguageChange labels Feb 19, 2020
@beoran

beoran commented Feb 19, 2020

I made a similar proposal at #32620, but I decided to close it, as this need not be integrated into the Go compiler itself; a pre-processor can do this as well. I would first implement such a pre-processor, and once it has proven its utility and has become widely popular, it could be considered for integration into the compiler itself.

@networkimprov

The Rust compiler incorporates a Rust interpreter: https://github.com/rust-lang/miri

@beoran

beoran commented Feb 19, 2020

Having a Turing complete interpreter in the compiler is a headache because the compiler would have to solve the halting problem to see if your program even compiles. In other words, it's easy to write loops in such an interpreter that might seriously slow down compilation or even cause it to go into an eternal loop. A macro preprocessor would be nice to have for Go, but for the reasons I mentioned, it should not be Turing complete.

@nc-x

nc-x commented Feb 19, 2020

I would first implement such a pre-processor, and once it has proven its utility and has become widely popular

Compile time code execution / macros / etc. are available in plenty of languages and their pros/cons are well known. So, IMO, writing a pre-processor to track its utility is not required.

If CTFE/Macros are available in Go, they are guaranteed to be used by plenty of people. A pre-processor comes not even close to a proper compile time code execution or macro system.

But afaict, because they slow down compile times and do not follow the simplicity and design goals of Go, this proposal most likely won't be accepted.

Having a Turing complete interpreter in the compiler is a headache because the compiler would have to solve the halting problem to see if your program even compiles. In other words, it's easy to write loops in such an interpreter that might seriously slow down compilation or even cause it to go into an eternal loop.

Not really, no. It is a very easy problem to solve. For example, for compile-time code, the Nim compiler tracks the number of iterations of loops (easy to implement) and sets an upper bound on them. If the iterations cross the threshold, the compiler gives an error that the loop iterations are too many. For complex programs which might require more iterations, it provides --maxLoopIterationsVM:N (set max iterations for all VM loops) to change the upper bound.
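A minimal sketch of such a guard in Go, assuming fmt is imported (the runLoop helper and the budget value are hypothetical, analogous to Nim's --maxLoopIterationsVM:N):

const maxLoopIterations = 10_000_000

// runLoop executes an interpreted loop body, aborting compile-time
// evaluation once the iteration budget is exhausted.
func runLoop(body func() bool) error {
	for n := 0; body(); n++ {
		if n >= maxLoopIterations {
			return fmt.Errorf("compile-time loop exceeded %d iterations", maxLoopIterations)
		}
	}
	return nil
}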


But anyway, this proposal is for adding a compile-time "language" ("Choose a compile time language. Possibly Go, Python, Java, NodeJS or ???"), which is a bad idea for a variety of reasons.

@ianlancetaylor ianlancetaylor changed the title Proposal: Integrated compile time language proposal: Go 2: integrated compile time language Feb 19, 2020
@ianlancetaylor
Contributor

What this proposal describes is a language that is very different from Go. Go, like any language, supports generating code using a separate code generator, and Go programmers use that routinely via //go:generate. It's very unlikely that Go would change to add an entire new language that is executed by the compiler. It's even more unlikely that that language would be one as powerful, and as different from Go, as Python.

@griesemer
Contributor

An integrated compile-time language makes a compiler essentially programmable; there's no question that this would be very powerful. In a similar vein, at one time it was fashionable to make programming languages customizable, from the simple ability to define the precedence of operators (see e.g., Prolog), to the more complex ability of controlling the evaluation of operands, etc.

While such languages can make it much easier to solve certain problems, they also make it much harder to understand what is going on because the programming language can be changed. A reader will effectively have to learn a new language before they can read the code.

Your proposal description is sufficiently high-level, and the examples don't show the meta-language, so it's unclear to me how flexible an approach you are envisioning. But clearly you want to analyze format strings at compile time, so there's plenty of power here that could lead to programs that are extremely hard to understand. For instance, the meta-language could modify format strings, making it almost impossible to determine what a program is doing without actually performing the compile-time computation in one's head. This leads to code that may be easy to write but hard to read for anybody but the writer (and it will be hard to read for the writer as well after enough time has passed).

In short, I believe any such attempt would significantly reduce Go's readability, yet readability is one of the top priorities for Go, so we're not going to easily compromise on that.

@ianlancetaylor
Contributor

This proposal is incomplete in that it doesn't explain the compile time language. If it did, the additional complexity added to Go, as discussed above, would mean that we will not adopt it. Based on emoji voting, this proposal does not have strong support. Therefore, this is a likely decline. Leaving open for four weeks for final comments.

@jonperryman
Author

A compile time language is about making code much more readable, efficient, robust, and maintainable. It can greatly simplify many complex problems. Let's assume the meta language is Go, so that it's simple for everyone to use. Let's assume that #xxxx( ) calls this Go function and passes the parsed data between the parentheses. The only thing to learn is calling the compiler APIs, such as inserting code and using compile time reflection.

Go generate or writing a pre-processor is a waste of time for resolving many problems. If they worked well for compile time logic, then how would you propose fixing the CPU overhead in the compiler?

This feature does not change the Go language. It just allows insertion of Go code. Most meta language programs won't be more than a hundred lines. Of all the concerns, "abuse" is the only legitimate one, but that is true of other features as well (and is solved using standards and best practices).

If you doubt this works, then ask yourself why the CPU overhead in scanner.go has never been addressed. The answer is that it's painful without compile time logic. E.g. the following is the code from func scanComment() that processes // comments:

s.next() // ignore the //
for s.ch != '\n' && s.ch >= 0 {
    if s.ch == '\r' {
        numCR++
    }
    s.next()
}

CPU usage could be reduced by over 50% (as much as 90%) for parsing, but do you want to code it correctly when it looks so nasty? Do you want to repeat this everywhere it occurs?

s.next() // ignore the //
for s.findNext( &[256]byte{0:1, '\n':1, '\r':1, 128:1, --- repeat to 255 --- } ) && s.ch != '\n' {
    if s.ch == '\r' {
        numCR++
    }
}

With compile time language, this would look like:

s.next() // ignore the //
for s.#findNext( "\u0000\r\n" ) && s.ch != '\n' {
    if s.ch == '\r' {
        numCR++
    }
}

#findNext( ) is the compile time function, which is passed positional arg 1 ("\u0000\r\n") and no keyword parameters. It produces the findNext call that you would have coded by hand.

func #findNext(args) {
    goCode := "findNext( &[256]byte{ "
    for i := 0; i < len(args.1); {
        r, w := utf8.DecodeRune(args.1[i:]) // decode the search string, not s.src
        if w == 1 {
            if r == utf8.RuneError {
                error("illegal UTF-8 encoding")
            } else {
                goCode += fmt.Sprintf("%d:1, ", args.1[i])
            }
        }
        i += w
    }
    for i := 128; i < 256; i++ { // mark all rune start bytes
        goCode += fmt.Sprintf("%d:1, ", i)
    }
    goCode += "})"
    ast.insert(*, goCode)
}

#findNext adds a runtime call to func findNext( ), which is a slightly modified version of func next( ) from scanner.go: replace the function prototype, return true where a character is found, and return false when EOF is reached. The CPU reduction comes from the for loop skipping characters that are not of interest.

func (s *Scanner) findNext(charTable *[256]byte) bool {
	for ; s.rdOffset < len(s.src) && charTable[s.src[s.rdOffset]] == 0; s.rdOffset++ {
	}
	return s.rdOffset < len(s.src)
}

This feature offers great benefits to all Go developers with a minor amount of change. Most developers do not have experience with this type of feature, so they don't understand how it changes the design of their code. It will take time to change the mindset.

@griesemer
Contributor

@jonperryman I'm not sure I understand your example - can you elaborate more on how I would get the 50% (or even 90%) reduction of CPU usage in scanner.go (or even just scanComment)? I'd definitely be interested in making scanning faster, even at the expense of (localized) ugly code.

@jonperryman
Author

Languages such as Go and C strongly encourage a byte-processing mentality. C gets around some of this inefficiency by providing multi-byte functions (e.g. memcmp, strcmp, memcpy, strcpy). It does not provide functions such as indexAny( ) because they are very inefficient without a compile time language.

You can do band-aid fixes to make scanner.go faster, but to make it a speed demon, you will need a different mindset and a rewrite. To make this easy to understand, let's discuss the small piece of code from scanComment that searches for \n as the end of the // comment.

for s.ch != '\n' && s.ch >= 0 {                                   <===
    if s.ch == '\r' {                                             <===
        numCR++
    }
    s.next()                                                      <===
}

func (s *Scanner) next() {
	if s.rdOffset < len(s.src) {
		s.offset = s.rdOffset                                     <===
		if s.ch == '\n' {                                         <===
			s.lineOffset = s.offset
			s.file.AddLine(s.offset)
		}
		r, w := rune(s.src[s.rdOffset]), 1                        <===
		switch {
		case r == 0:                                              <===
			s.error(s.offset, "illegal character NUL")
		case r >= utf8.RuneSelf:                                  <===
			// not ASCII
			r, w = utf8.DecodeRune(s.src[s.rdOffset:])
			if r == utf8.RuneError && w == 1 {
				s.error(s.offset, "illegal UTF-8 encoding")
			} else if r == bom && s.offset > 0 {
				s.error(s.offset, "illegal byte order mark")
			}
		}
		s.rdOffset += w                                          <===
		s.ch = r                                                 <===
	} else {
		s.offset = len(s.src)
		if s.ch == '\n' {
			s.lineOffset = s.offset
			s.file.AddLine(s.offset)
		}
		s.ch = -1 // eof
	}
}

Notice the statements marked with "<===" that are executed for each character. These statements don't do anything useful except for \r, \n, NUL, and runes (which are more than 1 byte). To execute these statements only for those characters, we need to pass a character translation array with the characters of interest on the next( ) function call.

s.next( &[256]byte{0:1, '\n':1, '\r':1, 128:1, --- repeat to 255 --- } )

This is a constant array that must be passed by reference (&) to avoid the overhead of copying the array at each call to next( ). The array size must be 256 because a character value can be 0 to 255. We index into this array using the value of each character in the src. Each entry in the array defaults to 0; the entries for NUL, \n, and \r are set to 1 (the translation value). In addition, each rune start byte (128 to 255) is set to 1.

func (s *Scanner) next(translationTable *[256]byte) {
    for ; s.rdOffset < len(s.src) && translationTable[s.src[s.rdOffset]] == 0; s.rdOffset++ {
    }
}

The next( ) function is modified to accept the translation table by reference from the caller. A tight for loop runs through the source until there is no more data or the translation value is not 0 (a character of interest was found). The compiler should optimize this for loop to be very fast because it's so small.

UTF-8 is designed to be fast by having all bytes of a rune look like a rune (>127). Since comments are not interested in rune characters, runes actually don't need to be included in the translation table unless you want rune validation, which probably isn't important.

As for other parts of scanner.go, they need the same changes but are convoluted. They should be made readable for clarity and maintenance.

To further improve maintainability and reduce CPU, the GOTO statement should be extended to allow GOTO DEPENDING ON, where it supports multiple labels and the label is selected using a variable that contains an index. By specifying the index of a label for each character used in the translation table, you avoid the overhead of IF and CASE statements. Maintainability is improved by reducing repetition.
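Go has no computed GOTO, but the dispatch pattern described here can be sketched with a jump table of functions indexed by the translation value (a self-contained illustration; all names are hypothetical):

package main

import "fmt"

// class maps each byte to a handler index (0 = not a character of interest).
var class = [256]byte{'\n': 1, '\r': 2}

// handlers is the jump table: one indexed call replaces an if/case chain.
var handlers = [3]func(){
	1: func() { fmt.Println("newline") },
	2: func() { fmt.Println("carriage return") },
}

func dispatch(c byte) {
	if h := handlers[class[c]]; h != nil {
		h()
	}
}

func main() {
	for _, c := range []byte("a\r\n") {
		dispatch(c)
	}
}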

Also note that other modules need these changes (e.g. xml and json parsing).

@balasanjay
Contributor

Both C and Go provide functions equivalent to indexAny, strpbrk and strings.IndexAny respectively.

Go's IndexAny even dynamically builds a bitset for ASCII-only sets of characters, which'd take less memory than an array of 256 bytes.

I don't think indexAny is a particularly convincing example of why a compile-time language would be useful. If you're searching a constant set of characters, you could just do a one-time conversion to a fast data-structure once at program-startup time, and then repeatedly query using that data structure.

@griesemer
Contributor

@jonperryman Thanks for the detailed explanations, appreciated. I'd like to make a few observations specific to your example (and somewhat unrelated to this issue):

  1. go/scanner is not used by the compiler. It's used by gofmt and other tools, though. It is also one of the oldest pieces of Go code there is. It could probably use a refresh. But read on.

  2. The compiler uses cmd/compile/internal/syntax (scanner.go) which is much newer, and more efficient (and which is that refresh). Coincidentally, I just improved its performance slightly last week.

  3. I agree that things such as comment parsing could be faster. I believe gcc (and others) even use vector instructions to efficiently skip over unused source text such as comments, and possibly collect large strings more efficiently. That said, comment parsing is probably not where the time goes.

  4. It may be possible to speed up individual scan functions, such as for comments by 50%, but I'm skeptical that scanning overall can be made 50% faster; and certainly not 90% (but I'd be happy to be proven otherwise). That said, let's assume the scanner runs twice as fast. Since scanning is a very small part of overall compile time (I'm guessing < 5%), that would improve compilation speed by 2.5% at the very best. While this would be nice to have, bigger gains can be gotten elsewhere.

  5. The real kicker is the big switch in the scanner, which is probably not translated efficiently. I suspect to really get to a super-efficient scanner, one might want to employ a highly optimizing lexer generator that produces a state machine. That said, note that moving away from lex and yacc (as originally used in Go), made parsing faster... The compiler's parser can process in the order of 1M to 1.5M lines/s on a modern machine - that's parsing and scanning. It's not the bottleneck in the compiler.

  6. Regarding your suggestion of using a character translation array: If it's all but looking for NUL's, CR's, and chars > 127, using a short sequence of comparisons is more efficient than a lookup table on modern processors: unless that table happens to always reside in L0 cache (which is almost impossible to control), memory accesses are going to kill performance.
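For illustration, the comparison-based check described in point 6 might look like this (a minimal sketch; the function name is hypothetical). A short sequence of compares needs no memory access, unlike a 256-byte lookup table:

// interesting reports whether a byte needs special handling: NUL, CR,
// or a byte that is part of a multi-byte rune.
func interesting(c byte) bool {
	return c == 0 || c == '\r' || c >= 128
}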

Then there's also the bigger issue of how C/C++ deals with large pieces of software compared to Go: Because C/C++ uses include files, an enormous amount of source text is consumed in each compilation, vastly more than for an equivalent Go program (because Go has proper imports). Thus, super fast scanning is much more important for C/C++ than for Go.

In summary, I don't think special use cases such as scanning/parsing are a good reason for extending Go itself by some sort of compile-time language. In fact, those special use cases led to specialized code generators such as yacc, antlr, coco/r, etc. which can do a much better job.

@typeless

It looks like what you want is something like constexpr in C++. Such a construct in Go could be provided by adding a new directive, //go:constexpr, which would not need a full-fledged CTFE or macro system.
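A sketch of what that hypothetical directive might look like (//go:constexpr does not exist; this is purely illustrative). Under such a directive, the function would be evaluated at build time and its result embedded in the binary as data:

//go:constexpr
func commentDelims() [256]byte {
	// Built entirely from constants, so evaluable at compile time.
	var t [256]byte
	t[0], t['\n'], t['\r'] = 1, 1, 1
	return t
}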

@ianlancetaylor
Contributor

There was further discussion but no change in consensus. The costs of this proposal remain unaddressed. Closing.

@golang golang locked and limited conversation to collaborators Mar 24, 2021