syscall: starting a process with large number of arguments is way too slow #41825
Comments
Change https://golang.org/cl/259978 mentions this issue: |
I tried running your test before and after @ianlancetaylor's change is applied: https://go-review.googlesource.com/c/go/+/259978
Not much difference in the time it takes to execute your program, until you have 10,000 arguments. I have yet to see a program that takes 10,000 arguments. Alex |
Thanks for the note.
Granted, thousands of arguments is not common, but I still don't want to see Go doing it (much) more slowly than Python :)
Bob
On Wed, Oct 7, 2020 at 1:02 AM Alex Brainman wrote:
I tried running your test before and after @ianlancetaylor's change is applied: https://go-review.googlesource.com/c/go/+/259978
100 repetitions
0 arguments: 1,590ms
10 arguments: 1,637ms
50 arguments: 1,592ms
200 arguments: 1,611ms
1,000 arguments: 1,658ms
10,000 arguments: 7,828ms
Ratio of 10,000 : 10 argument durations: 4.78
100 repetitions
0 arguments: 1,563ms
10 arguments: 1,456ms
50 arguments: 1,526ms
200 arguments: 1,575ms
1,000 arguments: 1,565ms
10,000 arguments: 6,768ms
Ratio of 10,000 : 10 argument durations: 4.65
Not much difference in the time it takes to execute your program, until you have 10,000 arguments. I have yet to see a program that takes 10,000 arguments.
Alex
|
@bobjalex does the merged fix improve your example as expected? Alex found it had virtually no effect. |
Yes, it had a huge and easily noticeable effect on my experiments with 10,000+ arguments. For more typical numbers of arguments, there might be a minuscule improvement, but nothing human-noticeable.
|
Unless... you run a command with a few arguments 10,000 times.
|
Sorry about all the little messages, but let me explain better...
I was running an experiment, written in Go, running a command repeatedly with an increasingly large number of arguments to see when it would break due to too many arguments. The experiment program printed a "progress dot" on the console between invocations. The progress dots appeared much more slowly than I expected.
So I wrote the same experiment in Python, and the dots were very fast. That didn't seem right, so I located the bottleneck and reported it.
|
Change https://golang.org/cl/260397 mentions this issue: |
To clarify, I was referring to the just-merged fix, vs the fix you suggested (which is much simpler). |
I tried, but couldn't figure out how to see the latest change -- I'm not too proficient with GitHub's web interface.
But I did try the only simplification I could think of:
func makeCmdLine(args []string) string {
	if len(args) == 0 {
		return ""
	}
	var b []byte
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, []byte(EscapeArg(v))...)
	}
	return string(b[1:]) // drop the leading space
}
It seems like it might be a bit faster, eliminating a boolean test in each iteration and performing one fewer append. But my timing tests show virtually no difference for 10,000 args.
I think I like this one more, though -- it's a bit simpler.
If the latest change is different from this, could you send me a link to a page where I can see it?
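As a stand-alone illustration (quoteArg below is a simplified stand-in for syscall's EscapeArg, not the real escaping rules, and the timings are illustrative rather than the benchmark actually run here), the cost difference between the old concatenation loop and the byte-slice version can be sketched like this:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// quoteArg is a simplified stand-in for syscall's EscapeArg: it only
// quotes args containing spaces or tabs, which is enough to exercise
// the concatenation cost being discussed.
func quoteArg(s string) string {
	if strings.ContainsAny(s, " \t") {
		return `"` + s + `"`
	}
	return s
}

// makeCmdLineOld mimics the Go 1.15.2 approach: repeated string
// concatenation, which copies the whole string built so far on
// every iteration (quadratic in total output size).
func makeCmdLineOld(args []string) string {
	var s string
	for _, v := range args {
		if s != "" {
			s += " "
		}
		s += quoteArg(v)
	}
	return s
}

// makeCmdLineNew is the simplified byte-slice version: append grows
// the buffer geometrically, so total copying is linear.
func makeCmdLineNew(args []string) string {
	if len(args) == 0 {
		return ""
	}
	var b []byte
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, quoteArg(v)...)
	}
	return string(b[1:]) // drop the leading space
}

func main() {
	args := make([]string, 10000)
	for i := range args {
		args[i] = "x"
	}
	for _, impl := range []struct {
		name string
		fn   func([]string) string
	}{{"old", makeCmdLineOld}, {"new", makeCmdLineNew}} {
		start := time.Now()
		impl.fn(args)
		fmt.Printf("%s: %v\n", impl.name, time.Since(start))
	}
}
```

Both functions produce identical command lines; only the cost of building them differs.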
|
Here's a patch for the newly merged change which you can download & apply to your local Go tree (first undo the change you made): https://github.com/golang/go/commit/49225854.patch To view on GitHub: 49225854. Hopefully this gives the same performance boost as your change; if not, please tell us what you got. |
OK -- thanks. It looks to be the same as my original suggestion with b == nil replaced by len(b) == 0.
*Seems* to me that a comparison to nil would be a wee bit faster than a length comparison, and it *is* guaranteed that b will be nil on the first iteration and never again. But I doubt it will show any difference in the benchmark -- the compiler will probably optimize either one to the same instructions. I will try it and let you know if there is a difference.
I still think I like that last suggestion I sent you best :) It's simpler, equally fast, and sidesteps the comparison issue.
But no matter which of the changes survives, it will make a big difference for those 10K-argument commands!
Bob
|
Please could you test the new patch, and report the results for the benchmark you tried earlier? |
Accidentally broken by CL 259978.
For #41825
Change-Id: Id663514e6eefa325faccdb66493d0bb2b3281046
Reviewed-on: https://go-review.googlesource.com/c/go/+/260397
Trust: Ian Lance Taylor <iant@golang.org>
Trust: Alex Brainman <alex.brainman@gmail.com>
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
OK, I've done a bunch of timings. Results and some commentary are attached, as well as the Go program I used to perform the benchmarks.
Quick summary: all 3 new versions of makeCmdLine are pretty close to the same. I can't tell which one is faster, since I'm running on ordinary home Windows computers that have things going on in the background, and run-to-run times are variable.
If you guys have a "lab" machine that is quiet, without the background stuff, maybe you could benchmark in that environment and see more conclusive results.
Let me know if there is anything else I can do to help (or if I forgot anything :)
Bob
Comparison Timing Tests of 3 New Implementations of syscall/makeCmdLine
The test sequence was performed on 2 different home computers.
To partially mitigate the chaos of miscellaneous background activity, lots of timings were run. The whole test set was run twice: pass 1 and pass 2.
Each pass consists of tests of 3 implementations. For each implementation, a benchmark program was run twice. The benchmark program performs timings for various numbers of command-line arguments. Each timing performs 100 launches.
The 3 implementations are:
Bob's 1st way: the implementation change first suggested by Bob
New way: the implementation as entered by the Go team, a revision of Bob's 1st way
Bob's simplified way: the simplest implementation of the lot, the same idea simplified by removing the string-length test in each iteration:
func makeCmdLine(args []string) string {
	if len(args) == 0 {
		return ""
	}
	var b []byte
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, []byte(EscapeArg(v))...)
	}
	return string(b[1:]) // drop the leading space
}
Bob's summary: all 3 implementations perform about the same in my timings. These tests are not sufficiently consistent to show a clear winner.
There is one test of the original Go 1.15.2 implementation in this document, and it suggests that the improvement in the new implementations becomes significant as argument counts exceed 1,000. With fewer than 1,000 arguments the difference is not significant.
Is the change worthwhile, since more than 1,000 arguments rarely occurs? Yes, if for no other reason than that the code is better: it is properly scalable. Building strings by successive concatenation is a classic performance warning. Other significant software products such as operating systems (Unix, Windows, ...) and programming-language libraries (Python, Java, ...) have apparently found it useful to use algorithms that do not have order-n-squared performance, since they do support large argument lists and handle them performantly. (Is that a word?)
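The quadratic cost of successive concatenation is easy to demonstrate in isolation. A small stand-alone sketch (the iteration counts are arbitrary, picked just to make the gap visible):

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// concat builds a string by repeated +=. Each += allocates a new
// string and copies everything so far, giving O(n^2) total work.
func concat(n int) string {
	var s string
	for i := 0; i < n; i++ {
		s += "x"
	}
	return s
}

// build uses strings.Builder, which grows its buffer geometrically,
// giving O(n) total work.
func build(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		b.WriteByte('x')
	}
	return b.String()
}

func timeIt(f func(int) string, n int) time.Duration {
	start := time.Now()
	f(n)
	return time.Since(start)
}

func main() {
	const n = 50000
	fmt.Println("concat:", timeIt(concat, n))
	fmt.Println("build: ", timeIt(build, n))
}
```

On any machine the concat time grows roughly with the square of n, while the Builder time grows linearly -- the same shape as the 10,000/16,000-argument numbers above.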
Details of the experiments follow:
====================================================================
Results from Intel Core-I5 about 2 yr old
Pass 1
------
Original way (1.15.2)
0 arguments: 490ms
10 arguments: 463ms
50 arguments: 459ms
200 arguments: 481ms
1,000 arguments: 566ms
10,000 arguments: 4,995ms
16,000 arguments: 11,949ms
0 arguments: 445ms
10 arguments: 446ms
50 arguments: 449ms
200 arguments: 473ms
1,000 arguments: 610ms
10,000 arguments: 5,085ms
16,000 arguments: 12,076ms
New way
0 arguments: 447ms
10 arguments: 454ms
50 arguments: 446ms
200 arguments: 459ms
1,000 arguments: 497ms
10,000 arguments: 837ms
16,000 arguments: 1,174ms
0 arguments: 438ms
10 arguments: 446ms
50 arguments: 451ms
200 arguments: 487ms
1,000 arguments: 517ms
10,000 arguments: 891ms
16,000 arguments: 1,201ms
Bob's Simplified way
0 arguments: 450ms
10 arguments: 454ms
50 arguments: 486ms
200 arguments: 494ms
1,000 arguments: 519ms
10,000 arguments: 846ms
16,000 arguments: 1,049ms
0 arguments: 468ms
10 arguments: 467ms
50 arguments: 471ms
200 arguments: 486ms
1,000 arguments: 517ms
10,000 arguments: 855ms
16,000 arguments: 1,039ms
Bob's 1st way
0 arguments: 481ms
10 arguments: 473ms
50 arguments: 469ms
200 arguments: 480ms
1,000 arguments: 512ms
10,000 arguments: 819ms
16,000 arguments: 985ms
0 arguments: 460ms
10 arguments: 460ms
50 arguments: 464ms
200 arguments: 480ms
1,000 arguments: 502ms
10,000 arguments: 814ms
16,000 arguments: 983ms
Pass 2
------
New way again
0 arguments: 461ms
10 arguments: 456ms
50 arguments: 465ms
200 arguments: 477ms
1,000 arguments: 506ms
10,000 arguments: 817ms
16,000 arguments: 986ms
0 arguments: 461ms
10 arguments: 461ms
50 arguments: 461ms
200 arguments: 473ms
1,000 arguments: 503ms
10,000 arguments: 811ms
16,000 arguments: 1,004ms
Bob's simplified way again
0 arguments: 467ms
10 arguments: 463ms
50 arguments: 462ms
200 arguments: 477ms
1,000 arguments: 506ms
10,000 arguments: 812ms
16,000 arguments: 977ms
0 arguments: 460ms
10 arguments: 458ms
50 arguments: 462ms
200 arguments: 474ms
1,000 arguments: 503ms
10,000 arguments: 814ms
16,000 arguments: 983ms
Bob's 1st way again
0 arguments: 471ms
10 arguments: 465ms
50 arguments: 461ms
200 arguments: 474ms
1,000 arguments: 502ms
10,000 arguments: 810ms
16,000 arguments: 981ms
0 arguments: 455ms
10 arguments: 457ms
50 arguments: 459ms
200 arguments: 470ms
1,000 arguments: 500ms
10,000 arguments: 813ms
16,000 arguments: 984ms
====================================================================
Results from Intel Core-I7 about 4 yr old
Pass 1
------
New way
0 arguments: 1,229ms
10 arguments: 1,003ms
50 arguments: 980ms
200 arguments: 1,097ms
1,000 arguments: 1,031ms
10,000 arguments: 1,500ms
16,000 arguments: 1,857ms
0 arguments: 1,040ms
10 arguments: 999ms
50 arguments: 1,036ms
200 arguments: 1,016ms
1,000 arguments: 1,057ms
10,000 arguments: 1,407ms
16,000 arguments: 1,660ms
Bob's simplified way
0 arguments: 1,039ms
10 arguments: 1,018ms
50 arguments: 994ms
200 arguments: 1,032ms
1,000 arguments: 1,100ms
10,000 arguments: 1,356ms
16,000 arguments: 1,532ms
0 arguments: 1,022ms
10 arguments: 953ms
50 arguments: 1,074ms
200 arguments: 982ms
1,000 arguments: 992ms
10,000 arguments: 1,336ms
16,000 arguments: 1,555ms
Bob's 1st way
0 arguments: 1,009ms
10 arguments: 996ms
50 arguments: 1,020ms
200 arguments: 1,026ms
1,000 arguments: 981ms
10,000 arguments: 1,340ms
16,000 arguments: 1,697ms
0 arguments: 1,042ms
10 arguments: 950ms
50 arguments: 956ms
200 arguments: 1,013ms
1,000 arguments: 1,061ms
10,000 arguments: 1,378ms
16,000 arguments: 1,590ms
Pass 2
------
New way again
0 arguments: 1,203ms
10 arguments: 956ms
50 arguments: 968ms
200 arguments: 950ms
1,000 arguments: 1,017ms
10,000 arguments: 1,359ms
16,000 arguments: 1,550ms
0 arguments: 980ms
10 arguments: 961ms
50 arguments: 982ms
200 arguments: 1,143ms
1,000 arguments: 1,013ms
10,000 arguments: 1,328ms
16,000 arguments: 1,608ms
Bob's simplified way again
0 arguments: 1,060ms
10 arguments: 983ms
50 arguments: 1,034ms
200 arguments: 995ms
1,000 arguments: 1,072ms
10,000 arguments: 1,314ms
16,000 arguments: 1,514ms
0 arguments: 1,028ms
10 arguments: 981ms
50 arguments: 974ms
200 arguments: 1,072ms
1,000 arguments: 980ms
10,000 arguments: 1,344ms
16,000 arguments: 1,538ms
Bob's 1st way again
0 arguments: 968ms
10 arguments: 1,002ms
50 arguments: 1,050ms
200 arguments: 1,011ms
1,000 arguments: 1,018ms
10,000 arguments: 1,326ms
16,000 arguments: 1,534ms
0 arguments: 1,054ms
10 arguments: 995ms
50 arguments: 1,048ms
200 arguments: 1,065ms
1,000 arguments: 1,016ms
10,000 arguments: 1,313ms
16,000 arguments: 1,497ms
|
Thanks, that clarifies the issue :-) |
Well, if I read it correctly there is "at least" a 10-12x improvement already, which is not bad (10-12 sec down to 1 sec).
Now, come to think of it... there is a whole lot of memory-buffer expansion going on. Makes me wonder if doing something like this would help (pseudo-code, just an idea, might not compile):
func estimateLen(args []string) int {
	// very rough pre-estimate for a buffer length
	estimate := len(args) // spaces between args
	for _, v := range args {
		var needQuote, estimateSlash int
		for _, s := range v {
			switch s {
			case '\\':
				estimateSlash++
			case '"':
				estimateSlash++
				needQuote = 1
			case ' ', '\t':
				needQuote = 1
			}
		}
		// quoting adds 2 quote characters plus room for escapes
		estimate += len(v) + needQuote*(estimateSlash+2)
	}
	return estimate
}
func makeCmdLine(args []string) string {
	if len(args) == 0 {
		return ""
	}
	estimate := 1024
	if len(args) > 100 {
		estimate = estimateLen(args)
	}
	b := make([]byte, 0, estimate)
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, []byte(EscapeArg(v))...)
	}
	return string(b[1:])
}
(but I suspect the conversion []byte(EscapeArg(v))... might make the whole thing a matter of "noise")
Sorry, I don't have a Windows machine handy to try it myself |
I did some experiments to determine how helpful preallocation is for this
case. Results attached...
-- Bob
To pre-validate your suggested approach, I did an experiment with byte-slice preallocation in a very crude way, specific to this test, that results in no subsequent allocations. The reasoning is that this simplified preallocation should show at least as much improvement as the more accurate way, since it omits scanning the arguments. Below are the algorithms I compared and the results.
The best 16,000-arg result from each test sequence shows preallocation: 967ms, no preallocation: 975ms -- a 0.82% improvement.
BTW: in case you're not a Windows person, Windows command lines are limited to 32,767 bytes, so, since args are space-separated, 16,000 is pretty close to the max.
Does the improvement warrant the change? My initial thought is not to preallocate:
- The less-than-1% improvement would likely not be noticeable for normal numbers of arguments, and probably not for huge numbers either.
- Calculating the estimate greatly increases the complexity of the code.
The code I used for the comparison:
// Simplified way with no preallocation
func makeCmdLine(args []string) string {
	if len(args) == 0 {
		return ""
	}
	var b []byte
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, []byte(EscapeArg(v))...)
	}
	return string(b[1:]) // drop the leading space
}
// Simplified way with very crude preallocation, tailored for just this test:
// 2 bytes per argument (each space-separated arg is "x", thus unquoted),
// plus 100 extra
func makeCmdLine(args []string) string {
	if len(args) == 0 {
		return ""
	}
	b := make([]byte, 0, len(args)*2+100)
	for _, v := range args {
		b = append(b, ' ')
		b = append(b, []byte(EscapeArg(v))...)
	}
	return string(b[1:]) // drop the leading space
}
===============================
Simplified way with very crude preallocation:
0 arguments: 447ms
10 arguments: 453ms
50 arguments: 467ms
200 arguments: 471ms
1,000 arguments: 501ms
10,000 arguments: 849ms
16,000 arguments: 1,120ms (OS must have stolen some cycles here :)
0 arguments: 454ms
10 arguments: 455ms
50 arguments: 465ms
200 arguments: 464ms
1,000 arguments: 511ms
10,000 arguments: 809ms
16,000 arguments: 967ms
0 arguments: 457ms
10 arguments: 457ms
50 arguments: 457ms
200 arguments: 470ms
1,000 arguments: 502ms
10,000 arguments: 783ms
16,000 arguments: 969ms
Simplified way without preallocation.
0 arguments: 465ms
10 arguments: 458ms
50 arguments: 463ms
200 arguments: 470ms
1,000 arguments: 495ms
10,000 arguments: 796ms
16,000 arguments: 977ms
0 arguments: 459ms
10 arguments: 455ms
50 arguments: 455ms
200 arguments: 470ms
1,000 arguments: 499ms
10,000 arguments: 805ms
16,000 arguments: 975ms
0 arguments: 461ms
10 arguments: 450ms
50 arguments: 455ms
200 arguments: 471ms
1,000 arguments: 502ms
10,000 arguments: 807ms
16,000 arguments: 980ms
|
Thank you @bobjalex. With these numbers I guess we should put it to rest. (Unless you really want to run a profiler to see where the time is mostly spent.) |
This issue doesn't need any further work given the results above. |
I agree with putting it to bed -- the command-line generation is nice and fast now, or will be when it gets into a public release.
Just installed Go 1.15.3. To tweak my Go install to get external-process invocation performance "at least as good as Python", here's what I do:
- Remove the 5ms delay from os/exec_windows.go. I've been doing that for a long time now (a couple of years) and have never had a related problem (I have two less-than-5-year-old Windows Intel-based home computers).
- Patch our recent makeCmdLine speedup into syscall/exec_windows.go.
About that 5ms delay:
I have been assured in past conversations that without the delay, errors sometimes occur when deleting the executable immediately after return from the process wait.
The comment in the code says: "// NOTE(brainman): It seems that sometimes process is not dead when WaitForSingleObject returns..."
I suspect that a more accurate statement would be "the process is complete but the executable has not yet been closed".
Is it part of the contract of "wait" that the executable is closed on return? By far most external process launches don't care whether the executable is closed before completion is signalled. Closing the executable is part of the OS's cleanup after running a process and may be done concurrently after wait returns -- the caller should not have to wait for that.
For Windows, those few programs that run processes and then delete the executable (such as "go run"?) might have to deal with an executable that is still open. Note that Unix does not have this concern, since it's OK to remove an open file (the file is physically deleted as soon as it has no openers left).
Suggestion:
- Remove the unconditional 5ms delay.
- In the few programs that want to delete the executable after completion, the *client* should:
  - be sure to check the status of the deletion operation
  - if error: retry a few times, with a small delay between retries (assuming that the executable will soon be closed)
  - give up if success does not happen soon, announcing the failed deletion (if possible)
|
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Started a process with lots of arguments (thousands).
What did you expect to see?
Process runs.
What did you see instead?
Process runs but starts slowly -- much slower than the equivalent launch in Python. And I have the solution...
In investigating the said slowness, I gathered some timings, running on my fairly modern Windows 10 laptop:
The fix is a few lines in the syscall/exec_windows.go file, at line 86 in my 1.15.2 source:
Replace:
with:
I hope someone on the Go team can make this modification.
Here is the program I used for the benchmarks:
(you will have to provide your own "nop" command -- set the nopCommand constant)
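The benchmark program attached to the original issue is not preserved in this transcript. A hypothetical reconstruction of a harness in the same spirit (nopCommand is a placeholder as the original note says, and the repetition counts are guesses matching the "100 launches" figures reported above):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// nopCommand is a placeholder: substitute your own do-nothing
// executable, per the note above.
const nopCommand = "cmd"

// timeLaunches runs nopCommand reps times with nargs single-byte
// arguments and returns the total elapsed time. Launch errors are
// ignored, since only the launch cost is being measured.
func timeLaunches(reps, nargs int) time.Duration {
	args := make([]string, nargs)
	for i := range args {
		args[i] = "x"
	}
	start := time.Now()
	for i := 0; i < reps; i++ {
		_ = exec.Command(nopCommand, args...).Run()
	}
	return time.Since(start)
}

func main() {
	for _, n := range []int{0, 10, 50, 200, 1000, 10000} {
		fmt.Printf("%d arguments: %v\n", n, timeLaunches(100, n))
	}
}
```
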