net/url: optimize unescape and escape #17860

dsnet · 2016-11-08T22:59:26Z

According to a profiling data from a large number of servers at Google, URL related functions are among the top 150 functions by CPU time. The simplest optimizations I see are around using LUTs in shouldEscape, isHex, and unHex.

The text was updated successfully, but these errors were encountered:

LMMilewski · 2017-02-08T02:18:03Z

Based on the profiling data that you mention, it looks like the unescape time is split evenly between runtime.makeslice and runtime.slicebytetostring.
The net/url.shouldEscape function is responsible only for ~7% of the CPU time. LUTs might not help as much as we would like.

bradfitz · 2017-02-08T06:30:34Z

Regarding the allocations and slicebytetostring, see compiler feature request #6714.

/cc @randall77 @cherrymui @josharian @dr2chase

bradfitz · 2017-02-08T06:43:36Z

See also: #18990

ccbrown · 2017-09-12T03:29:56Z

I'm not sure it's the kind of optimization you want, but I find that if I do some "hand optimization", just eliminating branches, I get pretty significant speed increases:

const ESCAPE_ME = "net/url: optimize unescape and escape"

var s string

func BenchmarkPathEscape(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = url.PathEscape(ESCAPE_ME)
	}
}

func shouldPathEscape(c byte) bool {
	if 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z' || '0' <= c && c <= '9' {
		return false
	}
	switch c {
	case '-', '_', '.', '~', ':', '@', '&', '=', '+', '$':
		return false
	}
	return true
}

func MyPathEscape(s string) string {
	hexCount := 0
	for i := 0; i < len(s); i++ {
		if shouldPathEscape(s[i]) {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldPathEscape(c) {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}

func BenchmarkMyPathEscape(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = MyPathEscape(ESCAPE_ME)
	}
}

goos: darwin
goarch: amd64
BenchmarkPathEscape-8     	 3000000	       438 ns/op
BenchmarkMyPathEscape-8   	10000000	       147 ns/op

It seems to me like calls to functions like escape should probably be inlined when a parameter on which many branches depend is given a const argument:

func PathEscape(s string) string {
    return escape(s, encodePathSegment) // not inlined, but probably should be
}

martisch · 2017-09-12T06:28:32Z

I also experimented with some optimizations (mostly unescpae) but have not cleaned them up in a CL yet. So i am interested what we can improve too. Maybe i still get around to finish them for 1.10.

I was thinking of creating a table [256]uint8 where byte contains some flags. e.g. shouldPathEscape (if we need more flags this can become uint32)
so we can use the same lookup table in multiple functions to quickly look up properties of ASCII characters while at the same time not wasting ram for a lookup table each time. This could be even stored in a separate package so multiple std lib packages (e.g unicode.IsSpace or strings.Fields) can use the same table.

For unhex i prefer using 9*(c>>6)` + (c&15) vs a LUT. I will have to benchmark this again but will send as a small separate improvement for the std lib occurrences if it still checks out to be faster than the current branch based version.

@ccbrown
Some comments while briefly looking at the code from the above comment:

t[j+2] could maybe avoid some bounds check here by checking _ = t[j+2] earlier in the block.
we could remember the first position where a to be escaped symbol is and skip ahead to that in the second loop after making a copy of the start up to that symbol. But that would make performance worse when to position is very near the start of the string.

if shouldPathEscape(s[i]) {
 hexCount++
}

if this is not already recognized by the compiler to become a setCC instruction with an add on AMD6 4 instead of a branch we should have a look at optimizing that. (maybe there is even an add that just takes the flag without setCC)

Does your new version pass all the tests?

ccbrown · 2017-09-12T14:14:39Z

Tried a few more things.

Using a boolean LUT does make things faster:

func MyBoolLUTEscape(s string, shouldEscapeLUT [256]bool) string {
	hexCount := 0
	for i := 0; i < len(s); i++ {
		if shouldEscapeLUT[s[i]] {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldEscapeLUT[c] {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}

BenchmarkPathEscape-8              	 3000000	       438 ns/op
BenchmarkMyPathEscape-8            	10000000	       144 ns/op
BenchmarkMyBoolLUTPathEscape-8     	10000000	       128 ns/op

Compacting all of the different encodings into a single [256]byte LUT is slower, but still reasonably fast (and would potentially save 6*256 bytes in memory):

func MyLUTEscape(s string, mode encoding) string {
	hexCount := 0
	mask := byte(1 << byte(mode))
	for i := 0; i < len(s); i++ {
		if shouldEscapeLUT[s[i]]&mask != 0 {
			hexCount++
		}
	}
	if hexCount == 0 {
		return s
	}
	t := make([]byte, len(s)+2*hexCount)
	j := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		if shouldEscapeLUT[s[i]]&mask != 0 {
			t[j] = '%'
			t[j+1] = "0123456789ABCDEF"[c>>4]
			t[j+2] = "0123456789ABCDEF"[c&15]
			j += 3
		} else {
			t[j] = c
			j++
		}
	}
	return string(t)
}

BenchmarkMyLUTEscape-8             	10000000	       151 ns/op

Note while these functions can be re-used by all of the encodings, I'm still excluding the c == ' ' && mode == encodeQueryComponent case for now. If needed, query component encoding can get its own function implementation.

@martisch

t[j+2] could maybe avoid some bounds check here by checking _ = t[j+2] earlier in the block.

we could remember the first position where a to be escaped symbol is and skip ahead to that in the second loop after making a copy of the start up to that symbol. But that would make performance worse when to position is very near the start of the string.

Tried both. Neither really made a measurable difference for me.

if this is not already recognized by the compiler to become a setCC instruction with an add on AMD64 instead of a branch we should have a look at optimizing that.

That function actually got inlined for me. So it didn't use any conditional instructions like that there.

But in the LUT version, maybe that sort of thing could work here:

perf_test.go:207	0x10f004d		4084f8			TESTL DI, AL
perf_test.go:207	0x10f0050		74e3			JE 0x10f0035
perf_test.go:208	0x10f0052		48ffc6			INCQ SI
perf_test.go:207	0x10f0055		ebde			JMP 0x10f0035

I'm not really gonna try to optimize this at such a low level yet though.

gopherbot · 2018-09-10T11:22:03Z

Change https://golang.org/cl/134296 mentions this issue: net/url: remove an allocation for short strings in escape

Use a 64 byte array to avoid an allocation on the assumption that most url escaping is performed on short strings. Also adds a fast path for escaping strings whose only replacements are spaces which is common in query components. Adds benchmarks for QueryEscape, PathEscape, QueryUnescape and PathUnescape but no optimizations are include for the unescape functions so I don't include those benchmark results here. Reduces allocations by 10% in the existing String benchmark with a modest performance increase. name old time/op new time/op delta QueryEscape/#00-8 64.6ns ± 1% 43.8ns ± 0% -32.14% (p=0.000 n=9+9) QueryEscape/#1-8 276ns ± 3% 249ns ± 0% -9.62% (p=0.000 n=10+7) QueryEscape/#2-8 176ns ± 2% 155ns ± 3% -12.21% (p=0.000 n=10+10) QueryEscape/#3-8 388ns ± 1% 362ns ± 0% -6.55% (p=0.000 n=10+8) QueryEscape/#4-8 2.32µs ± 2% 2.27µs ± 2% -2.26% (p=0.001 n=10+10) PathEscape/#00-8 78.0ns ± 3% 63.4ns ± 1% -18.69% (p=0.000 n=10+10) PathEscape/#1-8 276ns ± 2% 260ns ± 0% -6.01% (p=0.000 n=10+10) PathEscape/#2-8 175ns ± 0% 153ns ± 0% -12.53% (p=0.000 n=8+10) PathEscape/#3-8 389ns ± 2% 361ns ± 0% -7.21% (p=0.000 n=10+9) PathEscape/#4-8 2.30µs ± 2% 2.27µs ± 1% -1.33% (p=0.001 n=9+10) String-8 3.56µs ± 4% 3.42µs ± 7% -4.00% (p=0.003 n=10+10) name old alloc/op new alloc/op delta QueryEscape/#00-8 16.0B ± 0% 8.0B ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#1-8 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#2-8 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#3-8 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#4-8 832B ± 0% 832B ± 0% ~ (all equal) PathEscape/#00-8 32.0B ± 0% 16.0B ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#1-8 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#2-8 64.0B ± 0% 32.0B ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#3-8 128B ± 0% 64B ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#4-8 704B ± 0% 704B ± 0% ~ (all equal) String-8 1.84kB ± 0% 1.66kB ± 0% -9.57% (p=0.000 n=10+10) name old allocs/op new allocs/op delta QueryEscape/#00-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#1-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#2-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#3-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) QueryEscape/#4-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) PathEscape/#00-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#1-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#2-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#3-8 2.00 ± 0% 1.00 ± 0% -50.00% (p=0.000 n=10+10) PathEscape/#4-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String-8 69.0 ± 0% 61.0 ± 0% -11.59% (p=0.000 n=10+10) Updates #17860 Change-Id: I45c5e9d40b242f874c61f6ccc73bf94c494bb868 Reviewed-on: https://go-review.googlesource.com/134296 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

iand · 2018-09-11T15:16:55Z

While working on 134296 I saw that the benchmark for the String() method still shows about 60 allocations being made. This is because we don't pre-size the strings.Builder and also call escape at least 5 times for different url components.

I think this could be optimized by pre-sizing the builder with the known lengths (scheme, opaque, rawquery) and passing the builder to a new escape function that writes to a builder. The existing escape function could be modified to create a builder and call the new function or the existing callers of escape could do it (QueryEscape, PathEscape etc.)

I can work on this if there is interest in this approach.

bradfitz · 2018-09-11T16:44:45Z

@iand, sounds reasonable.

iand · 2018-09-12T00:07:43Z

After spending several more minutes examining the String benchmark I realised that the reported timings and allocations are the cumulative totals of all 20 test cases.

I investigated modifying the various escape functions write to a shared strings.Builder but couldn't reduce the allocations and timings by enough to justify the extra complexity.

Also looked at pre-sizing the builder. It's easy enough to calculate the capacity needed for the fixed components and estimate the length of most of the encoded parts, but path is little more tricky. Performing these calculations adds timing overhead which makes the function slower overall, even with the one or two allocations it saves.

I found I could get just as good a benefit by simply growing the builder to a fixed 64 bytes. Some timings for variants on this approach are below. However, it's likely that I'm just fitting to the test cases.

I'm happy to send a CL for a one line change setting the initial builder size to 64 bytes, but I'm not convinced it adds much value.

Current master:

BenchmarkString-8        500000	      3393 ns/op	    1664 B/op	61 allocs/op

Pre-sized builder.

32 bytes:  BenchmarkString-8   500000	2966 ns/op	    1632 B/op	43 allocs/op
64 bytes:  BenchmarkString-8   500000	2837 ns/op	    1648 B/op	33 allocs/op
128 bytes: BenchmarkString-8   500000	2891 ns/op	    2800 B/op	32 allocs/op

gopherbot · 2019-05-27T19:52:44Z

Change https://golang.org/cl/174998 mentions this issue: net/url: rework shouldEscape func to bitmask flag for performance boost

cristaloleg · 2021-07-01T17:32:27Z

What is the state of this issue? Looks like it's stalled after 1.13 freeze, but no other problems are mentioned. Kindly ping @iand

ahfuzhang · 2023-06-11T08:53:20Z

I write a new version of FastQueryEscape(), it's 7.33 times faster than url.QueryEscape().
Of course, I based your codes.

The detail is here: https://www.cnblogs.com/ahfuzhang/p/17473178.html

gopherbot · 2023-07-31T14:54:55Z

Change https://go.dev/cl/514335 mentions this issue: net/url: optimize QueryEscape and PathEscape

dsnet added the Performance label Nov 8, 2016

dsnet added this to the Go1.9 milestone Nov 8, 2016

bradfitz modified the milestones: Go1.10, Go1.9 May 23, 2017

bradfitz added the help wanted label May 23, 2017

rsc modified the milestones: Go1.10, Go1.11 Nov 22, 2017

bradfitz modified the milestones: Go1.11, Unplanned May 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

net/url: optimize unescape and escape #17860

net/url: optimize unescape and escape #17860

dsnet commented Nov 8, 2016

LMMilewski commented Feb 8, 2017

bradfitz commented Feb 8, 2017

bradfitz commented Feb 8, 2017

ccbrown commented Sep 12, 2017 •

edited

martisch commented Sep 12, 2017 •

edited

ccbrown commented Sep 12, 2017

gopherbot commented Sep 10, 2018

iand commented Sep 11, 2018

bradfitz commented Sep 11, 2018

iand commented Sep 12, 2018

gopherbot commented May 27, 2019

cristaloleg commented Jul 1, 2021

ahfuzhang commented Jun 11, 2023

gopherbot commented Jul 31, 2023

net/url: optimize unescape and escape #17860

net/url: optimize unescape and escape #17860

Comments

dsnet commented Nov 8, 2016

LMMilewski commented Feb 8, 2017

bradfitz commented Feb 8, 2017

bradfitz commented Feb 8, 2017

ccbrown commented Sep 12, 2017 • edited

martisch commented Sep 12, 2017 • edited

ccbrown commented Sep 12, 2017

gopherbot commented Sep 10, 2018

iand commented Sep 11, 2018

bradfitz commented Sep 11, 2018

iand commented Sep 12, 2018

gopherbot commented May 27, 2019

cristaloleg commented Jul 1, 2021

ahfuzhang commented Jun 11, 2023

gopherbot commented Jul 31, 2023

ccbrown commented Sep 12, 2017 •

edited

martisch commented Sep 12, 2017 •

edited