Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: use -64 instead of 0 for "different file" position encoding in binary exporter #20080

Closed
josharian opened this issue Apr 22, 2017 · 4 comments

Comments

@josharian
Copy link
Contributor

The int 0 is used in the binary import/export format to indicate "different file". If the files are the same, an offset of 0 lines is represented by adding a second int of -1. See exporter.pos.

However, over 50% of same-file-different-line position deltas have a line delta of 0, so this representation is inefficient. A simple fix is to use the magic value -64 for "different file". That line offset is rare. -64 is the smallest value that fits in a single byte with varint encoding.

The implementation is trivial--it consists of changing 0 to -64 in a few places.

The savings are small, though, so it's not worth doing unless we are changing the export format for some other reason as well--thus an issue instead of a CL.

cc @mdempsky @griesemer

@griesemer
Copy link
Contributor

I was wondering about that when I was thinking about the encoding. Do you have concrete numbers?

@griesemer griesemer self-assigned this Apr 22, 2017
@griesemer griesemer added this to the Go1.9Maybe milestone Apr 22, 2017
@josharian
Copy link
Contributor Author

I instrumented bexport's p.pos, and ran 'go build -a std cmd'. Of the calls to pos, here's everything at >1%:

 47.78% 96566 line delta 0
 14.93% 30182 different file
 13.28% 26834 line delta 1
  4.96% 10021 line delta -1
  2.03%  4095 line delta -2
  1.94%  3927 line delta 2
  1.19%  2410 line delta -4
  1.04%  2095 line delta 3
  1.03%  2077 line delta 4

Results on the compilebench set (requires CL 41053):

name       old obj-bytes     new obj-bytes     delta
Template          382k ± 0%         381k ± 0%   -0.23%  (p=0.002 n=6+6)
Unicode           203k ± 0%         203k ± 0%   -0.00%  (p=0.002 n=6+6)
GoTypes          1.18M ± 0%        1.17M ± 0%   -0.08%  (p=0.002 n=6+6)
Compiler         3.99M ± 0%        3.99M ± 0%   -0.08%  (p=0.002 n=6+6)
SSA              8.28M ± 0%        8.28M ± 0%   -0.02%  (p=0.002 n=6+6)
Flate             230k ± 0%         230k ± 0%   -0.05%  (p=0.002 n=6+6)
GoParser          287k ± 0%         287k ± 0%   -0.16%  (p=0.002 n=6+6)
Reflect          1.00M ± 0%        1.00M ± 0%   -0.01%  (p=0.002 n=6+6)
Tar               190k ± 0%         189k ± 0%   -0.24%  (p=0.002 n=6+6)
XML               416k ± 0%         415k ± 0%   -0.16%  (p=0.002 n=6+6)

name       old export-bytes  new export-bytes  delta
Template         19.0k ± 0%        18.2k ± 0%   -4.55%  (p=0.002 n=6+6)
Unicode          4.45k ± 0%        4.44k ± 0%   -0.11%  (p=0.002 n=6+6)
GoTypes          29.7k ± 0%        28.8k ± 0%   -3.12%  (p=0.002 n=6+6)
Compiler         75.6k ± 0%        72.5k ± 0%   -4.03%  (p=0.002 n=6+6)
SSA              76.2k ± 0%        74.8k ± 0%   -1.72%  (p=0.002 n=6+6)
Flate            4.98k ± 0%        4.87k ± 0%   -2.29%  (p=0.002 n=6+6)
GoParser         8.81k ± 0%        8.34k ± 0%   -5.30%  (p=0.002 n=6+6)
Reflect          6.25k ± 0%        6.16k ± 0%   -1.49%  (p=0.002 n=6+6)
Tar              9.49k ± 0%        9.03k ± 0%   -4.82%  (p=0.002 n=6+6)
XML              16.0k ± 0%        15.4k ± 0%   -4.03%  (p=0.002 n=6+6)

So looks like it shrinks export data by 1-5% and object files sizes by a tiny amount.

@josharian
Copy link
Contributor Author

If we do this, another thing to consider at the same time is switching from using commonPrefixLen for filenames to just writing dir and file separately. A quick hack (diff below) shows another minor win from this, due entirely to the elimination of the need to encode of the length of the common prefix.

name        old export-bytes  new export-bytes  delta
Template          19.0k ± 0%        18.9k ± 0%   -0.82%  (p=0.002 n=6+6)
Unicode           4.45k ± 0%        4.41k ± 0%   -0.97%  (p=0.002 n=6+6)
GoTypes           29.7k ± 0%        29.1k ± 0%   -1.90%  (p=0.002 n=6+6)
Compiler          75.6k ± 0%        74.1k ± 0%   -2.01%  (p=0.002 n=6+6)
SSA               76.2k ± 0%        75.2k ± 0%   -1.31%  (p=0.002 n=6+6)
Flate             4.98k ± 0%        4.90k ± 0%   -1.67%  (p=0.002 n=6+6)
GoParser          8.81k ± 0%        8.75k ± 0%   -0.65%  (p=0.002 n=6+6)
Reflect           6.25k ± 0%        6.20k ± 0%   -0.72%  (p=0.002 n=6+6)
Tar               9.49k ± 0%        9.41k ± 0%   -0.79%  (p=0.002 n=6+6)
XML               16.0k ± 0%        15.9k ± 0%   -0.62%  (p=0.002 n=6+6)
[Geo mean]        15.1k             14.9k        -1.15%
diff --git a/src/cmd/compile/internal/gc/bexport.go b/src/cmd/compile/internal/gc/bexport.go
index 3637804a12..7adcbdde28 100644
--- a/src/cmd/compile/internal/gc/bexport.go
+++ b/src/cmd/compile/internal/gc/bexport.go
@@ -118,6 +118,7 @@ import (
 	"encoding/binary"
 	"fmt"
 	"math/big"
+	"path/filepath"
 	"sort"
 	"strings"
 )
@@ -532,17 +533,12 @@ func (p *exporter) pos(n *Node) {
 	} else {
 		// different file
 		p.int(0)
-		// Encode filename as length of common prefix with previous
-		// filename, followed by (possibly empty) suffix. Filenames
-		// frequently share path prefixes, so this can save a lot
-		// of space and make export data size less dependent on file
-		// path length. The suffix is unlikely to be empty because
-		// file names tend to end in ".go".
-		n := commonPrefixLen(p.prevFile, file)
-		p.int(n)           // n >= 0
-		p.string(file[n:]) // write suffix only
+		p.int(line) // line >= 0
+		// Encode filenames as dir+path, so that common dirs can be shared.
+		// TODO: many dirs are strict prefixes of others. Can we use this?
+		d, f := filepath.Split(file)
+		p.string(d)
+		p.string(f)
 		p.prevFile = file
-		p.int(line)
 	}
 	p.prevLine = line
 }
diff --git a/src/cmd/compile/internal/gc/bimport.go b/src/cmd/compile/internal/gc/bimport.go
index 7766a5617a..c24e216956 100644
--- a/src/cmd/compile/internal/gc/bimport.go
+++ b/src/cmd/compile/internal/gc/bimport.go
@@ -387,9 +387,9 @@ func (p *importer) pos() src.XPos {
 		line += delta
 	} else if n := p.int(); n >= 0 {
 		// file changed
-		file = p.prevFile[:n] + p.string()
+		file = p.string() + p.string()
+		line = n
 		p.prevFile = file
-		line = p.int()
 		p.posBase = src.NewFileBase(file, file)
 	}
 	p.prevLine = line

@gopherbot
Copy link

CL https://golang.org/cl/41619 mentions this issue.

@golang golang locked and limited conversation to collaborators Apr 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants