compress/gzip: compression level does not work #21987
Comments
Without knowing what |
@dsnet I've made a repro here https://play.golang.org/p/6GjY9AHO_z or inlined:

package main

import (
	"compress/gzip"
	"crypto/rand"
	"fmt"
	"io"
	"io/ioutil"
	"log"
)

// compressSize gzips everything read from r at the given level and
// returns the number of compressed bytes produced.
func compressSize(r io.Reader, level int) int64 {
	prc, pwc := io.Pipe()
	go func() {
		defer pwc.Close()
		gzw, err := gzip.NewWriterLevel(pwc, level)
		if err != nil {
			log.Printf("level: #%d err: %v", level, err)
			return // gzw is nil here, so do not use it
		}
		io.Copy(gzw, r)
		gzw.Flush()
		gzw.Close()
	}()
	n, _ := io.Copy(ioutil.Discard, prc)
	return n
}

func main() {
	for level := 1; level <= 9; level++ {
		size := compressSize(io.LimitReader(rand.Reader, 100000), level)
		fmt.Printf("Level: %d size: %d\n", level, size)
	}
}

and the sequence of bytes is retrieved from
When I run the compression repro 10 times, I get the same results:
$ for ((i=0; i <=10; i++)) do echo -e "Run #$i\n" && go run main.go && echo -e "End of Run\n";done
Run #0
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #1
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #2
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #3
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #4
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #5
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #6
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #7
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #8
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #9
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
Run #10
Level: 1 size: 100038
Level: 2 size: 100063
Level: 3 size: 100063
Level: 4 size: 100063
Level: 5 size: 100063
Level: 6 size: 100063
Level: 7 size: 100063
Level: 8 size: 100063
Level: 9 size: 100063
End of Run
@odeke-em, your repro is just demonstrating that random data is not compressible, which is entirely explained by information theory. If level 9 can't compress, I don't know how you would expect the lower levels to do better. (Level 1 does better because it uses an entirely different algorithm that does not always emit a trailing empty block like the more general algorithm for levels 2-9, but that's a very minor implementation detail that maybe we'll fix someday.)
Thanks for pointing that out @dsnet. I was being dumb and using different data on each run; I should have stuck with the same data, which would make sense (I had that before). I don't know much about information theory, but I'll study up on that. Thanks for piquing my interest in the field :)
As @dsnet already explained, random data is not really a good fit for compression. Here is a small program that compares the length of the output of Go's compress/gzip with that of the locally installed gzip tool: https://play.golang.org/p/nRU119B45l (does not run on the playground; needs gzip in the user's $PATH).
@dongweigogo Can you try the code with your data? Just save it as "gzip.go" and run
Feeding the program (compiled with Go 1.9) with random data I get:
$ cat /dev/urandom | head -c100000 | go run gzip.go
2017/09/23 10:44:52 level 1, mode go, bytes 100033
2017/09/23 10:44:52 level 1, mode exec, bytes 100038
2017/09/23 10:44:52 level 2, mode go, bytes 100058
2017/09/23 10:44:52 level 2, mode exec, bytes 100038
2017/09/23 10:44:52 level 3, mode go, bytes 100058
2017/09/23 10:44:52 level 3, mode exec, bytes 100038
2017/09/23 10:44:52 level 4, mode go, bytes 100058
2017/09/23 10:44:52 level 4, mode exec, bytes 100038
2017/09/23 10:44:52 level 5, mode go, bytes 100058
2017/09/23 10:44:52 level 5, mode exec, bytes 100038
2017/09/23 10:44:52 level 6, mode go, bytes 100058
2017/09/23 10:44:52 level 6, mode exec, bytes 100038
2017/09/23 10:44:52 level 7, mode go, bytes 100058
2017/09/23 10:44:52 level 7, mode exec, bytes 100038
2017/09/23 10:44:52 level 8, mode go, bytes 100058
2017/09/23 10:44:52 level 8, mode exec, bytes 100038
2017/09/23 10:44:52 level 9, mode go, bytes 100058
2017/09/23 10:44:52 level 9, mode exec, bytes 100038
Using some real data (multiple *.go files concatenated) I get:
$ cat $GOPATH/src/github.com/nats-io/gnatsd/server/*.go | go run gzip.go
2017/09/23 10:45:04 level 1, mode go, bytes 106244
2017/09/23 10:45:04 level 1, mode exec, bytes 104494
2017/09/23 10:45:04 level 2, mode go, bytes 97783
2017/09/23 10:45:04 level 2, mode exec, bytes 99273
2017/09/23 10:45:04 level 3, mode go, bytes 95359
2017/09/23 10:45:04 level 3, mode exec, bytes 95593
2017/09/23 10:45:04 level 4, mode go, bytes 88169
2017/09/23 10:45:04 level 4, mode exec, bytes 88797
2017/09/23 10:45:04 level 5, mode go, bytes 85406
2017/09/23 10:45:04 level 5, mode exec, bytes 85536
2017/09/23 10:45:04 level 6, mode go, bytes 84633
2017/09/23 10:45:04 level 6, mode exec, bytes 84106
2017/09/23 10:45:04 level 7, mode go, bytes 84512
2017/09/23 10:45:04 level 7, mode exec, bytes 83890
2017/09/23 10:45:04 level 8, mode go, bytes 84429
2017/09/23 10:45:04 level 8, mode exec, bytes 83782
2017/09/23 10:45:04 level 9, mode go, bytes 84426
2017/09/23 10:45:04 level 9, mode exec, bytes 83780
@dsnet, @nussjustin actually the data is large, a 700 MB text file.
@nussjustin, since the input is very large and takes quite a lot of time at high levels, I truncated the input file to 60 MB. The result is below:
2017/09/23 18:32:56 level 1, mode go, bytes 21211154
Then I tried using the gzip command to compress with levels 1 to 9, and it works well. I have no idea why my code does not work.
The result of the program @nussjustin provided seems to indicate that the file size decreases as the level increases. So I'm not sure what you mean by the compression level having no effect.
Timed out. |
Go 1.8.3
The gzip compression level does not work.
I changed the compression level from 1 to 9, but all the compressed files have the same size and the same compression ratio. I don't know what is going wrong.
Code as below:
`
func CompressFile(file string) {
}
`