cmd/go: use package source hash to detect stale cached compiled packages? #14976

Closed
bronze1man opened this issue Mar 26, 2016 · 6 comments

@bronze1man
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version go1.6 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH=""
    GORACE=""
    GOROOT="/usr/local/go"
    GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
    GO15VENDOREXPERIMENT="1"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    I wrote a main package, a library package, and a main package for testing them.
func TestMustGoRun2(ot *testing.T){
    kmgFile.MustDelete("testFile")
    defer kmgFile.MustDelete("testFile")
    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgLibA/a.go",[]byte(`package pkgLibA

func A() string{
    return "A"
}
`))
    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgMainA/a.go",[]byte(`package main
import(
    "pkgLibA"
    "fmt"
)
func main(){
    fmt.Println(pkgLibA.A())
}
`))
    testerWithPkg :=func() []byte{
        p:=MustNewProgramV2(map[string]string{"GOPATH":kmgFile.MustGetFullPath("testFile")},"pkgMainA")
        buf:=bytes.Buffer{}
        p.mustGoRun(nil,func(cmdSlice []string,env map[string]string){
            b:=kmgCmd.CmdSlice(cmdSlice).
            MustSetEnvMap(env).
            MustRunAndReturnOutput()
            buf.Write(b)
        })
        return buf.Bytes()
    }
    testerWithPath :=func() []byte{
        p:=MustNewProgramV2(map[string]string{"GOPATH":kmgFile.MustGetFullPath("testFile")},"testFile/src/pkgMainA/a.go")
        buf:=bytes.Buffer{}
        p.mustGoRun(nil,func(cmdSlice []string,env map[string]string){
            b:=kmgCmd.CmdSlice(cmdSlice).
            MustSetEnvMap(env).
            MustRunAndReturnOutput()
            buf.Write(b)
        })
        return buf.Bytes()
    }
    kmgTest.Equal(string(testerWithPkg()),"A\n")

    kmgFile.MustDelete("testFile/pkg")
    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgLibA/a.go",[]byte(`package pkgLibA

func A() string{
    return "B"
}
`))
    kmgTest.Equal(string(testerWithPkg()),"B\n")

    now:=time.Now()
    time.Sleep(now.Round(time.Second).Add(time.Second).Sub(now))
    // Known Go pkg cache bug. (The problem resolves itself after one second.)
    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgLibA/a.go",[]byte(`package pkgLibA

func A() string{
    return "C"
}
`))
    kmgTest.Equal(string(testerWithPkg()),"C\n")
    kmgTest.Equal(string(testerWithPath()),"C\n")

    now=time.Now()
    time.Sleep(now.Round(time.Second).Add(time.Second).Sub(now))
    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgLibA/a.go",[]byte(`package pkgLibA

func A() string{
    return "D"
}
`))
    kmgTest.Equal(string(testerWithPath()),"D\n")

    kmgFile.MustWriteFileWithMkdir("testFile/src/pkgMainA/a.go",[]byte(`package main
import(
    "pkgLibA"
    "fmt"
)
func main(){
    fmt.Println(pkgLibA.A(),"A")
}
`))
    kmgTest.Equal(string(testerWithPath()),"D A\n")
    kmgTest.Equal(string(testerWithPkg()),"D A\n")
}

Sorry, some of the libraries used in the code are not shown here. I think you can guess what they do.
testerWithPkg uses go install.
testerWithPath uses go run.

  4. What did you expect to see?
    I expected go run and go install to see changes to the pkgLibA code immediately.
  5. What did you see instead?
    I found that I need to add the following code, or delete the pkg directory, before go run and go install notice that I changed the pkgLibA code.
    now=time.Now()
    time.Sleep(now.Round(time.Second).Add(time.Second).Sub(now))

I guess you use the modification time from the Lstat syscall to validate the cached file content. I got the same result on darwin when I used only the Lstat syscall for cache validation.
I found two solutions that give correct behavior for this kind of problem:

  • The easy-to-code one: save the MD5 sum of the file content in the cache file, and check it every time you need to know whether the cache is valid.
  • The faster one: save both the MD5 sum and the modification time of the file in the cache file. To check whether the cache is valid, use the following algorithm:
    • If the MD5 sum has already been verified at least one second after the file's modification time, checking the modification time from the Lstat syscall is enough. Otherwise, compute the MD5 sum of the file content again.
    • Most of the time this costs only one more MD5 pass over the file content than the Lstat-only solution. My computer computes MD5 at about 500 MB/s, which is faster than reading from my SSD.
    • Here is an implementation of the algorithm:
package kmgCache
import (
    "time"
    "sync"
    "github.com/bronze1man/kmg/encoding/kmgGob"
    "os"
    "crypto/md5"
    "io"
    "encoding/hex"
    "github.com/bronze1man/kmg/kmgFile"
)

// FileMd5Getter gets the md5 of a file by its absolute path. It returns "" if the file does not exist.
type FileMd5Getter struct{
    imp *fileMd5GetterImp
}
func (getter FileMd5Getter) GetMd5ByFullPath(path string) string{
    return getter.imp.getMd5ByFullPath__NOLOCK(path)
}
func (getter FileMd5Getter) GetMd5ByStatAndFullPath(statAndFullPath kmgFile.StatAndFullPath) string{
    return getter.imp.getMd5ByStatAndFullPath__NOLOCK(statAndFullPath)
}
type FileMd5GetByFullPath func(path string) string

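// FileMd5Get gets the md5 of files through a cache.
// Because of the locks involved, do not nest calls to this function.
// It does its best to be fast while guaranteeing correctness.
// For a given database file, only one cb can run at a time.
// Pass absolute file paths rather than relative ones, to avoid interference from the current directory.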
func FileMd5Get(dbFilePath string,cb func(getter FileMd5Getter)) {
    getterImp :=getFileMd5GetterByDbFilePath(dbFilePath)
    getterImp.locker.Lock()
    getterImp.cacheNow = time.Now()
    getterImp.hasChange = false
    outGetter:=FileMd5Getter{imp: getterImp}
    cb(outGetter)
    if getterImp.hasChange{
        kmgGob.MustWriteFile(dbFilePath, getterImp)
    }
    getterImp.locker.Unlock()
}

type fileMd5GetterImp struct{
    CacheInfo map[string]*cacheInfoEntry
    locker sync.Mutex
    hasChange bool // Whether the internal data has changed, to avoid unnecessary writes.
    cacheNow time.Time
    innerBuf []byte
}
type cacheInfoEntry struct{
    MTime time.Time
    Md5 string
    HasCheckInNextSecond bool // Whether the md5 has been checked once in a second other than the mtime's second.
}


var fileMd5GetterMap map[string]*fileMd5GetterImp
var fileMd5GetterMapLock sync.Mutex
func (getter *fileMd5GetterImp) getMd5ByFullPath__NOLOCK(path string) string {
    fi,err:=os.Lstat(path)
    if err!=nil{
        return ""
    }
    return getter.getMd5ByStatAndFullPath__NOLOCK(kmgFile.StatAndFullPath{
        Fi: fi,
        FullPath: path,
    })
}
func (getter *fileMd5GetterImp) getMd5ByStatAndFullPath__NOLOCK(statAndFullPath kmgFile.StatAndFullPath) string {
    path:=statAndFullPath.FullPath
    fi:=statAndFullPath.Fi
    var err error
    thisEntry := getter.CacheInfo[path]
    if thisEntry == nil {
        thisEntry = &cacheInfoEntry{}
        thisEntry.MTime = fi.ModTime()
        thisEntry.Md5, err = md5FileWithBuf(path,getter.innerBuf)
        if err != nil {
            return ""
        }
        thisEntry.HasCheckInNextSecond = (getter.cacheNow.Sub(thisEntry.MTime)>time.Second) // If the mtime is already well in the past, mark the entry as re-checked right away.
        getter.CacheInfo[path] = thisEntry
        getter.hasChange = true
        return thisEntry.Md5
    }
    // Initial version, written for correctness. (Optimizing later by reducing syscalls would hurt readability.)
    fileMtime:=fi.ModTime()
    if !fileMtime.Equal(thisEntry.MTime){
        // The file's mtime differs from the stored mtime: always re-check the md5.
        thisEntry.MTime = fileMtime
        thisEntry.Md5, err = md5FileWithBuf(path,getter.innerBuf)
        if err != nil {
            return ""
        }
        thisEntry.HasCheckInNextSecond = (getter.cacheNow.Sub(thisEntry.MTime)>time.Second) // If the mtime is already well in the past, mark the entry as re-checked right away.
        getter.hasChange = true // The mtime has definitely changed.
        return thisEntry.Md5
    }
    now:=getter.cacheNow
    if now.Sub(thisEntry.MTime)<time.Second{
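        // The current time and the file's mtime are within the same second (and the file's mtime equals the stored mtime): always re-check the md5.
        md5, err := md5FileWithBuf(path,getter.innerBuf)
        if err != nil {
            return ""
        }
        if md5!=thisEntry.Md5 {
            // Mark the cache dirty only when the stored md5 actually changes.
            getter.hasChange = true
        }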
        thisEntry.Md5 = md5
        thisEntry.HasCheckInNextSecond = false
        return thisEntry.Md5
    }
    if thisEntry.HasCheckInNextSecond==false{
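        // First check in this state (the file's mtime equals the stored mtime, and the current time is not in the same second as the mtime): always re-check the md5.
        thisEntry.MTime = fileMtime
        md5, err := md5FileWithBuf(path,getter.innerBuf)
        if err != nil {
            return ""
        }
        thisEntry.HasCheckInNextSecond = (md5==thisEntry.Md5) // Only mark the second check as done if the md5 did not change.
        getter.hasChange = true // Either thisEntry.HasCheckInNextSecond changes, or the md5 changes.
        thisEntry.Md5 = md5
        return thisEntry.Md5
    }
    // (The file's mtime equals the stored mtime, the current time is not in the same second as the mtime, and this is not the first check.) The cached data can be used directly.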
    return thisEntry.Md5
}


func getFileMd5GetterByDbFilePath(dbFilePath string) *fileMd5GetterImp {
    fileMd5GetterMapLock.Lock()
    defer fileMd5GetterMapLock.Unlock()
    if fileMd5GetterMap==nil{
        fileMd5GetterMap = map[string]*fileMd5GetterImp{}
    }
    thisGetter:=fileMd5GetterMap[dbFilePath]
    if thisGetter!=nil{
        return thisGetter
    }
    thisGetter = &fileMd5GetterImp{}
    err := kmgGob.ReadFile(dbFilePath, &thisGetter)
    if err != nil {
        // Ignore any error from reading the cache.
        thisGetter = &fileMd5GetterImp{}
    }
    if thisGetter.CacheInfo==nil{
        thisGetter.CacheInfo = map[string]*cacheInfoEntry{}
    }
    if thisGetter.innerBuf == nil{
        thisGetter.innerBuf = make([]byte,32*1024)
    }
    fileMd5GetterMap[dbFilePath] = thisGetter
    return thisGetter
}

// This approach reduces allocations and improves performance by about 30%.
func md5FileWithBuf(path string,innerBuf []byte) (string,error){
    hash := md5.New()
    f,err:=os.Open(path)
    if err!=nil{
        return "",err
    }
    defer f.Close()
    _,err=io.CopyBuffer(hash,f,innerBuf)
    if err!=nil{
        return "",err
    }
    hashB := hash.Sum(innerBuf[0:0])
    hex.Encode(innerBuf[16:],hashB)
    return string(innerBuf[16:16+32]),nil
}

As the example implementation shows, if you move file content verification into a standalone library, you only need to compute the MD5 sum of a file when it changes. If it has not changed, you can safely reuse the old MD5 sum, and most source files do not change during development.

@bradfitz
Contributor

bradfitz commented Apr 9, 2016

@minux, can you help triage this?

@bradfitz bradfitz added this to the Unplanned milestone Apr 9, 2016
@bradfitz bradfitz changed the title go install pkg cache bug on darwin. cmd/go: go install pkg cache bug on darwin. Apr 9, 2016
@minux
Member

minux commented Apr 10, 2016 via email

@minux minux changed the title cmd/go: go install pkg cache bug on darwin. cmd/go: use package source hash to detect stale cached compiled packages? Apr 10, 2016
@bradfitz
Contributor

In that case, this is a dup of #4719.

Part of solving #4719 (which I desperately want, but doesn't fit into the Go 1.7 cycle with people's time) and having a reliable cache is making sure it's a proper 100% accurate cache, which would necessarily involve making things based on content and not modtime.
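As a rough illustration of what "based on content and not modtime" could look like, the sketch below derives a cache key purely from file names and contents. The `contentKey` helper is hypothetical, and the choice of SHA-256 is an assumption of this sketch (the issue itself proposes MD5). It is not a description of how cmd/go implements its cache.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contentKey is a hypothetical helper that derives a cache key from file
// names and contents only. Timestamps play no part, so two edits made
// within the same second still produce different keys.
func contentKey(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable key

	h := sha256.New()
	for _, name := range names {
		// Length-prefix each field so that different file sets cannot
		// produce the same byte stream.
		fmt.Fprintf(h, "%d:%s%d:", len(name), name, len(files[name]))
		h.Write(files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := map[string][]byte{"a.go": []byte("package pkgLibA\nfunc A() string { return \"A\" }\n")}
	b := map[string][]byte{"a.go": []byte("package pkgLibA\nfunc A() string { return \"B\" }\n")}

	fmt.Println(contentKey(a) == contentKey(a)) // true: same content, same key
	fmt.Println(contentKey(a) == contentKey(b)) // false: changed content, new key
}
```

A compiled package stored under such a key is reused only when every input byte matches, regardless of when the files were written.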

@bronze1man
Contributor Author

@bradfitz So I have to wait for Go 1.8 to get a 100% accurate cache?
Thanks.

I am planning to write my own accurate-cache "go install" implementation on top of the go install command.
But there is one big problem:

  • The race between my cache check, the go install command, and file changes made by other threads cannot be solved without either changing the Go source code (waiting for Go 1.8) or maintaining my own version of the Go source code (too much work for me). ^_^

@bradfitz
Contributor

Or Go 1.9 or Go 1.10. Nobody has committed to working on it, and nobody has a design document or CL yet.

@golang golang locked and limited conversation to collaborators Apr 10, 2017
@rsc
Contributor

rsc commented Nov 2, 2017

Fixed now on master, will be in Go 1.10.
