New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: windows syscall does not use XMM registers to pass float args #6510
Comments
Yes. Go does not pass arguments in XMM registers on both 386 and amd64. No one complained about it yet. I am not sure what the proper solution is yet. Your proposal is simple enough, but we need to see the performance impact of your additional code. Also, this needs to be implemented on 386 too. So both implementations needs to be considered before committing on a plan of actions here. Perhaps we should add a new syscall API here. Feel free to push this along. I'll help in whatever way I can. Alex Status changed to Accepted. |
It also appears to incorrect for floats. I didn't actually test it, just been reading up on the calling conventions. TEXT runtime·callbackasm1(SB),NOSPLIT,$0 // Construct args vector for cgocallback(). // By windows/amd64 calling convention first 4 args are in CX, DX, R8, R9 // args from the 5th on are on the stack. // In any case, even if function has 0,1,2,3,4 args, there is reserved // but uninitialized "shadow space" for the first 4 args. // The values are in registers. MOVQ CX, (16+0)(SP) MOVQ DX, (16+8)(SP) MOVQ R8, (16+16)(SP) MOVQ R9, (16+24)(SP) Again if those first 4 args are floats they are switched for a xN register I believe. The comment should read // By windows/amd64 calling convention first 4 args are in CX/X0, DX/X1, R8/X2, R9/X3 (exclusively I believe, not in both like I did in my test fix) So it is correct for integers only again. I may not be reading this one correctly though. The first function is called asmstdcall but performs a fastcall, stdcall uses only stack. I have still not actually found official docs for 32bit calling convention in windows. I don't know how anyone writes code for this platform... heh ;) This is the most credible source I have found so far http://agner.org/optimize/calling_conventions.pdf |
> Again if those first 4 args are floats they are switched for a xN register I believe. Perhaps. I don't know. All APIs we call (Windows system dlls) use int parameters only. > The first function is called asmstdcall but performs a fastcall, stdcall uses only stack. I have still not actually found official docs for 32bit calling convention in windows. You are correct. The reason it has stdcall in the name is historical. We had 32 bit windows version first. And that uses stdcall. Then we ported code to amd64, but kept names unchanged. > ... This is the most credible source I have found so far http://agner.org/optimize/calling_conventions.pdf Here are some of my 64bit call convention bookmarks: http://blogs.msdn.com/b/oldnewthing/archive/2004/01/14/58579.aspx http://blogs.msdn.com/b/freik/archive/2005/03/17/398200.aspx You should find tons of stdcall convention explanations on the web. Alex |
loadregs: // Load first 4 args into correspondent registers. MOVQ 0(SI), CX MOVQ 8(SI), DX MOVQ 16(SI), R8 MOVQ 24(SI), R9 MOVQ CX, X0 MOVQ DX, X1 MOVQ R8, X2 MOVQ R9, X3 Should be slightly faster on >= core 2 intel chips, and about the same on amd ones according to an older copy of http://www.agner.org/optimize/instruction_tables.pdf We don't need to care about 32bit only cpus here since they do not pass args in xmm regs. CMOV doesn't do xmm stuff. I think it is going to be hard to top just adding 4 moves. A new API would involve branching and more memory reads in every scenario I can think of. Perhaps a duplicate set of syscall.SyscallX that also does float? Perhaps a jump table and 16 versions of the mov section? Or just 4 more movs. I will test my code with 32bit go in windows today and see if it works or not. https://github.com/bryanturley/gl/tree/master/gout/tst_exmpl/00_window is it, but it is not ready for mass consumption. It just barely started working in windows due to my ignorance of the windows api. Needs some major cleaning now as well. Also the summary line for this issue has a double negative, didn't see it till just now. Not that I should be correcting any one's english. |
I made this change: diff --git a/src/pkg/runtime/sys_windows_amd64.s b/src/pkg/runtime/sys_windows_amd64.s --- a/src/pkg/runtime/sys_windows_amd64.s +++ b/src/pkg/runtime/sys_windows_amd64.s @@ -44,6 +44,10 @@ MOVQ 8(SI), DX MOVQ 16(SI), R8 MOVQ 24(SI), R9 + MOVQ CX, X0 + MOVQ DX, X1 + MOVQ R8, X2 + MOVQ R9, X3 // Call stdcall function. CALL AX diff --git a/src/pkg/runtime/syscall_windows_test.go b/src/pkg/runtime/syscall_windows_test.go --- a/src/pkg/runtime/syscall_windows_test.go +++ b/src/pkg/runtime/syscall_windows_test.go @@ -242,3 +242,13 @@ func TestCallbackInAnotherThread(t *testing.T) { // TODO: test a function which calls back in another thread: QueueUserAPC() or CreateThread() } + +func BenchmarkWindowsSyscall(b *testing.B) { + b.StopTimer() + // make sure we do not measure dll load + syscall.GetLastError() + b.StartTimer() + for i := 0; i < b.N; i++ { + syscall.GetLastError() + } +} to test speed difference, and I see very little change. In fact, new version is slightly faster - not sure how to explain this. So, perhaps, we can accommodate new code into existing syscall. What do we do about 32 bit version? Also, can you explain how do you use this facility. What API takes floats as parameters? Also, if we are to proceed with this, we would need new test added to test new functionality. Do you have any suggestions? Fixed description. Please verify - my English is horrible. Alex |
http://code.google.com/p/go-wiki/wiki/WindowsDLLs I thought I would try it instead of cgo for opengl, because of that wiki page. OpenGL 4.4 Core using syscall.SyscallX() https://github.com/bryanturley/gl/blob/master/core/funcs_windows.go generated from the official gl.xml file I didn't type all that, heh. It is working so far in windows on amd64 with those 4 movs. |
I should mention it seems to work in 32bit windows with <= 32bit args as well. Perhaps type CallStack []byte // or a struct with a []byte and extra info // handy funcs for creating a CallStack func SyscallRaw(trap uintptr, args CallStack) (r1, r2 uintptr, err Errno) where you setup the call stack outside of the function? I got the next few days off (thanks D.C.)... I will test it if you guys think it is worth attempting. |
Probably shouldn't mention it, but some time ago I commented that this was a bad idea on golang-nuts https://groups.google.com/d/topic/golang-nuts/5t8lfkNW1v4/discussion and was promptly attacked for it... Which is the other reason I tried to do it "the right way" 8) Don't take this statement the wrong way, I love go. |
>1) What API takes floats as parameters? A few functions in gdi32 and opengl which is technically a system library. Windows 1.0 predated common hardware float on x86 so the api is mostly void of floats. >2) We need a test we can run during go build, to test this functionality. I agree, perhaps we create a dll and ship with it? Since none of the system ones seem to be usefull? >3) I do not understand how you propose to use floats on 386. type CallStack []byte // or a struct with a []byte and extra info func (c *CallStack) AddFloat64(f float64) func (c *CallStack) AddInt64(i int64) func (c *CallStack) AddUint64(i uint64) // etc... func SyscallRaw(trap uintptr, args CallStack) (r1, r2 uintptr, err Errno) you would copy the slice into the stack instead of starting at the first parameter and copying X 32bit vals from the go stack to the new call stack, would be nearly identical to the current 32bit asmstdcall and have the same level of type unsafety. This would be for 32bit only since 64bit seems to work, It would allow for 64bit values to be passed to DLLs. |
> >1) What API takes floats as parameters? > A few functions in gdi32 and opengl which is technically a system library. These are package libraries, not "system" libraries. Are these interfaces (dll functions) documented anywhere, like for example http://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx ? > ... perhaps we create a dll and ship with it? ... No, we aren't shipping dlls. They ether must be present on the system, or we have to build them. We can assume we have mingw installed. > type CallStack []byte // or a struct with a []byte and extra info > func (c *CallStack) AddFloat64(f float64) > func (c *CallStack) AddInt64(i int64) > func (c *CallStack) AddUint64(i uint64) It looks complicated. The code we add needs to pay for itself and more. Adding 4 lines of code might be acceptable to support an external lib. I am far from convinced we need to do more, unless we can identify many other products and users. Alex |
If syscall.LoadLibrary() and syscall.SyscallX() are supposed to work for using DLLs on windows, they currently do not. On amd64 they fail to deliver float32/float64. On 386 they deliver float32 but fail to deliver 64bit vals including 64bit integers. This is every standard DLL file that uses those types. So is syscall.LoadLibrary() + syscall.SyscallX() meant to be an option for calling DLLs in windows? http://code.google.com/p/go-wiki/wiki/WindowsDLLs says it is supposed to be. Discussions on golang-nuts say it is supposed to be. You guys are in charge, I am just trying to help. |
Here is another example you wrote http://code.google.com/p/gowingui using LoadLibrary and Syscall for non system things. Are we not supposed to call into DLLs this way? |
Oh i forgot. > These are package libraries, not "system" libraries. Are these interfaces (dll functions) documented anywhere, like for example http://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx ? the function that I discovered this bug with. http://msdn.microsoft.com/en-us/library/windows/desktop/dd318377%28v=vs.85%29.aspx the function I tested float64 with. http://msdn.microsoft.com/en-us/library/windows/desktop/dd318381%28v=vs.85%29.aspx opengl32.dll ships with windows since win98-ish. Even if you use amd/nvidia opengl32.dll you still call into the MS one for gl it has, which is really annoying. |
This fundamentally cannot work with the current API. The API has no idea what arguments are really floating point values and not. I believe that if you have a function that takes int1, int2, float3, int4, float5, then the correct assignment of arguments is to put int1, int2, int4 in the first three integer registers and float3, float5 in the first two floating point registers. In particular, float3 does not go in the third floating point register. syscall.Syscall is really for system calls. That concept is a bit tortured on Windows, but even so it is really for talking to system DLLs, which do not use floating point. I think for wrapping arbitrary libraries the answer really has to be cgo. It has type information and can do the right thing. |
>This fundamentally cannot work with the current API. >The API has no idea what arguments are really floating point values and not. >I believe that if you have a function that takes int1, int2, float3, int4, >float5, then the correct assignment of arguments is to put int1, int2, int4 >in the first three integer registers and float3, float5 in the first two >floating point registers. In particular, float3 does not go in the third >floating point register. In 386 it currently works fine with float32s being passed through as uintptrs. On 386 float64/int64/uint64 are broken because they must be crammed into a 32bit uintptr. On amd64 it is either CX or X0 you cannot passed 8 args in registers only 4 with those 8 registers. MS talks about unprototyped functions and how all 8 are considered volatile every call. >syscall.Syscall is really for system calls. That concept is a bit tortured >on Windows, but even so it is really for talking to system DLLs, which do >not use floating point. I said the same thing on golang-nuts and was attacked but ok... >I think for wrapping arbitrary libraries the answer really has to be cgo. >It has type information and can do the right thing. Sure, I will drop it. I think you guys should remove the go wiki page saying otherwise though so as not to confuse anyone else. Thanks anyways. |
In case you guys change your mind Simple DLL with a few functions Simple test go program Though when I posted this github seems to be acting slightly strange and not updating the repo on the web front. You may have to git clone it to see the files till that is fixed. https://github.com/bryanturley/issue6510 |
> If we fix this it appears we will be able to write and cross compile opengl apps without cgo. > If that is worth anything to anyone. I think it is worth my effort. > the function that I discovered this bug with. > http://msdn.microsoft.com/en-us/library/windows/desktop/dd318377%28v=vs.85%29.aspx > > the function I tested float64 with. > http://msdn.microsoft.com/en-us/library/windows/desktop/dd318381%28v=vs.85%29.aspx Thanks for that. opengl32.dll is a *system* dll, so, I think, we should at least make an effort here. > In 386 it currently works fine with float32s being passed through as uintptrs. > On 386 float64/int64/uint64 are broken because they must be crammed into a 32bit uintptr. Correct me if I am wrong, but, reading this, I take it we are covered on 386. I consider splitting of float64/int64/uint64 into uintptr is OK for this level (syscall). Perhaps we can provide some little functions to help with splitting here. > On amd64 it is either CX or X0 you cannot passed 8 args in registers only 4 with those 8 > registers. MS talks about unprototyped functions and how all 8 are considered volatile every call. I am not familiar with this technology, so, please, explain again what else you think is needed (on top of 4 new lines as per https://golang.org/issue/6510?c=11) to call all these functions in opengl32.dll. Thank you. > Simple DLL with a few functions > ... > https://github.com/bryanturley/issue6510 Thank you for doing that. You are using int, float and double in your C code there. How large are these? I am bit concerned about Go call matching these parameters. But since you've mentioned opengl32.dll, why don't we call glClearDepth and glClearColor there instead? We won't need gcc for that test then. Also the test would be a nice simple example for other users. Alex |
>Correct me if I am wrong, but, reading this, I take it we are covered on 386. I >consider splitting of float64/int64/uint64 into uintptr is OK for this level >(syscall). Perhaps we can provide some little functions to help with splitting here. I did not think of splitting that might work. Since the current asm routine (386 version) just copies from the go stack to the c stack for the new call. I was thinking it might just be easier to make a byte slice of the args and copy that. That is what the "type CallStack []byte" thing above was, splitting is probably faster at runtime though. >Thank you for doing that. You are using int, float and double in your C code there. >How large are these? I am bit concerned about Go call matching these parameters. int might be questionable I will switch it to int32_t which is not. float and double I have never seen as anything other than what go calls float32 and float64, we could add checks to the C code though. (ignoring x87 long double...) >But since you've mentioned opengl32.dll, why don't we call glClearDepth and >glClearColor there instead? We won't need gcc for that test then. Also the test >would be a nice simple example for other users. We could, but it would require we have a current opengl context. Which would require we have a window or an off screen buffer. We are talking at least 10 calls into the windows api before we get an opengl context. I spent 20 mins grep'ing the mingw headers for calls that use float or double in the windows api. They were all drawing calls (in gdi32) and some networking protocols? (ITT ?). Probably not usable for testing. That test DLL has no prereqs. It also has float32 return values and simplicity. All of the test functions just add the args together and return the sum. If they get scrambled or ignored going in the sum will not match the sum we make in go prior to the call. When i said >MS talks about unprototyped functions and how all 8 are considered volatile every call. The important part was all 8 of those regs can get clobbered every call (amd64 only). No clue what unprototyped funcs are, I have some guesses but I don't think it matters. |
> We are not going to do anything for this for Go 1.2. Makes sense to me, and I assumed that would be the case as I was writing the initial report. >syscall.Syscall cannot possibly know >which arguments are supposed to be floats and which are integers. syscall.Syscall also doesn't differentiate between (u)int8/(u)int16/(u)int32 either and cannot for the same reason. All of those are working fine. I think the 386 not working for 64bit values was just an oversight or a shortcut. In windows 386 (stdcall) all of the arguments are 32bit or 64bit values on the stack. Currently it just can't handle putting 64bit values on the stack since it only deals in 32bit uintptr's. I believe it can work with the Alex's splitting idea with no changes to the syscall package now, I was just about to read up and test. It may be as simple as 4 more mov instructions for amd64 (4 line change), and splitting 64bit vals into two args for 386 (no change unless we add a split helper func in syscall). |
New info from more testing (https://github.com/bryanturley/issue6510) windows amd64 passes in fine, and returns fine with just the 4 extra movs before the call instruction. windows 386 however appears to return floats in the x87 regs. I don't think the current syscall funcs can be used for funcs that return floats. Might just be time to give in and go the cgo way. Something to remember though if there is a win32 api call go needs that returns a float in the future. |
Issue #3588 has been merged into this issue. |
Just found this again look at bugs I starred, I think this should be closed unless you guys are using it as a reminder of the problem. I don't think this can be fixed without breaking the go 1 promise, and the last comment was nearly a year ago. Though a few comments in original syscall package should suffice for that I think. Not my call though. |
I think generally the syscall package is only used for things that Go standard library requires. As none of standard library needs to call DLL function that takes or returns floating point values, this issue is best reported against the go.sys/windows package (I think we can accept the issue if it were against go.sys/windows, and also we're free from Go 1 API guarantees there). |
It turns out the calling convention description in Comment 22 by @rsc is not the case. If the second argument is a float32, its value goes into the second XMM register regardless of whether the first argument is an int or a float. The documentation in https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx has an example:
This means its extremely easy to support functions with float parameters in syscall.Syscall. I would like to do this in Go 1.7, so I can remove the fixFloat hack from the GL package I introduced in golang.org/cl/17675. |
The https://msdn.microsoft.com/en-us/library/zthk2dkh.aspx is about amd64 parameter passing. What happens with 386? Alex |
386 always passes argument on stack, so I think it already works correctly
(to pass a double, you have to manually convert it to two uintptrs in the
correct order, but that's doable.)
To fix floating point argument passing on amd64, we just need to copy each
uintptr to corresponding XMM register.
The real problem, however, is handling floating returns. syscall.Syscall has
no way to know whether the result should be in %rax or %xmm0.
|
Do you have reference for that? I am specifically asking about float args. Alex |
On Tue, Dec 15, 2015 at 7:35 PM, Alex Brainman notifications@github.com
But there really is no reason to pass FP arguments in 387 stack whereas all |
Agner Fog has a pretty extensive overview of calling conventions: http://www.agner.org/optimize/calling_conventions.pdf
|
CL https://golang.org/cl/32173 mentions this issue. |
The text was updated successfully, but these errors were encountered: