Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgo: include symbol version information #1397

Closed
alberts opened this issue Jan 7, 2011 · 23 comments
Closed

cgo: include symbol version information #1397

alberts opened this issue Jan 7, 2011 · 23 comments

Comments

@alberts
Copy link
Contributor

alberts commented Jan 7, 2011

What steps will reproduce the problem?

Reproducing the problem is a bit tricky, since you need to call a function on the
ibverbs C library that is only going to return the right stuff if you have an InfiniBand
controller in your machine.

I tried to make my own little shared library with a simplified version of the function
and the same structs as what produces the error, but it didn't reproduce the problem.

If there's anything I can check with gdb that could help, let me know. I have the
ibverbs debuginfo, so I can set breakpoints in it and in my Go code.

So here goes...

What is the expected output?

With release.2010-12-08 (the old cgo), I get the following structs returned from the C
function:

struct ibv_device **ibv_get_device_list(int *num_devices)

The values:

deviceList=0x17fd3c0
&device=0x17f5740
device=&{ops:{alloc_context:0x7f8ef46d7480 free_context:0x7f8ef46d72b0} node_type:1
transport_type:0 name:[109 108 120 52 95 48 0 ...] dev_name:[117 118 101 114 98 115 48 0
...] dev_path:[47 115 121 115 47 99 108 97 115 115 47 105 110 102 105 110 105 98 97 110
100 95 118 101 114 98 115 47 117 118 101 114 98 115 48 0 ...] ibdev_path:[47 115 121 115
47 99 108 97 115 115 47 105 110 102 105 110 105 98 97 110 100 47 109 108 120 52 95 48 0
...]}

What do you see instead?

deviceList=0x1b40048
&device=0x1b40710
device=&{ops:{alloc_context:<nil> free_context:<nil>} node_type:28575552
transport_type:0 name:[garbage0] dev_name:[garbage] dev_path:[garbage]
ibdev_path:[garbage]}

Basically the whole struct is corrupted.

Which compiler are you using (5g, 6g, 8g, gccgo)?

6g

Which operating system are you using?

linux

Which revision are you using?  (hg identify)

tip

release.2010-12-08 with the old cgo works fine.

Please provide any additional information below.

The C function looks like this:

struct ibv_device **ibv_get_device_list(int *num_devices)

And the structs involved:

enum {
  IBV_SYSFS_NAME_MAX = 64,
  IBV_SYSFS_PATH_MAX = 256
};

struct ibv_device_ops {
  void *p1; // actually a function pointer
  void *p2; // actually a function pointer
};

enum ibv_node_type {
  IBV_NODE_UNKNOWN = -1,
  IBV_NODE_CA = 1,
  IBV_NODE_SWITCH,
  IBV_NODE_ROUTER,
  IBV_NODE_RNIC
};

enum ibv_transport_type {
  IBV_TRANSPORT_UNKNOWN = -1,
  IBV_TRANSPORT_IB = 0,
  IBV_TRANSPORT_IWARP
};

struct ibv_device {
  struct ibv_device_ops ops;
  enum ibv_node_type node_type;
  enum ibv_transport_type transport_type;
  char name[IBV_SYSFS_NAME_MAX];
  char dev_name[IBV_SYSFS_NAME_MAX];
  char dev_path[IBV_SYSFS_PATH_MAX];
  char ibdev_path[IBV_SYSFS_PATH_MAX];
};

Attachments:

  1. ib.go (893 bytes)
  2. Makefile (107 bytes)
  3. ib_test.go (267 bytes)
@alberts
Copy link
Contributor Author

alberts commented Jan 7, 2011

Comment 1:

Here's a better ib.go that doesn't use C.strncmp. Same problem though.

Attachments:

  1. ib.go (844 bytes)

@adg
Copy link
Contributor

adg commented Jan 10, 2011

Comment 2:

Owner changed to r...@golang.org.

Status changed to NeedsAttention.

@rsc
Copy link
Contributor

rsc commented Jan 11, 2011

Comment 3:

It would help to run both old and new cgo and compare
the generated Go code that describes the struct.
It sounds like the Go definition of ibv_device etc
is not lining up with what the C code is, but that it used to.

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 4:

Here's the generated files from the old cgo and the new cgo.

Attachments:

  1. old.zip (2236 bytes)
  2. new.zip (2977 bytes)

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 5:

And the diff... can't really see anything significant.

Attachments:

  1. old_new.diff (2889 bytes)

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 6:

Could it be that something is wrong with the way the new cgo calls the function in the
shared library which was hidden by the stub library in the old cgo?

@rsc
Copy link
Contributor

rsc commented Jan 12, 2011

Comment 7:

You're right that not much changed!
I was assuming that the call was returning the right value.
I guess the next step is to verify that assumption.
Can you make a wrapper my_ibv_get_device_list that
calls ibv_get_device_list, prints the result using %p, and
returns it?  And then print the same result from Go
(using %p) and see if they match?
If they match, then it has to be a data structure issue.
If they don't match, then it's a cgo calling issue.

Status changed to WaitingForReply.

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 8:

Okay, I added a little wrapper:
/*
#include <stdio.h>
#include <infiniband/verbs.h>
struct ibv_device **my_ibv_get_device_list(int *num_devices) {
  struct ibv_device** devs = ibv_get_device_list(num_devices);
  printf("devs=%p\n", devs);
  printf("devs[0]->node_type=%d\n", devs[0]->node_type);
  return devs;
}
*/
It prints:
devs=0x2223040
devs[0]->node_type=1
deviceList=0x2223040
so the pointer makes it through fine and the struct is intact.
But with the new cgo:
devs=0x8e70f8
devs[0]->node_type=9303872
deviceList=0x8e70f8
So the pointers are fine, but the struct is already corrupted before Go gets it.
As a last resort, I ran valgrind 3.5.0 on the binary. It almost works! Things run for a
while and then something segfaults. There's some interesting warnings that might warrant
your attention. Output attached.

Attachments:

  1. ib.go (1099 bytes)
  2. valgrind.txt (13040 bytes)

@rsc
Copy link
Contributor

rsc commented Jan 12, 2011

Comment 9:

What if you run 
IBV_FORK_SAFE=1 6.out
instead of 6.out?

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 10:

Same result (corrupted struct). I might be wrong, but I don't think locked memory plays
a part in this part of the initialization. If I do ulimit -l 0, it still works with old
cgo. I think the first memory gets locked when one calls ibv_open_device or
ibv_create_qp.

@alberts
Copy link
Contributor Author

alberts commented Jan 12, 2011

Comment 11:

If it would help, I can try to arrange SSH access to a machine where you can reproduce
this problem?

@alberts
Copy link
Contributor Author

alberts commented Jan 14, 2011

Comment 12:

Saw this warning when building Go on Fedora 14 x86_64:
/usr/bin/ld: Warning: alignment 1 of symbol `crosscall2' in amd64.o is smaller than 8 in
_cgo_main.o
Something to be worried about?

@ianlancetaylor
Copy link
Contributor

Comment 13:

The alignment warning is unimportant.  It just happens when building runtime/cgo,
because the code is linking together a common symbol and a function.  Still, I've worked
up a CL which should fix it.

@alberts
Copy link
Contributor Author

alberts commented Feb 3, 2011

Comment 14:

SSH access is up. Details sent to rsc at his Google address.

@rsc
Copy link
Contributor

rsc commented Feb 3, 2011

Comment 15:

I found the problem.  The libibverbs shared library contains
two versions of ibv_get_device_list: ibv_get_device_list@IBVERBS_1.0 and
ibv_get_device_list@@IBVERBS_1.1.
Because the files generated by 6l do not include library version
information, the dynamic linker gives us the oldest one.
But in fact we want the newer one. 
Using the old cgo model, 6l output never referred to system .so
files directly (only indirectly via the stub .so files, which had
proper version information).
The fix will be to include the version information.  It shouldn't
be too hard.

Status changed to Accepted.

@alberts
Copy link
Contributor Author

alberts commented Feb 15, 2011

Comment 17:

Is there a workaround we can use in the mean time? Thanks.

@rsc
Copy link
Contributor

rsc commented Feb 15, 2011

Comment 18:

The work around would be to write your own .so file that
exports wrappers and then use that.
Russ

@rsc
Copy link
Contributor

rsc commented Apr 26, 2011

Comment 19:

With apologies for the long delay, please try clpatch'ing 4444064.

@alberts
Copy link
Contributor Author

alberts commented Apr 26, 2011

Comment 20:

Hello
gopack prints:
fwrite libc.so.6 GLIBC_2.2.5
stderr libc.so.6 GLIBC_2.2.5
pthread_attr_init libpthread.so.0 GLIBC_2.2.5
pthread_attr_getstacksize libpthread.so.0 GLIBC_2.2.5
pthread_create libpthread.so.0 GLIBC_2.2.5
free libc.so.6 GLIBC_2.2.5
malloc libc.so.6 GLIBC_2.2.5
abort libc.so.6 GLIBC_2.2.5
__errno_location libc.so.6 GLIBC_2.2.5
ibv_get_device_list libibverbs.so.1 IBVERBS_1.1
strncmp libc.so.6 GLIBC_2.2.5
x1
which has the line:
ibv_get_device_list libibverbs.so.1 IBVERBS_1.1
which looks good.
But when I run my test case I attached here, I still get:
device=&{ops:{alloc_context:<nil> free_context:<nil>} node_type:59442928
transport_type:0 ...}
which still looks like the wrong function is being called.

@alberts
Copy link
Contributor Author

alberts commented Apr 26, 2011

Comment 21:

Just to confirm, I built and ran the test with release.2010-12-08 again, and it worked,
so my environment is still sane.

@rsc
Copy link
Contributor

rsc commented Apr 26, 2011

Comment 22:

Sorry, if you're seeing those prints I screwed up and
didn't upload the newest copy of the code.  Re-uploaded.
hg revert @4444064
hg change -D 4444064
hg clpatch 4444064

@alberts
Copy link
Contributor Author

alberts commented Apr 26, 2011

Comment 23:

Whoot. Working like a dream. Thanks very much! I'm off this week, so more detailed tests
will have to wait, but this looks very good.

@rsc
Copy link
Contributor

rsc commented Apr 28, 2011

Comment 24:

This issue was closed by revision 09092a7.

Status changed to Fixed.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants