cmd/cgo: add C.WcharString #1691

rsc · 2011-04-13T12:38:33Z

for calling routines that need a wchar_t*

rsc · 2011-12-09T19:44:17Z

Comment 1:

Labels changed: added priority-later.

rsc · 2011-12-12T19:51:21Z

Comment 2:

Labels changed: added priority-go1.

remyoudompheng · 2011-12-15T23:11:12Z

Comment 3:

Should the various C.GoString, C.WString, etc. move somewhere into package runtime/cgo?
That would avoid inlining code in cgo string constants. Or is it annoying because that
would imply that some C types are predefined in runtime/cgo and some others are
auto-generated?

robpike · 2012-01-13T21:27:03Z

Comment 4:

Owner changed to builder@golang.org.

rsc · 2012-02-17T16:57:28Z

Comment 6:

wchar_t is pretty rare; need not be in Go 1.

Labels changed: added priority-later, removed priority-go1.

peterGo · 2012-02-19T15:03:01Z

Comment 7:

On Windows, wchar_t is ubiquitous. Windows Unicode-enabled API functions use UTF-16
(wide character) encoding, which is used for native Unicode encoding on Windows
operating systems.
Windows Data Types for Strings
http://msdn.microsoft.com/en-us/library/windows/desktop/dd374131.aspx

rsc · 2012-02-19T16:32:32Z

Comment 8:

I would be happy to review a patch providing wchar_t in cgo,
but the Go team is not going to make it a priority for their own
Go work to write such a patch.

gopherbot · 2012-03-13T05:20:45Z

Comment 9 by Edward.Casey.Adams:

Perhaps Cgo users should link to libiconv (http://www.gnu.org/software/libiconv/)
instead?
The problem is that neither the width nor the Unicode encoding of wchar_t is well
defined. (See http://en.wikipedia.org/wiki/Wide_character#C.2FC.2B.2B) For example, on
Windows/Visual Studio platforms, wchar_t is 16 bits wide and encoded in UTF-16LE,
whereas on most Linux distros wchar_t is defined to be 32 bits wide, but most Unicode
text is UTF-8 stored in regular chars, and most anything else won't be little-endian.
Thus adding C.WcharString() adds ambiguity.

rsc · 2012-03-13T13:12:00Z

Comment 10:

You would only use C.WcharString on systems where you needed a wchar_t*.
The definition would be whatever that means on that system.

rsc · 2012-09-12T21:41:15Z

Comment 11:

Labels changed: added go1.1.

rsc · 2012-12-09T09:09:48Z

Comment 12:

Labels changed: removed go1.1.

rsc · 2013-11-27T18:50:30Z

Comment 13:

Labels changed: added go1.3maybe.

rsc · 2013-12-04T01:31:30Z

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

rsc · 2013-12-04T01:50:32Z

Comment 15:

Labels changed: added repo-main.

GeertJohan · 2014-04-30T12:59:19Z

Comment 16:

I once made this package: https://github.com/GeertJohan/cgo.wchar
It works well, but requires libiconv. I have never tested it on anything except linux.

andlabs · 2014-06-02T15:08:44Z

Comment 17:

The problem with comment #10 is that you would need to either
a) know what the definition of wchar_t is on the target platform, or
b) use the mbtowc() family of functions, which requires you to know what the multibyte
encoding is.
If we can guarantee that all systems supported by Go have a multibyte encoding of UTF-8,
then we can implement this portably. Alas:
$ uname -a
Linux pietro-laptop 3.13.0-29-generic #52-Ubuntu SMP Wed May 28 12:42:47 UTC 2014 x86_64
x86_64 x86_64 GNU/Linux
$ cat multibyte.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <errno.h>
#include <locale.h>
int main(void)
{
    wchar_t wide = L'世';
    char multibyte[MB_LEN_MAX];
    int i, n;
    setlocale(LC_ALL, "");
    errno = 0;
    n = wctomb(multibyte, wide);
    if (n == -1) {
        fprintf(stderr, "error %s\n", strerror(errno));
        return 1;
    }
    if (n == 0) {
        fprintf(stderr, "weird: wctomb() returned 0 (no bytes in output)\n");
        return 2;
    }
    for (i = 0; i < n; i++)
        printf("%02X ", multibyte[i]);
    printf("\n");
    return 0;
}
$ LC_CTYPE= ./a.out 
FFFFFFE4 FFFFFFB8 FFFFFF96 
$ LC_CTYPE=en_US.UTF8 ./a.out
FFFFFFE4 FFFFFFB8 FFFFFF96 
$ LC_CTYPE=ja_JP.SJIS ./a.out 
FFFFFF90 FFFFFFA2 
So as far as I can gather, a C.CWString() would need to be platform-specific.
For Windows, we can either
- do the work on the Go side: have unicode/utf16 do the conversion (this is what package
syscall does)
- do the work on the C side: use MultiByteToWideChar() in kernel32.dll by passing
CP_UTF8 as the first argument (which should work regardless of locale)
For the Unixes, though, I'm not sure... other than linking to libiconv, which I imagine
isn't optimal, or flat out not providing it since it isn't used much to begin with, in
which case for Windows we could just say use the routines in package syscall.
(I have wanted to prune through cgo myself sometime.)
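For illustration, the Go-side option for Windows mentioned above could look roughly like the sketch below. It uses only unicode/utf16 from the standard library. The helper name utf16Z is hypothetical and is not part of cgo or package syscall; it assumes the callee wants NUL-terminated UTF-16 code units, which is what a wchar_t* means on Windows.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Z is a hypothetical helper, not an existing cgo or syscall API.
// It encodes s as UTF-16 code units and appends the terminating NUL that
// a Windows wchar_t* string is expected to carry.
func utf16Z(s string) []uint16 {
	return append(utf16.Encode([]rune(s)), 0)
}

func main() {
	// Print the code units for the same character used in multibyte.c.
	fmt.Printf("%04X\n", utf16Z("世"))
}
```

A real implementation would also have to decide what to do with strings that contain an embedded NUL, and would need to copy the result into C memory before handing it to C code.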

mdempsky · 2014-08-06T05:45:15Z

Comment 18:

C99 and later specify that if __STDC_ISO_10646__ is defined, then wchar_t characters
have value equal to their Unicode code point.  We could conditionally provide/expose
C.WcharString() (or C.CWString() or whatever) only if the C compiler defines that macro,
and then I don't think we need to rely on any external libraries like libiconv.
I think the only nit would be how to handle code points greater than WCHAR_MAX.  ISO C
doesn't specify how to handle that case, but in practice it seems like encoding
characters using UTF-{8*sizeof(wchar_t)} should work.  Varying the implementation
depending on sizeof(wchar_t) might be a tad involved, but nothing really out of the
ordinary from what cgo already has to do I think.
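As a rough sketch of the "vary by sizeof(wchar_t)" idea, written in plain Go so it runs without a C compiler: the function name encodeWide and its width parameter are hypothetical, and the mapping (2 bytes means UTF-16, 4 bytes means UTF-32) is the assumption described above, not something ISO C guarantees.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeWide is a hypothetical helper. Given the size of wchar_t in bytes,
// it returns the NUL-terminated code units for s.
// Assumption: a 2-byte wchar_t holds UTF-16 and a 4-byte wchar_t holds UTF-32.
func encodeWide(s string, width int) ([]uint32, error) {
	var out []uint32
	switch width {
	case 2:
		// Code points above 0xFFFF become surrogate pairs.
		for _, u := range utf16.Encode([]rune(s)) {
			out = append(out, uint32(u))
		}
	case 4:
		// One code unit per Unicode code point.
		for _, r := range s {
			out = append(out, uint32(r))
		}
	default:
		return nil, fmt.Errorf("unsupported wchar_t width %d", width)
	}
	return append(out, 0), nil
}

func main() {
	for _, width := range []int{2, 4} {
		units, err := encodeWide("a😀", width)
		fmt.Printf("width %d: %X %v\n", width, units, err)
	}
}
```

In cgo itself the width would come from the C compiler rather than from a parameter, and the result would be stored in an array of the matching C type.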

ianlancetaylor · 2014-08-06T13:37:48Z

Comment 19:

As far as I can tell neither GCC nor clang define __STDC_ISO_10646__ so this seems
rather theoretical.

mdempsky · 2014-08-06T15:10:19Z

Comment 20:

Hm, at least GCC (4.8.2) on Ubuntu 14.04 defines it:
$ echo | gcc -E -dD - | grep STDC_ISO_10646
#define __STDC_ISO_10646__ 201103L
(Seems to come from /usr/include/stdc-predef.h, provided by glibc.)
But indeed GCC 4.6.3 on Ubuntu 12.04 or even just Clang 3.5 on Ubuntu 14.04 do not, so
that's unfortunate.

mdempsky · 2014-08-06T15:56:56Z

Comment 21:

Oh, older glibc versions define __STDC_ISO_10646__ in <features.h>, which then gets
pulled in by other glibc headers like <wchar.h>, but it won't be provided by default or
by GCC-provided headers like <stddef.h>.
But I suppose it's still not a very worthwhile signal unless Windows and OS X also
define it.

rsc added help wanted priority-later labels Aug 6, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed priority-later labels Apr 10, 2015
